Skip to content
loading...

ORM vs. SQL?

Bertil Muth on May 29, 2018

What are you using in your application to access your relational datastores, object-relational mapping or a query languange like SQL? Why have y... [Read Full]
markdown guide
 

There's a better way: data mapping. It's a pattern first described by Martin Fowler that relies on a discrete persistence layer to translate application domain objects into relational records. Data mappers provide much more convenience than raw queries while avoiding many of the problems and bad habits O/RMs encourage: dual data models, complex state management, inattention to schema design.

The oldest data mapper implementation I'm aware of is MyBatis (formerly iBatis) for Java; I was using that back in the early-mid 2000s. My own project, Massive.js, also operates in this style.

 

I usually call them data stores instead of mappers, but I'm a huge fan of this pattern.

I want to write objects that make sense given the problem I'm solving, and all I want my domain layer to know about the persistence layer is "I can put stuff there and get it back later."

 

That's the pattern I've been using for many years.
If your schema can be persisted with an ORM, I guess it means that it's so simple and straightforward that you don't need no ORM.

 

Does ActiveRecord count as a data mapper? It seems to meet some of the criteria.

 

From what I know it's an O/RM; the "active record" pattern is in common usage in that field. That said, I've done about thirty minutes of Ruby work in my entire life and none of it with Rails. It looks like AR-the-library automates some of the modeling but entities have their own persistence methods (JavaScriptily, thing.save() instead of things.save(aThing)) and you're tracking state in your application code, which are the quintessential characteristics of O/RMs.

 

Interesting. Does that mean you manually need to implement a Mapper for every entity?

 

It can, but that's kind of doing it the hard way. iBatis/MyBatis, for example, used to require a lot of configuration -- you'd write your own XML mapping which translates columns in each query result to their corresponding class fields, and write SQL which inserts, updates, or deletes records in tables based on object properties (I am given to understand they've automated most of the basic one-table-at-a-time CRUD since I last used it). What's interesting about doing this is that you're not restricted to a one-to-one table:class mapping: you can source a domain object from a complex view, and decompose it into its constituent tables when you persist. It's immensely powerful and lets you do some really elegant things with your schema design.

Massive's secret sauce is full introspection: it builds an API that lets you work with tables, views, and functions as if you were querying them directly, just using JavaScript instead of SQL. You don't need any domain classes or models at all.

Hi @Dian,

It sounds quite interesting, do you have further resources that one can look into to learn more on this.

Fowler has a book about enterprise patterns in general including the data mapper, although I have to admit I haven't read it. Other than that, your best bet is probably to learn by doing. Massive has an interactive REPL and extensive documentation if you do Node, or you could look at some of the other implementations from the Wikipedia article.

Thanks for the pointes Dian, I really appreciate it.

 

Years ago I initially met ORMs with glee at the time they would save. However, after a while I found I hated them. They create at least as many problems as they solve when you try to go beyond the simple stuff. And nowadays the main reason I ever had to use one -- loading and saving entities with multiple levels of child objects -- is solved in most cases by storing things as JSON. Which gives a much simpler load and save story than ORMs could possibly accomplish.

I will use a simple mapper like Dapper which literally just translates rows to objects. It operates similar to the way that reflection-based JSON serializers work. So there is no db configuration to keep up to date like with most ORMs. I've previously written about how I do queries in this post.

The key (for us) to keeping SQL access simple is that every business decision generates an event (a simple read-only data object), which later gets translated into SQL statements. As we move our legacy systems to this style, the resulting SQL data is still the source of truth, and the events serve as a log.

Aside: Generating an event instead of directly saving an object also makes business logic very easy to test. Since there is no database access to mock. In general we move all IO-based dependencies out of the business logic. aka Dependency Rejection.

In our later systems, the events themselves are saved and become the source of truth -- aka Event Sourcing. Although we may consult the relational models to check for things like uniqueness or relational constraints. So basically, the SQL data is just used for queries and we can rebuild them at any time from events. In fact, changing the SQL models are quite easy. We literally drop the affected tables and create the new versions and replay all the events to repopulate them.

I don't profess to have as much experience with the Repository pattern, but I have tried it here and there and found that I didn't care for it. Although it seems like the least objectionable alternative when storing domain objects into a relational model.

 

Context: Rails monolith, with about 450 tables in PostgreSQL, on Heroku.

99.99999999etc% of database queries are generated by ActiveRecord.

Occasionally we use Arel for something like where pub_date >= ..., though accessed through a class method that would allow the code to express that as @client.books.pub_date_on_or_after Date.today.

We have a few that can't be expressed in vanilla ActiveRecord, and for which Arel would be more trouble than it's worth, and we try to do that by having the SQL statement in a dedicated file, in a dedicated folder.

Sometimes we will express complex queries in a view and place a read-only model over it. Mostly this is to get around limitations in efficient eager loading.

We have one 427 line SQL statement – an insert with 6 common table expressions, which is used to achieve in a bout one second what it would otherwise take Ruby about 30 minutes to do.

Further context: I was an Oracle data warehouse architect for about 20 years, and built systems with SQL embedded in in PL/SQL, so as a team we do not fear SQL. However, as an application developer I would rather use Rails + ActiveRecord – there are too many techniques I can use with Rails that would be impossible or just tedious to achieve in an application that relies on hand-coded SQL, mostly to do with DRY.

 

there are too many techniques I can use with Rails that would be impossible or just tedious to achieve in an application that relies on hand-coded SQL

Could you please provide an example or two? :-)

 

Sure – I think that scopes are a good example.

The ability to define something as simple as:

scope :forthcoming_or_active, -> {
  where(publishing_status: [Book::PUB_STATUS_FORTHCOMING, Book::PUB_STATUS_ACTIVE])
}

scope :publishing_within_months, ->(months) {
  forthcoming_or_active.
    pub_date_between(
      Time.zone.today,
      Time.zone.today + months.to_i.months
    )
}
scope :not_ebook, -> {
  where.not(product_form: Book::PRODUCT_FORM_EBOOK)
}
scope :sales_present, -> {
  where(Sale.belonging_to_book.exists)
}

So these define class methods that I can daisy chain in the application to be able to identify the records I want:

  • Book.not_ebook
  • Book.sales_present.not_ebook
  • Book.not_ebook.sales_present.publishing_within_months(1)

These all generate the correct SQL, and if I change the definition of not_ebook in the scope then every SQL is automatically changed to the new definition. There are 18 places in our system where #not_ebook is used, and not only would I not want to correct them all by hand, I would not fancy my chances of being sure that I can find them all.

So that's pretty hard to do with SQL, efficiently at least. You could conceive of it being done with a PL./SQL function in Oracle perhaps, but when you stray from executing pure SQL there's the risk of compromising performance.

 

I love SQL, but I understand the benefits of using an ORM.
During my career tough I sadly found out people not knowing SQL at all because of massive ORM usage.

The correct answer is in the middle, using ORMs but being able to use SQL as well.

 

I understand the benefits of using an ORM.

Please provide some examples!

 

NoSQL all the way. I know we are storing redundant data, but we have no reason to care anymore because storage is so cheap. Still a work in progress but this is a DBAL (github.com/ua1-labs/firesql) that abstracts the need for relational data. Just throw an object into the database.

 

What sort of use cases do you see for this, Joshua?

 

Every application I build for a client ends up implementing this. It's just an abstraction layer.

Is everything stored in two tables, or am I misunderstanding?

You understand it correctly! Everything is in 2 tables.

 

I prefer to always use SQL, mainly because I know SQL and I don't want to always re-learn a new tool from scratch every time I pick a different technology.

Benefits:

  • Fine performance tuning
  • Infinite possibilities for requests building
  • Clear/concrete and easy to debug
  • Very easy to maintain or migrate/convert

Downsides:

  • Require to know both SQL and your RDBMS
  • May be harder to set up properly (with stores, injection etc)

A good mix I like is raw SQL with a small query building factory, like easydb in PHP

 

My shop is almost entirely Perl on the back end and of course Javascript on the front end.

It was mostly through me that we went from flat files to databases for much of our data usage, and the peak, right now, is a wrapper around Perl's DBI which allows me to call $db->do('INSERT INTO table ( foo, bar ) VALUES (?,?)',{placeholders => [1,2] }); for CREATE, with similar do() commands for delete and $db->arrayref('SELECT * FROM table', {controls=>{}}) to get an array of objects. This is good for me because here, I know I can write the LEFT JOIN on the SELECTs to get exactly what I want.

Perl has DBIx::Class, often called DBIC, that works more like an ORM, and when I use it as the Model in MVC context of the web framework Dancer, which ties in other things like Template Toolkit as a View, it works. Problem is, I have not been able to bring Dancer into production. Nor Mojolicious. I have decades worth of legacy code I would have to re-implement, with authentication being the first problem.

So, I am trying to use DBIC and Template Toolkit together in a CGI-based context, which gives me a much smaller number of things I must rethink before progress can be made, and I find I cannot just dump DBIC results into Template's templates, but must make another step in order to get it into a form I can make template understand.

Additionally, I find it difficult to take tables I created long before I heard of DBIC and make schemas for them that allow me to get the deep data I can easily get via SELECT * FROM foo f LEFT JOIN bar b on f.bar_id = b.id. There's a body of best practices for working with ORMs I just don't know, and implementing those changes now would break production code and production databases.

Because I dove into the deep end with SQL and had to sink or swim, I am very comfortable with it. There are gaps I occasionally find, and I'm still befuddled by normal forms and denormalization, but I can usually make it do what I want.

Recently a friend asked "So what if you didn't have to worry about SQL at all(?)"

My response was: "Realize that, for me right now, that's akin to saying 'What if you didn't have to worry about your 1988 Tele you murdered out, or your pedalboard?' What if I didn't have to worry about the tools I love and trust?"

(I'm a guitarist, for context.)

 

ORM can drastically reduce the boiler plate code, especially when all you do is simple CRUD operations, Query operations. Most of the ORMs do provide a 'filter' sort of operation to add query params which would suffice the common requirements. They also provide the 'raw' query option to write complicated query if required.

We moved from SQL to ORM solution about 6 months ago. But we don't do any complicated queries at the moment. So I don't have any metrics on performance comparison.

 

I prefer to avoid straight SQL queries wherever possible. I agree with the sentiments towards data mappers though. I've worked with ORM/ActiveRecord implementations as well, and they are fine for simple models and relationships, but you will find as you grow and build larger applications that there are some severe limitations with ORM's when you get into complicated queries.

 

I use a combination of both. I built an ORM (STorm) that I use with Xojo. Essentially it is a dynamic way to map classes to database tables. In this way the database remains the schema of record and should you change the schema you do not need to update separate ORM classes, although you'd still need to update code. The ORM is super-handy for common database access. For more complicated stuff, I'll drop down to SQL.

 

The two are not mutually exclusive.
Most ORM frameworks also maintain their own query languages and allow you to write native queries when needed. You can also have both ways of working independently when it's suitable.
I am currently working as a Hybris developer. For those of you who are not familiar with Hybris, it's a Java E-commerce framework. It's build on the Spring Framework plus it's own ORM and CMS.
In Hybris you write queries for fetching objects while saving (creating and updating) is done through method calls alone.
Depending on the needs of a project, an ORM can be more trouble than it's worth in terms of performance and getting the exact behaviour you need.
However, when it's not an issue, an ORM can drastically cut down on writing boilerplate queries which, to me at least, can be boring. Once I'm bored, I tend to make mistakes.
In addition, most ORMs come with lazy loading which will, to a certain extent, alleviate performance issues.

 

I see a lot of "ORM for simplifying CRUD Logic, Raw SQL for specific business logic".

I wanted to get into using an ORM for the first reason mentioned. However, the "Read" part of CRUD logic was never a simple read. I was using Entity Framework. Creating complex queries with Linq expressions usually involved loading all of them into memory and filtering in memory. Performance was a big issue. Also all our legacy code had lots of business logic within SQL queries, which is usually a no-no for using an ORM.

Instead, we're working on creating our own slimmed down ORM for just the CRUD logic. Similar to the dataMapper solution that has been suggested.

 

The SilverStripe CMS framework (silverstripe.org/) I use to create websites has an ORM which works really well.

Benefits are that the CMS can be used with different databases without needing changes any SQL code, it escapes variables used in queries, lazy loads records, has lots of helper functions such as filter, sort, limit etc, and in general means we write less code than if raw SQL was required so this speeds up development.

Fortunately is still possible to do Active Record style queries and even raw SQL when needed, which I have found is very infrequently.

 

I use Hibernate as my weapon of choice here. With Spring Data I often don't need to manually write queries at all. When I do, the JPQL is pretty good and eliminates a lot of the hassle of plain SQL. In rare cases (e.g. when I need to call native DB functions) I do write manual SQL, but this is a last resort.

In general, I use Graph Databases a lot. The mapping between objects and graphs is usually trivial.

 

I find that using ORM is the way to go for simple/straight requests, on the other side for complicated advanced queries, an sql builder will help a lot, I've built my own github.com/sqlkata/querybuilder that play a good fit between raw sql queries and solid ORMs

 

Both, the orm for normal crud stuff and lazy loading. SQL when I need specific queries I can't find a way to map on the orm

 

I go simple, the repetitive stuff goes on the orm and the complex/really specific/performance stuff on raw SQL

 

ORM for the CRUD grind, and raw SQL when you have more complex queries or when performance is important

 
 
code of conduct - report abuse