Jim Hatcher

Posted on Oct 20, 2021

Stored Procedures - love or leave 'em?

#microservices #storedprocs #decoupling #database

Introduction

A stored procedure is a set of SQL statements that is stored on the database server and is available to be executed by name. Stored procedures are the cause of “religious wars” in the world of relational databases where some DB users live and die by them and other DB users consider them an anti-pattern. They are certainly a tool that has the potential to be mis-used and create performance issues and blockers to scalability. As a DBA/Developer/Architect, should you be using them or not? Let’s dive in and try to answer that question.

Common Usage Patterns

When I see stored procedures used in databases, they fall into one of the following categories:

A stored procedure is used for every single CRUD operation against the DB
Stored procedures are used in certain situations for special handling, including:
- Running multi-statement transactions so that commit/rollback logic can be controlled at the DB level
- Running operations that involve chatty transactions so that round-trips between the app and DB can be avoided
- Centralization of logic that is called by several systems
- Encapsulation of operations to which highly granular security permissions need to be granted
10,000-line monsters containing business reams of logic

Potential Pros and Cons

There are various arguments for and against the use of stored procedures. Let’s examine a few:

Pro: Caching of Execution Plans

When you write a stored procedure, the query execution plan is stored/cached on the server which saves time when the stored proc is executed. This has become less of a factor over the years as DB technology has progressed. SQL optimizers have gotten better at storing plans for “dynamic” SQL and storing those plans for re-use. By making use of prepared statements in data access code, the same benefit of execution plan caching can often be gained.

Pro: Granular Permission Grants

There are opportunities to grant an application or user permissions to execute a stored procedure but restrict access to the underlying table(s). Or, you could grant access to a procedure that INSERTs data into a table but deny access to a procedure that UPDATEs or DELETEs data from a table.

Pro: Query Pattern Tracking

If every CRUD operation performed against the database is contained in stored procedures, it’s relatively easy to have an understanding of the query patterns used in the database. As a DBA, getting your arms around the query patterns (both reads and writes) that are being executed against a database is a huge step in being able to manage and optimize a database -- especially when inheriting a new system with which you have little familiarity.

Pro: Protection against SQL Injection Attacks

SQL Injection Attacks are a classic security vulnerability. If you’re not familiar with them, google “little bobby tables” and check out the famous XKCD comic that pokes fun at them. Because stored procedures are typically parameterized, they provide a level of protection against unsanitized SQL inputs. To be fair, I have also seen stored procedures exploited with SQL Injection attacks; they’re not a magic fix for SQL injection attacks, and smart developers still take precautions to sanitize data and use parameterization correctly.

Pro: Cutting down on Roundtrips from the app layer

For certain database operations that require several roundtrips from the app to the database, there can be a performance boost by planting all the necessary logic within a stored procedure to handle the entire operation without ever “leaving the database.”

Pro: Making re-use of logic possible

By storing SQL statements together in an encapsulated, named element which can be executed by various processes, there is an opportunity to re-use logic by application code, reporting services, and other DB clients that may be simpler than encapsulating that logic in an application tier (for example, in a microservice). In a system that is embracing microservices, this argument becomes largely irrelevant.

Con: Business Logic in the DB

There is a subtle temptation when writing stored procedures to add little bits of logic that really don’t belong in the database. When there is business logic shared between the services code and the database (in stored procedures), it is harder to have a holistic view of a software system’s business logic; things can easily be missed which can lead to unexpected bugs.

In the case of 10,000-line monsters mentioned above, this embedding of business logic is taken to an extreme. This leads to highly unwieldy and difficult to maintain code. This is probably the most infamous use of stored procedures and one of the anti-patterns dissenters love to cite.

Con: Loss of Database Portability

The code used in stored procedures often moves away from standard ANSI-SQL and into implementation-specific syntax. This makes it harder to move between DB systems. If the interaction between applications and databases is all plain-vanilla SQL, it is relatively easy to migrate an app from using any database system to any other database system.

Con: Dependencies between stored procedures and app code

While it is good practice to keep database schemas (tables, indexes, views, etc.) in source control, when the schema contains stored proc definitions, the management of the schema becomes much more complex. Often, changes to stored procedures are tightly bound to Data Access code, so managing versions of schemas and their appropriate code builds can be challenging. And, doing seamless upgrades of applications can cause complexities as well.

Con: ORM Integration

An ORM (Object Relational Mapper) is a piece of software designed to abstract away the manual mapping of domain objects (used in code) with the corresponding table structures (used in databases). By employing an ORM, you (in theory) save yourself a lot of trouble writing boring, repetitive data access layer code. ORMs typically write SQL statements dynamically and execute them directly. It is often hard, or at least involves a decent investment in time and effort, to get ORMs to use stored procedures which subtracts from their intended utility.

Note: I’m not actually a huge fan of ORMs, but I understand their value. My personal rule when deploying ORMs is to do so in a way where it’s easy to let the ORM do the data mapping when it’s efficient, but to allow the insertion of manual data-access logic for cases when the ORM is making poor choices. In other words, if the ORM makes your life easier 80% of the time, then leverage it for that 80% -- but don’t tie yourself to the ORM for that other painful 20%.

Conclusion

I admit that in the part of my career where I was writing apps using the Microsoft stack and SQL Server that I was a huge proponent and user of stored procs. I was also a huge Microsoft snob and had no plans of moving off of SQL Server to any other database platform. Fifteen years later, with a more mellow attitude and a little more hard-won experience under my belt, I am much more of a believer in building systems that use the right tools for the right jobs. The database is there to provide durability, handle concurrency, provide consistency, and generally take away the stress of storing data. Put the data in the database. Put logic that surrounds the data access in a data API and expose that to your applications.

In general, I believe that the trend regarding stored procedures is to move away from them -- especially when architecting systems that need to be highly available and massively scalable. The trends you will find around these types of architectures are the use of microservices, the adoption of various types of horizontally scalable data platforms (Distributed SQL, NoSQL, ElasticSearch, Spark, Snowflake, etc.), and a general desire to decouple business logic from data operations.

I also believe that the potential pros that can be gained from the use of stored procedures can also be gained in other ways -- through the use of microservices and through good coding practices; however, the potential cons are much harder to overcome.

I am hesitant to go so far as to call stored procedures a “crutch.” But I do think that if a system architect were to adopt a policy of explicitly barring the use of stored procedures in favor of other mechanisms that support reuse and decoupling of data and logic concerns that it would be a decision I would consider healthy.

Disclaimer: I work for Cockroach Labs and in CockroachDB, we don’t support stored procedures. The discussion around whether we should support stored procedures comes up fairly often. It would certainly make data migrations easier. And, for some of the reasons noted above in this article, stored procedures can certainly add some value. But for the most part, we seem to take the stance that most modern systems are not adopting stored procedures, and so, this feature never seems to get high enough prioritization to get done. Instead, we seem to take on features that enable adoption of the more forward-thinking trends (cloud native, microservices, containerization, serverless). We may support them in the future, and if we do, I hope we will put some guardrails around their usage to gently nudge users away from using them in potentially limiting ways.

Since the use of stored procedures is, as noted above, a bit of a religious war, I’d love to hear any comments you have on the pros and cons of stored procs and whether you think they are a tool that has a place in modern application architectures.

Top comments (11)

John @thefactorygrows@mastodon.world • Apr 22 '23

Bit of a bump, but liked this article and wanted to add that having all your SQL in stored procedure files (and hence having all those SQL create scripts in source control) will make it much easier to setup static analysis of the SQL as part of your CI/CD. DBAs may also find it easier to grok.

Jim Hatcher • Apr 24 '23

Yes, good points.

Franck Pachot • Oct 22 '21

On the "love or leave 'em?" question, I easily apply a three-valued logic: nothing here is plain wrong or right. It is not about choosing one or another, but just having all possibilites. You may not want to deploy all procedural code into the database. But limiting the database service to SQL-92 statements ignores the past 20 years requirements from business and solutions from database vendors. Reducing a RDBMS to a simple row store goes to the same limits we have seen with NoSQL. Network and context switch latency still matters, even more in distributed databases. And not everything can be coded in declarative SQL without reducing readability amd maintainability. That's the reason the procedural extension has been added to SQL standard, for OLTP applications that needed to scale.

Let's take an example. Imagine a data generation microservice. Run pgbench initialization with -g (client-side generation) or -G (server-side generation, introduced in PostgreSQL 13). There's a huge difference in resources used. This means response time on user side, compute cost on the cloud bill, or energy saving if you care about green computing.

There's a common confusion between procedural language support and stored procedures. You can create stored procedures in declarative SQL only, if you want. And you can run PL/pgSQL from the application code, without deploying anything in the database. Look at the pgbench server-side generation code: client-side deployment, server-side execution.

You mentioned micro-services. This is where I see a case for stored procedures. Micro-services should be light, small, simple, rather than miniatures of client-server monoliths. Some microservices could easily be a REST API (PostgREST), with code in any language of choice (PL/pgSQL, Python, Javascript...). Easy to deploy, to test. to run.

Jim Hatcher • Oct 22 '21

Hi Franck, thanks for the note. I agree -- it's not a binary, 100% of the time decision. (Any decision that's so cut-and-dry quickly becomes a non-decision and we stop having conversations around it!)

I used to be way over on the "always stored procs!" side of the debate and now I'm closer to the other end of the spectrum.

On your point about Microservices, I do feel pretty strongly that having a DB with very little biz logic, then a Data API, then various microservices that consume from the Data API is a very healthy way to go about this. The logic that clearly belongs to the microservice goes in the microservice. The logic that is data-access related goes in the Data API.

Appreciate that input!

Samuel Nitsche • Oct 22 '21

This!
In my observation, a lot of reluctancy to use stored procedures has historical and cultural reasons from a time where a relational database, once installed, was very hard to change (a lot of "change processes" around DBs were actually "prevent change processes").
That has changed, and so have the development tools around databases and, as you pointed out, Franck, the features of the databases itself.

Oskars • Oct 22 '21

Change management complexity, nearly impossible seamless deployments and portability are arguments I had on my list too.
I would add that unit testing is much harder for procedures and missing language features like parallel method execution that you could use if logic is within the service.
I think logic in service also makes the tech stack simplier. So having all logic in the service but one procedure seems just not worth it. Then better pl/sql all the way with a thin rest proxy.
I've seen much too pain with stored procedures to consider them in a new software.
Thanks for an argumented article, there is no place for emotions in these dillemas :)

darwinpasco25 • Feb 6 '22

Re: Loss of Database Portability

I find this to be a non-issue most of the time. I think it's more common to rewrite an application in another programming language than to move into another DB system. More often, this is an issue for those who are having database related performance issues but in reality, don't understand proper database design that is why their queries are performing poorly.

darwinpasco25 • Feb 6 '22 • Edited

Proper normalization is key. If you don't understand normalization, then it doesn't matter if you use stored procedures or ORM. No amount of optimization will work. You will write bad queries and thousands of line of SQL.
You don't need to write hundreds of lines of code in stored procedures if your tables are properly normalized. Better normalization means writing less SQL / application codes.
Business rules are enforced in the DATA. Not in the database.
The better normalized your tables are, the less conditional logic you will need to write. The number one source of bugs in any application is conditional logic. I try to avoid them as much as I can that is why I spend lots of time in designing and normalizing my tables so that I can enforce business rules in the DATA (not in the stored procedures). That allows me minimize conditional logic.

Whenever you find yourself writing lots of conditional logic, that means your need to go back to your database design.

Gabriel Aramburu • Feb 5 '23

Hi, good article.

About the point: "Pro: Making re-use of logic possible."

The re-use of logic can be beneficial at a short-term, but once the services/processes start to evolve in different ways you will have a serious maintainece problem.

Corey Alix • Feb 11 '23

Have their use but it's 2% of the time. I think intelligent streaming of data from and to the database is almost always the right choice. Why make the database do all the work?

raselldev • May 7 '23

leave. Why i choose leave? Because untracking, everyone can be imposter is stored procedure fail to execute.

View full discussion (11 comments)