DEV Community

Mustafa ERBAY
Mustafa ERBAY

Posted on • Originally published at mustafaerbay.com.tr

Using ORMs in Side Projects: Is Control Sacrificed for Speed?

Developing side projects is both a hobby and a playground for new technologies for me. I usually want to get quick results, because what keeps my motivation alive is being able to get ideas working as soon as possible. In this quest for speed, especially in database interactions, ORM (Object-Relational Mapping) tools always create a dilemma: while they offer tremendous initial speed, they also come with the risk of losing control over time. The question "Is control sacrificed for speed?" constantly revolves in my mind with every new side project.

When writing the backend for my Android spam application or creating data models for my financial calculators, the answer to this question varies depending on the project's scope and my expectations. Sometimes immediate speed is critical, while other times long-term performance and flexibility outweigh it. In this post, I will share my personal experiences with ORM usage in my side projects, when I make which choice, and the consequences of these decisions.

Time Pressure and the Allure of ORMs in Side Projects

In side projects, time is the most valuable resource. Most of the time, I steal from my personal time to run these projects, and therefore I want to use every second efficiently. ORMs can truly be life-savers in this regard. Defining a table's model and handling CRUD (Create, Read, Update, Delete) operations with a few lines of code is much faster than writing manual SQL. Especially when writing a backend with FastAPI, integrating an ORM like SQLAlchemy can automate database interactions like a magician.

Recently, when developing the backend for a simple note-taking application I added to my site, I needed to quickly get an API up and running. With the combination of sqlalchemy-pydantic and FastAPI, I combined model definitions and created a Note model and all the necessary API endpoints for it within minutes. This eliminated hours of manual SQL schema definition and query writing. It was enough to just define the Note model and call ready-made methods for simple CRUD operations.

# models.py
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Note(Base):
    __tablename__ = "notes"
    id = Column(Integer, primary_key=True, index=True)
    title = Column(String, index=True)
    content = Column(Text)

# main.py (FastAPI endpoint)
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from database import SessionLocal, engine
from models import Base, Note

Base.metadata.create_all(bind=engine)

app = FastAPI()

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.post("/notes/", response_model=NoteSchema)
def create_note(note: NoteCreate, db: Session = Depends(get_db)):
    db_note = Note(**note.dict())
    db.add(db_note)
    db.commit()
    db.refresh(db_note)
    return db_note
Enter fullscreen mode Exit fullscreen mode

In this example, after defining the Note model, adding a new note with the create_note function is handled almost in a single line with Note(**note.dict()). The ORM takes this Pydantic model and converts it directly into an SQL INSERT query that can be written to the database. This kind of automation incredibly increases motivation and speed in the initial steps of a side project. When schema changes occur, managing migrations with tools like Alembic is also much less cumbersome than writing manual ALTER TABLE commands.

The Irreversible Face of Loss of Control: Performance and Debugging

The speed brought by ORMs comes at a cost: loss of control. Especially in complex queries or performance-critical scenarios, it becomes difficult to predict how optimized the SQL queries generated by the ORM in the background are. In my experience, this situation usually manifests as the "N+1 problem" or unnecessary JOINs. In my own side product's financial calculators, when listing a user's past transactions and their associated categories, a simple ORM query caused the page load time to jump from 2 seconds to 8 seconds.

To find the source of the problem, I had to examine database logs and EXPLAIN ANALYZE outputs. The ORM was trying to fetch category details separately for each transaction, running more than 100 separate queries for 100 transactions. This situation crippled performance due to database connection overhead and network latency. At that moment, I realized that where I thought the ORM was acting "smart," it had actually gone out of my control and done an inefficient job.

⚠️ The Insidious Trap of ORMs

ORMs increase development speed while hiding the complexity of the SQL queries they generate in the background. This can lead to performance bottlenecks, especially in scenarios with heavy data fetching or complex relationships, and make the debugging process much more difficult. It's always a good idea to check query logs to understand what the ORM is doing.

An ORM query like the following can lead to the N+1 problem:

# Inefficient ORM query example (N+1 problem)
transactions = db.query(Transaction).all() # Fetch all transactions
for transaction in transactions:
    # Separate category query for each transaction
    category = db.query(Category).filter_by(id=transaction.category_id).first()
    print(f"Transaction: {transaction.amount}, Category: {category.name}")
Enter fullscreen mode Exit fullscreen mode

This code block, if there are 100 transactions, will run 1 (to fetch all transactions) + 100 (to fetch the category of each transaction) = 101 queries. However, this operation could be done much more efficiently with a single JOIN. The EXPLAIN ANALYZE output in PostgreSQL helped me identify the problem by showing how costly these queries were and their index usage. The hundreds of SELECT queries I saw there pushed me to be more careful about this.

The Power and Burden of Raw SQL

In moments when performance becomes critical or the ORM's capabilities are insufficient, resorting to raw SQL becomes inevitable for me. Especially when fetching data for complex reporting or AI-driven production planning algorithms in a production ERP, the queries generated by the ORM would never be enough. To use features like PostgreSQL's advanced window functions, CTEs (Common Table Expressions), or special index strategies (GIN, BRIN), raw SQL was the only way.

The biggest advantage of raw SQL is that it provides full control over the database. I can optimize the query I need with millisecond precision and leverage the full power of the database. When creating monthly financial summary reports in my own side product, writing a complex query involving functions like GROUP BY, SUM, and LAG with an ORM would be both very difficult and not performant enough. In such cases, I run the query manually using a direct database connection.

-- Raw SQL example: Monthly financial summary report (PostgreSQL)
WITH MonthlySummary AS (
    SELECT
        DATE_TRUNC('month', transaction_date) AS month_start,
        SUM(CASE WHEN type = 'income' THEN amount ELSE 0 END) AS total_income,
        SUM(CASE WHEN type = 'expense' THEN amount ELSE 0 END) AS total_expense
    FROM transactions
    WHERE user_id = :user_id
    GROUP BY month_start
    ORDER BY month_start
),
PreviousMonth AS (
    SELECT
        month_start,
        total_income,
        total_expense,
        LAG(total_income, 1, 0) OVER (ORDER BY month_start) AS prev_month_income,
        LAG(total_expense, 1, 0) OVER (ORDER BY month_start) AS prev_month_expense
    FROM MonthlySummary
)
SELECT
    month_start,
    total_income,
    total_expense,
    (total_income - prev_month_income) AS income_delta,
    (total_expense - prev_month_expense) AS expense_delta
FROM PreviousMonth;
Enter fullscreen mode Exit fullscreen mode

This query calculates a user's monthly income and expenses and shows the changes compared to the previous month. Creating a query of this complexity with an ORM often leads to unreadable and hard-to-maintain code by straining the ORM's own DSL (Domain Specific Language). However, raw SQL also brings some burdens: manually managing SQL injection risk, tracking schema changes, and manually mapping query results to objects in the programming language. Especially regarding security, using parameterized queries is essential.

Hybrid Approach: Getting the Best of Both Worlds

In my side projects, the ideal solution is usually a hybrid approach. I use an ORM for simple CRUD operations and rapid prototyping, which constitute the majority of the application, while resorting to raw SQL for parts where performance is critical or I need the database's special capabilities. This provides me with both development speed and performance control.

For example, in the ERP module of one of my side products, simple data entry and exit for operator screens are managed with an ORM, while complex stock optimization queries required by the production planning engine were written directly with raw SQL. Running raw SQL queries using the ORM's connection pool saves me the trouble of rewriting existing database connection management. In FastAPI, it's quite practical to get the SessionLocal object and use it as db.execute(text("...")).

from sqlalchemy import text
from fastapi import Depends
from sqlalchemy.orm import Session
from database import get_db

@app.get("/advanced_report/")
def get_advanced_report(user_id: int, db: Session = Depends(get_db)):
    # Raw SQL query
    query = text("""
        SELECT
            p.product_name,
            SUM(oi.quantity * oi.unit_price) AS total_sales,
            COUNT(DISTINCT o.order_id) AS total_orders
        FROM products p
        JOIN order_items oi ON p.product_id = oi.product_id
        JOIN orders o ON oi.order_id = o.order_id
        WHERE o.customer_id = :user_id
        GROUP BY p.product_name
        ORDER BY total_sales DESC
    """)
    result = db.execute(query, {"user_id": user_id}).fetchall()

    # Process results (e.g., convert to dict)
    report_data = []
    for row in result:
        report_data.append({
            "product_name": row[0],
            "total_sales": float(row[1]), # Convert Decimal to float
            "total_orders": row[2]
        })
    return report_data
Enter fullscreen mode Exit fullscreen mode

This approach offers the flexibility to delve into the depths of the database when needed, without abandoning the conveniences brought by ORMs. For me, this balance allows for both rapid iteration throughout a project's life and preparedness for performance issues I might encounter. I start with rapid prototyping, but when a query in the logs exceeds 500ms or a SELECT statement contains more JOINs than I expect, alarm bells ring, and I consider switching to raw SQL.

Security and Sustainability Perspective

Security is a topic I never overlook, even in side projects. ORMs provide an automatic layer of protection against common security vulnerabilities like SQL injection. They automatically sanitize parameters, which reduces the developer's chance of making manual errors in this regard. This automation provides great relief in a public API like the backend of my Android spam application. When writing fail2ban rules, my ORM-using endpoints are generally more resilient against the frequent SQL injection attempts I see.

When using raw SQL, this responsibility is entirely mine. I always use parameterized queries (prepared statements) and never build queries with string concatenation. This is the most fundamental way to minimize the risk of SQL injection.

ℹ️ Secure Coding Practices

When using raw SQL, avoid string concatenation methods like db.execute(text("SELECT * FROM users WHERE username = '" + username + "'")). Instead, eliminate the risk of SQL injection by using parameterized queries like db.execute(text("SELECT * FROM users WHERE username = :username"), {"username": username}).

From a sustainability perspective, ORMs work integrated with migration tools (like Alembic) to manage schema changes. This makes it easier to track the evolution of the database schema and synchronize across different environments. In the early stages of one of my side projects, when I was constantly adding new features and changing the schema, Alembic was an invaluable tool for me. Instead of writing manual ALTER TABLE commands, I could automatically generate and apply migration files with the alembic revision --autogenerate command.

Raw SQL queries, on the other hand, tend to be more static. When the schema changes, these queries need to be updated manually. This can create a significant maintenance burden depending on the project's size and frequency of change. Therefore, I generally try to use critical and performance-focused raw SQL pieces in well-tested and rarely changing areas. In places where security and sustainability standards are very high, such as a bank's internal platform, these types of trade-offs must be managed more carefully.

My Decision-Making Process and Conclusion

My decision to use an ORM in side projects depends on many factors, such as the project's initial stage, size, expected traffic, and performance requirements. I generally follow these steps:

  1. Rapid Prototyping Phase: When starting a project, I usually prefer an ORM to quickly test ideas. At this stage, the speed provided by the ORM for basic CRUD operations is invaluable. Delivering a working MVP (Minimum Viable Product) within a few days keeps my motivation high.
  2. Performance Monitoring: When the application starts to attract some users or the data volume increases, I begin performance monitoring. I actively track database query logs. When I see any query slowing down or becoming unnecessarily complex, it's time to intervene.
  3. Transition to Raw SQL: If a section's performance is not satisfactory or the ORM's DSL is insufficient to express a complex query, I switch to raw SQL for that section. I try to make this transition as smooth as possible by using the ORM's connection pool.
  4. Security Checks: I always avoid SQL injection by using parameterized queries. If necessary, I monitor database interactions with system tools like auditd.

In conclusion, the answer to the question "Is control sacrificed for speed?" in side projects is, for me, "yes, but in a controlled manner." Initially, I sacrifice some control for speed, but as the project matures, I develop strategies to regain control. This is a balancing act, and every project has its own dynamics. The important thing is to be aware of these trade-offs and to use the right tool at the right time.

I had a similar trade-off during a VPS migration process; while I used ready-made images to get it up and running quickly, over time I had to tinker with kernel settings to optimize performance. This is a common situation in every area of technology. In my next post, I will talk about the "container disk fire" problem I experienced in the backend of my own side product and how I solved it.

Top comments (0)