Stephen Flavin

JPA: The Good, The Bad, and The Ugly

A walk through what Spring Data JPA actually does at runtime, why it gets uncomfortable during incidents, and what living without an ORM looks like in modern Spring Boot. Examples use Postgres, but the concepts and trade-offs apply to any relational database; the specific syntax for things like guarded upserts and set-based bulk operations differs on MySQL, SQL Server, or Oracle, but the underlying ideas are the same.

TL;DR: Three things make a SQL-first approach worth considering. First, databases like Postgres can guard writes with IS DISTINCT FROM so no-op upserts skip the UPDATE entirely; no new tuple, no WAL, no replication traffic, which JPA can’t express without dropping to native SQL. Second, set-based bulk operations (e.g. via unnest) collapse 1,000-row batches into one round trip, sidestepping the per-row pre-SELECTs and IDENTITY-disabled batching that limit saveAll. Third, the cognitive load shifts from learning a framework’s flush planner to learning SQL and your database in more depth.

Disclosure: I used AI assistance throughout this article — for research, drafting, and review. The ideas, structure, and final editorial decisions are mine.

1. The Good: Six lines of magic

Here’s an upsert in Spring Data JPA.

@Service
public class UserService {

    private final UserRepository userRepository;

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    @Transactional
    public User upsertUser(String email, String name) {
        User user = new User();
        user.setEmail(email);
        user.setName(name);
        return userRepository.save(user);
    }

    @Entity
    @Table(name = "users")
    public static class User {
        @Id private String email;
        private String name;
        // getters / setters / no-arg ctor
    }

    @Repository
    public interface UserRepository extends JpaRepository<User, String> {}
}

Look at that:

  • No SQL.
  • No connection management.
  • No PreparedStatement.
  • No row mappers.

The repository is an interface; Spring Data generates the implementation at startup. Need findByEmail? Add the method signature and the framework parses the name and writes JPQL, as in the sketch below.
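A minimal sketch, extending the repository interface from the example above:

@Repository
public interface UserRepository extends JpaRepository<User, String> {
    // Spring Data parses the method name at startup and derives
    // "select u from User u where u.email = ?1" for you
    Optional<User> findByEmail(String email);
}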

A few more ergonomic wins worth naming:

  • Schema validation at boot (with spring.jpa.hibernate.ddl-auto=validate) catches missing columns and type mismatches before traffic hits. The default is none, so you opt in. Validation only checks columns and types, not nullability, defaults, or check constraints.
  • Implicit persistence — mutate a managed entity inside a transaction and the change is written at commit, no save call required (a minimal sketch follows this list).
  • Database portability - the same code runs on Postgres, MySQL, H2, and Oracle.
  • Object graph traversal - @OneToMany and @ManyToOne let you walk relationships in Java; save a parent and the children persist with it.
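
For example, a hypothetical rename method relying on implicit persistence (the method name is illustrative; the repository and entity are the ones above):

@Transactional
public void renameUser(String email, String newName) {
    // the entity is managed for the life of the transaction, so dirty
    // checking flushes an UPDATE at commit with no save() call at all
    User user = userRepository.findById(email).orElseThrow();
    user.setName(newName);
}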

For a CRUD app over a clean domain model, this is genuinely hard to beat. The framework has been refined for two decades, and the developer ergonomics — the “Good” — are undeniably real.

So what’s the problem?

2. The Bad: Hidden complexity and leaky abstractions

The bad part of JPA isn’t that it does a lot of work for you; it’s that it does this work silently. That six-line method hides a massive amount of architectural complexity.

2.1 The proxy, the transaction, and the held connection

UserService is not the class you wrote. Spring wraps it in a CGLIB proxy at startup. Every external call enters the proxy first, which then:

  1. Asks the PlatformTransactionManager for a transaction.
  2. Acquires a connection from HikariCP and sets autoCommit = false.
  3. Creates an EntityManager and binds it to the thread.
  4. Runs your method.
  5. Flushes the persistence context and commits.

The connection is held for the entire method body, including any non-database work.

Spring Boot also enables Open Session In View (OSIV) by default, which extends the persistence session to the end of the HTTP request. Spring’s default JPA dialect doesn’t release the connection until the EntityManager closes, so with OSIV on, the connection is genuinely held for the entire request lifecycle — including template rendering, JSON serialisation walking lazy collections, and any outbound HTTP calls the controller still needs to make. The pool can be drained by slow downstream work long after the database work is done. OSIV is the single biggest real-world cause of mysterious connection-pool exhaustion in Spring Boot apps. Spring Boot does log a warning at startup recommending you make a deliberate choice — most teams ignore it.

You can disable OSIV with spring.jpa.open-in-view=false. Now LazyInitializationExceptions start firing in your serializers, because the session closes when the transaction does. The fixes are:

  • @EntityGraph on repository methods.
  • JOIN FETCH in JPQL.
  • @BatchSize on collections.
  • DTO projections that don’t carry lazy associations at all.

Each is its own annotation, its own failure mode, and its own thing to remember. Disabling OSIV is the right call for production, but it visibly multiplies the JPA surface area you have to learn.
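
As one example of the first fix, a minimal sketch with a hypothetical Order/OrderItem pair; the entity graph tells Spring Data to fetch the collection in the same query, for this method only:

@Repository
public interface OrderRepository extends JpaRepository<Order, Long> {
    // items is fetched eagerly here, so serializers touching it after
    // the transaction closes don't throw LazyInitializationException
    @EntityGraph(attributePaths = "items")
    Optional<Order> findWithItemsById(Long id);
}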

A related fix worth knowing about for long-running transactions: by default, Hibernate acquires a JDBC connection at transaction start and holds it. To delay acquisition until the first statement, you need both:

spring.datasource.hikari.auto-commit=false
spring.jpa.properties.hibernate.connection.provider_disables_autocommit=true

Miss either one and Hibernate falls back to eager acquisition.

2.2 save(), merge(), and what actually runs

save() is not “INSERT”. It’s “persist if new, merge otherwise”, and “new” is decided by a heuristic: is the @Id null?

| ID strategy | ID on a fresh entity | save() routes to |
| --- | --- | --- |
| @GeneratedValue | null | persist (INSERT) |
| Assigned (email, UUID, idempotency token) | non-null | merge (SELECT then UPDATE) |

merge() does a hidden SELECT. It looks up the row by ID, hydrates a managed copy, then overwrites it from your argument. That’s SELECT * FROM users WHERE email = ?, one round trip you didn’t write. (If the entity is already in the persistence context, the SELECT is skipped — but in stateless web request handlers, it almost never is.) The managed copy returned is a different Java object from the one you passed in. Mutations to your original user after this point do not persist.

The persistence context (L1 cache) anchors all of this. When an entity enters it, Hibernate snapshots its fields, then diffs at flush time and emits UPDATEs for any changes. This is dirty checking, and it’s why mutating an entity you didn’t mean to update silently writes to the database. Auto-flush also reorders your statements: before any JPQL query and at commit, Hibernate walks the persistence context and emits SQL grouped into INSERT, UPDATE, and DELETE batches, in an order that is not the order of your Java code.

Native SQL queries are a particularly sharp edge. In JPA-compliant mode — which is what Spring Boot bootstraps by default — FlushModeType.AUTO flushes the session before a native query whenever Hibernate can’t determine which tables the query touches, because JPA requires that pending changes be visible to the query. To avoid the flush, you have to explicitly register the query’s tables via addSynchronizedEntityClass (only available by unwrapping the Hibernate Session, which loses your EntityManager ergonomics). If instead Hibernate is bootstrapped via its native API, the legacy default is the opposite: native queries do not flush unless you’ve registered query spaces. Same code, opposite behaviour, depending on how the framework was started. Both modes have surprised people in production.
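
One route to that registration, sketched here assuming an injected EntityManager, is unwrapping the query itself to Hibernate's NativeQuery:

// registering the query space tells Hibernate which table the query touches,
// so AUTO flush only fires when there are pending changes against that table
Number count = (Number) entityManager
        .createNativeQuery("SELECT count(*) FROM users")
        .unwrap(org.hibernate.query.NativeQuery.class)
        .addSynchronizedEntityClass(User.class)
        .getSingleResult();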

Reflection runs the show throughout. JPA needs entities to have a no-arg constructor it can invoke through Constructor.newInstance(), then writes fields through setters or directly via Field.setAccessible(true). Lazy proxies are generated as bytecode subclasses at runtime, which is why declaring a method final silently breaks lazy loading and why entities cannot be Java records. Spring Data repositories are runtime proxies generated against your interface; the implementation does not exist as source code you can read.

A single save() call can produce any of:

  • A no-op (snapshot matches, no UPDATE emitted).
  • A full-column UPDATE for a one-field change (default behaviour).
  • A trimmed UPDATE touching only changed columns (with @DynamicUpdate).
  • A SELECT-before-UPDATE skipped (with @SelectBeforeUpdate(false), if you’ve decided dirty checking on detached entities is more expensive than always writing).
  • An INSERT (entity was transient, persist path).
  • An UPDATE preceded by a SELECT (assigned ID, merge path).

The Java code looks identical in all six cases. To know which one will run, you mentally simulate the flush planner: which annotations are on the entity, what state it’s in, whether the ID was assigned or generated, what’s in the persistence context, whether @DynamicUpdate or @SelectBeforeUpdate is set, whether a JPQL query just triggered a flush. Or you turn on SQL logging and read the output after the fact.

For comparison, here is the equivalent guard in SQL:

UPDATE users SET name = :name
WHERE email = :email
  AND name IS DISTINCT FROM :name

The condition under which the write happens is one line, in the file you’re reading. You don’t simulate anything.

So your six lines of JPA actually run, in the common assigned-ID case:

BEGIN
SELECT email, name FROM users WHERE email = ?    -- merge() pre-check
UPDATE users SET name = ? WHERE email = ?        -- if dirty check fires
COMMIT                                           -- fsync wait

Plus a CGLIB proxy invocation, session creation, entity snapshot, dirty check, and a Map lookup that you wrote zero lines of code for.

3. The Ugly: 2 a.m. production incidents

The complexity above is fine when everything works. It gets ugly at 2 a.m. with production on fire.

Symptoms in production

| Symptom | Cause | Typical fix |
| --- | --- | --- |
| Connection pool exhaustion under load | Long @Transactional methods, or OSIV holding sessions across slow renders | Disable OSIV; shrink transactions; delay connection acquisition |
| LazyInitializationException in views | Touching a lazy field after the session closed | @EntityGraph, JOIN FETCH, @BatchSize, or DTO projections |
| N+1 queries flooding the slow log | Iterating a collection with lazy @ManyToOne or @OneToMany | Same as above — pick the right one for the access pattern |
| Mystery UPDATEs in audit logs | Code path mutating a managed entity without an explicit save; even the maintainers forget about implicit persistence | Detach the entity, or use a DTO projection that isn’t managed |
| WAL volume spiking on no-op replays | save() always writes a new tuple version, even when values didn’t change | Drop to a native query with IS DISTINCT FROM |
| Constraint violations not in your test suite | Hibernate reordering INSERTs/UPDATEs at flush time | Manual flush(), or split the transaction |
| Self-invocation not starting a transaction | this.foo() from within the same class bypasses the proxy | Restructure code, or self-inject the proxy |
| Optimistic lock failures on quiet rows | Two transactions both touched the same entity and flushed UPDATEs | Narrow the transaction, or accept and retry |

None of these are JPA bugs. They’re JPA features whose interaction with your code wasn’t obvious. Every fix in the right column is another concept on the pile, and in my experience they often end up pointing to a native query anyway.

A note on write amplification: in Postgres MVCC, any UPDATE creates a new tuple version with all columns, regardless of how much SQL you sent. @DynamicUpdate reduces the bytes on the wire from the app to the database, but it does not reduce WAL volume or replication traffic. The only way to avoid that cost is to skip the UPDATE entirely — which IS DISTINCT FROM does, and JPA’s dirty check does on full-row no-ops, but only after a pre-SELECT.

When you try to scale

saveAll() is essentially a for loop that delegates to save() per element (in SimpleJpaRepository, which is what you get out of the box). Hibernate has JDBC batching (hibernate.jdbc.batch_size), but several constraints kneecap it:

  • With assigned IDs, the pre-SELECT inside merge() runs once per row, sequentially, and isn’t batchable.
  • With auto-generated IDs, batching is silently disabled by @GeneratedValue(strategy = IDENTITY) because Hibernate needs each generated ID before sending the next row.
  • For mixed workloads, you also need hibernate.order_inserts and hibernate.order_updates set to true for batching to take effect.

For a 1,000-row saveAll on assigned IDs, you’re looking at roughly 1,000 pre-SELECTs (one per row from merge), the actual INSERTs/UPDATEs (batched in groups of 20–50 if everything aligns), plus the BEGIN and COMMIT — north of a thousand round trips for what should be a single statement.
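
For reference, the Spring Boot properties that switch those Hibernate knobs on look roughly like this; the batch size of 50 is an illustrative value, not a recommendation:

spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true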

Experienced JPA devs will point out that you can switch to @GeneratedValue(strategy = SEQUENCE) with a pooled optimiser to fix the IDENTITY problem, which is true — for inserts where you control the ID strategy. It does nothing for the pre-SELECT on natural keys, and it illustrates the pattern rather than breaking it: every JPA performance fix is another annotation, another behavioural rule, another thing to remember. Under load, you stop debugging Postgres and start debugging Hibernate’s flush planner. That’s the cost the six-line method was hiding.
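The fix they mean looks roughly like this; the sequence name and allocation size are illustrative:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "user_seq")
@SequenceGenerator(name = "user_seq", sequenceName = "users_id_seq", allocationSize = 50)
private Long id;
// allocationSize lets Hibernate hand out 50 IDs per sequence round trip,
// so JDBC INSERT batching stays enabled, unlike with IDENTITY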

When you ask how to fix this, the advice gets stranger. The canonical answer - see Vlad Mihalcea’s “The best Spring Data JpaRepository” - is that saveAll() is essentially the wrong API to use for real workloads, and that you should reach for StatelessSession or @Modifying JPQL queries instead. Mihalcea’s recommended pattern goes further: deprecate save/saveAll outright in your repositories and replace them with explicit persist/merge/update methods.

So to get real performance out of bulk operations in JPA, the advice is to drop the EntityManager, abandon the L1 cache, and turn off dirty checking, almost bypassing the framework entirely. If the solution to the framework’s performance bottlenecks is to memorise arcane ways to avoid the framework, the framework is a liability for that workload.

4. A Fistful of SQL: Living without an ORM

Spring Boot 3.2 (via Spring Framework 6.1) introduced JdbcClient, a fluent facade over NamedParameterJdbcTemplate. It auto-configures from any DataSource. There’s nothing to install, and everything is on the screen.

@Service
public class UserService {

    private final UserRepository userRepository;

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    public User upsertUser(String email, String name) {
        return userRepository.upsert(new User(email, name));
    }

    public record User(String email, String name) {}

    @Repository
    public static class UserRepository {
        private final JdbcClient jdbc;
        public UserRepository(JdbcClient jdbc) { this.jdbc = jdbc; }

        private static final String UPSERT_SQL = """
            WITH upserted AS (
                INSERT INTO users (email, name)
                VALUES (:email, :name)
                ON CONFLICT (email) DO UPDATE
                    SET name = EXCLUDED.name
                    WHERE users.name IS DISTINCT FROM EXCLUDED.name
                RETURNING email, name
            )
            SELECT email, name FROM upserted
            UNION ALL
            SELECT email, name FROM users
             WHERE email = :email
               AND NOT EXISTS (SELECT 1 FROM upserted)
            """;

        public User upsert(User user) {
            return jdbc.sql(UPSERT_SQL)
                    .param("email", user.email())
                    .param("name",  user.name())
                    .query(User.class)
                    .single();
        }
    }
}

Notice what’s missing — the same list, with one deliberate exception:

  • No SQL. The SQL is on the screen, and that’s the point.
  • No connection management.
  • No PreparedStatement.
  • No row mappers.

And a few new things that are gone too:

  • No @Transactional: a single statement is atomic at the database.
  • No @Entity, no reflection, no proxies.
  • No no-arg constructor, no getters, no setters — the data is a Java record.
  • No repository magic — the implementation is right there in the class.

The real magic is in the SQL.

ON CONFLICT (email) uses the unique index for both conflict detection and lock acquisition in a single probe. Upsert is atomic without explicit locks.

WHERE users.name IS DISTINCT FROM EXCLUDED.name is the line where the real money is. This filter checks if the UPDATE actually needs to fire. If false, the UPDATE is skipped entirely: no new tuple version, no WAL record, no replication traffic. IS DISTINCT FROM is a NULL-safe inequality. On a workload where half the upserts arrive with values that match what’s already there, this guard roughly halves WAL volume. JPA’s save() cannot do this without dropping to native SQL.
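
A quick illustration of the NULL-safety, runnable as-is in psql:

SELECT NULL = NULL;                   -- NULL: ordinary equality can't decide
SELECT NULL IS DISTINCT FROM NULL;    -- false: two NULLs count as "the same"
SELECT 'a' IS DISTINCT FROM NULL;     -- true: a value differs from NULL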

RETURNING produces the row that was actually written. The gotcha: when the WHERE filter on DO UPDATE evaluates to false, the UPDATE doesn’t fire, so RETURNING produces zero rows. The CTE plus UNION ALL patches this: the second branch selects the existing row when the CTE was empty, so the caller always gets exactly one row back regardless of which path ran.

Batch operations with unnest

Postgres’s unnest() expands parallel arrays into a relation. We can process a 1,000-row batch in one statement:

INSERT INTO users (email, name)
SELECT email, name
  FROM unnest(:emails::text[], :names::text[]) AS t(email, name)
 ORDER BY email
ON CONFLICT (email) DO UPDATE
    SET name = EXCLUDED.name
    WHERE users.name IS DISTINCT FROM EXCLUDED.name
RETURNING email, name
On the Java side:
public List<User> upsertAll(List<User> users) {
    String[] emails = new String[users.size()];
    String[] names = new String[users.size()];
    for (int i = 0; i < users.size(); i++) {
        User u = users.get(i);
        emails[i] = u.email();
        names[i] = u.name();
    }

    return jdbc.sql(BULK_UPSERT_SQL)
            .param("emails", emails)
            .param("names",  names)
            .query(User.class)
            .list();
}

Two arrays sent as native Postgres array values. One query plan, one execution. Connection hold time drops from seconds to milliseconds. Lock-hold time drops from “the whole loop” to “the duration of one statement”.

The ORDER BY is not just style. Postgres acquires row-level locks in the order it processes rows. Two concurrent bulk upserts touching overlapping rows in different orders will deadlock, and Postgres will kill one of them. Sorting by the conflict key forces a consistent lock acquisition order across all callers. This applies equally to a JPA saveAll loop touching the same rows, and is the kind of detail that’s invisible until production traffic finds it.

The schema decision you didn’t know you were making

There’s a quieter consequence of choosing an ORM: it shapes your schema before you’ve thought about your workload.

Take a simple choice: storing tags on a user.

|  | user_tags join table | text[] column |
| --- | --- | --- |
| Natural in JPA | yes | no |
| Index for membership queries | B-tree on (user_id, tag) | GIN on the array |
| Write to add/remove a tag | INSERT or DELETE on join table | UPDATE rewriting the row |
| Lock granularity | per-tag row | whole user row |
| Storage | row per tag | TOAST-compressed array |

Both work. Which is right depends on read/write ratio, average tags per user, and whether you need transactional add-one-tag semantics. A SQL-first developer weighs that. A JPA-first developer often doesn’t register that there was a decision to make.

The same applies to JSONB for semi-structured data, range types for time intervals, and partial indexes for filtered queries. None of these are first-class in JPA, so they tend not to appear in JPA-shaped schemas — even when they’d be the better fit.
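
To make the array route concrete, a sketch assuming a tags text[] column on users (the column and index names are illustrative):

-- store tags inline on the row
ALTER TABLE users ADD COLUMN tags text[] NOT NULL DEFAULT '{}';

-- a GIN index serves membership queries on the array
CREATE INDEX users_tags_gin ON users USING gin (tags);

-- "which users have the admin tag?" can use that index
SELECT email FROM users WHERE tags @> ARRAY['admin'];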

5. The Showdown: Performance and Complexity

Single-row performance

The upsert where the incoming value matches what’s already stored:

|  | JPA | JdbcClient |
| --- | --- | --- |
| Round trips for work | 2 (SELECT, UPDATE) | 1 |
| New tuple versions written | 0 if dirty check skips; 1 otherwise | 0 |
| WAL records emitted | 0 if dirty check skips; full row otherwise | none |
| Replication traffic | follows WAL | none |

The upsert that genuinely changes the row:

|  | JPA | JdbcClient |
| --- | --- | --- |
| Round trips for work | 2 (SELECT, UPDATE) | 1 |
| New tuple versions written | 1 | 1 |
| WAL records emitted | yes | yes |
| Replication traffic | yes | yes |

On a real change the gap is mostly one round trip; both versions write a tuple, both emit WAL, both fsync. On a value-match call, JPA can skip the UPDATE via dirty checking, but only after the pre-SELECT has already cost a round trip. Workloads dominated by replays, retries, and periodic syncs amplify this gap.

A note on @DynamicUpdate: it narrows full-column UPDATEs to changed-column UPDATEs, which reduces SQL bytes on the wire to the database. It does not reduce WAL volume, because Postgres MVCC writes a new tuple version with all columns regardless of how many the SQL touched. The only way to avoid that cost is to skip the UPDATE entirely. @SelectBeforeUpdate(false) skips the merge pre-SELECT but commits you to always writing on detached-entity updates. Each annotation trades one cost for another; none of them give you the unconditional, single-statement guard that IS DISTINCT FROM does.

Bulk performance

1,000-row bulk upsert:

|  | JPA saveAll, batched | JdbcClient unnest |
| --- | --- | --- |
| Round trips | ~1,000 SELECTs + batched writes | 1 |
| Connection hold time | seconds | milliseconds |
| Lock-hold pattern | held across the loop | released within the statement |
| Per-row no-op skipping | requires native query | native to the pattern |

The structural advantage isn’t that JdbcClient is faster than JPA. It’s that the SQL pattern (single-statement guarded upsert, set-based bulk operations) is fundamentally faster than the access pattern JPA’s save() produces.

The cognitive load shift

The headline JPA service method is six lines, but the version that actually compiles and runs needs entity annotations, a no-arg constructor, getters and setters, and a repository interface — roughly 35 lines of supporting boilerplate. The JdbcClient version is around 40 lines, and a good chunk of them are SQL.

The line counts are close. The difference is what kind of lines they are, and that points at the real metric, which isn’t lines of code at all. It’s what your team has to know to use the system safely.

To use JPA safely, every developer must know:

  • Entity states (transient, managed, detached, removed) and the persist/merge routing rules.
  • L1 cache semantics and dirty checking.
  • Auto-flush ordering and how it interacts with JPQL and native queries.
  • Proxy lifecycles, including CGLIB self-invocation gotchas.
  • OSIV trade-offs and connection-acquisition tuning.
  • The annotation knobs that change SQL generation: @DynamicUpdate, @SelectBeforeUpdate, @BatchSize, @EntityGraph, @Fetch.
  • How to read Hibernate’s SQL log.
  • SQL anyway, for debugging, plus how the database itself behaves.

During an incident, you translate between Java → JPA → SQL → Postgres under pressure.

To use JdbcClient safely, every developer has to know SQL.

The cognitive load doesn’t disappear, it shifts. The trade is that the work goes into learning SQL and your database in depth rather than learning your framework. SQL is the same surface your DBAs speak, the same surface pg_stat_statements and EXPLAIN speak, and the same surface your replicas understand. Postgres’s behaviour is visible at runtime and stable across your career. Hibernate’s behaviour is invisible at runtime and transferable mainly to other Hibernate jobs.

6. My take: SQL first, ORM maybe

Everything above is the analytical case. Here’s the opinionated one.

I think SQL should come first — especially for juniors. Not because ORMs are bad, but because starting with one means learning the shortcut without learning what it shortcuts. You won’t know what you’re missing: the COALESCE across a LEFT JOIN, the array operator, the window function, the recursive CTE, the partial index, the LATERAL join. The ORM solves the surface problem and the deeper option never enters the conversation. Worse, when something goes wrong in production, you’re debugging two things at once — your code and the framework’s interpretation of it — when you haven’t yet built fluency in either.

There’s also a career argument that doesn’t get made enough. A Java developer with strong SQL is more marketable than a Java developer who only knows databases through a specific ORM. SQL is portable across languages, jobs, and decades. Postgres in 2026 is still recognisably the Postgres of 2010, and will still be recognisable in 2040. Hibernate knowledge transfers mainly to other Hibernate jobs, and even then, every major version churns the annotation surface and the flush semantics. The skill with the longer half-life is the one worth investing in first.

The other thing I’ll say plainly: I don’t buy that JPA is the “easy” choice. It’s the fast-to-start choice, which is a different thing. Six lines compile in minutes; understanding what those six lines do takes years, and the gap shows up at exactly the wrong moments. Every senior Java developer I know has at least one war story that boils down to “I didn’t realise Hibernate would do that.” I have several. JdbcClient with hand-written SQL is harder to start and easier to operate. Over the lifetime of a service, I’ll take that trade.

This applies to CRUD APIs too, which I know is the contrarian position. CRUD is where JPA looks best on the slide and worst in production: every endpoint is a thin wrapper over a write, every write hits the merge/dirty-check/flush machinery, and the workload is exactly the shape that benefits most from IS DISTINCT FROM guards and set-based bulk operations. The “but it’s just CRUD” framing is what gets you a service that’s six months old and already has three native queries bolted on for the paths where JPA hurt.

None of this means never use JPA. It means earn it. Learn the database first, hit a problem JPA genuinely solves better, then reach for it knowing what you’re trading. That’s a different posture from the default I see most often, which is reaching for JPA because it’s what the tutorial used.

7. Conclusion: Choose Deliberately

JPA is genuinely powerful when used within its sweet spot. It eliminates a massive amount of boilerplate, and it particularly shines for dynamic queries. Spring Data Specifications and the Criteria API give you a typesafe, composable convention for building complex, on-the-fly WHERE clauses. If your application heavily features dynamic filtering, multi-faceted search grids, or complex user-driven query building, JPA gives you an elegant, structured way to handle it that raw SQL string concatenation cannot match.
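
As a rough sketch of that composability (the field names and the JpaSpecificationExecutor wiring are assumed for illustration):

public interface UserRepository
        extends JpaRepository<User, String>, JpaSpecificationExecutor<User> {}

// each filter is an independent, reusable Specification
static Specification<User> nameContains(String fragment) {
    return (root, query, cb) ->
            fragment == null ? cb.conjunction()
                             : cb.like(root.get("name"), "%" + fragment + "%");
}

static Specification<User> emailEndsWith(String domain) {
    return (root, query, cb) -> cb.like(root.get("email"), "%" + domain);
}

// composed at the call site: the WHERE clause is built from whatever the user supplied
List<User> results = userRepository.findAll(nameContains(q).and(emailEndsWith(domain)));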

The “Bad” and the “Ugly” are the other side of that coin: runtime opacity, schema constraints, database features left on the table, and debugging difficulty when something fails. You can’t realise the advantage without understanding the cost. The trade-off depends entirely on the service. A few things to actually weigh:

  • How much SQL fluency does the team have, and how fast do incidents need to be diagnosed?
  • How comfortable is the team with the wider JPA surface — @EntityGraph, @BatchSize, @DynamicUpdate, StatelessSession, OSIV trade-offs, connection-acquisition tuning? You will need most of it eventually.
  • How critical is the workload? An internal admin tool tolerates more framework opacity than a checkout path.
  • How performance-sensitive is it, especially under writes, retries, replication, or batch load?
  • What database features matter to the workload? If the natural solution leans on arrays, JSONB, window functions, recursive CTEs, or LATERAL joins, the ORM will fight you more than help.
  • How long will this service live? JPA’s complexity bites hardest in the system you’re still operating five years from now.

The hybrid that ends up worst-of-both-worlds isn’t the result of mixing tools. It’s the result of unmade decisions: defaulting to JPA because it’s the tutorial choice, then bolting on native queries when JPA hurts. A team that deliberately picks per service is in a fine place. A team that defaults without thinking isn’t.

The framework is supposed to serve you. You should know enough to tell when it isn’t.
