DEV Community

Anshul Bansal

Best Possible Hibernate Configuration for Batch Inserts

Problem

In general, Hibernate entities are configured to use a database sequence as the ID generator.
In such a case, Hibernate makes two round trips to the database for every insert. It first fetches the next value of the sequence to set the identifier of the record, and then makes another round trip to insert the record.

Assume we want to insert 1000 records into the database. That results in a total of 2000 round trips (1000 to fetch the next sequence value and 1000 to insert the records).

Even with very low network latency, performing a few thousand inserts can take a significant amount of time.

The first major problem is that each insert is performed separately.
The second major problem is that a round trip is made for every insert just to fetch the next value of the sequence.
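Sketched with a plain JPA `EntityManager` (the `Record` entity and its fields are illustrative, not from the original post), the naive pattern looks like this:

```java
// Without batching, each persisted entity costs two statements:
// one sequence call to assign the id and one INSERT at flush time.
for (int i = 0; i < 1000; i++) {
    Record record = new Record();
    record.setName("row-" + i);
    entityManager.persist(record); // round trip #1: select the next sequence value
}
entityManager.flush(); // round trips #2..: one INSERT statement per record
```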

Theoretical Solution

We should reduce the network round trips to the database by processing the inserts in batches.
Along with that, we should reduce the round trips made to fetch the next value of the sequence for every insert.

Hibernate Solution

First problem:

We know the obvious solution: use the JDBC batching provided by Hibernate.

Set the following properties in the hibernate configuration file.

<property name="hibernate.jdbc.batch_size" value="100"/>
<property name="hibernate.order_inserts" value="true"/>
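On the application side, it also helps to flush and clear the persistence context in chunks matching the batch size, so the JDBC batches stay aligned and memory stays flat. A common pattern (a sketch, assuming an `EntityManager` and an illustrative `Record` entity):

```java
int batchSize = 100; // keep in sync with hibernate.jdbc.batch_size

for (int i = 0; i < 1000; i++) {
    entityManager.persist(new Record("row-" + i));
    if ((i + 1) % batchSize == 0) {
        entityManager.flush(); // send the current batch of 100 INSERTs in one round trip
        entityManager.clear(); // detach flushed entities so the context doesn't grow
    }
}
entityManager.flush(); // flush any remaining entities
```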

Now, Hibernate will order the inserts and group them into batches of 100. It will then make a single network round trip to the database to insert each batch of 100 records.

Therefore, the initial 1000 round trips to insert the records are reduced to 10.

Second problem:

We'll use the enhanced sequence identifier with an optimizer strategy like pooled or pooled-lo, which provides in-memory identifiers. These strategies reserve a range of IDs in memory to be used for subsequent inserts.

Let's see what the 'pooled' optimizer strategy can do!

To enable it, we need to set the 'INCREMENT BY' of the database sequence to 100 (e.g., ALTER SEQUENCE hibernate_sequence INCREMENT BY 100).

Then, set the pooled optimizer strategy with increment_size = 100 in the entity:

@Id
@GenericGenerator(
    name = "sequenceGenerator",
    strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
    parameters = {
        @Parameter(name = "sequence_name", value = "hibernate_sequence"),
        @Parameter(name = "optimizer", value = "pooled"),
        @Parameter(name = "initial_value", value = "1"),
        @Parameter(name = "increment_size", value = "100")
    }
)
@GeneratedValue(
    strategy = GenerationType.SEQUENCE,
    generator = "sequenceGenerator"
)
private Long id;

Now, Hibernate will make one round trip to the database to advance the sequence by 100, and reserve those 100 values in memory to assign as IDs for the inserts.
With this approach, we make just one round trip to fetch the next value of the sequence for every 100 records inserted.

Hence, the initial 1000 round trips to fetch the next value of the sequence are reduced to 10.

Therefore, applying both solutions, we reduce the round trips for 1000 inserts from 2000 to just 20.
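The arithmetic above can be double-checked with a few lines of plain Java (the class and method names are just for illustration):

```java
public class RoundTrips {
    // Round trips without any optimization: one sequence call + one INSERT per record.
    static long unoptimized(long records) {
        return 2 * records;
    }

    // Round trips with JDBC batching and a pooled optimizer:
    // one INSERT batch per batchSize records, one sequence call per incrementSize ids.
    static long optimized(long records, long batchSize, long incrementSize) {
        long insertTrips = (records + batchSize - 1) / batchSize;           // ceil(1000/100) = 10
        long sequenceTrips = (records + incrementSize - 1) / incrementSize; // ceil(1000/100) = 10
        return insertTrips + sequenceTrips;
    }

    public static void main(String[] args) {
        System.out.println(unoptimized(1000));         // 2000
        System.out.println(optimized(1000, 100, 100)); // 20
    }
}
```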

We may all know about the batch insert optimization, but the sequence optimizer is quite a winner.
Not many of us know that this kind of optimization strategy is already available.

Doesn't that sound cool and smart?

Please do share your thoughts with me.

Top comments (8)

Mubasher Usman

Will the 'pooled' optimizer strategy work with a MySQL database and Hibernate (Spring Data JPA) as the ORM?

Anshul Bansal • Edited

Yes it will work.
[Edit] Sorry, it will not work.

nikjain10

Hi,

I am using Spring Batch and inserting records in batches. I have set 1000 as both the allocation size and the sequence increment value. However, sometimes when the job fails and restarts,
I get a unique constraint violation when using the pooled optimizer. Could you please let me know the cause and a solution?
Also, is there any way to clear the in-memory sequence values whenever the job fails, or under any other condition?

Thanks.

 
Mohd Waseem

why " 'pooled' optimizer strategy work with MySql database and Hibernate " does not work?
I tried this example and it worked below are the versions that i have used:

  • mysql Ver 14.14 Distrib 5.7.30, for Linux (x86_64) using EditLine wrapper
  • hibernate: hibernate-core:jar:5.4.8.Final:compile Hibernate is using a table hibernate_sequence and first getting id and caching it, then using those ids to insert record. I had enabled hibernate statistics:

Session Metrics {
1720780 nanoseconds spent acquiring 2 JDBC connections;
15750 nanoseconds spent releasing 1 JDBC connections;
110738 nanoseconds spent preparing 3 JDBC statements;
1044388 nanoseconds spent executing 2 JDBC statements;
13444485 nanoseconds spent executing 1 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
44990162 nanoseconds spent executing 1 flushes (flushing a total of 50 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
This looks like the table generator, but I used the generator code from this post.

Mohd Waseem

Hibernate created a hibernate_sequence table when I used the sequence generator. Since MySQL doesn't support sequences, Hibernate falls back to the table generator, which is why I was able to run the example mentioned in the post.

 
Anshul Bansal

Thanks for pointing it out. My bad, I didn't think of that.
You are right, it will not work. However, we can use IDENTITY and handle the batch inserts via jOOQ instead of Hibernate.
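
With jOOQ, the batch could be expressed along these lines (a sketch; the `AUTHOR` table and its generated record class are assumptions, and MySQL generates the IDENTITY values server-side, so no sequence round trips are needed):

```java
// dsl is an org.jooq.DSLContext bound to the MySQL connection.
// The id column is omitted from the INSERT and filled in by the database.
List<AuthorRecord> records = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
    AuthorRecord r = dsl.newRecord(AUTHOR);
    r.setName("author-" + i);
    records.add(r);
}
dsl.batchInsert(records).execute(); // sends the inserts as JDBC batches
```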