Michel

Posted on Feb 4, 2020 • Originally published at blog.pagesd.info

Optimize multiple INSERTs (for Oracle)

#sql

In order to check the integrity of some data, I had to copy the contents of a large customer table from a DB2 database into an Oracle database. I know SQL*Loader exists, but this is not the time I'm gonna learn how to use it.

One INSERT command for each row to process

To begin with, I made tests with a subset of the data and tried to make it as simple as possible. So simple that I generated and executed one INSERT query per row to be inserted.



INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_1', 'Two_1', 'Three_1');
then
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_2', 'Two_2', 'Three_2');
then
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_3', 'Two_3', 'Three_3');
etc...

=> 12.07 seconds to insert 4579 lines (i.e. 379 requests per second).

INSERT commands inside one BEGIN / END

Rather than making a connection.Execute() for each line, I grouped the INSERT commands into a BEGIN ... INSERT expression (a bit like a stored procedure) so as to hit the server for only one connection.Execute().



BEGIN
  INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_1', 'Two_1', 'Three_1');
  INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_2', 'Two_2', 'Three_2');
  INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_3', 'Two_3', 'Three_3');
  etc...
END;

=> 6.37 seconds to insert 4579 lines => already twice as fast.

Note: with Dapper (and ADO.NET in general I suppose), Oracle doesn't support line breaks in the SQL command: I get an error @@PLS-00103: Encountered the symbol "" when expecting one of the following@.

To avoid this problem, just don't add a line break when generating the big query :



BEGIN INSERT INTO MyTable (...) VALUES (...); INTO MyTable (...) VALUES (...); ...; END;

An INSERT command from several SELECTs

This time, instead of making 1 access to the server to still make several INSERTs, I optimized even more by sending a single INSERT request. To do this, I turned "INSERT ... VALUES ..." into "INSERT ... SELECT FROM ..." :)



INSERT INTO MyTable (Column1, Column2, Column3)
SELECT ('One_1', 'Two_1', 'Three_1') FROM DUAL
UNION SELECT ('One_2', 'Two_2', 'Three_2') FROM DUAL
UNION SELECT ('One_3', 'Two_3', 'Three_3') FROM DUAL
etc...

=> 2.84 seconds to insert 4579 lines (i.e. 1612 requests per second) => 4 times faster.

Test results

It was useful to run a few tests, since it allows to go from 379 inserts in one second to more than 1600! And as I have nearly 300,000 lines to process, it will take 3 minutes and not 15 minutes.

Implementation

Actually, I copy the entire data with block of 500 customers to avoid overloading memory:



public int SaveCustomers(IEnumerable<Customer> customers)
{
    var sql = @"UNION SELECT '{0}', '{1}', '{2}', '{3}', '{4}' FROM DUAL ";

    int count = 0;
    try
    {
        connexion.Open();
        var batch = new StringBuilder();
        foreach (var c in customers)
        {
            batch.Append(string.Format(sql, c.Agency, c.Code, c.Siret, c.Type));
            count++;
            if ((count % 500) == 0)
            {
                connexion.Execute(Sql_FromSelect(batch));
                batch = new StringBuilder();
            }
        }
        connexion.Execute(Sql_FromSelect(batch));
    }
    catch (Exception ex)
    {
        throw ex;
    }
    finally
    {
        connexion.Close();
    }

    return count;
}

private string Sql_FromSelect(StringBuilder batch)
{
    // Au départ :
    // "UNION SELECT '...', '...', '...', '...', '...' FROM DUAL UNION SELECT ... "

    batch.Remove(0, 5);
    // => " SELECT '...', '...', '...', '...', '...' FROM DUAL UNION SELECT ... "

    var start = @"BEGIN INSERT INTO Customers (Agency, Code, Siret, Type) ";
    batch.Insert(0, start);
    // => "BEGIN INSERT INTO Customers (...) SELECT '...', '...', ... "

    batch.Append("; END;");
    // => "BEGIN INSERT INTO Customers (...) SELECT ... FROM DUAL; END;"

    return batch.ToString();
}

This post was originally published on blog.pagesd.info.
Cover image : Rusty old truck in a pumpkin patch - Priscilla Du Preez.

Top comments (4)

yyvess • Feb 4 '20 • Edited

Good trick but why not simply use jdbc batch update?
It should must faster and easily to use.
Note that in general you should commit each 1'000 - 10'000 otherwise performance decrease

docs.oracle.com/cd/E11882_01/java....

Michel • Feb 5 '20

Indeed :)
But it's C# code and I used what I know best: ADO.NET and Dapper.
Otherwise, it's a translation of an old post. At this time, the batch update had to go through DataAdapter that I didn't use.

Mohan • Nov 13 '20

Good One. it really helped to boost performance of an INSERT operation. is there any way for UPDATE as well similar to it?

Michel • Nov 14 '20

Yes, it works for everything. The BEGIN / END block can contain all kinds of queries:

BEGIN
  UPDATE MyTable SET Column4 = Column4 + 1 WHERE Column1 = 'One_1';
  UPDATE MyTable SET Column4 = Column4 + 2 WHERE Column1 = 'One_2';
  UPDATE MyTable SET Column4 = Column4 + 3 WHERE Column1 = 'One_3';
  etc...
END;