In order to check the integrity of some data, I had to copy the contents of a large customer table from a DB2 database into an Oracle database. I know SQL*Loader exists, but this is not the time I'm gonna learn how to use it.
One INSERT command for each row to process
To begin with, I made tests with a subset of the data and tried to make it as simple as possible. So simple that I generated and executed one INSERT query per row to be inserted.
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_1', 'Two_1', 'Three_1');
then
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_2', 'Two_2', 'Three_2');
then
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_3', 'Two_3', 'Three_3');
etc...
=> 12.07 seconds to insert 4579 lines (i.e. 379 requests per second).
INSERT commands inside one BEGIN / END
Rather than making a connection.Execute()
for each line, I grouped the INSERT commands into a BEGIN ... INSERT expression (a bit like a stored procedure) so as to hit the server for only one connection.Execute()
.
BEGIN
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_1', 'Two_1', 'Three_1');
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_2', 'Two_2', 'Three_2');
INSERT INTO MyTable (Column1, Column2, Column3) VALUES ('One_3', 'Two_3', 'Three_3');
etc...
END;
=> 6.37 seconds to insert 4579 lines => already twice as fast.
Note: with Dapper (and ADO.NET in general I suppose), Oracle doesn't support line breaks in the SQL command: I get an error @@PLS-00103: Encountered the symbol "" when expecting one of the following@
.
To avoid this problem, just don't add a line break when generating the big query :
BEGIN INSERT INTO MyTable (...) VALUES (...); INTO MyTable (...) VALUES (...); ...; END;
An INSERT command from several SELECTs
This time, instead of making 1 access to the server to still make several INSERTs, I optimized even more by sending a single INSERT request. To do this, I turned "INSERT ... VALUES ..." into "INSERT ... SELECT FROM ..." :)
INSERT INTO MyTable (Column1, Column2, Column3)
SELECT ('One_1', 'Two_1', 'Three_1') FROM DUAL
UNION SELECT ('One_2', 'Two_2', 'Three_2') FROM DUAL
UNION SELECT ('One_3', 'Two_3', 'Three_3') FROM DUAL
etc...
=> 2.84 seconds to insert 4579 lines (i.e. 1612 requests per second) => 4 times faster.
Test results
It was useful to run a few tests, since it allows to go from 379 inserts in one second to more than 1600! And as I have nearly 300,000 lines to process, it will take 3 minutes and not 15 minutes.
Implementation
Actually, I copy the entire data with block of 500 customers to avoid overloading memory:
public int SaveCustomers(IEnumerable<Customer> customers)
{
var sql = @"UNION SELECT '{0}', '{1}', '{2}', '{3}', '{4}' FROM DUAL ";
int count = 0;
try
{
connexion.Open();
var batch = new StringBuilder();
foreach (var c in customers)
{
batch.Append(string.Format(sql, c.Agency, c.Code, c.Siret, c.Type));
count++;
if ((count % 500) == 0)
{
connexion.Execute(Sql_FromSelect(batch));
batch = new StringBuilder();
}
}
connexion.Execute(Sql_FromSelect(batch));
}
catch (Exception ex)
{
throw ex;
}
finally
{
connexion.Close();
}
return count;
}
private string Sql_FromSelect(StringBuilder batch)
{
// Au départ :
// "UNION SELECT '...', '...', '...', '...', '...' FROM DUAL UNION SELECT ... "
batch.Remove(0, 5);
// => " SELECT '...', '...', '...', '...', '...' FROM DUAL UNION SELECT ... "
var start = @"BEGIN INSERT INTO Customers (Agency, Code, Siret, Type) ";
batch.Insert(0, start);
// => "BEGIN INSERT INTO Customers (...) SELECT '...', '...', ... "
batch.Append("; END;");
// => "BEGIN INSERT INTO Customers (...) SELECT ... FROM DUAL; END;"
return batch.ToString();
}
This post was originally published on blog.pagesd.info.
Cover image : Rusty old truck in a pumpkin patch - Priscilla Du Preez.
Top comments (4)
Good trick but why not simply use jdbc batch update?
It should must faster and easily to use.
Note that in general you should commit each 1'000 - 10'000 otherwise performance decrease
docs.oracle.com/cd/E11882_01/java....
Indeed :)
But it's C# code and I used what I know best: ADO.NET and Dapper.
Otherwise, it's a translation of an old post. At this time, the batch update had to go through DataAdapter that I didn't use.
Good One. it really helped to boost performance of an INSERT operation. is there any way for UPDATE as well similar to it?
Yes, it works for everything. The BEGIN / END block can contain all kinds of queries: