DEV Community

Daniel Mutu

What is faster? Read file CSV or Oracle table?

Hi guys,

I need some advice. I need to load a huge amount of data (50 M) from Oracle to HBase.
At the moment there is a job written in Talend (an ETL system) that reads the data from a CSV file and loads it into HBase.

Oracle -> CSV File -> Talend Job -> HBase Database

Can I get better upload performance if I connect directly to the Oracle database?

Is reading from an Oracle table faster than reading from a CSV file?

Thanks,
Daniel

Top comments (2)

Frank Rosner • Edited

Can I get better upload performance if I connect to the Oracle database?

Most likely, since you are already "connecting to the Oracle DB" anyway to generate the CSV file. By reading directly from Oracle you skip the CSV generation and parsing steps. That round trip not only takes time but is also error-prone, because all schema information is lost and every value is converted to a String.
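To illustrate the loss of type information, here is a minimal Python sketch (not Talend or Oracle code). The row and its values are made up for the example; it only assumes that a database driver hands back typed values, which are then written to and read from CSV.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical row as a database driver might return it: typed values.
row = (42, Decimal("19.99"), date(2018, 5, 1), None)

# Write the row to CSV (kept in memory here, a file in practice).
buffer = io.StringIO()
csv.writer(buffer).writerow(row)

# Read it back: every field is now a plain string.
buffer.seek(0)
parsed = next(csv.reader(buffer))

print(parsed)
print([type(value).__name__ for value in parsed])
```

After the round trip the number, the decimal and the date are all strings, and the NULL has become an empty string, so the loading job has to re-derive the types on its own.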

rhymes

It might depend on the load and frequency of these ETL jobs, and on the data format.

Dumping a table allows you to decouple the extraction and insertion steps, which means extraction could be done by a serial job and insertion into the destination DB could be done in parallel. Granted, this can also be accomplished with an intermediate programming language, but ETL tools are normally well equipped to handle massive CSV files.
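A rough Python sketch of that idea, reading a dump serially and inserting batches in parallel. Everything here is an assumption for illustration: `insert_batch` is a hypothetical stand-in for a bulk write to the destination DB, and the batch size and worker count are arbitrary.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_batches(lines, batch_size):
    """Yield lists of parsed CSV rows, at most batch_size rows each."""
    reader = csv.reader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def insert_batch(batch):
    """Hypothetical stand-in for a bulk write to the destination DB."""
    return len(batch)  # pretend every row in the batch was written


def load(lines, batch_size=2, workers=4):
    """Read batches serially, insert them concurrently, return rows written."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: Executor.map submits every batch up front, so a real job
        # over a huge file would need to bound how many batches are queued.
        return sum(pool.map(insert_batch, read_batches(lines, batch_size)))


# Stand-in for a dumped CSV file; in practice: open(path, newline="").
dump = io.StringIO("1,a\n2,b\n3,c\n4,d\n5,e\n")
total = load(dump)
print(total)
```

The reader and the writers only share the batches, so the dump can be produced once by one job and consumed by as many insert workers as the destination can handle.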

Depending on what the data looks like, the data type conversion might not matter.