
Which is faster: reading a CSV file or an Oracle table?

Hi guys,

I need some advice. I need to load a huge amount of data (50 M) from Oracle into HBase.
At the moment there is a job written in Talend (an ETL system) that reads the data from a CSV file and loads it into HBase.

Oracle -> CSV File -> Talend Job -> HBase Database

Can I get better upload performance if I connect to the oracle database?

Is reading from an Oracle table faster than reading from a CSV file?

Thanks,
Daniel


Can I get better upload performance if I connect to the oracle database?

Most likely, since you are "connecting to the Oracle DB" anyway to generate the CSV file. By reading directly from Oracle you skip the CSV generation and parsing steps. Those steps not only take time but are also error-prone, because all schema information is lost and every value is converted to a string.
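To illustrate the type loss, here is a minimal Python sketch (not Talend code). The row values are made up, and an in-memory buffer stands in for the CSV file; it only shows what a CSV round trip does to typed values.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical row, typed the way a database driver might return it.
typed_row = (42, Decimal("19.90"), date(2018, 3, 24), None)

# Export step: write the row to CSV (in memory here instead of a file).
buffer = io.StringIO()
csv.writer(buffer).writerow(typed_row)

# Import step: parse the CSV back, as the loading job would.
buffer.seek(0)
parsed_row = next(csv.reader(buffer))

print(parsed_row)  # ['42', '19.90', '2018-03-24', '']
```

Every field comes back as a string, and the NULL can no longer be told apart from an empty string, so the loading job has to re-apply the schema by hand.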

 

It might depend on the load and frequency of these ETL jobs, and on the data format.

Dumping a table allows you to decouple the extraction and insertion steps, which means extraction could be done by a serial job and insertion into the destination DB could be done in parallel. Granted, this can also be accomplished by writing the intermediate step in a programming language, but ETL tools are normally well equipped to handle massive CSV files.
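As a rough sketch of that decoupling in Python (not Talend): the dump is read serially in batches and the batches are handed to a pool of workers. `read_batches` and `load_batch` are hypothetical names, the dump is an in-memory stand-in for the real file, and `load_batch` only counts rows where a real job would do a bulk write to the destination DB.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_batches(csv_file, batch_size):
    """Yield lists of parsed rows, at most batch_size rows at a time."""
    reader = csv.reader(csv_file)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def load_batch(batch):
    """Hypothetical stand-in for a bulk write; returns the rows handled."""
    return len(batch)


# Stand-in for the dump file produced by the serial extraction job.
dump = io.StringIO("".join(f"{i},name{i}\n" for i in range(10)))

# The file is read serially; the batches are loaded by parallel workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    loaded = sum(pool.map(load_batch, read_batches(dump, batch_size=3)))

print(loaded)  # 10
```

Note that `pool.map` submits every batch up front, so a job over a very large dump would need to limit how many batches are in flight at once.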

Depending on what the data looks like, the data type conversion might not matter.

Daniel Mutu profile image