
Loading data into PostgreSQL using Node.js and Minio

Dviejo ・ 2 min read

I was recently loading a lot of files into PostgreSQL, and I needed to do it quickly and without many resources.

The requirements I had were the following:

  1. Low footprint, not reading all lines of a file at once
  2. Easy to load data
  3. Write data efficiently into Postgres using the "COPY ... FROM" statement

With these requirements, I came up with the idea of doing it with streams in Node.js because:

  1. It's efficient
  2. It's simple

Let's say I have the following file:

first_name,last_name,country
Karie,Zorn,Ukraine
Thebault,Flicker,France
Danya,Atcock,Canada
Lorne,Westmorland,Russia
Page,Greeve,Canada
Rene,Riccardini,Serbia
Adair,Mullin,Philippines
Della,Gumb,China
Charlie,Swadlinge,Cambodia
Celka,Karlowicz,Canada

And the following table in PostgreSQL:

CREATE EXTENSION "uuid-ossp";   
CREATE TABLE PEOPLE(
    id uuid default uuid_generate_v4(),
    first_name varchar(30) NOT NULL,
    last_name varchar(30) NOT NULL,
    country varchar(30) NOT NULL
);

Since we're going to use curl inside the COPY command to fetch the file, we'll extend the PostgreSQL Docker image with the following:

FROM postgres:10.4
RUN apt-get update && apt-get install -y curl

I named this file postgresql.Dockerfile. Once you have it, get everything running with the following commands:

docker build -f postgresql.Dockerfile -t postgres-with-curl .
docker run --restart always --name postgres --network host -d postgres-with-curl
docker run --network host -e MINIO_ACCESS_KEY=user -e MINIO_SECRET_KEY=password --restart  always --name minio -d minio/minio:RELEASE.2018-06-22T23-48-46Z server /data

So let's do it. First, we need to create the connections for PostgreSQL and Minio:

const pg = require("pg")
const minio = require("minio")

const minioClient = new minio.Client({
    accessKey: "user",
    secretKey: "password",
    secure: false,
    endPoint: "localhost",
    port: 9000,
    region: "us-east-1",
})
const pgPool = new pg.Pool({
    user: "postgres",
    password: "postgres",
    host: "localhost",
    port: 5432,
    database: "postgres",
})

Then we're going to do basically 3 things:

  1. Read the file
  2. Upload to minio
  3. Execute SQL in the PostgreSQL instance

const fs = require("fs")
const path = require("path")

const TABLE_NAME = "public.people"

async function main() {
    const input = fs.createReadStream(path.join(__dirname, "people.csv"), {
        encoding: "utf-8",
    })
    const bucketName = "tmp"
    const objectName = "new.csv"
    await minioClient.putObject(bucketName, objectName, input)
    const csvUrl = await minioClient.presignedGetObject(
        bucketName,
        objectName,
        10000, // duration in seconds of the url
    )
    const copyDataSql = `
  COPY ${TABLE_NAME} (FIRST_NAME, LAST_NAME, COUNTRY)
    FROM PROGRAM 'curl "${csvUrl}"'
    WITH (FORMAT CSV , HEADER, QUOTE '"', DELIMITER ',' );`
    const client = await pgPool.connect()
    await client.query(copyDataSql)
    client.release()
    await pgPool.end()
}

main()
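
If you end up reusing this pattern for several tables, the COPY statement can be factored out into a small helper. This is just a sketch; buildCopySql is my name for it, not part of the original code:

```javascript
// Build a COPY ... FROM PROGRAM statement that pulls a CSV over curl.
// table, columns and url come from the caller; the options mirror the
// ones used in main() above.
function buildCopySql(table, columns, url) {
    return `COPY ${table} (${columns.join(", ")})
  FROM PROGRAM 'curl "${url}"'
  WITH (FORMAT CSV, HEADER, QUOTE '"', DELIMITER ',');`
}

// Usage with the names from this article:
// const copyDataSql = buildCopySql("public.people",
//     ["first_name", "last_name", "country"], csvUrl)
```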

If we execute the code above with Node.js, we should see the rows in the database with the following query:

SELECT * FROM PUBLIC.PEOPLE;

The script can fail for the following reasons:

  • Minio is not up
  • PostgreSQL is not up
  • The PostgreSQL table has not been created
  • The Minio bucket "tmp" has not been created
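
The last point can be guarded against in code. Assuming the Minio client's promise-based bucketExists and makeBucket methods, a small sketch (the helper name is mine):

```javascript
// Ensure a bucket exists before uploading to it.
// `client` is expected to be a minio.Client like the one created earlier;
// bucketExists/makeBucket are assumed to return promises.
async function ensureBucket(client, name, region = "us-east-1") {
    const exists = await client.bucketExists(name)
    if (!exists) {
        await client.makeBucket(name, region)
    }
    return exists
}

// Usage: await ensureBucket(minioClient, "tmp") before putObject().
```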

This technique can be useful when we have no control over the disk of the PostgreSQL server and we need to perform a batch load.

Thanks for reading.
