DEV Community

Dviejo
Dviejo

Posted on β€’ Edited on

2 4

Loading data into Postgresql using Nodejs and Minio

I was recently loading a lot of files into PostgreSQL, and i had the need of doing it without many resources and to be fast.

The requirements I had were the following:

  1. Low footprint, not reading all lines of a file at once
  2. Easy to load data
  3. Write data efficiently into Postgres using "COPY TO" sentence

With these requirements, I came up with the idea of doing with streams in NodeJS because:

  1. It's efficient
  2. It's simple

Let's say I have the following file

first_name, last_name, country
Karie,Zorn,Ukraine
Thebault,Flicker,France
Danya,Atcock,Canada
Lorne,Westmorland,Russia
Page,Greeve,Canada
Rene,Riccardini,Serbia
Adair,Mullin,Philippines
Della,Gumb,China
Charlie,Swadlinge,Cambodia
Celka,Karlowicz,Canada

And the following table in PostgreSQL:

CREATE EXTENSION "uuid-ossp";   
CREATE TABLE PERSON(
    id uuid default uuid_generate_v4(),
    first_name varchar(30) NOT NULL,
    last_name varchar(30) NOT NULL,
    country varchar(30) NOT NULL
);

As we're going to use curl in the PostgreSQL sentence in order to get the file, we're going to extend the Postgresql Docker image with the following:

FROM postgres:10.4
RUN apt-get update && apt-get install -y curl

I named this file postgresql.Dockerfile, when you have that file, we're going to get all running with the following commands:

docker build -f postgresql.Dockerfile -t postgres-with-curl .
docker run --restart always --name postgres --network host -d postgres-with-curl
docker run --network host -e MINIO_ACCESS_KEY=user -e MINIO_SECRET_KEY=password --restart  always --name minio -d minio/minio:RELEASE.2018-06-22T23-48-46Z server /data

So let's do it, first we need to create the connections for PostgreSQL and Minio:

const pg = require("pg")
const minio = require("minio")

const minioClient = new minio.Client({
    accessKey: "user",
    secretKey: "password",
    secure: false,
    endPoint: "localhost",
    port: 9000,
    region: "us-east-1",
})
const pgPool = new pg.Pool({
    user: "postgres",
    password: "postgres",
    host: "localhost",
    port: 5432,
    database: "postgres",
})

Then we're going to do basically 3 things:

  1. Read the file
  2. Upload to minio
  3. Execute SQL in PostgreSQL instance
const fs = require("fs")
const path = require("path")

const TABLE_NAME = "public.people"

async function main() {
    const input = fs.createReadStream(path.join(__dirname, "people.csv"), {
        encoding: "utf-8",
    })
    const bucketName = "tmp"
    const objectName = "new.csv"
    await minioClient.putObject(bucketName, objectName, input)
    const csvUrl = await minioClient.presignedGetObject(
        bucketName,
        objectName,
        10000, // duration in seconds of the url
    )
    const copyDataSql = `
  COPY ${TABLE_NAME} (FIRST_NAME, LAST_NAME, COUNTRY)
    FROM PROGRAM 'curl "${csvUrl}"'
    WITH (FORMAT CSV , HEADER, QUOTE '"', DELIMITER ',' );`
    const client = await pgPool.connect()
    await client.query(copyDataSql)
    await client.release()
    await pgPool.end()
}

main()

If we execute the following code in nodejs we should see the rows in the database with the following query:

SELECT * FROM PUBLIC.PEOPLE;

The script can fail because of the following reasons:

  • Minio is not up
  • Postgresql is not up
  • Postgresql table has not been created
  • Minio bucket "tmp" has not been created

This technique can be useful when we have no control over the disk in postgres and we have to perform a batch operation.

Thanks for reading.

AWS GenAI LIVE image

Real challenges. Real solutions. Real talk.

From technical discussions to philosophical debates, AWS and AWS Partners examine the impact and evolution of gen AI.

Learn more

Top comments (0)

Postgres on Neon - Get the Free Plan

No credit card required. The database you love, on a serverless platform designed to help you build faster.

Get Postgres on Neon