aziz abdullaev

Parsing data from CSV into PostgreSQL in Phoenix Framework project with Elixir

In this article, I will discuss how to populate a database (PostgreSQL in my case) from a CSV file.

My configs are:

  • Elixir 1.15.7
  • OTP 26
  • Phoenix 1.7.10
  • Phoenix LiveView 0.20.1
  • Ecto 3.11
  • Dockerized PostgreSQL
  • All running on my MacBook Air M2 with 8 GB RAM

First, let’s install a CSV parser. I am using NimbleCSV, created and maintained by the creator of Elixir, José Valim.

In mix.exs, we need to add the dependency to deps:

def deps do
  [
    {:nimble_csv, "~> 1.1"}
  ]
end
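
Then fetch the dependency:

mix deps.get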

Let’s generate the context and schema that we are going to populate:

mix phx.gen.live Projects Investment investments name funder type status
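
Since we didn’t specify types, every field defaults to a string. The generated schema (lib/my_app/projects/investment.ex, with MyApp standing in for your application name) will look roughly like this:

defmodule MyApp.Projects.Investment do
  use Ecto.Schema
  import Ecto.Changeset

  schema "investments" do
    field :name, :string
    field :funder, :string
    field :type, :string
    field :status, :string

    timestamps(type: :utc_datetime)
  end
end

Remember to run the generated migration:

mix ecto.migrate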

Now, let’s go to the actual implementation, which is really easy and straightforward.

In our priv/repo/seeds.exs file, we add:

alias NimbleCSV.RFC4180, as: CSV

"/Users/user/Documents/investments.csv"
|> File.stream!()
|> CSV.parse_stream()
|> Stream.each(fn [
                   project_name,
                   funder,
                   type,
                   status
                 ] ->
  %{
    # the key must match the schema field, which phx.gen.live named :name
    name: project_name,
    funder: funder,
    type: type,
    status: status
  }
  |> MyApp.Projects.create_investment()
end)
|> Stream.run()


First, we are importing NimbleCSV.RFC4180 (the most common CSV variant, and the one Google Sheets exports) under the alias CSV; make sure to double-check your CSV file’s encoding. Then we create a file stream and run it through the CSV parser. Note that parse_stream skips the header row by default. Finally, on each row, we create an entry in our database.
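
To see the header handling in isolation, you can try parse_string in IEx (the sample row here is made up):

alias NimbleCSV.RFC4180, as: CSV

CSV.parse_string("name,funder,type,status\nDam,World Bank,loan,active\n")
#=> [["Dam", "World Bank", "loan", "active"]]

CSV.parse_string("name,funder,type,status\nDam,World Bank,loan,active\n", skip_headers: false)
#=> [["name", "funder", "type", "status"], ["Dam", "World Bank", "loan", "active"]]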

The function that we pass to Stream.each receives each row as a list in which each element is a separate column of data. From it, we build a map and pass it to our context function (which was auto-generated for us).
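
Since the generated create_investment/1 returns {:ok, investment} or {:error, changeset}, you could also match on the result to surface bad rows instead of silently skipping them. A sketch, with attrs standing in for the map built above:

case MyApp.Projects.create_investment(attrs) do
  {:ok, _investment} ->
    :ok

  {:error, changeset} ->
    # log the offending row instead of crashing the whole seed run
    IO.inspect(changeset.errors, label: "failed to insert row")
end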

Then, all we have to do is run this script in the terminal:

mix run priv/repo/seeds.exs

For the sake of intellectual debate, there are probably many ways this can be improved.

We can add some asynchronous parallel spices to the script. One could spawn a few processes that each iterate over a slice of the rows: the first process takes rows 0 through 99, the second rows 100 through 199, and so on. Each process builds a list of maps that we can then bulk-insert with Repo.insert_all, as sketched below.
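
Here is a minimal sketch of that idea using Task.async_stream and batched inserts. It assumes the schema generated above; note that insert_all bypasses changesets and does not autofill timestamps, so we set them ourselves:

alias NimbleCSV.RFC4180, as: CSV

# insert_all does not fill inserted_at/updated_at for us
now = DateTime.utc_now() |> DateTime.truncate(:second)

"/Users/user/Documents/investments.csv"
|> File.stream!()
|> CSV.parse_stream()
|> Stream.map(fn [name, funder, type, status] ->
  %{name: name, funder: funder, type: type, status: status,
    inserted_at: now, updated_at: now}
end)
# group rows into batches and insert a few batches concurrently
|> Stream.chunk_every(100)
|> Task.async_stream(
  fn batch -> MyApp.Repo.insert_all(MyApp.Projects.Investment, batch) end,
  max_concurrency: System.schedulers_online()
)
|> Stream.run()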
