Karthik D

Loading a large JSON file into Elixir's ETS (Erlang Term Storage) cache using Jaxon

Original article: https://kdnanmaga.xyz/how-to-load-a-large-json-file-into-elixir-ets-cache/

Elixir's ETS cache is an in-memory data store that can be accessed across processes. It can be used to build lookup tables, such as geocoders and translators, in web applications.
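For example, a tiny lookup table looks something like this (the table name and data here are purely illustrative):

# A minimal ETS lookup table; the table name and data are illustrative
:ets.new(:geocodes, [:named_table, :set, :public, read_concurrency: true])
:ets.insert(:geocodes, {"560001", {12.9716, 77.5946}})

case :ets.lookup(:geocodes, "560001") do
  [{_key, coords}] -> coords   # {12.9716, 77.5946}
  [] -> nil                    # key not present
end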

For an application I was building, I had to load a large dataset (~500k rows) of geocodes into an ETS table before the web server endpoint started, so that the data could be shared across all the processes handling incoming requests.
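One way to wire that up (a rough sketch; the module and file names are hypothetical, and load_file/2 is the loader function described below) is a small GenServer started before the web endpoint in the supervision tree:

# Hypothetical loader, started before the endpoint in the supervision tree
defmodule MyApp.GeocodeLoader do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # init/1 blocks the supervisor, so the table is fully populated
    # before later children (such as the endpoint) begin handling requests
    load_file(Keyword.fetch!(opts, :file), :geocodes)
    {:ok, opts}
  end

  # load_file/2 is the function walked through in the rest of this post
end

Because the GenServer stays alive, the ETS table it creates stays alive with it.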

First attempt: Load the file and then parse

At first, I attempted loading the whole file into memory, then parsing it with Jason and inserting the key-value pairs into an ETS table:

# here my json is one single root object with key-value pairs
def load_file(filename, tablename) do
  :ets.new(tablename, [:named_table])

  with {:ok, body} <- File.read(filename),
       {:ok, json} <- Jason.decode(body),
       do: load_from_map(json, tablename)
end

defp load_from_map(parsed_map, tablename) do
  # the table is already created in load_file/2; creating it again here
  # would raise, since :named_table names must be unique
  for {k, v} <- parsed_map do
    :ets.insert(tablename, {k, v})
  end
end

It worked, but it took quite a while and used a lot of RAM; my machine with 4 GB of RAM froze for about a minute.

Streaming to the rescue

At this point, I thought there had to be a better way to do this, maybe something that doesn't involve reading the entire file into memory. That's when I found Jaxon, a streaming JSON parser. Now the file is opened as a stream and the JSON is parsed as the stream is being read. Pretty neat, right?

# here my json is an array of objects like {"k": <key>, "v": <value>}
def load_file(filename, tablename) do
  :ets.new(tablename, [:named_table])

  filename
  |> File.stream!()
  |> Jaxon.Stream.from_enumerable()
  |> Jaxon.Stream.query([:root, :all])
  |> Stream.each(fn kv -> :ets.insert(tablename, {kv["k"], kv["v"]}) end)
  |> Stream.run()
end
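Once wired up, the intended usage is something like this (the module name, file path, and value shape are all illustrative):

# Illustrative usage; module name, file path and value shape are made up
MyGeoCache.load_file("priv/data/geocodes.json", :geocodes)

:ets.lookup(:geocodes, "560001")
#=> [{"560001", %{"lat" => 12.9716, "lng" => 77.5946}}]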

At first this didn't seem to work and I was disappointed, until I realized my JSON wasn't pretty-printed and was all on a single line. Since File.stream!/1 streams a file line by line by default, the single-line file was coming through as one giant chunk. I generated a multi-line, pretty-printed JSON file and voila! It worked!
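In hindsight, an alternative to pretty-printing the file would be to stream it in fixed-size byte chunks instead of lines. This is only a sketch I haven't benchmarked, and the 2048-byte chunk size is arbitrary:

# Sketch: stream raw 2048-byte chunks instead of lines, so a single-line
# JSON file no longer arrives as one huge chunk (the chunk size is arbitrary)
def load_file_chunked(filename, tablename) do
  :ets.new(tablename, [:named_table])

  filename
  |> File.stream!([], 2048)
  |> Jaxon.Stream.from_enumerable()
  |> Jaxon.Stream.query([:root, :all])
  |> Stream.each(fn kv -> :ets.insert(tablename, {kv["k"], kv["v"]}) end)
  |> Stream.run()
end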

Find me on Twitter: @kdnanmaga
