DEV Community

Franz Wong


Process large JSON with limited memory

Sometimes we need to process a big JSON file or stream, but we don't need to keep all of its contents in memory.

For example, to count the number of items in a big array, we only need to load one item, increment the count, discard the item, and repeat until the whole array has been counted.

I found a big JSON file (~190 MB) in this Git repository: https://github.com/zemirco/sf-city-lots-json

The file looks like this, and I want to count the number of features.

{
  "type": "FeatureCollection",
  "features": [ /* lots of feature objects */ ]
}

In case you are interested, this is what a feature object looks like:

{
  "type": "Feature",
  "properties": {
    "MAPBLKLOT": "0001001",
    "BLKLOT": "0001001",
    "BLOCK_NUM": "0001",
    "LOT_NUM": "001",
    "FROM_ST": "0",
    "TO_ST": "0",
    "STREET": "UNKNOWN",
    "ST_TYPE": null,
    "ODD_EVEN": "E"
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -122.422003528252475,
          37.808480096967251,
          0.0
        ],
        [
          -122.422076013325281,
          37.808835019815085,
          0.0
        ],
        [
          -122.421102174348633,
          37.808803534992904,
          0.0
        ],
        [
          -122.421062569067274,
          37.808601056818148,
          0.0
        ],
        [
          -122.422003528252475,
          37.808480096967251,
          0.0
        ]
      ]
    ]
  }
}

Let's say my application can allocate only 50 MB of heap (for example, when the JVM is started with -Xmx50m), and I try to load the whole file into memory.

import java.nio.file.Files;
import java.nio.file.Path;
Path filePath = Path.of("/src/sf-city-lots-json/citylots.json");
String content = Files.readString(filePath); // reads the entire file into one String

As expected, the whole file does not fit into memory:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
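Before reaching for a streaming parser, you can sanity-check whether loading a file whole is even plausible. The minimal sketch below (HeapCheck and mayFitInHeap are hypothetical names, not from the original post) compares Files.size with Runtime.getRuntime().maxMemory(), the maximum amount of memory the JVM will attempt to use. Treat it as a rough, necessary-but-not-sufficient test: the heap is shared with everything else the application allocates, so a file somewhat smaller than the heap can still fail to load.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapCheck {
    // Hypothetical helper: a file can only be read into a String if the heap
    // is at least as large as the file. Passing this check does NOT guarantee
    // that the load succeeds; it only rules out hopeless cases.
    public static boolean mayFitInHeap(long fileSizeBytes, long maxHeapBytes) {
        return fileSizeBytes < maxHeapBytes;
    }

    public static void main(String[] args) throws IOException {
        Path filePath = Path.of(args[0]);
        long size = Files.size(filePath);
        // Long.MAX_VALUE if the JVM has no inherent limit
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("file: %d bytes, max heap: %d bytes%n", size, maxHeap);
        if (!mayFitInHeap(size, maxHeap)) {
            System.out.println("Too large to load at once; stream it instead.");
        }
    }
}
```

With a 50 MB heap, the ~190 MB citylots.json fails this check before any parsing is attempted.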

Gson provides JsonReader, which reads JSON as a stream of tokens instead of building the whole document in memory. Here is how to count the features with it:

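import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public int getFeatureCount(Path filePath) throws IOException {
    int count = 0;
    // JsonReader pulls tokens from the buffered file reader on demand,
    // so only a small part of the file is held in memory at any time.
    try (JsonReader reader = new JsonReader(Files.newBufferedReader(filePath))) {
        reader.beginObject();                 // top-level '{'
        while (reader.hasNext()) {
            String name = reader.nextName();
            if ("features".equals(name)) {
                count = getFeatureCountFromArray(reader);
            } else {
                reader.skipValue();           // e.g. "type": "FeatureCollection"
            }
        }
        reader.endObject();                   // top-level '}'
    }
    return count;
}

private int getFeatureCountFromArray(JsonReader reader) throws IOException {
    int count = 0;
    reader.beginArray();                      // '[' of "features"
    while (reader.hasNext()) {
        count++;
        reader.beginObject();                 // '{' of one feature
        // Inside an object, skipValue() skips a member name on one call
        // and that member's value (recursively) on the next call.
        while (reader.hasNext()) {
            reader.skipValue();
        }
        reader.endObject();                   // '}' of one feature
    }
    reader.endArray();                        // ']' of "features"
    return count;
}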
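Gson's skipValue() skips an entire value recursively, including nested objects and arrays, so the inner loop over each feature's members could also be replaced by a single call per array element. A minimal sketch of that variation follows; the class name FeatureCounterSketch and the method countFeatures are hypothetical, and the method takes a Reader (rather than a Path) only so it can be tried on a small string.

```java
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class FeatureCounterSketch {
    // Counts the elements of the top-level "features" array without
    // materializing any feature: each element is skipped as one whole value.
    public static int countFeatures(Reader in) throws IOException {
        int count = 0;
        try (JsonReader reader = new JsonReader(in)) {
            reader.beginObject();
            while (reader.hasNext()) {
                if ("features".equals(reader.nextName())) {
                    reader.beginArray();
                    while (reader.hasNext()) {
                        reader.skipValue(); // skips one feature, nested values included
                        count++;
                    }
                    reader.endArray();
                } else {
                    reader.skipValue(); // any other member, e.g. "type"
                }
            }
            reader.endObject();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\"},{}]}";
        System.out.println(countFeatures(new StringReader(json))); // prints 2
    }
}
```

For the real file, pass Files.newBufferedReader(Path.of("/src/sf-city-lots-json/citylots.json")) instead of a StringReader. Unlike the version above, this one also counts array elements that are not JSON objects instead of throwing on them.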

With great power comes great responsibility. Unlike Gson.fromJson, JsonReader requires us to call the begin*, end*, and skipValue methods at the right time, following the structure of the JSON document; otherwise it throws an exception (typically an IllegalStateException naming the expected and actual tokens). So use it only when you have constraints on memory footprint or performance.
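One way to keep the calls in step with the document is to let JsonReader.peek() decide. The sketch below (TokenWalker and tokens are hypothetical names, not part of Gson) asks peek() for the next JsonToken and then makes exactly the call that matches it; calling, say, beginArray() while peek() reports BEGIN_OBJECT is the kind of mismatch that makes JsonReader throw. The returned list is only for demonstration, since a real streaming consumer would act on each token instead of storing it.

```java
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenWalker {
    // Walks any JSON document by peeking at the next token and consuming it
    // with the matching JsonReader call, so the calls can never be out of order.
    public static List<JsonToken> tokens(String json) throws IOException {
        List<JsonToken> seen = new ArrayList<>();
        try (JsonReader reader = new JsonReader(new StringReader(json))) {
            while (true) {
                JsonToken token = reader.peek();
                seen.add(token);
                switch (token) {
                    case BEGIN_OBJECT: reader.beginObject(); break;
                    case END_OBJECT:   reader.endObject();   break;
                    case BEGIN_ARRAY:  reader.beginArray();  break;
                    case END_ARRAY:    reader.endArray();    break;
                    case NAME:         reader.nextName();    break;
                    case STRING:       reader.nextString();  break;
                    case NUMBER:       reader.nextString();  break; // numbers read as text
                    case BOOLEAN:      reader.nextBoolean(); break;
                    case NULL:         reader.nextNull();    break;
                    case END_DOCUMENT: return seen;          // whole input consumed
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("{\"type\":\"FeatureCollection\",\"features\":[]}"));
        // prints [BEGIN_OBJECT, NAME, STRING, NAME, BEGIN_ARRAY, END_ARRAY, END_OBJECT, END_DOCUMENT]
    }
}
```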
