Hello, I'm Maneshwar. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback to help improve the product.
# Handling Large JSON Files in Python: Efficient Read, Write, and Update Strategies

Working with JSON is common in data engineering, logging, and APIs.
But what happens when your JSON file isn't a neat 2 KB config, but a monster with 14 lakh (1.4 million) lines?
If you try to load it in Python with `json.load()`, you'll likely run into memory errors. If you attempt a direct seek-and-write, you risk corrupting the structure.
Large JSON files demand a different strategy.
Let’s explore the best ways to handle massive JSON files in Python.
## Why Large JSON Files Are Tricky
- JSON is not append-friendly – It’s usually one giant object/array. Changing a single element can shift the rest of the file.
- Memory consumption – Parsing the entire file at once may exceed system memory.
- In-place edits are fragile – Unless your update is the same length in bytes, you’ll break the JSON structure.
## 1. Use Streaming Read + Rewrite
If you need to modify values in a large JSON file, don’t load everything into memory. Instead, read line by line, process, and write to a new file.
Example with JSON Lines (JSONL) format:
```python
import json

with open("big.jsonl", "r") as infile, open("new.jsonl", "w") as outfile:
    for line in infile:
        obj = json.loads(line)
        # modify the object as needed
        obj["status"] = "processed"
        outfile.write(json.dumps(obj) + "\n")
```
This approach works because each line is an independent JSON object.
## 2. Convert to JSONL (Line-Delimited JSON)
If your data is one giant array of objects, consider converting it to JSONL format:
```
{"id": 1, "value": "A"}
{"id": 2, "value": "B"}
{"id": 3, "value": "C"}
```
### Why JSONL?
- Easy to append new records.
- Updates can be done line by line.
- Tools like `jq`, pandas, and Spark handle JSONL natively.
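If the original file is one big array and fits in memory at least once, a one-time conversion can be sketched with the standard library alone (the function name here is illustrative; for files too large for even a single `json.load()`, a streaming parser such as the third-party `ijson` library can yield array items one at a time instead):

```python
import json

def array_json_to_jsonl(src_path, dst_path):
    """One-time conversion: array-of-objects JSON -> JSONL."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # json.load reads the whole array once; after this conversion,
        # all future reads and updates can stream line by line.
        for obj in json.load(src):
            dst.write(json.dumps(obj) + "\n")
```

After this runs once, every later pass over the data can use the line-by-line pattern from section 1.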
## 3. Use a Database Instead of JSON
If you’re frequently updating or querying, JSON is the wrong tool. Databases are optimized for this.
- SQLite for lightweight, file-based storage.
- PostgreSQL / MongoDB for scalable solutions.
Load your JSON once, then perform updates via SQL or queries. You’ll save time and reduce risk of corruption.
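As a minimal sketch with Python's built-in `sqlite3` module (the table and column names here are illustrative, assuming flat records with `id` and `value` fields):

```python
import json
import sqlite3

def load_jsonl_into_sqlite(jsonl_path, db_path):
    """Load JSONL records into SQLite once; update via SQL afterwards."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, value TEXT)"
    )
    with open(jsonl_path) as f:
        # Stream the file: one parsed record at a time, never the whole file.
        rows = (json.loads(line) for line in f)
        conn.executemany(
            "INSERT OR REPLACE INTO records (id, value) VALUES (:id, :value)",
            rows,
        )
    conn.commit()
    return conn
```

From then on, a random update is a single indexed statement, e.g. `conn.execute("UPDATE records SET value = ? WHERE id = ?", ("processed", 42))`, with no file rewrite involved.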
👉 Don’t want to set up a database?
You can also convert the file into CSV if your data is mostly tabular. CSV is:
- Faster to parse than JSON.
- Easy to append new rows.
- Compatible with Excel, pandas, and most data tools.
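A conversion can be sketched with the standard library's `csv` module, assuming every record shares the same flat set of keys (the function name is illustrative):

```python
import csv
import json

def jsonl_to_csv(jsonl_path, csv_path):
    """Flatten a JSONL file into CSV, streaming one record at a time."""
    with open(jsonl_path) as src, open(csv_path, "w", newline="") as dst:
        writer = None
        for line in src:
            obj = json.loads(line)
            if writer is None:
                # Take the column order from the first record's keys.
                writer = csv.DictWriter(dst, fieldnames=list(obj))
                writer.writeheader()
            writer.writerow(obj)
```

Note that this sketch assumes flat records; nested objects or arrays would need to be flattened or serialized into a column first.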
## 4. In-Place Updates (Not Recommended)
If you absolutely must write inside the file without rewriting:
- Only possible if the new data has the exact same length as the old data.
- Otherwise, the JSON structure will break.
Example:
```python
with open("big.json", "r+b") as f:
    f.seek(1024)       # jump to a byte offset
    f.write(b"12345")  # replacement must have the same byte length
```
This is brittle and not recommended for large files.
## Best Practices
- ✅ If you only need additions → switch to JSONL and append.
- ✅ If you need batch updates → stream read and rewrite.
- ✅ If you need frequent random updates → use a database.
- ❌ Don’t rely on seek + write for JSON unless data length is fixed.
## Final Thoughts
JSON is great for portability, but not for frequent updates at scale. If your file is growing into the millions of lines, treat it like a dataset, not just a file.
Use JSONL or a database to save yourself from painful crashes and corrupt files.
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Feedback and contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: **HexmosTech/git-lrc** – Free, Unlimited AI Code Reviews That Run on Commit