Hello, I'm Maneshwar. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback to help improve the product.
# Handling Large JSON Files in Python: Efficient Read, Write, and Update Strategies

Working with JSON is common in data engineering, logging, and APIs.
But what happens when your JSON file isn't a neat 2 KB config, but a monster with 14 lakh (1.4 million) lines?
If you try to load it in Python with `json.load()`, you'll likely run into memory errors. If you attempt a direct seek-and-write, you risk corrupting the structure.
Large JSON files demand a different strategy.
Let’s explore the best ways to handle massive JSON files in Python.
## Why Large JSON Files Are Tricky
- JSON is not append-friendly – It’s usually one giant object/array. Changing a single element can shift the rest of the file.
- Memory consumption – Parsing the entire file at once may exceed system memory.
- In-place edits are fragile – Unless your update is the same length in bytes, you’ll break the JSON structure.
## 1. Use Streaming Read + Rewrite
If you need to modify values in a large JSON file, don’t load everything into memory. Instead, read line by line, process, and write to a new file.
Example with JSON Lines (JSONL) format:
```python
import json

with open("big.jsonl", "r") as infile, open("new.jsonl", "w") as outfile:
    for line in infile:
        obj = json.loads(line)
        # modify the object as needed
        obj["status"] = "processed"
        outfile.write(json.dumps(obj) + "\n")
```
This approach works because each line is an independent JSON object.
## 2. Convert to JSONL (Line-Delimited JSON)
If your data is one giant array of objects, consider converting it to JSONL format:
```
{"id": 1, "value": "A"}
{"id": 2, "value": "B"}
{"id": 3, "value": "C"}
```
### Why JSONL?
- Easy to append new records.
- Updates can be done line by line.
- Tools like `jq`, pandas, and Spark handle JSONL natively.
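If the original file is one big array and fits in memory at least once, a one-time conversion can be sketched with the standard library alone (the function name here is illustrative; for files too large for even a single `json.load()`, a streaming parser such as the third-party `ijson` library can yield array items one at a time instead):

```python
import json

def array_json_to_jsonl(src_path, dst_path):
    """One-time conversion: array-of-objects JSON -> JSONL."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # json.load reads the whole array once; after this conversion,
        # all future reads and updates can stream line by line.
        for obj in json.load(src):
            dst.write(json.dumps(obj) + "\n")
```

After this runs once, every later pass over the data can use the line-by-line pattern from section 1.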
## 3. Use a Database Instead of JSON
If you’re frequently updating or querying, JSON is the wrong tool. Databases are optimized for this.
- SQLite for lightweight, file-based storage.
- PostgreSQL / MongoDB for scalable solutions.
Load your JSON once, then perform updates via SQL or queries. You’ll save time and reduce risk of corruption.
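As a minimal sketch with Python's built-in `sqlite3` module (the table and column names here are illustrative, assuming flat records with `id` and `value` fields):

```python
import json
import sqlite3

def load_jsonl_into_sqlite(jsonl_path, db_path):
    """Load JSONL records into SQLite once; update via SQL afterwards."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, value TEXT)"
    )
    with open(jsonl_path) as f:
        # Stream the file: one parsed record at a time, never the whole file.
        rows = (json.loads(line) for line in f)
        conn.executemany(
            "INSERT OR REPLACE INTO records (id, value) VALUES (:id, :value)",
            rows,
        )
    conn.commit()
    return conn
```

From then on, a random update is a single indexed statement, e.g. `conn.execute("UPDATE records SET value = ? WHERE id = ?", ("processed", 42))`, with no file rewrite involved.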
👉 Don’t want to set up a database?
You can also convert the file into CSV if your data is mostly tabular. CSV is:
- Faster to parse than JSON.
- Easy to append new rows.
- Compatible with Excel, pandas, and most data tools.
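A conversion can be sketched with the standard library's `csv` module, assuming every record shares the same flat set of keys (the function name is illustrative):

```python
import csv
import json

def jsonl_to_csv(jsonl_path, csv_path):
    """Flatten a JSONL file into CSV, streaming one record at a time."""
    with open(jsonl_path) as src, open(csv_path, "w", newline="") as dst:
        writer = None
        for line in src:
            obj = json.loads(line)
            if writer is None:
                # Take the column order from the first record's keys.
                writer = csv.DictWriter(dst, fieldnames=list(obj))
                writer.writeheader()
            writer.writerow(obj)
```

Note that this sketch assumes flat records; nested objects or arrays would need to be flattened or serialized into a column first.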
## 4. In-Place Updates (Not Recommended)
If you absolutely must write inside the file without rewriting:
- Only possible if the new data has the exact same length as the old data.
- Otherwise, the JSON structure will break.
Example:
```python
with open("big.json", "r+b") as f:
    f.seek(1024)       # jump to a byte offset
    f.write(b"12345")  # replacement must have the same byte length
```
This is brittle and not recommended for large files.
## Best Practices
- ✅ If you only need additions → switch to JSONL and append.
- ✅ If you need batch updates → stream read and rewrite.
- ✅ If you need frequent random updates → use a database.
- ❌ Don’t rely on seek + write for JSON unless data length is fixed.
## Final Thoughts
JSON is great for portability, but not for frequent updates at scale. If your file is growing into the millions of lines, treat it like a dataset, not just a file.
Use JSONL or a database to save yourself from painful crashes and corrupt files.
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Feedback and contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: **HexmosTech/git-lrc** – Free, Unlimited AI Code Reviews That Run on Commit