Max Klein

How to Store Scraped Data: CSV vs JSON vs Database

Web scraping is a powerful tool for extracting valuable information from the web, but the real challenge lies in what happens after the data is collected. How you store scraped data determines how easy it is to analyze, query, and reuse that information. In this tutorial, we'll walk through three popular storage options—CSV, JSON, and databases—and help you choose the best approach for your project.

Why Storage Matters

Storing scraped data is more than just saving files to your computer. The right storage method can:

  • Improve data integrity and reusability
  • Enable efficient querying and scalability
  • Reduce duplicate data and processing overhead

Let's break down the three options and see how they stack up.

Saving Scraped Data to CSV

CSV is the simplest format. Here's how to save scraped book data:

import csv

books = [
    {"title": "Python Crash Course", "author": "Eric Matthes", "price": "$29.99"},
    {"title": "Automate the Boring Stuff", "author": "Al Sweigart", "price": "$19.99"},
    {"title": "Clean Code", "author": "Robert C. Martin", "price": "$39.99"},
]

with open("books.csv", "w", newline="", encoding="utf-8") as csvfile:
    fieldnames = ["title", "author", "price"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for book in books:
        writer.writerow(book)

Tip: Always use newline="" when writing CSV files in Python to avoid extra blank lines.

When to use CSV: Simple, flat data with consistent columns. Quick exports for spreadsheets.
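If your scraper runs on a schedule, you usually want to add new rows to an existing CSV instead of overwriting it each time. A minimal sketch of that pattern, assuming a helper called append_books (a name chosen here for illustration, not from any library):

```python
import csv
import os

def append_books(path, books):
    """Append rows to a CSV file, writing the header only when the file is new."""
    fieldnames = ["title", "author", "price"]
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(books)

# Each scraper run can call this with just the newly scraped batch
append_books("books.csv", [
    {"title": "Fluent Python", "author": "Luciano Ramalho", "price": "$49.99"},
])
```

Opening the file in "a" (append) mode keeps earlier runs intact; the os.path.exists check prevents a duplicate header row on subsequent runs.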

Saving Scraped Data to JSON

JSON handles nested structures much better:

import json

books = [
    {
        "title": "Python Crash Course",
        "author": "Eric Matthes",
        "price": "$29.99",
        "categories": ["Programming", "Education"]
    },
    {
        "title": "Automate the Boring Stuff",
        "author": "Al Sweigart",
        "price": "$19.99",
        "categories": ["Automation", "Scripting"]
    }
]

with open("books.json", "w", encoding="utf-8") as jsonfile:
    json.dump(books, jsonfile, indent=4)

When to use JSON: Nested or hierarchical data, API responses, config files.
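One caveat with a single JSON array: you can't append to it without rewriting the whole file. For scrapers that emit records continuously, a common workaround is JSON Lines (one JSON object per line), sketched here as one possible approach:

```python
import json

books = [
    {"title": "Python Crash Course", "author": "Eric Matthes"},
    {"title": "Automate the Boring Stuff", "author": "Al Sweigart"},
]

# Write one JSON object per line -- new records can be appended later
# without touching what's already on disk
with open("books.jsonl", "w", encoding="utf-8") as f:
    for book in books:
        f.write(json.dumps(book) + "\n")

# Read it back line by line instead of loading one giant array
with open("books.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

Each line is still valid JSON, so nested structures work exactly as before; you just lose the enclosing array.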

Saving Scraped Data to a Database

For large-scale projects, databases are the way to go. Here's an example using SQLite, which ships with Python's standard library, so there's nothing extra to install:

import sqlite3

conn = sqlite3.connect("books.db")
cursor = conn.cursor()

cursor.execute("""
CREATE TABLE IF NOT EXISTS books (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT,
    author TEXT,
    price TEXT
)
""")

books = [
    ("Python Crash Course", "Eric Matthes", "$29.99"),
    ("Automate the Boring Stuff", "Al Sweigart", "$19.99"),
]

cursor.executemany("INSERT INTO books (title, author, price) VALUES (?, ?, ?)", books)
conn.commit()
conn.close()

When to use databases: Large datasets, need for querying, deduplication, multi-user access.
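Deduplication deserves a quick sketch, since re-running a scraper often fetches the same items again. One way to handle it in SQLite is a UNIQUE constraint combined with INSERT OR IGNORE (the column choice for uniqueness here is an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB for the demo; use a file path in practice
cursor = conn.cursor()

# UNIQUE(title, author) tells SQLite to reject rows that repeat that pair
cursor.execute("""
CREATE TABLE IF NOT EXISTS books (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT,
    author TEXT,
    price TEXT,
    UNIQUE(title, author)
)
""")

books = [
    ("Python Crash Course", "Eric Matthes", "$29.99"),
    ("Python Crash Course", "Eric Matthes", "$29.99"),  # duplicate from a re-scrape
]

# INSERT OR IGNORE silently skips rows that violate the UNIQUE constraint
cursor.executemany(
    "INSERT OR IGNORE INTO books (title, author, price) VALUES (?, ?, ?)", books
)
conn.commit()

cursor.execute("SELECT COUNT(*) FROM books")
count = cursor.fetchone()[0]
```

With CSV or JSON you'd have to load everything and dedupe in Python; here the database enforces it at write time.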

Quick Comparison

  • CSV: Simple and universally supported, but no nesting and no querying
  • JSON: Flexible and handles nested data, but a large file must be parsed in full before you can use any of it
  • Database: Scalable and queryable, but requires more setup overhead

Conclusion

CSV, JSON, and databases each have their place. CSV for quick exports, JSON for structured data, databases for production workloads. Choose based on your project's scale and complexity.


Need professional web scraping with clean, structured data delivery? Check out N3X1S INTELLIGENCE on Fiverr — we handle scraping, cleaning, and delivery in any format you need.
