M.Amin Mashayekhan

Posted on • Originally published at Medium

Designing a Scalable Logging System for Web Scrapers: How to Prevent a Database Meltdown

Introduction

A web crawler is only as useful as its stability.

When a scraper sends a log every 7 seconds — and dozens or even hundreds of users are running it simultaneously — things can go wrong, fast.

How do you stop:

  • The database from exploding?
  • The system from slowing to a crawl?
  • The support team from drowning in unmanageable logs?

To prevent this, you need a logging architecture that scales.

This article walks through a real-world implementation of a logging system purpose-built for a scalable web scraper, focusing on performance, durability, and developer experience.


Two Hidden Enemies of Real-Time Logging Systems

1- Unbounded Data Growth

Without structure and filtering, logs quickly saturate the database and make analytics almost impossible.

2- Performance Degradation

Poorly designed logging directly impacts frontend responsiveness and backend throughput.


Frontend Logging Strategy (React): Store Less, Show More

In the React frontend, I chose to keep only the latest 50 logs in memory and display them in the UI.

const MAX_LOGS = 50;

// Prepend the newest log and keep only the most recent MAX_LOGS entries,
// so the DOM and memory footprint stay bounded.
const handleLog = useCallback((log: ScrapLog) => {
  setLogs((prev) => [log, ...prev.slice(0, MAX_LOGS - 1)]);
}, []);

Why this works:

  • Users see only fresh, relevant logs
  • DOM and memory stay lightweight
  • Logs can be exported as CSV for support teams

Backend Architecture: Separate the Signals from the Noise

1- Summary Logs

Each scraping session generates one summary record containing:

  • Number of list pages scraped (pagination)
  • Number of product items extracted
  • URLs of the first and last list pages scraped
  • First and last items extracted
  • Final status (success or failure)
  • Total execution time

Retention: Permanent
Use case: Dashboard analytics and long-term monitoring
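
For illustration, here is a minimal sketch of writing that summary record in Laravel. The ScrapSummary model, its field names, and the $session object are assumptions that mirror the list above, not the project's actual schema:

// Hypothetical sketch: one summary record per scraping session.
// ScrapSummary, its columns, and $session are illustrative only.
use App\Models\ScrapSummary;

ScrapSummary::create([
    'list_pages_scraped' => $session->listPagesScraped,
    'items_extracted'    => $session->itemsExtracted,
    'first_page_url'     => $session->firstPageUrl,
    'last_page_url'      => $session->lastPageUrl,
    'first_item'         => $session->firstItem,
    'last_item'          => $session->lastItem,
    'status'             => $session->succeeded ? 'success' : 'failure',
    'duration_seconds'   => $session->durationSeconds,
]);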


2- Fine-Grained Logs (Triggered by Unexpected Errors)

If the scraper encounters an unexpected error (i.e., a type of error not seen in the past 24 hours), the frontend sends the last 50 logs to the server. These include:

  • URLs of visited pages
  • Actions performed
  • Any captured errors

Format: Lightweight, structured JSON
Retention: 7 days (configurable)
Cleanup: Automatically via scheduled job
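
Here is a minimal sketch of how that server-side gate might look in Laravel. The controller, the error_type and payload columns, and the fillable assumption are illustrative, not the post's actual code:

// Hypothetical sketch: accept a fine-grained log batch only if this
// error type has not been seen in the past 24 hours.
namespace App\Http\Controllers;

use App\Models\ScrapLog;
use Illuminate\Http\Request;

class ScrapLogController extends Controller
{
    public function store(Request $request)
    {
        $errorType = $request->input('error_type');

        // Gate: has this error type been recorded in the last 24 hours?
        $alreadySeen = ScrapLog::where('error_type', $errorType)
            ->where('created_at', '>=', now()->subDay())
            ->exists();

        if (! $alreadySeen) {
            // Persist the batch of ~50 frontend logs as a JSON string.
            // Assumes error_type and payload are fillable on ScrapLog.
            ScrapLog::create([
                'error_type' => $errorType,
                'payload'    => json_encode($request->input('logs')),
            ]);
        }

        return response()->noContent();
    }
}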


Smart Cleanup in Laravel + MySQL

Efficient storage isn't enough; you must also clean up intelligently.

🔍 Why Indexing Matters

To speed up deletion of old logs, we index the created_at field. This drastically improves the performance of time-based queries.

CREATE INDEX idx_created_at ON scrap_logs (created_at);
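
If the schema is managed through migrations, the same index can be declared with Laravel's schema builder (a sketch, assuming a standard migration for the scrap_logs table):

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

Schema::table('scrap_logs', function (Blueprint $table) {
    // Same idx_created_at index, declared via the schema builder.
    $table->index('created_at', 'idx_created_at');
});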

🧹 Scheduled Cleanup Job in Laravel

// app/Console/Commands/DeleteOldLogs.php
namespace App\Console\Commands;

use App\Models\ScrapLog; // adjust to wherever the ScrapLog model lives
use Illuminate\Console\Command;

class DeleteOldLogs extends Command
{
    protected $signature = 'logs:cleanup';
    protected $description = 'Delete fine-grained logs older than the retention window';

    public function handle()
    {
        // Remove fine-grained logs past the 7-day retention window.
        ScrapLog::where('created_at', '<', now()->subDays(7))->delete();
        $this->info('Old logs cleaned successfully!');
    }
}
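
One caveat before wiring this into the scheduler: on a very large table, a single DELETE can hold locks for a long time. A common variation, sketched here as an assumption rather than taken from the original code, is to delete in bounded batches (MySQL supports DELETE ... LIMIT, which Laravel's query builder passes through):

// Delete in batches so each statement stays short and lock-friendly.
do {
    $deleted = ScrapLog::where('created_at', '<', now()->subDays(7))
        ->limit(1000)
        ->delete();
} while ($deleted > 0);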

And register it in the scheduler:

// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    // Purge expired fine-grained logs every night at 01:00.
    $schedule->command('logs:cleanup')->dailyAt('01:00');
}
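
Note that scheduled commands only fire if the standard Laravel cron entry, * * * * * php artisan schedule:run, is configured on the server.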

Why This Architecture Works

✅ Only essential logs are stored
✅ Tables stay clean and queryable
✅ Debugging becomes painless
✅ Data analysis remains performant


Conclusion

A well-designed logging system isn't just for debugging; it's a critical survival mechanism. With a scalable, performance-conscious architecture, your system can remain:

  • Stable under load
  • Transparent when things go wrong
  • Insightful for business and technical teams
  • User-friendly

Since implementing this system:

  • Debugging is fast
  • User behavior is easy to analyze
  • We retain full traceability when incidents occur

I hope these insights help you on your journey.


📣 Let's Talk

How do you handle logging in production systems?
Share your thoughts in the comments; I'd love to hear your approach. If this article helped, consider clapping 👏 and following for more insights on web development, browser automation, and software engineering.


📬 Get in Touch

Let's build better tools, faster!
