Syed Muhammad Haris
How We Saved 7.6GB and Fixed 99.95% Database Bloat in Our Django Logistics Platform

A Weekend Crisis That Turned Into a Masterclass in Database Optimization

Keywords: Django Simple History, PostgreSQL optimization, database bloat, Django performance tuning, production debugging, ORM optimization, VACUUM FULL, historical records cleanup, freight management system


The Saturday Morning That Changed Everything

It was a regular Saturday morning when our production freight management platform suddenly crashed. No warnings. No obvious errors. Just... down.

Our DigitalOcean managed PostgreSQL database was struggling: CPU pinned at 99.9%, queries timing out, and our logistics operations grinding to a halt. Shipments worth millions were in limbo, and we had no idea why.

This is the story of how we diagnosed and fixed one of the most insidious database problems in Django applications: 99.95% duplicate historical records causing massive database bloat.


The Initial Symptoms

Database Stats That Made Us Panic:

  • Database size: 10.2 GB (and growing)

  • CPU usage: Constant 99.9%

  • Load average: 7.37 (on a 2-core instance)

  • Query timeouts: Every few seconds

  • Application status: Completely down

We knew something was catastrophically wrong, but what?


The Investigation Begins

Step 1: Identifying the Culprit

Running a simple query to check table sizes revealed the smoking gun:

SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS index_and_toast_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

The Results Were Shocking:

Table                             Total Size   Records
historical_shipment_containers    3.7 GB       20.7M
historical_shipment_events        2.2 GB       11.2M
historical_tracking_data          1.7 GB       6.1M
historical_customs_records        763 MB       2.8M


Total: 38 million historical records consuming around 8GB!

Our actual shipment data? About 1,500 shipments. Something was very, very wrong.


The Root Cause: Django Simple History Gone Wild

We use django-simple-history to track changes to shipment data - essential for freight operations where audit trails are critical. But our implementation had a fatal flaw.

What Went Wrong

Our system syncs shipment data from another service every 20 minutes via Celery background tasks. For each container, we were calling:

# The problematic code
container, created = ShipmentContainer.objects.update_or_create(
    shipment=shipment,
    container_number=container_number,
    defaults={
        'container_type': container_type,
        'gate_out_date': gate_out_date,
        'empty_return_date': empty_return_date,
    }
)

The Hidden Problem:

update_or_create() calls .save() EVERY TIME, even when data hasn't changed!

With django-simple-history installed, every .save() creates a new historical record.
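
To make that concrete, here is a minimal sketch (illustrative model and field names, not our real schema) of how a simple-history model accumulates rows on every save:

from django.db import models
from simple_history.models import HistoricalRecords

class Container(models.Model):
    number = models.CharField(max_length=20)
    history = HistoricalRecords()  # adds a parallel historical table for this model

# Every save() appends a row to the history table, changed or not:
c = Container.objects.create(number="MSKU1234567")  # history row 1 (created)
c.save()  # history row 2 - identical data
c.save()  # history row 3 - identical data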

The Math:

  • 150 containers being tracked

  • Synced every 20 minutes

  • 72 syncs per day

  • 10,800 duplicate history records PER DAY

  • Running for months...

Result: 6.8 million duplicate container history records!


The Failed First Attempt

Django Simple History provides a built-in cleanup command:

python manage.py clean_duplicate_history --auto

We tried it. And waited. And waited.

Removed 2432 historical records for ShipmentContainer
Removed 16741 historical records for ShipmentContainer
Removed 7877 historical records for ShipmentContainer
...
(1 hour 43 minutes later, still running)

Locally, this would take days. In production? Unacceptable.

The command processes records one-by-one, which is fine for thousands of duplicates. For millions? We needed a different approach.


The SQL Solution: PostgreSQL Window Functions to the Rescue

We needed something faster. Much faster. Enter: PostgreSQL window functions with batched deletion.

The Strategy

Instead of processing records one-by-one, we used SQL to identify and delete duplicates in batches:

-- Identify duplicates using window functions
WITH ranked_history AS (
    SELECT
        history_id,
        ROW_NUMBER() OVER (
            PARTITION BY shipment_id, container_number, container_type,
                         gate_out_date, empty_return_date
            ORDER BY history_date DESC
        ) as rn
    FROM historical_shipment_containers
)
DELETE FROM historical_shipment_containers
WHERE history_id IN (
    SELECT history_id FROM ranked_history WHERE rn > 1
);

This query:

  1. Groups identical records by all tracked fields

  2. Numbers them chronologically (newest = 1)

  3. Deletes everything except the newest record

Time to delete 6.8M records: ~2 hours (vs. days with the Django command)
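
On a live database you may prefer shorter transactions than one giant DELETE. Here is a hedged sketch of a batched variant, run from a Django shell or a one-off management command - the batch size, helper name, and the loop-until-empty approach are our own choices, not a django-simple-history feature:

from django.db import connection

BATCHED_DELETE_SQL = """
WITH ranked_history AS (
    SELECT history_id,
           ROW_NUMBER() OVER (
               PARTITION BY shipment_id, container_number, container_type,
                            gate_out_date, empty_return_date
               ORDER BY history_date DESC
           ) AS rn
    FROM historical_shipment_containers
)
DELETE FROM historical_shipment_containers
WHERE history_id IN (
    SELECT history_id FROM ranked_history WHERE rn > 1 LIMIT 50000
);
"""

def delete_duplicate_history_in_batches():
    """Delete duplicate history rows 50k at a time until none remain."""
    total_deleted = 0
    while True:
        with connection.cursor() as cursor:
            cursor.execute(BATCHED_DELETE_SQL)
            deleted = cursor.rowcount  # rows removed in this batch
        total_deleted += deleted
        if deleted == 0:
            break
    return total_deleted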

How Window Functions Work

For those unfamiliar, ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) is like saying:

"Group records by these fields, then number each record within its group by date, keeping the newest as #1"

This lets us delete all records numbered 2, 3, 4, etc. (the duplicates) while keeping #1 (the most recent).
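
For a concrete picture, three history rows with identical tracked values for one container would be ranked like this (illustrative values only):

history_id   container_number   history_date           rn
9003         MSKU1234567        2025-08-22 10:40:00    1   ← kept
9002         MSKU1234567        2025-08-22 10:20:00    2   ← deleted
9001         MSKU1234567        2025-08-22 10:00:00    3   ← deleted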


The Space Reclamation: VACUUM FULL

After deleting millions of rows, we discovered another gotcha: PostgreSQL doesn't automatically reclaim disk space!

The deleted rows were marked as "dead tuples" but still occupying disk space.
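
You can see those dead tuples in PostgreSQL's statistics collector before deciding whether VACUUM FULL is worth the lock. A minimal sketch using Django's database connection (pg_stat_user_tables is standard PostgreSQL; the helper name is ours):

from django.db import connection

def dead_tuple_report():
    """Show live vs. dead tuple counts for the historical tables."""
    with connection.cursor() as cursor:
        cursor.execute("""
            SELECT relname, n_live_tup, n_dead_tup
            FROM pg_stat_user_tables
            WHERE relname LIKE %s
            ORDER BY n_dead_tup DESC;
        """, ["%historical%"])
        for relname, live, dead in cursor.fetchall():
            print(f"{relname}: {live:,} live / {dead:,} dead tuples")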

The Solution: VACUUM FULL

VACUUM FULL historical_tracking_data;
VACUUM FULL historical_shipment_containers;
VACUUM FULL historical_shipment_events;

⚠️ Warning: VACUUM FULL locks the table during operation. We put the application in maintenance mode for 15 minutes.

The Results:

Table         Before    After     Space Saved
Milestone     1.7 GB    13 MB     99.94%
Containers    3.7 GB    1.6 MB    99.97%
Events        2.2 GB    13 MB     99.41%


Total Saved: 7.6 GB


Preventing Future Bloat: The Real Fix

Cleaning the database was only half the battle. We needed to prevent this from happening again.

Problem 1: Unnecessary Saves in update_or_create()

Before:

# This saves EVERY TIME, even if nothing changed!
container, created = ShipmentContainer.objects.update_or_create(
    shipment=shipment,
    container_number=container_number,
    defaults={
        'container_type': container_type,
        'gate_out_date': gate_out_date,
    }
)

After:

def smart_update_container(shipment, container_data):
    """
    Only save if fields have actually changed.
    Prevents unnecessary history records.
    """
    try:
        container = ShipmentContainer.objects.get(
            shipment=shipment,
            container_number=container_data['number']
        )
    except ShipmentContainer.DoesNotExist:
        # First time we see this container - creating it legitimately
        # produces exactly one history record
        return ShipmentContainer.objects.create(
            shipment=shipment,
            container_number=container_data['number'],
            container_type=container_data['type'],
        )

    # Check if ANY field has actually changed
    needs_update = False

    if container.container_type != container_data['type']:
        container.container_type = container_data['type']
        needs_update = True

    # (repeat the same check for gate_out_date, empty_return_date, ...)

    # ONLY save if something actually changed
    if needs_update:
        container.save()

    return container

Impact: Reduced saves by 99%! API returns the same data 99% of the time, so only 1% of syncs actually need to save.
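
The same pattern generalizes. Here is a hedged sketch of a reusable helper (our own naming, not part of Django or simple-history) that assigns incoming values and saves only when something actually differs:

def save_if_changed(instance, **new_values):
    """Assign new_values to instance and save only if something differs.

    Returns True if a save happened, False otherwise.
    """
    changed_fields = []
    for field, value in new_values.items():
        if getattr(instance, field) != value:
            setattr(instance, field, value)
            changed_fields.append(field)

    if changed_fields:
        # update_fields keeps the UPDATE narrow; simple-history still
        # snapshots the row because save() fires
        instance.save(update_fields=changed_fields)
        return True
    return False

# Usage:
# save_if_changed(container, container_type=data['type'], gate_out_date=gate_out)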


Problem 2: Type Mismatches Causing False Changes

We discovered the tracking API returns status codes as integers (50), but our database stores them as strings ("50").

# API returns: {"status_code": 50}
# Database has: "50"

# This comparison ALWAYS reports a change!
if tracking.status_code != api_data.get("status_code"):  # 50 != "50" → True!
    tracking.status_code = api_data.get("status_code")
    tracking.save()  # Unnecessary save every time!

Fix:

# Always convert to the same type before comparing
new_status = str(api_data.get("status_code")) if api_data.get("status_code") is not None else None

if tracking.status_code != new_status:
    tracking.status_code = new_status
    needs_update = True

Lesson: API integrations often return different types than your database stores. Always normalize types before comparison!
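
A tiny helper (hypothetical name) keeps that coercion in one place instead of scattering str() calls across every sync task:

def normalize_status_code(value):
    """Coerce an API status code (int, str, or None) to the string form the DB stores."""
    return str(value) if value is not None else None

# normalize_status_code(50)    -> "50"
# normalize_status_code("50")  -> "50"
# normalize_status_code(None)  -> None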


Problem 3: Timezone-Aware vs Timezone-Naive Datetime Comparison

Our date parsing function was returning timezone-naive datetimes, but PostgreSQL stores them as timezone-aware with Django's USE_TZ=True.

# API returns: "2025-08-22"
# Our parser created: datetime(2025, 8, 22, 0, 0, 0)  # Naive!
# Database stores: datetime(2025, 8, 22, 0, 0, 0, tzinfo=UTC)  # Aware!

# These are NEVER equal, triggering saves every sync!

This caused Django to log warnings:

RuntimeWarning: DateTimeField received a naive datetime while time zone support is active.

Fix:

from datetime import datetime

from dateutil import parser

def parse_api_date(date_value):
    """Parse date from API and return timezone-aware datetime."""
    from django.utils import timezone as tz

    if isinstance(date_value, dict) and date_value.get("Date"):
        date_str = date_value["Date"]
    elif isinstance(date_value, str):
        date_str = date_value
    else:
        return None

    try:
        # Try ISO format first
        dt = parser.isoparse(date_str)
        if dt.tzinfo is None:
            dt = tz.make_aware(dt, tz.get_default_timezone())
        return dt
    except ValueError:
        # Fall back to simple date format
        try:
            date_obj = datetime.strptime(date_str, "%Y-%m-%d")
            # Convert to timezone-aware datetime
            return tz.make_aware(date_obj, tz.get_default_timezone())
        except ValueError:
            return None

Impact: Eliminated all timezone-related false changes and warnings!
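
For example (assuming the parser above is importable from wherever your sync utilities live):

# All of these now come back timezone-aware (or None):
parse_api_date("2025-08-22T14:30:00Z")    # aware, tz taken from the ISO offset
parse_api_date("2025-08-22")              # aware, midnight in the default timezone
parse_api_date({"Date": "2025-08-22"})    # same, unwrapped from the API envelope
parse_api_date(None)                      # None - no false "change" on empty fields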


Problem 4: Latitude/Longitude Causing Perpetual History Bloat

For shipment tracking, we store GPS coordinates showing where vessels are. These coordinates change constantly as ships move across the ocean - sometimes every few minutes!

We were tracking every single position change in history, which was pointless. We don't need an audit trail of "vessel moved 0.5 nautical miles" thousands of times.

Solution: Exclude frequently-changing, non-critical fields from history tracking.

class ShipmentEvent(models.Model):
    # ... fields ...
    latitude = models.DecimalField(max_digits=30, decimal_places=15, ...)
    longitude = models.DecimalField(max_digits=30, decimal_places=15, ...)

    # Exclude lat/long from history - they change too frequently!
    # We still store current position, just don't track every change
    history = HistoricalRecords(excluded_fields=['latitude', 'longitude'])

Impact: Prevents thousands of unnecessary history records as vessels sail!


The Final Results

Before

  • Database size: 10.2 GB

  • Historical records: 28 million

  • Duplicate records: 24.2 million (99.95%)

  • CPU usage: 99.9% constant

  • Application status: Down

  • Sync performance: Timing out

  • Query response time: 2-5 seconds

After

  • Database size: 1.6 GB (saved 8.6 GB)

  • Historical records: 1.8 million

  • Duplicate records: 0

  • CPU usage: <10% average

  • Application status: Stable

  • Sync performance: Sub-second

  • Query response time: 50-200ms

10x performance improvement across the board!


Key Lessons Learned

1. Monitor Your Historical Tables

Historical tables can silently grow to dwarf your actual data. Set up monitoring:

-- Weekly table size check
SELECT
    relname AS tablename,
    pg_size_pretty(pg_total_relation_size(relid)) AS size,
    n_live_tup AS approx_row_count
FROM pg_stat_user_tables
WHERE schemaname = 'public' AND relname LIKE '%historical%'
ORDER BY pg_total_relation_size(relid) DESC;

Add this to your weekly ops checklist or set up automated alerts when historical tables exceed expected sizes.
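
If you already run Celery beat, an automated alert can be as small as this hedged sketch - the task name and the ALERT_THRESHOLD_BYTES value are our own assumptions, so adjust both to your data:

from celery import shared_task
from django.db import connection

ALERT_THRESHOLD_BYTES = 500 * 1024 * 1024  # warn when a historical table passes ~500 MB

@shared_task
def check_historical_table_sizes():
    """Weekly check: flag historical tables that have grown past the threshold."""
    with connection.cursor() as cursor:
        cursor.execute("""
            SELECT tablename,
                   pg_total_relation_size(schemaname||'.'||tablename) AS total_bytes
            FROM pg_tables
            WHERE schemaname = 'public' AND tablename LIKE %s
        """, ["%historical%"])
        oversized = [(name, size) for name, size in cursor.fetchall()
                     if size > ALERT_THRESHOLD_BYTES]

    for name, size in oversized:
        # Plug in whatever alerting you use (email, Slack, PagerDuty, ...)
        print(f"WARNING: {name} is {size / 1024**2:.0f} MB - check for history bloat")

    return oversized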

2. Never Blindly Use update_or_create() with History Tracking

Always check if fields have actually changed before saving:

# Bad - saves every time
obj, created = Model.objects.update_or_create(pk=1, defaults={'field': value})

# Good - only saves when changed
obj = Model.objects.get(pk=1)
if obj.field != new_value:
    obj.field = new_value
    obj.save()

3. Type Consistency Matters in API Integrations

API integrations often return different types than your database stores. Always normalize:

# Convert types before comparison
api_value = str(api_response.get("field"))  # Ensure string
db_value = str(obj.field)  # Ensure string

if api_value != db_value:
    obj.field = api_value
    obj.save()

4. Timezone Awareness is Critical

With USE_TZ=True in Django settings, always return timezone-aware datetimes from parsers:

from datetime import datetime

from django.utils import timezone as tz

dt = datetime.strptime(date_str, "%Y-%m-%d")
dt = tz.make_aware(dt, tz.get_default_timezone())  # Make aware!

5. Not Everything Needs History

Exclude frequently-changing non-critical fields from history:

# Good candidates for exclusion:
# - updated_at timestamps
# - last_sync timestamps
# - GPS coordinates
# - view counts
# - Any high-frequency changing data

history = HistoricalRecords(
    excluded_fields=['updated_at', 'last_sync', 'latitude', 'longitude']
)

6. PostgreSQL Window Functions Are Powerful

For bulk operations on millions of rows, SQL outperforms ORM by orders of magnitude.

Django ORM is great for application logic, but for data cleanup, bulk updates, and analytics, raw SQL with window functions is often 100x+ faster.

7. VACUUM FULL After Large Deletions

Don't forget to reclaim disk space after deleting millions of rows!

-- Regular VACUUM (non-blocking, safe for production)
VACUUM ANALYZE table_name;

-- VACUUM FULL (blocking, use in maintenance window)
VACUUM FULL table_name;

Regular VACUUM is safe to run anytime. VACUUM FULL requires downtime but reclaims more space.


Tools and Technologies Used

  • Django 5.0 with django-simple-history

  • PostgreSQL 15 (DigitalOcean Managed Database)

  • Celery for background task processing

  • Redis as Celery broker

  • pgAdmin 4 for database management

  • DigitalOcean infrastructure and monitoring

  • Python 3.12


Performance Impact

Before the Fix:

  • Database queries: 2-5 seconds average

  • Background sync tasks: 45-60 seconds

  • Admin page loads: 8-12 seconds

  • API response times: 1-3 seconds

  • CPU: Constant 99.9%

  • Database connections: Maxed out

After the Fix:

  • Database queries: 50-200ms average

  • Background sync tasks: 1-2 seconds

  • Admin page loads: 500ms-1s

  • API response times: 100-300ms

  • CPU: 5-10% average

  • Database connections: Healthy pool

10x performance improvement across the board!


Monitoring and Maintenance

We now run weekly checks to catch any regression early:

-- Check for duplicate accumulation
WITH ranked AS (
    SELECT
        COUNT(*) as total,
        COUNT(DISTINCT (shipment_id, container_number, container_type, gate_out_date)) as unique_records
    FROM historical_shipment_containers
)
SELECT
    total,
    unique_records,
    (total - unique_records) as duplicates,
    ROUND(100.0 * (total - unique_records) / total, 2) as duplicate_percentage
FROM ranked;

Healthy Result:

total: 13,812
unique_records: 13,802
duplicates: 10
duplicate_percentage: 0.07%

A small number of duplicates (< 1%) is normal due to legitimate updates. If this crosses 5%, we investigate.
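
This check is easy to automate as well. A minimal sketch (our own helper name) that runs the query above and returns the percentage, so a scheduled job can compare it against the 5% threshold:

from django.db import connection

DUPLICATE_ALERT_THRESHOLD = 5.0  # percent

def historical_duplicate_percentage():
    """Return the duplicate percentage for the container history table."""
    with connection.cursor() as cursor:
        cursor.execute("""
            WITH ranked AS (
                SELECT COUNT(*) AS total,
                       COUNT(DISTINCT (shipment_id, container_number,
                                       container_type, gate_out_date)) AS unique_records
                FROM historical_shipment_containers
            )
            SELECT ROUND(100.0 * (total - unique_records) / NULLIF(total, 0), 2)
            FROM ranked;
        """)
        (percentage,) = cursor.fetchone()
    return float(percentage or 0)

# e.g. in a weekly scheduled task:
# if historical_duplicate_percentage() > DUPLICATE_ALERT_THRESHOLD:
#     notify_team(...)  # hypothetical alert hook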


The Human Element

This incident happened on a Saturday morning. Our team:

  • Immediately put the platform in maintenance mode

  • Communicated transparently with stakeholders

  • Debugged collaboratively (huge props to the team!)

  • Implemented fixes incrementally

  • Tested thoroughly before reopening

  • Documented everything for future reference

Time to resolution: 14 hours (Saturday 7 AM → Saturday 9 PM)
Total downtime: 2 hours (maintenance mode during VACUUM operations)

We follow agile practices, which helped us:

  • Prioritize fixes (clean data first, prevent recurrence second)

  • Test in small increments

  • Roll back quickly if needed

  • Collaborate effectively under pressure


Conclusion

What started as a catastrophic Saturday morning crash turned into a masterclass in:

  • Database optimization

  • Django ORM performance tuning

  • PostgreSQL internals

  • Production debugging under pressure

  • Collaborative problem-solving

Key Takeaway: The most dangerous bugs are the ones that accumulate silently over months. Regular monitoring, performance testing, and database audits aren't optional - they're essential.

Our freight management platform is now faster, leaner, and ready to scale. And we learned valuable lessons that will inform our engineering practices for years to come.

If you're using django-simple-history (or any audit logging system), review your update patterns today. You might be sitting on a ticking time bomb!




About the Author: Software Engineer at Maalbardaar, building scalable freight management solutions. Passionate about database optimization, Django performance, and solving real-world logistics challenges.

Tech Stack: Django, PostgreSQL, Celery, Docker, DigitalOcean, React

Follow for more: Production war stories, Django tips, and database optimization techniques.


Have you faced similar database bloat issues? How did you solve them? Share your story in the comments!

#Django #PostgreSQL #DatabaseOptimization #SoftwareEngineering #ProductionDebugging #PerformanceTuning #FreightTech #LogisticsTechnology #WebDevelopment #BackendEngineering #Python #DevOps
