<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Syed Muhammad Haris</title>
    <description>The latest articles on DEV Community by Syed Muhammad Haris (@syedharis).</description>
    <link>https://dev.to/syedharis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1052894%2Fc0f6f9b7-58f9-4849-8f21-1f6503865ddd.png</url>
      <title>DEV Community: Syed Muhammad Haris</title>
      <link>https://dev.to/syedharis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/syedharis"/>
    <language>en</language>
    <item>
      <title>How We Saved 7.6GB and Fixed 99.95% Database Bloat in Our Django Logistics Platform</title>
      <dc:creator>Syed Muhammad Haris</dc:creator>
      <pubDate>Sun, 19 Oct 2025 11:32:11 +0000</pubDate>
      <link>https://dev.to/syedharis/how-we-saved-76gb-and-fixed-9995-database-bloat-in-our-django-logistics-platform-4e84</link>
      <guid>https://dev.to/syedharis/how-we-saved-76gb-and-fixed-9995-database-bloat-in-our-django-logistics-platform-4e84</guid>
      <description>&lt;p&gt;&lt;strong&gt;A Weekend Crisis That Turned Into a Masterclass in Database Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Keywords: Django Simple History, PostgreSQL optimization, database bloat, Django performance tuning, production debugging, ORM optimization, VACUUM FULL, historical records cleanup, freight management system&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Saturday Morning That Changed Everything
&lt;/h2&gt;

&lt;p&gt;It was a regular Saturday morning when our production freight management platform suddenly crashed. No warnings. No obvious errors. Just... down.&lt;/p&gt;

&lt;p&gt;Our DigitalOcean managed PostgreSQL database was struggling: CPU pinned at 99.9%, queries timing out, and our logistics operations grinding to a halt. Shipments worth millions were in limbo, and we had no idea why.&lt;/p&gt;

&lt;p&gt;This is the story of how we diagnosed and fixed one of the most insidious database problems in Django applications: &lt;strong&gt;99.95% duplicate historical records&lt;/strong&gt; causing massive database bloat.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Initial Symptoms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Database Stats That Made Us Panic:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Database size: &lt;strong&gt;10.2 GB&lt;/strong&gt; (and growing)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU usage: &lt;strong&gt;Constant 99.9%&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Load average: &lt;strong&gt;7.37&lt;/strong&gt; (on a 2-core instance)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query timeouts: Every few seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Application status: &lt;strong&gt;Completely down&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We knew something was catastrophically wrong, but &lt;em&gt;what&lt;/em&gt;?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Investigation Begins
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Identifying the Culprit
&lt;/h3&gt;

&lt;p&gt;Running a simple query to check table sizes revealed the smoking gun:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_total_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;table_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_total_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;size_bytes&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_tables&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;schemaname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'public'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;size_bytes&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Results Were Shocking:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Total Size&lt;/th&gt;
&lt;th&gt;Records&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;historical_shipment_containers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20.7M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;historical_shipment_events&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.2 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;11.2M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;historical_tracking_data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6.1M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;historical_customs_records&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;763 MB&lt;/td&gt;
&lt;td&gt;2.8M&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvw0purytvov53go6z0e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvw0purytvov53go6z0e.png" alt="Issue" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total: 38 million historical records consuming around 8GB!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our actual shipment data? About 1,500 shipments. Something was very, very wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Root Cause: Django Simple History Gone Wild
&lt;/h2&gt;

&lt;p&gt;We use &lt;code&gt;django-simple-history&lt;/code&gt; to track changes to shipment data, which is essential for freight operations where audit trails are critical. But our implementation had a fatal flaw.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Went Wrong
&lt;/h3&gt;

&lt;p&gt;Our system syncs shipment data from another service every &lt;strong&gt;20 minutes&lt;/strong&gt; via Celery background tasks. For each container, we were calling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The problematic code
&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ShipmentContainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_or_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;shipment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shipment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;container_number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;defaults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;container_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;container_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gate_out_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gate_out_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;empty_return_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;empty_return_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Hidden Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;update_or_create()&lt;/code&gt; calls &lt;code&gt;.save()&lt;/code&gt; &lt;strong&gt;EVERY TIME&lt;/strong&gt;, even when data hasn't changed!&lt;/p&gt;

&lt;p&gt;With django-simple-history installed, every &lt;code&gt;.save()&lt;/code&gt; creates a new historical record.&lt;/p&gt;
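&lt;p&gt;To see why this compounds, here's a toy, framework-free sketch of the mechanism (an illustration, not django-simple-history's actual implementation): an audit hook on &lt;code&gt;save()&lt;/code&gt; records a snapshot every time, whether or not anything changed.&lt;/p&gt;

```python
# Toy illustration: mimics an audit layer that hooks save() and records
# one snapshot per call, whether or not the data changed.
class TrackedContainer:
    history = []  # class-level list standing in for the historical_* table

    def __init__(self, number, container_type):
        self.number = number
        self.container_type = container_type

    def save(self):
        # A post-save hook can't tell whether anything changed; it just records.
        TrackedContainer.history.append((self.number, self.container_type))

c = TrackedContainer("MSKU1234567", "40HC")
for _ in range(72):  # one day of 20-minute syncs with identical data
    c.save()

print(len(TrackedContainer.history))  # 72 phantom history rows, zero real changes
```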

&lt;p&gt;&lt;strong&gt;The Math:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;150 containers being tracked&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Synced every 20 minutes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;72 syncs per day&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;10,800 duplicate history records PER DAY&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Running for months...&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result: 6.8 million duplicate container history records!&lt;/strong&gt;&lt;/p&gt;
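&lt;p&gt;The back-of-the-envelope math above is easy to verify:&lt;/p&gt;

```python
containers = 150                 # containers tracked per sync
syncs_per_day = 24 * 60 // 20    # one Celery sync every 20 minutes
rows_per_day = containers * syncs_per_day

print(syncs_per_day)      # 72
print(rows_per_day)       # 10800 duplicate history rows per day
print(rows_per_day * 30)  # 324000 per month, accumulating silently
```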




&lt;h2&gt;
  
  
  The Failed First Attempt
&lt;/h2&gt;

&lt;p&gt;Django Simple History provides a built-in cleanup command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python manage.py clean_duplicate_history &lt;span class="nt"&gt;--auto&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We tried it. And waited. And waited.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Removed 2432 historical records for ShipmentContainer
Removed 16741 historical records for ShipmentContainer
Removed 7877 historical records for ShipmentContainer
...
(1 hour 43 minutes later, still running)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Locally, this would take days. In production? Unacceptable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The command processes records one-by-one, which is fine for thousands of duplicates. For millions? We needed a different approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  The SQL Solution: PostgreSQL Window Functions to the Rescue
&lt;/h2&gt;

&lt;p&gt;We needed something faster. Much faster. Enter: &lt;strong&gt;PostgreSQL window functions with batched deletion&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Strategy
&lt;/h3&gt;

&lt;p&gt;Instead of processing records one-by-one, we used SQL to identify and delete duplicates in batches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Identify duplicates using window functions&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked_history&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;history_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;shipment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                         &lt;span class="n"&gt;gate_out_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;empty_return_date&lt;/span&gt;
            &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;history_date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rn&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;historical_shipment_containers&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;historical_shipment_containers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;history_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;history_id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked_history&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This query:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Groups identical records by all tracked fields&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Numbers them chronologically (newest = 1)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deletes everything except the newest record&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Time to delete 6.8M records: ~2 hours&lt;/strong&gt; (vs. days with the Django command)&lt;/p&gt;
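&lt;p&gt;One practical note: rather than issuing the &lt;code&gt;DELETE&lt;/code&gt; as one giant statement, we ran it over batches of IDs to keep each transaction short and let other queries breathe. A minimal sketch of the batching idea (an illustrative helper, not our exact script):&lt;/p&gt;

```python
def chunked(ids, batch_size):
    """Split a list of history_ids into fixed-size batches."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

# In the real script each batch fed one short transaction, e.g.
#   DELETE FROM historical_shipment_containers
#   WHERE history_id = ANY(%(batch)s);
duplicate_ids = list(range(17))  # stand-in for the ids where rn exceeds 1
batches = list(chunked(duplicate_ids, 5))
print([len(b) for b in batches])  # [5, 5, 5, 2]
```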

&lt;h3&gt;
  
  
  How Window Functions Work
&lt;/h3&gt;

&lt;p&gt;For those unfamiliar, &lt;code&gt;ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)&lt;/code&gt; is like saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Group records by these fields, then number each record within its group by date, keeping the newest as #1"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This lets us delete all records numbered 2, 3, 4, etc. (the duplicates) while keeping #1 (the most recent).&lt;/p&gt;
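&lt;p&gt;If the window function still feels opaque, the same keep-newest-per-group logic can be written in a few lines of plain Python (purely an illustration of the semantics, not what we ran against 38M rows):&lt;/p&gt;

```python
from collections import defaultdict

# (shipment_id, container_number, history_date) rows, dates as ISO strings
rows = [
    (1, "MSKU1", "2025-10-01"),
    (1, "MSKU1", "2025-10-02"),  # newest in its group: kept
    (1, "MSKU1", "2025-09-30"),
    (2, "TCLU9", "2025-10-02"),  # only row in its group: kept
]

groups = defaultdict(list)
for row in rows:
    groups[row[:2]].append(row)  # PARTITION BY shipment_id, container_number

keep, drop = [], []
for group in groups.values():
    ordered = sorted(group, key=lambda r: r[2], reverse=True)  # ORDER BY history_date DESC
    keep.append(ordered[0])   # rn = 1
    drop.extend(ordered[1:])  # rn 2, 3, ... are the duplicates

print(len(keep), len(drop))  # 2 2
```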




&lt;h2&gt;
  
  
  The Space Reclamation: VACUUM FULL
&lt;/h2&gt;

&lt;p&gt;After deleting millions of rows, we discovered another gotcha: &lt;strong&gt;PostgreSQL doesn't automatically return freed disk space to the operating system!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The deleted rows were merely marked as "dead tuples" and kept occupying disk space. A plain &lt;code&gt;VACUUM&lt;/code&gt; makes that space reusable for future rows, but only &lt;code&gt;VACUUM FULL&lt;/code&gt; rewrites the table and actually shrinks the files on disk.&lt;/p&gt;
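&lt;p&gt;You can watch this happen in PostgreSQL's statistics: &lt;code&gt;pg_stat_user_tables&lt;/code&gt; exposes &lt;code&gt;n_dead_tup&lt;/code&gt; and &lt;code&gt;n_live_tup&lt;/code&gt; per table. A tiny helper turns those counters into a bloat percentage (the sample counts below are hypothetical):&lt;/p&gt;

```python
def dead_tuple_pct(n_dead_tup, n_live_tup):
    """Share of a table's tuples that are dead: deleted but not yet reclaimed."""
    total = n_dead_tup + n_live_tup
    return 0.0 if total == 0 else round(100 * n_dead_tup / total, 2)

# Hypothetical counts for a history table right after the mass DELETE:
print(dead_tuple_pct(6_800_000, 10_000))  # 99.85
```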

&lt;h3&gt;
  
  
  The Solution: VACUUM FULL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="n"&gt;historical_tracking_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="n"&gt;historical_shipment_containers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="n"&gt;historical_shipment_events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;⚠️ Warning:&lt;/strong&gt; &lt;code&gt;VACUUM FULL&lt;/code&gt; locks the table during operation. We put the application in maintenance mode for 15 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Results:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Space Saved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Milestone&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;13 MB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.94%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Containers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.6 MB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.97%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Events&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.2 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;13 MB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.41%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o6ul2gw0moqfwgfkxir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o6ul2gw0moqfwgfkxir.png" alt="Success" width="783" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total Saved: 7.6 GB&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Preventing Future Bloat: The Real Fix
&lt;/h2&gt;

&lt;p&gt;Cleaning the database was only half the battle. We needed to &lt;strong&gt;prevent this from happening again&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 1: Unnecessary Saves in update_or_create()
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This saves EVERY TIME, even if nothing changed!
&lt;/span&gt;&lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ShipmentContainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_or_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;shipment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shipment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;container_number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;defaults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;container_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;container_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gate_out_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gate_out_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_update_container&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shipment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Only save if fields have actually changed.
    Prevents unnecessary history records.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ShipmentContainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;shipment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shipment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;container_number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;container_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Check if ANY field has actually changed
&lt;/span&gt;        &lt;span class="n"&gt;needs_update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;container_type&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;container_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;container_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;container_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;needs_update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# ONLY save if something actually changed
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;needs_update&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; This cut saves by roughly &lt;strong&gt;99%&lt;/strong&gt;. The API returns unchanged data on the vast majority of syncs, so only about 1% of them now need to write anything at all.&lt;/p&gt;
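&lt;p&gt;Checking fields one by one gets verbose as models grow, so the same guard generalizes to a small helper (a sketch with hypothetical field names, not our production code):&lt;/p&gt;

```python
def apply_changes(obj, new_values):
    """Set only the fields whose values actually differ; report if a save is needed."""
    changed = False
    for field, new_value in new_values.items():
        if getattr(obj, field) != new_value:
            setattr(obj, field, new_value)
            changed = True
    return changed

class FakeContainer:  # stand-in for a Django model instance
    container_type = "40HC"
    gate_out_date = None

c = FakeContainer()
print(apply_changes(c, {"container_type": "40HC", "gate_out_date": None}))  # False
print(apply_changes(c, {"container_type": "20GP"}))  # True
```

&lt;p&gt;The returned flag then gates a single &lt;code&gt;save()&lt;/code&gt; call, exactly like &lt;code&gt;needs_update&lt;/code&gt; above.&lt;/p&gt;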




&lt;h3&gt;
  
  
  Problem 2: Type Mismatches Causing False Changes
&lt;/h3&gt;

&lt;p&gt;We discovered the tracking API returns status codes as &lt;strong&gt;integers&lt;/strong&gt; (50), but our database stores them as &lt;strong&gt;strings&lt;/strong&gt; ("50").&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# API returns: {"status_code": 50}
# Database has: "50"
&lt;/span&gt;
&lt;span class="c1"&gt;# This comparison ALWAYS fails!
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;api_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# 50 != "50" → True!
&lt;/span&gt;    &lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Unnecessary save every time!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Always convert to the same type before comparing
&lt;/span&gt;&lt;span class="n"&gt;new_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;new_status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tracking&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_status&lt;/span&gt;
    &lt;span class="n"&gt;needs_update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; API integrations often return different types than your database stores. Always normalize types before comparison!&lt;/p&gt;
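&lt;p&gt;The pattern we settled on was a single normalizer applied to every incoming value before comparison (an illustrative helper):&lt;/p&gt;

```python
def normalize_code(value):
    """Coerce API status codes to the string form the database stores."""
    return None if value is None else str(value)

print(normalize_code(50))    # '50'  (int from the API)
print(normalize_code("50"))  # '50'  (already a string)
print(normalize_code(None))  # None  (missing stays missing)

# Now the API's 50 compares equal to the stored "50":
print(normalize_code(50) == "50")  # True
```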




&lt;h3&gt;
  
  
  Problem 3: Timezone-Aware vs Timezone-Naive Datetime Comparison
&lt;/h3&gt;

&lt;p&gt;Our date parsing function was returning &lt;strong&gt;timezone-naive&lt;/strong&gt; datetimes, while with Django's &lt;code&gt;USE_TZ=True&lt;/code&gt; the database hands back &lt;strong&gt;timezone-aware&lt;/strong&gt; ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# API returns: "2025-08-22"
# Our parser created: datetime(2025, 8, 22, 0, 0, 0)  # Naive!
# Database stores: datetime(2025, 8, 22, 0, 0, 0, tzinfo=UTC)  # Aware!
&lt;/span&gt;
&lt;span class="c1"&gt;# These are NEVER equal, triggering saves every sync!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caused Django to log warnings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RuntimeWarning: DateTimeField received a naive datetime while time zone support is active.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
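&lt;p&gt;The failure mode is easy to reproduce with the standard library alone, no Django required:&lt;/p&gt;

```python
from datetime import datetime, timezone

naive = datetime(2025, 8, 22)                       # what our parser produced
aware = datetime(2025, 8, 22, tzinfo=timezone.utc)  # what the database returned

print(naive == aware)  # False, so every sync looked like a change
print(naive.replace(tzinfo=timezone.utc) == aware)  # True once both are aware
```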



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_api_date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Parse date from API and return timezone-aware datetime.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;date_value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;date_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;date_value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;date_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;date_value&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Try ISO format first
&lt;/span&gt;        &lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isoparse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tzinfo&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;make_aware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_default_timezone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Fall back to simple date format
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;date_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Convert to timezone-aware datetime
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;make_aware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_default_timezone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; Eliminated all timezone-related false changes and warnings!&lt;/p&gt;




&lt;h3&gt;
  
  
  Problem 4: Latitude/Longitude Causing Perpetual History Bloat
&lt;/h3&gt;

&lt;p&gt;For shipment tracking, we store GPS coordinates showing where vessels are. These coordinates change &lt;strong&gt;constantly&lt;/strong&gt; as ships move across the ocean - sometimes every few minutes!&lt;/p&gt;

&lt;p&gt;We were tracking every single position change in history, which was pointless. We don't need an audit trail of "vessel moved 0.5 nautical miles" thousands of times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Exclude frequently-changing, non-critical fields from history tracking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ShipmentEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ... fields ...
&lt;/span&gt;    &lt;span class="n"&gt;latitude&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DecimalField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_digits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decimal_places&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
    &lt;span class="n"&gt;longitude&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DecimalField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_digits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decimal_places&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;

    &lt;span class="c1"&gt;# Exclude lat/long from history - they change too frequently!
&lt;/span&gt;    &lt;span class="c1"&gt;# We still store current position, just don't track every change
&lt;/span&gt;    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HistoricalRecords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;excluded_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;latitude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;longitude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; Prevents thousands of unnecessary history records as vessels sail!&lt;/p&gt;




&lt;h2&gt;
  
  
  The Final Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Database size: &lt;strong&gt;10.2 GB&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Historical records: &lt;strong&gt;28 million&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Duplicate records: &lt;strong&gt;24.2 million (99.95%)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU usage: &lt;strong&gt;99.9% constant&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Application status: &lt;strong&gt;Down&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sync performance: &lt;strong&gt;Timing out&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query response time: &lt;strong&gt;2-5 seconds&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  After
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Database size: &lt;strong&gt;1.6 GB&lt;/strong&gt; (saved 8.6 GB)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Historical records: &lt;strong&gt;1.8 million&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Duplicate records: &lt;strong&gt;0&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU usage: &lt;strong&gt;&amp;lt;10% average&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Application status: &lt;strong&gt;Stable&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sync performance: &lt;strong&gt;Sub-second&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query response time: &lt;strong&gt;50-200ms&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;10x performance improvement across the board!&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Monitor Your Historical Tables&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Historical tables can silently grow to dwarf your actual data. Set up monitoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Weekly table size check&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_total_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;row_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_tables&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;schemaname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'public'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;tablename&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%historical%'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;pg_total_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schemaname&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this to your weekly ops checklist or set up automated alerts when historical tables exceed expected sizes.&lt;/p&gt;
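&lt;p&gt;The alert side is easy to automate too. Here's a minimal sketch - the 2 GB budget, table names, and helper function are illustrative, and the sizes (in bytes) would come from the query above:&lt;/p&gt;

```python
# Illustrative alert helper: flag historical tables whose total size
# (in bytes, e.g. from pg_total_relation_size) exceeds a budget.
EXPECTED_MAX_BYTES = 2 * 1024**3  # illustrative 2 GB budget per table

def tables_over_budget(table_sizes, budget=EXPECTED_MAX_BYTES):
    """Return (table, size) pairs over budget, largest first."""
    offenders = [(name, size) for name, size in table_sizes.items() if size > budget]
    return sorted(offenders, key=lambda pair: pair[1], reverse=True)

# Wire this into a weekly cron/Celery task and page on any offenders.
sizes = {
    "historical_shipment_containers": 5 * 1024**3,
    "historical_invoices": 1 * 1024**3,
}
print(tables_over_budget(sizes))
```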

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Never Blindly Use update_or_create() with History Tracking&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Always check if fields have actually changed before saving:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bad - saves every time
&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_or_create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;defaults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;field&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Good - only saves when changed
&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_value&lt;/span&gt;
    &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. &lt;strong&gt;Type Consistency Matters in API Integrations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;API integrations often return different types than your database stores. Always normalize:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Convert types before comparison
&lt;/span&gt;&lt;span class="n"&gt;api_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;field&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Ensure string
&lt;/span&gt;&lt;span class="n"&gt;db_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Ensure string
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_value&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;db_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api_value&lt;/span&gt;
    &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. &lt;strong&gt;Timezone Awareness is Critical&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With &lt;code&gt;USE_TZ=True&lt;/code&gt; in Django settings, always return timezone-aware datetimes from parsers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;

&lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;make_aware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_default_timezone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# Make aware!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. &lt;strong&gt;Not Everything Needs History&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Exclude frequently-changing non-critical fields from history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Good candidates for exclusion:
# - updated_at timestamps
# - last_sync timestamps
# - GPS coordinates
# - view counts
# - Any high-frequency changing data
&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HistoricalRecords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;excluded_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;updated_at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;last_sync&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;latitude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;longitude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. &lt;strong&gt;PostgreSQL Window Functions Are Powerful&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For bulk operations on millions of rows, SQL outperforms ORM by orders of magnitude.&lt;/p&gt;

&lt;p&gt;Django ORM is great for application logic, but for data cleanup, bulk updates, and analytics, raw SQL with window functions is often 100x+ faster.&lt;/p&gt;
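&lt;p&gt;The ROW_NUMBER() dedup idiom is easy to demo end to end. This standalone sketch uses Python's built-in sqlite3 (which also supports window functions) with an illustrative schema - our production query ran the same idiom against the PostgreSQL historical tables:&lt;/p&gt;

```python
import sqlite3

# Standalone demo of the ROW_NUMBER() dedup idiom (needs SQLite 3.25+
# for window functions). Schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (id INTEGER PRIMARY KEY, shipment_id INT, status TEXT);
    INSERT INTO history (shipment_id, status) VALUES
        (1, 'loaded'), (1, 'loaded'), (1, 'loaded'),
        (1, 'sailed'), (2, 'loaded');
""")

# Keep the first row of each (shipment_id, status) group, delete the rest.
conn.execute("""
    DELETE FROM history WHERE id IN (
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (
                PARTITION BY shipment_id, status ORDER BY id
            ) AS rn
            FROM history
        ) WHERE rn > 1
    )
""")
remaining = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(remaining)  # 3 - one row per distinct (shipment_id, status)
```

The whole dedup runs inside the database as one statement - no rows are pulled into Python, which is exactly why it beats an ORM loop on millions of records.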

&lt;h3&gt;
  
  
  7. &lt;strong&gt;VACUUM FULL After Large Deletions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Don't forget to reclaim disk space after deleting millions of rows!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Regular VACUUM (non-blocking, safe for production)&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- VACUUM FULL (blocking, use in maintenance window)&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;FULL&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Regular VACUUM is safe to run anytime. VACUUM FULL requires downtime but reclaims more space.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tools and Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Django 5.0&lt;/strong&gt; with django-simple-history&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostgreSQL 15&lt;/strong&gt; (DigitalOcean Managed Database)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Celery&lt;/strong&gt; for background task processing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis&lt;/strong&gt; as Celery broker&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;pgAdmin 4&lt;/strong&gt; for database management&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DigitalOcean&lt;/strong&gt; infrastructure and monitoring&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Python 3.12&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance Impact
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before the Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Database queries: 2-5 seconds average&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Background sync tasks: 45-60 seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Admin page loads: 8-12 seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API response times: 1-3 seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU: Constant 99.9%&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database connections: Maxed out&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After the Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Database queries: 50-200ms average&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Background sync tasks: 1-2 seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Admin page loads: 500ms-1s&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API response times: 100-300ms&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CPU: 5-10% average&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Database connections: Healthy pool&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;





&lt;h2&gt;
  
  
  Monitoring and Maintenance
&lt;/h2&gt;

&lt;p&gt;We now run weekly checks to catch any regression early:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check for duplicate accumulation&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shipment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gate_out_date&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;unique_records&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;historical_shipment_containers&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;unique_records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;unique_records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;duplicates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;unique_duplicates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;duplicate_percentage&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Healthy Result:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;total: 13,812
unique_records: 13,802
duplicates: 10
duplicate_percentage: 0.07%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A small number of duplicates (&amp;lt; 1%) is normal due to legitimate updates. If this crosses 5%, we investigate.&lt;/p&gt;
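&lt;p&gt;That rule of thumb is trivial to codify. A minimal sketch (the 5% cutoff is our own heuristic):&lt;/p&gt;

```python
# Illustrative weekly health check: percentage of history rows that are
# duplicates, and whether it crosses our 5% investigation threshold.
def duplicate_ratio(total, unique_records):
    """Return the duplicate percentage of a historical table."""
    if total == 0:
        return 0.0
    return 100.0 * (total - unique_records) / total

def needs_investigation(total, unique_records, threshold=5.0):
    return duplicate_ratio(total, unique_records) > threshold

# The healthy numbers above:
print(round(duplicate_ratio(13812, 13802), 2))  # 0.07
print(needs_investigation(13812, 13802))        # False
```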




&lt;h2&gt;
  
  
  The Human Element
&lt;/h2&gt;

&lt;p&gt;This incident happened on a &lt;strong&gt;Saturday morning&lt;/strong&gt;. Our team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Immediately put the platform in maintenance mode&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Communicated transparently with stakeholders&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Debugged collaboratively (huge props to the team!)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implemented fixes incrementally&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tested thoroughly before reopening&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documented everything for future reference&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Time to resolution:&lt;/strong&gt; 14 hours (Saturday 7 AM → Saturday 9 PM). &lt;strong&gt;Total downtime:&lt;/strong&gt; 2 hours (maintenance mode during VACUUM operations).&lt;/p&gt;

&lt;p&gt;We follow agile practices, which helped us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Prioritize fixes (clean data first, prevent recurrence second)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test in small increments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Roll back quickly if needed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Collaborate effectively under pressure&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;What started as a catastrophic Saturday morning crash turned into a masterclass in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Database optimization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Django ORM performance tuning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PostgreSQL internals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Production debugging under pressure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Collaborative problem-solving&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; The most dangerous bugs are the ones that accumulate silently over months. Regular monitoring, performance testing, and database audits aren't optional - they're essential.&lt;/p&gt;

&lt;p&gt;Our freight management platform is now faster, leaner, and ready to scale. And we learned valuable lessons that will inform our engineering practices for years to come.&lt;/p&gt;

&lt;p&gt;If you're using django-simple-history (or any audit logging system), review your update patterns today. You might be sitting on a ticking time bomb!&lt;/p&gt;




&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://django-simple-history.readthedocs.io/" rel="noopener noreferrer"&gt;Django Simple History Documentation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/sql-vacuum.html" rel="noopener noreferrer"&gt;PostgreSQL VACUUM Documentation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/tutorial-window.html" rel="noopener noreferrer"&gt;Window Functions in PostgreSQL&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.djangoproject.com/en/stable/topics/i18n/timezones/" rel="noopener noreferrer"&gt;Django Timezone Handling Best Practices&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/performance-tips.html" rel="noopener noreferrer"&gt;PostgreSQL Performance Optimization&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;About the Author:&lt;/strong&gt; Software Engineer at Maalbardaar, building scalable freight management solutions. Passionate about database optimization, Django performance, and solving real-world logistics challenges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt; Django, PostgreSQL, Celery, Docker, DigitalOcean, React&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Follow for more:&lt;/strong&gt; Production war stories, Django tips, and database optimization techniques.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you faced similar database bloat issues? How did you solve them? Share your story in the comments!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;#Django #PostgreSQL #DatabaseOptimization #SoftwareEngineering #ProductionDebugging #PerformanceTuning #FreightTech #LogisticsTechnology #WebDevelopment #BackendEngineering #Python #DevOps&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
      <category>django</category>
      <category>postgres</category>
    </item>
    <item>
      <title>What is google's nano banana and how it solved the biggest problem in ai image editing</title>
      <dc:creator>Syed Muhammad Haris</dc:creator>
      <pubDate>Thu, 28 Aug 2025 22:00:02 +0000</pubDate>
      <link>https://dev.to/syedharis/what-is-googles-nano-banana-and-how-it-solved-the-biggest-problem-in-ai-image-editing-28ih</link>
      <guid>https://dev.to/syedharis/what-is-googles-nano-banana-and-how-it-solved-the-biggest-problem-in-ai-image-editing-28ih</guid>
      <description>&lt;p&gt;so google just dropped nano banana (officially called gemini 2.5 flash image) and it actually fixes the thing that makes most ai photo editing tools useless - character consistency.&lt;br&gt;
here's the problem: you upload a photo, ask the ai to "change my shirt color," and boom - suddenly you don't look like yourself anymore. face is different, lighting's off, whole vibe is wrong. makes the tools feel like party tricks instead of actual useful software.&lt;/p&gt;

&lt;p&gt;nano banana solved this shit, and if you're building anything with visual content, this matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  what actually makes it different
&lt;/h2&gt;

&lt;p&gt;nano banana (technically gemini 2.5 flash image) has unrivaled consistency across edits - it can actually remember the details instead of rolling the dice every time you make a change.&lt;br&gt;
while other tools spin for 10–15 seconds per image, nano banana often responds in 1–2 seconds. sometimes even faster. it feels like working in real time, not batch mode.&lt;/p&gt;

&lt;p&gt;people have been going bananas over it already in early previews - it's the top-rated image editing model in the world.&lt;/p&gt;

&lt;h2&gt;
  
  
  a fun way to use it
&lt;/h2&gt;

&lt;p&gt;i have this original image below of me smiling. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lia3ugij3nwk7inox1o.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lia3ugij3nwk7inox1o.jpeg" alt=" " width="800" height="981"&gt;&lt;/a&gt;&lt;br&gt;
i will tell google nano banana to edit something like this, prompt: Make a fun post as google nano banana is launched and edit my pic so that Im holding a big big banana in my hand above my head, banana should have google nano banana written on it. also In the img my hands are closed you can closely fix them, only 1 hand should hold the banana on top of my head&lt;/p&gt;

&lt;p&gt;here is what i get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq9khonp4fik8srr2w4o.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq9khonp4fik8srr2w4o.jpeg" alt=" " width="800" height="965"&gt;&lt;/a&gt;&lt;br&gt;
AI generated - pretty cool, isn't it?&lt;/p&gt;

&lt;h2&gt;
  
  
  stuff you can actually do with it
&lt;/h2&gt;

&lt;p&gt;consistent character placement: put the same person/product in different backgrounds, outfits, scenarios without losing their identity. perfect for social content, marketing campaigns, product shots.&lt;/p&gt;

&lt;p&gt;multi-image fusion: combine different images into one seamless new visual. you can use multiple reference images to create a single, unified image for use cases such as marketing, training, or advertising.&lt;/p&gt;

&lt;p&gt;precision editing: targeted transformation and precise local edits with natural language. for example, the model can blur the background of an image, remove a stain in a t-shirt, remove an entire person from a photo, alter a subject's pose, add color to a black and white photo.&lt;/p&gt;

&lt;p&gt;multi-turn editing: keep making changes and it maintains consistency throughout. paint a room, add furniture, change lighting - all while keeping everything coherent.&lt;/p&gt;

&lt;h2&gt;
  
  
  why this matters if you're building stuff
&lt;/h2&gt;

&lt;p&gt;if you're working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;social apps: users can place themselves consistently across different posts/scenarios&lt;/li&gt;
&lt;li&gt;e-commerce: customers visualize products in their spaces without quality loss&lt;/li&gt;
&lt;li&gt;marketing tools: maintain brand character consistency across campaigns&lt;/li&gt;
&lt;li&gt;content creation: generate consistent visual stories, comics, tutorials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;with gemini 2.5 flash image, the model benefits from gemini's world knowledge, which unlocks new use cases. it's not just prettier pictures - it actually understands context and handles complex instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  how to start using it
&lt;/h2&gt;

&lt;p&gt;test it first: try it in the gemini app (free 100 edits/day) to see if it handles your use cases&lt;br&gt;
for developers: hit the gemini api or google ai studio to integrate&lt;br&gt;
enterprise: vertex ai has the enterprise-grade access&lt;/p&gt;

&lt;p&gt;all images created or edited with gemini 2.5 flash image will include an invisible synthid digital watermark, so they can be identified as ai-generated or edited.&lt;/p&gt;

&lt;h2&gt;
  
  
  real talk
&lt;/h2&gt;

&lt;p&gt;this isn't just another ai image tool - it's the first one that works reliably for real use cases. nano banana marks a shift: it's not just about generating prettier pictures, it's about replacing the entire editing workflow.&lt;/p&gt;

&lt;p&gt;if you've been waiting for ai image editing to be good enough for production, this might be it.&lt;/p&gt;

&lt;p&gt;building something that needs reliable image editing? i help founders implement ai tools and scale technical products from idea to launch. if you're considering integrating visual ai or need someone who's navigated these technical challenges, &lt;br&gt;
let's talk.&lt;/p&gt;

&lt;p&gt;&lt;a href="mailto:smharisofficial@gmail.com"&gt;smharisofficial@gmail.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>nanobanana</category>
    </item>
    <item>
      <title>how infinite scrolling on social media actually works (and why most devs get it wrong) -2025</title>
      <dc:creator>Syed Muhammad Haris</dc:creator>
      <pubDate>Mon, 25 Aug 2025 15:54:40 +0000</pubDate>
      <link>https://dev.to/syedharis/how-infinite-scrolling-on-social-media-actually-works-and-why-most-devs-get-it-wrong-2025-m9i</link>
      <guid>https://dev.to/syedharis/how-infinite-scrolling-on-social-media-actually-works-and-why-most-devs-get-it-wrong-2025-m9i</guid>
      <description>&lt;p&gt;I was scrolling twitter at 2am last night (as one does when you should be sleeping), and i started wondering... how the fk does this thing just keep going forever?&lt;/p&gt;

&lt;p&gt;like seriously, i've been scrolling for what feels like hours, millions of people are posting every second, and somehow i never see the same tweet twice, never get a "loading error", never hit some weird gap where posts are missing. it just... works?&lt;/p&gt;

&lt;p&gt;turns out there's some actually clever engineering behind this that most people (including me until recently) have no clue about, and it's one of those things that separates real production systems from the mvp prototypes most of us start with.&lt;/p&gt;

&lt;h2&gt;
  
  
  the obvious approach (that doesn't work)
&lt;/h2&gt;

&lt;p&gt;when you're building your first app with infinite scrolling, you probably do what i did - offset pagination. it's the most obvious thing in the world:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SQL
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 0;  -- page 1
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 20; -- page 2  
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 40; -- page 3 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;works great when you have 1000 users and a few thousand posts. feels like magic when you first implement it and see that smooth scrolling action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kquunv1fnvvxv01dbnh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kquunv1fnvvxv01dbnh.png" alt="Offset pagination" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
img by Dan Hollick&lt;/p&gt;

&lt;p&gt;but this approach doesn't work at scale, and here's why:&lt;/p&gt;

&lt;p&gt;performance gets worse with every page - when you're on page 1000, the database has to count through 20,000 rows just to find your starting point. it's like counting to 20,000 every time someone scrolls down.&lt;/p&gt;

&lt;p&gt;inconsistent results - if someone posts a new tweet while you're scrolling, your offset gets messed up. you might see the same post twice or miss posts entirely. super confusing for users.&lt;/p&gt;

&lt;p&gt;memory usage - databases hate large offsets: they have to fetch and throw away every skipped row before they can return yours.&lt;/p&gt;
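&lt;p&gt;here's a tiny sketch of that inconsistency problem - plain python standing in for the database, post names made up for illustration:&lt;/p&gt;

```python
# toy newest-first feed, page size 2 - no real database, purely illustrative
posts = ["post5", "post4", "post3", "post2", "post1"]

def fetch_page(feed, offset, limit=2):
    # offset pagination: skip `offset` rows, return the next `limit`
    return feed[offset:offset + limit]

page1 = fetch_page(posts, 0)   # ["post5", "post4"]

# someone posts while the user is scrolling: everything shifts down by one
posts = ["post6"] + posts

page2 = fetch_page(posts, 2)   # ["post4", "post3"] - post4 again!
print(page1, page2)
```

&lt;p&gt;post4 shows up on both pages, and the reader never even notices that post6 slipped in above their scroll position.&lt;/p&gt;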
&lt;h2&gt;
  
  
  how the big players actually do it &lt;em&gt;&lt;strong&gt;(cursor pagination)&lt;/strong&gt;&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;platforms like twitter, reddit, instagram, tiktok all use something called cursor pagination instead, and it's honestly way more elegant once you understand it.&lt;/p&gt;

&lt;p&gt;instead of saying "give me page 5", you say "give me 20 posts after this exact post id/timestamp". the api looks something like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GET /posts?after=1692123456789&amp;amp;limit=20&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;where that after parameter is called a cursor - it's like leaving a bookmark instead of counting pages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiow0ookhmvt26kqj10eb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiow0ookhmvt26kqj10eb.png" alt="Cursor Pagination" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
img by Dan Hollick&lt;/p&gt;

&lt;p&gt;on the backend, your query becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sql
SELECT * FROM posts 
WHERE created_at &amp;lt; '2023-08-15 12:34:56' 
ORDER BY created_at DESC 
LIMIT 20;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  this is lightning fast because:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;fast lookups - with an index on the cursor column, the database can seek straight to your position instead of scanning and discarding thousands of skipped rows&lt;/li&gt;
&lt;li&gt;stable results - new posts don't affect your current scroll position&lt;/li&gt;
&lt;li&gt;scales infinitely - doesn't matter if you have 1000 posts or 1 billion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the frontend just stores the last post's timestamp/id and uses that for the next request. boom, perfect infinite scrolling that works exactly like the big platforms.&lt;/p&gt;
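&lt;p&gt;a minimal sketch of the same scenario with a cursor instead of an offset - plain python, made-up chronological ids standing in for a real table:&lt;/p&gt;

```python
# cursor-style toy feed: "give me 2 posts older than this id"
# ids are chronological (like snowflake ids), so newer posts have bigger ids
posts = [5, 4, 3, 2, 1]  # post ids, newest first

def fetch_after(feed, cursor, limit=2):
    # keep only posts strictly older than the cursor; None means "from the top"
    older = [p for p in feed if cursor is None or cursor > p]
    return older[:limit]

page1 = fetch_after(posts, None)   # [5, 4]
cursor = page1[-1]                 # client just remembers the last id it saw

# a new post lands mid-scroll - ids only grow, so it goes on top
posts = [6] + posts

page2 = fetch_after(posts, cursor)  # [3, 2] - no duplicates, no gaps
print(page1, page2)
```

&lt;p&gt;the new post doesn't disturb the cursor at all - the bookmark still points at the same spot in the timeline.&lt;/p&gt;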

&lt;h2&gt;
  
  
  the devil's in the implementation details
&lt;/h2&gt;

&lt;p&gt;here's where it gets interesting - each platform has its own flavor of cursors depending on its needs:&lt;/p&gt;

&lt;p&gt;twitter uses tweet ids (which are chronologically ordered snowflake ids), reddit combines score and timestamp for their ranking algorithm, instagram has to handle both chronological and algorithmic feeds.&lt;/p&gt;

&lt;p&gt;some use opaque cursor tokens (encrypted strings that hide the implementation), others expose the actual values. some encode multiple values into a single cursor for complex sorting.&lt;/p&gt;

&lt;p&gt;there are also edge cases like handling deleted posts, breaking ties in timestamps, making sure cursors work across different timezones, capping page sizes to prevent abuse... lots of little details that make the difference between a prototype and production-ready infinite scrolling.&lt;/p&gt;
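&lt;p&gt;a rough sketch of what an opaque cursor could look like - base64-wrapped json packing a timestamp plus a tie-breaking post id. the field names and format here are invented for illustration, not any platform's real scheme:&lt;/p&gt;

```python
import base64
import json

# hypothetical opaque cursor: pack (created_at, post_id) into one token, so
# timestamp ties break deterministically and clients can't depend on internals
def encode_cursor(created_at, post_id):
    raw = json.dumps({"t": created_at, "id": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token):
    data = json.loads(base64.urlsafe_b64decode(token))
    return data["t"], data["id"]

token = encode_cursor(1692123456789, 42)
print(token)                 # opaque gibberish to the client
print(decode_cursor(token))  # (1692123456789, 42) back on the server
```

&lt;p&gt;on the server you'd then compare (created_at, id) pairs, so two posts with the same timestamp still paginate in a stable order.&lt;/p&gt;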

&lt;h2&gt;
  
  
  what's next in this series
&lt;/h2&gt;

&lt;p&gt;this is the first post in my "how shit actually works" series because i keep using these tools every day and realizing i have no idea what's happening under the hood, and apparently neither do a lot of other people.&lt;/p&gt;

&lt;p&gt;next post i'm gonna dive deeper into cursor token implementation - how to handle complex sorting, deal with edge cases, and build cursors that actually work in production. might even show some real code examples from projects i've worked on.&lt;/p&gt;

&lt;h2&gt;
  
  
  building your own infinite scroll?
&lt;/h2&gt;

&lt;p&gt;if you're working on something that needs this kind of functionality, honestly the jump from prototype pagination to production-ready cursor systems can be tricky. there's a lot of little details that aren't obvious until you hit scale or real user patterns.&lt;/p&gt;

&lt;p&gt;i help founders and indie hackers turn their ideas into actual working products all the time - from mvp to scale, full-stack development, team building, all that stuff. if you're building something cool and want someone who's been through these scaling challenges before, hit me up.&lt;/p&gt;

&lt;p&gt;anyway, that's how your favorite apps keep you scrolling forever without breaking. pretty neat engineering hidden behind something that just feels like magic.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>pagination</category>
      <category>socialmedia</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
