<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matthew Segal</title>
    <description>The latest articles on DEV Community by Matthew Segal (@mattdsegal).</description>
    <link>https://dev.to/mattdsegal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F412071%2Fe5f02b87-1c51-4d08-99fc-08fed789fe68.jpg</url>
      <title>DEV Community: Matthew Segal</title>
      <link>https://dev.to/mattdsegal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattdsegal"/>
    <language>en</language>
    <item>
      <title>How to find what you want in the Django documentation</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Fri, 26 Jun 2020 09:50:14 +0000</pubDate>
      <link>https://dev.to/mattdsegal/how-to-find-what-you-want-in-the-django-documentation-3b7k</link>
      <guid>https://dev.to/mattdsegal/how-to-find-what-you-want-in-the-django-documentation-3b7k</guid>
      <description>&lt;p&gt;Many beginner programmers find the &lt;a href="https://docs.djangoproject.com/en/3.0/"&gt;Django documentation&lt;/a&gt; overwhelming.&lt;/p&gt;

&lt;p&gt;Let's say you want to learn how to perform a login for a user. Seems like it would be pretty simple: logins are a core feature of Django. If you &lt;a href="https://www.google.com/search?q=django+login"&gt;google for "django login"&lt;/a&gt; or &lt;a href="https://docs.djangoproject.com/en/3.0/search/?q=login"&gt;search the docs&lt;/a&gt; you see a few options, with "Using the Django authentication system" as the most promising result. You click the link, happily anticipating that your login problems will soon be over, and you get smacked in the face with &lt;a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/"&gt;thirty-nine full browser pages of text&lt;/a&gt;. This is way too much information!&lt;/p&gt;

&lt;p&gt;Alternatively, you find your way to the reference page on &lt;a href="https://docs.djangoproject.com/en/3.0/ref/contrib/auth/"&gt;django.contrib.auth&lt;/a&gt;, because that's where all the auth stuff is, right? If you browse this page you will see an endless enumeration of all the different authentication models and fields and functions, but no explanation of how they're supposed to fit together.&lt;/p&gt;

&lt;p&gt;At this stage you may want to close your browser tab in despair and reconsider your decision to learn Django. It turns out the info that you wanted was somewhere in that really long page &lt;a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/#how-to-log-a-user-in"&gt;here&lt;/a&gt; and &lt;a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/#django.contrib.auth.authenticate"&gt;here&lt;/a&gt;. Why was it so hard to find? Why is this documentation so fragmented?&lt;/p&gt;

&lt;p&gt;God forbid that you should complain to anyone about this struggle. Experienced devs will say things like "you are looking in the wrong place" and "you need more experience before you try Django". This response raises the question, though: how does anyone know where the "right place" is? The table of contents in the Django documentation &lt;a href="https://docs.djangoproject.com/en/3.0/contents/"&gt;is unreadably long&lt;/a&gt;. Meanwhile, you read other people raving about how great the Django docs are: what are they talking about? You may wonder: am I missing something?&lt;/p&gt;

&lt;p&gt;Wouldn't it be great if you could go from having a question to finding the answer in a few minutes or less? A quick Google and a scan, and boom: you know how to solve your Django problem. This is possible. As a professional Django dev I do this daily. I rarely remember how to do anything by heart; I am constantly scanning the docs to figure out how to solve problems, and you can too.&lt;/p&gt;

&lt;p&gt;In this post I will outline how to find what you want in the Django documentation, so that you spend less time frustrated and stuck, and more time writing your web app. I also include a list of key references that I find useful.&lt;/p&gt;

&lt;p&gt;Experienced devs can be dismissive when you complain about documentation, but they're right about one thing: knowing how to read docs is a really important skill for a programmer, and being good at this will save you lots of time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Find the right section
&lt;/h2&gt;

&lt;p&gt;Library documentation is almost always written with distinct sections. If you do not understand what these sections are for, then you will be totally lost.&lt;br&gt;
If you have time, watch &lt;a href="https://www.youtube.com/watch?v=t4vKPhjcMZg"&gt;Daniele Procida's excellent talk&lt;/a&gt; on how documentation should be structured. In the talk he describes four different sections of documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tutorials&lt;/strong&gt;: lessons that show you how to complete a small project (&lt;a href="https://docs.djangoproject.com/en/3.0/intro/install/"&gt;example&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How-to guides&lt;/strong&gt;: guides with steps for solving a common problem (&lt;a href="https://docs.djangoproject.com/en/3.0/howto/custom-management-commands/"&gt;example&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API References&lt;/strong&gt;: detailed technical descriptions of all the bits of code (&lt;a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/"&gt;example&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explanations&lt;/strong&gt;: high level discussion of design decisions (&lt;a href="https://docs.djangoproject.com/en/3.0/topics/templates/#module-django.template"&gt;example&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to these, there's also commonly a &lt;strong&gt;Quickstart&lt;/strong&gt; (&lt;a href="http://whitenoise.evans.io/en/stable/#quickstart-for-django-apps"&gt;example&lt;/a&gt;), which lists the absolute minimum steps you need to take to get started with the library.&lt;/p&gt;

&lt;p&gt;The Django Rest Framework docs use a structure similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WEFPfvxG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/drf-sections.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WEFPfvxG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/drf-sections.png" alt="django rest framework sections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ReactJS docs use a structure similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--P0kVKfX---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/react-sections.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P0kVKfX---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/react-sections.png" alt="react sections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Django docs use a &lt;a href="https://docs.djangoproject.com/en/3.0/#how-the-documentation-is-organized"&gt;structure similar to this&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vj-AhYVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/django-sections.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vj-AhYVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/django-sections.png" alt="django sections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hopefully you see the pattern here: all these docs have been split up into distinct sections. Learn this structure once and you can quickly navigate most documentation.&lt;br&gt;
Now that you understand that library documentation is usually structured in a particular way, I will explain how to navigate that structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do the tutorial first
&lt;/h2&gt;

&lt;p&gt;This might seem obvious, but I have to say it. If there is a tutorial in the docs and you are feeling lost, then do the tutorial. It is often where the authors introduce the concepts that are key to understanding everything else. If you're feeling like a badass, then don't "do" the tutorial, but at the very least skim-read it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Find an example, guide or overview
&lt;/h2&gt;

&lt;p&gt;Avoid the &lt;a href="https://docs.djangoproject.com/en/3.0/ref/"&gt;API reference&lt;/a&gt; section, unless you already know &lt;em&gt;exactly&lt;/em&gt; what you're looking for. You will recognise that you are in an API reference section because the title will have "reference" in it, and the content will be very detailed with few high-level explanations. For example, &lt;a href="https://docs.djangoproject.com/en/3.0/ref/contrib/auth/"&gt;django.contrib.auth&lt;/a&gt; is a reference section - it is not a good place to learn how "Django login" works.&lt;/p&gt;

&lt;p&gt;You need to understand how the bits of code fit together before looking at an API reference. This can be hard since most documentation, even the really good stuff, is incomplete. Still, the best thing to try is to look for overviews and explanations of framework features.&lt;/p&gt;

&lt;p&gt;Find and scan the list of &lt;a href="https://docs.djangoproject.com/en/3.0/howto/"&gt;how-to guides&lt;/a&gt; to see if one solves your exact problem; if it does, you will save a lot of time. Using our login example, there is no "how to log a user in" guide, which is bad luck.&lt;/p&gt;

&lt;p&gt;If there is no guide, then quickly scan the &lt;a href="https://docs.djangoproject.com/en/3.0/topics/"&gt;topic list&lt;/a&gt; and try to find the topic that you need. If you do not already understand the topic well, then read the overview. &lt;strong&gt;Google terms that you do not understand&lt;/strong&gt;, like "authentication" and "authorization" (they're different, specific things). In our login case, "&lt;a href="https://docs.djangoproject.com/en/3.0/topics/auth/"&gt;User authentication in Django&lt;/a&gt;" is the topic that we want from the list.&lt;/p&gt;

&lt;p&gt;Once you think you sort-of understand how everything should fit together, then you can move to the detailed API reference, so that you can ensure that you're using the code correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Find and remember key references
&lt;/h2&gt;

&lt;p&gt;Once you understand what you want to do, you will need to use the API reference pages to figure out exactly what code you should write. It's good to remember the key pages that contain the most useful references. Here are my personal favourites, which I use all the time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.djangoproject.com/en/3.0/ref/settings/"&gt;&lt;strong&gt;Settings reference&lt;/strong&gt;&lt;/a&gt;: A list of all the settings and what they do&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.djangoproject.com/en/3.0/ref/templates/builtins/"&gt;&lt;strong&gt;Built-in template tags&lt;/strong&gt;&lt;/a&gt;: All the template tags with examples&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/"&gt;&lt;strong&gt;Queryset API reference&lt;/strong&gt;&lt;/a&gt;: All the different tools for using the ORM to access the database&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.djangoproject.com/en/3.0/ref/models/fields/"&gt;&lt;strong&gt;Model field reference&lt;/strong&gt;&lt;/a&gt;: All the different model fields&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ccbv.co.uk/"&gt;&lt;strong&gt;Classy Class Based Views&lt;/strong&gt;&lt;/a&gt;: Detailed descriptions for each of Django's class-based views&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I don't have any of these pages bookmarked; I just google for them and then search using &lt;code&gt;ctrl-f&lt;/code&gt; to find what I need in seconds.&lt;/p&gt;

&lt;p&gt;When using Django REST Framework I often find myself referring to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="http://www.cdrf.co/"&gt;&lt;strong&gt;Classy DRF&lt;/strong&gt;&lt;/a&gt;: Like Classy Class Based Views but for DRF&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.django-rest-framework.org/api-guide/serializers/"&gt;&lt;strong&gt;Serializer reference&lt;/strong&gt;&lt;/a&gt;: To make serializers work&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.django-rest-framework.org/api-guide/fields/"&gt;&lt;strong&gt;Serializer field reference&lt;/strong&gt;&lt;/a&gt;: All the different serializer fields&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.django-rest-framework.org/api-guide/relations/#nested-relationships"&gt;&lt;strong&gt;Nested relationships&lt;/strong&gt;&lt;/a&gt;: How to put serializers &lt;a href="https://mattsegal.dev/img/xzibit.png"&gt;inside of other serializers&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Search instead of reading
&lt;/h2&gt;

&lt;p&gt;Most documentation is not meant to be read linearly, from start to end, like a novel: most pages are too long to read. Instead, you should strategically search for what you want. Most documentation involves big lists of things, because there's so much stuff that the authors need to explain in detail. You cannot rely on brute-force reading all the content to find the info you need.&lt;/p&gt;

&lt;p&gt;You can use your browser's built-in text search feature (&lt;code&gt;ctrl-f&lt;/code&gt;) to quickly find the text that you need. This will save you a lot of scrolling and squinting at your screen. I use this technique all the time when browsing the Django docs. &lt;a href="https://www.loom.com/share/cc4b030513b0406c91a1eadcd08514a2"&gt;Here's a video&lt;/a&gt; of me finding out how to log in with Django using &lt;code&gt;ctrl-f&lt;/code&gt;. &lt;a href="https://www.loom.com/share/1be42c1709334817ab3cb055ad8acf69"&gt;Here's me struggling&lt;/a&gt; to get past the first list by trying to read all the words with my pathetic human eyes. I genuinely did miss the "auth" section several times when trying to read that list manually while writing this post.&lt;/p&gt;

&lt;p&gt;Using search is how you navigate the enormous &lt;a href="https://docs.djangoproject.com/en/3.0/contents/"&gt;table of contents&lt;/a&gt; or the &lt;a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/"&gt;39 browser pages of authentication overview&lt;/a&gt;. You're not supposed to read all that stuff, you're supposed to strategically search it. In our login example, good search terms would be "auth", "login", "log in" and "user".&lt;/p&gt;
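
&lt;p&gt;The same "strategic search" can be mimicked in a few lines of Python, filtering a made-up, abbreviated table of contents against the login example's search terms, just like &lt;code&gt;ctrl-f&lt;/code&gt; filters a page:&lt;/p&gt;

```python
# Illustrative only: filter a made-up table of contents against
# search terms, the way ctrl-f narrows down a long docs page.
toc = [
    "Models and databases",
    "Handling HTTP requests",
    "User authentication in Django",
    "Forms",
    "Templates",
]
terms = ["auth", "login", "log in", "user"]
hits = [title for title in toc if any(t in title.lower() for t in terms)]
print(hits)  # -> ['User authentication in Django']
```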

&lt;p&gt;In addition, most really long pages will have a sidebar summarising all the content. If you're going to read something, read that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0A6QiRRQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/docs-sidebar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0A6QiRRQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/img/docs-sidebar.png" alt="django sections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Read the source code
&lt;/h2&gt;

&lt;p&gt;This is kind of the documentation equivalent of "go fuck yourself", but when you need an answer and the documentation doesn't have it, then the code is the authoritative source on how the library works. There are many library details that would be too laborious to document in full, and at some point the expectation is that if you &lt;em&gt;really need to know&lt;/em&gt; how something works, then you should try reading the code. The &lt;a href="https://github.com/django/django"&gt;Django source code&lt;/a&gt; is pretty well written, and the more time you spend immersed in it, the easier it will be to navigate. This isn't really advice for beginners, but if you're feeling brave, then give it a try.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The Django docs, in my opinion, really are quite good, but like most code docs, they're hard for beginners to navigate. I hope that these tips will make learning Django a more enjoyable experience for you. To summarise my tips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify the different sections of the documentation&lt;/li&gt;
&lt;li&gt;Do the tutorial first if you're not feeling confident, or at least skim-read it&lt;/li&gt;
&lt;li&gt;Avoid the API reference early on&lt;/li&gt;
&lt;li&gt;Try to find a how-to guide for your problem&lt;/li&gt;
&lt;li&gt;Try to find an overview and explanation of your topic&lt;/li&gt;
&lt;li&gt;Remember key references for quick lookup later&lt;/li&gt;
&lt;li&gt;Search the docs, don't read them like a book&lt;/li&gt;
&lt;li&gt;Read the source code if you're desperate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As good as it is, the Django docs do not, and should not, tell you everything there is to know about how to use Django. At some point, you will need to turn to Django community blogs like &lt;a href="https://simpleisbetterthancomplex.com/"&gt;Simple is Better than Complex&lt;/a&gt;, YouTube videos, courses and books. When you need to deploy your Django app, you might enjoy my guide on &lt;a href="https://mattsegal.dev/simple-django-deployment.html"&gt;Django deployment&lt;/a&gt; and my overview of &lt;a href="https://mattsegal.dev/django-prod-architectures.html"&gt;Django server setups&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>django</category>
      <category>python</category>
    </item>
    <item>
      <title>How to pull production data into your local Postgres database</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Sun, 21 Jun 2020 10:47:47 +0000</pubDate>
      <link>https://dev.to/mattdsegal/how-to-pull-production-data-into-your-local-postgres-database-277f</link>
      <guid>https://dev.to/mattdsegal/how-to-pull-production-data-into-your-local-postgres-database-277f</guid>
      <description>&lt;p&gt;Sometimes you want to write a feature for your Django app that requires a lot of structured data that already exists in production. This happened to me recently: I needed to create a reporting tool for internal business users. The problem was that I didn't have much data in my local database. How can I see what my reports will look like if I don't have any data?&lt;/p&gt;

&lt;p&gt;It's possible to generate a bunch of fake data using a management command. I've written earlier about &lt;a href="https://mattsegal.dev/django-factoryboy-dummy-data.html"&gt;how to do this with FactoryBoy&lt;/a&gt;. This approach is great for filling web pages with dummy content, but it's tedious if your data is highly structured and follows a bunch of implicit rules. In the case of my reporting tool, the data I wanted involved hundreds of form submissions, and each submission has dozens of answers with many different data types. Writing a script to generate data like this would have taken ages! I've also seen situations like this when working with billing systems and online stores with many product categories.&lt;/p&gt;

&lt;p&gt;Wouldn't it be nice if we could just get a copy of our production data and use that for local development? You could just pull the latest data from prod and work on your feature with the confidence that you have plenty of data that is structured correctly.&lt;/p&gt;

&lt;p&gt;In this post I'll show you a script which you can use to fetch a Postgres database backup from cloud storage and use it to populate your local Postgres database with prod data. This post builds on three previous posts of mine, which you might want to read if you can't follow the scripting in this post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mattsegal.dev/reset-django-local-database.html"&gt;How to automatically reset your local Django database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mattsegal.dev/postgres-backup-and-restore.html"&gt;How to backup and restore a Postgres database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mattsegal.dev/postgres-backup-automate.html"&gt;How to automate your Postgres database backups&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm going to do all of my scripting in bash, but it's also possible to write similar scripts in PowerShell, with only a few tweaks to the syntax.&lt;/p&gt;

&lt;h3&gt;
  
  
  Starting script
&lt;/h3&gt;

&lt;p&gt;Let's start with the "database reset" bash script from my &lt;a href="https://mattsegal.dev/reset-django-local-database.html"&gt;previous post&lt;/a&gt;. This script resets your local database, runs migrations and creates a local superuser for you to use. We're going to extend this script with an additional step to download and restore from our latest database backup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Resets the local Django database, adding an admin login and migrations&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Resetting the database"&lt;/span&gt;
./manage.py reset_db &lt;span class="nt"&gt;--close-sessions&lt;/span&gt; &lt;span class="nt"&gt;--noinput&lt;/span&gt;

&lt;span class="c"&gt;# =========================================&lt;/span&gt;
&lt;span class="c"&gt;# DOWNLOAD AND RESTORE DATABASE BACKUP HERE&lt;/span&gt;
&lt;span class="c"&gt;# =========================================&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Running migrations"&lt;/span&gt;
./manage.py migrate

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Creating new superuser 'admin'"&lt;/span&gt;
./manage.py createsuperuser &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--username&lt;/span&gt; admin &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--email&lt;/span&gt; admin@example.com &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--noinput&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Setting superuser 'admin' password to 12345"&lt;/span&gt;
./manage.py shell_plus &lt;span class="nt"&gt;--quiet-load&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
u=User.objects.get(username='admin')
u.set_password('12345')
u.save()
"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Database restore finished."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Fetching the latest database backup
&lt;/h3&gt;

&lt;p&gt;Now that we have a base script to work with, we need to fetch the latest database backup. I'm going to assume that you've followed my guide on &lt;a href="https://mattsegal.dev/postgres-backup-automate.html"&gt;automating your Postgres database backups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's say your database is saved in an AWS S3 bucket called &lt;code&gt;mydatabase-backups&lt;/code&gt;, and you've saved your backups with a timestamp in the filename, like &lt;code&gt;postgres_mydatabase_1592731247.pgdump&lt;/code&gt;. Using these two facts we can use a little bit of bash scripting to find the name of the latest backup from our S3 bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find the latest backup file&lt;/span&gt;
&lt;span class="nv"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;s3://mydatabase-backups
&lt;span class="nv"&gt;LATEST_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $4}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Found file &lt;/span&gt;&lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;&lt;span class="s2"&gt; in bucket &lt;/span&gt;&lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
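
&lt;p&gt;If the shell pipeline looks opaque, here's the same "pick the newest file" idea sketched in plain Python, using hypothetical filenames. A plain lexicographic sort works because every filename embeds a fixed-width unix timestamp:&lt;/p&gt;

```python
# Same selection logic as `sort | tail -n 1` above, in Python.
# The filenames are hypothetical examples with embedded unix timestamps.
backups = [
    "postgres_mydatabase_1592644847.pgdump",
    "postgres_mydatabase_1592558447.pgdump",
    "postgres_mydatabase_1592731247.pgdump",
]
# Timestamps are fixed-width, so sorting the strings sorts by time.
latest = sorted(backups)[-1]
print(latest)  # -> postgres_mydatabase_1592731247.pgdump
```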



&lt;p&gt;Once you know the name of the latest backup file, you can download it to the current directory with the &lt;code&gt;aws&lt;/code&gt; CLI tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download the latest backup file&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LATEST_FILE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.&lt;/code&gt; in this case refers to the current directory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Restoring from the latest backup
&lt;/h3&gt;

&lt;p&gt;Now that you've downloaded the backup file, you can apply it to your local database with &lt;code&gt;pg_restore&lt;/code&gt;. You may need to install a Postgres client on your local machine to get access to this tool. Assuming your local Postgres credentials aren't a secret, you can just hardcode them into the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_restore &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--clean&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--dbname&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--host&lt;/span&gt; localhost &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; 5432 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--username&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--no-owner&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;In this case we use &lt;code&gt;--clean&lt;/code&gt; to remove any existing data and we use &lt;code&gt;--no-owner&lt;/code&gt; to ignore any commands that set ownership of objects in the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Look ma, no files!
&lt;/h3&gt;

&lt;p&gt;You don't have to save your backup file to disk before you use it to restore your local database: you can stream the data directly from &lt;code&gt;aws s3 cp&lt;/code&gt; to &lt;code&gt;pg_restore&lt;/code&gt; using pipes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LATEST_FILE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; - | &lt;span class="se"&gt;\&lt;/span&gt;
    pg_restore &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--clean&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--dbname&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--host&lt;/span&gt; localhost &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--port&lt;/span&gt; 5432 &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--username&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--no-owner&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-&lt;/code&gt; in this case means "stream to stdout", which we use so that we can pipe the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final script
&lt;/h3&gt;

&lt;p&gt;Here's the whole thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Resets the local Django database,&lt;/span&gt;
&lt;span class="c"&gt;# restores from latest prod backup,&lt;/span&gt;
&lt;span class="c"&gt;# and adds an admin login and migrations&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Resetting the database"&lt;/span&gt;
./manage.py reset_db &lt;span class="nt"&gt;--close-sessions&lt;/span&gt; &lt;span class="nt"&gt;--noinput&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Restoring database from S3 backups"&lt;/span&gt;
&lt;span class="nv"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;s3://mydatabase-backups
&lt;span class="nv"&gt;LATEST_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $4}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1&lt;span class="si"&gt;)&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LATEST_FILE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; - | &lt;span class="se"&gt;\&lt;/span&gt;
    pg_restore &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--clean&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--dbname&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--host&lt;/span&gt; localhost &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--port&lt;/span&gt; 5432 &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--username&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--no-owner&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Running migrations"&lt;/span&gt;
./manage.py migrate

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Creating new superuser 'admin'"&lt;/span&gt;
./manage.py createsuperuser &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--username&lt;/span&gt; admin &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--email&lt;/span&gt; admin@example.com &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--noinput&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Setting superuser 'admin' password to 12345"&lt;/span&gt;
./manage.py shell_plus &lt;span class="nt"&gt;--quiet-load&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
u=User.objects.get(username='admin')
u.set_password('12345')
u.save()
"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Database restore finished."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You should be able to run this script over and over to get the latest database backup working on your local machine.&lt;/p&gt;
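&lt;p&gt;The trickiest line in that script is the one that picks &lt;code&gt;LATEST_FILE&lt;/code&gt;. It works because the backup filenames embed a fixed-width Unix timestamp, so a plain lexical &lt;code&gt;sort&lt;/code&gt; orders them chronologically. Here's a minimal, self-contained sketch of that pipeline, using a made-up &lt;code&gt;aws s3 ls&lt;/code&gt; listing rather than a real bucket:&lt;/p&gt;

```shell
# Simulated output of `aws s3 ls s3://mydatabase-backups`:
# date, time, size, filename (the filename is the 4th column).
listing="2020-06-01 00:00:03  52344 postgres_mydb_1590969600.pgdump
2020-06-02 00:00:02  52391 postgres_mydb_1591056000.pgdump
2020-06-03 00:00:04  52401 postgres_mydb_1591142400.pgdump"

# Same pipeline as the restore script: take the 4th column,
# sort lexically (safe here because the timestamps are all the
# same width), and keep the last entry.
latest=$(echo "$listing" | awk '{print $4}' | sort | tail -n 1)
echo "$latest"
```

If your backup filenames used a human-readable date instead of a timestamp, you would want a format that also sorts lexically, such as &lt;code&gt;YYYY-MM-DD&lt;/code&gt;.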

&lt;h3&gt;
  
  
  Other considerations
&lt;/h3&gt;

&lt;p&gt;When using production backups locally, there are two important points to keep in mind.&lt;/p&gt;

&lt;p&gt;First, production data can contain sensitive user information, including names, addresses, emails and even credit card details. You need to ensure that this data is only distributed to people who are authorised to access it, or alternatively sanitize the backups so that the sensitive data is overwritten or removed.&lt;/p&gt;
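&lt;p&gt;As an illustration of the sanitization option, here's a rough sketch of scrubbing email addresses from a plain-text dump with &lt;code&gt;sed&lt;/code&gt;. This is just to show the idea: in a real Django app you'd more likely do this inside &lt;code&gt;shell_plus&lt;/code&gt; against the actual models, and you'd need to cover names, addresses and payment details too:&lt;/p&gt;

```shell
# Replace anything that looks like an email address with a
# safe placeholder. Illustrative only: a real sanitization pass
# needs to handle much more than emails.
sanitize_emails() {
    sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/redacted@example.com/g'
}

echo "contact: jane.doe@gmail.com" | sanitize_emails
# prints: contact: redacted@example.com
```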

&lt;p&gt;Second, it's possible to use database backups to debug issues in production. I think it's a great method for squashing hard-to-reproduce bugs, but it shouldn't be your only way to solve production errors. Before you move on to this technique, you should first ensure you have &lt;a href="https://mattsegal.dev/file-logging-django.html"&gt;application logging&lt;/a&gt; and &lt;a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html"&gt;error monitoring&lt;/a&gt; set up for your Django app, so that you don't lean on your backups as a crutch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next steps
&lt;/h3&gt;

&lt;p&gt;If you don't already have automated prod backups, I encourage you to set that up if you have any valuable data in your Django app. Once that's done, you'll be able to use this script to pull down prod data into your local dev environment on demand.&lt;/p&gt;

</description>
      <category>django</category>
      <category>postgres</category>
      <category>bash</category>
      <category>database</category>
    </item>
    <item>
      <title>How to automatically reset your local Django database</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Sun, 21 Jun 2020 10:45:40 +0000</pubDate>
      <link>https://dev.to/mattdsegal/how-to-automatically-reset-your-local-django-database-354b</link>
      <guid>https://dev.to/mattdsegal/how-to-automatically-reset-your-local-django-database-354b</guid>
      <description>&lt;p&gt;Sometimes when you're working on a Django app you want a fresh start. You want to nuke all of the data in your local database and start again from scratch. Maybe you ran some migrations that you don't want to keep, or perhaps there's some test data that you want to get rid of. This kind of problem doesn't crop up very often, but when it does it's &lt;em&gt;super&lt;/em&gt; annoying to do it manually over and over.&lt;/p&gt;

&lt;p&gt;In this post I'll show you a small script that you can use to reset your local Django database. It completely automates deleting the old data, running migrations and setting up new users. I've written the script in &lt;code&gt;bash&lt;/code&gt; but most of it will also work in &lt;code&gt;powershell&lt;/code&gt; or &lt;code&gt;cmd&lt;/code&gt; with only minor changes.&lt;/p&gt;

&lt;p&gt;For those of you who hate reading, the full script is near the bottom.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resetting the database
&lt;/h3&gt;

&lt;p&gt;We're going to reset our local database with the &lt;a href="https://django-extensions.readthedocs.io/en/latest/installation_instructions.html"&gt;django-extensions&lt;/a&gt; package, which provides a nifty little helper command called &lt;code&gt;reset_db&lt;/code&gt;. This command destroys and recreates your Django app's database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py reset_db
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I like to add the &lt;code&gt;--noinput&lt;/code&gt; flag so the script does not ask me for confirmation, and the &lt;code&gt;--close-sessions&lt;/code&gt; flag if I'm using PostgreSQL locally so that the command does not fail if my Django app is connected to the database at the same time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py reset_db &lt;span class="nt"&gt;--noinput&lt;/span&gt; &lt;span class="nt"&gt;--close-sessions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is a good start, but now we have no migrations, users or any other data in our database. We need to add some data back in there before we can start using the app again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running migrations
&lt;/h3&gt;

&lt;p&gt;Before you do anything else it's important to run migrations so that all your database tables are set up correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py migrate
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating an admin user
&lt;/h3&gt;

&lt;p&gt;You want to have a superuser set up so you can log into the Django admin. It's nice when a script guarantees that your superuser always has the same username and password. The first part of creating a superuser is pretty standard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py createsuperuser &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--username&lt;/span&gt; admin &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--email&lt;/span&gt; admin@example.com &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--noinput&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now we want to set the admin user's password to something easy to remember, like "12345". This isn't a security risk because it's just for local development. This step involves a little more scripting trickery. Here we can use &lt;code&gt;shell_plus&lt;/code&gt;, which is an enhanced Django shell provided by django-extensions. The &lt;code&gt;shell_plus&lt;/code&gt; command will automatically import all of our models, which means we can write short one-liners like this one, which prints the number of Users in the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py shell_plus &lt;span class="nt"&gt;--quiet-load&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"print(User.objects.count())"&lt;/span&gt;
&lt;span class="c"&gt;# 13&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Using this method we can grab our admin user and set their password:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py shell_plus &lt;span class="nt"&gt;--quiet-load&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
u = User.objects.get(username='admin')
u.set_password('12345')
u.save()
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Setting up new data
&lt;/h3&gt;

&lt;p&gt;There might be a little bit of data that you want to set up every time you reset your database. For example, in one app I run, I want to ensure that there is always a &lt;code&gt;SlackMessage&lt;/code&gt; model that has a &lt;code&gt;SlackChannel&lt;/code&gt;. We can set up this data in the same way we set up the admin user's password:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py shell_plus &lt;span class="nt"&gt;--quiet-load&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
c = SlackChannel.objects.create(name='Test Alerts')
SlackMessage.objects.create(channel=c)
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If you need to set up a &lt;em&gt;lot&lt;/em&gt; of data then there are options like &lt;a href="https://docs.djangoproject.com/en/3.0/howto/initial-data/"&gt;fixtures&lt;/a&gt; or tools like &lt;a href="https://factoryboy.readthedocs.io/en/latest/"&gt;Factory Boy&lt;/a&gt; (which I heartily recommend). If you only need to do a few lines of scripting to create your data, then you can include them in this script. If your development data setup is very complicated, then I recommend putting all the setup code into a custom management command.&lt;/p&gt;

&lt;h3&gt;
  
  
  The final script
&lt;/h3&gt;

&lt;p&gt;This is the script that you can use to reset your local Django database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Resets the local Django database, adding an admin login and migrations&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Resetting the database"&lt;/span&gt;
./manage.py reset_db &lt;span class="nt"&gt;--close-sessions&lt;/span&gt; &lt;span class="nt"&gt;--noinput&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Running migrations"&lt;/span&gt;
./manage.py migrate

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Creating new superuser 'admin'"&lt;/span&gt;
./manage.py createsuperuser &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--username&lt;/span&gt; admin &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--email&lt;/span&gt; admin@example.com &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--noinput&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Setting superuser 'admin' password to 12345"&lt;/span&gt;
./manage.py shell_plus &lt;span class="nt"&gt;--quiet-load&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
u=User.objects.get(username='admin')
u.set_password('12345')
u.save()
"&lt;/span&gt;

&lt;span class="c"&gt;# Any extra data setup goes here.&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;&amp;gt;&amp;gt; Database restore finished."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Other methods
&lt;/h3&gt;

&lt;p&gt;It's good to note that what I'm proposing is the "nuclear option": purge everything and restart from scratch. There are also some more precise methods available for managing your local database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you just want to reverse some particular migrations, then you can use the &lt;code&gt;migrate&lt;/code&gt; command &lt;a href="https://docs.djangoproject.com/en/3.0/topics/migrations/#reversing-migrations"&gt;as documented here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you just want to delete all your data and you don't care about re-applying the migrations, then the &lt;code&gt;flush&lt;/code&gt; management command, &lt;a href="https://docs.djangoproject.com/en/3.0/ref/django-admin/#flush"&gt;documented here&lt;/a&gt; will take care of that.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Docker environments
&lt;/h3&gt;

&lt;p&gt;If you're running your local Django app in a Docker container via &lt;code&gt;docker-compose&lt;/code&gt;, then this process is a little trickier, but not by much: you just need to add two helper functions to your script.&lt;/p&gt;

&lt;p&gt;First you want a command to kill all running containers, which I do because I'm superstitious and don't trust that &lt;code&gt;reset_db&lt;/code&gt; will actually close all database connections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;function &lt;/span&gt;stop_docker &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Stopping all running Docker containers"&lt;/span&gt;
    &lt;span class="c"&gt;# Ensure that no containers automatically restart&lt;/span&gt;
    docker update &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;no &lt;span class="sb"&gt;`&lt;/span&gt;docker ps &lt;span class="nt"&gt;-q&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;
    &lt;span class="c"&gt;# Kill everything&lt;/span&gt;
    docker &lt;span class="nb"&gt;kill&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;docker ps &lt;span class="nt"&gt;-q&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We also want a shorthand way to run commands inside your docker environment. Let's say you are working with a compose file located at &lt;code&gt;docker/docker-compose.local.yml&lt;/code&gt; and your Django app's container is called &lt;code&gt;web&lt;/code&gt;. Then you can run your commands inside the container as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;function &lt;/span&gt;run_docker &lt;span class="o"&gt;{&lt;/span&gt;
    docker-compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker/docker-compose.local.yml run &lt;span class="nt"&gt;--rm&lt;/span&gt; web &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now we can just prefix &lt;code&gt;run_docker&lt;/code&gt; to all the management commands we run. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Without Docker&lt;/span&gt;
./manage.py reset_db &lt;span class="nt"&gt;--close-sessions&lt;/span&gt; &lt;span class="nt"&gt;--noinput&lt;/span&gt;
&lt;span class="c"&gt;# With Docker&lt;/span&gt;
run_docker ./manage.py reset_db &lt;span class="nt"&gt;--close-sessions&lt;/span&gt; &lt;span class="nt"&gt;--noinput&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I will note that this &lt;code&gt;run_docker&lt;/code&gt; shortcut can act a little strangely when you're passing strings to &lt;code&gt;shell_plus&lt;/code&gt;: bash word-splitting can mangle arguments that contain spaces. Forwarding the arguments as a quoted &lt;code&gt;"$@"&lt;/code&gt; helps, but you might still need to experiment with different methods of escaping whitespace.&lt;/p&gt;
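&lt;p&gt;The weirdness mostly comes down to how bash re-splits unquoted arguments: forwarding them as a bare &lt;code&gt;$@&lt;/code&gt; breaks up any argument containing spaces, while &lt;code&gt;"$@"&lt;/code&gt; preserves each argument exactly. A quick self-contained demonstration:&lt;/p&gt;

```shell
# count_args reports how many arguments it received.
count_args() { echo $#; }

# Forwarding unquoted: word-splitting re-splits "a b" into two args.
forward_unquoted() { count_args $@; }

# Forwarding quoted: each original argument is preserved intact.
forward_quoted() { count_args "$@"; }

forward_unquoted "a b" c   # prints 3
forward_quoted "a b" c     # prints 2
```

This is why a wrapper like &lt;code&gt;run_docker&lt;/code&gt; should forward its arguments as &lt;code&gt;"$@"&lt;/code&gt; if you want multi-word strings to survive the trip into &lt;code&gt;shell_plus -c&lt;/code&gt;.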

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Hopefully this script will save you some time when you're working on your Django app. If you're interested in more Django-related database stuff then you might enjoy reading about how to &lt;a href="https://mattsegal.dev/postgres-backup-and-restore.html"&gt;back up and restore a Postgres database&lt;/a&gt; and then how to &lt;a href="https://mattsegal.dev/postgres-backup-automate.html"&gt;fully automate your prod backup process&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>django</category>
      <category>bash</category>
      <category>database</category>
    </item>
    <item>
      <title>How to automate your Postgres database backups</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Sun, 21 Jun 2020 10:43:16 +0000</pubDate>
      <link>https://dev.to/mattdsegal/how-to-automate-your-postgres-database-backups-54p9</link>
      <guid>https://dev.to/mattdsegal/how-to-automate-your-postgres-database-backups-54p9</guid>
      <description>&lt;p&gt;If you've got a web app running in production, then you'll want to take &lt;a href="https://mattsegal.dev/postgres-backup-and-restore.html"&gt;regular database backups&lt;/a&gt;, or else you risk losing all your data. Taking these backups manually is fine, but it's easy to forget to do it. It's better to remove the chance of human error and automate the whole process. To automate your backup and restore you will need three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A safe place to store your backup files&lt;/li&gt;
&lt;li&gt;A script that creates the backups and uploads them to the safe place&lt;/li&gt;
&lt;li&gt;A method to automatically run the backup script every day&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A safe place for your database backup files
&lt;/h3&gt;

&lt;p&gt;You don't want to store your backup files on the same server as your database. If your database server gets deleted, then you'll lose your backups as well. Instead, you should store your backups somewhere else, like an external hard drive, your PC, or the cloud.&lt;/p&gt;

&lt;p&gt;I like using cloud object storage for this kind of use-case. If you haven't heard of "object storage" before: it's just a kind of cloud service where you can store a bunch of files. All major cloud providers offer this service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon's AWS has the &lt;a href="https://aws.amazon.com/s3/"&gt;Simple Storage Service (S3)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Microsoft's Azure has &lt;a href="https://azure.microsoft.com/en-us/services/storage/"&gt;Storage&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google Cloud also has &lt;a href="https://cloud.google.com/storage"&gt;Storage&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;DigitalOcean has &lt;a href="https://www.digitalocean.com/products/spaces/"&gt;Spaces&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These object storage services are &lt;em&gt;very&lt;/em&gt; cheap at around 2c/GB/month, you'll never run out of disk space, they're easy to access from command line tools and they have very fast upload/download speeds, especially to/from other services hosted with the same cloud provider. I use these services a lot: this blog is being served from AWS S3.&lt;/p&gt;

&lt;p&gt;I like using S3 simply because I'm quite familiar with it, so that's what we're going to use for the rest of this post. If you're not already familiar with using the AWS command-line, then check out this post I wrote about &lt;a href="https://mattsegal.dev/aws-s3-intro.html"&gt;getting started with AWS S3&lt;/a&gt; before you continue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a database backup script
&lt;/h3&gt;

&lt;p&gt;In my &lt;a href="https://mattsegal.dev/postgres-backup-and-restore.html"&gt;previous post on database backups&lt;/a&gt; I showed you a small script to automatically take a backup using PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Backs up mydatabase to a file.&lt;/span&gt;
&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s2"&gt;"+%s"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgres_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.pgdump"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backing up &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt; to &lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
pg_dump &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;custom &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backup completed for &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I'm going to assume you have set up your Postgres database environment variables (&lt;code&gt;PGHOST&lt;/code&gt;, etc) either in the script, or elsewhere, as mentioned in the previous post.&lt;br&gt;
Next we're going to get our script to upload all backups to AWS S3.&lt;/p&gt;
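&lt;p&gt;For reference, those Postgres environment variables look something like this (the values below are placeholders, not real settings or credentials):&lt;/p&gt;

```shell
# Placeholder connection settings read by pg_dump / pg_restore.
# Substitute your own host, database name and credentials.
export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=mydatabase
export PGUSER=postgres
export PGPASSWORD=example-password  # use a secrets store in production
```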
&lt;h3&gt;
  
  
  Uploading backups to AWS Simple Storage Service (S3)
&lt;/h3&gt;

&lt;p&gt;We will be uploading our backups to S3 with the &lt;code&gt;aws&lt;/code&gt; command line (CLI) tool. To get this tool to work, we need to set up our AWS credentials on the server by either using &lt;code&gt;aws configure&lt;/code&gt; or by setting the environment variables &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;. Once that's done we can use &lt;code&gt;aws s3 cp&lt;/code&gt; to upload our backup files. Let's say we're using a bucket called "&lt;code&gt;mydatabase-backups&lt;/code&gt;":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Backs up mydatabase to a file and then uploads it to AWS S3.&lt;/span&gt;
&lt;span class="c"&gt;# First, dump database backup to a file&lt;/span&gt;
&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s2"&gt;"+%s"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgres_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.pgdump"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backing up &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt; to &lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
pg_dump &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;custom &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;

&lt;span class="c"&gt;# Second, copy file to AWS S3&lt;/span&gt;
&lt;span class="nv"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;s3://mydatabase-backups
&lt;span class="nv"&gt;S3_TARGET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt;/&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Copying &lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt; to &lt;/span&gt;&lt;span class="nv"&gt;$S3_TARGET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt; &lt;span class="nv"&gt;$S3_TARGET&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backup completed for &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You should be able to run this multiple times and see a new backup appear in your S3 bucket each time. As a bonus, you can add a little one-liner at the end of your script that prints the most recently uploaded file in the S3 bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="nv"&gt;BACKUP_RESULT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Latest S3 backup: &lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_RESULT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once you're confident that your backup script works, we can move on to getting it to run every day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running cron jobs
&lt;/h3&gt;

&lt;p&gt;Now we need to get our server to run this script every day, even when we're not around. The simplest way to do this on a Linux server is with &lt;a href="https://en.wikipedia.org/wiki/Cron"&gt;cron&lt;/a&gt;. Cron can automatically run scripts for us on a schedule. We'll be using the &lt;code&gt;crontab&lt;/code&gt; tool to set up our backup job.&lt;/p&gt;

&lt;p&gt;You can read more about how to use crontab &lt;a href="https://linuxize.com/post/scheduling-cron-jobs-with-crontab/"&gt;here&lt;/a&gt;. If you find that you're having issues setting up cron, you might also find this &lt;a href="https://serverfault.com/questions/449651/why-is-my-crontab-not-working-and-how-can-i-troubleshoot-it"&gt;StackOverflow post&lt;/a&gt; useful.&lt;/p&gt;

&lt;p&gt;Before we set up our daily database backup job, I suggest trying out a test script to make sure that your cron setup is working. For example, this script prints the current time when it is run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;nano&lt;/code&gt;, you can create a new file called &lt;code&gt;~/test.sh&lt;/code&gt;, save it, then make it executable as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;nano ~/test.sh
&lt;span class="c"&gt;# Write out the time printing script in nano, save the file.&lt;/span&gt;
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ~/test.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then you can test it out a little by running it a couple of times to check that it is printing the time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;~/test.sh
&lt;span class="c"&gt;# Sat Jun  6 08:05:14 UTC 2020&lt;/span&gt;
~/test.sh
&lt;span class="c"&gt;# Sat Jun  6 08:05:14 UTC 2020&lt;/span&gt;
~/test.sh
&lt;span class="c"&gt;# Sat Jun  6 08:05:14 UTC 2020&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once you're confident that your test script works, you can create a cron job to run it every minute. Cron uses a special syntax to specify how often a job runs. These "cron expressions" are a pain to write by hand, so I use &lt;a href="https://crontab.cronhub.io/"&gt;this tool&lt;/a&gt; to generate them. The cron expression for "every minute" is the inscrutable string "&lt;code&gt;* * * * *&lt;/code&gt;". This is the crontab entry that we're going to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Test crontab entry&lt;/span&gt;
&lt;span class="nv"&gt;SHELL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/bin/bash
&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; ~/test.sh &amp;amp;&amp;gt;&amp;gt; ~/time.log
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;SHELL&lt;/code&gt; setting tells crontab to use bash to execute our command&lt;/li&gt;
&lt;li&gt;The "&lt;code&gt;* * * * *&lt;/code&gt;" entry tells cron to execute our command every minute&lt;/li&gt;
&lt;li&gt;The command &lt;code&gt;~/test.sh &amp;amp;&amp;gt;&amp;gt; ~/time.log&lt;/code&gt; runs our test script &lt;code&gt;~/test.sh&lt;/code&gt; and then appends all output to a log file called &lt;code&gt;~/time.log&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
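&lt;p&gt;If you'd rather decode cron expressions yourself than use a generator, this is how the five fields break down, along with a few common schedules:&lt;/p&gt;

```shell
# Field order in a cron expression (all five fields are required):
#
#  * * * * *
#  | | | | |
#  | | | | +-- day of week (0-6, Sunday = 0)
#  | | | +---- month (1-12)
#  | | +------ day of month (1-31)
#  | +-------- hour (0-23)
#  +---------- minute (0-59)
#
# Example schedules:
#   * * * * *      every minute
#   0 0 * * *      every day at midnight
#   0 3 * * 0      every Sunday at 3am
#   */15 * * * *   every 15 minutes
```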

&lt;p&gt;Enter the text above into your user's crontab file using the crontab editor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;crontab &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once you've saved your entry, you should then be able to view your crontab entry using the list command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;crontab &lt;span class="nt"&gt;-l&lt;/span&gt;
&lt;span class="c"&gt;# SHELL=/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# * * * * * ~/test.sh &amp;amp;&amp;gt;&amp;gt; ~/time.log&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You can check that cron is actually trying to run your script by watching the system log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /var/log/syslog | &lt;span class="nb"&gt;grep &lt;/span&gt;CRON
&lt;span class="c"&gt;# Jun  6 11:17:01 swarm CRON[6908]: (root) CMD (~/test.sh &amp;amp;&amp;gt;&amp;gt; ~/time.log)&lt;/span&gt;
&lt;span class="c"&gt;# Jun  6 11:17:01 swarm CRON[6908]: (root) CMD (~/test.sh &amp;amp;&amp;gt;&amp;gt; ~/time.log)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You can also watch your logfile to see that time is being written every minute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; time.log
&lt;span class="c"&gt;# Sat Jun 6 11:34:01 UTC 2020&lt;/span&gt;
&lt;span class="c"&gt;# Sat Jun 6 11:35:01 UTC 2020&lt;/span&gt;
&lt;span class="c"&gt;# Sat Jun 6 11:36:01 UTC 2020&lt;/span&gt;
&lt;span class="c"&gt;# Sat Jun 6 11:37:01 UTC 2020&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once you're happy that you can run a test script every minute with cron, we can move on to running your database backup script daily.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running our backup script daily
&lt;/h3&gt;

&lt;p&gt;Now we're nearly ready to run our backup script using a cron job. There are a few changes that we'll need to make to our existing setup. First we need to write our database backup script to &lt;code&gt;~/backup.sh&lt;/code&gt; and make sure it is executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ~/backup.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
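&lt;p&gt;If you'd like to do this in one step, here's a sketch that writes the script with a heredoc and marks it executable. The contents are based on the backup script from my previous post; the bucket name &lt;code&gt;mydatabase-backups&lt;/code&gt; is an example, and the Postgres environment variables are assumed to be exported elsewhere:&lt;/p&gt;

```shell
# Write a minimal ~/backup.sh in one go (a sketch, not the definitive script).
# The quoted 'EOF' stops the shell expanding $TIME etc. while writing the file.
cat > ~/backup.sh <<'EOF'
#!/bin/bash
TIME=$(date "+%s")
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
echo "Backing up $PGDATABASE to $BACKUP_FILE"
pg_dump --format=custom > "$BACKUP_FILE"
aws s3 cp "$BACKUP_FILE" s3://mydatabase-backups/
echo "Backup completed"
EOF
chmod +x ~/backup.sh
```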



&lt;p&gt;Then we need our crontab entry to run every day, which will be "&lt;a href="https://crontab.cronhub.io/"&gt;&lt;code&gt;0 0 * * *&lt;/code&gt;&lt;/a&gt;", and we need to update our cron command to run our backup script. Our new crontab entry should be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Database backup crontab entry&lt;/span&gt;
&lt;span class="nv"&gt;SHELL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/bin/bash
0 0 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; ~/backup.sh &amp;amp;&amp;gt;&amp;gt; ~/backup.log
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Update your crontab with &lt;code&gt;crontab -e&lt;/code&gt;. Now we wait! This script should run every night at midnight (server time) to take your database backups and upload them to AWS S3. If this isn't working, then change your cron expression so that it runs the script every minute, and use the steps I showed above to try to debug the problem.&lt;/p&gt;
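&lt;p&gt;If you'd rather not use the interactive editor, you can also install the entry non-interactively. This is a sketch: it appends our entry to whatever crontab already exists, then installs the result (the &lt;code&gt;command -v&lt;/code&gt; guard just skips the install on machines without cron):&lt;/p&gt;

```shell
# Non-interactive alternative to `crontab -e`: append our entry to the
# existing crontab (if any) and install the combined result.
ENTRY='0 0 * * * ~/backup.sh &>> ~/backup.log'
if command -v crontab >/dev/null; then
  (crontab -l 2>/dev/null; echo "SHELL=/bin/bash"; echo "$ENTRY") | crontab -
  crontab -l   # check the result
fi
```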

&lt;p&gt;Hopefully it all runs OK and you will have plenty of daily database backups to roll back to if anything ever goes wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automatic restore from the latest backup
&lt;/h3&gt;

&lt;p&gt;When disaster strikes and you need your backups, you could manually view your S3 bucket, download the backup file, upload it to the server and manually run the restore, which I documented in my &lt;a href="https://mattsegal.dev/postgres-backup-and-restore.html"&gt;previous post&lt;/a&gt;. This is totally fine, but as a bonus I thought it would be nice to include a script that automatically downloads the latest backup file and uses it to restore your database. This kind of script is ideal for dumping production data into a test server. First I'll show you the script, then I'll explain how it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Restoring database &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt; from S3 backups"&lt;/span&gt;

&lt;span class="c"&gt;# Find the latest backup file&lt;/span&gt;
&lt;span class="nv"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;s3://mydatabase-backups
&lt;span class="nv"&gt;LATEST_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $4}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Found file &lt;/span&gt;&lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;&lt;span class="s2"&gt; in bucket &lt;/span&gt;&lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Restore from the latest backup file&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Restoring &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt; from &lt;/span&gt;&lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;S3_TARGET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt;/&lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nv"&gt;$S3_TARGET&lt;/span&gt; - | pg_restore &lt;span class="nt"&gt;--dbname&lt;/span&gt; &lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt; &lt;span class="nt"&gt;--clean&lt;/span&gt; &lt;span class="nt"&gt;--no-owner&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Restore completed"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I've assumed that all the Postgres environment variables (&lt;code&gt;PGHOST&lt;/code&gt;, etc) are already set elsewhere.&lt;/p&gt;

&lt;p&gt;There are three tasks that are done in this script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;finding the latest backup file in S3&lt;/li&gt;
&lt;li&gt;downloading the backup file&lt;/li&gt;
&lt;li&gt;restoring from the backup file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the first part of this script finds the latest database backup file. We know which file is the latest because of the Unix timestamp that we added to each filename. The first command we use is &lt;code&gt;aws s3 ls&lt;/code&gt;, which shows us all the files in our backup bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt;
&lt;span class="c"&gt;# 2019-04-04 10:04:58     112309 postgres_mydatabase_1554372295.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# 2019-04-06 07:48:53     112622 postgres_mydatabase_1554536929.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# 2019-04-14 07:24:02     113484 postgres_mydatabase_1555226638.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# 2019-05-06 11:37:39     115805 postgres_mydatabase_1557142655.pgdump&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We then use &lt;code&gt;awk&lt;/code&gt; to isolate the filename. &lt;code&gt;awk&lt;/code&gt; is a text processing tool which I use occasionally, along with &lt;code&gt;cut&lt;/code&gt; and &lt;code&gt;sed&lt;/code&gt;, to mangle streams of text into the shape I want. I hate them all, but they can be useful.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $4}'&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1554372295.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1554536929.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1555226638.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1557142655.pgdump&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We then run &lt;code&gt;sort&lt;/code&gt; over this output to ensure that the lines are ordered by timestamp. The aws CLI tool seems to return this data sorted by upload time, but we want to use &lt;em&gt;our&lt;/em&gt; timestamp, just in case a file was manually uploaded out-of-order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $4}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1554372295.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1554536929.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1555226638.pgdump&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1557142655.pgdump&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
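&lt;p&gt;One thing to note: plain &lt;code&gt;sort&lt;/code&gt; compares lines as text, which only matches chronological order here because the epoch timestamps all have the same number of digits (10 digits until the year 2286). A quick sanity check with some out-of-order filenames:&lt;/p&gt;

```shell
# Lexicographic sort matches chronological order while the embedded
# timestamps are all the same width (10-digit epoch seconds).
LATEST=$(printf '%s\n' \
  postgres_mydatabase_1557142655.pgdump \
  postgres_mydatabase_1554372295.pgdump \
  postgres_mydatabase_1554536929.pgdump \
  | sort | tail -n 1)
echo "$LATEST"
# postgres_mydatabase_1557142655.pgdump
```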



&lt;p&gt;We use &lt;code&gt;tail&lt;/code&gt; to grab the last line of the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $4}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1
&lt;span class="c"&gt;# postgres_mydatabase_1557142655.pgdump&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And there's our filename! We use the &lt;code&gt;$()&lt;/code&gt; &lt;a href="http://www.tldp.org/LDP/abs/html/commandsub.html"&gt;command substitution&lt;/a&gt; thingy to capture the command output and store it in a variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LATEST_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $4}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;
&lt;span class="c"&gt;# postgres_mydatabase_1557142655.pgdump&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And that's part one of our script done: find the latest backup file. Now we need to download that file and use it to restore our database. We use the &lt;code&gt;aws&lt;/code&gt; CLI to copy the backup file from S3 and stream the bytes to stdout. This literally prints your whole backup file into the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;S3_TARGET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt;/&lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nv"&gt;$S3_TARGET&lt;/span&gt; -
&lt;span class="c"&gt;# xtshirt9.5.199.5.19k0ENCODINENCODING&lt;/span&gt;
&lt;span class="c"&gt;# SET client_encoding = 'UTF8';&lt;/span&gt;
&lt;span class="c"&gt;# false00&lt;/span&gt;
&lt;span class="c"&gt;# ... etc ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-&lt;/code&gt; symbol is commonly used in shell scripting to mean "write to stdout". This isn't very useful on its own, but we can send that data to the &lt;code&gt;pg_restore&lt;/code&gt; command via a pipe:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;S3_TARGET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$S3_BUCKET&lt;/span&gt;/&lt;span class="nv"&gt;$LATEST_FILE&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nv"&gt;$S3_TARGET&lt;/span&gt; - | pg_restore &lt;span class="nt"&gt;--dbname&lt;/span&gt; &lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt; &lt;span class="nt"&gt;--clean&lt;/span&gt; &lt;span class="nt"&gt;--no-owner&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And that's the whole script!&lt;/p&gt;

&lt;h3&gt;
  
  
  Next steps
&lt;/h3&gt;

&lt;p&gt;Now you can set up automated backups for your Postgres database. Hopefully having these daily backups will take a weight off your mind. Don't forget to do a test restore every now and then, because backups are worthless if you aren't confident that they actually work.&lt;/p&gt;

&lt;p&gt;If you want to learn more about the Unix shell tools I used in this post, then I recommend having a go at the &lt;a href="https://overthewire.org/"&gt;OverTheWire wargames&lt;/a&gt;, which teach you about bash scripting and hacking at the same time.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>django</category>
      <category>bash</category>
      <category>database</category>
    </item>
    <item>
      <title>How to backup and restore a Postgres database</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Sun, 21 Jun 2020 10:38:44 +0000</pubDate>
      <link>https://dev.to/mattdsegal/how-to-backup-and-restore-a-postgres-database-44o0</link>
      <guid>https://dev.to/mattdsegal/how-to-backup-and-restore-a-postgres-database-44o0</guid>
      <description>&lt;p&gt;You've deployed your Django web app to to the internet. Grats! Now you have a fun new problem: your app's database is full of precious "live" data, and if you lose that data, it's gone forever. If your database gets blown away or corrupted, then you will need backups to restore your data. This post will go over how to backup and restore PostgreSQL, which is the database most commonly deployed with Django.&lt;/p&gt;

&lt;p&gt;Not everyone needs backups. If your Django app is just a hobby project then losing all your data might not be such a big deal. That said, if your app is a critical part of a business, then losing your app's data could literally mean the end of the business: people losing their jobs and going bankrupt. So, at least some of the time, you don't want to lose all your data.&lt;/p&gt;

&lt;p&gt;The good news is that backing up and restoring Postgres is pretty easy: you only need two commands, &lt;code&gt;pg_dump&lt;/code&gt; and &lt;code&gt;pg_restore&lt;/code&gt;. If you're using MySQL instead of Postgres, then you can do something very similar to the instructions in this post using &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html"&gt;&lt;code&gt;mysqldump&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Taking database backups
&lt;/h3&gt;

&lt;p&gt;I'm going to assume that you've already got a Postgres database running somewhere. You'll need to run the following code from a &lt;code&gt;bash&lt;/code&gt; shell on a Linux machine that can access the database. In this example, let's say you're logged into the database server with &lt;code&gt;ssh&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The first thing to do is set some &lt;a href="https://www.postgresql.org/docs/current/libpq-envars.html"&gt;Postgres-specific environment variables&lt;/a&gt; to specify your target database and login credentials. This is mostly for our convenience later on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The server Postgres is running on&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGHOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost
&lt;span class="c"&gt;# The port Postgres is listening on&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;span class="c"&gt;# The database you want to back up&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydatabase
&lt;span class="c"&gt;# The database user you are logging in as&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGUSER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myusername
&lt;span class="c"&gt;# The database user's password&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypassw0rd
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You can test these environment variables by running a &lt;a href="https://www.postgresql.org/docs/current/app-psql.html"&gt;&lt;code&gt;psql&lt;/code&gt;&lt;/a&gt; command to list all the tables in your app's database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s2"&gt;t"&lt;/span&gt;

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# List of relations&lt;/span&gt;
&lt;span class="c"&gt;# Schema | Name          | Type  | Owner&lt;/span&gt;
&lt;span class="c"&gt;#--------+---------------+-------+--------&lt;/span&gt;
&lt;span class="c"&gt;# public | auth_group    | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# public | auth_group... | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# public | auth_permi... | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# public | django_adm... | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# .. etc ..&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;psql&lt;/code&gt; is missing you can install it on Ubuntu or Debian using &lt;code&gt;apt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;postgresql-client
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now we're ready to create a database dump with &lt;a href="https://www.postgresql.org/docs/12/app-pgdump.html"&gt;&lt;code&gt;pg_dump&lt;/code&gt;&lt;/a&gt;. It's pretty simple to use because we set up those environment variables earlier. When you run &lt;code&gt;pg_dump&lt;/code&gt;, it just spits out a bunch of SQL statements as hundreds or even thousands of lines of text. You can take a look at the output using &lt;code&gt;head&lt;/code&gt; to view the first 10 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_dump | &lt;span class="nb"&gt;head&lt;/span&gt;

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# --&lt;/span&gt;
&lt;span class="c"&gt;# -- PostgreSQL database dump&lt;/span&gt;
&lt;span class="c"&gt;# --&lt;/span&gt;
&lt;span class="c"&gt;# -- Dumped from database version 9.5.19&lt;/span&gt;
&lt;span class="c"&gt;# -- Dumped by pg_dump version 9.5.19&lt;/span&gt;
&lt;span class="c"&gt;# SET statement_timeout = 0;&lt;/span&gt;
&lt;span class="c"&gt;# SET lock_timeout = 0;&lt;/span&gt;
&lt;span class="c"&gt;# SET client_encoding = 'UTF8';&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The SQL statements produced by &lt;code&gt;pg_dump&lt;/code&gt; are instructions on how to re-create your database. You can turn this output into a backup by writing all this SQL text into a file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_dump &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; mybackup.sql
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;That's it! You now have a database backup. You might have noticed that storing all your data as SQL statements is rather inefficient. You can compress this data by using the "custom" dump format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_dump &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;custom &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; mybackup.pgdump
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This "custom" format is ~3x smaller in terms of file size, but it's not as pretty for humans to read because it's now in some funky non-text binary format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_dump &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;custom | &lt;span class="nb"&gt;head&lt;/span&gt;

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# xtshirt9.5.199.5.19k0ENCODINENCODING&lt;/span&gt;
&lt;span class="c"&gt;# SET client_encoding = 'UTF8';&lt;/span&gt;
&lt;span class="c"&gt;# false00&lt;/span&gt;
&lt;span class="c"&gt;# ... etc ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Finally, &lt;code&gt;mybackup.pgdump&lt;/code&gt; is a crappy file name. It's not clear what is inside the file. Are we going to remember which database this is for? How do we know that this is the freshest copy? Let's add a &lt;a href="https://en.wikipedia.org/wiki/Unix_time"&gt;timestamp&lt;/a&gt; plus a descriptive name to help us remember:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get Unix epoch timestamp&lt;/span&gt;
&lt;span class="c"&gt;# Eg. 1591255548&lt;/span&gt;
&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s2"&gt;"+%s"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;# Descriptive file name&lt;/span&gt;
&lt;span class="c"&gt;# Eg. postgres_mydatabase_1591255548.pgdump&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgres_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.pgdump"&lt;/span&gt;
pg_dump &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;custom &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
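&lt;p&gt;If you'd rather have a filename you can read at a glance, an ISO-style UTC timestamp works too. This is just a sketch of an alternative; because every field is zero-padded, these filenames still sort chronologically:&lt;/p&gt;

```shell
# Human-readable alternative to the epoch timestamp.
PGDATABASE=mydatabase          # example value; normally exported elsewhere
TIME=$(date -u "+%Y-%m-%d_%H%M%S")
BACKUP_FILE="postgres_${PGDATABASE}_${TIME}.pgdump"
echo "$BACKUP_FILE"
# e.g. postgres_mydatabase_2020-06-06_113401.pgdump
```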



&lt;p&gt;Now you can run these commands every month, week, or day to get a snapshot of your data. If you wanted, you could write this whole thing into a &lt;code&gt;bash&lt;/code&gt; script called &lt;code&gt;backup.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Backs up mydatabase to a file.&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGHOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5432
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mydatabase
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGUSER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;myusername
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mypassw0rd
&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s2"&gt;"+%s"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgres_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PGDATABASE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TIME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.pgdump"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backing up &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt; to &lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
pg_dump &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;custom &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backup completed"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You should avoid hardcoding passwords like I just did above; it's better to pass credentials in as a script argument or environment variable. The file &lt;code&gt;/etc/environment&lt;/code&gt; is a nice place to store these kinds of credentials on a secure server.&lt;/p&gt;
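&lt;p&gt;For example, you could keep the credentials in a &lt;code&gt;KEY=value&lt;/code&gt; file and load them at the top of the script instead of hardcoding them. This is a sketch: a local example file stands in for &lt;code&gt;/etc/environment&lt;/code&gt; (which isn't sourced automatically by cron jobs, so loading it explicitly like this is useful):&lt;/p&gt;

```shell
# Load KEY=value credentials from a file rather than hardcoding them.
cat > pg_credentials <<'EOF'
PGUSER=myusername
PGPASSWORD=mypassw0rd
EOF
set -a                # auto-export every variable assigned while sourcing
. ./pg_credentials
set +a
echo "$PGUSER"
# myusername
```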

&lt;h3&gt;
  
  
  Restoring your database from backups
&lt;/h3&gt;

&lt;p&gt;It's pointless creating backups if you don't know how to use them to restore your data. There are three scenarios that I can think of where you want to run a restore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to set up your database from scratch&lt;/li&gt;
&lt;li&gt;You want to roll back your existing database to a previous point in time&lt;/li&gt;
&lt;li&gt;You want to restore data in your dev environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll go over these scenarios one at a time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Restoring from scratch
&lt;/h3&gt;

&lt;p&gt;Sometimes you can lose the database server and there is nothing left. Maybe you deleted it by accident, thinking it was a different server. Luckily you have your database backup file, and hopefully some &lt;a href="https://mattsegal.dev/intro-config-management.html"&gt;automated configuration management&lt;/a&gt; to help you quickly set the server up again.&lt;/p&gt;

&lt;p&gt;Once you've got the new server provisioned and PostgreSQL installed, you'll need to recreate the database and the user who owns it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; postgres psql &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
    CREATE USER &lt;/span&gt;&lt;span class="nv"&gt;$PGUSER&lt;/span&gt;&lt;span class="sh"&gt; WITH PASSWORD '&lt;/span&gt;&lt;span class="nv"&gt;$PGPASSWORD&lt;/span&gt;&lt;span class="sh"&gt;';
    CREATE DATABASE &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="sh"&gt; WITH OWNER &lt;/span&gt;&lt;span class="nv"&gt;$PGUSER&lt;/span&gt;&lt;span class="sh"&gt;;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Then you can set up the same environment variables that we did earlier (PGHOST, etc.) and then use &lt;a href="https://www.postgresql.org/docs/12/app-pgrestore.html"&gt;&lt;code&gt;pg_restore&lt;/code&gt;&lt;/a&gt; to restore your data.&lt;br&gt;
You'll probably see some warnings and errors, which is normal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres_mydatabase_1591255548.pgdump
pg_restore &lt;span class="nt"&gt;--dbname&lt;/span&gt; &lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# ... lots of errors ...&lt;/span&gt;
&lt;span class="c"&gt;# pg_restore: WARNING:  no privileges were granted for "public"&lt;/span&gt;
&lt;span class="c"&gt;# WARNING: errors ignored on restore: 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I'm not 100% sure what all these errors mean, but I believe they're mostly caused by the restore script trying to modify Postgres objects that your user does not have permission to modify. For a standard Django app this shouldn't be an issue. You can check that the restore actually worked by inspecting your tables with &lt;code&gt;psql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check the tables&lt;/span&gt;
psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s2"&gt;t"&lt;/span&gt;

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# List of relations&lt;/span&gt;
&lt;span class="c"&gt;# Schema | Name          | Type  | Owner&lt;/span&gt;
&lt;span class="c"&gt;#--------+---------------+-------+--------&lt;/span&gt;
&lt;span class="c"&gt;# public | auth_group    | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# public | auth_group... | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# public | auth_permi... | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# public | django_adm... | table | myusername&lt;/span&gt;
&lt;span class="c"&gt;# .. etc ..&lt;/span&gt;

&lt;span class="c"&gt;# Check the last migration&lt;/span&gt;
psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM django_migrations ORDER BY id DESC LIMIT 1"&lt;/span&gt;

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;#  id |  app   | name      | applied&lt;/span&gt;
&lt;span class="c"&gt;# ----+--------+-----------+---------------&lt;/span&gt;
&lt;span class="c"&gt;#  20 | tshirt | 0003_a... | 2019-08-26...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;There you go! Your database has been restored. Crisis averted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rolling back an existing database
&lt;/h3&gt;

&lt;p&gt;If you want to roll your existing database back to a previous point in time, deleting all newer data, then you will need to use the &lt;code&gt;--clean&lt;/code&gt; flag, which drops your database tables before re-creating them from the backup (&lt;a href="https://www.postgresql.org/docs/12/app-pgrestore.html"&gt;docs here&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres_mydatabase_1591255548.pgdump
pg_restore &lt;span class="nt"&gt;--clean&lt;/span&gt; &lt;span class="nt"&gt;--dbname&lt;/span&gt; &lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Restoring a dev environment
&lt;/h3&gt;

&lt;p&gt;It's often beneficial to restore a testing or development database from a known backup.&lt;br&gt;
When you do this, you're not so worried about setting up the right user permissions.&lt;br&gt;
In this case you want to completely destroy and re-create the database to get a fresh start, and you want to use the &lt;code&gt;--no-owner&lt;/code&gt; flag to ignore any database-user related stuff in the restore script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; postgres psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"DROP DATABASE &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; postgres psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"CREATE DATABASE &lt;/span&gt;&lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres_mydatabase_1591255548.pgdump
pg_restore &lt;span class="nt"&gt;--no-owner&lt;/span&gt; &lt;span class="nt"&gt;--dbname&lt;/span&gt; &lt;span class="nv"&gt;$PGDATABASE&lt;/span&gt; &lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I use this method quite often to pull non-sensitive data down from production environments to try to reproduce bugs that have occurred in prod. It's much easier to fix mysterious bugs when you have regular database backups, &lt;a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html"&gt;error reporting&lt;/a&gt; and &lt;a href="https://mattsegal.dev/django-logging-papertrail.html"&gt;centralized logging&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next steps
&lt;/h3&gt;

&lt;p&gt;I hope you now have the tools you need to back up and restore your Django app's Postgres database. If you want to read more, the &lt;a href="https://www.postgresql.org/docs/12/index.html"&gt;Postgres docs&lt;/a&gt; have a good section on &lt;a href="https://www.postgresql.org/docs/12/backup-dump.html"&gt;database backups&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once you've got your head around database backups, you should automate the process to make it more reliable. I will show you how to do this in &lt;a href="https://mattsegal.dev/postgres-backup-automate.html"&gt;this follow-up post&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>django</category>
      <category>bash</category>
      <category>database</category>
    </item>
    <item>
      <title>How to generate lots of dummy data for your Django app</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Sat, 20 Jun 2020 07:17:48 +0000</pubDate>
      <link>https://dev.to/mattdsegal/how-to-generate-lots-of-dummy-data-for-your-django-app-3ajl</link>
      <guid>https://dev.to/mattdsegal/how-to-generate-lots-of-dummy-data-for-your-django-app-3ajl</guid>
      <description>&lt;p&gt;It sucks when you're working on a Django app and all your pages are empty. For example, if you're working on a forum webapp, then all your discussion boards will be empty by default:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LiK8t1eJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/dummy-threads-empty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LiK8t1eJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/dummy-threads-empty.png" alt="dummy-threads-empty"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Manually creating enough data for your pages to look realistic is a lot of work. Wouldn't it be nice if there was an automatic way to populate your local database with dummy data that looks real? E.g. so that your forum app has many threads:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rjP2UEda--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/dummy-threads-full.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rjP2UEda--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/dummy-threads-full.png" alt="dummy-threads"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even better, wouldn't it be cool if there was an easy way to populate each thread with as many comments&lt;br&gt;
as you like?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rRVizitz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/dummy-comments.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rRVizitz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/dummy-comments.png" alt="dummy-comments"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post I'll show you how to use &lt;a href="https://factoryboy.readthedocs.io/en/latest/"&gt;Factory Boy&lt;/a&gt; and a few other tricks to quickly and repeatably generate an endless amount of dummy data for your Django app. By the end of the post you'll be able to generate all your test data using a management command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;./manage.py setup_test_data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;There is example code for this blog post hosted in &lt;a href="https://github.com/MattSegal/djdt-perf-demo"&gt;this GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example application
&lt;/h3&gt;

&lt;p&gt;In this post we'll be working with an example app that is an online forum. There are four models that we'll be working with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# models.py
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""A person who uses the website"""&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CharField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""A forum comment thread"""&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CharField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;creator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Comment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""A comment by a user on a thread"""&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CharField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;poster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Club&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""A group of users interested in the same thing"""&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CharField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;member&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManyToManyField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Building data with Factory Boy
&lt;/h3&gt;

&lt;p&gt;We'll be using &lt;a href="https://factoryboy.readthedocs.io/en/latest/"&gt;Factory Boy&lt;/a&gt; to generate all our dummy data. It's a library that's built for automated testing, but it also works well for this use-case. Factory Boy can easily be configured to generate random but realistic data like names, emails and paragraphs by internally using the &lt;a href="https://faker.readthedocs.io/en/master/"&gt;Faker&lt;/a&gt; library.&lt;/p&gt;

&lt;p&gt;When using Factory Boy you create classes called "factories", which each represent a Django model. For example, for a user, you would create a factory class as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# factories.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;factory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;factory.django&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DjangoModelFactory&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;

&lt;span class="c1"&gt;# Defining a factory
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UserFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DjangoModelFactory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;

    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Faker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"first_name"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Using a factory with auto-generated data
&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UserFactory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="c1"&gt;# Kimberly
&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="c1"&gt;# 51
&lt;/span&gt;
&lt;span class="c1"&gt;# You can optionally pass in your own data
&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UserFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="c1"&gt;# Alice
&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="c1"&gt;# 52
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You can find the data types that Faker can produce by looking at the "&lt;a href="https://faker.readthedocs.io/en/master/providers.html"&gt;providers&lt;/a&gt;" that the library offers. E.g. I found "first_name" by reviewing the options inside the &lt;a href="https://faker.readthedocs.io/en/master/providers/faker.providers.person.html"&gt;person provider&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Another benefit of Factory Boy is that it can be set up to generate related data using &lt;a href="https://factoryboy.readthedocs.io/en/latest/recipes.html#dependent-objects-foreignkey"&gt;SubFactory&lt;/a&gt;, saving you a lot of boilerplate and time. For example, we can set up the &lt;code&gt;ThreadFactory&lt;/code&gt; so that it generates a &lt;code&gt;User&lt;/code&gt; as its creator automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# factories.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ThreadFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DjangoModelFactory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Thread&lt;/span&gt;

    &lt;span class="n"&gt;creator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SubFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserFactory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Faker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"sentence"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;nb_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;variable_nb_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create a new thread
&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ThreadFactory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;  &lt;span class="c1"&gt;# Room marriage study
&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;creator&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;User: Michelle&amp;gt;
&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;creator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;  &lt;span class="c1"&gt;# Michelle
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The ability to automatically generate related models and fake data makes Factory Boy quite powerful. It's worth taking a quick look at the &lt;a href="https://factoryboy.readthedocs.io/en/latest/recipes.html"&gt;other suggested patterns&lt;/a&gt; if you decide to try it out.&lt;/p&gt;
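&lt;p&gt;The &lt;code&gt;setup_test_data&lt;/code&gt; script in the next section imports a &lt;code&gt;ClubFactory&lt;/code&gt; and &lt;code&gt;CommentFactory&lt;/code&gt; that aren't shown in the post. As a rough sketch (not the author's exact code), they could reuse the same &lt;code&gt;SubFactory&lt;/code&gt; and &lt;code&gt;Faker&lt;/code&gt; patterns:&lt;/p&gt;

```python
# factories.py (sketch) - assumes the models and factories defined earlier
import factory
from factory.django import DjangoModelFactory

from .models import Club, Comment


class CommentFactory(DjangoModelFactory):
    class Meta:
        model = Comment

    # Generate a related User and Thread if none are passed in.
    poster = factory.SubFactory(UserFactory)
    thread = factory.SubFactory(ThreadFactory)
    body = factory.Faker("sentence", nb_words=12)


class ClubFactory(DjangoModelFactory):
    class Meta:
        model = Club

    name = factory.Faker("word")
```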

&lt;h3&gt;
  
  
  Adding a management command
&lt;/h3&gt;

&lt;p&gt;Once you've defined all the models that you want to generate with Factory Boy, you can write a &lt;a href="https://simpleisbetterthancomplex.com/tutorial/2018/08/27/how-to-create-custom-django-management-commands.html"&gt;management command&lt;/a&gt; to automatically populate your database. This is a pretty crude script that doesn't take advantage of all of Factory Boy's features, like sub-factories, but I didn't want to spend too much time getting fancy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# setup_test_data.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;random&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;django.db&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;transaction&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;django.core.management.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseCommand&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;forum.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Club&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Comment&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;forum.factories&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;UserFactory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ThreadFactory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ClubFactory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;CommentFactory&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;NUM_USERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;NUM_CLUBS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;NUM_THREADS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="n"&gt;COMMENTS_PER_THREAD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;
&lt;span class="n"&gt;USERS_PER_CLUB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseCommand&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;help&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Generates test data"&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;atomic&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Deleting old data..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Comment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Club&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;all&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Creating new data..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Create all the users
&lt;/span&gt;        &lt;span class="n"&gt;people&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NUM_USERS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;person&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;UserFactory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;people&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;person&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Add some users to clubs
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NUM_CLUBS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;club&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ClubFactory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;members&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;people&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;USERS_PER_CLUB&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;club&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;members&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Create all the threads
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NUM_THREADS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;creator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;people&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ThreadFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;creator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;creator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Create comments for each thread
&lt;/span&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COMMENTS_PER_THREAD&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;commentor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;people&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;CommentFactory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;commentor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Using the &lt;code&gt;transaction.atomic&lt;/code&gt; decorator makes a big difference to the runtime of this script, since it bundles up hundreds of queries and commits them all in a single transaction.&lt;/p&gt;
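&lt;p&gt;You can see the shape of this idea in a standalone example using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module (nothing here is from the original post; it just illustrates why one transaction wrapping many writes is cheaper than committing each write individually):&lt;/p&gt;

```python
import sqlite3

# An in-memory database stands in for the app's real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (body TEXT)")

# The "with" block opens one transaction and commits once at the end,
# analogous to wrapping handle() in @transaction.atomic: many INSERTs,
# a single commit.
with conn:
    for i in range(300):
        conn.execute("INSERT INTO comment (body) VALUES (?)", (f"comment {i}",))

count = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
print(count)  # 300
```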

&lt;h3&gt;
  
  
  Images
&lt;/h3&gt;

&lt;p&gt;If you need dummy images for your website as well then there are a lot of great free tools online to help. I use &lt;a href="https://api.adorable.io"&gt;adorable.io&lt;/a&gt; for dummy profile pics and &lt;a href="https://picsum.photos/"&gt;Picsum&lt;/a&gt; or &lt;a href="https://unsplash.com/developers"&gt;Unsplash&lt;/a&gt; for larger pictures like this one: &lt;a href="https://picsum.photos/700/500"&gt;https://picsum.photos/700/500&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KWbvUxf8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://picsum.photos/700/500" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KWbvUxf8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://picsum.photos/700/500" alt="picsum-example"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Next steps
&lt;/h3&gt;

&lt;p&gt;Hopefully this post helps you spin up a lot of fake data for your Django app very quickly. If you enjoy using Factory Boy to generate your dummy data, then you also might like incorporating it into your unit tests.&lt;/p&gt;
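&lt;p&gt;For example, a factory-based unit test might look something like this (a sketch that assumes the &lt;code&gt;ThreadFactory&lt;/code&gt; from earlier, not code from the original post):&lt;/p&gt;

```python
# tests.py (sketch) - factories replace hand-written fixture data
from django.test import TestCase

from .factories import ThreadFactory


class ThreadTests(TestCase):
    def test_thread_is_created_with_a_creator(self):
        thread = ThreadFactory()
        # SubFactory generated a related User for us automatically.
        self.assertIsNotNone(thread.creator)
        self.assertTrue(thread.title)
```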

</description>
      <category>django</category>
      <category>webdev</category>
    </item>
    <item>
      <title>A tour of Django server setups</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Thu, 18 Jun 2020 23:32:05 +0000</pubDate>
      <link>https://dev.to/mattdsegal/a-tour-of-django-server-setups-2h06</link>
      <guid>https://dev.to/mattdsegal/a-tour-of-django-server-setups-2h06</guid>
      <description>&lt;p&gt;If you haven't deployed a lot of Django apps, then you might wonder: how do professionals put Django apps on the internet? What does Django typically look like when it's running in production? You might even be thinking &lt;em&gt;what the hell is &lt;a href="https://www.techopedia.com/definition/8989/production-environment"&gt;production&lt;/a&gt;?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before I started working as a developer, there was just a fuzzy cloud in my head where the knowledge of production infrastructure should be. If there's a fuzzy cloud in your head, let's fix it. There are many ways to extend a Django server setup to achieve better performance, cost-effectiveness and reliability. This post will take you on a tour of some common Django server setups, from the simplest to the more complex and powerful. I hope it will build up your mental model of how Django is hosted in production, piece-by-piece.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your local machine
&lt;/h3&gt;

&lt;p&gt;Let's start by reviewing a Django setup that you are already familiar with: your local machine. Going over this will be a warm-up for later sections. When you run Django locally, you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your web browser (Chrome, Safari, Firefox, etc)&lt;/li&gt;
&lt;li&gt;Django running with the runserver management command&lt;/li&gt;
&lt;li&gt;A SQLite database sitting in your project folder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TbHjR2IU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/local-server.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TbHjR2IU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/local-server.png" alt="local server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pretty simple, right? Next let's look at something similar, but deployed to a web server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simplest possible webserver
&lt;/h3&gt;

&lt;p&gt;The simplest Django web server you can set up is very similar to your local dev environment. Most professional Django devs don't use a basic setup like this for their production environments. It works perfectly fine, but it has some limitations that we'll discuss later. It looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FXy6Q2dw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/simple-server.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FXy6Q2dw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/simple-server.png" alt="simple server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Typically people run Django on a Linux virtual machine, often using the Ubuntu distribution. The virtual machine is hosted by a cloud provider like &lt;a href="https://aws.amazon.com/"&gt;Amazon&lt;/a&gt;, &lt;a href="https://cloud.google.com/gcp/"&gt;Google&lt;/a&gt;, &lt;a href="https://azure.microsoft.com/en-au/"&gt;Azure&lt;/a&gt;, &lt;a href="https://www.digitalocean.com/"&gt;DigitalOcean&lt;/a&gt; or &lt;a href="https://www.linode.com/"&gt;Linode&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Instead of using runserver, you should use a WSGI server like &lt;a href="https://gunicorn.org/"&gt;Gunicorn&lt;/a&gt; to run your Django app. I go into more detail on why you shouldn't use runserver in production, and explain WSGI &lt;a href="https://mattsegal.dev/simple-django-deployment-2.html#wsgi"&gt;here&lt;/a&gt;. Otherwise, not that much is different from your local machine: you can still use SQLite as the database (&lt;a href="https://mattsegal.dev/simple-django-deployment-2.html#sqlite"&gt;more here&lt;/a&gt;).&lt;/p&gt;
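&lt;p&gt;Starting the app with Gunicorn looks something like this (a minimal sketch; the project name is a placeholder, and real setups usually add config for workers, logging and timeouts):&lt;/p&gt;

```shell
pip install gunicorn

# "myproject.wsgi" is a placeholder for your own project's WSGI module.
gunicorn myproject.wsgi:application --bind 0.0.0.0:8000 --workers 3
```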

&lt;p&gt;This is the bare bones of the setup. There are a few other details that you'll need to manage, like &lt;a href="https://mattsegal.dev/dns-for-noobs.html"&gt;setting up DNS&lt;/a&gt;, creating a virtual environment, babysitting Gunicorn with a process supervisor like &lt;a href="https://mattsegal.dev/simple-django-deployment-4.html"&gt;Supervisord&lt;/a&gt;, and serving static files with &lt;a href="http://whitenoise.evans.io/en/stable/"&gt;Whitenoise&lt;/a&gt;. If you're interested in a more complete guide on how to set up a simple server like this, I wrote &lt;a href="https://mattsegal.dev/simple-django-deployment.html"&gt;a guide&lt;/a&gt; that explains how to deploy Django.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typical standalone webserver
&lt;/h3&gt;

&lt;p&gt;Let's go over an environment that a professional Django dev might set up in production when using a single server. It's not the exact setup that everyone will always use, but the structure is very common.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--V8anZOcv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/typical-server.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--V8anZOcv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/typical-server.png" alt="typical server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some things are the same as the simple setup above: it's still a Linux virtual machine with Django being run by Gunicorn. There are three main differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQLite has been replaced by a different database, &lt;a href="https://www.postgresql.org/"&gt;PostgreSQL&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://www.nginx.com/"&gt;NGINX&lt;/a&gt; web server is now sitting in-front of Gunicorn in a &lt;a href="https://www.nginx.com/resources/glossary/reverse-proxy-server/"&gt;reverse-proxy&lt;/a&gt; setup&lt;/li&gt;
&lt;li&gt;Static files are now being served from outside of Django&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why did we swap SQLite for PostgreSQL? In general, Postgres is a little more advanced and full-featured. For example, Postgres can handle multiple writes at the same time, while SQLite can't.&lt;/p&gt;
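
&lt;p&gt;In Django, swapping SQLite for Postgres is mostly a settings change. A rough sketch (the database name, user and password here are placeholders for your own values):&lt;/p&gt;

```python
# Sketch of a Django DATABASES setting for PostgreSQL.
# NAME, USER and PASSWORD are placeholders, not real credentials.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "USER": "myapp",
        "PASSWORD": "change-me",
        "HOST": "localhost",  # Postgres runs on the same machine here
        "PORT": "5432",       # the default Postgres port
    }
}
```

&lt;p&gt;The rest of your app code stays the same: the ORM hides most of the differences between the two databases.&lt;/p&gt;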

&lt;p&gt;Why did we add NGINX to our setup? NGINX is a dedicated webserver which provides extra features and performance improvements over just using Gunicorn to serve web requests. For example, we can use NGINX to directly serve our app's static and media files more efficiently. NGINX can also be configured to do a lot of other useful things, like encrypt your web traffic using HTTPS and compress your files to make your site faster. NGINX is the web server most commonly combined with Django, but there are also alternatives like the &lt;a href="https://httpd.apache.org/"&gt;Apache HTTP server&lt;/a&gt; and &lt;a href="https://docs.traefik.io/"&gt;Traefik&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's important to note that everything here lives on a single server, which means that if the server goes away, so does all your data, &lt;a href="https://mattsegal.dev/postgres-backup-and-restore.html"&gt;unless you have backups&lt;/a&gt;. This data includes your Django tables, which are stored in Postgres, and files uploaded by users, which will be stored in the &lt;a href="https://docs.djangoproject.com/en/3.0/ref/settings/#media-root"&gt;MEDIA_ROOT&lt;/a&gt; folder, somewhere on your filesystem. Having only one server also means that if your server restarts or shuts off, so does your website. This is OK for smaller projects, but it's not acceptable for big sites like StackOverflow or Instagram, where the cost of downtime is very high.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single webserver with multiple apps
&lt;/h3&gt;

&lt;p&gt;Once you start using NGINX and PostgreSQL, you can run multiple Django apps on the same machine. You can save money on hosting fees by packing multiple apps onto a single server rather than paying for a separate server for each app. This setup also allows you to re-use some of the services and configurations that you've already set up.&lt;/p&gt;

&lt;p&gt;NGINX is able to route incoming HTTP requests to different apps based on the domain name, and Postgres can host multiple databases on a single machine.&lt;br&gt;
For example, I use a single server to host some of my personal Django projects: &lt;a href="http://mattslinks.xyz/"&gt;Matt's Links&lt;/a&gt;, &lt;a href="http://memories.ninja/"&gt;Memories Ninja&lt;/a&gt; and &lt;a href="https://www.blogreader.com.au/"&gt;Blog Reader&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x-MXvQyY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/multi-app-server.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x-MXvQyY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/multi-app-server.png" alt="multi-app server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've omitted the static files for simplicity. Note that having multiple apps on one server saves you hosting costs, but there are downsides: restarting the server restarts all of your apps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single webserver with a worker
&lt;/h3&gt;

&lt;p&gt;Some web apps need to do things other than just &lt;a href="https://www.codecademy.com/articles/what-is-crud"&gt;CRUD&lt;/a&gt;. For example, my website &lt;a href="https://www.blogreader.com.au/"&gt;Blog Reader&lt;/a&gt; needs to scrape &lt;a href="https://slatestarcodex.com/2020/04/24/employer-provided-health-insurance-delenda-est/"&gt;text&lt;/a&gt; from a website and then send it to an Amazon API to be translated into &lt;a href="https://media.blogreader.com.au/media/043dcf9fe4c1df539468000cb97af1d7.mp3"&gt;audio files&lt;/a&gt;. Another common example is "thumbnailing", where you upload a huge 5MB image file to Facebook and they downsize it into a crappy 120kB JPEG. These kinds of tasks do not happen inside a Django view, because they take too long to run. Instead they have to happen "offline", in a separate worker process, using tools like &lt;a href="http://www.celeryproject.org/"&gt;Celery&lt;/a&gt;, &lt;a href="https://huey.readthedocs.io/en/latest/django.html"&gt;Huey&lt;/a&gt;, &lt;a href="https://github.com/rq/django-rq"&gt;Django-RQ&lt;/a&gt; or &lt;a href="https://django-q.readthedocs.io/en/latest/"&gt;Django-Q&lt;/a&gt;. All these tools provide you with a way to run tasks outside of Django views and do more complicated things, like co-ordinate multiple tasks and run them on schedules.&lt;/p&gt;

&lt;p&gt;All of these tools follow a similar pattern: tasks are dispatched by Django and put in a queue where they wait to be executed. This queue is managed by a service called a "broker", which keeps track of all the tasks that need to be done. Common brokers for Django tasks are Redis and RabbitMQ. A worker process, which uses the same codebase as your Django app, pulls tasks out of the broker and runs them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Qn-OIKAP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/worker-server.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Qn-OIKAP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/worker-server.png" alt="worker server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you haven't worked with task queues before then it's not immediately obvious how this all works, so let me give an example. You want to upload a 2MB &lt;a href="https://memories-ninja-prod.s3-ap-southeast-2.amazonaws.com/original/7e26334177b6ee7d5ab4c21f7149190e.jpeg"&gt;photo of your breakfast&lt;/a&gt; from your phone to a Django site. To optimise image loading performance, the Django site will turn that 2MB photo upload into a 70kB &lt;a href="https://memories-ninja-prod.s3.amazonaws.com/display/7e26334177b6ee7d5ab4c21f7149190e.jpeg"&gt;display image&lt;/a&gt; and a smaller &lt;a href="https://memories-ninja-prod.s3.amazonaws.com/thumbnail/7e26334177b6ee7d5ab4c21f7149190e.jpeg"&gt;thumbnail image&lt;/a&gt;. So this is what happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user uploads a photo to a Django view, which saves the original photo to the filesystem and updates the database to show that the file has been received&lt;/li&gt;
&lt;li&gt;The view also pushes a thumbnailing task to the task broker&lt;/li&gt;
&lt;li&gt;The broker receives the task and puts it in a queue, where it waits to be executed&lt;/li&gt;
&lt;li&gt;The worker asks the broker for the next task and the broker sends the thumbnailing task&lt;/li&gt;
&lt;li&gt;The worker reads the task description and runs some Python function, which reads the original image from the filesystem, creates the smaller thumbnail images, saves them and then updates the database to show that the thumbnailing is complete&lt;/li&gt;
&lt;/ul&gt;
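
&lt;p&gt;The flow above can be sketched in a few lines of Python, with an in-memory queue standing in for the broker and a dict standing in for the database (all names here are illustrative - a real setup would use Celery or Django-Q with Redis or RabbitMQ):&lt;/p&gt;

```python
import queue

broker = queue.Queue()  # stands in for Redis/RabbitMQ
database = {}           # stands in for the Postgres records

def upload_view(photo_name):
    """The Django view: record the upload, then dispatch a task."""
    database[photo_name] = "uploaded"
    broker.put(("thumbnail", photo_name))  # task waits in the queue

def worker_run_once():
    """The worker process: pull the next task and execute it."""
    task_name, photo_name = broker.get()
    if task_name == "thumbnail":
        # A real worker would read the original image, resize it
        # and save the smaller copies here.
        database[photo_name] = "thumbnailed"

upload_view("breakfast.jpeg")
worker_run_once()
```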

&lt;p&gt;If you want to learn more about this stuff, I've written guides for getting started with &lt;a href="https://mattsegal.dev/offline-tasks.html"&gt;offline tasks&lt;/a&gt; and &lt;a href="https://mattsegal.dev/simple-scheduled-tasks.html"&gt;scheduled tasks&lt;/a&gt; with Django Q.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single webserver with a cache
&lt;/h3&gt;

&lt;p&gt;Sometimes you'll want to &lt;a href="https://docs.djangoproject.com/en/3.0/topics/cache/"&gt;use a cache&lt;/a&gt; to store data for a short time. For example, caches are commonly used when you have some data that was expensive to pull from the database or an API and you want to re-use it for a little while. &lt;a href="https://redis.io/"&gt;Redis&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Memcached"&gt;Memcached&lt;/a&gt; are both popular cache services that are used in production with Django. It's not a very complicated setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OqshTh9S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/cache-on-server.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OqshTh9S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/cache-on-server.png" alt="cache on server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Single webserver with Docker
&lt;/h3&gt;

&lt;p&gt;If you've heard of &lt;a href="https://www.docker.com/"&gt;Docker&lt;/a&gt; before you might be wondering where it factors into these setups. It's a great tool for creating consistent programming environments, but it doesn't fundamentally change how any of this works. Most of the setups I've described would work basically the same way... except everything is inside a Docker container.&lt;/p&gt;

&lt;p&gt;For example, if you were running multiple Django apps on one server and you wanted to use Docker containers, then you might do something like this using &lt;a href="https://docs.docker.com/engine/swarm/"&gt;Docker Swarm&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jBVAzY1m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/swarm-server.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jBVAzY1m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/swarm-server.png" alt="docker on server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see it's not such a different structure compared to what we were doing before Docker. The containers are just wrappers around the services that we were already running. Putting things inside of Docker containers doesn't really change how all the services talk to each other. If you really wanted to you could wrap Docker containers around more things like NGINX, the database, a Redis cache, whatever. This is why I think it's valuable to learn how to deploy Django without Docker first. That said, you can do some more complicated setups with Docker containers, which we'll get into later.&lt;/p&gt;

&lt;h3&gt;
  
  
  External services
&lt;/h3&gt;

&lt;p&gt;So far I've been showing you server setups with just one virtual machine running Ubuntu. This is the simplest setup that you can use, but it has limitations: there are some things that you might need that a single server can't give you. In this section I'm going to walk you through how we can break apart our single server into more advanced setups.&lt;/p&gt;

&lt;p&gt;If you've studied programming you might have read about &lt;a href="https://en.wikipedia.org/wiki/Separation_of_concerns"&gt;separation of concerns&lt;/a&gt;, the &lt;a href="https://en.wikipedia.org/wiki/Single-responsibility_principle"&gt;single responsibility principle&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller"&gt;model-view-controller (MVC)&lt;/a&gt;. A lot of the changes that we're going to make will have a similar kind of vibe: we're going to split up our services into smaller, more specialised units, based on their "responsibilities". We're going to pull apart our services bit-by-bit until there's nothing left. Just a note: you might not need to do this for your services, this is just an overview of what you &lt;em&gt;could&lt;/em&gt; do.&lt;/p&gt;

&lt;h3&gt;
  
  
  External services - database
&lt;/h3&gt;

&lt;p&gt;The first thing you'd want to pull off of your server is the database. This involves putting PostgreSQL onto its own virtual machine. You can set this up yourself or pay a little extra for an off-the-shelf service like &lt;a href="https://aws.amazon.com/rds/"&gt;Amazon RDS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cgBUk6Fs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/postgres-external.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cgBUk6Fs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/postgres-external.png" alt="postgres on server setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a couple of reasons that you'd want to put the database on its own server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You might have multiple apps on different servers that depend on the same database&lt;/li&gt;
&lt;li&gt;Your database performance will not be impacted by "noisy neighbours" eating up CPU, RAM or disk space on the same machine&lt;/li&gt;
&lt;li&gt;You've moved your precious database away from your Django web server, which means you can delete and re-create your Django app's server with less concern&lt;/li&gt;
&lt;li&gt;&lt;em&gt;mumble mumble security mumble&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using an off-the-shelf option like AWS RDS is attractive because it reduces the amount of admin work that you need to run your database server. If you're a backend web developer with a lot of work to do and more money than time then this is a good move.&lt;/p&gt;

&lt;h3&gt;
  
  
  External services - object storage
&lt;/h3&gt;

&lt;p&gt;It is common to push file storage off the web server into "object storage", which is basically a filesystem behind a nice API. This is often done using &lt;a href="https://django-storages.readthedocs.io/en/latest/"&gt;django-storages&lt;/a&gt;, which I enjoy using. Object storage is usually used for user-uploaded "media" such as documents, photos and videos. I use AWS S3 (Simple Storage Service) for this, but every big cloud hosting provider has some sort of "object storage" offering.&lt;/p&gt;
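
&lt;p&gt;With django-storages, pointing your media uploads at S3 comes down to a few settings. A rough sketch, with a placeholder bucket name and region:&lt;/p&gt;

```python
# Sketch of Django settings for storing media uploads in S3
# via django-storages. Bucket name and region are placeholders.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-app-media"
AWS_S3_REGION_NAME = "ap-southeast-2"
```

&lt;p&gt;Once this is in place, &lt;code&gt;FileField&lt;/code&gt; and &lt;code&gt;ImageField&lt;/code&gt; uploads land in the bucket instead of &lt;code&gt;MEDIA_ROOT&lt;/code&gt; on your server's disk.&lt;/p&gt;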

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UhcDB9U7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/files-external-revised.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UhcDB9U7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/files-external-revised.png" alt="AWS S3 setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a few reasons why this is a good idea:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You've moved all of your app's state (files, database) off of your server, so now you can move, destroy and re-create the Django server with no data loss&lt;/li&gt;
&lt;li&gt;File downloads hit the object storage service, rather than your server, meaning you can scale your file downloads more easily&lt;/li&gt;
&lt;li&gt;You don't need to worry about any filesystem admin, like running out of disk space&lt;/li&gt;
&lt;li&gt;Multiple servers can easily share the same set of files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hopefully you see a theme here, we're taking shit we don't care about and making it someone else's problem. Paying someone else to do the work of managing our files and database leaves us more free time to work on more important things.&lt;/p&gt;

&lt;h3&gt;
  
  
  External services - web server
&lt;/h3&gt;

&lt;p&gt;You can also run your "web server" (NGINX) on a different virtual machine to your "app server" (Gunicorn + Django):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZkLkchOu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/nginx-1-external.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZkLkchOu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/nginx-1-external.png" alt="nginx external setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This seems kind of pointless though: why would you bother? Well, for one, you might have multiple identical app servers set up for redundancy and to handle high traffic, and NGINX can act as a &lt;a href="https://www.nginx.com/resources/glossary/load-balancing/"&gt;load balancer&lt;/a&gt; between the different servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nkSBQz0h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/nginx-2-external.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nkSBQz0h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/nginx-2-external.png" alt="nginx external setup 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You could also replace NGINX with an off-the-shelf load balancer like an AWS Elastic Load Balancer or something similar.&lt;/p&gt;

&lt;p&gt;Note how putting our services on their own servers allows us to scale them out over multiple virtual machines. We couldn't run our Django app on three servers at the same time if we also had three copies of our filesystem and three databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  External services - task queue
&lt;/h3&gt;

&lt;p&gt;You can also push your "offline task" services onto their own servers. Typically the broker service would get its own machine and the worker would live on another:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XAV12JHA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/worker-1-external.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XAV12JHA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/worker-1-external.png" alt="worker external setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Splitting your worker onto its own server is useful because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can protect your Django web app from "noisy neighbours": workers which are hogging all the RAM and CPU&lt;/li&gt;
&lt;li&gt;You can give the worker server extra resources that it needs: CPU, RAM, or access to a GPU&lt;/li&gt;
&lt;li&gt;You can now make changes to the worker server without risking damage to the task queue or the web server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you've split things up, you can also scale out your workers to run more tasks in parallel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eUp6fDec--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/worker-2-external.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eUp6fDec--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/worker-2-external.png" alt="worker external setup 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You could potentially swap out your self-managed broker (Redis or RabbitMQ) for a managed queue like &lt;a href="https://aws.amazon.com/sqs/"&gt;Amazon SQS&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  External services - final form
&lt;/h3&gt;

&lt;p&gt;If you went totally all-out, your Django app could be set up like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1seeaIEU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/full-external.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1seeaIEU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/full-external.png" alt="fully external setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, you can go pretty crazy splitting up all the parts of your Django app and spreading them across multiple servers. There are many upsides to this, but the downside is that you now have multiple servers to provision, update, monitor and maintain. Sometimes the extra complexity is well worth it, and sometimes it's a waste of your time. That said, there are many benefits to this setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your web and worker servers are completely replaceable: you can destroy, create and update them without affecting uptime at all&lt;/li&gt;
&lt;li&gt;You can now do &lt;a href="https://martinfowler.com/bliki/BlueGreenDeployment.html"&gt;blue-green deployments&lt;/a&gt; with zero web app downtime&lt;/li&gt;
&lt;li&gt;Your files and database are easily shared between multiple servers and applications&lt;/li&gt;
&lt;li&gt;You can provision different sized servers for their different workloads&lt;/li&gt;
&lt;li&gt;You can swap out your self-managed servers for managed infrastructure, like moving your task broker to AWS SQS, or your database to AWS RDS&lt;/li&gt;
&lt;li&gt;You can now autoscale your servers (more on this later)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you have complicated infrastructure like this you need to start automating your infrastructure setup and server config. It's just not feasible to manage this stuff manually once your setup has this many moving parts. I recorded a talk on &lt;a href="https://mattsegal.dev/intro-config-management.html"&gt;configuration management&lt;/a&gt; that introduces these concepts. You'll need to start looking into tools like &lt;a href="https://www.ansible.com/"&gt;Ansible&lt;/a&gt; and &lt;a href="https://www.packer.io/"&gt;Packer&lt;/a&gt; to configure your virtual machines, and tools like &lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt; or &lt;a href="https://aws.amazon.com/cloudformation/"&gt;CloudFormation&lt;/a&gt; to configure your cloud services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auto scaling groups
&lt;/h3&gt;

&lt;p&gt;You've already seen how you can have multiple web servers running the same app, or multiple worker servers all pulling tasks from a queue. These servers cost money, dollars per hour, and it can get very expensive to run more servers than you need.&lt;/p&gt;

&lt;p&gt;This is where &lt;a href="https://aws.amazon.com/autoscaling/"&gt;autoscaling&lt;/a&gt; comes in. You can set up your cloud services to use some sort of trigger, such as virtual machine CPU usage, to automatically create new virtual machines from an image and add them to an autoscaling group.&lt;/p&gt;

&lt;p&gt;Let's use our task worker servers as an example. If you have a thumbnailing service that turns &lt;a href="https://memories-ninja-prod.s3-ap-southeast-2.amazonaws.com/original/7e26334177b6ee7d5ab4c21f7149190e.jpeg"&gt;big uploaded photos&lt;/a&gt; into &lt;a href="https://memories-ninja-prod.s3.amazonaws.com/thumbnail/7e26334177b6ee7d5ab4c21f7149190e.jpeg"&gt;smaller photos&lt;/a&gt; then one server should be able to handle dozens of file uploads per second. What if during some periods of the day, like around 6pm after work, you saw file uploads spike from dozens per second to &lt;em&gt;thousands&lt;/em&gt; per second? Then you'd need more servers! With an autoscaling setup, the CPU usage on your worker servers would spike, triggering the creation of more and more worker servers, until you had enough to handle all the uploads. When the rate of file uploads drops, the extra servers would be automatically destroyed, so you aren't always paying for them.&lt;/p&gt;
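
&lt;p&gt;To make the trigger idea concrete, here's an illustrative sketch of the decision an autoscaler makes each time it checks the group. The CPU thresholds and group size limits are made up for the example:&lt;/p&gt;

```python
# Illustrative autoscaling decision, re-evaluated every minute or so.
# The 80%/20% thresholds and the group size limits are made up.
def desired_server_count(current_count, avg_cpu_percent,
                         min_servers=1, max_servers=10):
    if avg_cpu_percent > 80:
        # Workers are overloaded: scale out, up to the group limit.
        return min(current_count + 1, max_servers)
    if avg_cpu_percent > 20:
        return current_count  # load is fine, change nothing
    # Workers are mostly idle: scale in to stop paying for them.
    return max(current_count - 1, min_servers)
```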

&lt;h3&gt;
  
  
  Container clusterfuck
&lt;/h3&gt;

&lt;p&gt;There is a whole world of container fuckery that I haven't covered in much detail, because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I don't know it very well&lt;/li&gt;
&lt;li&gt;It's a little complicated for the target audience of this post; and&lt;/li&gt;
&lt;li&gt;I don't think that most people need it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For completeness I'll quickly go over some of the cool, crazy things you can do with containers. You can use tools like &lt;a href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt; and &lt;a href="https://www.sumologic.com/glossary/docker-swarm/"&gt;Docker Swarm&lt;/a&gt; with a set of config files to define all your services as Docker containers and how they should all talk to each other. All your containers run somewhere in your Kubernetes/Swarm cluster, but as a developer, you don't really care what server they're on. You just build your Docker containers, write your config file, and push it up to your infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TNi5PBJN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/kubernetes-maybe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TNi5PBJN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mattsegal.dev/django-prod-architecture/kubernetes-maybe.png" alt="maybe kubernetes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using these "container orchestration" tools allows you to decouple your containers from their underlying infrastructure. Multiple teams can deploy their apps to the same set of servers without any conflict between their apps.&lt;br&gt;
This is the kind of infrastructure that enables teams to deploy &lt;a href="https://www.youtube.com/watch?v=y8OnoxKotPQ"&gt;microservices&lt;/a&gt;. Big enterprises like Target will have specialised teams dedicated to setting up and maintaining these container orchestration systems, while other teams can use them without having to think about the underlying servers. These teams are essentially supplying a "platform as a service" (PaaS) to the rest of the organisation.&lt;/p&gt;

&lt;p&gt;As you might have noticed, there is probably too much complexity in these container orchestration tools for them to be worth your while as a solo developer or even as a small team. If you're interested in this sort of thing you might like &lt;a href="http://dokku.viewdocs.io/dokku/"&gt;Dokku&lt;/a&gt;, which claims to be "the smallest PaaS implementation you've ever seen".&lt;/p&gt;

&lt;h3&gt;
  
  
  End of tour
&lt;/h3&gt;

&lt;p&gt;That's basically everything that I know about how Django can be set up in production. If you're interested in building up your infrastructure skills, then I recommend you try out one of the setups or tools that I've mentioned in this post. Hopefully I've built up your mental models of how Django gets deployed so that the next time someone mentions "task broker" or "autoscaling", you have some idea of what they're talking about.&lt;/p&gt;

&lt;p&gt;If you enjoyed reading this you might also like other things I've written about &lt;a href="https://mattsegal.dev/simple-django-deployment.html"&gt;deploying Django as simply as possible&lt;/a&gt;, how to &lt;a href="https://mattsegal.dev/offline-tasks.html"&gt;get started with offline tasks&lt;/a&gt;, how to start &lt;a href="https://mattsegal.dev/file-logging-django.html"&gt;logging to files&lt;/a&gt; and &lt;a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html"&gt;tracking errors&lt;/a&gt; in prod and my &lt;a href="https://mattsegal.dev/intro-config-management.html"&gt;introduction to configuration management&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you liked the box diagrams in this post check out &lt;a href="https://excalidraw.com/"&gt;Excalidraw&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>django</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to polish your GitHub projects when you're looking for a job</title>
      <dc:creator>Matthew Segal</dc:creator>
      <pubDate>Thu, 18 Jun 2020 23:10:34 +0000</pubDate>
      <link>https://dev.to/mattdsegal/how-to-polish-your-github-projects-when-you-re-looking-for-a-job-5afg</link>
      <guid>https://dev.to/mattdsegal/how-to-polish-your-github-projects-when-you-re-looking-for-a-job-5afg</guid>
      <description>&lt;p&gt;When you're going for your first programming job, you don't have any work experience or references to show that you can write code. You might not even have a relevant degree (I didn't). What you &lt;em&gt;can&lt;/em&gt; do is write some code and throw it up on GitHub to demonstrate to employers that you can build a complete app all by yourself.&lt;/p&gt;

&lt;p&gt;A lot of junior devs don't know how to show off their projects on GitHub. They spend &lt;em&gt;hours and hours&lt;/em&gt; writing code and then forget to do some basic things to make their project seem interesting. In this post I want to share some tips that you can apply in a few hours to make an existing project much more effective at getting you an interview.&lt;/p&gt;

&lt;h3&gt;
  
  
  Remove all the clutter
&lt;/h3&gt;

&lt;p&gt;Your project should only contain source code, plus the minimum files required to run it. It should not contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Editor config files (.idea, .vscode)&lt;/li&gt;
&lt;li&gt;Database files (eg. SQLite)&lt;/li&gt;
&lt;li&gt;Random documents (.pdf, .xls)&lt;/li&gt;
&lt;li&gt;Media files (images, videos, audio)&lt;/li&gt;
&lt;li&gt;Build outputs and artifacts (*.dll files, *.exe, etc)&lt;/li&gt;
&lt;li&gt;Bytecode (eg. *.pyc files for Python)&lt;/li&gt;
&lt;li&gt;Log files (eg. *.log)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Having these files in your repo makes you look sloppy. Professional developers don't like finding random crap cluttering up their codebase. You can keep these files out of your git repo using a &lt;a href="https://www.atlassian.com/git/tutorials/saving-changes/gitignore"&gt;.gitignore&lt;/a&gt; file. If you already have these files inside your repo, make sure to delete them. If you're using &lt;code&gt;bash&lt;/code&gt; you can use &lt;code&gt;find&lt;/code&gt; to delete all files that match a pattern, like Python bytecode files ending in &lt;code&gt;.pyc&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;find &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;.pyc &lt;span class="nt"&gt;-delete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You can achieve a similar result in Windows PowerShell, but it'll be a little more verbose.&lt;/p&gt;

&lt;p&gt;Sometimes you do need to keep some media files, documents or even small databases in your source control. This is okay to do as long as it's an essential part of running, testing or documenting the code, as opposed to random clutter that you forgot to remove or gitignore. A good example of non-code files that you should keep in source control is website static files, like favicons and fonts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write a README
&lt;/h3&gt;

&lt;p&gt;Your project &lt;em&gt;must&lt;/em&gt; have a README file. This is a file in the root of your project's repository called &lt;code&gt;README.md&lt;/code&gt;. It's a text file written in &lt;a href="https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet"&gt;Markdown&lt;/a&gt; that gives a quick overview of what your project is and what it does. Not having a README makes your project seem crappy, and many people, including me, may close the browser window without checking any code if there isn't one present.&lt;/p&gt;

&lt;p&gt;Here's &lt;a href="https://github.com/anikalegal/clerk"&gt;one I prepared earlier&lt;/a&gt;, and &lt;a href="https://github.com/AnikaLegal/intake"&gt;here's another&lt;/a&gt;. They're not&lt;br&gt;
perfect, but I hope they give you a general idea of what to do.&lt;/p&gt;

&lt;p&gt;One hour of paying attention to your project's README is worth 20 extra hours of coding when it comes to impressing hiring managers. You know how people mindlessly write that they have "excellent communication skills" on their resume? No one believes that - it's far too easy to just say it. Don't &lt;em&gt;tell them&lt;/em&gt; that you have excellent communication skills, &lt;em&gt;show them&lt;/em&gt; by writing an excellent README.&lt;/p&gt;

&lt;p&gt;Enough of me waffling about why you should write a README - what do you put in it?&lt;/p&gt;

&lt;p&gt;First, you should describe what your project does at a high level: what problem it solves. Is it a command line tool that plays music? Is it a website that finds you low prices on Amazon? Is it a Reddit bot that sends people reminders? A reader should be able to read the first few sentences and decide whether it's something they might want to use. You should also summarize the main features of your project in this section.&lt;/p&gt;

&lt;p&gt;A key point to remember is that the employer or recruiter reading your GitHub is both lazy and time-poor. They might not read past the first few sentences... they might not even read the code! They may well assume that your project works without checking anything. Before you rush to pack your README with features that don't exist, you scallywag, note that they may ask you more about your project in a job interview. So, uh... don't lie about anything.&lt;/p&gt;

&lt;p&gt;Beyond a basic overview of your project, it's also good to outline the high-level architecture of your code - how it's structured. For example, in a Django web app, you could explain the different apps that you've implemented and their responsibilities.&lt;/p&gt;

&lt;p&gt;If your project is a website, then you can also talk about the production infrastructure that your website runs on. For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This website is deployed to a DigitalOcean virtual machine. The Django app runs inside a Gunicorn WSGI app server and depends on a Postgres database. A separate Celery worker process runs offline tasks. Redis is responsible for both caching and serving as a task broker.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or for something a little simpler:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This project is a static webpage that is hosted on Netlify.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Simply indicating that you know how to deploy your application makes you look good. "Isn't that obvious though?" - you may ask. No, it's not obvious and you need to be explicit.&lt;/p&gt;
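<p>To make all this concrete, here's a rough skeleton of the kind of README I'm describing - the project name, sections and wording are placeholders for your own:</p>

<div class="highlight"><pre class="highlight plaintext"><code># My Project

A one-paragraph summary of what the project does and the problem it solves.

## Features

- Feature one
- Feature two

## Architecture

A few sentences on how the code is structured and where things live.

## Deployment

Where and how the project runs in production
(e.g. "a static site hosted on Netlify").
</code></pre></div>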

&lt;p&gt;A little warning on READMEs: they're for other people to read, not you. Do not include personal to-dos or notes to yourself in your README. Put those somewhere else, like Trello or Workflowy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add a screenshot
&lt;/h3&gt;

&lt;p&gt;Add a screenshot of your website or tool and embed it in the README. It'll take you 10 minutes and it makes the project look way better. Store the screenshot in a "docs" folder and embed it in your README using Markdown. If it's a command line app, you can use &lt;a href="https://asciinema.org/"&gt;asciinema&lt;/a&gt; to record the tool in action; if your project has a GUI, you can quickly record yourself using it with &lt;a href="https://www.loom.com/my-videos"&gt;Loom&lt;/a&gt;. This will make your project seem much more impressive for only a small amount of effort.&lt;/p&gt;
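<p>The Markdown embed itself is one line - the path here assumes you've saved the image as <code>docs/screenshot.png</code>:</p>

<div class="highlight"><pre class="highlight plaintext"><code>![Screenshot of the app](docs/screenshot.png)
</code></pre></div>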

&lt;h3&gt;
  
  
  Give instructions for other developers
&lt;/h3&gt;

&lt;p&gt;You should include instructions on how other devs can get started using your project. This is important because it demonstrates that you can document project setup instructions, and also because someone may actually try to run your code. These instructions should state what tools are required to run your project. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You will need Python 3 and pip installed&lt;/li&gt;
&lt;li&gt;You will need yarn and node v11+&lt;/li&gt;
&lt;li&gt;You will need docker and docker-compose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, you should explain the steps, with explicit command line examples if possible, that are required to get the app built or running. If your project has external libraries that need to be installed, then you should have a file that specifies these dependencies, like a &lt;code&gt;requirements.txt&lt;/code&gt; (Python) or &lt;code&gt;package.json&lt;/code&gt; (Node) or &lt;code&gt;Dockerfile&lt;/code&gt; / &lt;code&gt;docker-compose.yaml&lt;/code&gt; (Docker).&lt;/p&gt;
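<p>For example, the setup section of a README for a typical Python project might read like this - the repo URL and filenames are placeholders, not a real project:</p>

<div class="highlight"><pre class="highlight shell"><code>git clone https://github.com/yourname/yourproject.git
cd yourproject
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
</code></pre></div>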

&lt;p&gt;You should also include instructions on how to run your automated tests. You have some tests, right? More on that later.&lt;/p&gt;

&lt;p&gt;If you've scripted your project's deployment, you can mention how to do it here, if you like.&lt;/p&gt;

&lt;h3&gt;
  
  
  Have a nice, readable commit history
&lt;/h3&gt;

&lt;p&gt;If possible, your git commit history should tell a story about what you've been working on. Each commit should represent a distinct unit of work, and the commit message should explain what work was done. For example your commit messages could look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added smoke tests for payment API&lt;/li&gt;
&lt;li&gt;Refactored image compression&lt;/li&gt;
&lt;li&gt;Added Windows compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are differing opinions amongst devs on what exactly makes a "good" commit message, but it's very, very clear what bad commit messages look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;zzzz&lt;/li&gt;
&lt;li&gt;add code&lt;/li&gt;
&lt;li&gt;more code&lt;/li&gt;
&lt;li&gt;fuck&lt;/li&gt;
&lt;li&gt;remove shitty code&lt;/li&gt;
&lt;li&gt;fuckfuckfuckfuck&lt;/li&gt;
&lt;li&gt;still broken&lt;/li&gt;
&lt;li&gt;fuck Windows&lt;/li&gt;
&lt;li&gt;zzz&lt;/li&gt;
&lt;li&gt;adsafsf&lt;/li&gt;
&lt;li&gt;broken&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I for one have written my fair share of "zzz"s. This tip is hard to implement if you've already written all your commits. If you're feeling brave, or if you need to remove a few "fucks", you can re-write your commit history with &lt;code&gt;git rebase&lt;/code&gt;. Be warned though, you can lose your code if you screw this up.&lt;/p&gt;
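<p>Here's a minimal sketch of what fixing a bad message looks like, in a throwaway repo. Note that <code>--amend</code> only rewrites the most recent commit; anything older needs <code>git rebase -i</code>:</p>

<div class="highlight"><pre class="highlight shell"><code># create a scratch repo to experiment in safely
git init demo
# commit with a lazy message, then rewrite it
git -C demo -c user.name=demo -c user.email=demo@example.com commit --allow-empty -m "zzz"
git -C demo -c user.name=demo -c user.email=demo@example.com commit --amend -m "Add initial project scaffolding"
# check the result
git -C demo log --oneline
</code></pre></div>

<p>Never rewrite history on a branch that other people are already working from.</p>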

&lt;h3&gt;
  
  
  Fix your formatting
&lt;/h3&gt;

&lt;p&gt;If I see inconsistent indentation or other poor formatting in someone's code, my opinion of their programming ability drops dramatically. Is this fair? Maybe, maybe not, but that's how it is. Make sure all your code sticks to your language's standard styling conventions. If you don't know what those are, find out - you'll need to learn them eventually. Fixing bad coding style is much easier if you use a linter or auto-formatter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add linting or formatting
&lt;/h3&gt;

&lt;p&gt;This one is a bonus, but it's reasonably quick to do. Grab your language community's favorite linter and run it over your code - something like &lt;code&gt;eslint&lt;/code&gt; for JavaScript or &lt;code&gt;flake8&lt;/code&gt; for Python. For those not in the know, a linter is a program that identifies style issues in your code. You run it over your codebase and it yells at you if you do anything wrong. You think your impostor syndrome is bad? Try using a tool that screams at you about all your shitty style choices. These tools are quite common in industry and using one will help you stand out from other junior devs.&lt;/p&gt;

&lt;p&gt;Even better than a linter, try using an auto-formatter - I prefer these personally. These tools automatically re-write your code so that it conforms to a standard style. Examples include &lt;a href="https://golang.org/cmd/gofmt/"&gt;gofmt&lt;/a&gt; for Go, &lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt; for Python and&lt;br&gt;
&lt;a href="https://prettier.io/"&gt;Prettier&lt;/a&gt; for JavaScript. I've written more about getting started with Black &lt;a href="https://mattsegal.dev/python-formatting-with-black.html"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Whatever you choose, make sure you document how to run the linter or formatting tool in your README.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write some tests
&lt;/h3&gt;

&lt;p&gt;Automated code testing is an important part of writing reliable professional-grade software. If you want someone to pay you money to be a professional software developer, then you should demonstrate that you know what a unit test is and how to write one. You don't need to write 100s of tests or get a high test coverage, but write a &lt;em&gt;few&lt;/em&gt; at least.&lt;/p&gt;
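<p>If you've never written one, a unit test can be as small as this - a throwaway <code>test_example.py</code> using Python's built-in <code>unittest</code> module:</p>

<div class="highlight"><pre class="highlight shell"><code>cat &gt; test_example.py &lt;&lt;'EOF'
import unittest

def add(a, b):
    return a + b

class TestAdd(unittest.TestCase):
    def test_add_sums_two_numbers(self):
        self.assertEqual(add(2, 3), 5)

if __name__ == "__main__":
    unittest.main()
EOF
# run the test suite from the command line
python3 -m unittest -v test_example
</code></pre></div>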

&lt;p&gt;Needless to say, explain how to run your tests in your README.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run your tests automatically
&lt;/h3&gt;

&lt;p&gt;If you want to look super fancy then you can run your automated tests in GitHub Actions. This isn't a must-have but it looks nice. It'll take you 30 minutes if you've already written some tests and you can put a cool "tests passing" badge in your README that looks really good. I've written more on how to do this &lt;a href="https://mattsegal.dev/pytest-on-github-actions.html"&gt;here&lt;/a&gt;.&lt;/p&gt;
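<p>A minimal workflow file lives at <code>.github/workflows/tests.yml</code> and might look something like this - it assumes a Python project with a <code>requirements.txt</code> and pytest, so adapt it to your own stack:</p>

<div class="highlight"><pre class="highlight yaml"><code>name: tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python -m pytest
</code></pre></div>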

&lt;h3&gt;
  
  
  Deploy your project
&lt;/h3&gt;

&lt;p&gt;If your project is a website then make sure it's deployed and available online. If you have deployed it, make sure there's a link to the live site in the README. This could be a large undertaking, taking hours or days, especially if you haven't done this before, so I'll leave it to you to decide if it's worthwhile.&lt;/p&gt;

&lt;p&gt;If your project is a Django app and you want to get it online, then you might like my guide on &lt;a href="https://mattsegal.dev/simple-django-deployment.html"&gt;simple Django deployments&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add documentation
&lt;/h3&gt;

&lt;p&gt;This is a high effort endeavour so I don't really recommend it if you're just trying to quickly improve the appeal of your project. That said, building HTML documentation with something like &lt;a href="https://www.sphinx-doc.org/en/master/"&gt;Sphinx&lt;/a&gt; and hosting it on &lt;a href="https://pages.github.com/"&gt;GitHub Pages&lt;/a&gt; looks pretty pro. This only really makes sense if your app is reasonably complicated and requires documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next steps
&lt;/h3&gt;

&lt;p&gt;I mention GitHub a lot in this post, but the same tips apply for projects hosted on Bitbucket and GitLab. All these tips also apply to employer-supplied coding tests that are hosted on GitHub, although I'd caution you not to spend too much time jazzing up coding tests: too many beautiful submissions end up in the garbage.&lt;/p&gt;

&lt;p&gt;Now you should have a few things you can do to spiff up your projects before you show them to prospective employers. I think it's important to make sure that the code that you've spent hours on isn't overlooked or dismissed because you didn't write a README.&lt;/p&gt;

&lt;p&gt;Good luck, and please don't hesitate to mail me money if this post helps you get a job.&lt;/p&gt;

&lt;p&gt;If you enjoyed reading this, then you might like my other blog posts over at &lt;a href="https://mattsegal.dev/"&gt;mattsegal.dev&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>career</category>
      <category>github</category>
    </item>
  </channel>
</rss>
