<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Greg Schafer</title>
    <description>The latest articles on DEV Community by Greg Schafer (@grschafer).</description>
    <link>https://dev.to/grschafer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F523939%2F7e065caf-1e91-4486-858d-2cb0786d08a1.png</url>
      <title>DEV Community: Greg Schafer</title>
      <link>https://dev.to/grschafer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/grschafer"/>
    <language>en</language>
    <item>
      <title>Abusing Terraform to Upload Static Websites to S3</title>
      <dc:creator>Greg Schafer</dc:creator>
      <pubDate>Wed, 06 Oct 2021 18:27:29 +0000</pubDate>
      <link>https://dev.to/tangramvision/abusing-terraform-to-upload-static-websites-to-s3-pj5</link>
      <guid>https://dev.to/tangramvision/abusing-terraform-to-upload-static-websites-to-s3-pj5</guid>
      <description>&lt;p&gt;S3 has been a great option for hosting static websites for a long time, but it's still a pain to set up by hand. You need to traverse dozens of pages in the AWS Console to create and manage users, buckets, certificates, a CDN, and about a hundred different configuration options. If you do this repeatedly, it gets old fast. We can automate the process with Terraform, a well-known "infrastructure as code" tool, which lets us declare resources (e.g. servers, storage buckets, users, policies, DNS records) and let Terraform figure out how to build and connect them.&lt;/p&gt;

&lt;p&gt;Terraform can create the infrastructure needed for a static website on AWS (e.g. users, bucket, CDN, DNS), &lt;em&gt;and&lt;/em&gt; it can create and update the content (e.g. webpages, CSS/JS files, images), which goes outside the &lt;em&gt;infrastructure&lt;/em&gt; part of "infrastructure as code" and is why I'm labeling it as an abuse or misuse of Terraform. Still, it works and has a few benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can define the bucket, properties, DNS, CDN, etc. in the same place as your content&lt;/li&gt;
&lt;li&gt;You have a fully-automated process for standing up websites that only requires a single tool, Terraform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;... and a few downsides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uploading files is slow compared to something like the &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html" rel="noopener noreferrer"&gt;AWS CLI's sync command&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Terraform isn't meant for transforming or managing &lt;em&gt;content&lt;/em&gt;, so you may outgrow Terraform's capabilities if you want advanced features or optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article will breeze over the infrastructure parts of creating a static website on AWS and focus more on how to upload content and manage content metadata (MIME types and caching behavior). If you want to learn more about the infrastructure parts (e.g. setting up CloudFront, an SSL certificate, DNS routes), there are many great tutorials out there. Here are a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.alexhyett.com/terraform-s3-static-website-hosting/" rel="noopener noreferrer"&gt;https://www.alexhyett.com/terraform-s3-static-website-hosting/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/modern-stack/5-minute-static-ssl-website-in-aws-with-terraform-76819a12d412" rel="noopener noreferrer"&gt;https://medium.com/modern-stack/5-minute-static-ssl-website-in-aws-with-terraform-76819a12d412&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@dblencowe/hosting-a-static-website-on-s3-using-terraform-0-12-aa5ffe4103e" rel="noopener noreferrer"&gt;https://medium.com/@dblencowe/hosting-a-static-website-on-s3-using-terraform-0-12-aa5ffe4103e&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's get on to the code! If you want just the code, you can find it here: &lt;a href="https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2021.10.06_TerraformS3Upload" rel="noopener noreferrer"&gt;https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2021.10.06_TerraformS3Upload&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Boilerplate
&lt;/h2&gt;

&lt;p&gt;We need &lt;em&gt;some&lt;/em&gt; boilerplate to set up infrastructure before we can upload files to an S3 bucket. So, let's create a bucket with Terraform and the &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs" rel="noopener noreferrer"&gt;AWS provider&lt;/a&gt;. We'll configure the provider and create the bucket in a &lt;code&gt;main.tf&lt;/code&gt; file containing the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform &lt;span class="o"&gt;{&lt;/span&gt;
  required_providers &lt;span class="o"&gt;{&lt;/span&gt;
    aws &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="nb"&gt;source&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      version &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"3.60.0"&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

provider &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;# This should match the profile name in the credentials file described below&lt;/span&gt;
  profile &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"aws_admin"&lt;/span&gt;
  &lt;span class="c"&gt;# Choose the region where you want the S3 bucket to be hosted&lt;/span&gt;
  region  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-1"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# To avoid repeatedly specifying the path, we'll declare it as a variable&lt;/span&gt;
variable &lt;span class="s2"&gt;"website_root"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;type&lt;/span&gt;        &lt;span class="o"&gt;=&lt;/span&gt; string
  description &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Path to the root of website content"&lt;/span&gt;
  default     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../content"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

resource &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"my_static_website"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  bucket &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog-example-m9wtv64y"&lt;/span&gt;
  acl    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"private"&lt;/span&gt;

  website &lt;span class="o"&gt;{&lt;/span&gt;
    index_document &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"index.html"&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# To print the bucket's website URL after creation&lt;/span&gt;
output &lt;span class="s2"&gt;"website_endpoint"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  value &lt;span class="o"&gt;=&lt;/span&gt; aws_s3_bucket.my_static_website.website_endpoint
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS Credentials
&lt;/h3&gt;

&lt;p&gt;To create or interact with AWS resources, we need to provide credentials. The AWS Terraform provider &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs#authentication" rel="noopener noreferrer"&gt;accepts authentication in a variety of ways&lt;/a&gt;, but I'm going to use a credential file. That file is located at &lt;code&gt;~/.aws/credentials&lt;/code&gt; and looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;aws_admin]
aws_access_key_id &lt;span class="o"&gt;=&lt;/span&gt; AKIA...
aws_secret_access_key &lt;span class="o"&gt;=&lt;/span&gt; ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't have credentials handy, you can &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_console" rel="noopener noreferrer"&gt;follow AWS documentation to create a new user&lt;/a&gt; with a policy that grants S3 permissions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Uploading Files to S3 with Terraform
&lt;/h2&gt;

&lt;p&gt;Here's where we start using Terraform... creatively, i.e. for managing content instead of just infrastructure. For the content, I've created a &lt;a href="https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2021.10.06_TerraformS3Upload/content" rel="noopener noreferrer"&gt;basic multi-page website&lt;/a&gt;: a couple of HTML files, a CSS file, and a couple of images. By using Terraform's &lt;a href="https://www.terraform.io/docs/language/functions/fileset.html" rel="noopener noreferrer"&gt;fileset function&lt;/a&gt; and the AWS provider's &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_object" rel="noopener noreferrer"&gt;aws_s3_bucket_object resource&lt;/a&gt;, we can collect all the files in a directory and upload each one as an object in S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# in main.tf, below the aforementioned boilerplate&lt;/span&gt;
resource &lt;span class="s2"&gt;"aws_s3_bucket_object"&lt;/span&gt; &lt;span class="s2"&gt;"file"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  for_each &lt;span class="o"&gt;=&lt;/span&gt; fileset&lt;span class="o"&gt;(&lt;/span&gt;var.website_root, &lt;span class="s2"&gt;"**"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

  bucket      &lt;span class="o"&gt;=&lt;/span&gt; aws_s3_bucket.my_static_website.id
  key         &lt;span class="o"&gt;=&lt;/span&gt; each.key
  &lt;span class="nb"&gt;source&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.key&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  source_hash &lt;span class="o"&gt;=&lt;/span&gt; filemd5&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.key&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  acl         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"public-read"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://www.terraform.io/docs/language/meta-arguments/for_each.html" rel="noopener noreferrer"&gt;for_each meta-argument&lt;/a&gt; loops over all files in the website directory tree, binding the file path (&lt;code&gt;index.html&lt;/code&gt;, &lt;code&gt;assets/normalize.css&lt;/code&gt;, etc.) to &lt;code&gt;each.key&lt;/code&gt;, which can be used elsewhere in the block. The &lt;code&gt;source_hash&lt;/code&gt; argument hashes the file, which helps Terraform determine when the file has changed and needs to be re-uploaded to the S3 bucket. (There's a similar &lt;code&gt;etag&lt;/code&gt; argument, but it &lt;a href="https://github.com/hashicorp/terraform-provider-aws/pull/11522" rel="noopener noreferrer"&gt;doesn't work when some kinds of S3 encryption are enabled&lt;/a&gt;.)&lt;/p&gt;
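
&lt;p&gt;To see exactly which paths &lt;code&gt;fileset&lt;/code&gt; picks up before anything is uploaded, one option is a temporary output. This is a minimal sketch that assumes it is added to the same &lt;code&gt;main.tf&lt;/code&gt;; the output name &lt;code&gt;website_file_paths&lt;/code&gt; is made up for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper output, not part of the original example.
# Shows every path (relative to var.website_root) that for_each iterates over.
output "website_file_paths" {
  value = fileset(var.website_root, "**")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;terraform plan&lt;/code&gt; with this in place should show the set of paths, which is a quick way to catch stray files before they become S3 objects.&lt;/p&gt;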

&lt;h2&gt;
  
  
  Terraform Apply
&lt;/h2&gt;

&lt;p&gt;With our trusty &lt;code&gt;main.tf&lt;/code&gt; file in hand, we can now invoke dark and mysterious powers, conjuring infinite computational power out of nothing! With the merest flourish of our terminal, unfathomable forces precipitate to our whim — we are the tactician, the champion and commander over greater numbers than were ever deployed in any Greek myth!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7332ae877ea9e15ce%2F615ddb4010f7309ce247e407_206p.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7332ae877ea9e15ce%2F615ddb4010f7309ce247e407_206p.gif" alt="206p.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ahem... anyway, do the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initialize terraform in the current directory and download the AWS provider&lt;/span&gt;
terraform init
&lt;span class="c"&gt;# Preview what changes will be made&lt;/span&gt;
terraform plan
&lt;span class="c"&gt;# Make the changes (create and populate the S3 bucket)&lt;/span&gt;
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the end of the output from the apply command, you should see the website endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;...
Apply &lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; Resources: 6 added, 0 changed, 0 destroyed.

Outputs:

website_endpoint &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog-example-m9wtv64y.s3-website-us-west-1.amazonaws.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Content Types, MIME Types, Oh My
&lt;/h2&gt;

&lt;p&gt;Let's visit that URL in a browser and...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7332ae877ea9e15ce%2F615ddb3f94487f170cdf91dc_aws_screenshot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7332ae877ea9e15ce%2F615ddb3f94487f170cdf91dc_aws_screenshot.png" alt="aws_screenshot.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's not what we expected. It turns out that S3 assigns a content type of &lt;code&gt;binary/octet-stream&lt;/code&gt; to uploaded files by default. When visiting the website endpoint URL (which serves the &lt;code&gt;index.html&lt;/code&gt; file), the browser sees that &lt;code&gt;Content-Type: binary/octet-stream&lt;/code&gt; header and thinks "This is a binary file, so I'll prompt the user to download it".&lt;/p&gt;

&lt;p&gt;We would prefer the browser to treat our HTML files as HTML, the CSS files as CSS, and so on. For that, we need the browser to receive the correct &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types" rel="noopener noreferrer"&gt;MIME type&lt;/a&gt; (e.g. &lt;code&gt;text/html&lt;/code&gt;, &lt;code&gt;text/css&lt;/code&gt;, &lt;code&gt;image/png&lt;/code&gt;) in the &lt;code&gt;Content-Type&lt;/code&gt; header. The easiest way to do that is to specify the correct &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_object#content_type" rel="noopener noreferrer"&gt;content type when uploading files&lt;/a&gt;. To determine the correct type of our files, there are 2 approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Determining MIME Types with a CLI Tool
&lt;/h3&gt;

&lt;p&gt;The first approach is to use a command-line tool like &lt;code&gt;file&lt;/code&gt;, &lt;code&gt;xdg-mime&lt;/code&gt; or &lt;code&gt;mimetype&lt;/code&gt;. These tools use different approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;file&lt;/code&gt; uses "magic tests" (looking for identifying bits at a small fixed offset into the file) to determine the type of files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;xdg-mime&lt;/code&gt; and &lt;code&gt;mimetype&lt;/code&gt; match against the file extension first, falling back to using &lt;code&gt;file&lt;/code&gt; if the file doesn't have an extension&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shell session below demonstrates basic usage of each command (a leading dollar sign marks input commands; the other lines are output):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Demo of file&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;file &lt;span class="nt"&gt;--brief&lt;/span&gt; &lt;span class="nt"&gt;--mime-type&lt;/span&gt; index.html
text/html
&lt;span class="nv"&gt;$ &lt;/span&gt;file &lt;span class="nt"&gt;--brief&lt;/span&gt; &lt;span class="nt"&gt;--mime-type&lt;/span&gt; assets/normalize.css
text/plain

&lt;span class="c"&gt;# Demo of xdg-mime&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;xdg-mime query filetype index.html
text/html
&lt;span class="nv"&gt;$ &lt;/span&gt;xdg-mime query filetype assets/normalize.css
text/css

&lt;span class="c"&gt;# Demo of mimetype&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;mimetype &lt;span class="nt"&gt;--brief&lt;/span&gt; index.html
text/html
&lt;span class="nv"&gt;$ &lt;/span&gt;mimetype &lt;span class="nt"&gt;--brief&lt;/span&gt; assets/normalize.css
text/css
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A subtle detail in the above is that &lt;code&gt;file&lt;/code&gt; may not label text files very precisely — it outputs the CSS file as &lt;code&gt;text/plain&lt;/code&gt; instead of &lt;code&gt;text/css&lt;/code&gt; because there's no magic test or consistent file header that can identify CSS files (nor the many other variations of text file types).&lt;/p&gt;

&lt;p&gt;To determine MIME types with a CLI tool in our Terraform file, we'll add three pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An &lt;a href="https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source" rel="noopener noreferrer"&gt;external data source&lt;/a&gt; which, for each file to be uploaded, will call...&lt;/li&gt;
&lt;li&gt;An external script that calls a CLI tool (e.g. &lt;code&gt;mimetype&lt;/code&gt;) to determine the file's MIME type&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;content_type&lt;/code&gt; argument of the &lt;code&gt;aws_s3_bucket_object&lt;/code&gt; resource to assign the MIME type for each uploaded file&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The external data source is a new block in &lt;code&gt;main.tf&lt;/code&gt; as follows (I've turned the file list into a &lt;a href="https://www.terraform.io/docs/language/values/locals.html" rel="noopener noreferrer"&gt;local value&lt;/a&gt;, because we're using it in multiple places now):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;locals &lt;span class="o"&gt;{&lt;/span&gt;
  website_files &lt;span class="o"&gt;=&lt;/span&gt; fileset&lt;span class="o"&gt;(&lt;/span&gt;var.website_root, &lt;span class="s2"&gt;"**"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

data &lt;span class="s2"&gt;"external"&lt;/span&gt; &lt;span class="s2"&gt;"get_mime"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  for_each &lt;span class="o"&gt;=&lt;/span&gt; local.website_files
  program  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bash"&lt;/span&gt;, &lt;span class="s2"&gt;"./get_mime.sh"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  query &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    filepath : &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.key&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data source calls &lt;code&gt;bash ./get_mime.sh&lt;/code&gt; once for each file, passing the filepath as &lt;a href="https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source#external-program-protocol" rel="noopener noreferrer"&gt;JSON to stdin&lt;/a&gt;. Using &lt;a href="https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source#processing-json-in-shell-scripts" rel="noopener noreferrer"&gt;the example from the Terraform docs&lt;/a&gt;, we can implement the bash script to grab the JSON filepath from stdin, run &lt;code&gt;mimetype&lt;/code&gt; on the file, and export the result as a JSON object on stdout.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# Exit if any of the intermediate steps fail&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="c"&gt;# Extract "filepath" from the input JSON into FILEPATH shell variable.&lt;/span&gt;
&lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'@sh "FILEPATH=\(.filepath)"'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Run mimetype on filepath to get the correct mime type.&lt;/span&gt;
&lt;span class="nv"&gt;MIME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;mimetype &lt;span class="nt"&gt;--brief&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILEPATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Safely produce a JSON object containing the result value.&lt;/span&gt;
jq &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;--arg&lt;/span&gt; mime &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MIME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s1"&gt;'{"mime":$mime}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, in &lt;code&gt;main.tf&lt;/code&gt;, we pass the MIME type returned by the bash script to each file as it's uploaded to S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;resource &lt;span class="s2"&gt;"aws_s3_bucket_object"&lt;/span&gt; &lt;span class="s2"&gt;"file"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  for_each &lt;span class="o"&gt;=&lt;/span&gt; local.website_files

  bucket       &lt;span class="o"&gt;=&lt;/span&gt; aws_s3_bucket.my_static_website.id
  key          &lt;span class="o"&gt;=&lt;/span&gt; each.key
  &lt;span class="nb"&gt;source&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.key&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  source_hash  &lt;span class="o"&gt;=&lt;/span&gt; filemd5&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.key&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  acl          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"public-read"&lt;/span&gt;
  &lt;span class="c"&gt;# added:&lt;/span&gt;
  content_type &lt;span class="o"&gt;=&lt;/span&gt; data.external.get_mime[each.key].result.mime
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
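
&lt;p&gt;To check what the script reported for each file without opening the S3 console, the results can be gathered into an output. This is only a sketch: the output name &lt;code&gt;detected_mime_types&lt;/code&gt; and the iterator name &lt;code&gt;mime_query&lt;/code&gt; are assumptions for illustration, and it relies on the &lt;code&gt;get_mime&lt;/code&gt; data source defined above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper output, not part of the original example.
# Builds a map of file path to the MIME type returned by get_mime.sh.
output "detected_mime_types" {
  value = {
    for path, mime_query in data.external.get_mime :
    path =&amp;gt; mime_query.result.mime
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the data source uses &lt;code&gt;for_each&lt;/code&gt;, it behaves like a map keyed by file path, so a &lt;code&gt;for&lt;/code&gt; expression can walk over it directly.&lt;/p&gt;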



&lt;h3&gt;
  
  
  Determining MIME Types with a File Extension Map
&lt;/h3&gt;

&lt;p&gt;The second approach to determining correct MIME types for our files is to simply provide a map of file extensions to MIME types. I first ran into this approach (for uploading files with Terraform) in &lt;a href="https://engineering.statefarm.com/blog/terraform-s3-upload-with-mime/" rel="noopener noreferrer"&gt;this article on the StateFarm engineering blog&lt;/a&gt;, but it's a common approach in general:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://registry.terraform.io/modules/hashicorp/dir/template/latest" rel="noopener noreferrer"&gt;hashicorp/dir/template Terraform module&lt;/a&gt; has a &lt;a href="https://github.com/hashicorp/terraform-template-dir/blob/17b81de441645a94f4db1449fc8269cd32f26fde/variables.tf#L18" rel="noopener noreferrer"&gt;mapping of extensions and MIME types&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;Sidenote: An &lt;a href="https://github.com/hashicorp/terraform/issues/27737" rel="noopener noreferrer"&gt;open Terraform issue&lt;/a&gt; requesting native MIME type detection directs users to use this Terraform module.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The &lt;a href="https://github.com/aws/aws-cli/blob/8df550b8c28c1fa71d5c680f998e46107596f198/awscli/customizations/s3/utils.py#L340" rel="noopener noreferrer"&gt;AWS CLI uses the Python mimetypes module&lt;/a&gt;, which has a &lt;a href="https://github.com/python/cpython/blob/97ea18ecede8bfd33d5ab2dd0e7e2aada2051111/Lib/mimetypes.py#L431" rel="noopener noreferrer"&gt;built-in mapping&lt;/a&gt; as a fallback if it can't read a mapping from the system (at &lt;code&gt;/etc/mime.types&lt;/code&gt;)&lt;/li&gt;

&lt;li&gt;In non-desktop environments, the &lt;a href="https://cgit.freedesktop.org/xdg/xdg-utils/tree/scripts/xdg-mime.in#n100" rel="noopener noreferrer"&gt;xdg-mime tool falls back to using the mimetype tool&lt;/a&gt;, which &lt;a href="https://github.com/mbeijen/File-MimeInfo/blob/master/lib/File/MimeInfo/Magic.pm#L31" rel="noopener noreferrer"&gt;checks file extensions before performing magic tests&lt;/a&gt; (for the most part)&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;To use this approach, we add a &lt;code&gt;mime.json&lt;/code&gt; file that maps file extensions to MIME types for whatever files we need to upload. It could be as simple as the below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;".html"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text/html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;".css"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text/css"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;".png"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image/png"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we load that file as a local variable in Terraform and use it when looking up the content type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;locals &lt;span class="o"&gt;{&lt;/span&gt;
  website_files &lt;span class="o"&gt;=&lt;/span&gt; fileset&lt;span class="o"&gt;(&lt;/span&gt;var.website_root, &lt;span class="s2"&gt;"**"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

  mime_types &lt;span class="o"&gt;=&lt;/span&gt; jsondecode&lt;span class="o"&gt;(&lt;/span&gt;file&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"mime.json"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

resource &lt;span class="s2"&gt;"aws_s3_bucket_object"&lt;/span&gt; &lt;span class="s2"&gt;"file"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  for_each &lt;span class="o"&gt;=&lt;/span&gt; local.website_files

  bucket       &lt;span class="o"&gt;=&lt;/span&gt; aws_s3_bucket.my_static_website.id
  key          &lt;span class="o"&gt;=&lt;/span&gt; each.key
  &lt;span class="nb"&gt;source&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.key&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  source_hash  &lt;span class="o"&gt;=&lt;/span&gt; filemd5&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.key&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  acl          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"public-read"&lt;/span&gt;
  content_type &lt;span class="o"&gt;=&lt;/span&gt; lookup&lt;span class="o"&gt;(&lt;/span&gt;local.mime_types, regex&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.[^.]+$"&lt;/span&gt;, each.key&lt;span class="o"&gt;)&lt;/span&gt;, null&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mapping-based approach has the advantages of being simple and more cross-platform than shelling out to CLI tools. The downside is that you need to make sure all filetypes you're using exist in the extension-to-MIME mapping and are correct.&lt;/p&gt;
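
&lt;p&gt;One way to soften that downside is to have Terraform report any file whose extension is missing from the mapping. The sketch below assumes the &lt;code&gt;locals&lt;/code&gt; from the previous block; the names &lt;code&gt;unmapped_files&lt;/code&gt; and the output are made up for illustration. It also wraps &lt;code&gt;regex&lt;/code&gt; in &lt;code&gt;try&lt;/code&gt;, because &lt;code&gt;regex&lt;/code&gt; raises an error when the pattern doesn't match (for example, on a file with no extension):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical check, not part of the original example.
locals {
  # Files whose extension is absent from mime.json, or that have no extension.
  # try() supplies "" when regex() fails to match.
  unmapped_files = [
    for f in local.website_files : f
    if !contains(keys(local.mime_types), try(regex("\\.[^.]+$", f), ""))
  ]
}

output "unmapped_files" {
  value = local.unmapped_files
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty list here suggests every file will get an explicit content type; anything listed would fall back to the &lt;code&gt;null&lt;/code&gt; default in the &lt;code&gt;lookup&lt;/code&gt; call (or, for extensionless files, trip the unguarded &lt;code&gt;regex&lt;/code&gt;).&lt;/p&gt;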

&lt;h2&gt;
  
  
  Fixing a Stale CloudFront Cache
&lt;/h2&gt;

&lt;p&gt;Now we have &lt;a href="http://blog-example-m9wtv64y.s3-website-us-west-1.amazonaws.com/" rel="noopener noreferrer"&gt;a working static website&lt;/a&gt; that we can visit in our browser! If you don't care about SSL or caching for some reason, you could stop here. But I would argue that an important part of modern websites is making them secure and fast, so you'll likely want to put a CloudFront distribution in front of your S3 bucket. There are many other tutorials (such as all the ones linked at the top of this article) that cover CloudFront, so I won't dig into the details of that. However, I do want to dig into a problem that you run into when serving a static website via CloudFront: a stale cache.&lt;/p&gt;

&lt;p&gt;By default, CloudFront applies a TTL of 86400 seconds (1 day) when the origin sends no caching headers, meaning CloudFront will fetch website files from your S3 bucket and serve the same files to visitors for a full day before re-fetching from S3. If you update website content (e.g. change CSS styles or JavaScript behavior) in S3, visitors may continue receiving cached versions from CloudFront and won't see your updates for up to a whole day! We'd prefer visitors to see the latest version of all website content, but we'd also like CloudFront to cache files as long as possible, so files can be served faster (directly from cache).&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache Busting
&lt;/h3&gt;

&lt;p&gt;One solution is &lt;a href="https://javascript.plainenglish.io/what-is-cache-busting-55366b3ac022" rel="noopener noreferrer"&gt;cache-busting&lt;/a&gt;, which involves adding a hash (or "fingerprint") to non-HTML files' names. If the files' content changes, then the hash changes, so the browser downloads a completely different file (which can be cached forever).&lt;/p&gt;
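
&lt;p&gt;Outside of Terraform, the renaming step itself is small. Here is a minimal Python sketch of fingerprinting, assuming an 8-character MD5 prefix inserted before the extension; the helper name &lt;code&gt;fingerprinted_name&lt;/code&gt; and the naming scheme are illustrative choices, not something a particular bundler prescribes.&lt;/p&gt;

```python
import hashlib
import os


def fingerprinted_name(filename, content):
    """Insert a short content hash before the file extension."""
    # content must be bytes; the same bytes always produce the same digest.
    digest = hashlib.md5(content).hexdigest()[:8]
    stem, extension = os.path.splitext(filename)
    return f"{stem}.{digest}{extension}"


old = fingerprinted_name("main.css", b"body { color: black; }")
new = fingerprinted_name("main.css", b"body { color: navy; }")
print(old)
print(new)
```

&lt;p&gt;The two printed names differ because the content differs, which is what lets each version be cached indefinitely. The hard part, as described next, is rewriting every reference to the old name.&lt;/p&gt;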

&lt;p&gt;I tried to implement this with Terraform, but uh... Terraform isn't meant for this sort of thing. Between the Terraform &lt;a href="https://www.terraform.io/docs/language/functions/filemd5.html" rel="noopener noreferrer"&gt;filemd5&lt;/a&gt; and &lt;a href="https://www.terraform.io/docs/language/functions/regex.html" rel="noopener noreferrer"&gt;regex&lt;/a&gt; functions, you can get close, but I hit a wall when trying to &lt;a href="https://www.terraform.io/docs/language/functions/replace.html" rel="noopener noreferrer"&gt;replace&lt;/a&gt; filenames with their hashed version in all files. This could maybe work if you used &lt;a href="https://www.terraform.io/docs/language/functions/templatefile.html" rel="noopener noreferrer"&gt;template&lt;/a&gt; variables (e.g. &lt;code&gt;&amp;lt;link href="${main.css}"&amp;gt;&lt;/code&gt; instead of &lt;code&gt;&amp;lt;link href="main.css"&amp;gt;&lt;/code&gt;), but then you can no longer browse your website via the filesystem or a local server. Alas, here dies my ill-advised dream of making a Terraform-based static-site generator/bundler.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7332ae877ea9e15ce%2F615ddb40a9390dfd8a805bba_melting_emoji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7332ae877ea9e15ce%2F615ddb40a9390dfd8a805bba_melting_emoji.png" alt="melting_emoji.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fun fact: the &lt;a href="https://www.unicode.org/L2/L2020/20072-melting-face-emoji.pdf" rel="noopener noreferrer"&gt;melting face emoji&lt;/a&gt; was recently approved!&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache Invalidation
&lt;/h3&gt;

&lt;p&gt;The other solution to a stale CloudFront cache is &lt;a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html" rel="noopener noreferrer"&gt;invalidating files&lt;/a&gt;. This approach does not fit into Terraform's declarative paradigm — there are no resources for invalidations in the AWS provider and no third-party modules either. So, it requires more hacky-ness, in the form of a &lt;a href="https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource" rel="noopener noreferrer"&gt;null_resource&lt;/a&gt; that triggers based on changes in file hashes and &lt;a href="https://www.terraform.io/docs/language/resources/provisioners/local-exec.html" rel="noopener noreferrer"&gt;shells out&lt;/a&gt; to the AWS CLI to create a new invalidation. That approach might look something like the below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;locals &lt;span class="o"&gt;{&lt;/span&gt;
  website_files &lt;span class="o"&gt;=&lt;/span&gt; fileset&lt;span class="o"&gt;(&lt;/span&gt;var.website_root, &lt;span class="s2"&gt;"**"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

  file_hashes &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;filename &lt;span class="k"&gt;in &lt;/span&gt;local.website_files :
    filename &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; filemd5&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.website_root&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;filename&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

resource &lt;span class="s2"&gt;"null_resource"&lt;/span&gt; &lt;span class="s2"&gt;"invalidate_cache"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  triggers &lt;span class="o"&gt;=&lt;/span&gt; local.file_hashes

  provisioner &lt;span class="s2"&gt;"local-exec"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"aws --profile=aws_admin cloudfront create-invalidation --distribution-id=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;aws_cloudfront_distribution&lt;/span&gt;&lt;span class="p"&gt;.my_distribution.id&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; --paths=/*"&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



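&lt;p&gt;The &lt;code&gt;null_resource&lt;/code&gt; comes from the &lt;code&gt;null&lt;/code&gt; provider, which this configuration hasn't used before, so you'll need to run &lt;code&gt;terraform init&lt;/code&gt; again to install it.&lt;/p&gt;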

&lt;h2&gt;
  
  
  What About Browser Caching?
&lt;/h2&gt;

&lt;p&gt;We've talked about CloudFront caching, but there's another cache in between your content and your visitor: the browser. The browser cache and the &lt;code&gt;Cache-Control&lt;/code&gt; header are a big topic all on their own; &lt;a href="https://csswizardry.com/2019/03/cache-control-for-civilians/" rel="noopener noreferrer"&gt;Harry Roberts's Cache-Control for Civilians&lt;/a&gt; is a great resource if you want to learn more.&lt;/p&gt;

&lt;p&gt;For the purpose of this article, it's important to note that you shouldn't set an aggressive cache control header (e.g. &lt;code&gt;Cache-Control: public, max-age=604800, immutable&lt;/code&gt;) on your website files without fingerprinting them. Otherwise, visitors' browsers will keep serving a file from their local cache for the &lt;code&gt;max-age&lt;/code&gt; duration (one week, in the above example) before they send a request to CloudFront to check if the file is stale. CloudFront invalidations force CloudFront to fetch fresh content, but have no impact on the caching of visitors' browsers.&lt;/p&gt;
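
&lt;p&gt;That rule can be written down as a small decision function. The sketch below is only an illustration: it assumes fingerprinted files are named like &lt;code&gt;main.d41d8cd9.css&lt;/code&gt; (an 8-character hex hash before the extension), the helper name &lt;code&gt;cache_control_for&lt;/code&gt; is hypothetical, and &lt;code&gt;no-cache&lt;/code&gt; for everything else is one conservative choice among several.&lt;/p&gt;

```python
import re

# Assumed naming scheme: name.HASH.ext, where HASH is 8 lowercase hex characters.
FINGERPRINT = re.compile(r"\.[0-9a-f]{8}\.[^.]+$")


def cache_control_for(filename):
    """Pick a Cache-Control value based on whether the name is fingerprinted."""
    if FINGERPRINT.search(filename):
        # Content never changes under this name, so a long lifetime is safe.
        return "public, max-age=604800, immutable"
    # Unfingerprinted files (for example HTML) are revalidated before reuse.
    return "no-cache"


for name in ["index.html", "main.css", "main.d41d8cd9.css"]:
    print(name, cache_control_for(name))
```

&lt;p&gt;Only the fingerprinted file gets the aggressive header; the other two are revalidated on every request, so a CloudFront invalidation is enough to get fresh content to visitors.&lt;/p&gt;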




&lt;p&gt;That's all for this adventure — thanks for joining me in pushing Terraform out of its comfort zone! If you have any suggestions or corrections, please let me know or &lt;a href="https://www.twitter.com/tangramvision" rel="noopener noreferrer"&gt;send us a tweet&lt;/a&gt;, and if you’re curious to learn more about how we improve perception sensors, visit us at &lt;a href="https://www.tangramvision.com/" rel="noopener noreferrer"&gt;Tangram Vision&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Creating PostgreSQL Test Data with SQL, PL/pgSQL, and Python</title>
      <dc:creator>Greg Schafer</dc:creator>
      <pubDate>Fri, 30 Apr 2021 21:18:30 +0000</pubDate>
      <link>https://dev.to/tangramvision/creating-postgresql-test-data-with-sql-pl-pgsql-and-python-efj</link>
      <guid>https://dev.to/tangramvision/creating-postgresql-test-data-with-sql-pl-pgsql-and-python-efj</guid>
      <description>&lt;p&gt;After exploring various ways to &lt;a href="https://www.tangramvision.com/blog/loading-test-data-into-postgresql" rel="noopener noreferrer"&gt;load test data into PostgreSQL for my last blog post&lt;/a&gt;, I wanted to dive into different approaches for &lt;em&gt;generating&lt;/em&gt; test data for PostgreSQL. Generating test data, rather than using static manually-created data, can be valuable for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing the logic for generating test data forces you to take a second look at your data model and consider what values are allowed and which values are edge cases.&lt;/li&gt;
&lt;li&gt;Tools for generating test data make it easier to set up data per test. I would argue this is better than the alternatives of (a) hand-creating data per test or (b) trying to maintain a single dataset that is used across the entire test suite. The first option is tedious, and the second option can be brittle. As an example, if you're testing an e-commerce website and your test suite uses hard-coded product details and deactivating the product in your test dataset causes many tests to unexpectedly fail, then those tests were reliant on a pre-condition that happened to be satisfied in your test dataset. Generating data per test can make such pre-conditions more explicit and clear, especially for colleagues who inherit your tests and test data in the future.&lt;/li&gt;
&lt;li&gt;Unless you already have a large dataset from a production environment or a partner company that you can use (hopefully after anonymization!), generating test data is the only way to get large datasets for benchmarking and load testing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similar to the previous article, if you're using an Object-Relational Mapping (ORM) library, then you'll probably create and persist objects into the database using the ORM or use the ORM to dump and restore test data fixtures using JSON or CSV. If you're not using an ORM, the approaches in this article may provide some learning or inspiration for how you can best generate data for your particular testing situation.&lt;/p&gt;

&lt;h1&gt;
  
  
  Follow Along with Docker
&lt;/h1&gt;

&lt;p&gt;Similar to the &lt;a href="https://www.tangramvision.com/blog/loading-test-data-into-postgresql" rel="noopener noreferrer"&gt;previous article&lt;/a&gt;, you can follow along using Docker and the scripts in a subfolder of our Tangram Vision blog repo: &lt;a href="https://gitlab.com/tangram-vision-oss/tangram-visions-blog/-/tree/main/2021.04.30_GeneratingTestDataInPostgreSQL" rel="noopener noreferrer"&gt;https://gitlab.com/tangram-vision-oss/tangram-visions-blog/-/tree/main/2021.04.30_GeneratingTestDataInPostgreSQL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unlike the previous article, I've provided a Dockerfile to add Python into the Postgres Docker image so we can run Python inside the PostgreSQL database. As described in the repo's README, you can build the docker image and run examples with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres-test-data-blogpost

&lt;span class="c"&gt;# The base postgres image requires a password to be set, but we'll just be&lt;/span&gt;
&lt;span class="c"&gt;# testing locally, so no need to set a strong password.&lt;/span&gt;
docker run &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;foo &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--volume&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/schema.sql:/docker-entrypoint-initdb.d/schema.sql &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--volume&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;:/repo &lt;span class="se"&gt;\&lt;/span&gt;
    postgres-test-data-blogpost &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;log_statement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo contains a variety of files that start with &lt;code&gt;add-data-&lt;/code&gt; which demonstrate different ways of loading and generating test data. After the Postgres Docker container is running, you can run &lt;code&gt;add-data-&lt;/code&gt; files in a new terminal window with a command like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--workdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/repo postgres &lt;span class="se"&gt;\&lt;/span&gt;
    psql &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost &lt;span class="nt"&gt;--username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="se"&gt;\&lt;/span&gt;
         &lt;span class="nt"&gt;--file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;add-data-insert-random.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to interactively poke around the database with &lt;code&gt;psql&lt;/code&gt;, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--interactive&lt;/span&gt; &lt;span class="nt"&gt;--tty&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
    psql &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost &lt;span class="nt"&gt;--username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Sample Schema
&lt;/h1&gt;

&lt;p&gt;For example code and data, I'll use the following simple schema again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Musical artists have a name&lt;/li&gt;
&lt;li&gt;An artist can have many albums (one-to-many), which have a title and release date&lt;/li&gt;
&lt;li&gt;Genres have a name&lt;/li&gt;
&lt;li&gt;Albums can belong to many genres (many-to-many)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6anullsha0jkb5skd67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6anullsha0jkb5skd67.png" alt="Sample schema relating musical artists, albums, and genres."&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Sample schema relating musical artists, albums, and genres.&lt;/em&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Generating Data
&lt;/h1&gt;

&lt;p&gt;Using static datasets has advantages (you know exactly what data is in your database), but they can be tedious to maintain over time and impractical to create if you need a lot of data (e.g. for benchmarking or load testing). Generating data is an alternative approach which lets you define how data should look in one place and then generate and use as much data as you like.&lt;/p&gt;

&lt;p&gt;There are a few different tools for generating test data that are worth exploring, from plain ol' SQL to higher-level programming languages like Python.&lt;/p&gt;
&lt;h2&gt;
  
  
  SQL
&lt;/h2&gt;

&lt;p&gt;If you're like me, you may have started this article not expecting SQL to be capable of generating test data. With &lt;a href="https://www.postgresql.org/docs/current/functions-srf.html" rel="noopener noreferrer"&gt;&lt;code&gt;generate_series&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/functions-math.html#FUNCTIONS-MATH-RANDOM-TABLE" rel="noopener noreferrer"&gt;&lt;code&gt;random&lt;/code&gt;&lt;/a&gt; and a little creativity, however, SQL is well-equipped to generate a variety of data.&lt;/p&gt;

&lt;p&gt;To create 5 artists with 8 random hex characters for their names, you can do the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
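
&lt;p&gt;If the nested expression &lt;code&gt;substr(md5(random()::text), 1, 8)&lt;/code&gt; is hard to read, a rough Python equivalent may help. This is only an analogy for what the SQL does (turn a random number into text, hash it, keep the first 8 hex characters); the function name &lt;code&gt;random_hex_name&lt;/code&gt; is made up for this sketch.&lt;/p&gt;

```python
import hashlib
import random


def random_hex_name(length=8):
    """Hash a random float's text form and keep the first `length` hex chars."""
    digest = hashlib.md5(str(random.random()).encode("utf-8")).hexdigest()
    return digest[:length]


# The list comprehension plays the role of generate_series(1, 5): one name per row.
artist_names = [random_hex_name() for _ in range(5)]
print(artist_names)
```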



&lt;p&gt;If you want to use random words instead of random hex characters, you can pick words from the system dictionary. I've copied Ubuntu's &lt;code&gt;american-english&lt;/code&gt; word list to &lt;code&gt;/usr/share/dict/words&lt;/code&gt; in the Docker image, so we just need to load it and pick a word randomly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Temporary tables are only accessible to the current psql session and are&lt;/span&gt;
&lt;span class="c1"&gt;-- dropped at the end of the session.&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TEMPORARY&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- The WHERE clauses excludes possessive words (almost 30k of them!)&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'/usr/share/dict/words'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Randomly order the table and pick the first result&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No joke, the first word that the above query returned for me was "bravo". I don't know whether to be encouraged or creeped out.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8r46s9u2268z559gzmu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8r46s9u2268z559gzmu.jpg" alt="Is this a pigeon meme: Generating test data, is this artificial intelligence?"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a separate note, the dictionary contains words that may be offensive and inappropriate in some settings. If you're pulling test data from the dictionary and don't want these words to pop up in your next demo to customers/bosses, make sure to take appropriate precautions!&lt;/p&gt;

&lt;p&gt;Anyway, moving on... using these tools (and a few more), we can generate interesting test data for all of our tables. Comments in the code below explain extra functions and techniques being used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-insert-random.sql in the sample code repo&lt;/span&gt;

&lt;span class="c1"&gt;-- Use 8 random hex chars as the genre name.&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="c1"&gt;-- Pick one random word as the artist name.&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;released&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="c1"&gt;-- Select a random artist from the artists table.&lt;/span&gt;
  &lt;span class="c1"&gt;-- NOTE: random() is only evaluated once in this subquery unless it depends on&lt;/span&gt;
  &lt;span class="c1"&gt;-- the outer query, hence the "_g*0" after random().&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

  &lt;span class="c1"&gt;-- Select the first 1-3 rows after randomly sorting the word list, then join&lt;/span&gt;
  &lt;span class="c1"&gt;-- them with spaces between each word and capitalize the first letter of each&lt;/span&gt;
  &lt;span class="c1"&gt;-- word.&lt;/span&gt;
  &lt;span class="n"&gt;initcap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array_to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;

  &lt;span class="c1"&gt;-- Subtract between 0-5 years from today as the album release date.&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'5 years'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;())::&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Assign a random album a random genre. Repeat 10 times.&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;album_genres&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;album_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;genre_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt;
&lt;span class="c1"&gt;-- If we insert a row that already exists, do nothing (don't raise an error)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTHING&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
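
&lt;p&gt;One consequence of the last statement is easy to miss: because duplicate pairs are skipped, you may end up with fewer than 10 rows in &lt;code&gt;album_genres&lt;/code&gt;. The Python sketch below mimics that behavior with a set. It assumes the table enforces uniqueness on the (album, genre) pair, and the function name and the &lt;code&gt;seed&lt;/code&gt; parameter are hypothetical additions for illustration.&lt;/p&gt;

```python
import random


def random_album_genres(album_ids, genre_ids, attempts=10, seed=None):
    """Draw `attempts` random (album, genre) pairs, ignoring repeated pairs."""
    rng = random.Random(seed)
    pairs = set()
    for _ in range(attempts):
        # Adding a pair that is already present leaves the set unchanged,
        # much like ON CONFLICT DO NOTHING skips a duplicate row.
        pairs.add((rng.choice(album_ids), rng.choice(genre_ids)))
    return pairs


# 8 albums and 5 genres, matching the row counts generated above.
pairs = random_album_genres(list(range(1, 9)), list(range(1, 6)), seed=1)
print(len(pairs), "distinct pairs from 10 attempts")
```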



&lt;p&gt;But that's not all! We can define functions in SQL to reuse logic — if we want genres, artist names, and album titles to all be random words, then we can move random-word-picking into a function and use it in many places:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-insert-random-function.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;generate_random_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_words&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;initcap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array_to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="n"&gt;num_words&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_random_title&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;-- Generate 1-2 random words as the artist name.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_random_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;_g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  PL/pgSQL
&lt;/h2&gt;

&lt;p&gt;If the declarative style of SQL is awkward/difficult, we can turn to &lt;a href="https://www.postgresql.org/docs/current/plpgsql.html" rel="noopener noreferrer"&gt;PL/pgSQL&lt;/a&gt; to generate test data in PostgreSQL using a more procedural/imperative programming style. PL/pgSQL provides familiar programming concepts like variables, conditionals, loops, return statements, and exception handling.&lt;/p&gt;

&lt;p&gt;To demonstrate some of what PL/pgSQL can do, let's specify some more requirements for our generated data — roughly half of our artists should have names starting with "DJ" and all albums by DJ artists should belong to an "Electronic" genre. That implementation might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-plpgsql-insert.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;DECLARE&lt;/span&gt;
  &lt;span class="c1"&gt;-- Declare (and optionally assign) variables used in the below code block.&lt;/span&gt;
  &lt;span class="n"&gt;genre_options&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Hip Hop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Jazz'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Rock'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Electronic'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="n"&gt;artist_name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;dj_album&lt;/span&gt; &lt;span class="n"&gt;RECORD&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="c1"&gt;-- Convert each array option into a row and insert them into genres table.&lt;/span&gt;
  &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;unnest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_random_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;-- About 50% of the time, add 'DJ ' to the front of the artist's name.&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt;
      &lt;span class="n"&gt;artist_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'DJ '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;-- ...&lt;/span&gt;

  &lt;span class="c1"&gt;-- Ensure all albums by a 'DJ' artist belong to the Electronic genre.&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;dj_album&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;artist_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'DJ %'&lt;/span&gt;
  &lt;span class="n"&gt;LOOP&lt;/span&gt;
    &lt;span class="n"&gt;RAISE&lt;/span&gt; &lt;span class="n"&gt;NOTICE&lt;/span&gt; &lt;span class="s1"&gt;'Ensuring DJ album % belongs to Electronic genre!'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quote_literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dj_album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;album_genres&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;album_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;genre_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;dj_album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronic'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;-- If we insert a row that already exists, do nothing (don't raise an error)&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTHING&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see in the above code snippet, PL/pgSQL lets us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test conditions with &lt;a href="https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-CONDITIONALS" rel="noopener noreferrer"&gt;IF statements&lt;/a&gt; (which can have ELSIF and ELSE blocks or alternately be represented with CASE statements),&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-INTEGER-FOR" rel="noopener noreferrer"&gt;Loop over a range of integers&lt;/a&gt; with &lt;code&gt;FOR i IN 1..8 LOOP&lt;/code&gt; (which can loop in reverse or with a step),&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING" rel="noopener noreferrer"&gt;Loop over rows from a query&lt;/a&gt;, as in the &lt;code&gt;FOR dj_album IN ...&lt;/code&gt; example above,&lt;/li&gt;
&lt;li&gt;Print helpful log statements with &lt;a href="https://www.postgresql.org/docs/current/plpgsql-errors-and-messages.html" rel="noopener noreferrer"&gt;RAISE&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;and do &lt;a href="https://www.postgresql.org/docs/current/plpgsql-overview.html#PLPGSQL-ADVANTAGES" rel="noopener noreferrer"&gt;all the above in a performant way&lt;/a&gt;, because the client can send the whole code block to the server to execute, rather than serializing and sending each statement to the server one at a time as it would with raw SQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's much more to &lt;a href="https://www.postgresql.org/docs/current/plpgsql.html" rel="noopener noreferrer"&gt;learn about PL/pgSQL&lt;/a&gt; than I can cover here in a reasonable amount of space, but hopefully the above provides some insight into its capabilities to help you decide what tool makes sense for you!&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Python
&lt;/h2&gt;

&lt;p&gt;PL/pgSQL isn't the only procedural language available in PostgreSQL; Python is supported too! The Python procedural language, &lt;code&gt;plpython3u&lt;/code&gt; for Python 3, is "untrusted" (hence the &lt;code&gt;u&lt;/code&gt; at the end of the name), meaning you must be a superuser to create functions, and Python code can access and do anything that a superuser could. Luckily, we're generating test data in non-production environments, so Python is an acceptable option despite these security concerns.&lt;/p&gt;

&lt;p&gt;To use &lt;code&gt;plpython3u&lt;/code&gt;, we need to install &lt;code&gt;python3&lt;/code&gt; and &lt;code&gt;postgresql-plpython3-$PG_MAJOR&lt;/code&gt; system packages and create the extension in the SQL script with the command below. I've already taken these steps for the Docker image and plpython script in the sample code repo.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;plpython3u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main difference to be aware of when using Python in PostgreSQL is that all database access happens via the &lt;code&gt;plpy&lt;/code&gt; module that is automatically imported in &lt;code&gt;plpython3u&lt;/code&gt; blocks. The following example should help clarify some basics of using &lt;code&gt;plpython3u&lt;/code&gt; and the &lt;code&gt;plpy&lt;/code&gt; module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-plpython-intro.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
    &lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"Print statements don't appear anywhere!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Manually&lt;/span&gt; &lt;span class="k"&gt;convert&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quote&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;interpolate&lt;/span&gt;
    &lt;span class="n"&gt;artist_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quote_nullable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"DJ Okawari"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;returned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="nv"&gt;"INSERT INTO artists (name) VALUES ({artist_name})"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;returned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Outputs&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;next&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;
    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PLyResult&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Let&lt;/span&gt; &lt;span class="n"&gt;PostgreSQL&lt;/span&gt; &lt;span class="n"&gt;parameterize&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
    &lt;span class="n"&gt;artist_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"Ella Fitzgerald"&lt;/span&gt;
    &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"INSERT INTO artists (name) VALUES ($1) RETURNING *"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;returned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;artist_name&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;returned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Outputs&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;next&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;
    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PLyResult&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s1"&gt;'artist_id'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Ella Fitzgerald'&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="n"&gt;returned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"SELECT * FROM artists"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;returned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Outputs&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;next&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;
    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PLyResult&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="n"&gt;nrows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s1"&gt;'artist_id'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'DJ Okawari'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'artist_id'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Ella Fitzgerald'&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpython3u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are the most important insights from the above code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output from Python's &lt;code&gt;print&lt;/code&gt; function doesn't appear anywhere, so to print debugging information you need to use the &lt;a href="https://www.postgresql.org/docs/12/plpython-util.html" rel="noopener noreferrer"&gt;logging methods available in the plpy module&lt;/a&gt; (such as &lt;code&gt;info&lt;/code&gt;, &lt;code&gt;warning&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://www.postgresql.org/docs/12/plpython-database.html" rel="noopener noreferrer"&gt;&lt;code&gt;plpy.execute&lt;/code&gt; function&lt;/a&gt; can execute a simple string as a query. If you're interpolating variables into the query, you are responsible for converting the variable value into a string and properly &lt;a href="https://www.postgresql.org/docs/12/plpython-util.html" rel="noopener noreferrer"&gt;quoting&lt;/a&gt; it.&lt;/li&gt;
&lt;li&gt;Alternately, use &lt;code&gt;plan = plpy.prepare&lt;/code&gt; then &lt;code&gt;plan.execute&lt;/code&gt; to prepare and execute a query, which allows you to leave data conversion and quoting up to PostgreSQL. As a bonus, you can save plans so the database only has to parse the query string and formulate an execution plan once.&lt;/li&gt;
&lt;li&gt;The return value of &lt;code&gt;plpy.execute&lt;/code&gt; can tell you the &lt;a href="https://github.com/postgres/postgres/blob/c30f54ad732ca5c8762bb68bbe0f51de9137dd72/src/include/executor/spi.h#L81-L97" rel="noopener noreferrer"&gt;status&lt;/a&gt; of the query, how many rows were inserted or returned, and the rows themselves.&lt;/li&gt;
&lt;/ul&gt;
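&lt;p&gt;To make the plan-reuse point concrete, here is a minimal sketch (not from the sample code repo) that prepares an &lt;code&gt;INSERT&lt;/code&gt; once and runs it for several names. The &lt;code&gt;insert_artists&lt;/code&gt; helper and the &lt;code&gt;FakePlpy&lt;/code&gt; stand-in are hypothetical names introduced for illustration; the stand-in only mimics the few &lt;code&gt;plpy&lt;/code&gt; calls used here so the sketch can also run outside PostgreSQL.&lt;/p&gt;

```python
def insert_artists(db, names):
    """Prepare the INSERT once, then reuse the plan for every name.

    `db` is expected to be the plpy module when called inside a
    plpython3u block (hypothetical helper, not part of the sample repo).
    """
    plan = db.prepare("INSERT INTO artists (name) VALUES ($1)", ["text"])
    inserted = 0
    for name in names:
        result = db.execute(plan, [name])
        inserted += result.nrows()
    return inserted


class FakeResult:
    """Stand-in for a plpy result object; only nrows() is mimicked."""

    def __init__(self, count):
        self._count = count

    def nrows(self):
        return self._count


class FakePlpy:
    """Stand-in for plpy (assumption): records calls instead of running SQL."""

    def __init__(self):
        self.prepared = []
        self.executed = []

    def prepare(self, query, argtypes):
        self.prepared.append(query)
        return query

    def execute(self, plan, args):
        self.executed.append((plan, list(args)))
        return FakeResult(1)


db = FakePlpy()
print(insert_artists(db, ["DJ Okawari", "Ella Fitzgerald"]))
```

&lt;p&gt;Inside a &lt;code&gt;DO&lt;/code&gt; block you would call &lt;code&gt;insert_artists(plpy, names)&lt;/code&gt; with the real module. The point of the sketch is only that &lt;code&gt;prepare&lt;/code&gt; runs once no matter how many rows are inserted.&lt;/p&gt;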

&lt;p&gt;Now that we have an understanding of how to use Python in PostgreSQL, let's apply it to generating test data for our sample schema. While we could translate the previous section's PL/pgSQL code to Python with very few changes, doing so wouldn't capitalize on the biggest advantage of using Python — the plethora of standard and third-party libraries available.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Faker Package
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://faker.readthedocs.io/en/master/" rel="noopener noreferrer"&gt;Faker&lt;/a&gt; is a Python package that provides many helpers for generating fake data. You can generate realistic-looking first and last names, addresses, emails, URLs, job titles, company names, and much more. Faker also supports generating &lt;a href="https://faker.readthedocs.io/en/master/providers/faker.providers.lorem.html" rel="noopener noreferrer"&gt;random words and sentences&lt;/a&gt;, and generating random data across many different data types (numbers, strings, dates, JSON, and more). Using Faker is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-plpython-faker.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
    &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt; &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;choice&lt;/span&gt;
    &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;faker&lt;/span&gt; &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Faker&lt;/span&gt;

    &lt;span class="n"&gt;fake&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Faker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"INSERT INTO artists (name) VALUES ($1)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;

    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Alternately&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;we&lt;/span&gt; &lt;span class="n"&gt;could&lt;/span&gt; &lt;span class="k"&gt;add&lt;/span&gt; &lt;span class="nv"&gt;"RETURNING artist_id"&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;above&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt;
    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;save&lt;/span&gt; &lt;span class="n"&gt;those&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;avoid&lt;/span&gt; &lt;span class="n"&gt;making&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;extra&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="n"&gt;artist_ids&lt;/span&gt;
    &lt;span class="n"&gt;artist_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"artist_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"SELECT artist_id FROM artists"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;"INSERT INTO albums (artist_id, title, released) VALUES ($1, $2, $3)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"int"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artist_ids&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;

    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpython3u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
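&lt;p&gt;The comment in the excerpt above mentions an alternative: adding &lt;code&gt;RETURNING artist_id&lt;/code&gt; to the insert and saving those values instead of querying for all artist ids afterwards. A rough sketch of that idea follows. The &lt;code&gt;insert_artists_returning_ids&lt;/code&gt; helper and the &lt;code&gt;FakePlpy&lt;/code&gt; stand-in are hypothetical and not part of the sample code repo; the stand-in simply hands out sequential ids so the sketch can run outside PostgreSQL.&lt;/p&gt;

```python
from random import choice


def insert_artists_returning_ids(db, names):
    """Insert each name and collect the generated artist_id values.

    `db` is expected to be the plpy module inside a plpython3u block
    (hypothetical helper for illustration).
    """
    plan = db.prepare(
        "INSERT INTO artists (name) VALUES ($1) RETURNING artist_id", ["text"]
    )
    artist_ids = []
    for name in names:
        rows = db.execute(plan, [name])
        # RETURNING yields one row per inserted artist.
        artist_ids.append(rows[0]["artist_id"])
    return artist_ids


class FakePlpy:
    """Stand-in for plpy (assumption): returns sequential ids like a serial column."""

    def __init__(self):
        self.next_id = 1

    def prepare(self, query, argtypes):
        return query

    def execute(self, plan, args):
        row = {"artist_id": self.next_id}
        self.next_id += 1
        return [row]


artist_ids = insert_artists_returning_ids(FakePlpy(), ["Ada", "Grace", "Edsger"])
# Pick an artist for a new album, as the excerpt does with choice(artist_ids).
album_artist = choice(artist_ids)
```

&lt;p&gt;This trades the extra &lt;code&gt;SELECT&lt;/code&gt; for a little bookkeeping in Python, which may or may not be worth it depending on how many rows you generate.&lt;/p&gt;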



&lt;h3&gt;
  
  
  The dataclasses Module
&lt;/h3&gt;

&lt;p&gt;If you prefer to create Python objects to represent rows from your different tables, you could use one of several packages, such as &lt;a href="https://www.attrs.org/en/stable/" rel="noopener noreferrer"&gt;attrs&lt;/a&gt;, &lt;a href="https://factoryboy.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;factory_boy&lt;/a&gt;, or the standard-library &lt;a href="https://docs.python.org/3/library/dataclasses.html" rel="noopener noreferrer"&gt;dataclasses&lt;/a&gt; module. These tools allow you to declare a field per table column and associate each field with a data type and a factory for generating test data.&lt;/p&gt;

&lt;p&gt;Please note that if you go very far down this path of representing rows as Python objects, you will find yourself re-creating a lot of ORM functionality. In that case, you should probably just use an ORM!&lt;/p&gt;

&lt;p&gt;Here's an example of how you could use the dataclasses module to generate test data for our sample schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-plpython-dataclasses.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
    &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
    &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="nb"&gt;datetime&lt;/span&gt;
    &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt; &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;choice&lt;/span&gt;
    &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypeVar&lt;/span&gt;

    &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;faker&lt;/span&gt; &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Faker&lt;/span&gt;

    &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TypeVar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"T"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bound&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;"DataGeneratorBase"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fake&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Faker&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;useful&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tracking&lt;/span&gt; &lt;span class="n"&gt;instances&lt;/span&gt; &lt;span class="n"&gt;so&lt;/span&gt; &lt;span class="n"&gt;we&lt;/span&gt; &lt;span class="n"&gt;can&lt;/span&gt; &lt;span class="n"&gt;use&lt;/span&gt; &lt;span class="n"&gt;them&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;relationships&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;picking&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt; &lt;span class="n"&gt;artist&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="n"&gt;genre&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="k"&gt;foreign&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="n"&gt;DataGeneratorBase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;def&lt;/span&gt; &lt;span class="n"&gt;__new__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nv"&gt;"Track class instances in a list on the class"&lt;/span&gt;
            &lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;__new__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;ignore&lt;/span&gt;
            &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="nv"&gt;"instances"&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__dict__&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instances&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;dataclass&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="n"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DataGeneratorBase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;genre_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;street_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;dataclass&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="n"&gt;Artist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DataGeneratorBase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;dataclass&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="n"&gt;Album&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DataGeneratorBase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;album_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;artist&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Artist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Artist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;released&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;genres&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Use&lt;/span&gt; &lt;span class="n"&gt;Faker&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;pick&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;avoid&lt;/span&gt; &lt;span class="n"&gt;duplicates&lt;/span&gt;
            &lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;unique&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="nv"&gt;"RETURNING id"&lt;/span&gt; &lt;span class="n"&gt;lets&lt;/span&gt; &lt;span class="n"&gt;us&lt;/span&gt; &lt;span class="k"&gt;get&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;ID&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt;
        &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Python&lt;/span&gt; &lt;span class="k"&gt;object&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;later&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt; &lt;span class="k"&gt;without&lt;/span&gt; &lt;span class="n"&gt;needing&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;issue&lt;/span&gt; &lt;span class="n"&gt;additional&lt;/span&gt;
        &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;"INSERT INTO genres (name) VALUES ($1) RETURNING genre_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;genre_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="nv"&gt;"genre_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;artist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Artist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;"INSERT INTO artists (name) VALUES ($1) RETURNING artist_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;artist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;artist_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;artist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="nv"&gt;"artist_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;album&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Album&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;"INSERT INTO albums (artist_id, title, released) VALUES ($1, $2, $3) RETURNING album_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"int"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;album_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;artist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;released&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="nv"&gt;"album_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;Insert&lt;/span&gt; &lt;span class="n"&gt;album_genres&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;g&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;genres&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nv"&gt;"INSERT INTO album_genres (album_id, genre_id) VALUES ($1, $2)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"int"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"int"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;album_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;genre_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpython3u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above snippet defines classes for each main table in our example schema: Genre, Artist, and Album. Then, it defines fields for each column along with a &lt;code&gt;default_factory&lt;/code&gt; function that tells Python (or the Faker package, in many cases) how to generate suitable test data. I made the Album class the "owner" of the many-to-many relationship with Genres, so when an Album is created, it automatically picks 0-3 existing Genres to associate itself with during initialization.&lt;/p&gt;

&lt;p&gt;The second half of the code passes the Python objects into SQL INSERT queries, returning the primary key IDs (which weren't generated during object creation, due to the &lt;code&gt;init=False&lt;/code&gt; field argument) so they can be saved on the objects and used later when setting foreign keys. This highlights a difficulty with doing this sort of object-relational mapping yourself — you have to figure out dependencies between your types of data and enforce an ordering (in Python &lt;em&gt;and&lt;/em&gt; SQL) so that you have database-created IDs at the right times. This can be a bit tedious and messy, especially if you have circular dependencies or self-referencing relationships in your tables.&lt;/p&gt;
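&lt;p&gt;To make the ordering requirement concrete, here is a minimal sketch that runs without PostgreSQL or Faker. The &lt;code&gt;Tracked&lt;/code&gt; base class, the &lt;code&gt;fake_insert&lt;/code&gt; helper, and the in-memory ID counter are hypothetical stand-ins (for &lt;code&gt;DataGeneratorBase&lt;/code&gt;, for &lt;code&gt;plpy&lt;/code&gt; queries, and for database-generated IDs respectively), so treat it as an illustration of the idea rather than code from the sample repo.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from itertools import count
from random import choice

# Hypothetical stand-in for database-generated IDs ("RETURNING artist_id").
_next_id = count(1)


class Tracked:
    """Record every instance in a per-class list (like DataGeneratorBase)."""

    def __new__(cls, *args, **kwargs):
        instance = super().__new__(cls)
        if "instances" not in cls.__dict__:
            cls.instances = []
        cls.instances.append(instance)
        return instance


@dataclass
class Artist(Tracked):
    artist_id: int = field(init=False, default=0)
    name: str = "Some Artist"


@dataclass
class Album(Tracked):
    album_id: int = field(init=False, default=0)
    # Evaluated when an Album is created, so artists must already exist.
    artist: Artist = field(default_factory=lambda: choice(Artist.instances))
    title: str = "Some Album"


def fake_insert(row, id_attr):
    """Pretend to run INSERT ... RETURNING and save the new ID on the object."""
    setattr(row, id_attr, next(_next_id))
    return row


# Parents first: artists need IDs before any album can reference them.
artists = [fake_insert(Artist(name=f"Artist {i}"), "artist_id") for i in range(3)]
albums = [fake_insert(Album(title=f"Album {i}"), "album_id") for i in range(4)]

for album in albums:
    print(album.title, "belongs to artist_id", album.artist.artist_id)
```

&lt;p&gt;If the two list comprehensions were swapped, creating the first &lt;code&gt;Album&lt;/code&gt; would fail because &lt;code&gt;Artist.instances&lt;/code&gt; would not exist yet. That is the same dependency problem described above, just surfacing in Python instead of as a foreign key violation.&lt;/p&gt;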

&lt;h3&gt;
  
  
  Importing External .py Files
&lt;/h3&gt;

&lt;p&gt;If your data model or data-generation code starts to get complex, it can be annoying to have a lot of Python code in SQL files, because your IDE won't lint, type-check, or auto-format Python that is embedded in SQL! Luckily, you can keep your Python code in external &lt;code&gt;.py&lt;/code&gt; files that you import and execute from inside a &lt;code&gt;plpython3u&lt;/code&gt; block, using the technique shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="c1"&gt;-- Excerpt from add-data-plpython-external-pyfile.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
    &lt;span class="n"&gt;import&lt;/span&gt; &lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;util&lt;/span&gt;

    &lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="k"&gt;second&lt;/span&gt; &lt;span class="n"&gt;argument&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inside&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;container&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spec_from_file_location&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"add_test_data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"/repo/add_test_data.py"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;add_test_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;module_from_spec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exec_module&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;add_test_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;add_test_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpython3u&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;add_test_data.py&lt;/code&gt; file can look exactly the same as the body of the &lt;code&gt;plpython3u&lt;/code&gt; block from the previous example, except that you'll need to wrap the bottom half (which uses &lt;code&gt;plpy&lt;/code&gt; to run queries) in a function that accepts &lt;code&gt;plpy&lt;/code&gt; as an argument, so it looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Excerpt from add_test_data.py in the sample code repo
&lt;/span&gt;
&lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plpy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
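&lt;p&gt;One side effect of passing &lt;code&gt;plpy&lt;/code&gt; in as an argument is that the same loading technique can be tried outside of PostgreSQL. The sketch below writes a tiny module to a temporary directory, loads it with &lt;code&gt;importlib.util&lt;/code&gt; the same way the &lt;code&gt;DO&lt;/code&gt; block does, and calls &lt;code&gt;main&lt;/code&gt; with a stub. The module contents and the &lt;code&gt;FakePlpy&lt;/code&gt; and &lt;code&gt;FakePlan&lt;/code&gt; classes are assumptions made up for illustration; they only mimic the &lt;code&gt;prepare&lt;/code&gt; and &lt;code&gt;execute&lt;/code&gt; calls used in this article.&lt;/p&gt;

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical module contents; the real add_test_data.py lives in the repo.
MODULE_SOURCE = '''
def main(plpy):
    plan = plpy.prepare("INSERT INTO genres (name) VALUES ($1)", ["text"])
    plan.execute(["Jazz"])
'''


class FakePlan:
    """Records executed queries instead of sending them to a database."""

    def __init__(self, calls, query):
        self.calls = calls
        self.query = query

    def execute(self, args):
        self.calls.append((self.query, args))
        return []


class FakePlpy:
    """Stand-in for the plpy object that PostgreSQL normally provides."""

    def __init__(self):
        self.calls = []

    def prepare(self, query, argtypes):
        return FakePlan(self.calls, query)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "add_test_data.py"
    path.write_text(MODULE_SOURCE)
    # Same three steps as in the DO block: spec, module, exec_module.
    spec = importlib.util.spec_from_file_location("add_test_data", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

fake_plpy = FakePlpy()
module.main(fake_plpy)
print(fake_plpy.calls)
```

&lt;p&gt;A stub like this only checks which queries would be issued. It does not validate the SQL itself, so it complements running the script against a real database rather than replacing it.&lt;/p&gt;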



&lt;h3&gt;
  
  
  Other (Trusted) Ways to Use Python
&lt;/h3&gt;

&lt;p&gt;I want to briefly touch on two ways of using Python &lt;em&gt;outside&lt;/em&gt; of PostgreSQL — running Python externally may be preferable if you want or need to avoid the untrusted nature of &lt;code&gt;plpython3u&lt;/code&gt;. These approaches let you maintain your Python code completely independent of the database, which may be beneficial for reusability and maintainability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You could use Python scripts to write test data to CSV files and then load those files into PostgreSQL with the &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;COPY command&lt;/a&gt;. With this approach, however, you will likely end up with a multi-step process to generate and load test data. If you invoke a Python script (which outputs CSV) from within the SQL COPY command (using &lt;code&gt;COPY ... FROM PROGRAM&lt;/code&gt;), each command can populate only one table. If you use multiple COPY commands, each running its own script, it becomes convoluted to keep IDs (foreign keys) consistent between tables across the separate script executions. The remaining reasonable approach is a multi-step one: run a single Python script that saves multiple CSV files to disk (one per database table) and then run one SQL COPY command per CSV file to load the data.&lt;/li&gt;
&lt;li&gt;You could run Python scripts that connect to PostgreSQL via a client library such as &lt;a href="https://www.psycopg.org/docs/" rel="noopener noreferrer"&gt;psycopg2&lt;/a&gt;. The psycopg2 package is used by many ORMs, such as the Django ORM and SQLAlchemy, but it doesn't impose any restrictions on how you handle your data — it just provides a Python interface for connecting to PostgreSQL, sending SQL commands, and receiving results.&lt;/li&gt;
&lt;/ul&gt;
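&lt;p&gt;As a rough sketch of the CSV approach, a single script can assign primary keys itself so that foreign keys line up across tables before anything reaches the database. The column names follow the sample schema, but the row counts, placeholder names, and date range are arbitrary assumptions, and the CSV text is kept in memory here instead of being written to disk.&lt;/p&gt;

```python
import csv
import io
import random
from datetime import date, timedelta

# Assign primary keys in Python so that foreign keys can reference them
# before anything is loaded into the database.
artists = [{"artist_id": i, "name": f"Artist {i}"} for i in range(1, 4)]
albums = [
    {
        "album_id": i,
        "artist_id": random.choice(artists)["artist_id"],
        "title": f"Album {i}",
        "released": date(2000, 1, 1) + timedelta(days=random.randrange(7000)),
    }
    for i in range(1, 6)
]


def to_csv(rows):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One CSV document per table, parents and children generated together.
artists_csv = to_csv(artists)
albums_csv = to_csv(albums)
print(artists_csv)
print(albums_csv)
```

&lt;p&gt;Each string would then be saved to its own file and loaded with one statement per table, along the lines of &lt;code&gt;COPY artists (artist_id, name) FROM '/path/to/artists.csv' WITH (FORMAT csv, HEADER true);&lt;/code&gt; (the file path is a placeholder). The parent table needs to be loaded before the table that references it.&lt;/p&gt;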




&lt;p&gt;Thank you for joining me on this exploration of loading test data (in the &lt;a href="https://www.tangramvision.com/blog/loading-test-data-into-postgresql" rel="noopener noreferrer"&gt;previous blog post&lt;/a&gt;) and generating test data for PostgreSQL! We tried out a variety of approaches and got some hands-on experience with code — I hope this helps you understand how to use these different approaches, weigh their tradeoffs, and choose which approach makes the most sense for your team and project.&lt;/p&gt;

&lt;p&gt;If you have any suggestions or corrections, please let me know or &lt;a href="https://www.twitter.com/tangramvision" rel="noopener noreferrer"&gt;send us a tweet&lt;/a&gt;, and if you’re curious to learn more about how we improve perception sensors, visit us at &lt;a href="https://www.tangramvision.com/" rel="noopener noreferrer"&gt;Tangram Vision&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>postgres</category>
      <category>sql</category>
      <category>python</category>
    </item>
    <item>
      <title>Loading Test Data into PostgreSQL</title>
      <dc:creator>Greg Schafer</dc:creator>
      <pubDate>Wed, 28 Apr 2021 22:47:27 +0000</pubDate>
      <link>https://dev.to/tangramvision/loading-test-data-into-postgresql-5fd3</link>
      <guid>https://dev.to/tangramvision/loading-test-data-into-postgresql-5fd3</guid>
      <description>&lt;p&gt;Most web apps/services that use a relational database are built around a web framework and an Object-Relational Mapping (ORM) library, which typically have conventions that prescribe how to create and load test fixtures/data into the database for testing. If you're building a webapp without an ORM [1], the story for how to create and load test data is less clear. What tools and approaches are available, and which work best? There are a lot of articles around the internet that describe specific techniques or example code in isolation, but few that provide a broader survey of the many different approaches that are possible. I hope this article will help fill that gap, exploring and discussing different approaches for creating and loading test data in PostgreSQL.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[1] Wait a minute, why would you build a webapp without an ORM?! This question could spawn an entire article of its own and in fact, &lt;a href="https://web.archive.org/web/20210114190143/http://blogs.tedneward.com/post/the-vietnam-of-computer-science/" rel="noopener noreferrer"&gt;many&lt;/a&gt; &lt;a href="https://web.archive.org/web/20201101150821/http://blogs.tedneward.com/post/thoughts-on-vietnam-commentary/" rel="noopener noreferrer"&gt;other&lt;/a&gt; &lt;a href="https://blog.codinghorror.com/object-relational-mapping-is-the-vietnam-of-computer-science/" rel="noopener noreferrer"&gt;articles&lt;/a&gt; &lt;a href="https://seldo.com/posts/orm_is_an_antipattern" rel="noopener noreferrer"&gt;have&lt;/a&gt; &lt;a href="https://martinfowler.com/bliki/OrmHate.html" rel="noopener noreferrer"&gt;debated&lt;/a&gt; &lt;a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_impedance_mismatch" rel="noopener noreferrer"&gt;about&lt;/a&gt; &lt;a href="https://stackoverflow.com/questions/494816/using-an-orm-or-plain-sql" rel="noopener noreferrer"&gt;ORMs&lt;/a&gt; for the last couple decades. I won't dive into that debate — it's up to the creator to decide if a project should use an ORM or not, and that decision depends on a lot of project-specific factors, such as the expertise of the creator and their team, the types and velocity of data involved, the performance and scaling requirements, and much more.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you're interested in &lt;em&gt;generating&lt;/em&gt; test data instead of (or in addition to) loading test data, please check out the &lt;a href="https://www.tangramvision.com/blog/creating-postgresql-test-data-with-sql-pl-pgsql-and-python" rel="noopener noreferrer"&gt;follow-up article that explores generating test data for PostgreSQL using SQL, PL/pgSQL, and Python&lt;/a&gt;!&lt;/p&gt;

&lt;h1&gt;
  
  
  Follow Along with Docker
&lt;/h1&gt;

&lt;p&gt;Want to follow along? I've collected sample data and scripts in a subfolder of our Tangram Vision blog repo: &lt;a href="https://gitlab.com/tangram-vision-oss/tangram-visions-blog/-/tree/main/2021.04.28_LoadingTestDataIntoPostgreSQL" rel="noopener noreferrer"&gt;https://gitlab.com/tangram-vision-oss/tangram-visions-blog/-/tree/main/2021.04.28_LoadingTestDataIntoPostgreSQL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As described in the repo's README, you can run examples using the &lt;a href="https://hub.docker.com/_/postgres" rel="noopener noreferrer"&gt;official Postgres Docker image&lt;/a&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The base postgres image requires a password to be set, but we'll just be&lt;/span&gt;
&lt;span class="c"&gt;# testing locally, so no need to set a strong password.&lt;/span&gt;
docker run &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;foo &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--volume&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/schema.sql:/docker-entrypoint-initdb.d/schema.sql &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--volume&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;:/repo
    postgres:latest &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;log_statement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To explain this Docker command a bit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The base postgres image requires a password to be set (via the &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; environment variable), but we'll just be testing locally, so no need to set a strong password.&lt;/li&gt;
&lt;li&gt;Initialization scripts (&lt;code&gt;*.sh&lt;/code&gt; and &lt;code&gt;*.sql&lt;/code&gt; files) in the &lt;code&gt;/docker-entrypoint-initdb.d&lt;/code&gt; folder inside the container are executed when the container initializes a new, empty database. Because &lt;code&gt;--rm&lt;/code&gt; discards the container and its data when it stops, that happens on every run of the above command. The command mounts &lt;code&gt;schema.sql&lt;/code&gt; into that folder, so the database tables will be created.&lt;/li&gt;
&lt;li&gt;The repo is also mounted to &lt;code&gt;/repo&lt;/code&gt; inside the container, so example SQL and CSV files are accessible.&lt;/li&gt;
&lt;li&gt;The PostgreSQL server is started with the &lt;code&gt;log_statement=all&lt;/code&gt; config override, which makes the server log every SQL statement it executes, so you can watch what each example does.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo contains a variety of files that start with &lt;code&gt;add-data-&lt;/code&gt; which demonstrate different ways of loading and generating test data. After the Postgres Docker container is running, you can run &lt;code&gt;add-data-&lt;/code&gt; files in a new terminal window with a command like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--workdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/repo postgres &lt;span class="se"&gt;\&lt;/span&gt;
    psql &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost &lt;span class="nt"&gt;--username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres &lt;span class="se"&gt;\&lt;/span&gt;
         &lt;span class="nt"&gt;--file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;add-data-sql-copy-csv.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to interactively poke around the database with &lt;code&gt;psql&lt;/code&gt;, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--interactive&lt;/span&gt; &lt;span class="nt"&gt;--tty&lt;/span&gt; postgres &lt;span class="se"&gt;\&lt;/span&gt;
    psql &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost &lt;span class="nt"&gt;--username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
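
&lt;p&gt;Once connected, a few quick checks can confirm that the container is set up as described above. This is only a sketch: it assumes &lt;code&gt;schema.sql&lt;/code&gt; created a table named &lt;code&gt;artists&lt;/code&gt; (the name used by the examples later in this post).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- List the tables that schema.sql created (psql meta-command, no semicolon needed)
\dt

-- Show the current value of the setting passed on the docker command line
SHOW log_statement;

-- Count rows in one table to see whether any data has been loaded yet
SELECT count(*) FROM artists;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;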



&lt;h1&gt;
  
  
  Sample Schema
&lt;/h1&gt;

&lt;p&gt;For example code and data, I'll use the following simple schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Musical artists have a name&lt;/li&gt;
&lt;li&gt;An artist can have many albums (one-to-many), which have a title and release date&lt;/li&gt;
&lt;li&gt;Genres have a name&lt;/li&gt;
&lt;li&gt;Albums can belong to many genres (many-to-many)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7f613e35edb5806ed%2F6089d32cd6d8e7f76d9a7648_postgres-blogpost-sample-data-schema.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7f613e35edb5806ed%2F6089d32cd6d8e7f76d9a7648_postgres-blogpost-sample-data-schema.png" alt="Sample schema"&gt;&lt;/a&gt;&lt;/p&gt;
Sample schema relating musical artists, albums, and genres.
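
&lt;p&gt;For readers who prefer SQL to diagrams, the schema could be declared roughly as follows. This is a sketch, not the repo's actual &lt;code&gt;schema.sql&lt;/code&gt;: the &lt;code&gt;artists&lt;/code&gt; and &lt;code&gt;albums&lt;/code&gt; column names come from the excerpts later in this post, while the data types, the &lt;code&gt;genre_id&lt;/code&gt; column name, and the columns of &lt;code&gt;album_genres&lt;/code&gt; are assumptions made for illustration. Run it against an empty database, since the Docker setup above already creates these tables.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical DDL matching the four bullet points above
CREATE TABLE artists (
  artist_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name      text NOT NULL
);

-- One artist has many albums (one-to-many via artist_id)
CREATE TABLE albums (
  album_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  artist_id integer NOT NULL REFERENCES artists (artist_id),
  title     text NOT NULL,
  released  date NOT NULL
);

CREATE TABLE genres (
  genre_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- assumed name
  name     text NOT NULL
);

-- Join table for the many-to-many relation between albums and genres
CREATE TABLE album_genres (
  album_id integer NOT NULL REFERENCES albums (album_id),
  genre_id integer NOT NULL REFERENCES genres (genre_id),
  PRIMARY KEY (album_id, genre_id)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The foreign keys are the reason load order matters in the examples that follow: artists and genres have to exist before the rows that reference them.&lt;/p&gt;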



&lt;h1&gt;
  
  
  Loading Static Data
&lt;/h1&gt;

&lt;p&gt;The simplest way to get test data into PostgreSQL is to make a static dataset, which you can save as CSV files or embed in SQL files directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL COPY from CSV Files
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://gitlab.com/tangram-vision-oss/tangram-visions-blog/-/tree/main/2021.04.28_LoadingTestDataIntoPostgreSQL" rel="noopener noreferrer"&gt;code repo accompanying this blogpost&lt;/a&gt;, there are 4 small CSV files, one for each table of the sample schema. The CSV files contain headers and data rows as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7f613e35edb5806ed%2F6089d5275b8967183292eb87_blogpost-csv-tables-v2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuploads-ssl.webflow.com%2F5fff85e7f613e35edb5806ed%2F6089d5275b8967183292eb87_blogpost-csv-tables-v2.jpg" alt="Small static dataset"&gt;&lt;/a&gt;&lt;/p&gt;
A small, static sample dataset of musical artists, albums, and genres.



&lt;p&gt;We can import the data from these CSV files into a PostgreSQL database with the &lt;a href="https://www.postgresql.org/docs/current/sql-copy.html" rel="noopener noreferrer"&gt;SQL COPY&lt;/a&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-copy-csv.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'/repo/artists.csv'&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'/repo/albums.csv'&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'/repo/genres.csv'&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;album_genres&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'/repo/album_genres.csv'&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt; &lt;span class="n"&gt;HEADER&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The COPY command has a variety of options for controlling quoting, delimiters, escape characters, and more. You can even limit which rows are imported with a WHERE clause (available since PostgreSQL 12). One potential downside is that reading a server-side file requires running COPY as a database superuser or as a member of the &lt;code&gt;pg_read_server_files&lt;/code&gt; role (the &lt;code&gt;pg_write_server_files&lt;/code&gt; and &lt;code&gt;pg_execute_server_program&lt;/code&gt; roles cover writing files and running programs). This isn't a concern when loading data for local testing, but keep it in mind if you ever want to use COPY in a more restrictive or production-like environment.&lt;/p&gt;
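
&lt;p&gt;To make those options concrete, here is a sketch of a COPY that spells out its format options and filters rows as they are loaded. It assumes the &lt;code&gt;albums&lt;/code&gt; table has a &lt;code&gt;released&lt;/code&gt; date column (as in the excerpts later in this post), that the artists have already been loaded, and that the &lt;code&gt;albums&lt;/code&gt; table is still empty. The date range is arbitrary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Illustrative only: explicit options plus a row filter (PostgreSQL 12 or newer)
COPY albums
FROM '/repo/albums.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"')
-- Rows that do not satisfy the condition are skipped instead of inserted
WHERE released BETWEEN '2000-01-01' AND '2019-12-31';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The parenthesized option list is equivalent to the shorter &lt;code&gt;CSV HEADER&lt;/code&gt; spelling used above; it just makes the defaults for delimiter and quote character visible.&lt;/p&gt;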

&lt;h2&gt;
  
  
  Psql Copy from CSV Files
&lt;/h2&gt;

&lt;p&gt;The PostgreSQL interactive terminal (called psql) provides a &lt;a href="https://wiki.postgresql.org/wiki/COPY" rel="noopener noreferrer"&gt;copy command&lt;/a&gt; that is very similar to SQL COPY:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-copy-csv.psql in the sample code repo&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="s1"&gt;'artists.csv'&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="s1"&gt;'albums.csv'&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="s1"&gt;'genres.csv'&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;album_genres&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="s1"&gt;'album_genres.csv'&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are some important differences between SQL COPY and psql copy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Like other psql commands, the psql version of the copy command starts with a backslash (&lt;code&gt;\&lt;/code&gt;) and doesn't need to end with a semicolon (&lt;code&gt;;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;SQL COPY runs in the server environment whereas psql copy runs in the client environment. To clarify, the filepath you provide to SQL COPY should point to a file on the server's filesystem. The filepath you provide to psql copy points to a file on the filesystem where you're running the psql client. If you're following along using the Docker image and commands provided in this blogpost, the server and client are the same container, but if you ever want to load data from your local machine to a database on a remote server, then you'll want to use psql copy.&lt;/li&gt;
&lt;li&gt;As a corollary to the above, psql copy is less performant than SQL COPY, because all the data must travel from the client to the server, rather than being directly loaded by the server.&lt;/li&gt;
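&lt;li&gt;SQL COPY resolves a relative filepath against the server process's working directory (normally the data directory), so in practice you should give it absolute filepaths. Psql copy resolves relative filepaths against the directory where you're running psql.&lt;/li&gt;
&lt;li&gt;Psql copy reads and writes files with the permissions of the local user running psql, and the server only checks that the database user you're connected as has the usual privileges on the table. It therefore doesn't require superuser or server-side file read/write/execute permissions like SQL COPY does.&lt;/li&gt;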
&lt;/ul&gt;
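
&lt;p&gt;The server-side versus client-side distinction can be seen by loading the same file both ways into scratch tables. This is a sketch rather than a file from the repo: the &lt;code&gt;genres_server&lt;/code&gt; and &lt;code&gt;genres_client&lt;/code&gt; table names are made up, and it assumes psql was started with &lt;code&gt;/repo&lt;/code&gt; as its working directory, as in the &lt;code&gt;docker exec&lt;/code&gt; commands above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;BEGIN;

-- Scratch tables with the same columns as genres; undone by the ROLLBACK below
CREATE TEMP TABLE genres_server (LIKE genres);
CREATE TEMP TABLE genres_client (LIKE genres);

-- Server-side: the PostgreSQL server opens the file, so the path is absolute
-- and refers to the server's filesystem
COPY genres_server FROM '/repo/genres.csv' WITH (FORMAT csv, HEADER true);

-- Client-side: psql opens the file relative to its own working directory and
-- streams the rows to the server
\copy genres_client from 'genres.csv' with (format csv, header true)

-- Both routes should report the same number of rows
SELECT
  (SELECT count(*) FROM genres_server) AS server_rows,
  (SELECT count(*) FROM genres_client) AS client_rows;

ROLLBACK;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the Docker setup both paths point at the same file because client and server share a container. Against a remote server, only the psql copy line would find a file on your local machine.&lt;/p&gt;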

&lt;h2&gt;
  
  
  Putting Data in SQL Directly
&lt;/h2&gt;

&lt;p&gt;As an alternative to storing data in separate CSV files (which are loaded with SQL or psql commands), you can store data in SQL files directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQL COPY from stdin and pg_dump
&lt;/h3&gt;

&lt;p&gt;The SQL COPY and psql copy commands can load data from stdin instead of a file. They will parse and load all the lines between the copy command and &lt;code&gt;\.&lt;/code&gt; as rows of data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-copy-stdin.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;stdin&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"DJ Okawari"&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Steely Dan"&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Missy Elliott"&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"TWRP"&lt;/span&gt;
&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Donald Fagen"&lt;/span&gt;
&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"La Luz"&lt;/span&gt;
&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Ella Fitzgerald"&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;albums&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;album_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;released&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;stdin&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Mirror"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2009&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Pretzel Logic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1974&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;02&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Under Construction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2002&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Return to Wherever"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2019&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;07&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;
&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"The Nightfly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1982&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;
&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"It's Alive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2013&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;
&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nv"&gt;"Pure Ella"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1994&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;02&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In fact, this &lt;code&gt;COPY ... FROM stdin&lt;/code&gt; approach is how &lt;a href="https://www.postgresql.org/docs/current/app-pgdump.html" rel="noopener noreferrer"&gt;&lt;code&gt;pg_dump&lt;/code&gt;&lt;/a&gt; outputs data if you're creating a dump or backup from an existing PostgreSQL database. However, &lt;code&gt;pg_dump&lt;/code&gt; uses a tab-separated format by default, rather than the comma-separated format shown above.&lt;/p&gt;

&lt;p&gt;By default, &lt;code&gt;pg_dump&lt;/code&gt; also outputs SQL to re-create everything about the database (tables, constraints, views, functions, reset sequences, etc.), but you can instruct it to output only data with the &lt;code&gt;--data-only&lt;/code&gt; flag. To try out &lt;code&gt;pg_dump&lt;/code&gt; with the example Docker image, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--workdir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/repo postgres &lt;span class="se"&gt;\&lt;/span&gt;
    pg_dump &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost &lt;span class="nt"&gt;--username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  SQL INSERTs
&lt;/h3&gt;

&lt;p&gt;Another way to put data directly in SQL is to use &lt;a href="https://www.postgresql.org/docs/current/sql-insert.html" rel="noopener noreferrer"&gt;INSERT statements&lt;/a&gt;. This approach could look like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-insert-static-ids.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;OVERRIDING&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="n"&gt;VALUE&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'DJ Okawari'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Steely Dan'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Missy Elliott'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'TWRP'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Donald Fagen'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'La Luz'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Ella Fitzgerald'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;album_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;released&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;OVERRIDING&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="n"&gt;VALUE&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Mirror'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2009-06-24'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Pretzel Logic'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1974-02-20'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Under Construction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2002-11-12'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Return to Wherever'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2019-07-11'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'The Nightfly'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1982-10-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'It&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s Alive'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2013-10-15'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Pure Ella'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1994-02-15'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;OVERRIDING SYSTEM VALUE&lt;/code&gt; clause lets us INSERT values into the primary key ID columns explicitly even though they are defined as &lt;code&gt;GENERATED ALWAYS&lt;/code&gt;.&lt;/p&gt;
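
&lt;p&gt;One consequence worth knowing about: inserting explicit IDs does not advance the sequence behind an identity column, so a later INSERT that omits the ID can collide with one of the rows loaded above. A minimal sketch of bringing the sequences back in sync, assuming the table and column names from the excerpt above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Set each identity sequence to the highest ID currently in the table,
-- so the next generated value is max(id) + 1
SELECT setval(
  pg_get_serial_sequence('artists', 'artist_id'),
  (SELECT max(artist_id) FROM artists)
);

SELECT setval(
  pg_get_serial_sequence('albums', 'album_id'),
  (SELECT max(album_id) FROM albums)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only matters if your tests insert more rows after loading the static dataset; a read-only test suite can skip it.&lt;/p&gt;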

&lt;p&gt;The &lt;code&gt;pg_dump&lt;/code&gt; command's &lt;code&gt;--column-inserts&lt;/code&gt; option will output data as INSERT statements (a separate statement per row), rather than as the default TSV format. Restoring from INSERTs typically runs much slower than restoring from COPY, so this is only recommended if you're restoring the data to a database that doesn't support COPY, such as SQLite. INSERTs can be sped up somewhat with the &lt;code&gt;--rows-per-insert&lt;/code&gt; option (available since PostgreSQL 12), which puts many rows in each INSERT statement and so reduces the overhead of back-and-forth communication between client and server for every SQL statement.&lt;/p&gt;

&lt;p&gt;Using INSERT statements, we could start moving away from statically declaring everything about our datasets — we could omit the primary key ID columns and lookup IDs as needed when inserting foreign keys, as in the following example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Excerpt from add-data-insert-queried-ids.sql in the sample code repo&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'DJ Okawari'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Steely Dan'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Missy Elliott'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'TWRP'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Donald Fagen'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'La Luz'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Ella Fitzgerald'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;albums&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artist_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;released&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'DJ Okawari'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Mirror'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2009-06-24'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Steely Dan'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Pretzel Logic'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1974-02-20'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Missy Elliott'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Under Construction'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2002-11-12'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'TWRP'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Return to Wherever'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2019-07-11'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Donald Fagen'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'The Nightfly'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1982-10-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'La Luz'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'It&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;s Alive'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2013-10-15'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;artists&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Ella Fitzgerald'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Pure Ella'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1994-02-15'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is hardly convenient, though, because we need to duplicate other row information (such as the artist name) in order to look up the corresponding ID. It gets even more complex if multiple artists have the same name! So, if you have a static dataset, I'd suggest sticking to one of the previously mentioned approaches that use SQL &lt;code&gt;COPY&lt;/code&gt; or psql's &lt;code&gt;\copy&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting Data in CSVs vs in SQL Files
&lt;/h2&gt;

&lt;p&gt;Is there a reason to prefer putting static datasets in CSVs or directly in SQL files? My thoughts boil down to the following points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CSVs are a widely understood and supported format (just make sure to be clear and consistent with encoding!). If your datasets will be maintained or created by people who prefer spreadsheet programs to database-admin and command-line tools, CSVs may be preferable.&lt;/li&gt;
&lt;li&gt;If you want to keep all your test data and database setup in one place, SQL files are a convenient way to do that.&lt;/li&gt;
&lt;li&gt;If your testing or continuous integration processes use &lt;code&gt;pg_dump&lt;/code&gt; or its output, then you're already using datasets embedded in an SQL file — keep doing what makes sense for you!&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I hope you learned something new and useful about the different approaches and tools available for loading static datasets into PostgreSQL. If you're looking to learn more, check out the &lt;a href="https://www.tangramvision.com/blog/creating-postgresql-test-data-with-sql-pl-pgsql-and-python" rel="noopener noreferrer"&gt;follow-up article about &lt;em&gt;generating&lt;/em&gt; test data for PostgreSQL&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;If you have any suggestions or corrections, please let me know or &lt;a href="https://www.twitter.com/tangramvision" rel="noopener noreferrer"&gt;send us a tweet&lt;/a&gt;, and if you’re curious to learn more about how we improve perception sensors, visit us at &lt;a href="https://www.tangramvision.com/" rel="noopener noreferrer"&gt;Tangram Vision&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Cover Photo by &lt;a href="https://unsplash.com/@syinq?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Susan Q Yin&lt;/a&gt; on &lt;a href="https://unsplash.com/@syinq?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;



</description>
      <category>postgres</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Exploring Ansible via Setting Up a WireGuard VPN</title>
      <dc:creator>Greg Schafer</dc:creator>
      <pubDate>Thu, 04 Mar 2021 17:41:20 +0000</pubDate>
      <link>https://dev.to/tangramvision/exploring-ansible-via-setting-up-a-wireguard-vpn-3389</link>
      <guid>https://dev.to/tangramvision/exploring-ansible-via-setting-up-a-wireguard-vpn-3389</guid>
      <description>&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@thomasjsn?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Thomas Jensen&lt;/a&gt; on &lt;a href="https://unsplash.com/?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://www.tangramvision.com/blog/what-they-dont-tell-you-about-setting-up-a-wireguard-vpn" rel="noopener noreferrer"&gt;previous blogpost&lt;/a&gt;, we set up a WireGuard VPN server and client and learned about various configuration options for WireGuard, how to improve VPN server uptime, how to relay traffic, and more. Setting up a server and client like that is a lot of work! If the server dies or you want to set up a new server (maybe for a friend or family member this time), you have to go back to the walk-through and follow all the steps, remembering if you deviated from those instructions at any point.&lt;/p&gt;

&lt;p&gt;There's a better way — automation! If you're only going to do a thing once (e.g. set up a VPN), investing in automation probably doesn't make sense. But if you anticipate doing a thing repeatedly, automating it frees up your time to learn and accomplish more in the future. You can also share your automation, empowering others to build and achieve more, faster.&lt;/p&gt;

&lt;p&gt;Automation is the heart of computing, and many different automation tools and approaches have sprung up over time. For our project of automating VPN server setup, we can consider a variety of tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shell scripts

&lt;ul&gt;
&lt;li&gt;The simplest approach from a tooling perspective, writing shell scripts would involve running the commands from the &lt;a href="https://www.tangramvision.com/blog/what-they-dont-tell-you-about-setting-up-a-wireguard-vpn" rel="noopener noreferrer"&gt;previous WireGuard tutorial blogpost&lt;/a&gt;, using &lt;code&gt;ssh&lt;/code&gt; for the commands that run on the server and &lt;code&gt;rsync&lt;/code&gt; to copy configuration files to the server.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;SSH scripting libraries like &lt;a href="https://capistranorb.com/documentation/overview/what-is-capistrano/" rel="noopener noreferrer"&gt;Capistrano&lt;/a&gt; or &lt;a href="http://www.fabfile.org/" rel="noopener noreferrer"&gt;Fabric&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;If shell scripting isn't ideal, there are libraries that expose similar scripting functionality in a more ergonomic interface for developers familiar with higher-level languages like Ruby and Python.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Infrastructure/configuration automation tools like &lt;a href="https://puppet.com/" rel="noopener noreferrer"&gt;Puppet&lt;/a&gt;, &lt;a href="https://www.chef.io/" rel="noopener noreferrer"&gt;Chef&lt;/a&gt;, or &lt;a href="https://www.ansible.com/" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;Tools in this category are even more specialized for automating server infrastructure and configuration, often including an ecosystem of packages and plugins to automatically set up or configure nearly anything you can think of.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Infrastructure-as-code tools like &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure-as-code (IaC) tools have a lot of overlap with the above category, but support provisioning cloud resources in a more first-class/native way.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Containers like &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;You could also run WireGuard in containers, deploying a server-configured container image to a cloud provider and running a client-configured container image locally to connect to the server. There are &lt;a href="https://medium.com/@firizki/running-wireguard-on-docker-container-76355c43787c" rel="noopener noreferrer"&gt;a few&lt;/a&gt; &lt;a href="https://hub.docker.com/r/linuxserver/wireguard" rel="noopener noreferrer"&gt;existing&lt;/a&gt; &lt;a href="https://blog.jessfraz.com/post/installing-and-using-wireguard/" rel="noopener noreferrer"&gt;examples&lt;/a&gt; of this approach.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;For this tutorial, I'm going to focus on the middle category above — infrastructure/configuration automation tools — and specifically, I'll focus on Ansible. There is a &lt;a href="https://blog.gruntwork.io/why-we-use-terraform-and-not-chef-puppet-ansible-saltstack-or-cloudformation-7989dad2865c" rel="noopener noreferrer"&gt;great comparison of different tools in this area&lt;/a&gt; by Gruntwork and, even though that article favors Terraform, Ansible is still a useful general-purpose tool, especially if you're working with servers that aren't "in the cloud", such as a Raspberry Pi at home.&lt;/p&gt;

&lt;p&gt;Let's get started with automating VPN setup with Ansible! By the end of this article, we'll be able to set up a VPN server and client with a single command. Similar to the previous blogpost, I'll use Ubuntu 20.04 and DigitalOcean droplets.&lt;/p&gt;

&lt;h1&gt;
  
  
  Setting up Ansible
&lt;/h1&gt;

&lt;p&gt;Ansible can be &lt;a href="https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html" rel="noopener noreferrer"&gt;installed via an OS package manager&lt;/a&gt; like &lt;code&gt;apt&lt;/code&gt;, but I prefer to use &lt;code&gt;pip&lt;/code&gt; so I can get the latest updates and avoid cluttering system package management with third-party PPAs (Personal Package Archives). We'll also use &lt;code&gt;pyenv&lt;/code&gt; (as suggested by &lt;a href="https://medium.com/@cjolowicz/hypermodern-python-d44485d9d769#6e8a" rel="noopener noreferrer"&gt;Hypermodern Python&lt;/a&gt;) to make sure we're not breaking or cluttering the system Python installation. Install &lt;code&gt;pyenv&lt;/code&gt; with the following:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# From https://github.com/pyenv/pyenv/wiki#suggested-build-environment&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update

&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

curl https://pyenv.run | bash


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It's a good habit when a tutorial gives you &lt;code&gt;curl &amp;lt;url&amp;gt; | bash&lt;/code&gt; to open up that URL and see what it's going to do. In this case, you'll see that it'll download and execute a shell script on GitHub that will clone 6 repos from GitHub to your &lt;code&gt;~/.pyenv&lt;/code&gt; folder and prompt you to add a few lines to your shell's initialization script.&lt;/p&gt;

&lt;p&gt;Follow the output prompt from above, which asks you to put lines like the below in your shell initialization script (e.g. &lt;code&gt;~/.bashrc&lt;/code&gt; if you use the bash shell). Make sure to fill in your own username!&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/home/YOUR_USERNAME/.pyenv/bin:&lt;/span&gt;&lt;span class="nv"&gt;$PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;pyenv init -&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;pyenv virtualenv-init -&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Install a recent Python version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# List available python versions&lt;/span&gt;
pyenv &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--list&lt;/span&gt;

&lt;span class="c"&gt;# Install a specific version&lt;/span&gt;
pyenv &lt;span class="nb"&gt;install &lt;/span&gt;3.9.2

&lt;span class="c"&gt;# (Suggested) If you want to always use that version when running `python`&lt;/span&gt;
&lt;span class="c"&gt;# in your terminal&lt;/span&gt;
pyenv global 3.9.2


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you want, you can also create a &lt;a href="https://virtualenv.pypa.io/en/latest/" rel="noopener noreferrer"&gt;virtualenv&lt;/a&gt; to further isolate the Ansible installation, and make that virtualenv automatically activate when you're in a particular folder/repo. That would look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# (Optional)&lt;/span&gt;

&lt;span class="c"&gt;# Feel free to pick a different virtualenv name than "ansible-tutorial"&lt;/span&gt;
pyenv virtualenv 3.9.2 ansible-tutorial

&lt;span class="c"&gt;# Create a .python-version file that pyenv will find when your shell is in the &lt;/span&gt;
&lt;span class="c"&gt;# same directory (or a sub-directory) and automatically activate the named&lt;/span&gt;
&lt;span class="c"&gt;# virtualenv&lt;/span&gt;
pyenv &lt;span class="nb"&gt;local &lt;/span&gt;ansible-tutorial


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Install the &lt;code&gt;ansible&lt;/code&gt; pip package, which will install various command-line tools, including &lt;code&gt;ansible-playbook&lt;/code&gt;, which we'll use to run a "playbook" of commands that will set up a VPN server and client for us.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

pip &lt;span class="nb"&gt;install &lt;/span&gt;ansible

&lt;span class="c"&gt;# Confirm installation worked&lt;/span&gt;
ansible &lt;span class="nt"&gt;--version&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  Get a Server
&lt;/h1&gt;

&lt;p&gt;To use Ansible for a VPN server, we need... a server! Ansible could provision a server from a cloud provider for us (and I'll touch on this briefly later), but we'll keep our playbook hardware-provider-agnostic for now, so you can run it as easily against a cloud server as a Raspberry Pi on your home network. I'm going to &lt;a href="https://www.digitalocean.com/docs/droplets/how-to/create/" rel="noopener noreferrer"&gt;create a $5/month DigitalOcean droplet&lt;/a&gt; to test against, but you could also &lt;a href="https://docs.ansible.com/ansible/latest/scenario_guides/guide_vagrant.html" rel="noopener noreferrer"&gt;use Vagrant&lt;/a&gt; (to test against a local VM) or any server you can SSH to.&lt;/p&gt;

&lt;p&gt;Testing Ansible playbooks against VMs, rather than a bare-metal machine, comes with an advantage — after you've written the playbook, you can start a new, empty VM and test the whole playbook start to finish to ensure that it works consistently.&lt;/p&gt;
&lt;h1&gt;
  
  
  Connecting to the Server with Ansible
&lt;/h1&gt;

&lt;p&gt;Once you have your server or VM, take note of its IP address and use it to create an &lt;code&gt;inventory.ini&lt;/code&gt; file like the below:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[vpn]
vpn_server ansible_host=203.0.113.1 ansible_user=root


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html" rel="noopener noreferrer"&gt;inventory file&lt;/a&gt; tells Ansible what servers it can act upon and how to access them. Let's use the above inventory file as an example. When we run Ansible and target the &lt;code&gt;vpn&lt;/code&gt; &lt;strong&gt;group&lt;/strong&gt; of servers or the &lt;code&gt;vpn_server&lt;/code&gt; &lt;strong&gt;host&lt;/strong&gt;, it will try to connect to the server using a command like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

ssh root@203.0.113.1


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;So, if you can't SSH to the server, then Ansible won't be able to connect either!&lt;/p&gt;
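
&lt;p&gt;To make that mapping concrete, here is a rough shell sketch that pulls &lt;code&gt;ansible_host&lt;/code&gt; and &lt;code&gt;ansible_user&lt;/code&gt; out of the example host line and prints the matching &lt;code&gt;ssh&lt;/code&gt; command. It only illustrates how the two variables relate to the connection; it is not how Ansible actually parses inventories, and the &lt;code&gt;line&lt;/code&gt;, &lt;code&gt;host&lt;/code&gt;, &lt;code&gt;user&lt;/code&gt;, and &lt;code&gt;ssh_cmd&lt;/code&gt; variable names are made up for this example.&lt;/p&gt;

```shell
# One host line from inventory.ini (same values as the example above)
line='vpn_server ansible_host=203.0.113.1 ansible_user=root'

# Pull out the key=value pairs that control the SSH connection
host=$(printf '%s\n' "$line" | sed -n 's/.*ansible_host=\([^ ]*\).*/\1/p')
user=$(printf '%s\n' "$line" | sed -n 's/.*ansible_user=\([^ ]*\).*/\1/p')

# Roughly the connection Ansible will attempt for this host
ssh_cmd="ssh ${user}@${host}"
echo "$ssh_cmd"
```

&lt;p&gt;If Ansible is installed, &lt;code&gt;ansible-inventory -i inventory.ini --host vpn_server&lt;/code&gt; prints the variables that Ansible itself resolved for that host, which is the more reliable check.&lt;/p&gt;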

&lt;p&gt;Connecting to the server with an SSH key is strongly recommended! &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys-2" rel="noopener noreferrer"&gt;Add your SSH key to your server&lt;/a&gt; to connect without needing a password. If you must connect with a password, you can &lt;code&gt;sudo apt install sshpass&lt;/code&gt; and then provide your SSH password when using Ansible by adding the &lt;code&gt;--ask-pass&lt;/code&gt; flag to all ansible commands.&lt;/p&gt;
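
&lt;p&gt;If you haven't set up key-based login before, the usual OpenSSH workflow looks roughly like the sketch below. It assumes the server still accepts password logins for &lt;code&gt;root&lt;/code&gt; and reuses the example address from the inventory file; the key comment &lt;code&gt;ansible-tutorial&lt;/code&gt; is an arbitrary label.&lt;/p&gt;

```shell
# Generate a key pair if you don't already have one
# (the "ansible-tutorial" comment is just a label, pick anything)
ssh-keygen -t ed25519 -C "ansible-tutorial"

# Copy the public key to the server (asks for the server password once)
ssh-copy-id root@203.0.113.1

# Confirm that key-based login works
ssh root@203.0.113.1 'echo connected'
```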

&lt;p&gt;Let's test to make sure that Ansible can connect to the server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

ansible &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini &lt;span class="nt"&gt;-m&lt;/span&gt; ping vpn


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This runs the &lt;a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/ping_module.html" rel="noopener noreferrer"&gt;ping Ansible module&lt;/a&gt;, targeting the &lt;code&gt;vpn&lt;/code&gt; group of servers. You should see "pong" in the output, meaning that Ansible could connect to the server and the server has a Python installation that Ansible can use.&lt;/p&gt;

&lt;h1&gt;
  
  
  Ansible's Built-in Variables and Facts
&lt;/h1&gt;

&lt;p&gt;There are other useful Ansible modules that we can use with the &lt;code&gt;ansible&lt;/code&gt; command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/setup_module.html" rel="noopener noreferrer"&gt;setup module&lt;/a&gt; fetches &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_vars_facts.html" rel="noopener noreferrer"&gt;system information, also known as "facts"&lt;/a&gt;, about the server. You can use these facts as variables in Ansible commands and playbooks.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/debug_module.html" rel="noopener noreferrer"&gt;debug module&lt;/a&gt; can evaluate variables, which is useful for... well, debugging!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Try running both of these modules with your server so you can see what facts and information Ansible makes available:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

ansible &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini &lt;span class="nt"&gt;-m&lt;/span&gt; setup vpn
ansible &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini &lt;span class="nt"&gt;-m&lt;/span&gt; debug &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="s2"&gt;"var=hostvars"&lt;/span&gt; vpn


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This was one of the most confusing parts for me when learning Ansible — figuring out what all these built-in variables and facts (like &lt;code&gt;groups&lt;/code&gt;, &lt;code&gt;inventory_dir&lt;/code&gt;, and &lt;code&gt;ansible_distribution&lt;/code&gt;) were and how to find them.&lt;/p&gt;
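
&lt;p&gt;One way to make that output less overwhelming is to ask for a narrower slice of it. The commands below are a small sketch using the &lt;code&gt;filter&lt;/code&gt; parameter of the setup module and the &lt;code&gt;var&lt;/code&gt; parameter of the debug module; the pattern and variable names are just examples to swap for whatever you're hunting for.&lt;/p&gt;

```shell
# Only show facts whose names match a pattern (here, the distribution facts)
ansible -i inventory.ini -m setup -a "filter=ansible_distribution*" vpn

# Print a single built-in variable instead of all of hostvars
ansible -i inventory.ini -m debug -a "var=groups" vpn
ansible -i inventory.ini -m debug -a "var=inventory_dir" vpn
```

&lt;p&gt;Facts such as &lt;code&gt;ansible_distribution&lt;/code&gt; come from the setup module, so the two debug commands stick to variables that should exist even when no facts have been gathered.&lt;/p&gt;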

&lt;h1&gt;
  
  
  Writing an Ansible Playbook
&lt;/h1&gt;

&lt;p&gt;The &lt;code&gt;ansible&lt;/code&gt; command lets you run &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/intro_adhoc.html" rel="noopener noreferrer"&gt;ad-hoc commands&lt;/a&gt; across groups of servers. This is powerful, but we probably shouldn't try to automate server setup and configuration in a single &lt;code&gt;ansible&lt;/code&gt; command... probably. 🤔 Instead, we can organize multiple tasks in one or multiple YAML files, which we will run with the &lt;code&gt;ansible-playbook&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Let's write a &lt;code&gt;playbook.yml&lt;/code&gt; file in the same folder as &lt;code&gt;inventory.ini&lt;/code&gt;. Here are its contents:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn server&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vpn_server&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ping&lt;/span&gt;
    &lt;span class="na"&gt;ping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;show variables and facts&lt;/span&gt;
    &lt;span class="na"&gt;debug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;var=hostvars&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you're not familiar with &lt;a href="https://en.wikipedia.org/wiki/YAML" rel="noopener noreferrer"&gt;YAML&lt;/a&gt;, the above is equivalent to this JSON structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"setup vpn server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hosts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vpn_server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"show variables and facts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"debug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var=hostvars"&lt;/span&gt;&lt;span class="p"&gt;}]}]&lt;/span&gt;&lt;span class="w"&gt;


&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Breaking down the above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The top-level structure is a "play" in Ansible lexicon. Our play above has a &lt;code&gt;name&lt;/code&gt;, a &lt;code&gt;hosts&lt;/code&gt; &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/intro_patterns.html#intro-patterns" rel="noopener noreferrer"&gt;pattern&lt;/a&gt; which describes which servers the play will run against, and a list of &lt;code&gt;tasks&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We have 2 tasks; each has a &lt;code&gt;name&lt;/code&gt; and the name of an Ansible module that will do something.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run the playbook...&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini playbook.yml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;... and you'll see that it gathers facts from the server (just like the &lt;code&gt;ansible -m setup&lt;/code&gt; command above did), and then runs the "ping" task and the "debug" task to show all the gathered facts and variables defined for &lt;code&gt;vpn_server&lt;/code&gt;.&lt;/p&gt;
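
&lt;p&gt;Before running a playbook for real, it can help to check what it would do. The commands below are a short sketch using standard &lt;code&gt;ansible-playbook&lt;/code&gt; flags against the same &lt;code&gt;inventory.ini&lt;/code&gt; and &lt;code&gt;playbook.yml&lt;/code&gt; files; none of them execute the tasks themselves.&lt;/p&gt;

```shell
# Check the playbook for syntax errors without running any tasks
ansible-playbook -i inventory.ini --syntax-check playbook.yml

# List the hosts and tasks the playbook would act on, without running anything
ansible-playbook -i inventory.ini --list-hosts playbook.yml
ansible-playbook -i inventory.ini --list-tasks playbook.yml
```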

&lt;p&gt;There are tons of &lt;a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/index.html#modules" rel="noopener noreferrer"&gt;built-in Ansible modules&lt;/a&gt;, even more &lt;a href="https://docs.ansible.com/ansible/latest/collections/index.html" rel="noopener noreferrer"&gt;curated Ansible community modules&lt;/a&gt;, and even more published to &lt;a href="https://galaxy.ansible.com/home" rel="noopener noreferrer"&gt;Ansible Galaxy&lt;/a&gt; (an open repository for Ansible collections and roles).&lt;/p&gt;

&lt;h1&gt;
  
  
  WireGuard Server Setup
&lt;/h1&gt;

&lt;p&gt;There's much more to learn about Ansible! But let's stop here and apply what we've learned in order to set up a WireGuard server. &lt;/p&gt;

&lt;p&gt;Referring to the steps we took in &lt;a href="https://www.tangramvision.com/blog/what-they-dont-tell-you-about-setting-up-a-wireguard-vpn" rel="noopener noreferrer"&gt;the previous tutorial&lt;/a&gt;, we want to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the &lt;code&gt;wireguard&lt;/code&gt; system package&lt;/li&gt;
&lt;li&gt;Create public and private keys with correct permissions&lt;/li&gt;
&lt;li&gt;Create the server's WireGuard configuration file&lt;/li&gt;
&lt;li&gt;(Optionally) Enable IP forwarding for relaying traffic&lt;/li&gt;
&lt;li&gt;Start the VPN&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Managing the Keys
&lt;/h2&gt;

&lt;p&gt;As &lt;a href="https://www.tangramvision.com/blog/what-they-dont-tell-you-about-setting-up-a-wireguard-vpn" rel="noopener noreferrer"&gt;hinted at in the previous tutorial&lt;/a&gt;, if we want to repeatably deploy the VPN server without needing to reconfigure all VPN clients, we need to use the same private key every time. &lt;/p&gt;

&lt;p&gt;Put another way: if we generated a private key while deploying the server and used the corresponding public key on various clients, and the server ends up dying, we &lt;em&gt;could&lt;/em&gt; deploy it again by generating a new private key. However, all of our VPN clients would then need to update to the &lt;em&gt;new&lt;/em&gt; public key to be able to connect to the &lt;em&gt;new&lt;/em&gt; VPN server. This would be inconvenient! &lt;/p&gt;

&lt;p&gt;Instead, we'll generate the server keys once by hand and use them in the playbook so they're consistent between every deploy. This means we won't include step #2 from above in the Ansible playbook.&lt;/p&gt;

&lt;p&gt;Generate the keys with &lt;code&gt;wg genkey&lt;/code&gt; and &lt;code&gt;wg pubkey&lt;/code&gt; commands. You can output both with the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;privkey&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;wg genkey&lt;span class="si"&gt;)&lt;/span&gt; sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'echo "
    server_privkey: $privkey
    server_pubkey: $(echo $privkey | wg pubkey)"'&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Copy the output lines and add them to a new &lt;code&gt;vars&lt;/code&gt; mapping under the play in &lt;code&gt;playbook.yml&lt;/code&gt;. Here's what mine looks like now (your keys will be different):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn server&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vpn_server&lt;/span&gt;
  &lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;server_privkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aBYk1JZyP8ck+FeaTjb3xi94U4Nv8V+gWoTW1hRLQlo=&lt;/span&gt;
    &lt;span class="na"&gt;server_pubkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Encrypting the Private Key
&lt;/h3&gt;

&lt;p&gt;It's a good practice to AVOID having secrets in plaintext (like the VPN private key above). This is especially true if those secrets will be shared with anyone else, like via a git repo. Let's prevent this by using &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/vault.html" rel="noopener noreferrer"&gt;Ansible Vault&lt;/a&gt;. Vault is a tool for encrypting secret values and using them in playbooks. Encrypt the private key with:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

ansible-vault encrypt_string &lt;span class="nt"&gt;--ask-vault-password&lt;/span&gt; &lt;span class="nt"&gt;--stdin-name&lt;/span&gt; server_privkey


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You'll be prompted twice for a Vault encryption password, after which you'll paste your &lt;code&gt;privkey&lt;/code&gt; value and hit &lt;code&gt;Ctrl+d&lt;/code&gt; twice. If the command completed after a single &lt;code&gt;Ctrl+d&lt;/code&gt;, try again and make sure you're not copy-pasting an invisible newline character at the end of the &lt;code&gt;privkey&lt;/code&gt; value. Copy the output into your playbook, which will now look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn server&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vpn_server&lt;/span&gt;
  &lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;server_privkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!vault&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;$ANSIBLE_VAULT;1.1;AES256&lt;/span&gt;
          &lt;span class="s"&gt;646438636565343063343631326136386239623935393637336539653636386135363&lt;/span&gt;
          &lt;span class="s"&gt;663386639393232346534643163656363316234306439306566306534610a31326664&lt;/span&gt;
          &lt;span class="s"&gt;363763663139383034636632343230376365333130333230373866353033326563303&lt;/span&gt;
          &lt;span class="s"&gt;5636138373830633534373033303536303566663166616539360a3936353033663263&lt;/span&gt;
          &lt;span class="s"&gt;336662663034376661616631343661333164363134373061343739633637623739306&lt;/span&gt;
          &lt;span class="s"&gt;465653532383838393662396333623966343165366635353132396332313762343534&lt;/span&gt;
          &lt;span class="s"&gt;65313761623964653532623839356633343838&lt;/span&gt;
    &lt;span class="na"&gt;server_pubkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Make sure to remember your encryption password (and save it in a password manager); you'll need to enter it every time you run the playbook.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing and Configuring WireGuard
&lt;/h2&gt;

&lt;p&gt;Next, we'll remove our testing &lt;code&gt;ping&lt;/code&gt; and &lt;code&gt;debug&lt;/code&gt; tasks and write tasks for steps 1, 3, 4, and 5 from the above list. These steps translate neatly into Ansible tasks in our updated &lt;code&gt;playbook.yml&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn server&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vpn_server&lt;/span&gt;
  &lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;server_privkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!vault&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;$ANSIBLE_VAULT;1.1;AES256&lt;/span&gt;
          &lt;span class="s"&gt;646438636565343063343631326136386239623935393637336539653636386135363&lt;/span&gt;
          &lt;span class="s"&gt;663386639393232346534643163656363316234306439306566306534610a31326664&lt;/span&gt;
          &lt;span class="s"&gt;363763663139383034636632343230376365333130333230373866353033326563303&lt;/span&gt;
          &lt;span class="s"&gt;5636138373830633534373033303536303566663166616539360a3936353033663263&lt;/span&gt;
          &lt;span class="s"&gt;336662663034376661616631343661333164363134373061343739633637623739306&lt;/span&gt;
          &lt;span class="s"&gt;465653532383838393662396333623966343165366635353132396332313762343534&lt;/span&gt;
          &lt;span class="s"&gt;65313761623964653532623839356633343838&lt;/span&gt;
    &lt;span class="na"&gt;server_pubkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# https://docs.ansible.com/ansible/latest/collections/ansible/builtin/apt_module.html&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;install wireguard package&lt;/span&gt;
    &lt;span class="na"&gt;apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wireguard&lt;/span&gt;
      &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt;
      &lt;span class="na"&gt;update_cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;

  &lt;span class="c1"&gt;# https://docs.ansible.com/ansible/latest/collections/ansible/builtin/copy_module.html&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create server wireguard config&lt;/span&gt;
    &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
      &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;server_wg0.conf.j2&lt;/span&gt;
      &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0600'&lt;/span&gt;

  &lt;span class="c1"&gt;# https://docs.ansible.com/ansible/latest/collections/ansible/posix/sysctl_module.html&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enable and persist ip forwarding&lt;/span&gt;
    &lt;span class="na"&gt;sysctl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;net.ipv4.ip_forward&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
      &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt;
      &lt;span class="na"&gt;sysctl_set&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
      &lt;span class="na"&gt;reload&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;

  &lt;span class="c1"&gt;# https://docs.ansible.com/ansible/latest/collections/ansible/builtin/systemd_module.html&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start wireguard and enable on boot&lt;/span&gt;
    &lt;span class="na"&gt;systemd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wg-quick@wg0&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
      &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;started&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Ok ok, yes, this is a bit like drawing an owl.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16uohe7hk5tut2mpfbn5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16uohe7hk5tut2mpfbn5.jpg" alt="Draw an owl in 2 steps meme"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Source: &lt;a href="https://29.media.tumblr.com/tumblr_l7iwzq98rU1qa1c9eo1_500.jpg" rel="noopener noreferrer"&gt;https://29.media.tumblr.com/tumblr_l7iwzq98rU1qa1c9eo1_500.jpg&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;...but usually an Ansible playbook like the above can be written quickly. I follow a cycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Type "ansible module install package" into a search engine&lt;/li&gt;
&lt;li&gt;Open the &lt;a href="http://docs.ansible.com" rel="noopener noreferrer"&gt;docs.ansible.com&lt;/a&gt; result that looks most helpful&lt;/li&gt;
&lt;li&gt;Read through available parameters and the (often helpful) examples at the bottom&lt;/li&gt;
&lt;li&gt;Copy an example into my playbook and modify parameters as needed&lt;/li&gt;
&lt;li&gt;Go back to step 1, searching for the next task (e.g. "ansible module template file")&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've included a comment line linking to the Ansible docs page for each module used in the &lt;code&gt;playbook.yml&lt;/code&gt; above, in case you want to read about the parameters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing our First Attempt
&lt;/h2&gt;

&lt;p&gt;Let's test our playbook.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="s"&gt;$ ansible-playbook -i inventory.ini --ask-vault-password playbook.yml&lt;/span&gt;
&lt;span class="na"&gt;Vault password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 

&lt;span class="s"&gt;PLAY [setup vpn server] ********************************************************&lt;/span&gt;

&lt;span class="s"&gt;TASK [Gathering Facts] *********************************************************&lt;/span&gt;
&lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="s"&gt;TASK [install wireguard package] ***********************************************&lt;/span&gt;
&lt;span class="na"&gt;changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="s"&gt;TASK [create server wireguard config] ******************************************&lt;/span&gt;
&lt;span class="na"&gt;fatal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FAILED! =&amp;gt; {"changed"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="s"&gt;, "msg"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Could&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;find&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;access&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'server_wg0.conf.j2'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Searched&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;..."&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;

&lt;span class="s"&gt;PLAY RECAP *********************************************************************&lt;/span&gt;
&lt;span class="na"&gt;vpn_server                 &lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ok=2    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Oh no! Installing WireGuard was successful, but creating the config failed. Ansible's error messages are usually helpful, and this one indicates that the template file (&lt;code&gt;server_wg0.conf.j2&lt;/code&gt;) we're trying to use to create the server's configuration couldn't be found. Let's create it at &lt;code&gt;templates/server_wg0.conf.j2&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="c1"&gt;# {{ ansible_managed }}&lt;/span&gt;
&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Interface&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="s"&gt;Address = 10.0.1.1/24&lt;/span&gt;
&lt;span class="s"&gt;ListenPort = &lt;/span&gt;&lt;span class="m"&gt;51820&lt;/span&gt;
&lt;span class="s"&gt;PrivateKey = {{ server_privkey }}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A few notes about the above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ansible automatically searches in relative paths like &lt;code&gt;templates/&lt;/code&gt; and &lt;code&gt;files/&lt;/code&gt; when running Ansible modules that have a &lt;code&gt;src&lt;/code&gt; parameter. Our &lt;code&gt;template&lt;/code&gt; task has a parameter &lt;code&gt;src: server_wg0.conf.j2&lt;/code&gt;, so Ansible will search for it in the &lt;code&gt;templates/&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt;It's convention to suffix template files with &lt;code&gt;.j2&lt;/code&gt;, to indicate that the file will be &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_templating.html" rel="noopener noreferrer"&gt;templated with Jinja2&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In Jinja2, values inside double curly braces (&lt;code&gt;{{ variable }}&lt;/code&gt;) will be replaced with the value of the variable. In this template, the &lt;code&gt;server_privkey&lt;/code&gt; variable will be decrypted and its value inserted into the resulting file in place of &lt;code&gt;{{ server_privkey }}&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;{{ ansible_managed }}&lt;/code&gt; text is replaced with the string "Ansible managed". It's a good convention to put this in a comment at the top of templated files, because it signals to anyone reading the file on the server that the file is managed by Ansible — any edits they make could be overwritten when Ansible next runs, so they should find and make edits in the corresponding Ansible playbook and template files instead.&lt;/li&gt;
&lt;/ul&gt;
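
&lt;p&gt;To see the substitution in isolation, here is a minimal sketch of a throwaway task. The &lt;code&gt;wg_listen_port&lt;/code&gt; variable is hypothetical and not part of this tutorial's playbook; the point is only that Ansible renders &lt;code&gt;{{ ... }}&lt;/code&gt; expressions in task parameters the same way it does in template files:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

# Illustrative only: add under "tasks:" temporarily, then remove
- name: preview how Jinja2 fills in a variable
  debug:
    msg: "ListenPort = {{ wg_listen_port }}"
  vars:
    # Hypothetical variable, used here only for demonstration
    wg_listen_port: 51820

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Avoid previewing &lt;code&gt;server_privkey&lt;/code&gt; this way, since the decrypted value would be printed to your terminal.&lt;/p&gt;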

&lt;p&gt;Let's run the test again:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="s"&gt;$ ansible-playbook -i inventory.ini --ask-vault-password playbook.yml&lt;/span&gt;
&lt;span class="na"&gt;Vault password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 

&lt;span class="s"&gt;PLAY [setup vpn server] ********************************************************&lt;/span&gt;

&lt;span class="s"&gt;TASK [Gathering Facts] *********************************************************&lt;/span&gt;
&lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="s"&gt;TASK [install wireguard package] ***********************************************&lt;/span&gt;
&lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="s"&gt;TASK [create server wireguard config] ******************************************&lt;/span&gt;
&lt;span class="na"&gt;changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="s"&gt;TASK [enable and persist ip forwarding] ****************************************&lt;/span&gt;
&lt;span class="na"&gt;changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="s"&gt;TASK [start wireguard and enable on boot] **************************************&lt;/span&gt;
&lt;span class="na"&gt;changed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;vpn_server&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="s"&gt;PLAY RECAP *********************************************************************&lt;/span&gt;
&lt;span class="na"&gt;vpn_server                 &lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It succeeded! The WireGuard interface is now running on the server.&lt;/p&gt;

&lt;p&gt;Notice that the "install wireguard package" step shows &lt;code&gt;ok&lt;/code&gt; instead of &lt;code&gt;changed&lt;/code&gt; this time. The &lt;code&gt;apt&lt;/code&gt; module (and most modules) detect that the server is already in the desired state (the &lt;code&gt;wireguard&lt;/code&gt; package was installed last time we ran the playbook, so it satisfies &lt;code&gt;state=present&lt;/code&gt;) and perform no actions. The task is &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_intro.html#desired-state-and-idempotency" rel="noopener noreferrer"&gt;idempotent&lt;/a&gt;, meaning you can run it repeatedly and the outcome is the same. Idempotent tasks make it easy to see what changed and what didn't each time a playbook is run.&lt;/p&gt;

&lt;h1&gt;
  
  
  WireGuard Client Setup
&lt;/h1&gt;

&lt;p&gt;Ansible can also operate on the local machine. To set up our local machine as a client, we want to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the &lt;code&gt;wireguard&lt;/code&gt; system package&lt;/li&gt;
&lt;li&gt;Create public and private keys with correct permissions&lt;/li&gt;
&lt;li&gt;Create the client's WireGuard configuration file, which must include the server's public key&lt;/li&gt;
&lt;li&gt;Start the VPN&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We also need to update the server's configuration file with a &lt;code&gt;[Peer]&lt;/code&gt; section including the client's public key, so the client can connect to the server. The client's public key isn't known until after we create it — we could create client keys manually like we did for the server's keys, but then the playbook wouldn't be able to set up multiple clients without having to manually edit the keys for each client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Acting on Localhost
&lt;/h2&gt;

&lt;p&gt;Because we're targeting a new host (&lt;code&gt;localhost&lt;/code&gt;), we need to write a new play in &lt;code&gt;playbook.yml&lt;/code&gt;. We can put it above the existing play (which targets &lt;code&gt;vpn_server&lt;/code&gt;), so the client's keys are generated before the server config is templated.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn client&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost&lt;/span&gt;
  &lt;span class="na"&gt;connection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
  &lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Use system python so apt package is available&lt;/span&gt;
    &lt;span class="na"&gt;ansible_python_interpreter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/usr/bin/env&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;python"&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Coming soon&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn server&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vpn&lt;/span&gt;
  &lt;span class="c1"&gt;# Rest of server vars/tasks here...&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Lots of new things here!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We target the local machine by using &lt;code&gt;localhost&lt;/code&gt; for the hosts pattern.&lt;/li&gt;
&lt;li&gt;We "connect" locally by using the &lt;code&gt;local&lt;/code&gt; &lt;a href="https://docs.ansible.com/ansible/latest/plugins/connection.html" rel="noopener noreferrer"&gt;connection plugin&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;become: yes&lt;/code&gt; line indicates that the play will run as root, which we need to be able to install the &lt;code&gt;wireguard&lt;/code&gt; package. Ansible will effectively run &lt;code&gt;sudo apt-get install wireguard&lt;/code&gt;, rather than just &lt;code&gt;apt-get install wireguard&lt;/code&gt; (which would fail). Because of this setting, we'll need to run the playbook with the &lt;code&gt;--ask-become-pass&lt;/code&gt; flag. We didn't need this line for the server setup play, because we're already connecting as root via the &lt;code&gt;ansible_user=root&lt;/code&gt; connection variable.&lt;/li&gt;
&lt;li&gt;With the &lt;code&gt;ansible_python_interpreter&lt;/code&gt; var, we tell Ansible to use the system python (which includes the &lt;code&gt;apt&lt;/code&gt; python package). Alternatively, we could &lt;a href="https://github.com/python-poetry/poetry/issues/1363" rel="noopener noreferrer"&gt;install that package&lt;/a&gt; for our current python 3.9.2 installation. If you get a &lt;code&gt;No such file or directory&lt;/code&gt; error, you may need to change the line from &lt;code&gt;python&lt;/code&gt; to &lt;code&gt;python3&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
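
&lt;p&gt;If you'd rather not run the whole play as root, &lt;code&gt;become&lt;/code&gt; can also be set on individual tasks. The following is a sketch of that alternative, not what this tutorial uses; any other task that needs root (for example, one that writes under &lt;code&gt;/etc/wireguard&lt;/code&gt;) would need the same line:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

---
- name: setup vpn client
  hosts: localhost
  connection: local
  vars:
    # Use system python so apt package is available
    ansible_python_interpreter: "/usr/bin/env python"
  tasks:
  - name: install wireguard package
    # Escalate to root for this task only, instead of for the whole play
    become: yes
    apt:
      name: wireguard
      state: present
      update_cache: yes

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Either way, the playbook still needs the &lt;code&gt;--ask-become-pass&lt;/code&gt; flag so Ansible can escalate when it reaches that task.&lt;/p&gt;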

&lt;h2&gt;
  
  
  Client Setup Tasks and Config
&lt;/h2&gt;

&lt;p&gt;Writing the Ansible tasks for the client-side VPN setup is similar to the server side.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn clients&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost&lt;/span&gt;
  &lt;span class="na"&gt;connection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
  &lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Use system python so apt package is available&lt;/span&gt;
    &lt;span class="na"&gt;ansible_python_interpreter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/usr/bin/env&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;python"&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;install wireguard package&lt;/span&gt;
    &lt;span class="na"&gt;apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wireguard&lt;/span&gt;
      &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt;
      &lt;span class="na"&gt;update_cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;generate private key&lt;/span&gt;
    &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;umask 077 &amp;amp;&amp;amp; wg genkey | tee privatekey | wg pubkey &amp;gt; publickey&lt;/span&gt;
      &lt;span class="na"&gt;chdir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard&lt;/span&gt;
      &lt;span class="na"&gt;creates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/publickey&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get public key&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cat /etc/wireguard/publickey&lt;/span&gt;
    &lt;span class="na"&gt;register&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;publickey_contents&lt;/span&gt;
    &lt;span class="na"&gt;changed_when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;False&lt;/span&gt;

  &lt;span class="c1"&gt;# Save pubkey as a fact, so we can use it to template wg0.conf for the server&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;set public key fact&lt;/span&gt;
    &lt;span class="na"&gt;set_fact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;pubkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;publickey_contents.stdout&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create client wireguard config&lt;/span&gt;
    &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
      &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;client_wg0.conf.j2&lt;/span&gt;
      &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0600'&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn server&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vpn_server&lt;/span&gt;
  &lt;span class="c1"&gt;# Rest of server vars/tasks here...&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Breaking this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing the &lt;code&gt;wireguard&lt;/code&gt; package should look very familiar!&lt;/li&gt;
&lt;li&gt;We generate keys with the &lt;code&gt;shell&lt;/code&gt; module so we can use pipes and file redirection. The keys are only generated if the &lt;code&gt;publickey&lt;/code&gt; file doesn't already exist, thanks to the &lt;code&gt;creates&lt;/code&gt; parameter.&lt;/li&gt;
&lt;li&gt;Next, we need to save the public key so we can add it as a &lt;code&gt;[Peer]&lt;/code&gt; section in the server config. Normally, we'd use &lt;code&gt;{{ lookup('file', '/etc/wireguard/publickey') }}&lt;/code&gt; to look up a value from a file, but the file lookup plugin &lt;a href="https://github.com/ansible/ansible/issues/8297#issuecomment-141109132" rel="noopener noreferrer"&gt;seems not to respect &lt;code&gt;become: yes&lt;/code&gt;&lt;/a&gt;; it tries to read the file without escalating to root privileges and fails as a result. So, we instead &lt;code&gt;cat&lt;/code&gt; the file and save the resulting output as a fact.&lt;/li&gt;
&lt;li&gt;Finally, template the client config file. Its contents closely match the &lt;a href="https://www.tangramvision.com/blog/what-they-dont-tell-you-about-setting-up-a-wireguard-vpn" rel="noopener noreferrer"&gt;previous tutorial's&lt;/a&gt;, but we use the &lt;code&gt;ansible_host&lt;/code&gt; IP address of the VPN server from &lt;code&gt;inventory.ini&lt;/code&gt; to set the server's endpoint.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Interface&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# The address your computer will use on the VPN&lt;/span&gt;
&lt;span class="s"&gt;Address = 10.0.0.8/32&lt;/span&gt;

&lt;span class="c1"&gt;# Load your privatekey from file&lt;/span&gt;
&lt;span class="s"&gt;PostUp = wg set %i private-key /etc/wireguard/privatekey&lt;/span&gt;
&lt;span class="c1"&gt;# Also ping the vpn server to ensure the tunnel is initialized&lt;/span&gt;
&lt;span class="s"&gt;PostUp = ping -c1 10.0.0.1&lt;/span&gt;

&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Peer&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# VPN server's wireguard public key&lt;/span&gt;
&lt;span class="s"&gt;PublicKey = {{ server_pubkey }}&lt;/span&gt;

&lt;span class="c1"&gt;# Public IP address of your VPN server (USE YOURS!)&lt;/span&gt;
&lt;span class="c1"&gt;# Use the floating IP address if you created one for your VPN server&lt;/span&gt;
&lt;span class="s"&gt;Endpoint = {{ hostvars['vpn_server'].ansible_host }}:51820&lt;/span&gt;

&lt;span class="c1"&gt;# 10.0.0.0/24 is the VPN subnet&lt;/span&gt;
&lt;span class="s"&gt;AllowedIPs = 10.0.0.0/24&lt;/span&gt;

&lt;span class="c1"&gt;# To also accept and send traffic to a VPC subnet at 10.110.0.0/20&lt;/span&gt;
&lt;span class="c1"&gt;# AllowedIPs = 10.0.0.0/24,10.110.0.0/20&lt;/span&gt;

&lt;span class="c1"&gt;# To accept traffic from and send traffic to any IP address through the VPN&lt;/span&gt;
&lt;span class="c1"&gt;# AllowedIPs = 0.0.0.0/0&lt;/span&gt;

&lt;span class="c1"&gt;# To keep a connection open from the server to this client&lt;/span&gt;
&lt;span class="c1"&gt;# (Use if you're behind a NAT, e.g. on a home network, and&lt;/span&gt;
&lt;span class="c1"&gt;# want peers to be able to connect to you.)&lt;/span&gt;
&lt;span class="c1"&gt;# PersistentKeepalive = 25&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
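
&lt;p&gt;As an alternative to the &lt;code&gt;cat&lt;/code&gt; and &lt;code&gt;set_fact&lt;/code&gt; pair in the client play above, the built-in &lt;code&gt;slurp&lt;/code&gt; module can read the file. Like other modules, it runs on the target host under the play's &lt;code&gt;become&lt;/code&gt; setting, so it shouldn't hit the permission problem described for the file lookup. This is a sketch rather than what the tutorial uses, and &lt;code&gt;publickey_file&lt;/code&gt; is a name chosen for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

  # Replaces the "get public key" task
  - name: read public key
    slurp:
      src: /etc/wireguard/publickey
    register: publickey_file

  # slurp returns the file contents base64-encoded, so decode them
  # and strip the trailing newline before saving the fact
  - name: set public key fact
    set_fact:
      pubkey: "{{ publickey_file.content | b64decode | trim }}"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
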
&lt;h2&gt;
  
  
  Managing Variables
&lt;/h2&gt;

&lt;p&gt;If we run the playbook now, it will fail with a &lt;code&gt;'server_pubkey' is undefined&lt;/code&gt; error. That's because &lt;code&gt;server_pubkey&lt;/code&gt; is defined only for the play that targets the &lt;strong&gt;server&lt;/strong&gt;, so it's not available to the play targeting the &lt;strong&gt;client&lt;/strong&gt;. We need to move the variable somewhere readable by the entire playbook. &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html#splitting-out-vars" rel="noopener noreferrer"&gt;Ansible looks for YAML files in a &lt;code&gt;group_vars/&lt;/code&gt; folder&lt;/a&gt; where the filename matches a group name in the inventory file. So, we could create a &lt;code&gt;group_vars/vpn.yml&lt;/code&gt; file and declare variables in it, which would be directly usable when running a play against any servers in the &lt;code&gt;vpn&lt;/code&gt; group. However, we haven't included &lt;code&gt;localhost&lt;/code&gt; in the &lt;code&gt;vpn&lt;/code&gt; group (though we could), so we'll instead use the special &lt;code&gt;group_vars/all.yml&lt;/code&gt; file, which makes variables available to all hosts.&lt;/p&gt;

&lt;p&gt;Move the server key variables from &lt;code&gt;playbook.yml&lt;/code&gt; to &lt;code&gt;group_vars/all.yml&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;server_privkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!vault&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;$ANSIBLE_VAULT;1.1;AES256&lt;/span&gt;
      &lt;span class="s"&gt;646438636565343063343631326136386239623935393637336539653636386135363&lt;/span&gt;
      &lt;span class="s"&gt;663386639393232346534643163656363316234306439306566306534610a31326664&lt;/span&gt;
      &lt;span class="s"&gt;363763663139383034636632343230376365333130333230373866353033326563303&lt;/span&gt;
      &lt;span class="s"&gt;5636138373830633534373033303536303566663166616539360a3936353033663263&lt;/span&gt;
      &lt;span class="s"&gt;336662663034376661616631343661333164363134373061343739633637623739306&lt;/span&gt;
      &lt;span class="s"&gt;465653532383838393662396333623966343165366635353132396332313762343534&lt;/span&gt;
      &lt;span class="s"&gt;65313761623964653532623839356633343838&lt;/span&gt;
&lt;span class="na"&gt;server_pubkey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7/6f7bUT+2hWMEP5BxeK51PGuMuTnQ9pRpkxg5jUSTo=&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Your directory should now look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nb"&gt;.&lt;/span&gt;
├── group_vars
│   └── all.yml
├── inventory.ini
├── playbook.yml
└── templates
    ├── client_wg0.conf.j2
    └── server_wg0.conf.j2


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run the playbook and the client should run all its tasks successfully:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini &lt;span class="nt"&gt;--ask-vault-password&lt;/span&gt; &lt;span class="nt"&gt;--ask-become-pass&lt;/span&gt; playbook.yml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The VPN client is now set up. The only remaining step for the client is to start the VPN after the server is running and configured to accept connections from the client (so the client's &lt;code&gt;PostUp&lt;/code&gt; ping will succeed).&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding a Peer to the Server Config
&lt;/h2&gt;

&lt;p&gt;Add a &lt;code&gt;[Peer]&lt;/code&gt; section to the server template at &lt;code&gt;templates/server_wg0.conf.j2&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="c1"&gt;# {{ ansible_managed }}&lt;/span&gt;
&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Interface&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="s"&gt;Address = 10.0.0.1/24&lt;/span&gt;
&lt;span class="s"&gt;ListenPort = &lt;/span&gt;&lt;span class="m"&gt;51820&lt;/span&gt;
&lt;span class="s"&gt;PrivateKey = {{ server_privkey }}&lt;/span&gt;

&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Peer&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="s"&gt;PublicKey = {{ hostvars['localhost'].pubkey }}&lt;/span&gt;
&lt;span class="s"&gt;AllowedIPs = 10.0.0.8&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We read the &lt;code&gt;{{ server_privkey }}&lt;/code&gt; from &lt;code&gt;group_vars/all.yml&lt;/code&gt; and we read &lt;code&gt;{{ hostvars['localhost'].pubkey }}&lt;/code&gt; from the &lt;code&gt;set_fact&lt;/code&gt; module that runs during the client-targeted play in the playbook.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reloading the Server Config
&lt;/h2&gt;

&lt;p&gt;If we run the playbook, the config file on the server will be updated with the new &lt;code&gt;[Peer]&lt;/code&gt; section, but the WireGuard interface is already running and configured based on the old file contents. We need to reload the configuration when it changes. &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_handlers.html" rel="noopener noreferrer"&gt;Handlers&lt;/a&gt; are the Ansible-provided mechanism for this: a handler triggers when a task that notifies it reports a &lt;em&gt;change&lt;/em&gt;. Handlers run at the end of the play in which they're notified, so many tasks could notify a "reload config" handler, but the handler would only run once at the end. Let's create one handler for each play, in a &lt;code&gt;handlers&lt;/code&gt; list after that play's &lt;code&gt;tasks&lt;/code&gt; list in &lt;code&gt;playbook.yml&lt;/code&gt;, and notify them from the &lt;code&gt;create client wireguard config&lt;/code&gt; and &lt;code&gt;create server wireguard config&lt;/code&gt; tasks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

  &lt;span class="c1"&gt;# ...&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create client wireguard config&lt;/span&gt;
    &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
      &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;client_wg0.conf.j2&lt;/span&gt;
      &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0600'&lt;/span&gt;
    &lt;span class="na"&gt;notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restart wireguard&lt;/span&gt;

  &lt;span class="na"&gt;handlers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Restarts WireGuard interface, loading any new config and running PostUp&lt;/span&gt;
  &lt;span class="c1"&gt;# commands in the process. Notify this handler on client config changes.&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restart wireguard&lt;/span&gt;
    &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wg-quick down wg0; wg-quick up wg0&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;executable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/bin/bash&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;setup vpn server&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vpn_server&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create server wireguard config&lt;/span&gt;
    &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;dest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
      &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;server_wg0.conf.j2&lt;/span&gt;
      &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
      &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0600'&lt;/span&gt;
    &lt;span class="na"&gt;notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reload wireguard config&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;

  &lt;span class="na"&gt;handlers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# Reloads config without disrupting current peer sessions, but does not&lt;/span&gt;
  &lt;span class="c1"&gt;# re-run PostUp commands. Notify this handler on server config changes.&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reload wireguard config&lt;/span&gt;
    &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wg syncconf wg0 &amp;lt;(wg-quick strip wg0)&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;executable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/bin/bash&lt;/span&gt;
&lt;span class="c1"&gt;# ...&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;template&lt;/code&gt; Ansible module is idempotent: it only performs an action and marks the task as &lt;em&gt;changed&lt;/em&gt; if the rendered config file differs from the one already on the host. Idempotence is valuable when used with handlers, because the handler will only run when the task changes. Notifying a handler from a task that isn't idempotent may result in the handler always running (e.g. a service is unnecessarily restarted every time the playbook is run).&lt;/p&gt;

&lt;h2&gt;
  
  
  Start the VPN Client
&lt;/h2&gt;

&lt;p&gt;Add one final play to the end of the playbook to start the client VPN now that the server is configured to accept its connection:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start vpn on clients&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost&lt;/span&gt;
  &lt;span class="na"&gt;connection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start vpn&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wg-quick up wg0&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
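&lt;p&gt;One caveat worth sketching: &lt;code&gt;wg-quick up&lt;/code&gt; exits with an error if the interface already exists, so re-running this play against a client whose VPN is already up may fail. The shell sketch below shows one possible guard. It assumes a Linux client, where a live interface appears under &lt;code&gt;/sys/class/net/&lt;/code&gt;; the &lt;code&gt;start_vpn&lt;/code&gt; function and the &lt;code&gt;WG_QUICK&lt;/code&gt; and &lt;code&gt;NET_DIR&lt;/code&gt; variables are hypothetical names used only for illustration.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of an idempotent "start the VPN" step.
# WG_QUICK and NET_DIR are illustrative overrides, not wg-quick features.
WG_QUICK=${WG_QUICK:-wg-quick}
NET_DIR=${NET_DIR:-/sys/class/net}

start_vpn() {
    iface=$1
    if [ -e "$NET_DIR/$iface" ]; then
        # The interface already exists, so bringing it up again would fail.
        echo "$iface is already up"
    else
        "$WG_QUICK" up "$iface"
    fi
}

# Usage (as root): start_vpn wg0
```

&lt;p&gt;In the playbook itself, the &lt;code&gt;command&lt;/code&gt; module's &lt;code&gt;creates&lt;/code&gt; argument (for example, &lt;code&gt;creates: /sys/class/net/wg0&lt;/code&gt;) could likely express the same guard, since Ansible skips the command when that path already exists.&lt;/p&gt;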
&lt;h1&gt;
  
  
  Automation Complete!
&lt;/h1&gt;

&lt;p&gt;Now we can run the whole playbook and — whether the server and client are brand-new or in some intermediate state — this single command will set up a WireGuard VPN server and client!&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

ansible-playbook &lt;span class="nt"&gt;-i&lt;/span&gt; inventory.ini &lt;span class="nt"&gt;--ask-vault-password&lt;/span&gt; &lt;span class="nt"&gt;--ask-become-pass&lt;/span&gt; playbook.yml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The complete Ansible code can be found at: &lt;a href="https://gitlab.com/tangram-vision-oss/tangram-visions-blog" rel="noopener noreferrer"&gt;https://gitlab.com/tangram-vision-oss/tangram-visions-blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are many improvements that could be made:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provision a cloud server automatically, using an Ansible module such as &lt;a href="https://docs.ansible.com/ansible/2.10/collections/community/digitalocean/digital_ocean_droplet_module.html" rel="noopener noreferrer"&gt;community.digitalocean.digital_ocean_droplet&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Automatically &lt;a href="https://docs.ansible.com/ansible/2.10/collections/community/digitalocean/digital_ocean_floating_ip_module.html" rel="noopener noreferrer"&gt;update a floating IP address&lt;/a&gt; when provisioning a new cloud VPN server.&lt;/li&gt;
&lt;li&gt;Configure multiple clients automatically. One approach is to add a &lt;code&gt;vpn_clients&lt;/code&gt; group to the inventory, define VPN IPs in the inventory (e.g. &lt;code&gt;vpn_ip=10.0.0.8&lt;/code&gt;), and use those host variables in the config templates. When templating the server config, &lt;a href="https://jinja.palletsprojects.com/en/2.11.x/templates/#for" rel="noopener noreferrer"&gt;loop&lt;/a&gt; over hostnames in the clients group, adding a new &lt;code&gt;[Peer]&lt;/code&gt; block for each.&lt;/li&gt;
&lt;li&gt;Organize the playbook as &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_roles.html" rel="noopener noreferrer"&gt;roles&lt;/a&gt;, one for the server and one for the client. Roles are more reusable and shareable than playbooks.&lt;/li&gt;
&lt;li&gt;Test and lint with &lt;a href="https://www.jeffgeerling.com/blog/2018/testing-your-ansible-roles-molecule" rel="noopener noreferrer"&gt;molecule&lt;/a&gt; and &lt;a href="https://ansible-lint.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;ansible-lint&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for joining me on this Ansible-learning journey! If you have any suggestions or corrections, please let me know or &lt;a href="https://www.twitter.com/tangramvision" rel="noopener noreferrer"&gt;send us a tweet&lt;/a&gt;, and if you’re curious to learn more about how we improve perception sensors, visit us at &lt;a href="https://www.tangramvision.com/" rel="noopener noreferrer"&gt;Tangram Vision&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>devops</category>
      <category>wireguard</category>
      <category>ansible</category>
    </item>
    <item>
      <title>What They Don’t Tell You About Setting Up A WireGuard VPN</title>
      <dc:creator>Greg Schafer</dc:creator>
      <pubDate>Tue, 12 Jan 2021 19:34:03 +0000</pubDate>
      <link>https://dev.to/tangramvision/what-they-don-t-tell-you-about-setting-up-a-wireguard-vpn-1h2g</link>
      <guid>https://dev.to/tangramvision/what-they-don-t-tell-you-about-setting-up-a-wireguard-vpn-1h2g</guid>
      <description>&lt;p&gt;&lt;a href="https://www.wireguard.com/" rel="noopener noreferrer"&gt;WireGuard&lt;/a&gt; is a relatively new VPN implementation that was added to the Linux 5.6 kernel in 2020 and is faster and simpler than other popular VPN options like IPsec and OpenVPN.&lt;/p&gt;

&lt;p&gt;We'll walk through setting up an IPv4-only WireGuard VPN server on DigitalOcean, and I'll highlight tips and tricks and educational asides that should help you build a deeper understanding and, ultimately, save you time compared to "just copy these code blocks" WireGuard tutorials.&lt;/p&gt;

&lt;h1&gt;
  
  
  Let's get a server!
&lt;/h1&gt;

&lt;p&gt;To set up a VPN, we need two computers that we want to connect. One of these is typically a desktop/laptop/phone in your possession. If you're looking to remotely access company intranet sites and services, the other computer would be a server in an office or on a company cloud network. If you're looking to remotely access your own home network, privately network with family/friends, or encrypt all of your internet traffic, then the other computer would be a personal server on a cloud provider like DigitalOcean or AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fe%2Fe8%2FVPN_overview-en.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fe%2Fe8%2FVPN_overview-en.svg" alt="VPN Connectivity Overview"&gt;&lt;/a&gt;&lt;/p&gt;
VPN connectivity overview. CC BY-SA 4.0, Image attribution: &lt;a href="https://creativecommons.org/licenses/by-sa/4.0/legalcode" rel="noopener noreferrer"&gt;Creative Commons License&lt;/a&gt;



&lt;p&gt;For this walkthrough, we'll use a new Ubuntu 20.04 server on DigitalOcean, though you could follow similar steps using any cloud provider. To create a new DigitalOcean server, follow &lt;a href="https://www.digitalocean.com/docs/droplets/how-to/create/" rel="noopener noreferrer"&gt;their guide to creating a droplet&lt;/a&gt;. A "droplet" is the term DigitalOcean uses for a "server" or a "VM" or an "instance".&lt;/p&gt;

&lt;h2&gt;
  
  
  VPCs and Private Networks
&lt;/h2&gt;

&lt;p&gt;DigitalOcean servers are automatically created in a Virtual Private Cloud aka &lt;a href="https://www.digitalocean.com/docs/networking/vpc/" rel="noopener noreferrer"&gt;VPC&lt;/a&gt; (most cloud providers have VPC or private networking functionality), meaning they have an additional network interface (&lt;code&gt;eth1&lt;/code&gt; in addition to &lt;code&gt;eth0&lt;/code&gt;) and an additional private IP address. All servers, databases, and load balancers created in the same VPC can communicate with each other via their private IP addresses, which is a boost to security because all inbound traffic from the public internet (on &lt;code&gt;eth0&lt;/code&gt;) can be blocked with a firewall.&lt;/p&gt;

&lt;p&gt;You can use your VPN server as a sort of &lt;a href="https://en.wikipedia.org/wiki/Bastion_host" rel="noopener noreferrer"&gt;bastion host&lt;/a&gt; to access other resources inside your VPC using their private IP addresses. That is, your VPN server can route traffic to any IP address in the VPC and all the servers in your VPC can accept traffic only to their private IP addresses (to &lt;code&gt;eth1&lt;/code&gt;), which protects those servers and the services they run from all sorts of attacks. The server configuration section below will mention how to set up this sort of architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  How can I keep my VPN server up?
&lt;/h2&gt;

&lt;p&gt;Given the importance of VPN uptime — especially if it serves as the only way to access important servers in a VPC or remote company network — it's worth considering how to handle or avoid downtime. There is a range of options and tradeoffs to consider, ordered below in increasing complexity/effort:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do nothing! If you set up a server on DigitalOcean, install and configure the VPN, and take no further actions, then your VPN will go down when the server does. It's not uncommon for DigitalOcean to migrate droplets between physical machines due to hardware issues, and the VPN will be unavailable if the migration can't be performed without downtime. If a more serious issue causes downtime (e.g. accidental &lt;code&gt;rm -rf /&lt;/code&gt;, networking misconfiguration, or a successful attack), then you'll need to set up and configure a new server from scratch to bring your VPN back up. If you didn't save the VPN server's private key offline, you'll need to generate a new private key and reconfigure all VPN clients to be able to connect to the new VPN server.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.digitalocean.com/docs/images/backups/how-to/enable/" rel="noopener noreferrer"&gt;Enable droplet backups&lt;/a&gt;. For an additional 20% of the droplet price, DigitalOcean will take weekly snapshots of the server. If the droplet ends up horribly broken or unresponsive, you can restore the latest backup and your VPN will be working again (in about 1 minute for a 1 GB droplet).&lt;/li&gt;
&lt;li&gt;Set up manual failover. Set up the VPN server and &lt;a href="https://www.digitalocean.com/docs/images/snapshots/how-to/" rel="noopener noreferrer"&gt;take a snapshot&lt;/a&gt;, then restore the snapshot to a new droplet. &lt;a href="https://www.digitalocean.com/docs/networking/floating-ips/how-to/create/" rel="noopener noreferrer"&gt;Point a floating IP&lt;/a&gt; to one of the servers and use that IP address when connecting to the VPN. When the primary/active VPN server goes down for any reason, you can update the floating IP to point to the secondary/standby VPN server and your VPN will work again!&lt;/li&gt;
&lt;li&gt;Set up automatic failover / high-availability. The next step up in sophistication is to either:

&lt;ul&gt;
&lt;li&gt;detect when the VPN server goes down and automatically switch (point a floating IP address) to a healthy standby using something like &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-create-a-high-availability-setup-with-corosync-pacemaker-and-floating-ips-on-ubuntu-14-04" rel="noopener noreferrer"&gt;Pacemaker&lt;/a&gt;, or&lt;/li&gt;
&lt;li&gt;put a UDP load balancer in front of multiple VPN servers, but... you might need some network trickery to allow multiple active VPN servers with the same IP address and you might also need sticky sessions, which break down for roaming clients without some &lt;a href="https://blog.cloudflare.com/warp-technical-challenges/" rel="noopener noreferrer"&gt;protocol-level changes like Cloudflare made for WARP&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  Set up a WireGuard server
&lt;/h1&gt;

&lt;p&gt;With your shiny new server running, let's install and configure WireGuard. For non-Linux platforms, follow the &lt;a href="https://www.wireguard.com/install/" rel="noopener noreferrer"&gt;WireGuard website's instructions and links&lt;/a&gt;. For this walkthrough, I'll show instructions for Ubuntu 20.04, starting with installing the &lt;code&gt;wireguard&lt;/code&gt; package:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;wireguard


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;wireguard&lt;/code&gt; package provides two command-line tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;wg&lt;/code&gt; — a tool for managing configuration of WireGuard interfaces&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;wg-quick&lt;/code&gt; — a convenience script for easily starting and stopping WireGuard interfaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I encourage reading the manpages (&lt;code&gt;man wg&lt;/code&gt; and &lt;code&gt;man wg-quick&lt;/code&gt;), because they are concise, well-written, and contain a lot of information that is glossed over in most WireGuard tutorials!&lt;/p&gt;

&lt;p&gt;To encrypt and decrypt packets, we need keys. 🔑&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Change to the root user&lt;/span&gt;
&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt;

&lt;span class="c"&gt;# Make sure files created after this point are accessible only to the root user&lt;/span&gt;
&lt;span class="nb"&gt;umask &lt;/span&gt;077

&lt;span class="c"&gt;# Generate keys in /etc/wireguard&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /etc/wireguard
wg genkey | &lt;span class="nb"&gt;tee &lt;/span&gt;privatekey | wg pubkey &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; publickey


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now we have a private key (which only the server should possess and know about) and a public key (which should be shared with all VPN clients that will connect to this server).&lt;/p&gt;

&lt;p&gt;Next, create a configuration file at &lt;code&gt;/etc/wireguard/wg0.conf&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If we use &lt;code&gt;wg-quick&lt;/code&gt; (spoiler: we will) to start/stop the VPN interface, it will create the interface with &lt;code&gt;wg0&lt;/code&gt; as the name. You can create other interface config files with other names, such as &lt;code&gt;wg1.conf&lt;/code&gt;, &lt;code&gt;my-company-vpn.conf&lt;/code&gt;, or &lt;code&gt;us_east_1.conf&lt;/code&gt;. The &lt;code&gt;wg-quick&lt;/code&gt; script will create interfaces with names that match the config filename (minus the &lt;code&gt;.conf&lt;/code&gt; part), as long as the name fits the regex tested in &lt;code&gt;/usr/bin/wg-quick&lt;/code&gt;.&lt;/p&gt;
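&lt;p&gt;To get a feel for which names pass, here is a small shell sketch. It assumes the pattern in your copy of &lt;code&gt;wg-quick&lt;/code&gt; is equivalent to &lt;code&gt;[a-zA-Z0-9_=+.-]{1,15}&lt;/code&gt; (check your installed script, since this may differ between versions); &lt;code&gt;valid_ifname&lt;/code&gt; is a hypothetical helper, not a WireGuard command.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: succeed if the name fits the assumed wg-quick pattern
# (1 to 15 characters drawn from letters, digits, and _ = + . -).
valid_ifname() {
    printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9_=+.-]{1,15}$'
}

# Try a few candidate config names (without the .conf suffix).
for name in wg0 my-company-vpn us_east_1 this-name-is-too-long; do
    if valid_ifname "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```

&lt;p&gt;Under that assumption, the first three names are accepted and the last is rejected because it is longer than 15 characters.&lt;/p&gt;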

&lt;p&gt;Print out your private key with &lt;code&gt;cat /etc/wireguard/privatekey&lt;/code&gt; and then add the following to the configuration file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# /etc/wireguard/wg0.conf on the server&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;Interface]
Address &lt;span class="o"&gt;=&lt;/span&gt; 10.0.0.1/24
ListenPort &lt;span class="o"&gt;=&lt;/span&gt; 51820
&lt;span class="c"&gt;# Use your own private key, from /etc/wireguard/privatekey&lt;/span&gt;
PrivateKey &lt;span class="o"&gt;=&lt;/span&gt; WCzcoJZaxurBVM/wO1ogMZgg5O5W12ON94p38ci+zG4&lt;span class="o"&gt;=&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We'll add the public keys of clients that are allowed to connect to the VPN later, but the above is all you need to run the VPN server for now. Here's what it means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Address = 10.0.0.1/24&lt;/code&gt; — The server will have an IP address in the VPN of &lt;code&gt;10.0.0.1&lt;/code&gt;. The &lt;code&gt;/24&lt;/code&gt; at the end of the IP address is a &lt;a href="https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing" rel="noopener noreferrer"&gt;CIDR prefix length&lt;/a&gt;: it places the server in the &lt;code&gt;10.0.0.0/24&lt;/code&gt; subnet, so traffic for other addresses in the &lt;code&gt;10.0.0.1-10.0.0.254&lt;/code&gt; range is routed through the WireGuard interface, where the server can relay it to peers in the VPN.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ListenPort = 51820&lt;/code&gt; — The port that WireGuard will listen to for inbound UDP packets.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PrivateKey = ...&lt;/code&gt; — The private key of the VPN server, used for encryption/decryption.&lt;/li&gt;
&lt;/ul&gt;
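&lt;p&gt;As a worked example of the CIDR arithmetic, the shell sketch below derives the usable host range from an address like &lt;code&gt;10.0.0.1/24&lt;/code&gt;. Both &lt;code&gt;cidr_range&lt;/code&gt; and &lt;code&gt;to_dotted&lt;/code&gt; are hypothetical helpers written for this illustration (they are not part of WireGuard), and the sketch assumes an IPv4 prefix length of 30 or less.&lt;/p&gt;

```shell
#!/bin/sh
# Convert a 32-bit integer back to dotted-quad notation.
to_dotted() {
    printf '%d.%d.%d.%d' \
        $(( $1 / 16777216 % 256 )) $(( $1 / 65536 % 256 )) \
        $(( $1 / 256 % 256 )) $(( $1 % 256 ))
}

# Hypothetical helper: print the usable host range of an IPv4 CIDR block.
cidr_range() {
    addr=${1%/*}     # address part, e.g. 10.0.0.1
    prefix=${1#*/}   # prefix length, e.g. 24

    # Split the dotted quad into four octets and pack them into one integer.
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs
    ip=$(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))

    # The block holds 2^(32 - prefix) addresses.
    size=1
    bits=$prefix
    while [ "$bits" -lt 32 ]; do
        size=$(( $size * 2 ))
        bits=$(( $bits + 1 ))
    done

    # Skip the network address (first) and the broadcast address (last).
    network=$(( $ip - $ip % $size ))
    first=$(( $network + 1 ))
    last=$(( $network + $size - 2 ))

    printf '%s-%s\n' "$(to_dotted "$first")" "$(to_dotted "$last")"
}

cidr_range 10.0.0.1/24   # prints 10.0.0.1-10.0.0.254
```

&lt;p&gt;A &lt;code&gt;/24&lt;/code&gt; leaves 8 bits for hosts, which gives 256 addresses; dropping the network and broadcast addresses leaves the 254 usable ones.&lt;/p&gt;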

&lt;p&gt;At this point, you can start the VPN!&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# This will run a few commands with "ip" and "wg" to&lt;/span&gt;
&lt;span class="c"&gt;# create the interface and configure it&lt;/span&gt;
wg-quick up wg0

&lt;span class="c"&gt;# To see the WireGuard-specific details of the interface&lt;/span&gt;
wg

&lt;span class="c"&gt;# To start the VPN on boot&lt;/span&gt;
systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;wg-quick@wg0


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Find more example commands for inspecting the interface at &lt;a href="https://github.com/pirate/wireguard-docs#inspect" rel="noopener noreferrer"&gt;https://github.com/pirate/wireguard-docs#inspect&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Relaying traffic
&lt;/h2&gt;

&lt;p&gt;Recall from above that &lt;code&gt;Address = 10.0.0.1/24&lt;/code&gt; means the server will relay traffic to peers in the subnet. That is, if you connect to the VPN and &lt;code&gt;ping 10.0.0.14&lt;/code&gt; (and a server exists on the VPN at that address), then your ping will go to the VPN server at &lt;code&gt;10.0.0.1&lt;/code&gt; and be forwarded on to the machine at &lt;code&gt;10.0.0.14&lt;/code&gt;. However, this won't work without one additional piece of configuration: IP Forwarding.&lt;/p&gt;

&lt;p&gt;To enable IP Forwarding, open &lt;code&gt;/etc/sysctl.conf&lt;/code&gt; and uncomment or add the line:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

net.ipv4.ip_forward&lt;span class="o"&gt;=&lt;/span&gt;1


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then apply the settings by running:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

sysctl &lt;span class="nt"&gt;-p&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
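&lt;p&gt;To confirm the setting took effect, you can read the live value back from the kernel. The sketch below assumes a Linux host, where the current value is exposed at &lt;code&gt;/proc/sys/net/ipv4/ip_forward&lt;/code&gt;; &lt;code&gt;forwarding_status&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: report whether IPv4 forwarding is on.
# The optional argument overrides the file to read (handy for testing).
forwarding_status() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    if [ ! -r "$file" ]; then
        echo "unknown"
    elif [ "$(cat "$file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

echo "IPv4 forwarding: $(forwarding_status)"
```

&lt;p&gt;If this reports &lt;code&gt;disabled&lt;/code&gt; after running &lt;code&gt;sysctl -p&lt;/code&gt;, double-check that the line in &lt;code&gt;/etc/sysctl.conf&lt;/code&gt; is uncommented.&lt;/p&gt;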

&lt;p&gt;Now, the VPN server should be able to relay traffic to other VPN hosts. From my understanding, running &lt;code&gt;ping 10.0.0.14&lt;/code&gt; will follow the left-to-right path shown in the diagram below. The diagram doesn't show the ping response from Peer C to Peer A, but you can mentally reverse all the arrows to see what the returning response path would look like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2FpKrvrpG%2FNetwork-Packet-Paths.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2FpKrvrpG%2FNetwork-Packet-Paths.png" alt="Network Packet Paths"&gt;&lt;/a&gt;&lt;/p&gt;
The path of network packets from a ping command on Peer A to the destination server, Peer C. The packets enter the VPN at Peer A and route to the VPN server (Peer B), which relays the packets to Peer C via the VPN.



&lt;h2&gt;
  
  
  Troubleshooting relayed traffic
&lt;/h2&gt;

&lt;p&gt;There are many places where something could go wrong, especially when relaying traffic between multiple servers as in the diagram above. When network requests are failing, &lt;code&gt;tcpdump&lt;/code&gt; is a great tool for finding the source of failures and misconfigurations. If you wanted a complete view of the flow in the diagram above, you could run the following &lt;code&gt;tcpdump&lt;/code&gt; commands on each machine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nb"&gt;sudo &lt;/span&gt;tcpdump &lt;span class="nt"&gt;-nn&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; wg0
&lt;span class="nb"&gt;sudo &lt;/span&gt;tcpdump &lt;span class="nt"&gt;-nn&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; eth0 udp and port 51820


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Just be aware that clocks on servers might be slightly out-of-sync, so comparing timestamps in &lt;code&gt;tcpdump&lt;/code&gt; output between servers could be misleading!&lt;/p&gt;

&lt;p&gt;If you're debugging network packets on a machine with a display like your desktop or laptop, you can use &lt;a href="https://www.wireshark.org/" rel="noopener noreferrer"&gt;Wireshark&lt;/a&gt;, which is a graphical, user-friendly alternative to &lt;code&gt;tcpdump&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For more insight into WireGuard itself, you can enable debug logging by following the instructions at &lt;a href="https://www.wireguard.com/quickstart/#debug-info" rel="noopener noreferrer"&gt;https://www.wireguard.com/quickstart/#debug-info&lt;/a&gt; and then running &lt;code&gt;tail -f /var/log/syslog&lt;/code&gt; to see the log messages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Relaying traffic to a VPC or the internet
&lt;/h2&gt;

&lt;p&gt;In addition to using a VPN server to relay traffic between VPN clients, you can use a VPN server as a way to access servers in a VPC (on DigitalOcean or AWS, for example) that are firewalled off from the public internet. This approach requires no change in WireGuard configuration on the server, but you will need to enable masquerading so that responses on one network (e.g. the VPC) can be mapped to the requesting machine on the other network (e.g. the VPN). If you're unfamiliar with masquerading, check out this &lt;a href="https://superuser.com/questions/935969/what-is-masquerade-made-for/935988#935988" rel="noopener noreferrer"&gt;brief explanation&lt;/a&gt;. Assuming your VPN server is connected to the VPC on its &lt;code&gt;eth1&lt;/code&gt; interface, you can enable masquerading on the VPN server with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

iptables &lt;span class="nt"&gt;-t&lt;/span&gt; nat &lt;span class="nt"&gt;-A&lt;/span&gt; POSTROUTING &lt;span class="nt"&gt;-s&lt;/span&gt; 10.0.0.0/24 &lt;span class="nt"&gt;-o&lt;/span&gt; eth1 &lt;span class="nt"&gt;-j&lt;/span&gt; MASQUERADE


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
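&lt;p&gt;Note that &lt;code&gt;-A&lt;/code&gt; appends a new rule every time it runs, so repeating the command leaves duplicate rules. A minimal sketch of a guarded version follows, using the &lt;code&gt;-C&lt;/code&gt; (check) option of &lt;code&gt;iptables&lt;/code&gt;; &lt;code&gt;ensure_masquerade&lt;/code&gt; is a hypothetical helper, and the subnet and interface are the ones assumed in this walkthrough.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: add the masquerade rule only if it is not present yet.
# $1 is the source subnet (the VPN), $2 is the outbound interface.
ensure_masquerade() {
    # -C exits non-zero (and prints a message on stderr) when no matching rule exists.
    if iptables -t nat -C POSTROUTING -s "$1" -o "$2" -j MASQUERADE; then
        echo "masquerade rule for $1 via $2 already present"
    else
        iptables -t nat -A POSTROUTING -s "$1" -o "$2" -j MASQUERADE
        echo "masquerade rule for $1 via $2 added"
    fi
}

# Usage (as root): ensure_masquerade 10.0.0.0/24 eth1
```

&lt;p&gt;Rules added this way do not survive a reboot. Since &lt;code&gt;wg-quick&lt;/code&gt; supports &lt;code&gt;PostUp&lt;/code&gt; and &lt;code&gt;PostDown&lt;/code&gt; commands in the config file, one option is to add the rule there so it is applied whenever the interface comes up.&lt;/p&gt;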

&lt;p&gt;Now, a VPN client such as your laptop should be able to ping servers in the VPC, as in the diagram below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2F9vK8pr3%2FPing-Command-A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2F9vK8pr3%2FPing-Command-A.png" alt="Network Packets Path A"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The path of network packets from a ping command on Peer A to the destination server, Peer C. The packets enter the VPN at Peer A and route to the VPN server (Peer B), which terminates the VPN connection and relays the packets to Peer C via the VPC.&lt;/em&gt;&lt;/p&gt;
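
&lt;p&gt;If that ping gets no reply, one thing worth checking is whether the VPN server is forwarding IPv4 packets at all, since relaying between the VPN and the VPC depends on it. Here is a minimal sketch, assuming a Linux server. The function name &lt;code&gt;forwarding_enabled&lt;/code&gt; and its optional path argument are made up for illustration:&lt;/p&gt;

```shell
# Report whether IPv4 forwarding is switched on.
# The optional argument (a hypothetical convenience for testing) is the
# file to read; it defaults to the kernel's setting in /proc.
forwarding_enabled() {
  flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$flag_file")" = "1" ]
}

if forwarding_enabled; then
  echo "IPv4 forwarding is on"
else
  echo "IPv4 forwarding is off; try: sysctl -w net.ipv4.ip_forward=1"
fi
```

&lt;p&gt;Note that a setting changed with &lt;code&gt;sysctl -w&lt;/code&gt; lasts only until the next reboot.&lt;/p&gt;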



&lt;p&gt;If you want to relay traffic through the VPN server to the internet (in which case, the VPN server is often labeled a &lt;em&gt;bounce server&lt;/em&gt;), enable masquerading on the public-internet-facing interface (e.g. &lt;code&gt;eth0&lt;/code&gt;) of the VPN server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

iptables &lt;span class="nt"&gt;-t&lt;/span&gt; nat &lt;span class="nt"&gt;-A&lt;/span&gt; POSTROUTING &lt;span class="nt"&gt;-s&lt;/span&gt; 10.0.0.0/24 &lt;span class="nt"&gt;-o&lt;/span&gt; eth0 &lt;span class="nt"&gt;-j&lt;/span&gt; MASQUERADE


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now, a VPN client such as your laptop can visit public internet sites via your VPN. If you're on an unsecured coffeeshop wifi connection or you &lt;a href="https://www.businessinsider.com/trump-fcc-privacy-rules-repeal-explained-2017-4" rel="noopener noreferrer"&gt;don't trust your ISP&lt;/a&gt;, anyone watching that network will see only an encrypted connection to your VPN server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2FQ81nLwv%2FPing-Command-B.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2FQ81nLwv%2FPing-Command-B.png" alt="Network Packets Path B"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The path of network packets from a ping command on Peer A to the destination server on the internet. The packets enter the VPN at Peer A and route to the VPN server (Peer B), which terminates the VPN connection and relays the packets over the public internet to the destination server.&lt;/em&gt;&lt;/p&gt;



&lt;h2&gt;
  
  
  Firewall rules
&lt;/h2&gt;

&lt;p&gt;We've used &lt;code&gt;iptables&lt;/code&gt; above for masquerading, but &lt;code&gt;iptables&lt;/code&gt; is also important for managing the VPN server's firewall. You can use &lt;code&gt;ufw&lt;/code&gt; instead, but learn and use &lt;code&gt;iptables&lt;/code&gt; if you have the time — &lt;code&gt;iptables&lt;/code&gt; is more foundational and powerful. Regardless of how you manage your firewall (I like &lt;a href="https://vmalli.com/managing-custom-iptables-rules-on-a-debian-docker-host/" rel="noopener noreferrer"&gt;this sort of approach&lt;/a&gt;), you'll need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;allow UDP traffic to the WireGuard ListenPort (51820 in the sample server config above)&lt;/li&gt;
&lt;li&gt;allow traffic forwarded to or from the WireGuard interface &lt;code&gt;wg0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;iptables&lt;/code&gt; commands for those changes are:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

iptables &lt;span class="nt"&gt;-A&lt;/span&gt; INPUT &lt;span class="nt"&gt;-p&lt;/span&gt; udp &lt;span class="nt"&gt;-m&lt;/span&gt; udp &lt;span class="nt"&gt;--dport&lt;/span&gt; 51820 &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT

iptables &lt;span class="nt"&gt;-A&lt;/span&gt; FORWARD &lt;span class="nt"&gt;-i&lt;/span&gt; wg0 &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT
iptables &lt;span class="nt"&gt;-A&lt;/span&gt; FORWARD &lt;span class="nt"&gt;-o&lt;/span&gt; wg0 &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
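
&lt;p&gt;Running those &lt;code&gt;-A&lt;/code&gt; commands twice appends duplicate rules. One way to make them safe to re-run is to check for the rule first with &lt;code&gt;iptables -C&lt;/code&gt;. The helper below is only a sketch, and &lt;code&gt;ensure_rule&lt;/code&gt; is a name invented for this example:&lt;/p&gt;

```shell
# Add a rule only if an identical one is not already present.
# Usage: ensure_rule TABLE CHAIN rule-spec...
ensure_rule() {
  table="$1"
  chain="$2"
  shift 2
  # -C (check) exits non-zero when no matching rule exists.
  if ! iptables -t "$table" -C "$chain" "$@" 2>/dev/null; then
    iptables -t "$table" -A "$chain" "$@"
  fi
}

# Example calls (run as root), mirroring the rules above:
#   ensure_rule filter INPUT -p udp -m udp --dport 51820 -j ACCEPT
#   ensure_rule filter FORWARD -i wg0 -j ACCEPT
#   ensure_rule filter FORWARD -o wg0 -j ACCEPT
```

&lt;p&gt;The same helper should work for the masquerading rules by passing &lt;code&gt;nat&lt;/code&gt; as the table and &lt;code&gt;POSTROUTING&lt;/code&gt; as the chain.&lt;/p&gt;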

&lt;p&gt;Many WireGuard tutorials suggest putting these &lt;code&gt;iptables&lt;/code&gt; commands in the &lt;code&gt;PostUp&lt;/code&gt; lines of the server WireGuard configuration, so that they run when the &lt;code&gt;wg0&lt;/code&gt; interface is created. Be warned that, depending on how you manage your firewall, restarting the firewall while the WireGuard interface is up may erase the rules those commands added, making the VPN unreachable until the interface is restarted. Consider managing WireGuard firewall rules in the same place and with the same tool that you manage all your other firewall rules.&lt;/p&gt;

&lt;h1&gt;
  
  
  Set up a WireGuard client
&lt;/h1&gt;

&lt;p&gt;Similar to the server setup, install WireGuard (follow the &lt;a href="https://www.wireguard.com/install/" rel="noopener noreferrer"&gt;WireGuard website's instructions and links&lt;/a&gt; for non-Linux platforms):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;wireguard


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Generate keys, similar to server setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Change to the root user&lt;/span&gt;
&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt;

&lt;span class="c"&gt;# Make sure files created after this point are accessible only to the root user&lt;/span&gt;
&lt;span class="nb"&gt;umask &lt;/span&gt;077

&lt;span class="c"&gt;# Generate keys in /etc/wireguard&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /etc/wireguard
wg genkey | &lt;span class="nb"&gt;tee &lt;/span&gt;privatekey | wg pubkey &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; publickey


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next, create a configuration file at &lt;code&gt;/etc/wireguard/wg0.conf&lt;/code&gt; with the following content:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# /etc/wireguard/wg0.conf on the client&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;Interface]
&lt;span class="c"&gt;# The address your computer will use on the VPN&lt;/span&gt;
Address &lt;span class="o"&gt;=&lt;/span&gt; 10.0.0.8/32

&lt;span class="c"&gt;# Load your privatekey from file&lt;/span&gt;
PostUp &lt;span class="o"&gt;=&lt;/span&gt; wg &lt;span class="nb"&gt;set&lt;/span&gt; %i private-key /etc/wireguard/privatekey
&lt;span class="c"&gt;# Also ping the vpn server to ensure the tunnel is initialized&lt;/span&gt;
PostUp &lt;span class="o"&gt;=&lt;/span&gt; ping &lt;span class="nt"&gt;-c1&lt;/span&gt; 10.0.0.1

&lt;span class="o"&gt;[&lt;/span&gt;Peer]
&lt;span class="c"&gt;# VPN server's wireguard public key (USE YOURS!)&lt;/span&gt;
PublicKey &lt;span class="o"&gt;=&lt;/span&gt; CcZHeaO08z55/x3FXdsSGmOQvZG32SvHlrwHnsWlGTs&lt;span class="o"&gt;=&lt;/span&gt;

&lt;span class="c"&gt;# Public IP address of your VPN server (USE YOURS!)&lt;/span&gt;
&lt;span class="c"&gt;# Use the floating IP address if you created one for your VPN server&lt;/span&gt;
Endpoint &lt;span class="o"&gt;=&lt;/span&gt; 123.123.123.123:51820

&lt;span class="c"&gt;# 10.0.0.0/24 is the VPN subnet&lt;/span&gt;
AllowedIPs &lt;span class="o"&gt;=&lt;/span&gt; 10.0.0.0/24

&lt;span class="c"&gt;# To also accept and send traffic to a VPC subnet at 10.110.0.0/20&lt;/span&gt;
&lt;span class="c"&gt;# AllowedIPs = 10.0.0.0/24,10.110.0.0/20&lt;/span&gt;

&lt;span class="c"&gt;# To accept traffic from and send traffic to any IP address through the VPN&lt;/span&gt;
&lt;span class="c"&gt;# AllowedIPs = 0.0.0.0/0&lt;/span&gt;

&lt;span class="c"&gt;# To keep a connection open from the server to this client&lt;/span&gt;
&lt;span class="c"&gt;# (Use if you're behind a NAT, e.g. on a home network, and&lt;/span&gt;
&lt;span class="c"&gt;# want peers to be able to connect to you.)&lt;/span&gt;
&lt;span class="c"&gt;# PersistentKeepalive = 25&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There's lots to talk about here!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Address = ...&lt;/code&gt; — Set the IP address of this client in the VPN. Packets sent to the VPN server with a destination of this address will be sent to whatever public IP address (endpoint) this client was last seen at.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PostUp = wg set %i private-key ...&lt;/code&gt; — Load the private key from the file after the &lt;code&gt;wg0&lt;/code&gt; interface is up. You can copy-paste the contents of the private key file into a &lt;code&gt;PrivateKey&lt;/code&gt; line directly (as in the server config) if you prefer. I suggest &lt;strong&gt;not&lt;/strong&gt; loading the private key via &lt;code&gt;PostUp&lt;/code&gt; in the VPN &lt;strong&gt;server&lt;/strong&gt; config however, because reloading the config (e.g. after adding a new client/peer) does not re-run &lt;code&gt;PostUp&lt;/code&gt; commands, so the VPN will no longer know its private key and the VPN won't work as a result.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PostUp = ping -c1 10.0.0.1&lt;/code&gt; — Ping the VPN server after the &lt;code&gt;wg0&lt;/code&gt; interface is up to test that the VPN connection was successful. If the ping fails, &lt;code&gt;wg-quick&lt;/code&gt; will take the interface back down. In my testing, sending traffic from the VPN server to the client didn't work until &lt;em&gt;something&lt;/em&gt; was sent from the client to the server — sending 1 ping packet to the server with &lt;code&gt;PostUp&lt;/code&gt; does the trick.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[Peer]&lt;/code&gt; — There can be multiple peer sections in the config, one for each VPN peer you wish to connect directly to. Often, the VPN server will be the only peer in a client's config file. Lines under the &lt;code&gt;[Peer]&lt;/code&gt; header define how and where the client will connect to the peer.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PublicKey = ...&lt;/code&gt; — The public key of the VPN server.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Endpoint = ...&lt;/code&gt; — The (usually publicly-accessible) IP address and port of your VPN server, where the port matches the server's &lt;code&gt;ListenPort&lt;/code&gt;. The address could be a floating IP address if you're using a cloud provider like DigitalOcean or AWS.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AllowedIPs = ...&lt;/code&gt; — For incoming packets from the VPN server, their source IP address must match the addresses or ranges in &lt;code&gt;AllowedIPs&lt;/code&gt;. For outgoing packets, the &lt;code&gt;AllowedIPs&lt;/code&gt; is the mapping that tells WireGuard what peer (specifically their public key and endpoint) should be used when encrypting and sending. The last example (&lt;code&gt;AllowedIPs = 0.0.0.0/0&lt;/code&gt;) would enable WireGuard to send traffic destined for &lt;strong&gt;any&lt;/strong&gt; IP address to the VPN server. With &lt;code&gt;AllowedIPs = 0.0.0.0/0&lt;/code&gt;, &lt;code&gt;wg-quick up&lt;/code&gt; will conveniently run &lt;code&gt;ip route&lt;/code&gt; and &lt;code&gt;ip rule&lt;/code&gt; commands to route all your traffic through the VPN (useful in the aforementioned unsecured coffeeshop wifi or malicious ISP scenarios). For more info on how &lt;code&gt;AllowedIPs&lt;/code&gt; works, check out &lt;a href="https://www.wireguard.com/#cryptokey-routing" rel="noopener noreferrer"&gt;WireGuard's documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PersistentKeepalive = 25&lt;/code&gt; — Send a packet to the VPN server every 25 seconds, to ensure that the server can successfully route traffic to the client when the client doesn't have a public or stable IP address. Without this setting, the client can still send traffic to the VPN server and receive responses, but routers between the client and the server only keep their NAT/masquerade mapping for a few dozen seconds. After the mapping expires, the server won't be able to send anything to the client until the client sends something first. You typically won't enable this setting, unless you want to allow new connections from other devices on the VPN — for example, you would enable this on your home desktop if you wanted to connect to it from your laptop or phone while traveling.&lt;/li&gt;
&lt;/ul&gt;
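
&lt;p&gt;To make the &lt;code&gt;AllowedIPs&lt;/code&gt; check on incoming packets more concrete, here is a rough model of it in shell: does a source address fall inside any of the listed ranges? This is not how WireGuard implements it, only an illustration. It handles IPv4 only, assumes octets have no leading zeros, and the function names (&lt;code&gt;ip_to_int&lt;/code&gt;, &lt;code&gt;ip_in_cidr&lt;/code&gt;, &lt;code&gt;allowed&lt;/code&gt;) are invented for this example. It relies on &lt;code&gt;local&lt;/code&gt;, which bash and dash support:&lt;/p&gt;

```shell
# Convert dotted-quad IPv4 text to an integer.
ip_to_int() {
  local IFS=.
  # Unquoted on purpose: split the address on dots.
  set -- $1
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Succeed if IP falls inside CIDR (e.g. 10.0.0.0/24).
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" block=1 i
  i="$bits"
  # block = 2^(32 - bits), the number of addresses in the range.
  while [ "$i" -lt 32 ]; do
    block=$(( block * 2 ))
    i=$(( i + 1 ))
  done
  # Two addresses share a range when they land in the same block.
  [ $(( $(ip_to_int "$ip") / block )) -eq $(( $(ip_to_int "$net") / block )) ]
}

# Succeed if IP matches any entry of a comma-separated AllowedIPs value.
allowed() {
  local ip="$1" entry IFS=,
  for entry in $2; do
    if ip_in_cidr "$ip" "$entry"; then return 0; fi
  done
  return 1
}

# A packet from a VPC server, checked against the second sample value.
if allowed 10.110.3.7 "10.0.0.0/24,10.110.0.0/20"; then
  echo "source address is allowed"
else
  echo "source address is not allowed"
fi
```

&lt;p&gt;With only &lt;code&gt;10.0.0.0/24&lt;/code&gt; in the list, the same address would be rejected, which is why the VPC range has to be added on the client before replies from VPC servers are accepted.&lt;/p&gt;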

&lt;p&gt;Before starting the VPN on the client, the VPN server needs to be configured to allow connections from the client. Open &lt;code&gt;/etc/wireguard/wg0.conf&lt;/code&gt; on the VPN server again and update the contents to match:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# /etc/wireguard/wg0.conf on the server&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;Interface]
Address &lt;span class="o"&gt;=&lt;/span&gt; 10.0.0.1/24
ListenPort &lt;span class="o"&gt;=&lt;/span&gt; 51820
&lt;span class="c"&gt;# Use your own private key, from /etc/wireguard/privatekey&lt;/span&gt;
PrivateKey &lt;span class="o"&gt;=&lt;/span&gt; WCzcoJZaxurBVM/wO1ogMZgg5O5W12ON94p38ci+zG4&lt;span class="o"&gt;=&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;Peer]
&lt;span class="c"&gt;# VPN client's public key&lt;/span&gt;
PublicKey &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;lIINA9aXWqLzbkApDsg3cpQ3m4LnPS0OXogSasNW5RY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
&lt;span class="c"&gt;# VPN client's IP address in the VPN&lt;/span&gt;
AllowedIPs &lt;span class="o"&gt;=&lt;/span&gt; 10.0.0.8/32


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
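
&lt;p&gt;If you expect to add several clients, the &lt;code&gt;[Peer]&lt;/code&gt; section can be generated rather than typed. The following is a small sketch, not part of WireGuard itself. &lt;code&gt;peer_block&lt;/code&gt; is a hypothetical helper that assumes each client gets a single &lt;code&gt;/32&lt;/code&gt; address, as in the config above:&lt;/p&gt;

```shell
# Print a [Peer] section for one client.
# Usage: peer_block CLIENT_PUBLIC_KEY CLIENT_VPN_IP
peer_block() {
  printf '\n[Peer]\n'
  printf 'PublicKey = %s\n' "$1"
  printf 'AllowedIPs = %s/32\n' "$2"
}

# Example with the sample client from this article:
peer_block "lIINA9aXWqLzbkApDsg3cpQ3m4LnPS0OXogSasNW5RY=" "10.0.0.8"
```

&lt;p&gt;To append the result to the server config, you could pipe it into &lt;code&gt;tee -a /etc/wireguard/wg0.conf&lt;/code&gt; as root.&lt;/p&gt;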

&lt;p&gt;The added &lt;code&gt;[Peer]&lt;/code&gt; section enables the VPN server to coordinate encryption keys with the client and validate that traffic from and to the client is allowed. To apply these changes, you can restart the WireGuard interface on the server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

wg-quick down wg0 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; wg-quick up wg0


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you want to avoid disrupting or dropping active VPN connections, reload the config with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

wg syncconf wg0 &amp;lt;&lt;span class="o"&gt;(&lt;/span&gt;wg-quick strip wg0&lt;span class="o"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
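
&lt;p&gt;The command above relies on process substitution, which bash provides but a plain &lt;code&gt;sh&lt;/code&gt; such as dash does not. If you need to reload from a script that runs under &lt;code&gt;sh&lt;/code&gt;, a temporary file does the same job. This is a sketch under that assumption. &lt;code&gt;reload_wg&lt;/code&gt; and the &lt;code&gt;WG&lt;/code&gt; and &lt;code&gt;WG_QUICK&lt;/code&gt; variables are names invented here, the latter two so the function can be dry-run with stand-in commands:&lt;/p&gt;

```shell
# By default these call the real tools; override them to dry-run.
WG="${WG:-wg}"
WG_QUICK="${WG_QUICK:-wg-quick}"

# Reload a WireGuard interface's config without process substitution.
# Usage: reload_wg IFACE
reload_wg() {
  iface="$1"
  # mktemp creates the file readable by its owner only, which matters
  # because the stripped config can contain the private key.
  tmp="$(mktemp)" || return 1
  if "$WG_QUICK" strip "$iface" > "$tmp"; then
    "$WG" syncconf "$iface" "$tmp"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# On the VPN server (as root):
#   reload_wg wg0
```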

&lt;p&gt;At this point, you can start the VPN on the client!&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# This will run a few commands with "ip" and "wg" to&lt;/span&gt;
&lt;span class="c"&gt;# create the interface and configure it&lt;/span&gt;
wg-quick up wg0

&lt;span class="c"&gt;# To see the WireGuard-specific details of the interface&lt;/span&gt;
wg

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
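
&lt;p&gt;To confirm that the tunnel actually came up, you can look at handshake times. The sketch below assumes that &lt;code&gt;wg show wg0 latest-handshakes&lt;/code&gt; prints one line per peer, with the peer's public key followed by a Unix timestamp that is 0 when no handshake has happened yet. &lt;code&gt;peers_without_handshake&lt;/code&gt; is a made-up helper name:&lt;/p&gt;

```shell
# Read "wg show IFACE latest-handshakes" output on stdin and print the
# public key of every peer whose handshake timestamp is 0.
peers_without_handshake() {
  awk '$2 == 0 { print $1 }'
}

# On a machine with the VPN running (as root):
#   wg show wg0 latest-handshakes | peers_without_handshake
```

&lt;p&gt;An empty result suggests that every configured peer has completed at least one handshake.&lt;/p&gt;
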

&lt;h2&gt;
  
  
  Connecting from a Chromebook
&lt;/h2&gt;

&lt;p&gt;If you're connecting to a WireGuard VPN from a Chromebook, I suggest using the &lt;a href="https://play.google.com/store/apps/details?id=com.wireguard.android" rel="noopener noreferrer"&gt;official Android WireGuard app&lt;/a&gt;. My efforts to run WireGuard under &lt;a href="https://github.com/dnschneid/crouton" rel="noopener noreferrer"&gt;crouton&lt;/a&gt; failed, because crouton uses a chroot, so I was stuck with the Chromebook's old Linux kernel (4.19) and unable to add kernel modules or network interfaces from within crouton. Similarly, &lt;a href="https://chromium.googlesource.com/chromiumos/docs/+/master/containers_and_vms.md" rel="noopener noreferrer"&gt;crostini&lt;/a&gt; doesn't allow updating or using custom kernel modules, but it does provide a great way to SSH into VPN-accessible servers while the Android WireGuard app is active.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting from other devices
&lt;/h2&gt;

&lt;p&gt;If you want to connect to a VPN from devices where you don't have root access, you can try installing a userspace implementation of WireGuard such as &lt;a href="https://git.zx2c4.com/wireguard-go/about/" rel="noopener noreferrer"&gt;wireguard-go&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to connect to a VPN from devices you don't control (e.g. smart TVs, IoT sensors), look into setting up WireGuard on your router (e.g. &lt;a href="https://openwrt.org/docs/guide-user/services/vpn/wireguard/start" rel="noopener noreferrer"&gt;instructions for OpenWRT&lt;/a&gt;), so you can route all those devices' outbound traffic through a VPN.&lt;/p&gt;




&lt;p&gt;Thanks for reading! Hopefully, I’ve saved you time by passing on some of the insights and tips that I learned while digging deeper into the many facets of setting up a WireGuard VPN. If you have any suggestions or corrections, please let me know or &lt;a href="https://www.twitter.com/tangramvision" rel="noopener noreferrer"&gt;send us a tweet&lt;/a&gt;, and if you’re curious to learn more about how we improve perception sensors, visit us at &lt;a href="https://www.tangramvision.com/" rel="noopener noreferrer"&gt;Tangram Vision&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're setting up multiple VPNs or multiple VPN clients — or if you're interested in learning about infrastructure and configuration automation — check out the next tutorial I wrote: &lt;a href="https://www.tangramvision.com/blog/exploring-ansible-via-setting-up-a-wireguard-vpn" rel="noopener noreferrer"&gt;Exploring Ansible via Setting Up a WireGuard VPN&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Corrections
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;2021-01-13: Previously, my explanation of what &lt;code&gt;AllowedIPs&lt;/code&gt; does and how to route all traffic through the VPN was incomplete/misleading. Thanks to &lt;a href="https://twitter.com/thatcks/status/1349439066048180226" rel="noopener noreferrer"&gt;Chris Siebenmann on Twitter&lt;/a&gt; for catching that!&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.wireguard.com/install/" rel="noopener noreferrer"&gt;https://www.wireguard.com/install/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wireguard.com/papers/wireguard.pdf" rel="noopener noreferrer"&gt;https://www.wireguard.com/papers/wireguard.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pirate/wireguard-docs#Address" rel="noopener noreferrer"&gt;https://github.com/pirate/wireguard-docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ckn.io/blog/2017/11/14/wireguard-vpn-typical-setup/" rel="noopener noreferrer"&gt;https://www.ckn.io/blog/2017/11/14/wireguard-vpn-typical-setup/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/" rel="noopener noreferrer"&gt;https://stanislas.blog/2019/01/how-to-setup-vpn-server-wireguard-nat-ipv6/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>digitalocean</category>
      <category>wireguard</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
