<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Paul Brabban</title>
    <description>The latest articles on DEV Community by Paul Brabban (@brabster).</description>
    <link>https://dev.to/brabster</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F17336%2F2849fb93-c69c-40cb-99ef-a571727b6bf0.jpeg</url>
      <title>DEV Community: Paul Brabban</title>
      <link>https://dev.to/brabster</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brabster"/>
    <language>en</language>
    <item>
      <title>How to get pwned with --extra-index-url</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Sat, 06 Dec 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/how-to-get-pwned-with-extra-index-url-462g</link>
      <guid>https://dev.to/brabster/how-to-get-pwned-with-extra-index-url-462g</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbq89zputl32uj6i0jby4.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbq89zputl32uj6i0jby4.webp" alt="A diagram illustrating a dependency confusion attack using Python's pip. The layout compares a " width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Python's built-in pip package manager is unsafe when used with the &lt;code&gt;--extra-index-url&lt;/code&gt; flag (there are other dangerous variants too). An attacker can publish a malicious package with the same name and a higher version to PyPI, and their package will be installed.&lt;/p&gt;

&lt;p&gt;This post confirms that the vulnerability (&lt;a href="https://nvd.nist.gov/vuln/detail/cve-2018-20225" rel="noopener noreferrer"&gt;CVE-2018-20225&lt;/a&gt;) is still a problem today. &lt;a href="https://github.com/pypa/pip/issues/12874" rel="noopener noreferrer"&gt;Despite the CVSS 7.8 (High) CVSS score, the maintainers have refused to change the behaviour&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I also introduce &lt;a href="https://github.com/brabster/cve-2018-20225" rel="noopener noreferrer"&gt;a test suite and publicly-available test packages&lt;/a&gt; that you can use to more easily confirm the safety - or not - of your own setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two variants of package &lt;code&gt;example-package-cve-2018-20225&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;I've written two variants of a new package that I'll use to demonstrate the problem. The package is essentially a single &lt;code&gt;__init__.py&lt;/code&gt; file that prints a message to show which package has been installed when it's imported, along with minimal metadata required to publish the package to a registry.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "safe" variant
&lt;/h3&gt;

&lt;p&gt;The "safe" variant of the package is at version &lt;code&gt;0.0.1&lt;/code&gt;. It prints &lt;code&gt;this is the safe, private package&lt;/code&gt; when imported.&lt;/p&gt;

&lt;p&gt;This package stands in for your intended, usually private, package. I've &lt;a href="https://gitlab.com/api/v4/projects/76907979/packages/pypi/simple" rel="noopener noreferrer"&gt;published it to GitLab&lt;/a&gt; and made the registry public for the convenience of testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "malicious" variant
&lt;/h3&gt;

&lt;p&gt;The "malicious" variant of the package is at version &lt;code&gt;1.0.0&lt;/code&gt;. There's nothing special about &lt;code&gt;1.0.0&lt;/code&gt;, it's just "higher" than &lt;code&gt;0.0.1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This package prints &lt;code&gt;oops, this is the malicious package&lt;/code&gt; when imported. It's &lt;a href="https://pypi.org/project/example-package-cve-2018-20225/" rel="noopener noreferrer"&gt;published to PyPI&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing approach
&lt;/h2&gt;

&lt;p&gt;I've created a &lt;a href="https://github.com/brabster/cve-2018-20225/blob/main/.github/workflows/test_cases.yml" rel="noopener noreferrer"&gt;GitHub actions workflow to test a variety of install and update scenarios&lt;/a&gt;. There are far too many potential tools and combinations to test them all, which is why I've made these packages available publicly. You can use them to test whatever specific scenario you want.&lt;/p&gt;

&lt;p&gt;!!! warning&lt;br&gt;
    The usual disclaimers apply. My intentions are good, but that could change or I could be compromised in the future. Take whatever precautions you can to establish trustworthiness - I've kept the packages simple to aid manual audit.&lt;/p&gt;

&lt;p&gt;All the tests are run against the latest versions (at time of writing) of the package management software. The tests report failure if the malicious package is installed. &lt;a href="https://github.com/brabster/cve-2018-20225/actions" rel="noopener noreferrer"&gt;You can see the current latest test run in the repo's GitHub actions tab&lt;/a&gt;. You can also see the packages and how I published them to PyPI and GitLab in the repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test scenarios
&lt;/h2&gt;

&lt;p&gt;I'm trying out a few scenarios I'm interested in. What happens when you specify various combinations of flags (including forgetting the flags) with pip?&lt;/p&gt;

&lt;h3&gt;
  
  
  pip with and without flags
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pip install ${PACKAGE}&lt;/code&gt;: 🚨 Malicious (Default behaviour if the flags are forgotten)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install ${PACKAGE} --index-url ${GITLAB_INDEX_URL}&lt;/code&gt;: ✅ Safe (Replaces PyPI with GitLab as the only source)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install ${PACKAGE} --extra-index-url ${GITLAB_INDEX_URL}&lt;/code&gt;: 🚨 Malicious (Searches both PyPI and GitLab, installs highest version)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install ${PACKAGE} --index-url ${PYPI_INDEX_URL} --extra-index-url ${GITLAB_INDEX_URL}&lt;/code&gt;: 🚨 Malicious (Sets PyPI as primary, GitLab as extra, same behaviour when flag order reversed)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;export PIP_EXTRA_INDEX_URL=${GITLAB_INDEX_URL}; pip install ...&lt;/code&gt;: 🚨 Malicious (Uses environment variable instead of CLI flag)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install -r requirements.txt&lt;/code&gt; (File contains &lt;code&gt;${PACKAGE}&lt;/code&gt;): 🚨 Malicious (Installs from PyPI)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pip install ${PACKAGE} --index-url ...; pip install -U ${PACKAGE} --extra-index-url&lt;/code&gt;: 🚨 Malicious (Installs "safe", then runs update with both indexes to get "malicious")&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GitLab's PyPI pass-through behaviour
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.gitlab.com/user/packages/pypi_repository/#package-request-forwarding-security-notice" rel="noopener noreferrer"&gt;A GitLab registry will pass through requests for packages that it doesn't hold to PyPI. This is flagged as a security risk&lt;/a&gt;. If you're exposed to this vulnerability, it seems like a solid step forward to me. It resolves the dependency confusion problem, works with different package managers and is easy for users.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pip install ${PACKAGE} requests --index-url ${GITLAB_INDEX_URL}&lt;/code&gt;: ✅ Safe (Installs target package + public lib (&lt;code&gt;requests&lt;/code&gt;) from GitLab index only, succeeds and installs the right package)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Is &lt;code&gt;uv&lt;/code&gt; vulnerable?
&lt;/h3&gt;

&lt;p&gt;There are many Python package managers. &lt;code&gt;uv&lt;/code&gt; is the current darling of the community and has &lt;a href="https://docs.astral.sh/uv/reference/cli/#uv-pip-install--extra-index-url" rel="noopener noreferrer"&gt;different behaviour when this flag is used in a pip-like manner&lt;/a&gt;. I've added a couple of tests to confirm the behaviour.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;uv pip install ${PACKAGE} --extra-index-url ${GITLAB_INDEX_URL}&lt;/code&gt;: ✅ Safe (Uses &lt;code&gt;uv&lt;/code&gt; with the risky flag)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;uv pip install ${PACKAGE} --index-strategy unsafe-best-match&lt;/code&gt;: 🚨 Malicious (Uses &lt;code&gt;uv&lt;/code&gt; but forces legacy &lt;code&gt;pip&lt;/code&gt; behaviour)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What about lockfiles?
&lt;/h3&gt;

&lt;p&gt;Assuming you didn't already lock the malicious package, lockfiles only offer temporary protection. When you update, if you update unsafely, you get the malicious package.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;There are many ways to put yourself at risk of CVE-2018-20225; if you get it wrong, an attacker has a trivially easy route onto your computer or infrastructure.&lt;/li&gt;
&lt;li&gt;Being confident that what you're doing is safe isn't trivial; I've provided source code, a suite of scenario results and a test harness to help you.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Avoid
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Using &lt;code&gt;--extra-index-url&lt;/code&gt; with &lt;code&gt;pip&lt;/code&gt; 🚨&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;--index-strategy&lt;/code&gt; with &lt;code&gt;uv&lt;/code&gt; 🚨&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Consider
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Using a private registry with a PyPI pass-through as the &lt;strong&gt;only index&lt;/strong&gt;, which I demonstrated with GitLab.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I doubt I would have known about this problem had it not been for a &lt;a href="https://tempered.works/posts/2024/05/18/handling-cve-2018-20225/" rel="noopener noreferrer"&gt;vulnerability scanner alerting me to it last year&lt;/a&gt;. If you're currently exposed to this problem, you're certainly not alone. When I asked ChatGPT how to safely use a private package registry, the response it generated (based, of course on the content it's been trained on) included using &lt;code&gt;--extra-index-url&lt;/code&gt; with no mention of this risk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37a2gfg50bhavumwrjro.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37a2gfg50bhavumwrjro.webp" alt="ChatGPT conversation showing a response recommending pip install with --extra-index-url flag for private package registry, with text explaining this allows searching both PyPI and private registry" width="800" height="231"&gt;&lt;/a&gt;&lt;br&gt;ChatGPT recommending the vulnerable --extra-index-url approach
 &lt;/p&gt;

</description>
      <category>automation</category>
      <category>operations</category>
      <category>security</category>
    </item>
    <item>
      <title>BigQuery, safer by default from September 2025</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Thu, 17 Jul 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/bigquery-safer-by-default-from-september-2025-1bpa</link>
      <guid>https://dev.to/brabster/bigquery-safer-by-default-from-september-2025-1bpa</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfmqweqxwpl2t9amgzof.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfmqweqxwpl2t9amgzof.webp" alt="Snippet from Google's email, hopefully ok for copyright!" width="477" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On September 1st 2025, Google will make BigQuery a lot safer by default, changing the default quotas for projects under the default on-demand pricing model. Instead of unlimited financial damage, the default for new projects will be around $1000 per day. Existing projects will be updated to a custom limit based on prior 30-day usage. No changes to existing limits.&lt;/p&gt;

&lt;p&gt;I received an email entitled "[Action Advised] Review and set appropriate daily usage limit for BigQuery projects before Sep 1, 2025." yesterday. The content is a big step forward for making BigQuery safer by default. Unfortunately, Google's copyrighted the content so I can't share it in full here, and the quota documentation doesn't cover the full content, but I can summarize the key points.&lt;/p&gt;

&lt;h2&gt;
  
  
  New projects
&lt;/h2&gt;

&lt;p&gt;As of September 1st, 2025, new projects using the default on-demand BigQuery pricing model will have a daily usage limit of 200TiB. The current model has no default limit, potentially leading to unlimited financial exposure - as I've written about in &lt;a href="https://tempered.works/posts/2024/02/08/1370-gone-in-sixty-seconds/" rel="noopener noreferrer"&gt;$1,370 Gone in Sixty Seconds&lt;/a&gt;, explaining how a single query could run up a bill over $1,300 in less than a minute, and in &lt;a href="https://tempered.works/posts/2024/02/16/the-bigquery-safety-net/" rel="noopener noreferrer"&gt;The BigQuery Safety Net&lt;/a&gt;, explaining how the quota system can mitigate these risks.&lt;/p&gt;

&lt;p&gt;200TiB is still a lot, and at the $5/TiB current pricing, it's $1000 per day. Most SMEs won't be affected, but an unexpected spike could still be painful. I see no indication that any smart adjustments will be made automatically based on usage patterns, so I'll still be setting my own values for these quotas anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Existing projects
&lt;/h2&gt;

&lt;p&gt;If a project already has a quota value other than &lt;code&gt;unlimited&lt;/code&gt; set for &lt;code&gt;QueryUsagePerDay&lt;/code&gt; or &lt;code&gt;QueryUsagePerUserPerDay&lt;/code&gt;, then no changes will be made. Otherwise, Google will set a quota based on usage in the last 30 days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Auditing
&lt;/h2&gt;

&lt;p&gt;The changes will be visible in audit logs when they occur.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;There are a whole bunch of blog posts out there that are going to need updating!&lt;/p&gt;

&lt;p&gt;Here are the &lt;a href="https://cloud.google.com/bigquery/quotas#query_jobs" rel="noopener noreferrer"&gt;updated BigQuery quota-related docs&lt;/a&gt;. BigQuery's defaults are about to get a lot safer, but could still hurt to the tune of $1000/day. I'll still be setting my own much lower values when I set up projects that use BigQuery.&lt;/p&gt;

</description>
      <category>insights</category>
      <category>operations</category>
    </item>
    <item>
      <title>GitHub Codespaces, one year later</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Sat, 07 Jun 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/github-codespaces-one-year-later-3ji9</link>
      <guid>https://dev.to/brabster/github-codespaces-one-year-later-3ji9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha2b4fbcg1zhtvr5i0ud.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha2b4fbcg1zhtvr5i0ud.webp" alt="A screenshot of the GitHub web UI option to create a new Codespace on main" width="304" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in early 2024 I tried GitHub Codespaces, and quickly ditched my local dev setup entirely. This post shares my experience going cloud-native for development: benefits for onboarding, agility, and security, alongside the real-world snags like network dependency and unexpected billing quirks.&lt;/p&gt;

&lt;p&gt;A year ago, I wrote about &lt;a href="https://tempered.works/posts/2024/04/23/why-try-codespaces/" rel="noopener noreferrer"&gt;why I was giving GitHub Codespaces a serious try&lt;/a&gt; - mainly dealing with my security concerns. Fast forward to today, and that "try" has become my &lt;strong&gt;exclusive&lt;/strong&gt; development environment. I have not touched my local VSCode setup for personal or professional work since then. From open-source contributions and evaluating new GitHub repos to building a product for a startup client who happened to be using GitHub, Codespaces has been my daily driver.&lt;/p&gt;

&lt;p&gt;I found a few stumbling blocks here and there, but I see those as minor inconveniences. In my opinion, the benefits far outweigh the problems. Let's start with the rainbows before moving onto the rainclouds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Works on my machine, and yours
&lt;/h2&gt;

&lt;p&gt;I joined a team and found another new team member who had been struggling to get their local development environment working correctly for a while. It was consuming a lot of their time, destroying their productivity, and the rest of the small team were unable to help.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/devcontainers" rel="noopener noreferrer"&gt;Codespaces uses the devcontainer spec&lt;/a&gt;. I had already created a &lt;a href="https://containers.dev/implementors/json_reference/" rel="noopener noreferrer"&gt;&lt;code&gt;devcontainer.json&lt;/code&gt;&lt;/a&gt; for the repo we were working on to get myself up and running. When I paired with my team-mate and saw the kinds of difficulties they were having, I suggested trying a Codespace. A couple of clicks plus a minute or so later, they were up and running in their web browser, able to start orienting themselves with the work in progress and very quickly started making significant contributions. It is not just the IDE - the OS version, supporting software like &lt;code&gt;gcloud&lt;/code&gt; or &lt;code&gt;awscli&lt;/code&gt; tooling to authenticate, IDE plugins - all of this can be quite easily managed as part of the repo.&lt;/p&gt;

&lt;p&gt;There are different ways of running Codespaces, including plugging my local VSCode into the Codespace running in the cloud. (I think some &lt;a href="https://blog.jetbrains.com/blog/2022/11/09/remote-development-in-jetbrains-ides-now-available-to-github-codespaces-users/" rel="noopener noreferrer"&gt;other IDEs like IntelliJ are supported by Codespaces&lt;/a&gt;, but I have not tried them myself.) I used this option temporarily to continue working when an &lt;a href="https://tempered.works/posts/2024/10/27/google-chrome-oct-15-update-broke-github-codespaces/" rel="noopener noreferrer"&gt;obscure privacy setting I had enabled broke Codespaces and a number of other websites via a Chrome browser update back in October&lt;/a&gt;. The devcontainer configuration can also be used with a local container runtime, although I was unable to get this working owing to my use of rootless Docker (and Podman) being unsupported at the time.&lt;/p&gt;

&lt;p&gt;I've been using the browser-based VSCode option, and it places no more demand on my physical computer than Google Sheets. I can get away with a relatively cheap, light machine, and I have been stripping away software that I used to install and no longer need. The fact that I can source-control OS, software installation and configuration declaratively in the devcontainer setup and then anyone with a functional computer that can run a web browser can become productive in minutes feels like a huge step forward. I've certainly spent plenty of time over the years dealing with local environment problems on my own and other people's computers!&lt;/p&gt;

&lt;h2&gt;
  
  
  Context-switching made easy
&lt;/h2&gt;

&lt;p&gt;Switching between repos and branches was always a minor pain, occasionally a significant one. Do I have the repo cloned already? Is it up to date? Do I need to switch branches? Install dependencies? Urgh I have to stash everything before I can switch, and so on.&lt;/p&gt;

&lt;p&gt;With Codespaces, I can stop what I'm doing and spin up a new instance on any branch at any point. I know I'm getting a consistent, isolated computing environment that's up to date. I can also keep a Codespace around for a while if I want, but I tend to spin one up for each new unit of work for team-based work. There's even a nice button to delete the associated Codespace when a PR is merged.&lt;/p&gt;

&lt;p&gt;I automate dependency updates as part of my &lt;code&gt;devcontainer&lt;/code&gt; initialization. I update OS packages when the Codespace starts, and I have scripts in the repos that automatically run to update project-specific packages every time I open a terminal. This ensures I and anyone else I'm working with are always using the latest versions, including security updates. I've talked about why in &lt;a href="//../2024-05-01-how-i-do-python-supply-chain-security/index.md"&gt;how I do Python data supply chain security&lt;/a&gt;. A 30-second &lt;code&gt;pip&lt;/code&gt; update and security scan is usually not noticeable running in the terminal window as I orient myself to the code and start thinking about how to proceed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimising blast radius
&lt;/h2&gt;

&lt;p&gt;Before Codespaces, I felt that my local environment had a nasty concentrating effect on supply chain risk - all supply chains, for everything I'd worked on, brought together in once place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfbz3cc135i8m3f5xj5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnfbz3cc135i8m3f5xj5h.png" alt="A diagram showing the risks of local development environments, with arrows pointing from various sources of supply chain risk towards a central laptop, illustrating how vulnerabilities can concentrate on a single machine" width="553" height="720"&gt;&lt;/a&gt;&lt;/p&gt;
Local development environments concentrate supply chain risk



&lt;p&gt;Now, with Codespaces, I feel that my local environment is well-isolated from each Codespace I use, exposed only to the user interface and interactions. Each Codespace is exposed to the minimal risk necessary for that specific piece of software.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixxrngc1qxnx3034aic5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixxrngc1qxnx3034aic5.png" alt="A diagram showing how GitHub Codespaces isolates supply chain risks, with separate arrows pointing to individual cloud-based environments rather than a single local machine" width="746" height="707"&gt;&lt;/a&gt;&lt;/p&gt;
Codespaces isolates supply chain risk to individual environments



&lt;p&gt;If I'm experimenting with a new library or evaluating an open-source repo that I don't trust, I can spin it up in a Codespace with a click. It has no access to anything other than what's in that empty Codespace. It's not the first time I've seen recommendations for software only to spin it up in a blank Codespace and find it's running three-year old dependencies that are riddled with critical vulnerabilities. Erm - I'll pass thanks.&lt;/p&gt;

&lt;p&gt;Infostealing malware is all the rage for the bad actors these days. &lt;a href="https://www.scworld.com/brief/most-credentials-stolen-using-infostealing-malware-report-finds" rel="noopener noreferrer"&gt;A recent report says that infostealers stole 2.1 billion credentials in 2024&lt;/a&gt;. I'm scared of these things, I want to minimise the risk of being exposed to one, and the impact if it should happen.&lt;/p&gt;

&lt;p&gt;If a package, IDE extension or some other software contains credential-stealing malware, it can only steal secrets that &lt;em&gt;that specific Codespace&lt;/em&gt; has access to. There's still risk there as my Codespaces often contain powerful credentials like cloud authentication tokens, but I keep secrets Codespace-specific and I minimise the number of secrets and how long they are valid for. As far as I know, my local machine and web browser, logged into countless personal and professional services, is completely out of reach of malware that might slip into a Codespace.&lt;/p&gt;

&lt;p&gt;The devcontainer setup for this website's repo is an example. &lt;a href="https://github.com/brabster/tw-site-mkdocs/blob/main/.devcontainer/devcontainer.json" rel="noopener noreferrer"&gt;&lt;code&gt;devcontainer.json&lt;/code&gt;&lt;/a&gt; points to a Dockerfile, setups up githooks and grants read access to a private fork of my mkdocs theme. Things are locked down by default and I need to explicity add permissions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"dockerfile"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Dockerfile"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"postCreateCommand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git config --local core.hooksPath .githooks/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customizations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"codespaces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"repositories"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"brabster/mkdocs-material-insiders"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"read"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The associated &lt;a href="https://github.com/brabster/tw-site-mkdocs/blob/main/.devcontainer/Dockerfile" rel="noopener noreferrer"&gt;Dockerfile&lt;/a&gt; just updates the OS and installs a few packages related to image optimisation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; mcr.microsoft.com/devcontainers/python:3&lt;/span&gt;

&lt;span class="c"&gt;# image processing dependencies for optimize plugin&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; webp imagemagick bash-completion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I did not want to install those packages on my local machine, as I felt I couldn't justify the risk just to optimise my blog images, but in a Codespace, I don't see much risk. If one of those packages is compromised the bad actors might get credentials that let them read my private fork of my theme. Much less scary then the same compromise on my local machine.&lt;/p&gt;

&lt;p&gt;Onto the rainclouds.&lt;/p&gt;

&lt;h3&gt;
  
  
  The network tether
&lt;/h3&gt;

&lt;p&gt;Codespaces, being cloud-based, demands consistent network connectivity. For me in the UK, on good home broadband or a stable phone hotspot, the latency is rarely noticeable. It feels just like working locally. There's no lag at the terminal, navigating directory structures, opening and scrolling through files is smooth. I've put together a &lt;a href="https://www.youtube.com/watch?v=4sDDyr3Uhak" rel="noopener noreferrer"&gt;short video to illustrate what the experience of opening and starting work in a Codespace is like on a good network connection&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When connectivity is slow or spotty, like when I'm on the train over to Manchester or up to Leeds, the experience deteriorates quickly. It's a bit of a "cliff edge": the terminal becomes laggy, file operations introduce noticeable spinners and delays, and eventually, Codespaces will pop up a "reconnecting" modal, halting all work. Working on trains over a patchy mobile network, for example, is usually frustrating. To be fair, I had already found working on the move through spotty network areas was frustrating. I normally need documentation or access to cloud-based services at some point - and that point will always be immediately after we head into a tunnel.&lt;/p&gt;

&lt;p&gt;I've accepted the limitation and use that time for catching up on the reading I lament never having time to do, or working on Google Docs things in offline mode instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cost
&lt;/h3&gt;

&lt;p&gt;GitHub offers a monthly free tier for personal accounts (&lt;a href="https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-github-codespaces/about-billing-for-github-codespaces" rel="noopener noreferrer"&gt;see GitHub Codespaces pricing&lt;/a&gt; for the latest details). Currently, this is 120 core hours and 15GB-month storage for Free plans, 180 core hours and 20GB for Pro plans. The smallest instance size you can get is two-core, so that's 60 hours by the clock on the wall, or about 3 hours a day for a 20-day working month (4.5 for the pro plan).&lt;/p&gt;

&lt;p&gt;I did exceed these limits sometimes. You can get instances with more CPU cores and memory, and they are billed at higher rates, but I never needed to use one - the basic 2-core instance worked perfectly with the repos I needed. Codespaces automatically suspend after a configurable timeout (I think the default is 30 minutes, I reduced mine), but spending a lot of hours with a Codespace will eventually exceed the free tier. Running more than one Codespace at a time also runs up a bill faster, so I adapted to avoid doing that.&lt;/p&gt;

&lt;p&gt;The 2-core instance bills at $0.18/hr (so the per-core price is currently $0.07/hr). Maybe a dollar a day for fairly heavy use, which aligns with the billing information I have available. GitHub changed their billing system in April 2025, and I've been unable to get any info from before that. What I do have is the bill for the startup I worked for, which came in at around $18 (orgs do not get the free tiers) for the heaviest month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc26lm2sruqegaruivooe.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc26lm2sruqegaruivooe.webp" alt="Billing information from GitHub showing codespaces core usage at 93.91 hours and a gross bill of $16.90, with storage at 16.79 GB-hr billing at $1.18" width="800" height="128"&gt;&lt;/a&gt;&lt;br&gt;GitHub codespaces billing
 &lt;/p&gt;

&lt;p&gt;So far, I've never run up enough usage or storage to get billed outside the free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sharp edges in billing
&lt;/h3&gt;

&lt;p&gt;I hit a particularly frustrating "sharp edge" with GitHub's billing system. Twice, despite being well within the free Codespaces tier, I was denied access to Codespaces because I had an insufficient balance to cover &lt;em&gt;other&lt;/em&gt; minor GitHub charges (like sponsorships). Resolving this involved hours of contacting support to expedite the process. I now ensure my linked payment account always has enough to cover GitHub charges, and I've raised the issue with GitHub support. I'm sure they do not want to disincentivise people paying for the pro tier or sponsorships!&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts and advice
&lt;/h2&gt;

&lt;p&gt;If you're considering GitHub Codespaces as your primary development environment, here's my advice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try it out on personal projects. Take advantage of the free tier. I write these blog posts in Codespaces!&lt;/li&gt;
&lt;li&gt;Codespaces (and &lt;a href="https://containers.dev/" rel="noopener noreferrer"&gt;&lt;code&gt;devcontainer.json&lt;/code&gt;&lt;/a&gt; in general) provide an environment that is far more similar to CI/CD systems like GitHub Actions than a traditional local setup. I'm experimenting with ways to reuse more between Codespaces and GitHub Actions.&lt;/li&gt;
&lt;li&gt;It's a far better experience than full virtual desktops. In my experience, full virtual desktop environments have always been sluggish and awkward to work with. Codespaces slots into my day-to-day working without getting in the way.&lt;/li&gt;
&lt;li&gt;Consider local devcontainers as a stepping stone. Using devcontainers for local development could bring some of the consistency and security benefits without, or before, fully shifting to the cloud. I tried local devcontainers first but ran into issues with my rootless container runtimes.&lt;/li&gt;
&lt;li&gt;Explore alternatives if GitHub isn't your ecosystem. While my experience has been tied to GitHub, similar concepts exist. For example, &lt;a href="https://www.gitpod.io/" rel="noopener noreferrer"&gt;Gitpod&lt;/a&gt; is a cloud development environment that I've seen integrated with GitLab.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My year with GitHub Codespaces has been overwhelmingly positive. The day-to-day experience has been consistently good, sufficiently similar to local working that I've not been irritated by it. The challenges have not been a big deal and it's definitely my preferred option today given the choice!&lt;/p&gt;

</description>
      <category>operations</category>
      <category>security</category>
    </item>
    <item>
      <title>GROUP BY ALL solves a really annoying SQL problem</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Thu, 29 May 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/group-by-all-solves-a-really-annoying-sql-problem-4664</link>
      <guid>https://dev.to/brabster/group-by-all-solves-a-really-annoying-sql-problem-4664</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhs7sppkw9b7nmg3msu1.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhs7sppkw9b7nmg3msu1.webp" alt="hero image" width="737" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Does your SQL still copy most of your columns from &lt;code&gt;SELECT&lt;/code&gt; after &lt;code&gt;GROUP BY&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;Behold: &lt;code&gt;GROUP BY ALL&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;A simple example of the problem looks like this. I have a table of page views, one row per view. I want to know how many downloads I had each day, so I write some SQL like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;view_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;num_views&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;the_raw_views_table&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;view_date&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See how I have to repeat &lt;code&gt;view_date&lt;/code&gt; in the &lt;code&gt;GROUP BY&lt;/code&gt; clause? It's required, and it's pretty much the only appropriate simple value. I must add any columns that I'm not aggregating (I used the aggregate function &lt;code&gt;COUNT()&lt;/code&gt; here) to the &lt;code&gt;GROUP BY&lt;/code&gt; clause for the query to be valid.&lt;/p&gt;

&lt;p&gt;Grouping is something we do all the time. It's a minor irritation when there are only a couple of columns to add, but I've seen queries where there are tens, maybe even hundreds of columns that have to be carefully kept synchronised.&lt;/p&gt;

&lt;p&gt;A chunkier example from GitHub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;the_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;countryname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;twitter_trend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;google_trend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latcent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;longcent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The poor solution
&lt;/h2&gt;

&lt;p&gt;I've seen a bad solution around, where you don't need to actually name the columns but can instead use the column's ordinal number. The previous query would look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a bad idea for readability, and now you have a list of sequential numbers to keep in sync instead. Here's an example I found on GitHub for how silly it can get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indnatts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The good solution
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/bigquery/docs/release-notes#May_13_2025" rel="noopener noreferrer"&gt;On 23 May 2024, Google made &lt;code&gt;GROUP BY ALL&lt;/code&gt; generally available&lt;/a&gt;, and I totally missed it. Now, I can just say what I mean 🎉.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;view_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;num_views&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;the_raw_views_table&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It doesn't matter if you have one plain select column or 100. &lt;code&gt;GROUP BY ALL&lt;/code&gt; infers the list. &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_all" rel="noopener noreferrer"&gt;The full documentation explains the specifics and how the inference works&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supporting platforms
&lt;/h2&gt;

&lt;p&gt;I actually found &lt;a href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-qry-select-groupby#parameters" rel="noopener noreferrer"&gt;&lt;code&gt;GROUP BY ALL&lt;/code&gt; first on the Databricks platform&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's also &lt;a href="https://docs.snowflake.com/en/sql-reference/constructs/group-by#parameters" rel="noopener noreferrer"&gt;available on Snowflake&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/r_GROUP_BY_clause.html" rel="noopener noreferrer"&gt;Amazon Redshift has it&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://duckdb.org/docs/stable/sql/query_syntax/groupby.html#group-by-all" rel="noopener noreferrer"&gt;DuckDB has GROUP BY ALL too&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I don't think Trino (and by extension AWS Athena) have &lt;code&gt;GROUP BY ALL&lt;/code&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>My path to consultancy</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Fri, 11 Apr 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/my-path-to-consultancy-718</link>
      <guid>https://dev.to/brabster/my-path-to-consultancy-718</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5xhg3h676jamgfxhg3y.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5xhg3h676jamgfxhg3y.webp" alt="A country road stretches off into the sunny, leafy distance as my little boy cycles his favourite route home from school" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I had big doubts about becoming a consultant or contractor. Could I do it? Would I find work? Could I run my own business? Would I need to change who I am, wear a suit, or buy a briefcase? Seven years have flown by since I took the plunge, so I'm going to share my story in case it's helpful for you!&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Becoming a contractor and consultant back in 2018 was one of the best moves I've made.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I didn't need to wear a suit, or buy a briefcase.&lt;/li&gt;
&lt;li&gt;I had no problems finding work - contractors are just humans too, and my experience was up to the task.&lt;/li&gt;
&lt;li&gt;The income was a big step up at a time when I needed it.&lt;/li&gt;
&lt;li&gt;Setting up and running a business was easy enough, and felt pretty good.&lt;/li&gt;
&lt;li&gt;I wasted time on a custom website before going with a static site generator and trustworthy theme.&lt;/li&gt;
&lt;li&gt;Accounting and tax were more complicated than I realised, causing some hassle and unexpected, painful tax bills. If I were doing it again, I'd look for a well-recommended human accountant to show me the ropes and help me through the first year or two.&lt;/li&gt;
&lt;li&gt;I've been able to pursue opportunities with teams and organisations that probably wouldn't have been open to me otherwise.&lt;/li&gt;
&lt;li&gt;The work has been challenging in good ways, and I feel my professional development has been accelerated.&lt;/li&gt;
&lt;li&gt;My fears about becoming mercenary and distant from the people I worked with were unfounded.&lt;/li&gt;
&lt;li&gt;I've gained access to a great network of people who challenge and push me, and I've been able to stay in touch with the teams I've worked with over the years and see how my decisions played out.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I took a permanent job with Equal Experts when I was offered it at a time of great personal and professional uncertainty in 2020 - new baby, COVID, and IR35. I've never been fired or let go from a role throughout my career - so far - and I don't feel like I'm at significantly lower risk of being unemployed as an employee than I was as an independent associate.&lt;/p&gt;

&lt;p&gt;I feel like I've got the best of both worlds now, getting to be more consultant than contractor whilst being part of a great organisation. Tempered Works Ltd. is still alive but dormant in case I want to go back to being my own boss one day.&lt;/p&gt;

&lt;p&gt;I recognise that I don't belong to an underrepresented group in the tech industry and I can't speak to the challenges others may face. I hope sharing my experiences can still provide some value or insight for those considering a similar path. I'm also happy to answer any questions you might have that I can help with, so please feel free to give me a shout on &lt;a href="https://www.linkedin.com/in/paulbrabban/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. I will not share anything from any conversation without permission.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prologue
&lt;/h2&gt;

&lt;p&gt;They waited until the end of the first year at uni to tell us that there weren't many jobs in physics, so that was my dream of writing maths on whiteboards for a living up the spout. My mum had gone to university after the divorce, getting a degree in computing and working as a contractor before getting a job in IT with a bank. I was too young to understand much about it at the time.&lt;/p&gt;

&lt;p&gt;A few years later, she told twenty-one-year-old me to apply for the bank's graduate scheme. I got in, and that set up my next thirteen years with HSBC, starting out in IT Security (first job: stuffing &lt;a href="https://en.wikipedia.org/wiki/HCL_Notes" rel="noopener noreferrer"&gt;Lotus Notes&lt;/a&gt; passwords into envelopes) and ending up programming in Global Banking and Markets. I was aware of some contractors here and there, but I don't recall ever meeting one, let alone talking to them about contracting or anything much else.&lt;/p&gt;

&lt;p&gt;The other contact with contractors or consultants I can remember from these early days was my avid reading of Martin Fowler, Sam Newman, and the other luminaries at &lt;a href="https://en.wikipedia.org/wiki/Thoughtworks" rel="noopener noreferrer"&gt;Thoughtworks&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  First contact - Thoughtworks
&lt;/h2&gt;

&lt;p&gt;I went from a huge multinational to a &lt;a href="https://wakelet.com/" rel="noopener noreferrer"&gt;little company in Manchester&lt;/a&gt; that was trying to grow back in 2014. We were having trouble trying to sort out an unwieldy, monolithic codebase running on bare metal servers that resisted any extension and struggled under any sort of load. Our efforts to get the team aligned and flow moving just weren't working. What was I doing wrong?&lt;/p&gt;

&lt;p&gt;I suggested we engage Thoughtworks and a couple of their people came to talk to us about our needs. I needed someone experienced to talk to, someone who could help me figure out what I was doing wrong and how to put it right. I asked for advice and guidance, but it turned out that wasn't an option. We'd need to pay for a team of at least two full-time contractors as a minimum engagement. I can't remember what the $$$ number was, but it was comically out of any budget I could justify, so that didn't go anywhere. Disappointing, but my first real contact with the contracting world.&lt;/p&gt;

&lt;p&gt;I worked with some great people there. It was a real struggle, but I learned a lot, and we did manage to rewrite the beast and move everything to AWS. I've really enjoyed seeing them succeed since I moved on. One of the other backend devs contacted me recently to let me know that the thing we hastily built together had been the foundations for &lt;strong&gt;seven years of growth&lt;/strong&gt; without major problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  First contact - Equal Experts
&lt;/h2&gt;

&lt;p&gt;Next stop - Sky Betting and Gaming, building a new service for the Italian market from Sheffield's Electric Works. This was the first time I really worked with contractors. No suits, no briefcases. A friendly bunch, always happy to share their experience, knowledge, and opinions. I went along to my first Equal Experts "Expert Talks" event there.&lt;/p&gt;

&lt;p&gt;I think this is where I first really considered that I could "do" contracting. I was still a bit hung up on whether it would be right for me though. Running my own company seemed like an interesting but daunting prospect, and I recall dismissing the idea in a conversation because I wanted to be invested in what I was doing. If I'm honest, I also didn't feel like I was good enough.&lt;/p&gt;

&lt;p&gt;I felt that contracting might be a bit mercenary for me, and I wanted to be invested in what the company I worked for was doing. Then again, I wasn't staying with companies for very long before moving on. I'd consciously started thinking in terms of "going in, making a difference, making myself redundant, and moving on" to broaden my experience more quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Need some money, quick
&lt;/h2&gt;

&lt;p&gt;My next role was with a six-person startup. I really like the broad scope, urgency of value and need for adaptation that comes with startup roles! Alongside rapidly building the system we needed, I was getting involved with product development and marketing. This was my first contact with things like AB testing, and I had to deal with GDPR that landed towards the end of my tenure there.&lt;/p&gt;

&lt;p&gt;Then, as far as I know, the startup ran short of money. The other people there were young compared to me, and on much lower pay. I agreed to go without salary for a month to keep the company going. Then a second month. We'd been having conversations about how the company probably needed more product development guidance than I could provide, so when I didn't get paid for a third month, I resigned - I honestly don't think there was anything more I could do. That's startups for you.&lt;/p&gt;

&lt;p&gt;Bills to pay.&lt;/p&gt;

&lt;p&gt;Time to embrace my inner mercenary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tempered Works Ltd.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/paulbrabban/details/experience/" rel="noopener noreferrer"&gt;LinkedIn says I started Tempered Works Ltd. in May 2018&lt;/a&gt;. I knew a couple of contractor people (as well as my mum, although she'd long since gone permanent!) well enough to buy them a coffee and pick their brains about what to do next and got advice on things like how to find work, how to set up my company, and how the banking and accounting side of things works. It's a bit of a blur now, but here's what I remember.&lt;/p&gt;

&lt;p&gt;I remember the coffee shop I was sitting in before an interview with Infinity Works where I was deciding on a company name. I was going to use &lt;a href="https://crossedstreams.com/" rel="noopener noreferrer"&gt;the domain I got for my old blog, crossedstreams.com&lt;/a&gt;, but Urban Dictionary warned me about other meanings related to "crossing the streams" that have nothing to do with the Ghostbusters movies. Scratch that idea then. Sheffield has loads of places today built in old industrial buildings called things like "Cutlery Works" and "Electric Works". We've got a proud history of steelmaking, and &lt;a href="https://en.wikipedia.org/wiki/Tempering_(metallurgy)" rel="noopener noreferrer"&gt;tempering is a process to make steel tougher and less likely to break under stress&lt;/a&gt;. I liked that idea as an analogy for what I like to think I do, so Tempered Works Ltd. was born.&lt;/p&gt;

&lt;p&gt;I'll be honest. Day-rate money was frankly ridiculous compared to the salaries I'd been working for before. I'd been worried about not making enough pension contributions in the latter part of my career to date, and I was able to pay the bills and start building a better retirement buffer very quickly. On the advice of one of the contractors I spoke to, I went with &lt;a href="https://www.crunch.co.uk/" rel="noopener noreferrer"&gt;crunch.co.uk&lt;/a&gt; for accounting. I didn't have the confidence to do "my books" on my own, but I didn't want to pay for an actual human accountant. Despite my best efforts, I did get some things wrong here and there and ended up with occasional hassle and painful, unexpected tax bills.&lt;/p&gt;

&lt;p&gt;I remember heading to the library to take photocopies of documents (maybe I even had to fax them like it was the late 1900s) to Metro Bank to get my bank account set up. Recording expenses and filing paperwork was a worry, but that part turned out to be pretty easy. It all takes a bit of time, but it's not a big deal.&lt;/p&gt;

&lt;p&gt;If I were doing it again, I'd pay for a human accountant for the first year or two until I'd got a better handle on how accounting and tax works.&lt;/p&gt;

&lt;h2&gt;
  
  
  My first gig
&lt;/h2&gt;

&lt;p&gt;It didn't take long to get my first engagement. I can't remember how I advertised my services, perhaps it was just on LinkedIn. I had a fair bit of AWS experience, I was a certified solution architect, but the gig needed Google Cloud. I remember standing outside a bus stop in the sunshine with the recruiter telling me not to worry about it, my skills would transfer and I'd be fine. I took the gig. There was a tech test and interview to get through, but they were no different to what I'd expect for a permanent role.&lt;/p&gt;

&lt;p&gt;He was right, and I had a really enjoyable first engagement with &lt;a href="https://www.dunnhumby.com/" rel="noopener noreferrer"&gt;Dunnhumby&lt;/a&gt;, a super-cool data science company. I'm normally not sure if I can directly link myself to a client but in this case, I'd had writing published on their public blog, so the link is already out there.&lt;/p&gt;

&lt;p&gt;I decided I wanted to diversify my experience, so I politely refused an extension to go and do something else. I recall the person paying for me saying that I wasn't like other contractors he'd worked with, more like part of the team. It was great to get that feedback, reinforcing that I could succeed as a contractor without being aloof and distant!&lt;/p&gt;

&lt;h2&gt;
  
  
  Evolving this site
&lt;/h2&gt;

&lt;p&gt;I wasted a lot of time building my own &lt;a href="https://tempered.works/posts/2018/08/19/setting-up-this-site-with-gatsbyjs-and-netlify/" rel="noopener noreferrer"&gt;custom React-based site in the early days&lt;/a&gt;. I wanted to have my own look and feel and thought that exercising my React skills would be helpful in finding work. After a while, I realised that I was spending too much time on the site and not enough on why the site was there - to support my business and an accessible medium for me to share my experience and thoughts.&lt;/p&gt;

&lt;p&gt;I eventually found a &lt;a href="https://tempered.works/posts/2024/03/29/now-on-mkdocs-material/" rel="noopener noreferrer"&gt;neat, blog-capable static site generator in mkdocs-material&lt;/a&gt; that provided me the kind of publishing workflow, flexibility and clean look and feel I wanted. It's also got a lot of sponsorship and funding and a maintainance team that seems responsible and responsive, so I think it's a trustworthy option.&lt;/p&gt;

&lt;h2&gt;
  
  
  Equal Experts - Associate
&lt;/h2&gt;

&lt;p&gt;I'd met lots of inspiring Equal Experts people at events like that Expert Talks I mentioned earlier. Everyone seemed to speak highly of EE, and I applied. If I recall correctly, there was a fairly easy take-home test and then an interview, with the first half a pairing exercise and the second a whiteboard session.&lt;/p&gt;

&lt;p&gt;I did not enjoy the pairing exercise much. I really don't work well in that time-pressure situation, and it's not the kind of pressure you actually get in real life. I like to let a problem settle in my mind a little before I start racing to write code and tests. Feedback afterwards was that I didn't do very well in that part. Meh 🤷 fortunately for me, my performance in the whiteboard session impressed more. I'd enthusiastically explained how we'd built out capabilities at the last startup and taken iterative, fail-fast approaches to get value out fast.&lt;/p&gt;

&lt;p&gt;!!! note&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EE's changed its recruitment process a great deal since then. The pairing-under-pressure is a thing of the past, phew.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;My first engagement with EE wasn't a great fit, being quite slow-moving (I'm not known for my patience), but I just flagged that it wasn't really working for me and got lots of support finding and moving onto something that suited me better. My new engagement was fantastic, working in a diverse team with a large UK retailer on search optimisation. I was directly involved with delivering and measuring significant revenue uplifts and learned a huge amount from that engagement. It was the first time I'd seen all the things I'd read about as practices and philosophies come together to make my team and the teams around us really fly!&lt;/p&gt;

&lt;h2&gt;
  
  
  Equal Experts - Employee
&lt;/h2&gt;

&lt;p&gt;In 2020 I was offered a permanent position as a consultant with EE and I took it, despite the drop in income. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I had a great experience working with Equal Experts.&lt;/li&gt;
&lt;li&gt;My son had just been born.&lt;/li&gt;
&lt;li&gt;COVID had just hit.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://en.wikipedia.org/wiki/IR35" rel="noopener noreferrer"&gt;IR35 legislation&lt;/a&gt; was about to be implemented in the private sector.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whilst I had done nothing to avoid tax, I remember feeling unsure that I knew enough to be on the right side of the IR35 legislation. I felt that the legalese and opinions surrounding it were confusing and that if I got something wrong for any period of time, the financial consequences could be severe. Taking the job felt like a safe option, and I hoped to learn more about the inner workings of a successful consultancy.&lt;/p&gt;

&lt;p&gt;Tempered Works Ltd. still exists but is dormant at the moment. I'm able to keep the company alive for less than a couple of hundred GBP per year. I was also able to get a tweak to my contract so I could wrap up a light engagement with &lt;a href="https://thedeveloperacademy.com/" rel="noopener noreferrer"&gt;The Developer Academy&lt;/a&gt;, providing some course materials and teaching for their new data science bootcamp. I write on the EE blog, but I'm still able to write independently here, and I just credit Equal Experts where my writing crosses over significantly or involves some time in working hours, like this:&lt;/p&gt;

&lt;p&gt;As an associate and an employee, I've had access to lots of other people and their experiences through our Slack. That's been really useful in collecting up other people's ideas and putting my own out there to be challenged.&lt;/p&gt;

&lt;p&gt;I've also had the huge benefit of staying in communication with the teams I've worked with over the years, and in particular the people who saw how my decisions played out after I moved on. By and large, the things I leave behind for the next person seem to have worked out positively in my absence! I'm very grateful for those long-term perspectives where I've been able to get them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;If you've got a few years of experience and you're considering contracting or consulting, I'm happy to answer any questions you might have that I can help with, so please feel free to give me a shout on &lt;a href="https://www.linkedin.com/in/paulbrabban/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. I will not share anything from any conversation without permission.&lt;/p&gt;

&lt;p&gt;I'd also really recommend taking a look at joining the Equal Experts network, for all the reasons I talked about earlier. &lt;a href="https://www.equalexperts.com/join-us/" rel="noopener noreferrer"&gt;Up-to-date info about joining is available on the site&lt;/a&gt;. I'll signpost &lt;a href="https://www.equalexperts.com/about-us/our-values/" rel="noopener noreferrer"&gt;our values&lt;/a&gt;, which are really important to all of us and haven't really changed since I first read them back in the day.&lt;/p&gt;

</description>
      <category>operations</category>
    </item>
    <item>
      <title>Generating portable and user-friendly identifiers</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Sat, 08 Mar 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/generating-portable-and-user-friendly-identifiers-b2m</link>
      <guid>https://dev.to/brabster/generating-portable-and-user-friendly-identifiers-b2m</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-03-08-generating-portable-and-user-friendly-identifiers%2Fassets%2Fimage.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-03-08-generating-portable-and-user-friendly-identifiers%2Fassets%2Fimage.webp" alt="A screenshot of the BigQuery console, with example SQL for generating an identfier from a string value as I outline below" width="721" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'll share how I generate unique identifiers from data in 2025, avoiding the pitfalls I've seen along the way. TL;DR: I'm using &lt;a href="https://en.wikipedia.org/wiki/MD5" rel="noopener noreferrer"&gt;MD5&lt;/a&gt; to produce a digest from a string or bytes value, then I'm using plain old &lt;a href="https://en.wikipedia.org/wiki/Hexadecimal" rel="noopener noreferrer"&gt;hexadecimal&lt;/a&gt; encoding of that digest, specifying upper or lowercase for the alpha characters. This solution best meets the needs I describe next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem I'm solving
&lt;/h2&gt;

&lt;p&gt;I have some chunk of data, often a string of text, but it can be more complex like an array or a JSON dictionary. I want to produce a value that I can use to refer to this data in user-friendly ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  User-facing needs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;It produces clean-looking URLs and copy-pastes easily.
The value might end up in front of users in a URL or an API endpoint. That means it might end up being copy-pasted into messaging systems, tickets, spreadsheets and the like, so I want it to be easy and convenient to copy-paste the full value correctly. &lt;a href="https://en.wikipedia.org/wiki/Percent-encoding" rel="noopener noreferrer"&gt;Characters that require URL-encoding&lt;/a&gt; are bad. Characters that require click/tap-and-drag to select across are also bad - double-click or tap-and-hold should select the whole value.&lt;/li&gt;
&lt;li&gt;It doesn't carry any inappropriate meaning.
I don't want any confusion about the value being an identifier. It should not be likely that the value could be confused with, for example, a natural language word.&lt;/li&gt;
&lt;li&gt;It isn't possible to reverse the process to reveal the original value.
I don't need cryptographic protection from this kind of reverse engineering, but I would prefer to hide how the value is computed in case there are details in the original value that should not be available to inspect. Being able to compute and compare the identifier, given a candidate value, isn't a problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation needs
&lt;/h3&gt;

&lt;p&gt;In addition to these more user-facing needs, I have a couple of constraints to ensure the solution doesn't create problems down the road.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It must be deterministic and stable over time.
These values might end up being stored directly in other systems that interface with the system I'm working on, or internally between different parts of the system. An example would be a feature where a user can "favourite" something this value refers to. We don't want those values to change!&lt;/li&gt;
&lt;li&gt;It can generate identifiers for other data structures than strings.
Sometimes I need to make an ID for an array of values, a JSON dictionary, an image, and so on.&lt;/li&gt;
&lt;li&gt;It doesn't create lock-in.
I would like to avoid locking into whatever ecosystem I'm using today so that ID generation does not become a factor in future decisions. I should be able to generate the same value for the same input over as broad a range of technology as possible.&lt;/li&gt;
&lt;li&gt;It avoids depending on new software supply chains.
Software supply chains bring multiple types of risk, which I talk about a little in an earlier post touching on &lt;a href="//../2024-05-01-how-i-do-python-supply-chain-security/index.md#assessing-dependency-risk"&gt;supply chain security&lt;/a&gt;. I'd rather not trust any source of software that I can avoid.&lt;/li&gt;
&lt;li&gt;Collisions are very unlikely.
It needs to be unlikely that two different pieces of data produce the same identifier. I can design the system to make this scenario an irritation rather than a catastrophe, but the less likely it is, the better.&lt;/li&gt;
&lt;li&gt;It's efficient.
I'd like an efficient solution that meets the other constraints. I don't want significant latency, nor do I want to waste compute power without good reason.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quite the list for such an apparently simple problem!&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;The general solution I use is to first produce a hash from the value, then encode the hash in a way that meets my usability needs. I'll quickly summarise the steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hashing the value
&lt;/h3&gt;

&lt;p&gt;The general solution I use first produces a hash from the value. &lt;a href="https://en.wikipedia.org/wiki/Hash_function" rel="noopener noreferrer"&gt;Hashing is a bit of computing magic that takes a chunk of data of any size and produces a fixed-size output&lt;/a&gt;. Information is lost in the process, so you can't "undo" a hash to retrieve the original value. The idea has been around forever, and I've yet to come across a computing platform that doesn't support the concept. Which hash function to choose? There are many &lt;a href="https://en.wikipedia.org/wiki/List_of_hash_functions" rel="noopener noreferrer"&gt;hash functions&lt;/a&gt;, and most would break my portability and supply chain needs.&lt;/p&gt;

&lt;p&gt;I've fallen into this trap before. I needed to generate hashes in &lt;a href="https://docs.aws.amazon.com/athena/" rel="noopener noreferrer"&gt;AWS Athena&lt;/a&gt; (based on the &lt;a href="https://trino.io/" rel="noopener noreferrer"&gt;Trino open-source engine&lt;/a&gt;), so I followed the advice in the &lt;a href="https://trino.io/docs/current/functions/binary.html#hashing-functions" rel="noopener noreferrer"&gt;Trino hashing functions documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For general purpose hashing, use &lt;code&gt;xxhash64()&lt;/code&gt;, as it is much faster and produces a better quality hash.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Unfortunately, when we migrated to BigQuery, &lt;code&gt;xxhash64&lt;/code&gt; was not natively available! I could have used a Javascript UDF, but that would have introduced complexity and new supply chains, so... no. I switched to a hashing algorithm that BigQuery did support, then generated a mapping table for existing xxhash64-based IDs in Athena until all the original values had been found and replaced.&lt;/p&gt;

&lt;p&gt;Learning from that experience I use &lt;a href="https://en.wikipedia.org/wiki/MD5" rel="noopener noreferrer"&gt;MD5 hashing&lt;/a&gt; now. MD5 has been around since the 1990s and is supported everywhere I've looked. MD5 might be a poor choice for use in crypto, but I see no reason why it's not suitable for the purpose of producing a fixed-length identifier. &lt;a href="https://www.avira.com/en/blog/md5-the-broken-algorithm" rel="noopener noreferrer"&gt;MD5: The broken algorithm&lt;/a&gt; suggests that the chance of an accidental collision between two values is on the order of 10&lt;sup&gt;-29&lt;/sup&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But, as you can imagine, the probability of collision of hashes even for MD5 is terribly low. That probability is lower than the number of water drops contained in all the oceans of the earth together.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's small enough that I'm not going to worry about it. I've certainly never seen an accidental hash collision in real life in the last 25 years of my career. The same article also says that MD5 is the faster algorithm compared to SHA1 and SHA256, although not by enough that I'd consider it a factor. The last benefit of MD5 over those other algorithms for my purposes is that it produces a shorter value - 16 bytes compared to 20 or 32. The shorter the value is, the shorter the identifier will be.&lt;/p&gt;

&lt;p&gt;That brings us neatly to part two - encoding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Encoding the hashed value
&lt;/h3&gt;

&lt;p&gt;Hash functions produce a "digest" - a little blob, with a specific length, of binary data. For example running: &lt;code&gt;SELECT MD5('tempered.works')&lt;/code&gt; on BigQuery produces (in binary): &lt;code&gt;00101100110101000000000110111101110001010101101010011011100111110111011100110111111100100001110010000111010111010010111000100110&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Not a great identifier, as it is so long! The second part of the process is to encode this value in a larger alphabet than ones and zeroes to get something a bit easier to work with. This is where I hit the next two facepalm moments. Firstly, not all encodings meet my usability needs. The obvious choice (and indeed the default representation if you run that query in BigQuery) is &lt;a href="https://en.wikipedia.org/wiki/Base64" rel="noopener noreferrer"&gt;Base64 encoding&lt;/a&gt;, which produces &lt;code&gt;LNQBvcVam593N/Ich10uJg==&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That value is not user-friendly - try selecting it! The &lt;code&gt;/&lt;/code&gt; character breaks the selection in the middle, as does the &lt;code&gt;==&lt;/code&gt; padding at the end. It also causes usability issues if those values end up as &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html" rel="noopener noreferrer"&gt;keys in S3-like storage systems&lt;/a&gt; - yes, I hit that one too. On top of that it needs encoding again to be used as a URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;bqutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fn_eu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cw_url_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_BASE64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MD5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tempered.works'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="n"&gt;LNQBvcVam593N&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;FIch10uJg&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;D&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So Base64 is a bad choice, despite being commonly available in databases and programming languages. &lt;a href="https://en.wikipedia.org/wiki/Base32" rel="noopener noreferrer"&gt;Base32&lt;/a&gt; works better, being an alphabet without those funny symbols. It still has the problem of those equals symbols padding the end, but there's a worse problem - it's not very portable. Again, Base32 isn't available on BigQuery, for example.&lt;/p&gt;

&lt;p&gt;So the encoding I've chosen is boring old hexadecimal. It's available everywhere, only includes the characters 0-9 and a-f, and produces values of a manageable size. One last gotcha - hex conversion functions may return upper or lowercase a-f, so be aware of the need to pick one and upper/lower accordingly. The non-SQL examples in the following table require imports, but the modules are part of the runtime, not third-party dependencies.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Ecosystem&lt;/th&gt;
&lt;th&gt;Expression&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BigQuery&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT UPPER(TO_HEX(MD5('tempered.works')))&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2CD401BDC55A9B9F7737F21C875D2E26&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Athena/Trino&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT TO_HEX(MD5(TO_UTF8('tempered.works')))&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2CD401BDC55A9B9F7737F21C875D2E26&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snowflake&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT UPPER(MD5('tempered.works'))&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2CD401BDC55A9B9F7737F21C875D2E26&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hashlib.md5(b'tempered.works').hexdigest().upper()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2CD401BDC55A9B9F7737F21C875D2E26&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NodeJS&lt;/td&gt;
&lt;td&gt;&lt;code&gt;crypto.createHash('md5').update('tempered.works').digest('hex').toUpperCase()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2CD401BDC55A9B9F7737F21C875D2E26&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pretty portable!&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;So how does this solution compare to my list of needs?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It produces clean-looking URLs and copy-pastes easily ✓&lt;/li&gt;
&lt;li&gt;It doesn't carry any inappropriate meaning ✓&lt;/li&gt;
&lt;li&gt;It isn't possible to reverse the process to reveal the original value ✓&lt;/li&gt;
&lt;li&gt;It must be deterministic and stable over time ✓&lt;/li&gt;
&lt;li&gt;It can generate identifiers for other data structures than strings ✓&lt;/li&gt;
&lt;li&gt;It doesn't create lock-in ✓&lt;/li&gt;
&lt;li&gt;It avoids depending on new software supply chains ✓&lt;/li&gt;
&lt;li&gt;Collisions are very unlikely ✓&lt;/li&gt;
&lt;li&gt;It's efficient ✓&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking good. I'll update this post if I come across any drawbacks in the future.&lt;/p&gt;

</description>
      <category>insights</category>
      <category>operations</category>
      <category>performance</category>
    </item>
    <item>
      <title>Using AWS billing to track down lost resources</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Sun, 09 Feb 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/using-aws-billing-to-track-down-lost-resources-1dgn</link>
      <guid>https://dev.to/brabster/using-aws-billing-to-track-down-lost-resources-1dgn</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_console.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_console.webp" alt="hero image" width="608" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My AWS bill was higher than I expected, and it wasn't immediately clear what was driving the cost. Here's how I tracked down the culprits.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mystery bill
&lt;/h2&gt;

&lt;p&gt;I'd been running an &lt;a href="https://github.com/brabster/northwind_dms_cdc/tree/main/cloudformation" rel="noopener noreferrer"&gt;RDS + DMS CloudFormation stack&lt;/a&gt; for a few months as part of my CDC writing. Expect a post at some point going over the obvious and more subtle costs involved. When I tore the stack down in January 2025, I was still getting a bill of a couple of dollars a day and I wasn't sure why.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_console.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_console.webp" alt="Screenshot of the AWS console homepage billing section on my account on 2024-02-09" width="608" height="384"&gt;&lt;/a&gt;&lt;br&gt;Cost and usage report showing current month costs at $14.45 and forecasted end costs at $64.68. Includes bar chart of various service costs across five months.
 &lt;/p&gt;

&lt;p&gt;When I tore the stack down, one resource was left behind - the AWS bucket that DMS wrote the change data capture stream into. It contains very little data though, so it shouldn't be running up a noticeable bill. I could do without a $64-per-month bill for nothing. How can I find these resources?&lt;/p&gt;

&lt;h2&gt;
  
  
  Billing - cost explorer
&lt;/h2&gt;

&lt;p&gt;My next step is to click the offending dollar value on the homepage, which takes me to the cost explorer. Here, I see a fairly useless chart of the current month's costs, with a very useful list of costs by service.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I've learned a lot over the years by being curious about costs - and I've used that knowledge to save clients money. If you can see these actual/predicted cost numbers when you head to the homepage, I'd recommend clicking in and taking a look around. It'll help you understand what's expensive and what's not, and there's a chance you'll spot something consuming money for nothing - a prime target for quick and easy cost-efficiency improvements. If you can't see these numbers and would like to, try clicking the AWS console home link (Top left in the console) or asking your administrator.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_breakdown_service.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_breakdown_service.webp" alt="Screenshot of the AWS cost explorer for the current month" width="800" height="405"&gt;&lt;/a&gt;&lt;br&gt;Cost breakdown table showing current month expenses for various services, including VPC, tax, Secrets Manager, S3, and Relational Database Service.
 &lt;/p&gt;

&lt;p&gt;This view is a useful first step. The costs are in the VPC service, ruling out the S3 bucket as the cause. Unfortunately, VPC covers a lot of different kinds of resources - how do I track down the ones that are costing me money?&lt;/p&gt;

&lt;h2&gt;
  
  
  Tag editor
&lt;/h2&gt;

&lt;p&gt;AWS is a big, complex beast and I've had problems sometimes tracking down deployed resources that cost money - particularly tricky when the resources are deployed in a region you're not expecting! The &lt;a href="https://docs.aws.amazon.com/tag-editor/latest/userguide/tagging.html" rel="noopener noreferrer"&gt;Tag Editor&lt;/a&gt; allows the enumeration of resources across regions, but as far as I know, it's not straightforward to use to track down these resources. I have very little deployed into this account, but when I search across all regions for all resources, I see the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_tag_editor.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_tag_editor.webp" alt="Screenshot of the AWS tag editor, after searching all regions for all resources" width="800" height="406"&gt;&lt;/a&gt;&lt;br&gt;The top six resource search results are listed in Tag Editor, showing items with identifiers, types, and regions. Types include Stack, Trail, and DHCPOptions. Regions include eu-west-1, 2 and 3. There are no tags assigned to the resources.
 &lt;/p&gt;

&lt;p&gt;The vast majority of the resources I see are, I believe, network-related resources that AWS deploys by default into all regions for my account. Perhaps I should go through and delete all that stuff, assuming it won't cause any problems in my account. There are no tags against any resources other than the S3 bucket I already know about and an account I custom-tagged with a client identifier. None of it helps me understand what's costing me this money.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have had the thought that maybe I should be tagging stuff to help in this kind of situation. Whilst it might help, I don't think it would guarantee anything - I suspect there would still be resources that get created automatically as part of my efforts that don't get tagged. Whilst custom tagging might be useful, I still need a way of finding stuff that falls outside or pre-dates my tagging strategy anyway.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Back to cost explorer
&lt;/h2&gt;

&lt;p&gt;I've found the most effective tool to track down these resources is back in the billing console.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost by usage type
&lt;/h3&gt;

&lt;p&gt;I switch the breakdown to usage type in the report parameters section, like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_usage_types.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_usage_types.webp" alt="Screenshot of the AWS cost explorer report parameters panel" width="401" height="433"&gt;&lt;/a&gt;&lt;br&gt;Interface showing report parameters: Date range set to the last week; monthly granularity; grouped by usage type.
 &lt;/p&gt;

&lt;p&gt;Again, the chart isn't very useful, but the breakdown panel underneath is.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_usage_types_breakdown.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_usage_types_breakdown.webp" alt="Screenshot of the AWS cost explorer usage breakdown table" width="800" height="301"&gt;&lt;/a&gt;&lt;br&gt;Table showing cost and usage breakdown with totals for various AWS usage types. The usage types reveal specific information about exactly what resources in which regions are driving the cost.
 &lt;/p&gt;

&lt;p&gt;Aha. This table is packed with useful information.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;USE1-VpcEndpoint-Hours&lt;/code&gt; is responsible for almost all the cost -  VPC endpoints in &lt;code&gt;us-east-1&lt;/code&gt;. That makes sense - I had to have private endpoints to get from the RDS database locked away in a private VPC to the S3 and Secrets Manager services. I expected them to get cleaned up as part of my CloudFormation destroy operation, but it seems they did not.&lt;/p&gt;

&lt;p&gt;I'm also paying for &lt;code&gt;USE1-AWSSecretsManager-Secrets&lt;/code&gt; - makes sense, I had a secret in there, the RDS database password, which hasn't been cleaned up in the destroy. &lt;code&gt;Aurora:BackupUsage&lt;/code&gt; also makes sense - I have some backups of the RDS database. Aside from being in &lt;code&gt;eu-west-2&lt;/code&gt;, the culprit behind &lt;code&gt;EUW2-Requests-Tier1&lt;/code&gt; isn't clear. I can switch the breakdown from "Usage type" to "API operation" to get another useful perspective.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost by API operation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_api_op_breakdown.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_api_op_breakdown.webp" alt="Screenshot of the AWS cost explorer API operation breakdown table" width="800" height="355"&gt;&lt;/a&gt;&lt;br&gt;Table displaying API cost breakdown, with total costs and individual operation expenses listed.
 &lt;/p&gt;

&lt;p&gt;&lt;code&gt;PutObject&lt;/code&gt; and &lt;code&gt;CreateDBInstance:0021&lt;/code&gt; look like possible culprits. There's no region information available here, so I find API operation and usage type work well together to shed actionable insights on costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  API operation and usage type together
&lt;/h3&gt;

&lt;p&gt;Where I'm confused about what a billing item is, I can combine usage type and API operation, with one as dimension and the other as a filter, to clear up the confusion. To understand &lt;code&gt;CreateDBInstance:0021&lt;/code&gt;, I set usage type as the dimension and API operation as a filter in the report parameters. There's exactly one item in the output table now.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API operation&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Aurora:BackupUsage&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That clears that one up. How about &lt;code&gt;EUW2-Requests-Tier1&lt;/code&gt;? Resetting the filters to usage type &lt;code&gt;EUW2-Requests-Tier1&lt;/code&gt; and setting API operation as the dimension, I'm told:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API operation&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PutObject costs&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PutObject usage&lt;/td&gt;
&lt;td&gt;5,340.00 Requests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Not sure what's causing those PutObject requests in this account, but at least I know what I'm looking for now!&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost by resource
&lt;/h3&gt;

&lt;p&gt;There is another option in the "Dimension" drop-down - "Resource". When I click this option, I get access denied&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_resource_breakdown.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2025-02-09-using-aws-billing-to-track-down-lost-resources%2Fassets%2Faws_billing_resource_breakdown.webp" alt="Screenshot of the AWS cost explorer resource dimension error" width="800" height="254"&gt;&lt;/a&gt;&lt;br&gt;Resource dimension error message.
 &lt;/p&gt;

&lt;p&gt;I need to turn on &lt;a href="https://docs.aws.amazon.com/cost-management/latest/userguide/ce-resource-daily.html" rel="noopener noreferrer"&gt;resource-level billing information&lt;/a&gt; to use this feature. It may contain even more explicit information, but I've found I get can get enough out of the on-by-default views I've already covered to solve my problems. I've turned it on anyway but it takes a while to become effective. I'll follow up if the information or cost associated with this feature is worth shouting about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;AWS Billing's cost explorer reporting is packed with actionable information to track down costs, including service, region and API operation usage. Using these different perspectives together can help quickly find the resources responsible, regardless of which region or service they're hiding in.&lt;/p&gt;

</description>
      <category>operations</category>
    </item>
    <item>
      <title>Google Chrome Oct 15 update broke GitHub Codespaces</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Sun, 27 Oct 2024 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/google-chrome-oct-15-update-broke-github-codespaces-3mo2</link>
      <guid>https://dev.to/brabster/google-chrome-oct-15-update-broke-github-codespaces-3mo2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2Fcodespaces_broken.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2Fcodespaces_broken.webp" alt="Codespaces error 'Connection error unauthorised client refused' appears when connecting to Codespace" width="598" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Oct 15, 2024 update of Google Chrome stable (130.0.6723.58) suddenly broke some sites, such as GitHub Codespaces and 1Password, due to a JavaScript-related setting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Connection error: unauthorised client refused&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This was the error message I was greeted with on Friday 18th Oct when I tried to get back to work in my GitHub Codespace after a four-day break. Everything was just fine when I pushed my last PR from the ferry in Europoort Rotterdam on Monday morning! Could it be something to do with the &lt;a href="//../2024-02-27-automated-laptop-build-intro/index.md"&gt;six-monthly rebuild&lt;/a&gt; I'd just done? A problem with GitHub? Was there a Chrome update between Monday morning when I shut down and Friday morning when I ran my usual start of day &lt;code&gt;sudo update&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;The short version: there's a Chrome setting &lt;code&gt;DefaultJavaScriptJitSetting&lt;/code&gt; that disables the JS JIT compiler when set to value &lt;code&gt;2&lt;/code&gt;. &lt;a href="https://chromeenterprise.google/intl/en_uk/policies/#DefaultJavaScriptJitSetting" rel="noopener noreferrer"&gt;&lt;code&gt;DefaultJavaScriptJitSetting&lt;/code&gt; is documented in the Chrome enterprise policies&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;You can check the effective settings on your browser by navigating to chrome://policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2Fchrome_policies.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2Fchrome_policies.webp" alt="Chrome policies showing DefaultJavaScriptJitSetting" width="800" height="290"&gt;&lt;/a&gt;&lt;br&gt;Chrome policies showing DefaultJavaScriptJitSetting
 &lt;/p&gt;

&lt;p&gt;Setting this value to &lt;code&gt;1&lt;/code&gt; allows JIT on all sites, or you can add exceptions on a site-by-site basis. I've updated my installation to &lt;a href="https://github.com/brabster/xubuntu-workstation/blob/7bb3d528e62f4cc79bcc6a8f7e1c1fd03ef3ee27/roles/chrome-browser/files/recommended.json#L8" rel="noopener noreferrer"&gt;allow JIT on the specific sites I need&lt;/a&gt; for now, and here's the relevant snippet from my Chrome policies file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"DefaultJavaScriptJitSetting"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"JavaScriptJitAllowedForSites"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"[*.]github.dev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"[*.]1password.eu"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"[*.]1password.com"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've &lt;a href="https://issues.chromium.org/issues/374469562" rel="noopener noreferrer"&gt;raised an issue against Chromium&lt;/a&gt; which has been reproduced successfully. Hopefully, the problem will be resolved soon and the exceptions will no longer be needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Remote codespace in local VSCode
&lt;/h2&gt;

&lt;p&gt;I didn't make much progress against it on Friday other than discovering that I could plug my local VSCode installation into the remote Codespace successfully. Microsoft provide &lt;a href="https://docs.github.com/en/codespaces/developing-in-a-codespace/using-github-codespaces-in-visual-studio-code" rel="noopener noreferrer"&gt;how-to documentation&lt;/a&gt; and you need the GitHub Codespaces extension installed locally to do that - but it's provided by Microsoft, who also provide the VSCode application and GitHub Codespaces service, so there's no additional supply chain exposure there. It's nice to know this option is available and works well, and it pointed very clearly at a problem with Google Chrome (my browser of choice, again for reasons of supply chain trust).&lt;/p&gt;

&lt;h2&gt;
  
  
  Investigation
&lt;/h2&gt;

&lt;p&gt;I eventually narrowed the issue down to a privacy-related Chrome setting I use.&lt;/p&gt;

&lt;p&gt;I needed to sort the problem out - there's a ton of client work to do and it's going to hurt to have my productivity impacted. I'm also worried about any implications for the choices I've made to use this browser and services like Codespaces, so I dug into the problem on Friday night.&lt;/p&gt;

&lt;p&gt;I spun up my Xubuntu install USB stick. When you boot into it you have a working, ephemeral Xubuntu installation with root privileges, unlike my locked-down persistent installation. Between this and other testing, I established that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The issue was reproducible on multiple machines.&lt;/li&gt;
&lt;li&gt;The issue only manifested when I applied my policy settings. The vanilla browser was fine, which would explain why others weren't seeing the problem. Just me with my rather risk-averse settings.&lt;/li&gt;
&lt;li&gt;A process of elimination narrowed the problem down to a specific setting.&lt;/li&gt;
&lt;li&gt;Firefox manifested the same issue on Codespaces. Turned out to be a different privacy-related setting, &lt;a href="https://github.com/brabster/xubuntu-workstation/blob/7bb3d528e62f4cc79bcc6a8f7e1c1fd03ef3ee27/roles/firefox/files/policies.json#L8" rel="noopener noreferrer"&gt;EnableTrackingProtection&lt;/a&gt;. That was pretty confusing.&lt;/li&gt;
&lt;li&gt;The issue is present in beta and dev channels for Chrome, not just current stable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Other affected sites
&lt;/h2&gt;

&lt;p&gt;I also found issues with &lt;a href="https://regex101.com" rel="noopener noreferrer"&gt;regex101.com&lt;/a&gt;, &lt;a href="https://start.1password.com" rel="noopener noreferrer"&gt;1password&lt;/a&gt; and even &lt;a href="https://sheets.google.com" rel="noopener noreferrer"&gt;Google Sheets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2Fregex101_error.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2Fregex101_error.webp" alt="regex101 reporting an error and suggesting upgrading the browser" width="725" height="414"&gt;&lt;/a&gt;&lt;br&gt;"Unfortunately it seems your browser does not meet the criteria to properly render and utilize this website." regex101.com not working on the latest Chrome
 &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2F%2F1password_error.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2F%2F1password_error.webp" alt="1password reporting an error and suggesting upgrading the browser" width="724" height="635"&gt;&lt;/a&gt;&lt;br&gt;"Update your browser to keep using 1password" 1password not working on latest Chrome
 &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2F%2Fgoogle_sheets_error.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-10-20-google-chrome-oct-15-update-break%2Fassets%2F%2Fgoogle_sheets_error.webp" alt="Google Sheets reporting an error and suggesting to clear cookies and site data" width="632" height="350"&gt;&lt;/a&gt;&lt;br&gt;"Troubleshoot this issue by clearing application resources" Google Sheets not working on latest Chrome
 &lt;/p&gt;

&lt;p&gt;WebAssembly seems to be mentioned in error messages more often than coincidence would suggest, and it might make sense for an issue related to compilation and WebAssembly to manifest like this. We'll see, but at least there's a workaround, even if it does involve "dropping the shields" on some sites.&lt;/p&gt;

&lt;p&gt;I already have to place significant trust in GitHub Codespaces and 1password by the nature of the service they provide. Sorry, but I can live without regex101.com - I'm using &lt;a href="https://regexr.com" rel="noopener noreferrer"&gt;regexr.com&lt;/a&gt; instead for now.&lt;/p&gt;




&lt;p&gt;If you want to get in touch with me about the content in this post, you can find me on &lt;a href="https://www.linkedin.com/in/paulbrabban" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://github.com/brabster/tw-site-mkdocs/discussions" rel="noopener noreferrer"&gt;raise an issue/start a discussion in the GitHub repo&lt;/a&gt;. I'll be happy to credit you for any corrections or additions!&lt;/p&gt;

&lt;p&gt;If you liked this, you can find content from other great consultants on the &lt;a href="https://www.equalexperts.com/network-blogs/" rel="noopener noreferrer"&gt;Equal Experts network blogs page&lt;/a&gt; 🎉&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Map over an array in BigQuery</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Wed, 31 Jul 2024 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/map-over-an-array-in-bigquery-4ngp</link>
      <guid>https://dev.to/brabster/map-over-an-array-in-bigquery-4ngp</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3z2mdv91tx1n5p641on7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3z2mdv91tx1n5p641on7.png" alt="SQL snippet for map explained in post" width="581" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This walkthrough shows how I can use the functional programming techniques &lt;code&gt;map&lt;/code&gt; and &lt;code&gt;filter&lt;/code&gt; that I already know and love in SQL engines like BigQuery. These techniques give me a lot of processing power whilst keeping my SQL simple and relatively easy to understand. Unlike custom code, I can use the same SQL and infrastructure I'd use to process ten rows to process ten billion rows in seconds.&lt;/p&gt;

&lt;p&gt;BigQuery supports array-typed columns but doesn't provide an obvious &lt;code&gt;map&lt;/code&gt; or a &lt;code&gt;filter&lt;/code&gt; function. I missed these functions until I realised that you can use these functional programming concepts, it's not just not obvious how.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why &lt;code&gt;map&lt;/code&gt; over an array in BigQuery?
&lt;/h2&gt;

&lt;p&gt;An associate recently mentioned that they'd found a problem with a SQL pipeline I'd written some time ago. I'd used some fiddly &lt;a href="https://github.com/kristopolous/Porter-Stemmer/blob/master/PorterStemmer1980.js" rel="noopener noreferrer"&gt;open source JavaScript code to implement a Porter stemmer&lt;/a&gt; as &lt;a href="https://cloud.google.com/bigquery/docs/user-defined-functions#javascript-udf-structure" rel="noopener noreferrer"&gt;a JavaScript UDF&lt;/a&gt;. The text I was stemming was often multiple words, and I'd assumed the stemmer would stem all the words.&lt;/p&gt;

&lt;p&gt;Nope. It turns out that the stemmer code I'd used would stem whatever text was given as one word. For example, if the text to stem was "connecting connected connections", then it got stemmed to "connecting connected connect" instead of&lt;br&gt;
"connect connect connect".&lt;/p&gt;

&lt;p&gt;Oops. Sorry, person who had to clean up my mess. The solution? Break the text up on whitespace, then &lt;code&gt;map&lt;/code&gt; the stemming function over the words.&lt;/p&gt;

&lt;p&gt;It doesn't matter what the function we're going to &lt;code&gt;map&lt;/code&gt; over the array actually does, so rather than complicate things with a custom stemmer, I'll use a built-in function for this example. I'll use &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#left" rel="noopener noreferrer"&gt;the LEFT function to simulate a stemmer&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to &lt;code&gt;map&lt;/code&gt; in BigQuery
&lt;/h2&gt;

&lt;p&gt;First - a couple of example strings&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- stemmed version is 'connect connect connect'&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'connecting connected connections'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;the_text&lt;/span&gt;
    &lt;span class="c1"&gt;-- stemmed version is 'eardrum eardrum'&lt;/span&gt;
    &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'eardrummed eardrums'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;the_text&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;the_text&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;connecting connected connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;eardrummed eardrums&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Split the text to get an array
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;SPLIT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;the_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;words&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;[connecting,connected,connections]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;[eardrummed,eardrums]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  UNNEST and SELECT
&lt;/h3&gt;

&lt;p&gt;I think about the next step as making a little table out of the array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'connecting'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'connected'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'connections'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;UNNEST&lt;/code&gt; turns the array into a table, and I have aliased the resulting rows as &lt;code&gt;word&lt;/code&gt;. Then, I can select &lt;code&gt;word&lt;/code&gt;, which maps the identity function over the array. If I put that back into my query...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPLIT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;the_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I get an error &lt;code&gt;Scalar subquery produced more than one element&lt;/code&gt;. That's because I haven't collected the little table back up into an array yet. That's easy to do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPLIT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;the_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have the table we started with back:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;words&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;[connecting,connected,connections]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;[eardrummed,eardrums]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Mapping the dummy stemmer
&lt;/h3&gt;

&lt;p&gt;The next step is straightforward, using &lt;code&gt;LEFT(text, 7)&lt;/code&gt; as a dummy stemmer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPLIT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;the_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which produces the output we want:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;words&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;[connect,connect,connect]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;[eardrum,eardrum]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Applying &lt;code&gt;filter&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;You've already figured this out, but once I've unnested my array into a table, I can use &lt;code&gt;WHERE&lt;/code&gt; to filter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPLIT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;the_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'connected'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which filters out the word 'connected' from my arrays.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;words&lt;/th&gt;
&lt;th&gt;stemmed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;[connecting,connections]&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;[eardrummed,eardrums]&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Naturally, I can combine map and filter operations in the same statement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multiple return values
&lt;/h2&gt;

&lt;p&gt;"But wait" I hear you say, "it looks like I can select multiple columns in my &lt;code&gt;map&lt;/code&gt;!". You're right, let's try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPLIT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;the_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get an error: &lt;code&gt;ARRAY subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values at [9:5]&lt;/code&gt;. At least this error gives a clue how to proceed. Just like a regular &lt;code&gt;map&lt;/code&gt; operation, you can't return multiple arrays, you have to pack the values into a single structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LEFT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;stemmed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPLIT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;the_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting structure isn't easily represented in a table, so I'll &lt;a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#to_json_string" rel="noopener noreferrer"&gt;format the structs as JSON&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;jsonified&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;[{"word":"connecting","stemmed":"connect"},{"word":"connected","stemmed":"connect"},{"word":"connections","stemmed":"connect"}]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;[{"word":"eardrummed","stemmed":"eardrum"},{"word":"eardrums","stemmed":"eardrum"}]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I showed how to "map" the functional programming &lt;code&gt;map&lt;/code&gt; and &lt;code&gt;filter&lt;/code&gt; concepts to BigQuery by thinking in terms of little embedded SQL statements over unnested arrays as tables. I've not yet had any need to use &lt;code&gt;reduce&lt;/code&gt;, the other function that springs to mind, so I'll update this if I ever need it.&lt;/p&gt;

&lt;p&gt;This approach seems to work generally across SQL engines with minimal dialect variation. Some data warehouses provide functions that implement these capabilities more directly and naturally. For example, the Trino (was Presto) engine under AWS Athena provides &lt;a href="https://trino.io/docs/current/functions/array.html#transform" rel="noopener noreferrer"&gt;transform (for map)&lt;/a&gt;, &lt;a href="https://trino.io/docs/current/functions/array.html#filter" rel="noopener noreferrer"&gt;filter&lt;/a&gt; and &lt;a href="https://trino.io/docs/current/functions/array.html#reduce" rel="noopener noreferrer"&gt;reduce&lt;/a&gt; functions over arrays.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Handling CVE-2018-20225</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Sat, 18 May 2024 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/handling-cve-2018-20225-4l91</link>
      <guid>https://dev.to/brabster/handling-cve-2018-20225-4l91</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Fsafety_actions.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Fsafety_actions.webp" alt="GitHub Actions update and safety run logs ignore and succeeds" width="647" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2018-20225" rel="noopener noreferrer"&gt;CVE-2018-20225&lt;/a&gt; in all versions of &lt;code&gt;pip&lt;/code&gt; tripped my vulnerability alerting this morning. If you're scanning for vulnerabilities using Safety, you've probably seen the same alarm. This post captures my reasoning and decision-making process to understand the risk and impact of this vulnerability and then deal with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery
&lt;/h2&gt;

&lt;p&gt;When I launched my &lt;a href="https://github.com/brabster/pypi_vulnerabilities" rel="noopener noreferrer"&gt;pypi-vulnerabilities&lt;/a&gt; codespace this morning, VSCode ran my &lt;a href="https://github.com/brabster/pypi_vulnerabilities/blob/c056bf3b2a4605a91526b27e0fb5ef93098e3fc4/.dev_scripts/init_and_update.sh" rel="noopener noreferrer"&gt;init_and_update&lt;/a&gt; script automatically. This script installs my dependencies and then checks them for known vulnerabilities. The first indication that something was wrong was a red task in the terminal sidebar.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Finit_update_fail.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Finit_update_fail.webp" alt="init_and_update task red with an error indicator in a VSCode sidebar" width="216" height="117"&gt;&lt;/a&gt;&lt;br&gt;init_and_update task red with an error indicator in a VSCode sidebar
 &lt;/p&gt;

&lt;p&gt;Clicking into that terminal, I see:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Scan was completed. 1 vulnerability was reported.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Scrolling up, I get this detail:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-&amp;gt; Vulnerability found in pip version 24.0
   Vulnerability ID: 67599
   Affected spec: &amp;gt;=0
   ADVISORY: ** DISPUTED ** An issue was discovered in pip (all versions) because it installs the version with the highest version number, even if the user had
   intended to obtain a private package from a private index. This only affects use of the --extra-index-url option, and exploitation requires that the package does not...
   CVE-2018-20225
   For more information about this vulnerability, visit https://data.safetycli.com/v/67599/97c
   To ignore this vulnerability, use PyUp vulnerability id 67599 in safety’s ignore command-line argument or add the ignore to your safety policy file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Facts
&lt;/h2&gt;

&lt;p&gt;Or at least, my understanding of the facts. What do I think I know at this point?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I've installed and updated my dependencies to the latest versions, including &lt;a href="https://github.com/brabster/pypi_vulnerabilities/blob/c056bf3b2a4605a91526b27e0fb5ef93098e3fc4/.dev_scripts/init_and_update.sh#L17" rel="noopener noreferrer"&gt;pip&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;I've run &lt;code&gt;safety check&lt;/code&gt; and it found a known vulnerability, CVE-2018-20225, is present in the updated dependencies.&lt;/li&gt;
&lt;li&gt;There's no known fixed release available, as I would already have it thanks to open upper version bounds and automatic update before checking for vulnerabilities.&lt;/li&gt;
&lt;li&gt;I'm running in a codespace and the only credentials this vulnerability can expose are:

&lt;ul&gt;
&lt;li&gt;write access to the public-read pypi-vulnerabilities repo.&lt;/li&gt;
&lt;li&gt;BigQuery data editor access to the public PyPI data that I've been working on.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The same vulnerability will be present in GitHub Actions runs (executed daily and on PRs), which have the same risk exposure as Codespaces.&lt;/li&gt;

&lt;li&gt;As it's a pip vulnerability, any other Python repo I have will currently have the same vulnerability (like this one for my website!)&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I'm already ahead of where I used to be before I embraced automation and auto-updating. My automation made sure I was up to date and that I ran a vulnerability scan while I was still daydreaming about fixing the failed GitHub Actions run I got an email about yesterday. I know it's not noise that I can fix by simply updating and I have to take a look and decide what to do next. Running in a codespace lets me relax a little, as it dramatically reduces the scope of things that the vulnerability could exploit or steal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Taking Action
&lt;/h2&gt;

&lt;p&gt;Two actions spring to mind. First up, look at the vulnerability information and decide what I should do. Let's check out &lt;a href="https://data.safetycli.com/v/67599/97c/" rel="noopener noreferrer"&gt;https://data.safetycli.com/v/67599/97c/&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;** DISPUTED ** An issue was discovered in pip (all versions) because it installs the version with the highest version number, even if the user had intended to obtain a private package from a private index. This only affects use of the --extra-index-url option, and exploitation requires that the package does not already exist in the public index (and thus the attacker can put the package there with an arbitrary version number). NOTE: it has been reported that this is intended functionality and the user is responsible for using --extra-index-url securely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I'm not using private indexes, and I don't use the &lt;code&gt;--extra-index-url&lt;/code&gt; option. Looks like something I can safely ignore, but my paranoia tells me that I shouldn't do so forever. It's possible, even if I believe it unlikely, that more powerful ways of exploiting this CVE might be found in future. Ideally, I want to ignore it for a while and be prompted again in future if the CVE does not get withdrawn and a fixed release does not become available.&lt;/p&gt;

&lt;h3&gt;
  
  
  Safety Policy
&lt;/h3&gt;

&lt;p&gt;Safety has come a long way since I first used it. I remember when it only supported passing "ignore" flags to ignore specific vulnerabilities, which was problematic to work with and automate around. Now I see that &lt;a href="https://docs.safetycli.com/safety-2/safety-cli-2-scanner/policy-file" rel="noopener noreferrer"&gt;Safety supports a "policy file"&lt;/a&gt;, so let's generate one and see what we can do with it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;(venv) @brabster ➜ /workspaces/pypi_vulnerabilities (main) $&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;safety generate policy_file
&lt;span class="go"&gt;A default Safety policy file has been generated! Review the file contents in the path . in the file: .safety-policy.yml
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Checking out that new file, I see lots of configuration being set. That seems like a pretty bad idea to me. I just want to ignore this one vulnerability for a while, I don't want to hardcode a bunch of configuration options. In doing so, I effectively take responsibility for what their effects are going forward. Nah. &lt;em&gt;Delete delete delete&lt;/em&gt;. Instead, I'll use &lt;a href="https://docs.safetycli.com/safety-2/safety-cli-2-scanner/policy-file#structure-of-the-policy-file" rel="noopener noreferrer"&gt;the ignore example given in the documentation&lt;/a&gt; to &lt;strong&gt;just&lt;/strong&gt; configure what I want to happen for this specific vulnerability. The other Safety defaults are fine going forward, thanks.&lt;/p&gt;

&lt;p&gt;Here's what I end up with in my &lt;a href="https://github.com/brabster/pypi_vulnerabilities/blob/515e6d1070902e581b5b71a3872bb5c9eab34c74/.safety-policy.yml" rel="noopener noreferrer"&gt;.safety_policy.yml&lt;/a&gt;...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;security&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ignore-vulnerabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;67599'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;do not use a private index&lt;/span&gt;
      &lt;span class="na"&gt;expires&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2024-07-02&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;safety check&lt;/code&gt; now...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Vulnerability found &lt;span class="k"&gt;in &lt;/span&gt;pip version 24.0
&lt;span class="go"&gt;   Vulnerability ID: 67599
   This vulnerability is being ignored until 2024-07-02 00:00:00 UTC. See your configurations.
   Reason: do not use a private index
   For more information about this vulnerability, visit https://data.safetycli.com/v/67599/97c
&lt;/span&gt;&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="c"&gt;## snip ###&lt;/span&gt;
&lt;span class="go"&gt; Scan was completed using a safety policy file. 0 vulnerabilities were reported. 1 vulnerability from 1 package was ignored.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seems to be working fine, ignoring the expected vulnerability until the correct date. Nice confirmation in the output too ✅. I'm expecting it to return successfully instead of an error status so that my automation is all good. Does it?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;(venv) @brabster ➜ /workspaces/pypi_vulnerabilities (main) $&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt;
&lt;span class="go"&gt;0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes (&lt;a href="https://www.cyberciti.biz/faq/linux-bash-exit-status-set-exit-statusin-bash/" rel="noopener noreferrer"&gt;'0' means OK&lt;/a&gt;, it would be pretty dumb if it returned an error!). Let's just check that it returns to being an error when the date is in the past. Updating the policy file to set the date to yesterday (2024-05-17) and running &lt;code&gt;safety check&lt;/code&gt; again... yes, no ignore information is printed to the output and the status code is '64', so an error 🙌.&lt;/p&gt;

&lt;h3&gt;
  
  
  Confirming Measures Are Effective
&lt;/h3&gt;

&lt;p&gt;I'll set the date back to 2024-07-02. Why? The free safety database is updated monthly on the 1st, and I'm pretty relaxed about the risk for this one. I'll give it the rest of May and June before checking back in. Does my &lt;code&gt;init_and_update&lt;/code&gt; task work cleanly now?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Finit_update_fixed.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Finit_update_fixed.webp" alt="init_and_update task with no colour and checkmark indicating success" width="237" height="77"&gt;&lt;/a&gt;&lt;br&gt;init_and_update task with no colour and checkmark indicating success
 &lt;/p&gt;

&lt;p&gt;Yes. Obviously, it was going to work, but I guess I've worked with software for long enough that I have trust issues. I don't believe it 'til I see it 🤷. I'll commit that policy file back with the commit message "Ignore low-risk pip vuln &lt;a href="https://data.safetycli.com/v/67599/97c" rel="noopener noreferrer"&gt;https://data.safetycli.com/v/67599/97c&lt;/a&gt;" and check that the resulting Actions workflow is also clean.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Fsafety_actions.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-18-handling-cve-2018-20225%2Fassets%2Fsafety_actions.webp" alt="GitHub Actions update and safety run logs ignore and succeeds" width="647" height="289"&gt;&lt;/a&gt;&lt;br&gt;GitHub Actions update and safety run logs ignore and succeeds
 &lt;/p&gt;

&lt;p&gt;The Actions run still fails though - oh yes, that's why I was launching the workspace in the first place! Before I get back to that though, I said &lt;strong&gt;two&lt;/strong&gt; actions didn't I?&lt;/p&gt;

&lt;h3&gt;
  
  
  Ignoring At Scale
&lt;/h3&gt;

&lt;p&gt;The other action relates to the other repositories I have that are going to be affected by the same vulnerability. Every repo that checks for vulnerabilities from this point until this vulnerability is withdrawn or fixed will raise the alarm. The safety policy file is local to &lt;code&gt;pypi-vulnerabilities&lt;/code&gt;, so that's no help. How to manage this sort of vulnerability scanning and alerting at any sort of scale?&lt;/p&gt;

&lt;p&gt;To be honest, that's a problem I've had in every role I've taken on in the past decade, employee and consultant alike. I've not had a good solution to date and I end up just working through the repos applying whatever ignore solution I had at the time. Given that I have stuff I'm maintaining in the public domain now, and a reason to blog about it, I'm going to give it a little thought this time and see if I can come up with something better.&lt;/p&gt;

&lt;p&gt;What do you do to solve this problem?&lt;/p&gt;

</description>
      <category>automation</category>
      <category>operations</category>
      <category>security</category>
    </item>
    <item>
      <title>dbt 1.8 breaks on update</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Sun, 12 May 2024 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/dbt-18-breaks-on-update-1j40</link>
      <guid>https://dev.to/brabster/dbt-18-breaks-on-update-1j40</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-12-dbt-1.8-breaks-on-update%2Fassets%2Fdbt_1_8_break.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-12-dbt-1.8-breaks-on-update%2Fassets%2Fdbt_1_8_break.webp" alt="Error traceback ending with No module named dbt.adapters.factory" width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On updating dbt-bigquery to latest 1.8.0: &lt;strong&gt;No module named 'dbt.adapters.factory'&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Issue - dbt 1.8.0 won't run
&lt;/h2&gt;

&lt;p&gt;I spun up a Codespace on &lt;a href="https://github.com/brabster/pypi_vulnerabilities" rel="noopener noreferrer"&gt;pypi_vulnerabilities&lt;/a&gt; to resolve a failing build.&lt;br&gt;
My auto-update ran as usual, updating all my Python dependencies to latest stable release and I got dbt-core 1.8.0, &lt;a href="https://github.com/dbt-labs/dbt-core/releases/tag/v1.8.0" rel="noopener noreferrer"&gt;now out of beta as of 2024-05-09&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first thing my auto-update does that depends on dbt - install/update dbt packages - failed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Successfully installed daff-1.3.46 dbt-adapters-1.1.1 dbt-bigquery-1.8.0 dbt-common-1.0.4 dbt-core-1.8.0 dbt-semantic-interfaces-0.5.1
install or upgrade dbt dependencies
Traceback (most recent call last):
&lt;/span&gt;&lt;span class="gp"&gt;  File "/workspaces/pypi_vulnerabilities/venv/bin/dbt", line 5, in &amp;lt;module&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;    from dbt.cli.main import cli
&lt;/span&gt;&lt;span class="gp"&gt;  File "/workspaces/pypi_vulnerabilities/venv/lib/python3.11/site-packages/dbt/cli/__init__.py", line 1, in &amp;lt;module&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;    from .main import cli as dbt_cli  #&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;noqa
&lt;span class="go"&gt;    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
&lt;/span&gt;&lt;span class="gp"&gt;  File "/workspaces/pypi_vulnerabilities/venv/lib/python3.11/site-packages/dbt/cli/main.py", line 14, in &amp;lt;module&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;    from dbt.cli import requires, params as p
&lt;/span&gt;&lt;span class="gp"&gt;  File "/workspaces/pypi_vulnerabilities/venv/lib/python3.11/site-packages/dbt/cli/requires.py", line 10, in &amp;lt;module&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;    from dbt.adapters.factory import adapter_management, register_adapter, get_adapter
ModuleNotFoundError: No module named 'dbt.adapters.factory'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Clean Install to Fix
&lt;/h2&gt;

&lt;p&gt;Clearing my &lt;code&gt;venv&lt;/code&gt; and reinstalling from scratch gets things back up and running. Here's the specific procedure I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;deactivate &lt;span class="c"&gt;# deactivate the venv, if active&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; venv &lt;span class="c"&gt;# recursively delete the venv folder&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; venv/bin/activate
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;dbt deps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;I actually ran my &lt;a href="https://github.com/brabster/pypi_vulnerabilities/blob/c056bf3b2a4605a91526b27e0fb5ef93098e3fc4/.dev_scripts/init_and_update.sh" rel="noopener noreferrer"&gt;init_and_update.sh&lt;/a&gt; script rather than those highlighted lines, but those are the relevant things it does.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This also explains why my build wasn't breaking (well, not for this reason anyway) - it always starts from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Changes in Adapter Dependencies
&lt;/h2&gt;

&lt;p&gt;That's how you can fix the &lt;code&gt;No module named 'dbt.adapters.factory'&lt;/code&gt; error if it comes up.&lt;br&gt;
As I read the &lt;a href="https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.8" rel="noopener noreferrer"&gt;v1.8 upgrade docs&lt;/a&gt; I also learned that dbt Labs intend to change the way Python adapter dependencies work.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Beginning in v1.8, dbt-core and adapters are decoupled. Going forward, your installations should explicitly include both dbt-core and the desired adapter.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://github.com/dbt-labs/dbt-adapters/discussions/87" rel="noopener noreferrer"&gt;Scanning through the related technical docs&lt;/a&gt;, I think &lt;a href="//../2024-05-01-how-i-do-python-supply-chain-security/index.md#updating-dependencies-automatically"&gt;the auto-update approach I use&lt;/a&gt; will still work as intended without explicitly specifying the core dependency.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Up to now adapters have been required release a new minor version to declare compatibility with dbt-core’s minor version. Post dbt-core version 1.8, adapters will not need to do this. Instead maintainers will need to declare their compatibility with dbt-adapters’ versions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, if you're &lt;strong&gt;setting your adapter dependency with an open upper bound (eg. &lt;code&gt;dbt-bigquery&amp;gt;=1.8.0&lt;/code&gt; as mine will be shortly) then you should also get the latest compatible &lt;code&gt;dbt-core&lt;/code&gt; version each time you run &lt;code&gt;pip install -U -r requirements.txt&lt;/code&gt;&lt;/strong&gt; on your development environment or your build system. I'll post how that shakes out over time.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>insights</category>
    </item>
    <item>
      <title>How I Do Python Data Supply Chain Security</title>
      <dc:creator>Paul Brabban</dc:creator>
      <pubDate>Wed, 01 May 2024 00:00:00 +0000</pubDate>
      <link>https://dev.to/brabster/how-i-do-python-data-supply-chain-security-48l5</link>
      <guid>https://dev.to/brabster/how-i-do-python-data-supply-chain-security-48l5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-01-how-i-do-python-supply-chain-security%2Fassets%2Fhero.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Ftempered.works%2F2024-05-01-how-i-do-python-supply-chain-security%2Fassets%2Fhero.webp" alt="A photo taken whilst SCUBA diving of Thresher shark circling off a seamount in the Phillipines. Credit: me" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We data practitioners - data scientists, data engineers, analytics engineers, et al. - have a hard time when it comes to security. We're exposed to tools that demand we write code and deal with the messy world of programming languages and packages. We often have little choice but to drag insights out of real and sensitive data, exposing us to risks other developers can avoid, because insights don't hide in test data. Training, career paths and dev-experience efforts typically overlook data folks, depriving them of knowledge about the risks they're exposed to and how to mitigate them. Read on and I'll share what I do (and why) to protect myself, &lt;a href="https://equalexperts.com" rel="noopener noreferrer"&gt;Equal Experts&lt;/a&gt; and my clients from the security risks lurking behind every piece of software.&lt;/p&gt;

&lt;p&gt;When I speak to fellow data practitioners I find there's a common concern about the security risks they're exposed to, but a lack of clear, pragmatic guidance on how to mitigate them. A lot of the guidance out there, like &lt;a href="https://snyk.io/blog/python-security-best-practices-cheat-sheet/" rel="noopener noreferrer"&gt;Snyk's Python security best practices&lt;/a&gt;, focuses on traditional software engineering. "Data" work can be a little different.&lt;/p&gt;

&lt;p&gt;This story covers the things I've learned to do in my day-to-day over the past decade or more, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partitioning My Work&lt;/li&gt;
&lt;li&gt;Using pip&lt;/li&gt;
&lt;li&gt;Assessing Dependency Risk&lt;/li&gt;
&lt;li&gt;Keeping Dependencies Up To Date&lt;/li&gt;
&lt;li&gt;Scanning for Vulnerabilities&lt;/li&gt;
&lt;li&gt;Using Least-Privilege Credentials&lt;/li&gt;
&lt;li&gt;Cloud Controls&lt;/li&gt;
&lt;li&gt;Templating&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Partitioning My Work
&lt;/h2&gt;

&lt;p&gt;I have at least three different, isolated working environments that I use locally. One laptop is for my personal stuff. At any given time it may have access to credentials for my AWS accounts, my GCP account, my password manager and GMail. If someone bad got into the first two they could run me up a painful bill, despite having billing alarms set up (I talk about that in &lt;a href="//../2024-02-08-pypi-downloads-danger/index.md"&gt;$1,370 Gone in Sixty Seconds&lt;/a&gt;). If they got into either of the last two they can steal my identity and do untold harm to me and my family. I really need to minimise the risk on that computer!&lt;/p&gt;

&lt;p&gt;Then I have a laptop that Equal Experts gave me. This will have credentials for EE things that I have a responsibility to protect. EE-related work on here.&lt;/p&gt;

&lt;p&gt;Finally, I have a laptop that I do client work on. Same deal, with even greater responsibility to protect. This laptop is used for nothing but client work.&lt;/p&gt;

&lt;p&gt;I have a separate long, strong password for all three (I can reliably hold about four such passwords in my brain at once, if I'm using them regularly and don't have to change them all at once!), and their disks are all encrypted. That's the basic layer of protection - if any one were ever compromised, it's bad enough. But it's better than all three being compromised. There's more about how I manage multiple laptops and additional security measures I use in my posts on &lt;a href="//../2024-02-27-automated-laptop-build-intro/index.md"&gt;automating my laptop build&lt;/a&gt; and &lt;a href="//../2024-03-01-automated-laptop-build-conclusion/index.md"&gt;living with an automated laptop build&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;!!! warning&lt;br&gt;
    Like many things I'll talk about in this post, password managers are awesome but ship with unsafe defaults - convenience over security 🤷. I make an point of setting the delay before my password manager clears the clipboard to 30 seconds (default is never!) and setting the timeout before locking again to five minutes. Reduces the time window things have to steal passwords without significantly impacting my day to day, Don't forget the password manager app on your phone...&lt;/p&gt;

&lt;p&gt;I've recently been having a great experience with even more protective partitioning - a dedicated, isolated, customisable development environment per &lt;strong&gt;repository&lt;/strong&gt; with GitHub Codespaces. I think it's the future. If coding in the cloud is an option for you, I'd recommend giving it a try with an open mind. I've even got &lt;a href="//../2024-04-23-codespaces/index.md"&gt;a Codespaces walkthrough and video to help&lt;/a&gt;!&lt;/p&gt;
&lt;h3&gt;
  
  
  Using Virtualenvs
&lt;/h3&gt;

&lt;p&gt;It's really easy to install into your system Python by mistake - you forget to activate the venv, or you think it's active when it's not. Ideally, your system permissions are set up so that you can't write to your system Python installation, but I find that's quite rare. It's certainly not the case on the current Ubuntu OS on the laptop I'm sitting in front of right now.&lt;/p&gt;

&lt;p&gt;You can change the default behaviour of pip on your computer so that it won't install in the system Python.&lt;br&gt;
I set an environment variable &lt;code&gt;PIP_REQUIRE_VIRTUALENV&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; in scripts where I interact with pip. For example, &lt;a href="https://github.com/brabster/tw-site-mkdocs/blob/27b7a94d7dcaf0fd51c39395a205db1e5de1e9a2/.dev_scripts/init_and_update.sh#L5" rel="noopener noreferrer"&gt;the init_and_update.sh script in this repository sets PIP_REQUIRE_VIRTUALENV&lt;/a&gt;. I also set it one-off as a system-wide environment variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PIP_REQUIRE_VIRTUALENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;safety
&lt;span class="go"&gt;ERROR: Could not find an activated virtualenv (required).
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are &lt;a href="https://docs.python-guide.org/dev/pip-virtualenv/#requiring-an-active-virtual-environment-for-pip" rel="noopener noreferrer"&gt;several other ways to tell pip not to install in the system Python&lt;/a&gt; if environment variables don't work for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using pip
&lt;/h2&gt;

&lt;p&gt;For the past 18 months, I've used plain old &lt;a href="https://packaging.python.org/en/latest/guides/tool-recommendations/#installing-packages" rel="noopener noreferrer"&gt;pip, the Python Packaging Authority's recommended tooling&lt;/a&gt;. I'm very happy with it and would recommend it to other Python data practitioners. My workflow with pip and venv is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv &lt;span class="c"&gt;# create a venv for the current project if needed&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate &lt;span class="c"&gt;# activate the venv (slightly different command on Windows)&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;which python &lt;span class="c"&gt;# check that the venv is active; IDE usually indicates clearly when venv is active&lt;/span&gt;
&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt &lt;span class="c"&gt;# install and update packages in this venv based on requirements.txt file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pipenv, Poetry et al.
&lt;/h3&gt;

&lt;p&gt;I've used &lt;a href="https://pipenv.pypa.io/en/latest/" rel="noopener noreferrer"&gt;pipenv&lt;/a&gt;. I've used &lt;a href="https://python-poetry.org/" rel="noopener noreferrer"&gt;Poetry&lt;/a&gt;. There are several more. In my experience, they don't deliver significant benefits over pip and are more trouble than they're worth. I have a bunch of stories about this that I'll share another day.&lt;/p&gt;

&lt;p&gt;!!! note&lt;br&gt;
    &lt;a href="https://github.com/pypa/pip/issues/988" rel="noopener noreferrer"&gt;&lt;code&gt;pip&lt;/code&gt; got a proper dependency resolver in 20.3 released in 2020&lt;/a&gt;. I can't recall using &lt;code&gt;pip&lt;/code&gt; before that but I imagine it wasn't the most reliable. I haven't had any problems updating on a daily basis since sometime in 2022. Kudos to fellow consultant &lt;a href="https://github.com/jneves/" rel="noopener noreferrer"&gt;João Neves&lt;/a&gt; for the feedback 🙇.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conda
&lt;/h3&gt;

&lt;p&gt;I haven't used conda. One of the reasons for that is the complexity around commercial use in the &lt;a href="https://conda-forge.org/blog/2020/11/20/anaconda-tos/" rel="noopener noreferrer"&gt;terms of service&lt;/a&gt; to warrant a &lt;a href="https://legal.anaconda.com/policies/en/" rel="noopener noreferrer"&gt;clarifying blog post&lt;/a&gt;. I think the advice here applies equally to conda users but I can't speak from my own experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Assessing Dependency Risk
&lt;/h2&gt;

&lt;p&gt;Any software you bring onto your computer has the potential to hurt you. In the case of Python, just installing a package can let bad actors loose on your computer. My working assumption is that any software running on the computer I'm using can do anything I can do, including accessing any passwords, access tokens, and session tokens I have. Maybe even my password manager if it's unlocked.&lt;/p&gt;

&lt;p&gt;I don't trust what I see on the internet. Anyone can make up an identity, publish anything they want and say anything they want. Identities can and have been stolen and used to inject malware into previously safe software. Maintainers can be bought out or get burnt out. I don't think there's anything I can do here that completely mitigates the risks, but I can reduce it. I'll take this opportunity to plug &lt;a href="https://python.land/virtual-environments/installing-packages-with-pip" rel="noopener noreferrer"&gt;a responsible, in-depth treatment of package handling over at python.land&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is the package popular?
&lt;/h3&gt;

&lt;p&gt;A heavily-used package is a juicy target for bad actors. On the other hand... heavily-used packages have more eyeballs on them. They're more likely to be looked at by security researchers, and if there is a problem I'm in a larger crowd of potential victims, so a lower chance that I will be exploited before I have a chance to respond or my credentials and so on expire or are changed. On balance, I prefer packages with evidence of large user communities.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;search for articles and blogs mentioning it&lt;/li&gt;
&lt;li&gt;check out GitHub stars, forks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special mention to &lt;a href="https://libraries.io/search?q=colourama&amp;amp;sort=dependents_count" rel="noopener noreferrer"&gt;libraries.io&lt;/a&gt;, providing a search interface with metrics about how a package is used by others - &lt;code&gt;Sort: Dependents&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;!!! warning&lt;br&gt;
    &lt;strong&gt;Copy-paste the package name into requirements.txt from somewhere you trust!&lt;/strong&gt; The bad actors love "typosquatting":publishing malware-laden packages with similar names to popular legitimate packages. These packages are downloaded thousands of times before they are identified and removed!&lt;/p&gt;

&lt;h3&gt;
  
  
  Minimising Dependency Use
&lt;/h3&gt;

&lt;p&gt;There's so much software out there, a solution for every problem or suboptimal thing you could imagine. I save myself the time of more in-depth thinking by just trying to be honest with myself about whether I really need something or not.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://docs.python.org/3/tutorial/stdlib.html#batteries-included" rel="noopener noreferrer"&gt;Python is famous for its "batteries included" philosophy&lt;/a&gt;, so I find it's worth checking whether you can use something built in for no additional risk rather than a package that does expose you to new risks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Do I really need to use &lt;code&gt;Poetry&lt;/code&gt;? No, I can use boring old &lt;code&gt;pip&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Do I really need to use &lt;code&gt;murmurhash&lt;/code&gt;? No, I can use Python's boring old built-in &lt;code&gt;hash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Do I really need to use &lt;code&gt;colorama&lt;/code&gt;? No, I can live with boring old monochrome terminal text.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just because I'm making boring choices in the name of safety does not mean the packages I depend on are making similar choices. Every time I avoid a dependency, I'm cutting out that dependency, and its dependencies, and their dependencies and so on! That whole subtree of dependencies, and the choices their maintainers make, and their vulnerabilities? Not my problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Common Cross-Project Dependencies
&lt;/h3&gt;

&lt;p&gt;I also try to use the same dependencies everywhere, instead of allowing variation without good reason. That helps me really get to know those dependencies and their maintainers whilst reducing the exposure I have to different supply chains generally. Want more on this topic? &lt;a href="https://gist.github.com/sleepyfox/8415e64da732c7fea02f21f1c0314f62" rel="noopener noreferrer"&gt;ZDD (Zero Dependency Development)&lt;/a&gt; is a well-argued and more detailed case for minimising and eliminating dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Well-Maintained Dependencies
&lt;/h3&gt;

&lt;p&gt;I'm suspicious of packages that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;have only one maintainer - bus factor, high risk of burnout, where are checks, measures and accountability?&lt;/li&gt;
&lt;li&gt;have more than five maintainers - (who are all these people? How do they decide what to approve or not?)&lt;/li&gt;
&lt;li&gt;don't have a history of being updated regularly&lt;/li&gt;
&lt;li&gt;don't have any obvious source of funding and don't ask for any&lt;/li&gt;
&lt;li&gt;aren't backed by an organisation I trust&lt;/li&gt;
&lt;li&gt;don't have a security policy (eg. security tab in GitHub)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have more &lt;em&gt;Opinions&lt;/em&gt; on this one, but they go broader than the scope of this post. I'll pop that on my backlog for a future post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Updating Dependencies Automatically
&lt;/h2&gt;

&lt;p&gt;I think it's fair to say that keeping your dependencies up to date is not an industry standard practice[^1]. Tools like Pipenv, Poetry and the like default you to locking the exact version of every dependency, and their dependencies, and so on. They instruct you to commit these lockfiles to source control without mentioning the drawbacks. Unless you go and run special commands to update them and then commit those changes, your app will be frozen in time, accumulating vulnerabilities that you won't even know about unless you're scanning them for vulnerabilities.&lt;/p&gt;

&lt;p&gt;Another 👍 for &lt;code&gt;pip&lt;/code&gt; which does not lock by default. If you look at any of my more recent Python repositories, you'll find minimum-bound version constraints, along with builds and IDE support for automatically updating versions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you have to use a tool that creates lockfiles for now, you can &lt;a href="https://www.git-tower.com/learn/git/commands/git-rm" rel="noopener noreferrer"&gt;&lt;code&gt;git rm&lt;/code&gt; them&lt;/a&gt;, then add them to your &lt;code&gt;.gitgnore&lt;/code&gt; file to have Git ignore them going forward. That cuts out the need to commit updates back and so simplifies updating.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Example: dbt_bigquery_template
&lt;/h3&gt;

&lt;p&gt;I have exactly one dependency in &lt;a href="https://github.com/brabster/dbt_bigquery_template/blob/2af5ffd769ec698757847e0366aa00aea984b94e/requirements.txt#L1" rel="noopener noreferrer"&gt;dbt_bigquery_template&lt;/a&gt;, which is currently this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dbt-bigquery&amp;gt;=1.7.0&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This translates as "get me the latest &lt;strong&gt;release&lt;/strong&gt; of &lt;code&gt;dbt-bigquery&lt;/code&gt; that's no older than &lt;code&gt;1.7.0&lt;/code&gt;". The low bound ensures that I know things will blow up if somehow I get an older dependency than the last one I had reason to look at (&lt;code&gt;1.7.0&lt;/code&gt; in this case). If there's a problem with the latest version, and I can't fix it right now, I can still pin to the previous working version temporarily - but I've found this is a rare exception rather than a fatiguing everyday occurrence.&lt;/p&gt;

&lt;p&gt;Unlike some other ecosystems I've had the misfortune of needing to work with, &lt;code&gt;pip&lt;/code&gt; has a wonderful feature in &lt;a href="https://pip.pypa.io/en/stable/cli/pip_install/#pre-release-versions" rel="noopener noreferrer"&gt;requiring an explicit flag &lt;code&gt;--pre&lt;/code&gt; to include pre-release versions&lt;/a&gt;, so you won't get the technically-latest-but-unstable &lt;code&gt;1.8.0b2&lt;/code&gt; beta. Yay! The latest release is currently &lt;a href="https://libraries.io/pypi/dbt-bigquery/1.7.7" rel="noopener noreferrer"&gt;dbt-bigquery 1.7.7&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  In the IDE
&lt;/h4&gt;

&lt;p&gt;When I open &lt;code&gt;dbt_bigquery_template&lt;/code&gt; in VSCode, &lt;a href="https://github.com/brabster/dbt_bigquery_template/blob/2af5ffd769ec698757847e0366aa00aea984b94e/.vscode/tasks.json#L7" rel="noopener noreferrer"&gt;the init-and-update task&lt;/a&gt; automatically kicks off and runs a series of commands updating different kinds of dependencies including &lt;a href="https://github.com/brabster/dbt_bigquery_template/blob/2af5ffd769ec698757847e0366aa00aea984b94e/.dev_scripts/init_and_update.sh#L23" rel="noopener noreferrer"&gt;&lt;code&gt;pip install -U -r ${PROJECT_DIR}/requirements.txt&lt;/code&gt;&lt;/a&gt;. &lt;a href="https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-U" rel="noopener noreferrer"&gt;&lt;code&gt;-U&lt;/code&gt; means &lt;code&gt;--upgrade&lt;/code&gt;&lt;/a&gt; and updates any dependencies you've already got to their latest versions if newer versions are available.&lt;/p&gt;

&lt;h4&gt;
  
  
  In the Build
&lt;/h4&gt;

&lt;p&gt;The build for &lt;code&gt;dbt_bigquery_template&lt;/code&gt; does the same thing. &lt;a href="https://github.com/brabster/dbt_bigquery_template/blob/2af5ffd769ec698757847e0366aa00aea984b94e/.github/actions/setup_dbt/action.yml#L18" rel="noopener noreferrer"&gt;This line in the workflow is basically the same as the line in the VSCode task script&lt;/a&gt;. Assuming you close and re-open VSCode at least once each day, that means your development environment and your build or workflow management system are both within a few hours of latest and one another at any given point in time. You don't need to freeze the world forever to avoid "works-on-my-machine" problems.&lt;/p&gt;

&lt;p&gt;There's a lot more to say about automatically updating dependencies, but I'll keep things minimal for this post.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Least-Bad Solution
&lt;/h3&gt;

&lt;p&gt;First - in my opinion, this is the least bad approach, not a perfect solution. The big risk - if something you're depending on &lt;strong&gt;becomes&lt;/strong&gt; bad, you'll pick it up automatically. I think vulnerabilities that we do and do not know about in old software are a much bigger risk. Updating automatically you get all the security fixes straight away, at no time cost to your team, before the manual updaters have had a chance to realise there's a problem, prioritise the work to update, decide whether it needs fixing, and get around to dealing with it.&lt;/p&gt;

&lt;p&gt;I take it as validating that regulators are increasingly calling for timely and automatic updates everywhere from IoT devices to phones, servers and so on - so staying up to date seems to be generally accepted as the better position to be in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Works In Practice
&lt;/h3&gt;

&lt;p&gt;I've used this approach for several months or so with multiple teams collaborating over multiple repositories and I can't recall any significant problems. The main inconvenience that occurs is those rare occasions when a dependency lets a breaking change through, which you find out about the next day.&lt;/p&gt;

&lt;p&gt;I see this as a feature, not a bug. These kinds of breaks aren't subtle. Any sort of automated build process or your orchestration tooling is going to notice when version conflicts can't be resolved, an API change prevents your tests from running or the maintainers change something that your permissions don't let you do. I maybe wouldn't pick the most mission-critical thing I could find to try auto-updating for the first time!&lt;/p&gt;

&lt;p&gt;Oh - and yes, setting the constraint to allow semantic-versioning-major "breaking changes" through is by design, not accidental. I'd rather find out about a breaking change that actually affects me when it happens, not months later with a critical vulnerability to fix and no update path except through the breaking version. In my experience the ideals of &lt;a href="https://semver.org/spec/v2.0.0.html" rel="noopener noreferrer"&gt;semver 2.0.0&lt;/a&gt; and reality don't really line up all that well - yet another post for another day.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about renovate and dependabot?
&lt;/h3&gt;

&lt;p&gt;When I talk with someone about automatically updating, automated PR-raisers often come up. I haven't used either tool myself and I avoid them. If I'm keeping up to date and not committing every update back to source control, I don't need a tool raising PRs to help me manage the relentless torrent of vulnerability notices because I don't have that problem. Plus, &lt;a href="https://checkmarx.com/blog/surprise-when-dependabot-contributes-malicious-code/" rel="noopener noreferrer"&gt;they're not immune from expoitation themselves&lt;/a&gt;. Another supply chain bites the dust 👋&lt;/p&gt;

&lt;h3&gt;
  
  
  Increased Awareness
&lt;/h3&gt;

&lt;p&gt;It's very useful to know when a breaking change just landed on you. You can deal with it while it's fresh before the work has had any chance to pile up. You get to understand how reliable your dependencies &lt;strong&gt;really&lt;/strong&gt; are - perhaps a dependency that keeps breaking you isn't so trustworthy after all? Most importantly - you don't find out you've got multiple breaking changes in the way when your scans alert you to a critical must-fix vulnerability in the two-year-old version of that dependency you haven't updated.&lt;/p&gt;

&lt;p&gt;Speaking of which...&lt;/p&gt;

&lt;h2&gt;
  
  
  Scanning for Vulnerabilities
&lt;/h2&gt;

&lt;p&gt;Tools like &lt;a href="https://safetycli.com/product/safety-cli" rel="noopener noreferrer"&gt;safetycli&lt;/a&gt; and &lt;a href="https://snyk.io/" rel="noopener noreferrer"&gt;Snyk&lt;/a&gt; will scan your installed dependencies and tell you whether there are any known vulnerabilities in there. You'll see my use of the safetycli Python package to scan dependencies as part of my init-and-update process both &lt;a href="https://github.com/brabster/dbt_bigquery_template/blob/2af5ffd769ec698757847e0366aa00aea984b94e/.dev_scripts/init_and_update.sh#L29" rel="noopener noreferrer"&gt;locally&lt;/a&gt; and in &lt;a href="https://github.com/brabster/dbt_bigquery_template/blob/2af5ffd769ec698757847e0366aa00aea984b94e/.github/actions/setup_dbt/action.yml#L23" rel="noopener noreferrer"&gt;build and workflow infrastructure&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;They both offer free single-developer plans at the time of writing for your personal and side-projects and paid plans for teams and enterprises. There are other options too - &lt;a href="https://docs.gitlab.com/ee/user/application_security/dependency_scanning/#python" rel="noopener noreferrer"&gt;GitLab, for example, provides some vulnerability scanning tooling&lt;/a&gt;, so check what your organisation might already be using for a potentially easy option.&lt;/p&gt;

&lt;p&gt;I wrote a little about how and when to check your dependencies back in &lt;a href="//../2020-12-03-dependency-checking/index.md"&gt;Checking your Depenedencies&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Safety Usage Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;safety check
&lt;span class="go"&gt;...snip...
  Using open-source vulnerability database
  Found and scanned 54 packages
  Timestamp 2024-05-02 21:13:23
  0 vulnerabilities reported
  0 vulnerabilities ignored
+============+

 No known security vulnerabilities reported. 

+============+

  Safety is using PyUp's free open-source vulnerability database. This data is 30 days old and limited. 
  For real-time enhanced vulnerability data, fix recommendations, severity reporting, cybersecurity support, team and project policy management and more sign up at
https://pyup.io or email sales@pyup.io
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scanning vs. Automatically Updating?
&lt;/h3&gt;

&lt;p&gt;I scan my dependencies after installing the latest versions. As I mentioned before, auto-updating is not a perfect solution, and I could still have a vulnerability even after updating - for example, a known issue with no fix available. We seem to be getting pretty good at responsible disclosure of late. It seems much more rare that I get a vulnerability notice that's not already fixed - auto-updating means I'll already have the fix installed by the time the alert would have landed in my inbox.&lt;/p&gt;

&lt;p&gt;If I do end up in that situation, my options are limited. Try to fix it myself? Not very safe nor likely to be feasible. Add an ignore? I haven't had to do this for a while but &lt;a href="https://docs.safetycli.com/safety-docs/administration/safety-policy-files" rel="noopener noreferrer"&gt;safety, for example, looks to have better support now for expiring ignores&lt;/a&gt; than last time I had to do it. Lastly - shut the thing down until a fix is available. It's worth considering in a pinch, particularly if the software or pipeline isn't all that time- or mission-critical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Least-Privilege Credentials
&lt;/h2&gt;

&lt;p&gt;It's been a long read to here, so I'll briefly mention a couple more points and wrap up.&lt;/p&gt;

&lt;p&gt;Try to avoid having powerful credentials lying around, or using more powerful credentials than are needed for a job. Partitioning your development environments helps, by letting you reduce the variety of credentials you have in the same place. My experience with Codespaces shines here as I can restrict a repository to exactly the permissions it needs with nothing else lying around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Controls
&lt;/h2&gt;

&lt;p&gt;There are powerful controls in Cloud infrastructures to limit cost exposure and where data can be copied. I've been looking around at Google Cloud and &lt;a href="//../2024-02-16-bigquery-quotas/index.md"&gt;quotas are cost controls far more proactive and effective than billing alarms&lt;/a&gt;. &lt;a href="https://cloud.google.com/vpc-service-controls/docs/service-perimeters" rel="noopener noreferrer"&gt;VPC Service Perimeters&lt;/a&gt; can prevent data from being transferred outside your organisation. Both are effectively disabled by default and not straightforward to use, but I'm building my understanding and will share some pragmatic advice when I have some.&lt;/p&gt;

&lt;h2&gt;
  
  
  Templating
&lt;/h2&gt;

&lt;p&gt;A lot is going on here, and you'll hit issues as you work with more repositories. I start small and build up. &lt;a href="https://github.com/brabster/dbt_bigquery_template" rel="noopener noreferrer"&gt;dbt_bigquery_template&lt;/a&gt; is one way I'm speeding up and improving consistency in my dbt projects (and there's a &lt;a href="https://youtu.be/KQg6D0Mkyks?feature=shared" rel="noopener noreferrer"&gt;short walkthrough video&lt;/a&gt; that touches on some of the content here) I've also had some interesting success using &lt;a href="https://git-scm.com/book/en/v2/Git-Tools-Submodules" rel="noopener noreferrer"&gt;git submodules&lt;/a&gt; to reuse scripts and so on across multiple repositories and teams in a fairly effortless manner, which I'll write about another day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap
&lt;/h2&gt;

&lt;p&gt;I'll wrap up by saying that whilst this post focuses on Python packages, the risks and ideas apply to most if not all of the &lt;strong&gt;other&lt;/strong&gt; supply chains I'm exposed to. Off the top of my head - Operating system. Firmare. Application software. IDE plugins. Browser plugins. dbt packages. The list goes on and on. Any of these things might be able to get up to the kind of mischief we've been talking about, so stay vigilant and think before installing. I hope the content here helps with the thinking part!&lt;/p&gt;

&lt;p&gt;If you got this far - wow, well done, and best of luck. I'd love any feedback - including anything I got wrong or didn't make sense, or I should have covered! There's some information about how to get in touch here.&lt;/p&gt;

</description>
      <category>operations</category>
      <category>python</category>
      <category>security</category>
    </item>
  </channel>
</rss>
