<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jag Thind</title>
    <description>The latest articles on DEV Community by Jag Thind (@jag_t8490d471b36c6bb).</description>
    <link>https://dev.to/jag_t8490d471b36c6bb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3137979%2Fdf6e43a3-9ef1-4cb3-91b7-881846b275ac.png</url>
      <title>DEV Community: Jag Thind</title>
      <link>https://dev.to/jag_t8490d471b36c6bb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jag_t8490d471b36c6bb"/>
    <language>en</language>
    <item>
      <title>Setting Up a Robust Local DevX for Snowflake Python Development</title>
      <dc:creator>Jag Thind</dc:creator>
      <pubDate>Fri, 27 Feb 2026 17:04:28 +0000</pubDate>
      <link>https://dev.to/superpayments/setting-up-a-robust-local-devx-for-snowflake-python-development-12pb</link>
      <guid>https://dev.to/superpayments/setting-up-a-robust-local-devx-for-snowflake-python-development-12pb</guid>
      <description>&lt;p&gt;In the evolving world of data engineering, developing Python-based workloads in Snowflake (via &lt;a href="https://docs.snowflake.com/en/developer-guide/snowpark/index" rel="noopener noreferrer"&gt;Snowpark&lt;/a&gt;, Python UDFs, or Stored Procedures) has become increasingly popular. However, as pipelines become more complex, a critical question arises: &lt;strong&gt;How should we develop and maintain our Python code for Snowflake?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While browser-based editors like Snowflake Workspaces are fine for quick scripts, a significant "Developer Experience (DevX) Gap" emerges when you try to build production-grade Python code in a browser tab.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm writing this blog
&lt;/h2&gt;

&lt;p&gt;I've seen many Data Engineers and Analytics Engineers fall into the "UI Trap" of writing complex Python logic directly in Snowflake, only to struggle with inconsistent environments, broken dependencies, and the frustration of "it works on my machine, but not on others" problems. This blog is born out of a desire to share a better way.&lt;/p&gt;

&lt;p&gt;My goal is to encourage people to step out of the browser and into a professional local development environment. By establishing repeatable local dev environments where every developer uses the same Python version, the same dependencies, and the same tooling, we can build Python-based features that are not just functional and robust, but most importantly maintainable by others.&lt;/p&gt;

&lt;p&gt;One way to democratize data-rich features in a product is to make it easier to develop and maintain code using consistent tools. This is why we need to focus on local DevX!&lt;/p&gt;

&lt;h3&gt;
  
  
  What we'll cover
&lt;/h3&gt;

&lt;p&gt;We will explore the merits of a local-first approach to Snowflake Python development, specifically focusing on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic Python versions&lt;/strong&gt; with &lt;code&gt;pyenv&lt;/code&gt; and &lt;code&gt;.python-version&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robust dependency management&lt;/strong&gt; with &lt;code&gt;Poetry&lt;/code&gt; and &lt;code&gt;pyproject.toml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent tooling&lt;/strong&gt; configured in a single file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified task execution&lt;/strong&gt; with &lt;code&gt;Poe the Poet&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python version management with pyenv
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/pyenv/pyenv" rel="noopener noreferrer"&gt;pyenv&lt;/a&gt; is a tool for managing multiple Python versions (the setup below assumes macOS with Homebrew). It allows you to install and switch between different Python versions on a per-project basis by creating a &lt;code&gt;.python-version&lt;/code&gt; file in the project root.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for DevX:&lt;/strong&gt; By pinning the Python version in version control, you ensure that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every developer uses the same Python version for the project.&lt;/li&gt;
&lt;li&gt;Your CI/CD pipeline can install the exact same version.&lt;/li&gt;
&lt;li&gt;You avoid subtle bugs that arise from Python version differences.&lt;/li&gt;
&lt;li&gt;Dependencies work consistently (some packages require specific Python versions).&lt;/li&gt;
&lt;li&gt;Debugging is easier when issues are reproducible across all environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Install &lt;code&gt;pyenv&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;pyenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a &lt;code&gt;.python-version&lt;/code&gt; file in the project root:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the Python version specified in the &lt;code&gt;.python-version&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pyenv &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Verify the desired Python version is installed and is set for the project:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pyenv version
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
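
&lt;p&gt;As a small illustration (this is a sketch, not a pyenv feature), a script like the following could run at the start of a CI job to fail fast when the interpreter does not match the pinned &lt;code&gt;.python-version&lt;/code&gt;:&lt;/p&gt;

```python
# check_python_version.py -- illustrative sketch, not part of pyenv itself.
# Compares the running interpreter against the version pinned in .python-version.
import sys
from pathlib import Path


def pinned_version(root="."):
    """Read the pinned version string (e.g. '3.10') from .python-version."""
    return Path(root, ".python-version").read_text().strip()


def matches(pin, version_info=sys.version_info):
    """True if the running interpreter's version starts with the pinned prefix."""
    running = "%d.%d.%d" % tuple(version_info[:3])
    return running.startswith(pin)
```

&lt;p&gt;Calling &lt;code&gt;matches(pinned_version())&lt;/code&gt; early in a CI step (and exiting non-zero on &lt;code&gt;False&lt;/code&gt;) surfaces version drift before any dependency installation happens.&lt;/p&gt;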

&lt;h2&gt;
  
  
  Dependency management with Poetry and the &lt;code&gt;pyproject.toml&lt;/code&gt; file
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://python-poetry.org/docs" rel="noopener noreferrer"&gt;Poetry&lt;/a&gt; is a tool for dependency management and packaging in Python. It allows you to declare the packages your project depends on and it will manage (install/update) them for you. Poetry offers a lockfile to ensure repeatable installs, and can build your project for distribution.&lt;/p&gt;

&lt;p&gt;It uses the &lt;code&gt;pyproject.toml&lt;/code&gt; file (which we'll explore next) as its source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for DevX:&lt;/strong&gt; With &lt;code&gt;pyproject.toml&lt;/code&gt; and Poetry, you've eliminated the "works on my machine, but not on others" problem at the dependency level. Every developer and every CI/CD runner will install the exact same versions of every package, every time!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installing Poetry&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Install Poetry using Homebrew:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;poetry
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Configure Poetry to create virtual environments in the project directory (recommended for better DevX). This ensures that when you run &lt;code&gt;poetry install&lt;/code&gt;, it creates a &lt;code&gt;.venv&lt;/code&gt; folder directly in the project, making it easy to activate and manage:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry config virtualenvs.in-project &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The &lt;code&gt;pyproject.toml&lt;/code&gt; file
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pyproject.toml&lt;/code&gt; file is a standard introduced by &lt;a href="https://peps.python.org/pep-0518/" rel="noopener noreferrer"&gt;PEP 518&lt;/a&gt; that replaces the need for multiple configuration files (&lt;code&gt;setup.py&lt;/code&gt;, &lt;code&gt;requirements.txt&lt;/code&gt;, &lt;code&gt;setup.cfg&lt;/code&gt;, etc.) with one unified file. It uses the &lt;a href="https://toml.io/en/" rel="noopener noreferrer"&gt;TOML (Tom's Obvious, Minimal Language)&lt;/a&gt; format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Single source of truth:&lt;/strong&gt; All project configuration lives in one file.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Version constraints:&lt;/strong&gt; You can specify package versions according to Poetry's &lt;a href="https://python-poetry.org/docs/dependency-specification/#version-constraints" rel="noopener noreferrer"&gt;dependency specification and version constraints&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Deterministic builds:&lt;/strong&gt; Poetry generates a &lt;code&gt;poetry.lock&lt;/code&gt; file that pins every dependency—both direct (what you specify) and transitive (dependencies of your dependencies)—ensuring identical installs across environments.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Tool configuration:&lt;/strong&gt; You can configure multiple tools in the same file (no need for separate config files).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example &lt;code&gt;pyproject.toml&lt;/code&gt; file:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[project]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"PROJECT_NAME"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"PROJECT_DESCRIPTION"&lt;/span&gt;
&lt;span class="py"&gt;authors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;{name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;"YOUR_NAME",email&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"youremail@domain.com"&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;readme&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"README.md"&lt;/span&gt;

&lt;span class="c"&gt;# Production dependencies that your code needs to run.&lt;/span&gt;
&lt;span class="nn"&gt;[tool.poetry.dependencies]&lt;/span&gt;
&lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;"&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;3.11&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="py"&gt;snowflake-snowpark-python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.33.0"&lt;/span&gt; &lt;span class="c"&gt;# Snowflake Snowpark Python library&lt;/span&gt;
&lt;span class="py"&gt;pydantic&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2.11.7"&lt;/span&gt;                  &lt;span class="c"&gt;# Data validation library in Python&lt;/span&gt;

&lt;span class="c"&gt;# Development-only tools that aren't needed in production.&lt;/span&gt;
&lt;span class="nn"&gt;[tool.poetry.group.dev.dependencies]&lt;/span&gt;
&lt;span class="py"&gt;black&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^23.0.0"&lt;/span&gt;           &lt;span class="c"&gt;# Code formatter&lt;/span&gt;
&lt;span class="py"&gt;pylint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.0.0"&lt;/span&gt;           &lt;span class="c"&gt;# Linter for code quality&lt;/span&gt;
&lt;span class="py"&gt;isort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^5.13.2"&lt;/span&gt;           &lt;span class="c"&gt;# Import statement organiser&lt;/span&gt;
&lt;span class="py"&gt;poethepoet&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.27.0"&lt;/span&gt;      &lt;span class="c"&gt;# Task runner for simplifying development tasks&lt;/span&gt;
&lt;span class="py"&gt;pytest&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^8.1.2"&lt;/span&gt;           &lt;span class="c"&gt;# Testing framework for Python&lt;/span&gt;
&lt;span class="py"&gt;pytest-xdist&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.0.0"&lt;/span&gt;     &lt;span class="c"&gt;# Run tests in parallel for faster execution&lt;/span&gt;
&lt;span class="py"&gt;pytest-cov&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^5.0.0"&lt;/span&gt;       &lt;span class="c"&gt;# Generate code coverage reports&lt;/span&gt;

&lt;span class="nn"&gt;[build-system]&lt;/span&gt;
&lt;span class="py"&gt;requires&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;["poetry-core&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;"]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="py"&gt;build-backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"poetry.core.masonry.api"&lt;/span&gt;

&lt;span class="c"&gt;# Configure all your tools&lt;/span&gt;

&lt;span class="nn"&gt;[tool.black]&lt;/span&gt;
&lt;span class="py"&gt;line-length&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
&lt;span class="py"&gt;target-version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'py310'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[tool.isort]&lt;/span&gt;
&lt;span class="py"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"black"&lt;/span&gt;
&lt;span class="py"&gt;multi_line_output&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="nn"&gt;[tool.pylint]&lt;/span&gt;
&lt;span class="py"&gt;max-line-length&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;
&lt;span class="py"&gt;fail-under&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;9.5&lt;/span&gt;

&lt;span class="c"&gt;# Configure tasks for Poe the Poet&lt;/span&gt;
&lt;span class="nn"&gt;[tool.poe.tasks]&lt;/span&gt;
&lt;span class="c"&gt;# Private tasks (prefixed with _ to hide from the help menu)&lt;/span&gt;
&lt;span class="py"&gt;_format_black&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"black ."&lt;/span&gt;
&lt;span class="py"&gt;_format_isort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"isort ."&lt;/span&gt;
&lt;span class="py"&gt;_pylint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pylint src/"&lt;/span&gt;

&lt;span class="c"&gt;# Public tasks that compose the individual tools&lt;/span&gt;
&lt;span class="py"&gt;format&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"_format_black"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"_format_isort"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;lint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"_pylint"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;test&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pytest --cov -vv"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Installing dependencies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once your &lt;code&gt;pyproject.toml&lt;/code&gt; is set up, installing all dependencies (including dev dependencies) is a single command. It will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a virtual environment (in &lt;code&gt;.venv&lt;/code&gt; if you configured Poetry to do so).&lt;/li&gt;
&lt;li&gt;Install all dependencies (including dev dependencies).&lt;/li&gt;
&lt;li&gt;Generate or update &lt;code&gt;poetry.lock&lt;/code&gt; to ensure reproducible installs across environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For new projects where you haven't written code yet, you'll need to use the &lt;code&gt;--no-root&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-root&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;--no-root&lt;/code&gt; is needed initially:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you first create a project manually or with &lt;code&gt;poetry init&lt;/code&gt;, Poetry assumes you're building a package. If you run &lt;code&gt;poetry install&lt;/code&gt; without any code, you'll get an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Installing the current project: example-project (0.1.0)
Error: The current project could not be installed: No file/folder found for package example-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--no-root&lt;/code&gt; flag tells Poetry to skip installing your project as a package and only install the dependencies you've specified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you won't need &lt;code&gt;--no-root&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you've written code and added a &lt;code&gt;packages&lt;/code&gt; section to your &lt;code&gt;pyproject.toml&lt;/code&gt; file like the example below, you can use the standard &lt;code&gt;poetry install&lt;/code&gt; command (without &lt;code&gt;--no-root&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.poetry]&lt;/span&gt;
&lt;span class="py"&gt;packages&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;[{include&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;YOUR_PACKAGE_NAME&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;from&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src"&lt;/span&gt;&lt;span class="err"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuring VS Code (optional):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To use the project's virtual environment in VS Code / Cursor for IntelliSense, debugging, and running code in the IDE:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Press &lt;code&gt;Cmd+Shift+P&lt;/code&gt; (or &lt;code&gt;Ctrl+Shift+P&lt;/code&gt; on Windows/Linux)&lt;/li&gt;
&lt;li&gt;Type "Python: Select Interpreter"&lt;/li&gt;
&lt;li&gt;Select "Enter interpreter path"&lt;/li&gt;
&lt;li&gt;Enter the path to your project's virtual environment: &lt;code&gt;./&amp;lt;PROJECT_ROOT&amp;gt;/.venv/bin/python&lt;/code&gt; (adjust the path to match your project location)&lt;/li&gt;
&lt;li&gt;VS Code will now use the same Python environment as Poetry, giving you access to all installed packages and proper code completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Poe the Poet: Simplifying development tasks
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://poethepoet.natn.io/" rel="noopener noreferrer"&gt;Poe the Poet&lt;/a&gt; is a task runner that lets you define common development commands in your &lt;code&gt;pyproject.toml&lt;/code&gt; file. Instead of remembering long commands like &lt;code&gt;poetry run black . &amp;amp;&amp;amp; poetry run isort . &amp;amp;&amp;amp; poetry run pylint src/&lt;/code&gt;, you can create a simple alias and run &lt;code&gt;poetry run poe lint&lt;/code&gt;. See the &lt;code&gt;[tool.poe.tasks]&lt;/code&gt; section in the example &lt;code&gt;pyproject.toml&lt;/code&gt; file above for the configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistency:&lt;/strong&gt; Everyone on your team uses the same commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity:&lt;/strong&gt; &lt;code&gt;poe lint&lt;/code&gt; instead of remembering multiple flags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composability:&lt;/strong&gt; Chain tasks together (e.g., &lt;code&gt;lint&lt;/code&gt; runs &lt;code&gt;format&lt;/code&gt; then &lt;code&gt;pylint&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation:&lt;/strong&gt; Tasks are self-documenting in &lt;code&gt;pyproject.toml&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;You now have a solid foundation for local Snowflake Python development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Deterministic Python versions&lt;/strong&gt; with &lt;code&gt;pyenv&lt;/code&gt; and &lt;code&gt;.python-version&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Robust dependency management&lt;/strong&gt; with &lt;code&gt;Poetry&lt;/code&gt; and &lt;code&gt;pyproject.toml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Consistent tooling&lt;/strong&gt; configured in a single file&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Simplified task execution&lt;/strong&gt; with &lt;code&gt;Poe the Poet&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup eliminates the "it works on my machine, but not on others" problem at its source. Every developer on your team will have the exact same environment, the same dependencies, and the same tooling automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The DevX payoff:&lt;/strong&gt; By investing in these foundations, you're not just setting up tools, you're creating an environment where Data Engineers can focus on building features instead of fighting with configuration. This is how we democratize data development.&lt;/p&gt;

&lt;p&gt;I hope you find this guide helpful. If you have questions or feedback, I'd love to hear from you!&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataengineering</category>
      <category>snowflake</category>
    </item>
    <item>
      <title>How We Use OpenAI and Gemini Batch APIs to Qualify Thousands of Sales Leads</title>
      <dc:creator>Jag Thind</dc:creator>
      <pubDate>Tue, 09 Sep 2025 11:38:32 +0000</pubDate>
      <link>https://dev.to/superpayments/how-we-use-openai-and-gemini-batch-apis-to-qualify-thousands-of-sales-leads-2knk</link>
      <guid>https://dev.to/superpayments/how-we-use-openai-and-gemini-batch-apis-to-qualify-thousands-of-sales-leads-2knk</guid>
      <description>&lt;p&gt;The following blog details how the Data team used AI to solve a specific problem for our Marketing and Sales teams - &lt;strong&gt;Qualify 3000 websites (Salesforce Accounts) to determine if they are ecommerce and can take payments.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is broken down into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem at hand and what are we trying to solve?&lt;/li&gt;
&lt;li&gt;Process design&lt;/li&gt;
&lt;li&gt;Why use LLMs from 2 AI providers&lt;/li&gt;
&lt;li&gt;Prompt engineering and using prompt templates&lt;/li&gt;
&lt;li&gt;Scaling up using the OpenAI Batch API and Google batch predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;We implemented a batch data enrichment pipeline that uses OpenAI and Gemini Large Language Models (LLMs) via the &lt;a href="https://platform.openai.com/docs/guides/batch" rel="noopener noreferrer"&gt;OpenAI Batch API&lt;/a&gt; and &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini" rel="noopener noreferrer"&gt;Google Batch Predictions&lt;/a&gt; as a cost-effective way to enrich data using the power of LLMs.&lt;/p&gt;

&lt;p&gt;To ensure maximum accuracy and minimise the effects of hallucinations from the LLMs, we use a simple consensus system: each website is checked by both AIs &lt;strong&gt;3 times each&lt;/strong&gt;, and only results where they agree are accepted. Yes, this makes it more expensive, but we optimised for &lt;em&gt;time to value&lt;/em&gt; and getting good leads into the hands of the Sales team.&lt;/p&gt;

&lt;p&gt;We used a prompt template configured to use the &lt;a href="https://platform.openai.com/docs/guides/tools-web-search" rel="noopener noreferrer"&gt;web search tool&lt;/a&gt; to ground the LLM with real-time information about the website, overcoming the model's static knowledge cutoff date.&lt;/p&gt;

&lt;p&gt;We trained the Marketing team in writing effective prompts for the LLMs before we scaled up using the batch mode.&lt;/p&gt;

&lt;p&gt;A great example of tech and the business working together to achieve a shared outcome and spreading the use of AI in the business.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem at Hand
&lt;/h2&gt;

&lt;p&gt;The Marketing team periodically builds lists of potential merchants that can integrate Super as a payment method on their website checkout. These leads are then provided to Account Executives (AEs) to sign up.&lt;/p&gt;

&lt;p&gt;When assigned a website, the first thing AEs do is manually double-check &lt;em&gt;is the website ecommerce&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you buy products on the website?&lt;/li&gt;
&lt;li&gt;Is there a checkout on the website?&lt;/li&gt;
&lt;li&gt;Does it accept card payments?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ Many websites were not ecommerce ⚠️ resulting in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AEs wasting time doing manual checking&lt;/li&gt;
&lt;li&gt;Many leads getting dis-qualified at the top of the sales funnel&lt;/li&gt;
&lt;li&gt;AEs getting frustrated with leads they were assigned&lt;/li&gt;
&lt;li&gt;AEs resorting to self-sourcing leads and taking them away from their core responsibilities of closing deals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What are we trying to solve?
&lt;/h3&gt;

&lt;p&gt;Questions we asked ourselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can we increase the number of leads at the top of the sales funnel?&lt;/li&gt;
&lt;li&gt;Can we automate the &lt;em&gt;is ecommerce&lt;/em&gt; check instead of manually qualifying each website?&lt;/li&gt;
&lt;li&gt;Can we scale this check across &lt;em&gt;N&lt;/em&gt; (hundreds/thousands) websites?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Process Design
&lt;/h2&gt;

&lt;p&gt;Before we dive into the details of prompt engineering and how the batch pipeline works, the diagrams below illustrate the process and its two parts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffa17ev2njumarmfekk2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffa17ev2njumarmfekk2a.png" alt="Process Design 1" width="648" height="764"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dnzo4lipqpmf9ylv7e8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dnzo4lipqpmf9ylv7e8.png" alt="Process Design 2" width="800" height="734"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why use LLMs from 2 AI Providers?
&lt;/h2&gt;

&lt;p&gt;Even though it was more costly to do so, we needed to be confident in the accuracy of what we were telling the AEs in the Sales team. Instead of relying on a single AI, we used LLMs from two different AI providers, then based our final decision on their consensus.&lt;/p&gt;

&lt;p&gt;Think of it like getting a second opinion from a trusted expert. If two independent specialists examine the same data and come to the same conclusion, your confidence in that outcome increases dramatically.&lt;/p&gt;

&lt;p&gt;Some benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Accuracy Through Consensus&lt;/code&gt;: The core of our strategy is built on consensus. An ecommerce qualification is only confirmed if both LLMs independently agree. This simple but powerful rule acts as a powerful filter, significantly reducing the risk of a single LLM making a mistake, hallucinating, or misinterpreting a site.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Mitigating Model-Specific Weaknesses&lt;/code&gt;: Every LLM has its own unique architecture, training data, and inherent biases. One LLM might be brilliant at identifying traditional retail sites but struggle with subscription services, while the other might have the opposite strengths. Using a single LLM means you also inherit all of its blind spots. By using two, we diversify our "cognitive portfolio," allowing the strengths of one LLM to compensate for the weaknesses of the other, leading to a more balanced and consistently accurate outcome.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Automatic Quality Control&lt;/code&gt;: Perhaps the most valuable benefit is what happens when the LLMs disagree. A disagreement is a critical signal. It tells us that a website is ambiguous, an edge case, or complex in a way that could have easily fooled a single AI. Our system automatically flags these disagreements for manual review.&lt;/li&gt;
&lt;/ul&gt;
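
&lt;p&gt;The consensus rule is simple enough to sketch in a few lines of Python. The helper names and the strict-majority policy below are illustrative assumptions, not our production code:&lt;/p&gt;

```python
# Illustrative sketch of the consensus rule described above -- the function
# names and the exact acceptance policy are assumptions, not production code.
from collections import Counter


def provider_verdict(answers):
    """Majority vote over one provider's repeated runs (e.g. 3 'Y'/'N' answers).

    Returns the winning answer, or None when no strict majority exists.
    """
    vote, count = Counter(answers).most_common(1)[0]
    if count * 2 > len(answers):  # require a strict majority
        return vote
    return None


def consensus(openai_answers, gemini_answers):
    """Accept a qualification only when both providers independently agree.

    Returns 'Y' or 'N' on agreement, or None to flag the site for manual review.
    """
    a = provider_verdict(openai_answers)
    b = provider_verdict(gemini_answers)
    if a is not None and a == b:
        return a
    return None
```

&lt;p&gt;With this policy, cross-provider agreement like &lt;code&gt;consensus(["Y", "Y", "Y"], ["Y", "Y", "N"])&lt;/code&gt; is accepted, while any disagreement between providers returns &lt;code&gt;None&lt;/code&gt; and is routed to manual review.&lt;/p&gt;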




&lt;h2&gt;
  
  
  Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;Prompt engineering is the process of writing effective instructions for an LLM, such that it consistently generates content that meets your requirements.&lt;/p&gt;

&lt;p&gt;We used the &lt;a href="https://platform.openai.com/" rel="noopener noreferrer"&gt;OpenAI developer platform&lt;/a&gt; to iteratively develop a &lt;a href="https://platform.openai.com/docs/guides/text?api-mode=responses#reusable-prompts" rel="noopener noreferrer"&gt;reusable prompt&lt;/a&gt; template that could be used in the &lt;a href="https://platform.openai.com/docs/api-reference/responses/create" rel="noopener noreferrer"&gt;responses API&lt;/a&gt;. The platform allows testing different versions of a prompt side-by-side to evaluate changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages of using a Prompt Template
&lt;/h3&gt;

&lt;p&gt;You can use variables via &lt;code&gt;{{placeholder}}&lt;/code&gt; and your integration code remains the same, e.g.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = client.responses.create(
    model="gpt-4.1",
    prompt={
        "id": "pmpt_abc123",
        "version": "2",
        "variables": {
            "website_url": "xyz.com"
        }
    }
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also configure the prompt to use the &lt;a href="https://platform.openai.com/docs/guides/tools-web-search" rel="noopener noreferrer"&gt;web search tool&lt;/a&gt; to allow the LLM to search the web for the latest information before generating a response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "type": "web_search_preview",
    "user_location": {
        "type": "approximate",
        "country": "GB",
        "search_context_size": "high",
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Prompt Template
&lt;/h3&gt;

&lt;p&gt;The Marketing team produced a prompt template with clear instructions for the LLM to check whether a &lt;em&gt;single&lt;/em&gt; website URL is an ecommerce site.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please research the website {{url}} provided by the user. You must only return the data requested in the "InformationRequested" section and in a format according to the "OutputFormat" section. Do not include any explanations, reasoning, or commentary.

## InformationRequested
- url: {{url}}
- is_url_valid: Y/N - Is the URL valid and accessible?
- is_ecommerce: Y/N - You MUST use rules from section "Evaluation Rules for column is_ecommerce"

## OutputFormat
Output as JSON with the following fields. Do not include markdown around the JSON:
- url
- is_url_valid
- is_ecommerce

## Evaluation Rules for column is_ecommerce

*Mark "Y" only if all of the following are true, based on explicit evidence available*:
* rule 1
* rule 2
* etc

*Mark "N" in any of the following cases*:
* rule 1
* rule 2
* etc

## Final Reminder

- You must only return the data requested in the "InformationRequested" section.
- You must only return it in the format according to the "OutputFormat" section.
- You must not include any explanations, reasoning, or commentary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
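&lt;p&gt;Because the prompt pins down the output contract (bare JSON, three fields, Y/N values), each response can be validated defensively before it touches Salesforce. A minimal sketch of such a check (the validation logic is ours; only the field names come from the prompt above):&lt;/p&gt;

```python
import json

REQUIRED_FIELDS = {"url", "is_url_valid", "is_ecommerce"}


def parse_verdict(raw_text: str) -> dict:
    """Parse an LLM reply and enforce the OutputFormat contract from the prompt."""
    data = json.loads(raw_text)  # raises ValueError if the model wrapped the JSON in markdown
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field in ("is_url_valid", "is_ecommerce"):
        if data[field] not in ("Y", "N"):
            raise ValueError(f"{field} must be Y or N, got {data[field]!r}")
    return data


verdict = parse_verdict('{"url": "xyz.com", "is_url_valid": "Y", "is_ecommerce": "N"}')
print(verdict["is_ecommerce"])  # N
```

&lt;p&gt;Replies that fail validation can be routed to the same manual-review queue as LLM disagreements.&lt;/p&gt;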






&lt;h2&gt;
  
  
  Scaling it up - OpenAI Batch API
&lt;/h2&gt;

&lt;p&gt;OpenAI has a &lt;a href="https://platform.openai.com/docs/guides/batch" rel="noopener noreferrer"&gt;Batch API&lt;/a&gt; that allows you to send asynchronous groups of requests with 50% lower costs, a separate pool of significantly higher rate limits, and a clear 24-hour turnaround time. The workflow is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg4da93exwhgb5py4eu2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg4da93exwhgb5py4eu2.png" alt="OpenAI Batch API Workflow" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The uploaded batch file containing the requests will have one line per website, as below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"custom_id": "request-[1756480801.159196]-xyz.com", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1", "input": "Run the following prompt", "prompt": {"id": "pmpt_XXX", "version": "2", "variables": {"url": "xyz.com"}}}}
{"custom_id": "request-[1756480802.1434196]-abc.com", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1", "input": "Run the following prompt", "prompt": {"id": "pmpt_XXX", "version": "2", "variables": {"url": "abc.com"}}}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
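&lt;p&gt;Building that file and submitting it takes only a few calls with the official &lt;code&gt;openai&lt;/code&gt; Python SDK. A sketch matching the request format above (the helper names and file path are illustrative):&lt;/p&gt;

```python
import json


def build_request(url: str, prompt_id: str, ts: float) -> dict:
    """One line of the batch input JSONL file, matching the format shown above."""
    return {
        "custom_id": f"request-[{ts}]-{url}",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-4.1",
            "input": "Run the following prompt",
            "prompt": {"id": prompt_id, "version": "2", "variables": {"url": url}},
        },
    }


def submit_batch(path: str) -> str:
    """Upload the JSONL file and start a batch job; returns the batch id."""
    from openai import OpenAI  # official SDK; needs OPENAI_API_KEY set

    client = OpenAI()
    with open(path, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/responses",
        completion_window="24h",
    )
    return batch.id


line = json.dumps(build_request("xyz.com", "pmpt_XXX", 1.5))
```

&lt;p&gt;Once the job reports &lt;code&gt;completed&lt;/code&gt;, the results are downloaded from the batch's output file.&lt;/p&gt;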



&lt;p&gt;The benefits of this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Significant Cost Reduction&lt;/code&gt;: The 50% discount on pricing is a major advantage for processing thousands of URLs, leading to substantial cost savings compared to using the real-time API.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Increased Throughput&lt;/code&gt;: The much higher rate limits allow for processing a large volume of requests in parallel, drastically reducing the overall time it takes to enrich a large dataset.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Asynchronous "Fire-and-Forget" Workflow&lt;/code&gt;: You can submit a large batch job and not have to wait for it to complete. This is perfect for non-time-sensitive, offline processing tasks, as you can retrieve the results later without keeping a connection open.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Simplified Client-Side Logic&lt;/code&gt;: It removes the need for you to build and maintain complex logic to handle rate limiting, concurrent requests, and retries. You simply prepare and upload a file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Enhanced Resilience and Error Handling&lt;/code&gt;: Since requests are independent, the success or failure of one doesn't impact others. The output file clearly indicates the status of each request, making it easy to identify and retry only the failed ones.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Up to date context&lt;/code&gt;: The prompt template is configured to use the &lt;a href="https://platform.openai.com/docs/guides/tools-web-search" rel="noopener noreferrer"&gt;web search tool&lt;/a&gt; to ground the LLM with real-time information about the website. This search is performed independently for each website.&lt;/li&gt;
&lt;/ul&gt;
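&lt;p&gt;That last point is easy to act on: each line of the batch output file carries the request's &lt;code&gt;custom_id&lt;/code&gt;, a &lt;code&gt;response&lt;/code&gt; with its HTTP status code, and an &lt;code&gt;error&lt;/code&gt; field, so splitting successes from retries is a few lines of code. A sketch (the function name is ours):&lt;/p&gt;

```python
import json


def split_results(output_jsonl: str):
    """Split Batch API output lines into successful records and custom_ids to retry."""
    succeeded, to_retry = [], []
    for line in output_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") is None and response.get("status_code") == 200:
            succeeded.append(record)
        else:
            to_retry.append(record["custom_id"])
    return succeeded, to_retry
```

&lt;p&gt;The &lt;code&gt;custom_id&lt;/code&gt;s in the retry list map straight back to lines in the original input file, so a retry batch is just a filtered copy of it.&lt;/p&gt;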




&lt;h2&gt;
  
  
  Scaling it up - Google Batch Predictions
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini" rel="noopener noreferrer"&gt;Google Batch Predictions&lt;/a&gt; also allows you to generate predictions from Gemini models using a &lt;em&gt;Batch Job&lt;/em&gt;, the workflow is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c0oxojcpbag5emls3lg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c0oxojcpbag5emls3lg.png" alt="Google Batch Predictions Workflow" width="800" height="116"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similar to OpenAI, the batch job file contains one request per line, but you cannot use a prompt template, so each request in the file carries the fully rendered prompt. Also, Gemini's web search tools are not available via Batch Predictions, but we still found the results to be accurate.&lt;/p&gt;
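&lt;p&gt;With no prompt template available, each line of the batch input file embeds the fully rendered prompt in a Gemini &lt;code&gt;request&lt;/code&gt; body. A sketch of building one line (the request shape follows the Vertex AI batch prediction docs as we understand them; the template text here is a shortened stand-in for the full prompt shown earlier):&lt;/p&gt;

```python
import json

# Shortened stand-in for the full ecommerce-check prompt shown earlier.
PROMPT_TEMPLATE = "Please research the website {url} and return JSON with url, is_url_valid and is_ecommerce."


def build_gemini_request(url: str) -> str:
    """One JSONL line for a Gemini batch prediction job, with the prompt inlined."""
    body = {
        "request": {
            "contents": [
                {"role": "user", "parts": [{"text": PROMPT_TEMPLATE.format(url=url)}]}
            ]
        }
    }
    return json.dumps(body)


print(build_gemini_request("xyz.com"))
```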




&lt;h2&gt;
  
  
  Where we Ended Up
&lt;/h2&gt;

&lt;p&gt;We now have a repeatable way to enrich data for a large number of websites using the power of LLMs. We have already started using it to conduct other checks.&lt;/p&gt;

&lt;p&gt;The Salesforce Accounts we enriched with &lt;code&gt;is_ecommerce = Y/N&lt;/code&gt; were used to create a better-qualified list at the top of the sales funnel.&lt;/p&gt;

&lt;p&gt;AEs were no longer reporting websites as &lt;strong&gt;not&lt;/strong&gt; ecommerce.&lt;/p&gt;

&lt;p&gt;A job well done by AI and Humans!&lt;/p&gt;

</description>
      <category>openai</category>
      <category>gemini</category>
      <category>ai</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Improve DBT Incremental Performance on Snowflake using Custom Incremental Strategy</title>
      <dc:creator>Jag Thind</dc:creator>
      <pubDate>Thu, 29 May 2025 11:34:29 +0000</pubDate>
      <link>https://dev.to/superpayments/improve-dbt-incremental-performance-on-snowflake-using-custom-incremental-strategy-3ag3</link>
      <guid>https://dev.to/superpayments/improve-dbt-incremental-performance-on-snowflake-using-custom-incremental-strategy-3ag3</guid>
      <description>&lt;p&gt;The following presents how to improve the performance of the DBT built-in &lt;code&gt;delete-insert&lt;/code&gt; &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy" rel="noopener noreferrer"&gt;incremental strategy&lt;/a&gt; on &lt;strong&gt;snowflake&lt;/strong&gt; so we can control snowflake query costs. It is broken down into:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Defining the problem, with supporting performance statistics&lt;/li&gt;
&lt;li&gt;Desired solution requirements&lt;/li&gt;
&lt;li&gt;Solution implementation, with supporting performance statistics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We implemented a DBT &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#custom-strategies" rel="noopener noreferrer"&gt;custom incremental strategy&lt;/a&gt;, along with &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#about-incremental_predicates" rel="noopener noreferrer"&gt;incremental predicates&lt;/a&gt;, to improve Snowflake query performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced MBs scanned by ~99.68%&lt;/li&gt;
&lt;li&gt;Reduced &lt;a href="https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions#what-are-micro-partitions" rel="noopener noreferrer"&gt;micro-partitions&lt;/a&gt; scanned by ~99.53%&lt;/li&gt;
&lt;li&gt;Reduced query time from &lt;code&gt;19&lt;/code&gt; seconds to &lt;code&gt;1.3&lt;/code&gt; seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Less data is scanned, so the Snowflake warehouse spends less time waiting on I/O and the query completes faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Custom incremental strategies and incremental predicates are more advanced uses of DBT for incremental processing. But I suppose that’s where you have the most fun, so let’s get stuck in!&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;When using the DBT built-in &lt;code&gt;delete+insert&lt;/code&gt; &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy" rel="noopener noreferrer"&gt;incremental strategy&lt;/a&gt; on large volumes of data, you can get inefficient queries on Snowflake when the &lt;code&gt;delete&lt;/code&gt; statement is executed. This means queries take longer and &lt;strong&gt;increase warehouse costs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Taking an example target table:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With &lt;code&gt;~458 million&lt;/code&gt; rows&lt;/li&gt;
&lt;li&gt;Is &lt;code&gt;~26 GB&lt;/code&gt; in size&lt;/li&gt;
&lt;li&gt;Has &lt;code&gt;~2560&lt;/code&gt; micro-partitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a DBT model that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is running every &lt;code&gt;30 minutes&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Typically there are &lt;code&gt;~100K&lt;/code&gt; rows to merge into the target table on every run. As data can arrive out-of-order, a subsequent run will pick it up, which means a run can include rows that were already processed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With DBT model config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: model_name
    config:
      materialized: "incremental"
      incremental_strategy: "delete+insert"
      on_schema_change: "append_new_columns"
      unique_key: ["dw_order_created_skey"] # varchar(100)
      cluster_by: ["to_date(order_created_at)"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default &lt;code&gt;delete&lt;/code&gt; SQL generated by DBT, before it inserts data in the same transaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delete from target_table as DBT_INTERNAL_DEST
where (dw_order_created_skey) in (
  select distinct dw_order_created_skey
  from source_temp_table as DBT_INTERNAL_SOURCE
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Statistics
&lt;/h3&gt;

&lt;p&gt;To find the rows in the target table to delete with the matching &lt;code&gt;dw_order_created_skey&lt;/code&gt; (see node profile overview image below), Snowflake has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan &lt;code&gt;~11 GB&lt;/code&gt; of the target table&lt;/li&gt;
&lt;li&gt;Scan all &lt;code&gt;~2560 micro-partitions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Query takes &lt;code&gt;~19 seconds&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; - The query does &lt;strong&gt;not&lt;/strong&gt; filter on &lt;code&gt;order_created_at&lt;/code&gt;, so Snowflake cannot use the &lt;code&gt;clustering key&lt;/code&gt; of &lt;code&gt;to_date(order_created_at)&lt;/code&gt; to prune micro-partitions when finding the matching rows to delete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimzno0g6qh0rjiv3pglt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimzno0g6qh0rjiv3pglt.png" alt="delete query plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4xw5sqxxzu9mmj7mq11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx4xw5sqxxzu9mmj7mq11.png" alt="node profile overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Desired Solution
&lt;/h2&gt;

&lt;p&gt;To limit the data read from the target table above, we can make use of &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#about-incremental_predicates" rel="noopener noreferrer"&gt;incremental_predicates&lt;/a&gt; in the model config. This adds SQL that filters the target table.&lt;/p&gt;

&lt;p&gt;DBT model config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: model_name
    config:
      materialized: "incremental"
      incremental_strategy: "delete+insert"
      on_schema_change: "append_new_columns"
      unique_key: ["dw_order_created_skey"]
      cluster_by: ["to_date(order_created_at)"]
      incremental_predicates:
        - "order_created_at &amp;gt;= (select dateadd(hour, -24, min(order_created_at)) from DBT_INTERNAL_SOURCE)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Issues with this&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#about-incremental_predicates" rel="noopener noreferrer"&gt;incremental_predicates&lt;/a&gt; docs state &lt;em&gt;dbt does not check the syntax of the SQL statements&lt;/em&gt;, so the predicate is injected into the generated SQL verbatim.&lt;/li&gt;
&lt;li&gt;We get an error when it executes on Snowflake: &lt;code&gt;Object 'DBT_INTERNAL_SOURCE' does not exist or not authorized.&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;We cannot hardcode the Snowflake table name in the incremental_predicates, as it's dynamically generated by DBT.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Solution Implementation
&lt;/h2&gt;

&lt;p&gt;We need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-process each element of &lt;code&gt;incremental_predicates&lt;/code&gt; to replace &lt;code&gt;DBT_INTERNAL_SOURCE&lt;/code&gt; with the actual &lt;code&gt;source_temp_table&lt;/code&gt; name, so that DBT generates SQL like the below for better performance:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delete from target_table as DBT_INTERNAL_DEST
where (dw_order_created_skey) in (
  select distinct dw_order_created_skey
  from source_temp_table as DBT_INTERNAL_SOURCE
)
-- Added by incremental_predicates
and order_created_at &amp;gt;= (select dateadd(hour, -24, min(order_created_at)) from source_temp_table)
;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Continue to call the &lt;strong&gt;default&lt;/strong&gt; DBT &lt;code&gt;delete+insert&lt;/code&gt; incremental strategy with the new value for &lt;code&gt;incremental_predicates&lt;/code&gt; in the arguments dictionary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How&lt;/strong&gt; - The below macro implements a &lt;em&gt;lightweight&lt;/em&gt; &lt;a href="https://docs.getdbt.com/docs/build/incremental-strategy#custom-strategies" rel="noopener noreferrer"&gt;custom incremental strategy&lt;/a&gt; to do this. You can see at the end it calls the default &lt;code&gt;get_incremental_delete_insert_sql&lt;/code&gt; DBT code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% macro get_incremental_custom_delete_insert_sql(arg_dict) %}
  {% set custom_arg_dict = arg_dict.copy() %}
  {% set source_relation = custom_arg_dict.get('temp_relation') %}
  {% set target_relation = custom_arg_dict.get('target_relation') %}

  {% if source_relation is none %}
    {{ exceptions.raise_compiler_error('temp_relation is not present in arguments!') }}
  {% endif %}

  {% if target_relation is none %}
    {{ exceptions.raise_compiler_error('target_relation is not present in arguments!') }}
  {% endif %}

  {# Cast to string only after the presence checks; casting first would turn a
     missing relation into the string 'None' and the checks would never fire #}
  {% set source = source_relation | string %}
  {% set target = target_relation | string %}

  {% set raw_predicates = custom_arg_dict.get('incremental_predicates', []) %}

  {% if raw_predicates is string %}
    {% set predicates = [raw_predicates] %}
  {% else %}
    {% set predicates = raw_predicates %}
  {% endif %}

  {% if predicates %}
    {% set replaced_predicates = [] %}
    {% for predicate in predicates %}
      {% set replaced = predicate
        | replace('DBT_INTERNAL_SOURCE', source)
        | replace('DBT_INTERNAL_DEST', target)
      %}
      {% do replaced_predicates.append(replaced) %}
    {% endfor %}
    {% do custom_arg_dict.update({'incremental_predicates': replaced_predicates}) %}
  {% endif %}

  {{ log('Calling get_incremental_delete_insert_sql with args: ' ~ custom_arg_dict, info=False) }}
  {{ get_incremental_delete_insert_sql(custom_arg_dict) }}
{% endmacro %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is now callable from the DBT model config by setting &lt;code&gt;incremental_strategy&lt;/code&gt; to &lt;code&gt;custom_delete_insert&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: model_name
    config:
      materialized: "incremental"
      incremental_strategy: "custom_delete_insert"
      on_schema_change: "append_new_columns"
      unique_key: ["dw_order_created_skey"]
      cluster_by: ["to_date(order_created_at)"]
      incremental_predicates:
        - "order_created_at &amp;gt;= (select dateadd(hour, -24, min(order_created_at)) from DBT_INTERNAL_SOURCE)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Improvement Statistics
&lt;/h3&gt;

&lt;p&gt;To find the &lt;code&gt;~100K&lt;/code&gt; rows to delete in the target table, Snowflake now only has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan &lt;code&gt;~35 MB&lt;/code&gt; of the target table, 11 GB → 35 MB = &lt;strong&gt;~99.68%&lt;/strong&gt; improvement&lt;/li&gt;
&lt;li&gt;Scan &lt;code&gt;12 micro-partitions&lt;/code&gt;, 2560 → 12 = &lt;strong&gt;~99.53%&lt;/strong&gt; improvement&lt;/li&gt;
&lt;li&gt;Query takes &lt;code&gt;~1.3 seconds&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
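&lt;p&gt;As a quick sanity check, the headline percentages follow directly from the scan sizes above (taking 1 GB as 1000 MB):&lt;/p&gt;

```python
# Scan sizes before and after the incremental predicate was added.
before_mb, after_mb = 11_000, 35        # ~11 GB -> ~35 MB
before_parts, after_parts = 2_560, 12   # micro-partitions scanned

mb_reduction = (1 - after_mb / before_mb) * 100
part_reduction = (1 - after_parts / before_parts) * 100

print(f"{mb_reduction:.2f}% less data, {part_reduction:.2f}% fewer micro-partitions")
# 99.68% less data, 99.53% fewer micro-partitions
```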

&lt;p&gt;Less data is scanned, so the Snowflake warehouse spends less time waiting on I/O and the query completes faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqdvdbdos892fe5urwru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqdvdbdos892fe5urwru.png" alt="delete query plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ieoslvjkw1e2j9ep52n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ieoslvjkw1e2j9ep52n.png" alt="node profile overview"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;If you're interested in hearing more about how we use DBT at &lt;a href="https://www.superpayments.com" rel="noopener noreferrer"&gt;Super Payments&lt;/a&gt;, feel free to reach out!&lt;/p&gt;

</description>
      <category>dbt</category>
      <category>snowflake</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
