<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: LETSQL</title>
    <description>The latest articles on DEV Community by LETSQL (@letsql).</description>
    <link>https://dev.to/letsql</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9199%2F7efafa02-8461-4b41-b0d1-2d9fcac23a56.png</url>
      <title>DEV Community: LETSQL</title>
      <link>https://dev.to/letsql</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/letsql"/>
    <language>en</language>
    <item>
      <title>Embrace the Power of Nix for Your Python + Rust Workflow</title>
      <dc:creator>Hussain Sultan</dc:creator>
      <pubDate>Wed, 24 Jul 2024 18:01:43 +0000</pubDate>
      <link>https://dev.to/letsql/embrace-the-power-of-nix-for-your-python-rust-workflow-2ho9</link>
      <guid>https://dev.to/letsql/embrace-the-power-of-nix-for-your-python-rust-workflow-2ho9</guid>
      <description>&lt;p&gt;Hello Dev.to community!&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://www.letsql.com/" rel="noopener noreferrer"&gt;LETSQL&lt;/a&gt;, we're passionate about delivering the best tools for data scientists and ML engineers. As part of our journey, we leverage Nix to streamline our development workflows. If you're working with Python and Rust, or simply curious about why Nix is a game-changer, read on!&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Nix?
&lt;/h3&gt;

&lt;p&gt;Nix is a powerful package manager that offers reproducibility, reliability, and isolation. Here’s why we think it’s great:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reproducible Builds&lt;/strong&gt;: Nix ensures that your development environment is consistent across different machines. With Nix, you can avoid the infamous "works on my machine" problem. Every developer and CI pipeline gets the same environment, leading to fewer surprises and smoother collaborations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Declarative Configuration&lt;/strong&gt;: Nix uses a declarative approach to specify dependencies. This means you can define your environment in a single configuration file, making it easy to version control and share. Whether you're working on a small script or a large project, managing dependencies becomes straightforward.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Isolation&lt;/strong&gt;: Nix provides isolated environments, preventing conflicts between dependencies. This is particularly useful when working on multiple projects with different requirements. No more worrying about version clashes or dependency hell!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
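
&lt;p&gt;As an illustration, a minimal &lt;code&gt;flake.nix&lt;/code&gt; for a declarative development shell might look like this (a hedged sketch, not our actual configuration; the package list and system are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      # every developer who enters this shell gets exactly this toolchain
      devShells.x86_64-linux.default = pkgs.mkShell {
        packages = [ pkgs.python3 pkgs.rustc pkgs.cargo ];
      };
    };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the whole environment lives in this one file, it can be version-controlled and shared like any other source file.&lt;/p&gt;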

&lt;h3&gt;
  
  
  Nix for Python + Rust Workflow
&lt;/h3&gt;

&lt;p&gt;Combining Python and Rust in a single project can be challenging, but Nix simplifies this process. Here’s how:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Single Source of Truth&lt;/strong&gt;: With Nix, you can manage both Python and Rust dependencies in a unified way. Define your dependencies in a &lt;code&gt;flake.nix&lt;/code&gt; file and let Nix handle the rest. This ensures that your Python packages and Rust crates are compatible and work seamlessly together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Easy Setup&lt;/strong&gt;: Setting up a development environment with Nix is as simple as running a single command. Once your configuration is in place, you can quickly spin up environments with all the necessary dependencies, saving valuable setup time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistent Development and Deployment&lt;/strong&gt;: By using Nix, you ensure that your development, testing, and production environments are identical. This reduces the risk of bugs caused by environmental discrepancies and makes deployments more predictable and reliable.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
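
&lt;p&gt;Concretely, once a &lt;code&gt;flake.nix&lt;/code&gt; is in place and flakes are enabled in your Nix configuration, setup really is a single command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# drop into a shell with every declared dependency available
nix develop

# or build the project's default package
nix build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;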

&lt;h3&gt;
  
  
  Crane Lib: Incremental Artifact Caching for Rust Projects
&lt;/h3&gt;

&lt;p&gt;We use &lt;a href="https://github.com/ipetkov/crane" rel="noopener noreferrer"&gt;Crane&lt;/a&gt;, a Nix library for building Cargo projects. One of Crane’s standout features is incremental artifact caching, which ensures you never build the same artifact twice. This greatly speeds up the build process and reduces redundant work.&lt;/p&gt;

&lt;p&gt;Here's how Crane fits into our &lt;code&gt;flake.nix&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Crane Library Integration&lt;/strong&gt;: We use Crane to manage our Rust builds. It allows for fine-grained control over the build process and ensures efficient use of cached artifacts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cargo Configuration&lt;/strong&gt;: The &lt;code&gt;Cargo.toml&lt;/code&gt; and &lt;code&gt;Cargo.lock&lt;/code&gt; files define our Rust dependencies. Crane leverages these files to manage the build process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build Command&lt;/strong&gt;: We use a custom build command for Maturin, integrated into our Nix build script to create Python wheels from our Rust code. This command ensures that all necessary artifacts are built and cached.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
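
&lt;p&gt;To sketch the idea (an abbreviated, hypothetical excerpt rather than our exact configuration), Crane lets you build dependency artifacts once and reuse them across crate rebuilds:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;craneLib = crane.mkLib pkgs;
src = craneLib.cleanCargoSource ./.;

# build third-party dependencies once; Nix caches the result
cargoArtifacts = craneLib.buildDepsOnly { inherit src; };

# rebuilds of the crate itself reuse the cached dependency artifacts
myCrate = craneLib.buildPackage { inherit src cargoArtifacts; };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;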

&lt;h3&gt;
  
  
  Maturin Build: Bridging Rust and Python
&lt;/h3&gt;

&lt;p&gt;Maturin is a fantastic tool for building and publishing Rust crates as Python packages. With Nix, integrating Maturin into our workflow is straightforward. Here's a snippet from our &lt;code&gt;flake.nix&lt;/code&gt; demonstrating the Maturin build setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;buildPhaseCargoCommand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;''&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;  &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;pkgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;maturin&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/bin/maturin build \&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    --offline \&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    --target-dir target \&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    --manylinux off \&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    --strip \&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    --release&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command ensures that our Rust crate is built and packaged as a Python wheel, ready for distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Poetry2nix: Seamless Python Dependency Management
&lt;/h2&gt;

&lt;p&gt;We also use Poetry2nix to manage our Python dependencies. Poetry2nix translates &lt;code&gt;pyproject.toml&lt;/code&gt; and &lt;code&gt;poetry.lock&lt;/code&gt; files into Nix expressions, allowing us to maintain our Python dependencies declaratively.&lt;/p&gt;

&lt;p&gt;Here’s how Poetry2nix is configured in our &lt;code&gt;flake.nix&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nv"&gt;poetry2nix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"github:nix-community/poetry2nix"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;nixpkgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;follows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"nixpkgs"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="o"&gt;...&lt;/span&gt;

&lt;span class="nv"&gt;commonPoetryArgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nv"&gt;projectDir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sx"&gt;./.&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nv"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;pySrc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nv"&gt;preferWheels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nv"&gt;python&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nv"&gt;groups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt; &lt;span class="s2"&gt;"test"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nv"&gt;myapp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;mkPoetryApplication&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;commonPoetryArgs&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nv"&gt;buildInputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;pkgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;optionals&lt;/span&gt; &lt;span class="nv"&gt;pkgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;stdenv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;isDarwin&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nv"&gt;pkgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;libiconv&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;overridePythonAttrs&lt;/span&gt; &lt;span class="nv"&gt;maturinOverride&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explore Our Configuration
&lt;/h2&gt;

&lt;p&gt;For a full example of how we set up our development environment using Nix, check out our &lt;a href="https://github.com/letsql/letsql/blob/main/flake.nix" rel="noopener noreferrer"&gt;flake.nix file&lt;/a&gt;. This configuration includes everything from Crane for Rust builds to Poetry2nix for Python dependencies, ensuring a seamless and reproducible development workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join Us on GitHub!
&lt;/h2&gt;

&lt;p&gt;We're always looking for contributors and feedback. If you find Nix as exciting as we do, check out our &lt;a href="https://github.com/letsql/letsql" rel="noopener noreferrer"&gt;LETSQL GitHub repository&lt;/a&gt; and give us a star! 🌟 Your support helps us continue building great tools for the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;Have questions or want to share your experience with Nix? Drop a comment below, or reach out to us on Twitter. We love hearing from you!&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

</description>
      <category>nix</category>
      <category>python</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to build a new Harlequin adapter with Poetry</title>
      <dc:creator>Hussain Sultan</dc:creator>
      <pubDate>Wed, 17 Jul 2024 18:59:00 +0000</pubDate>
      <link>https://dev.to/letsql/how-to-build-a-new-harlequin-adapter-with-poetry-2ld5</link>
      <guid>https://dev.to/letsql/how-to-build-a-new-harlequin-adapter-with-poetry-2ld5</guid>
      <description>&lt;p&gt;Welcome to the first post in LETSQL's tutorial series!&lt;/p&gt;

&lt;p&gt;In this blog post, we take a detour from our usual theme of data pipelines to demonstrate how to create and publish a Python package with Poetry, using a Harlequin adapter for DataFusion as the example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://harlequin.sh/" rel="noopener noreferrer"&gt;Harlequin&lt;/a&gt; is a TUI client for SQL databases known for its light-weight extensive support for SQL databases. It is a versatile tool for data exploration and analysis workflows. Harlequin provides an interactive SQL editor with features like autocomplete, syntax highlighting, and query history. It also has a results viewer that can display large result sets. However, Harlequin did not have a &lt;a href="https://datafusion.apache.org/" rel="noopener noreferrer"&gt;DataFusion&lt;/a&gt; adapter before. Thankfully, it was really easy to add one. &lt;/p&gt;

&lt;p&gt;In this post, we'll demonstrate these concepts by building a Harlequin adapter for DataFusion. Along the way, we'll cover Poetry's essential features, project setup, and the steps to publish your package on PyPI.&lt;/p&gt;

&lt;p&gt;To get the most out of this guide, you should have a basic understanding of &lt;a href="https://docs.python.org/3/library/venv.html" rel="noopener noreferrer"&gt;virtual environments&lt;/a&gt;, Python &lt;a href="https://docs.python.org/3/tutorial/modules.html" rel="noopener noreferrer"&gt;packages and modules&lt;/a&gt;, and &lt;a href="https://pip.pypa.io/en/stable/" rel="noopener noreferrer"&gt;pip&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our objectives are to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduce Poetry and its advantages&lt;/li&gt;
&lt;li&gt;Set up a project using Poetry&lt;/li&gt;
&lt;li&gt;Develop a Harlequin adapter for DataFusion&lt;/li&gt;
&lt;li&gt;Prepare and publish the package to PyPI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you'll have practical experience with Poetry and an understanding of modern Python package management.&lt;/p&gt;

&lt;p&gt;The code implemented in this post is available on &lt;a href="https://github.com/mesejo/datafusion-adapter" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and on &lt;a href="https://pypi.org/project/harlequin-datafusion/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Harlequin
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://harlequin.sh" rel="noopener noreferrer"&gt;Harlequin&lt;/a&gt; is a SQL IDE that runs in the terminal. It provides a powerful and feature-rich alternative to traditional command-line database tools, making it versatile for data exploration and analysis workflows.&lt;/p&gt;

&lt;p&gt;Some key things to know about Harlequin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Harlequin supports multiple &lt;a href="https://harlequin.sh/docs/adapters" rel="noopener noreferrer"&gt;database adapters&lt;/a&gt;, connecting you to DuckDB, SQLite, PostgreSQL, MySQL, and more.&lt;/li&gt;
&lt;li&gt;Harlequin provides an interactive SQL editor with features like autocomplete, syntax highlighting, and query history. It also has a results viewer that can display large result sets.&lt;/li&gt;
&lt;li&gt;Harlequin replaces traditional terminal-based database tools with a more powerful and user-friendly interface.&lt;/li&gt;
&lt;li&gt;Harlequin uses adapter plug-ins as a generic interface to any database.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  DataFusion
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;DataFusion is a fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.&lt;/p&gt;

&lt;p&gt;DataFusion offers SQL and Dataframe APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It ships with its own CLI; more information can be found &lt;a href="https://datafusion.apache.org/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Poetry
&lt;/h2&gt;

&lt;p&gt;Poetry is a modern, feature-rich tool that streamlines dependency management and packaging for Python projects, making development more deterministic and efficient.&lt;br&gt;
From the &lt;a href="https://python-poetry.org/docs/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on, and it will manage (install/update) them for you.&lt;br&gt;
Poetry offers a lockfile to ensure repeatable installs and can build your project for distribution.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Creating New Adapters for Harlequin
&lt;/h2&gt;

&lt;p&gt;A Harlequin adapter is a Python package that allows Harlequin to work with a database system.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An adapter is a Python package that declares an entry point in the harlequin.adapters group. That entry point should reference a subclass of the &lt;code&gt;HarlequinAdapter&lt;/code&gt; abstract base class.&lt;br&gt;
This allows Harlequin to discover installed adapters and instantiate a selected adapter at run-time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In addition to the &lt;code&gt;HarlequinAdapter&lt;/code&gt; class, the package must also provide implementations of &lt;code&gt;HarlequinConnection&lt;/code&gt; and &lt;code&gt;HarlequinCursor&lt;/code&gt;. A more detailed description can be found in this&lt;br&gt;
&lt;a href="https://harlequin.sh/docs/contributing/adapter-guide" rel="noopener noreferrer"&gt;guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Harlequin Adapter Template
&lt;/h3&gt;

&lt;p&gt;The first step in developing a Harlequin adapter is to generate a new repo from the existing &lt;a href="https://github.com/tconbeer/harlequin-adapter-template" rel="noopener noreferrer"&gt;&lt;code&gt;harlequin-adapter-template&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="(https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template)"&gt;GitHub templates&lt;/a&gt; are repositories that serve as starting points for new projects. They provide pre-configured files, structures, and settings that are copied to new repositories, allowing for quick project setup without the overhead of forking.&lt;br&gt;
This feature streamlines the process of creating consistent, well-structured projects based on established patterns.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;harlequin-adapter-template&lt;/code&gt; comes with a &lt;code&gt;poetry.lock&lt;/code&gt; file and a &lt;code&gt;pyproject.toml&lt;/code&gt; file, in addition to some boilerplate code for defining the required classes. &lt;/p&gt;
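
&lt;p&gt;One way to generate your repository from the template is with the GitHub CLI (the repository name below is just an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh repo create harlequin-datafusion \
  --template tconbeer/harlequin-adapter-template \
  --public --clone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
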
&lt;h2&gt;
  
  
  Coding the Adapter
&lt;/h2&gt;

&lt;p&gt;Let's explore the essential files needed for package distribution before we get into the specifics of coding. &lt;/p&gt;
&lt;h3&gt;
  
  
  Package configuration
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pyproject.toml&lt;/code&gt; file is now the standard for configuring Python packaging and related tooling. Introduced in &lt;a href="https://peps.python.org/pep-0518/" rel="noopener noreferrer"&gt;PEP 518&lt;/a&gt; and extended by &lt;a href="https://peps.python.org/pep-0621/" rel="noopener noreferrer"&gt;PEP 621&lt;/a&gt;, this &lt;a href="https://toml.io/" rel="noopener noreferrer"&gt;TOML&lt;/a&gt;-formatted file consolidates multiple configuration files into one, making dependency management more robust and standardized.&lt;/p&gt;

&lt;p&gt;Poetry utilizes &lt;code&gt;pyproject.toml&lt;/code&gt; to handle the project's virtual environment, resolve dependencies, and create packages.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;pyproject.toml&lt;/code&gt; of the template is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.poetry]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"harlequin-myadapter"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"A Harlequin adapter for &amp;lt;my favorite database&amp;gt;."&lt;/span&gt;
&lt;span class="py"&gt;authors&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"Ted Conbeer &amp;lt;tconbeer@users.noreply.github.com&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;license&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"MIT"&lt;/span&gt;
&lt;span class="py"&gt;readme&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"README.md"&lt;/span&gt;
&lt;span class="py"&gt;packages&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="err"&gt;{&lt;/span&gt; &lt;span class="py"&gt;include&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"harlequin_myadapter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;from&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"src"&lt;/span&gt; &lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[tool.poetry.plugins."harlequin.adapter"]&lt;/span&gt;
&lt;span class="py"&gt;my-adapter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"harlequin_myadapter:MyAdapter"&lt;/span&gt;

&lt;span class="nn"&gt;[tool.poetry.dependencies]&lt;/span&gt;
&lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;"&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.8&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="py"&gt;harlequin&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^1.7"&lt;/span&gt;

&lt;span class="nn"&gt;[tool.poetry.group.dev.dependencies]&lt;/span&gt;
&lt;span class="py"&gt;ruff&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.1.6"&lt;/span&gt;
&lt;span class="py"&gt;pytest&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^7.4.3"&lt;/span&gt;
&lt;span class="py"&gt;mypy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^1.7.0"&lt;/span&gt;
&lt;span class="py"&gt;pre-commit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.5.0"&lt;/span&gt;
&lt;span class="py"&gt;importlib_metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="py"&gt;"&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;4.6&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;", python = "&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mf"&gt;3.10&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="s"&gt;" }&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;build-system&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;requires&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"poetry-core"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;build-backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"poetry.core.masonry.api"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As can be seen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;[tool.poetry]&lt;/code&gt; section of the &lt;code&gt;pyproject.toml&lt;/code&gt; file is where you define the metadata for your Python package, such as the name, version, description, authors, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;[tool.poetry.dependencies]&lt;/code&gt; subsection is where you declare the runtime dependencies your project requires. Running &lt;code&gt;poetry add &amp;lt;package&amp;gt;&lt;/code&gt; will automatically update this section.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;[tool.poetry.group.dev.dependencies]&lt;/code&gt; subsection is where you declare development-only dependencies, like testing frameworks, linters, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;[build-system]&lt;/code&gt; section stores build-related data. In this case, it specifies the &lt;code&gt;build-backend&lt;/code&gt; as &lt;code&gt;"poetry.core.masonry.api"&lt;/code&gt;. In a narrow sense, the core responsibility of a build backend is to build wheels and sdists.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository also includes a &lt;code&gt;poetry.lock&lt;/code&gt; file, a Poetry-specific component generated by running &lt;code&gt;poetry install&lt;/code&gt; or &lt;code&gt;poetry update&lt;/code&gt;. This lock file specifies the exact versions of all dependencies and sub-dependencies for your project, ensuring reproducible installations across different environments.&lt;/p&gt;

&lt;p&gt;It's crucial to avoid manual edits to the &lt;code&gt;poetry.lock&lt;/code&gt; file, as this can cause inconsistencies and installation issues. Instead, make changes to your &lt;code&gt;pyproject.toml&lt;/code&gt; file and allow Poetry to automatically update the lock file by running &lt;code&gt;poetry lock&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Poetry
&lt;/h3&gt;

&lt;p&gt;Per Poetry's &lt;a href="https://python-poetry.org/docs/#installation" rel="noopener noreferrer"&gt;installation warning&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Poetry should always be installed in a dedicated virtual environment to isolate it from the rest of your system. It should in no case be installed in the environment of the project that is to be managed by Poetry.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here we will presume you have installed Poetry by running &lt;code&gt;pipx install poetry&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Developing in the virtual environment
&lt;/h3&gt;

&lt;p&gt;With our file structure clarified, let's begin the development process by setting up our environment. Since our project already includes &lt;code&gt;pyproject.toml&lt;/code&gt; and &lt;code&gt;poetry.lock&lt;/code&gt; files, we can initiate our environment using the &lt;code&gt;poetry shell&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;This command activates the virtual environment linked to the current Poetry project, ensuring all subsequent operations occur within the project's dependency context. If no virtual environment exists, &lt;code&gt;poetry shell&lt;/code&gt; automatically creates and activates one.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;poetry shell&lt;/code&gt; detects your current shell and launches a new instance within the virtual environment. As Poetry centralizes virtual environments by default, this command eliminates the need to locate or recall the specific path to the activate script.&lt;/p&gt;

&lt;p&gt;To verify which Python environment is currently in use with Poetry, you can use the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry &lt;span class="nb"&gt;env &lt;/span&gt;list &lt;span class="nt"&gt;--full-path&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will show all the virtual environments associated with your project and indicate which one is currently active.&lt;br&gt;
As an alternative, you can get the full path of only the current environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry &lt;span class="nb"&gt;env &lt;/span&gt;info &lt;span class="nt"&gt;-p&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the environment activated, use &lt;code&gt;poetry install&lt;/code&gt; to install the required dependencies. The command works as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If a &lt;code&gt;poetry.lock&lt;/code&gt; file is present, &lt;code&gt;poetry install&lt;/code&gt; will use the exact versions specified in that file rather than resolving the dependencies dynamically. This ensures consistent, repeatable installations across different environments.
i. If you run &lt;code&gt;poetry install&lt;/code&gt; and it doesn't seem to be progressing, you may need to run &lt;code&gt;export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring&lt;/code&gt; in the shell you're installing in.&lt;/li&gt;
&lt;li&gt;Otherwise, it reads the &lt;code&gt;pyproject.toml&lt;/code&gt; file in the current project, resolves the dependencies listed there, and installs them.&lt;/li&gt;
&lt;li&gt;If no &lt;code&gt;poetry.lock&lt;/code&gt; file exists, &lt;code&gt;poetry install&lt;/code&gt; will create one after resolving the dependencies.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To complete the environment setup, we need to add the &lt;code&gt;datafusion&lt;/code&gt; library to our dependencies. Execute the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry add datafusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command updates your &lt;code&gt;pyproject.toml&lt;/code&gt; file with the &lt;code&gt;datafusion&lt;/code&gt; package and installs it. If you don't specify a version, Poetry will automatically select an appropriate one based on available package versions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing the Interfaces
&lt;/h3&gt;

&lt;p&gt;To create a Harlequin Adapter, you need to implement three interfaces defined as abstract classes in the &lt;code&gt;harlequin.adapter&lt;/code&gt; module.&lt;/p&gt;

&lt;p&gt;The first one is the &lt;code&gt;HarlequinAdapter&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#| eval: false
#| code-fold: false
#| code-summary: implementation of HarlequinAdapter
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataFusionAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HarlequinAdapter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conn_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn_str&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFusionConnection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataFusionConnection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second one is the &lt;code&gt;HarlequinConnection&lt;/code&gt;, particularly the methods &lt;code&gt;execute&lt;/code&gt; and &lt;code&gt;get_catalog&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#| eval: false
#| code-fold: false
#| code-summary: implementation of execution of HarlequinConnection
&lt;/span&gt;
 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;HarlequinCursor&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# type: ignore
&lt;/span&gt;         &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logical_plan&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EmptyRelation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
             &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
     &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HarlequinQueryError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
             &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
             &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Harlequin encountered an error while executing your query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
     &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
             &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;DataFusionCursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
         &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
             &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For brevity, we've omitted the implementation of the &lt;code&gt;get_catalog&lt;/code&gt; function. You can find the full code in the &lt;a href="https://github.com/mesejo/datafusion-adapter/blob/f51ca365f13a83356d31de257ca3107ce3ce3544/src/harlequin_datafusion/adapter.py#L123"&gt;&lt;code&gt;adapter.py&lt;/code&gt;&lt;/a&gt; file within our GitHub repository.&lt;/p&gt;
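&lt;p&gt;For orientation, here is a heavily simplified sketch of the shape of a &lt;code&gt;get_catalog&lt;/code&gt; implementation: walk the catalog → schema → table → column hierarchy and nest the results. The real adapter builds Harlequin catalog items from DataFusion's session context; in the sketch below plain dictionaries stand in for both sides, and every name is illustrative rather than the adapter's actual API:&lt;/p&gt;

```python
# A runnable, simplified stand-in for get_catalog: the real adapter walks
# DataFusion's catalog -> schema -> table -> column hierarchy and returns
# Harlequin catalog items; here plain dicts model both sides, and all
# names below are illustrative.
def get_catalog(metadata: dict) -> list:
    """metadata: {catalog: {schema: {table: [(column, type), ...]}}}"""
    catalog_items = []
    for catalog, schemas in metadata.items():
        schema_items = []
        for schema, tables in schemas.items():
            table_items = []
            for table, columns in tables.items():
                column_items = [
                    {"label": name, "type_label": typ, "children": []}
                    for name, typ in columns
                ]
                table_items.append({"label": table, "children": column_items})
            schema_items.append({"label": schema, "children": table_items})
        catalog_items.append({"label": catalog, "children": schema_items})
    return catalog_items


catalog = get_catalog(
    {"datafusion": {"public": {"users": [("id", "#"), ("name", "s")]}}}
)
print(catalog[0]["children"][0]["children"][0]["label"])  # users
```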

&lt;p&gt;Finally, a &lt;code&gt;HarlequinCursor&lt;/code&gt; implementation must be provided as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#| eval: false
#| code-fold: false
#| code-summary: implementation of HarlequinCursor
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataFusionCursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HarlequinCursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_mapping&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFusionCursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AutoBackendType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_limit&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_arrow_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_limit&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_arrow_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HarlequinQueryError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Harlequin encountered an error while executing your query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Making the plugin discoverable
&lt;/h3&gt;

&lt;p&gt;Your adapter must register an entry point in the &lt;code&gt;harlequin.adapter&lt;/code&gt; group using the build tool you use to package your project. &lt;br&gt;
If you use Poetry, you can define the entry point in your &lt;code&gt;pyproject.toml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.poetry.plugins."harlequin.adapter"]&lt;/span&gt;
&lt;span class="py"&gt;datafusion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"harlequin_datafusion:DataFusionAdapter"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An &lt;a href="https://packaging.python.org/en/latest/specifications/entry-points/" rel="noopener noreferrer"&gt;entry point&lt;/a&gt; is a mechanism for code to advertise components it provides to be discovered and used by other code.&lt;/p&gt;
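&lt;p&gt;The mechanism is easy to see in isolation: an entry point is just a &lt;code&gt;name = "module:attribute"&lt;/code&gt; pair inside a named group, and loading it imports the module and returns the attribute. In the sketch below, the &lt;code&gt;demo&lt;/code&gt; group and &lt;code&gt;sqrt&lt;/code&gt; name are made up for illustration:&lt;/p&gt;

```python
import math
from importlib.metadata import EntryPoint

# Construct an entry point by hand, exactly as one parsed from package
# metadata would look: a name, a "module:attribute" value, and a group.
ep = EntryPoint(name="sqrt", value="math:sqrt", group="demo")

# load() imports the module and resolves the attribute.
loaded = ep.load()
assert loaded is math.sqrt
print(loaded(9.0))  # 3.0
```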

&lt;p&gt;Notice that registering a plugin with Poetry is equivalent to the following &lt;code&gt;pyproject.toml&lt;/code&gt; &lt;a href="https://packaging.python.org/en/latest/specifications/pyproject-toml/#entry-points" rel="noopener noreferrer"&gt;specification&lt;/a&gt; for entry points:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[project.entry-points."harlequin.adapter"]&lt;/span&gt;
&lt;span class="py"&gt;datafusion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"harlequin_datafusion:DataFusionAdapter"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;The template provides a set of pre-configured tests, some of which are applicable to DataFusion while others may not be relevant. One particularly useful test checks that the plugin can be discovered, which is crucial for ensuring proper integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#| eval: false
#| code-fold: false
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version_info&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;importlib_metadata&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;entry_points&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;importlib.metadata&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;entry_points&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_plugin_discovery&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;PLUGIN_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datafusion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;eps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;entry_points&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;harlequin.adapter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;eps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PLUGIN_NAME&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;adapter_cls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eps&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PLUGIN_NAME&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;issubclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;adapter_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HarlequinAdapter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;adapter_cls&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;DataFusionAdapter&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To make sure the tests are passing, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry run pytest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://python-poetry.org/docs/cli/#run" rel="noopener noreferrer"&gt;run command&lt;/a&gt; executes the given command inside the project’s virtualenv.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building and Publishing to PyPI
&lt;/h2&gt;

&lt;p&gt;With the tests passing, we're nearly ready to publish our project. Let's enhance our &lt;code&gt;pyproject.toml&lt;/code&gt; file to make our package more discoverable and appealing on PyPI. We'll add key metadata including:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A link to the GitHub repository&lt;/li&gt;
&lt;li&gt;A path to the README file&lt;/li&gt;
&lt;li&gt;A list of relevant classifiers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These additions will help potential users find and understand our package more easily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;classifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"Development Status :: 3 - Alpha"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Intended Audience :: Developers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Topic :: Software Development :: User Interfaces"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Topic :: Database :: Database Engines/Servers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"License :: OSI Approved :: MIT License"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"Programming Language :: Python :: Implementation :: CPython"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;readme&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"README.md"&lt;/span&gt;
&lt;span class="py"&gt;repository&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://github.com/mesejo/datafusion-adapter"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For reference:    &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The complete list of classifiers is available on &lt;a href="https://pypi.org/classifiers/" rel="noopener noreferrer"&gt;PyPI's website&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;For a detailed guide on writing &lt;code&gt;pyproject.toml&lt;/code&gt;, check out this &lt;a href="https://packaging.python.org/en/latest/guides/writing-pyproject-toml/" rel="noopener noreferrer"&gt;resource&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;The formal, technical specification for &lt;code&gt;pyproject.toml&lt;/code&gt; can be found on &lt;a href="https://packaging.python.org/en/latest/specifications/pyproject-toml/#" rel="noopener noreferrer"&gt;packaging.python.org&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building
&lt;/h2&gt;

&lt;p&gt;We're now ready to build our library and verify its functionality by installing it in a clean virtual environment. Let's start with the build process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will create distribution packages (both source and wheel) in the &lt;code&gt;dist&lt;/code&gt; directory. &lt;/p&gt;

&lt;p&gt;The wheel file should have a name like &lt;code&gt;harlequin_datafusion-0.1.1-py3-none-any.whl&lt;/code&gt;. This follows the standard &lt;a href="https://packaging.python.org/en/latest/specifications/binary-distribution-format/#file-name-convention" rel="noopener noreferrer"&gt;naming convention&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;harlequin_datafusion&lt;/code&gt; is the package (or distribution) name&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0.1.1&lt;/code&gt; is the version number&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;py3&lt;/code&gt; indicates it's compatible with Python 3&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;none&lt;/code&gt; indicates it does not depend on a specific &lt;a href="https://docs.python.org/3/c-api/stable.html" rel="noopener noreferrer"&gt;ABI&lt;/a&gt; (pure Python)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;any&lt;/code&gt; indicates it runs on any platform or CPU architecture&lt;/li&gt;
&lt;/ul&gt;
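&lt;p&gt;The convention above is mechanical enough to parse. As a sketch, the helper below splits a wheel filename into those five tags; note that it only handles the common five-part case and ignores the optional build tag the specification also allows:&lt;/p&gt;

```python
def parse_wheel_name(filename: str) -> dict:
    # {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl,
    # ignoring the optional build tag for simplicity.
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


tags = parse_wheel_name("harlequin_datafusion-0.1.1-py3-none-any.whl")
print(tags["abi_tag"])  # none
```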

&lt;p&gt;To test the installation, create a new directory called &lt;code&gt;test_install&lt;/code&gt; and navigate into it. Then, set up a fresh virtual environment with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To activate the virtual environment on MacOS or Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running this command, you should see the name of your virtual environment &lt;code&gt;(.venv)&lt;/code&gt; prepended to your command prompt, indicating that the virtual environment is now active.&lt;/p&gt;

&lt;p&gt;To install the wheel file we just built, use pip as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; /path/to/harlequin_datafusion-0.1.1-py3-none-any.whl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/path/to/harlequin_datafusion-0.1.1-py3-none-any.whl&lt;/code&gt; with the actual path to the wheel file you want to install.&lt;/p&gt;

&lt;p&gt;If everything works fine, you should see a few dependencies being installed, and you should then be able to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;harlequin &lt;span class="nt"&gt;-a&lt;/span&gt; datafusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congrats! You have built a Python library. Now it is time to share it with the world.&lt;/p&gt;

&lt;h3&gt;
  
  
  Publishing to PyPI
&lt;/h3&gt;

&lt;p&gt;The best practice before publishing to PyPI is to first publish to the Test Python Package Index (TestPyPI).&lt;/p&gt;

&lt;p&gt;To publish a package to TestPyPI using Poetry, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create an account at &lt;a href="https://test.pypi.org/" rel="noopener noreferrer"&gt;TestPyPI&lt;/a&gt;  if you haven't already.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate an &lt;a href="https://test.pypi.org/manage/account/token/" rel="noopener noreferrer"&gt;API token&lt;/a&gt; on your TestPyPI account page.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Register the TestPyPI repository with Poetry by running:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry config repositories.test-pypi https://test.pypi.org/legacy/
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To publish your package, run:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry publish &lt;span class="nt"&gt;-r&lt;/span&gt; testpypi &lt;span class="nt"&gt;--username&lt;/span&gt; __token__ &lt;span class="nt"&gt;--password&lt;/span&gt; &amp;lt;token&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Replace &lt;code&gt;&amp;lt;token&amp;gt;&lt;/code&gt; with the actual token value you generated in step 2. To verify the publishing process, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://test.pypi.org/simple/ &lt;span class="nt"&gt;--extra-index-url&lt;/span&gt; https://pypi.org/simple &amp;lt;package-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command uses two key arguments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--index-url&lt;/code&gt;: Directs pip to find your package on TestPyPI.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--extra-index-url&lt;/code&gt;: Allows pip to fetch any dependencies from the main PyPI repository.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Replace &lt;code&gt;&amp;lt;package-name&amp;gt;&lt;/code&gt; with your specific package name (e.g., &lt;code&gt;harlequin-datafusion&lt;/code&gt; if following this post). For additional details, consult the information provided in this &lt;a href="https://stackoverflow.com/questions/34514703/pip-install-from-pypi-works-but-from-testpypi-fails-cannot-find-requirements" rel="noopener noreferrer"&gt;post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To publish to the actual Python Package Index (PyPI) instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create an account at &lt;a href="https://pypi.org/" rel="noopener noreferrer"&gt;https://pypi.org/&lt;/a&gt; if you haven't already.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate an API token on your PyPI account page.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;poetry publish &lt;span class="nt"&gt;--username&lt;/span&gt; __token__ &lt;span class="nt"&gt;--password&lt;/span&gt; &amp;lt;token&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The default repository is PyPI, so there's no need to specify it. &lt;/p&gt;

&lt;p&gt;It's worth noting that Poetry only supports the &lt;a href="https://warehouse.pypa.io/api-reference/legacy.html#upload-api" rel="noopener noreferrer"&gt;Legacy Upload API&lt;/a&gt; when publishing your project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated Publishing on GitHub release
&lt;/h3&gt;

&lt;p&gt;Manually publishing each release is repetitive and error-prone, so let's create a GitHub Actions workflow that publishes the package every time we create a release.&lt;/p&gt;

&lt;p&gt;Here are the key steps to publish a Python package to PyPI using GitHub Actions and Poetry:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set up PyPI authentication&lt;/strong&gt;: You must provide your PyPI API token as a GitHub secret so the GitHub Actions workflow can access it. Name this secret something like &lt;code&gt;PYPI_TOKEN&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a GitHub Actions workflow file&lt;/strong&gt;: In your project's &lt;code&gt;.github/workflows&lt;/code&gt; directory, create a new file like &lt;code&gt;publish.yml&lt;/code&gt; with the following content:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and publish python package&lt;/span&gt;

   &lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;published&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

   &lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;publish-package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
       &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
       &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v4&lt;/span&gt;
           &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
             &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.10'&lt;/span&gt;

         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Poetry&lt;/span&gt;
           &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;snok/install-poetry@v1&lt;/span&gt;

         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;poetry config pypi-token.pypi "${{ secrets.PYPI_TOKEN }}"&lt;/span&gt;

         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Publish package&lt;/span&gt;
           &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;poetry publish --build --username __token__&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is to leverage GitHub Actions to automate the publishing process and use Poetry to manage your package's dependencies and metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Poetry is a user-friendly Python package management tool that simplifies project setup and publication. Its intuitive command-line interface streamlines environment management and dependency installation. It supports plugin development, integrates with other tools, and emphasizes testing for robust code. With straightforward commands for building and publishing packages, Poetry makes it easier for developers to share their work with the Python community.&lt;/p&gt;

&lt;p&gt;At LETSQL, we're committed to contributing to the developer community. We hope this blog post serves as a straightforward guide to developing and publishing Python packages, emphasizing best practices and providing valuable resources. &lt;br&gt;
To subscribe to our newsletter, visit &lt;a href="https://letsql.com/" rel="noopener noreferrer"&gt;letsql.com&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;As we continue to refine the adapter, we would like to provide better autocompletion and direct reading from files (Parquet, CSV), as in the DataFusion CLI. This requires tighter integration with the Rust library, without going through the Python bindings.&lt;/p&gt;

&lt;p&gt;Your thoughts and feedback are invaluable as we navigate this journey. Share your experiences, questions, or suggestions in the comments below or on our community forum. Let's redefine the boundaries of data science and machine learning integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Acknowledgements
&lt;/h2&gt;

&lt;p&gt;Thanks to Dan Lovell and Hussain Sultan for the comments and the thorough review.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>tui</category>
      <category>datafusion</category>
      <category>python</category>
    </item>
    <item>
      <title>Declarative Multi-Engine Data Stack with Ibis</title>
      <dc:creator>Hussain Sultan</dc:creator>
      <pubDate>Wed, 17 Jul 2024 18:26:24 +0000</pubDate>
      <link>https://dev.to/letsql/declarative-multi-engine-data-stack-with-ibis-3015</link>
      <guid>https://dev.to/letsql/declarative-multi-engine-data-stack-with-ibis-3015</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I recently came across the &lt;a href="https://substack.com/@juhache" rel="noopener noreferrer"&gt;Ju Data Engineering Newsletter&lt;/a&gt; by Julien Hurault on the &lt;a href="https://juhache.substack.com/p/multi-engine-data-stack-v0" rel="noopener noreferrer"&gt;multi-engine data stack&lt;/a&gt;. The idea is simple: we'd like to easily port our code across backends while retaining the flexibility to grow our pipeline as new backends and features are developed. This entails at least the following high-level workflows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Offloading parts of a SQL query to serverless engines such as DuckDB, Polars, DataFusion, &lt;a href="https://github.com/chdb-io/chdb" rel="noopener noreferrer"&gt;chdb&lt;/a&gt;, etc.&lt;/li&gt;
&lt;li&gt;Right-sizing the pipeline for various development and deployment scenarios; for example, developers can work locally and ship to production with confidence.&lt;/li&gt;
&lt;li&gt;Applying &lt;a href="https://www.letsql.com/posts/xgboost-end-to-end/" rel="noopener noreferrer"&gt;database-style optimizations&lt;/a&gt; to your pipelines automatically.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post, we dive into how we can implement the multi-engine pipeline from a programming language: instead of SQL, we propose using a DataFrame API that can be used for both interactive and batch use cases. Specifically, we show how to break up our pipeline into smaller pieces and execute them across DuckDB, pandas, and Snowflake. We also discuss the advantages of a multi-engine data stack and highlight emerging trends in the field.&lt;/p&gt;

&lt;p&gt;The code implemented in this post is available on &lt;a href="https://github.com/hussainsultan/multi-engine-stack-ibis" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; (to quickly try out the repo, I also provide a Nix flake). The reference implementation from the newsletter is &lt;a href="https://github.com/hachej/multi-engine-data-stack/tree/experientation-duck" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;The multi-engine data stack pipeline works as follows: data lands in an S3 bucket, gets preprocessed to remove duplicates, and is then loaded into a Snowflake table, where it is transformed further with ML or Snowflake-specific functions (note that we do not go into implementing the kinds of transformations possible in Snowflake and simply assume them as a requirement for the workflow). The pipeline takes orders as Parquet files that are saved to a &lt;code&gt;landing&lt;/code&gt; location, preprocessed, and then stored at the &lt;code&gt;staging&lt;/code&gt; location in an S3 bucket. The staging data is then loaded into Snowflake so that downstream BI tools can connect to it. The pipeline is tied together by &lt;code&gt;dbt&lt;/code&gt; SQL models, one for each backend, and the newsletter chooses Dagster as the orchestration tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2Fw_1456%2Cc_limit%2Cf_webp%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252F3bdaaee0-67d5-4e6c-bfac-824adb46d901_1822x1100.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2Fw_1456%2Cc_limit%2Cf_webp%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252F3bdaaee0-67d5-4e6c-bfac-824adb46d901_1822x1100.png" alt="Multi-Engine Data Stack (V0)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today, we are going to dive into how we can convert our pandas code to Ibis expressions, reproducing the complete example from Julien Hurault's multi-engine stack post&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. Instead of using &lt;code&gt;dbt&lt;/code&gt; models and SQL, we use &lt;code&gt;ibis&lt;/code&gt; and some Python to compile and orchestrate SQL engines from a shell. By rewriting our code as Ibis expressions, we can declaratively build our data pipelines with deferred execution. Moreover, Ibis supports over 20 backends, so we can write code once and port our &lt;code&gt;ibis.expr&lt;/code&gt;s to multiple backends. To simplify further, we leave scheduling and task orchestration&lt;sup id="fnref2"&gt;2&lt;/sup&gt;, which Dagster provides, up to the reader.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Concept of Multi-Engine Data Stack
&lt;/h2&gt;

&lt;p&gt;Here are the core concepts of the multi-engine data stack as outlined in Julien's newsletter:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fmulti-engine-data-stack.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fmulti-engine-data-stack.png" alt="multi-engine data stack"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Engine Data Stack:&lt;/strong&gt; The concept involves combining different data engines like Snowflake, Spark, DuckDB, and BigQuery. This approach aims to reduce costs, limit vendor lock-in, and increase flexibility. Julien mentions that for certain benchmark queries, using DuckDB could achieve a significant cost reduction compared to Snowflake.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development of a Cross-Engine Query Layer:&lt;/strong&gt; The newsletter highlights advancements in technology that allow data teams to transpile their SQL or Dataframe code from one engine to another seamlessly. This development is crucial for maintaining efficiency across different engines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use of Apache Iceberg and Alternatives:&lt;/strong&gt; While Apache Iceberg is seen as a potential unified storage layer, its integration is not yet mature enough to be used in a &lt;code&gt;dbt&lt;/code&gt; project. Instead, Julien has opted to use Parquet files stored in S3, accessed by both DuckDB and Snowflake, in his Proof of Concept (PoC).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration and Engines in PoC:&lt;/strong&gt; For the project, Julien used Dagster as the orchestrator, which simplifies the job scheduling of different engines within a &lt;code&gt;dbt&lt;/code&gt; project. The engines combined in this PoC were DuckDB and Snowflake.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why DataFrames and Ibis?
&lt;/h3&gt;

&lt;p&gt;While the pipeline above is nice for ETL and ELT, sometimes we want the power of a full programming language instead of a query language like SQL, e.g. for debugging, testing, complex &lt;a href="https://en.wikipedia.org/wiki/User-defined_function" rel="noopener noreferrer"&gt;UDFs&lt;/a&gt;, etc. For scientific exploration, interactive computing is essential, as data scientists need to quickly iterate on their code, visualize the results, and make decisions based on the data.&lt;/p&gt;

&lt;p&gt;DataFrames are such a data structure: they are used to process ordered data and apply compute operations on it interactively. They provide the flexibility to process large data with SQL-style operations, while also offering lower-level control to make cell-level edits, à la Excel sheets. Typically, the expectation is that all data is processed, and fits, in memory. Moreover, DataFrames make it easy to go back and forth between deferred/batch and interactive modes.&lt;/p&gt;

&lt;p&gt;DataFrames excel (no pun intended) at enabling users to apply user-defined functions, freeing them from the &lt;a href="https://www.scattered-thoughts.net/writing/against-sql" rel="noopener noreferrer"&gt;limitations of SQL&lt;/a&gt;: you can reuse code, test your operations, and easily extend the relational machinery for complex operations. DataFrames also make it easy to quickly go from a tabular representation of data to the arrays and tensors expected by machine learning libraries.&lt;/p&gt;
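&lt;p&gt;For instance, a DataFrame "UDF" is just a Python function, so it can be reused and unit-tested like any other code (a toy example with an illustrative &lt;code&gt;order_value&lt;/code&gt; function, not from the original repo):&lt;/p&gt;

```python
import pandas as pd

# a reusable, testable transformation: plain Python instead of dialect-specific SQL
def order_value(quantity: pd.Series, price: pd.Series) -> pd.Series:
    return quantity * price

df = pd.DataFrame({"quantity": [1, 2], "purchase_price": [10.0, 5.0]})
df["value"] = order_value(df["quantity"], df["purchase_price"])
```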

&lt;p&gt;Specialized, in-process databases, e.g. DuckDB for OLAP&lt;sup id="fnref3"&gt;3&lt;/sup&gt;, are blurring the boundary between a heavyweight remote database like Snowflake and an ergonomic library like pandas. We believe this is an opportunity to let DataFrames process larger-than-memory data while maintaining the interactivity and developer feel of a local Python shell, making larger-than-memory data feel small.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Deep Dive
&lt;/h2&gt;

&lt;p&gt;Our implementation focuses on the four concepts presented earlier:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Engine Data Stack&lt;/strong&gt;: We will use DuckDB, pandas, and Snowflake as our engines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Engine Query Layer&lt;/strong&gt;: We will use Ibis to write our expressions and compile them to run on DuckDB, pandas, and Snowflake.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Iceberg and Alternatives&lt;/strong&gt;: We will use Parquet files stored locally as our storage layer, with the expectation that it's trivial to extend to S3 using the &lt;code&gt;s3fs&lt;/code&gt; package.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration and Engines in PoC&lt;/strong&gt;: We will focus on fine-grained scheduling for engines and leave orchestration to the reader. Fine-grained scheduling is more aligned with Ray, Dask, or PySpark than with orchestration frameworks like Dagster or Airflow.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Implementing with &lt;code&gt;pandas&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fpandas-parquet-snowflake.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fpandas-parquet-snowflake.png" alt="pandas, parquet and Snowflake"&gt;&lt;/a&gt;&lt;br&gt;
pandas is the quintessential DataFrame library and perhaps provides the simplest way to implement the above workflow. First, we generate random data, borrowing from the implementation in the newsletter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#| echo: false
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;multi_engine_stack_ibis.generator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;generate_random_data&lt;/span&gt;
&lt;span class="nf"&gt;generate_random_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;landing/orders.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;landing/orders.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;deduped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_duplicates&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pandas implementation is imperative in style and assumes the data fits in memory. The pandas API, with all its nuances, is hard to compile down to SQL; it occupies its own special place, bridging Python visualization, plotting, machine learning, AI, and complex processing libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_pandas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deduped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T_ORDERS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;auto_create_table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quote_identifiers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;table_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temporary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After de-duplicating with pandas operators, we are ready to send the data to Snowflake. Snowflake's Python connector provides a &lt;code&gt;write_pandas&lt;/code&gt; helper that comes in handy for our use case.&lt;/p&gt;
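&lt;p&gt;For completeness, the snippet above assumes a &lt;code&gt;conn&lt;/code&gt; object and a &lt;code&gt;pt&lt;/code&gt; alias; a hedged sketch of that setup (placeholder credentials, assuming the &lt;code&gt;snowflake-connector-python&lt;/code&gt; package) might look like:&lt;/p&gt;

```python
import snowflake.connector
from snowflake.connector import pandas_tools as pt  # provides write_pandas

# placeholder credentials: substitute your own account details
conn = snowflake.connector.connect(
    account="...",
    user="...",
    password="...",
    database="MULTI_ENGINE",
    schema="PUBLIC",
    warehouse="COMPUTE_WH",
)
```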

&lt;h3&gt;
  
  
  Implementing with &lt;code&gt;Ibis&lt;/code&gt; aka Ibisify
&lt;/h3&gt;

&lt;p&gt;One limitation of pandas is that its API does not quite &lt;a href="https://vldb.org/pvldb/vol13/p2033-petersohn.pdf" rel="noopener noreferrer"&gt;map back to relational algebra&lt;/a&gt;. Ibis, built by the same people who built pandas, provides a sane expression system that can be mapped to multiple SQL backends. Ibis takes inspiration from the &lt;a href="https://dplyr.tidyverse.org/" rel="noopener noreferrer"&gt;dplyr&lt;/a&gt; R package to build an expression system that maps cleanly to relational algebra and thus compiles to SQL. It is also declarative in style, enabling us to apply database-style optimizations to the complete logical plan, i.e. the expression. Ibis is a key component for enabling composability, as highlighted in the excellent &lt;a href="https://voltrondata.com/codex/a-new-frontier#what-is-a-composable-data-system" rel="noopener noreferrer"&gt;composable codex&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#| echo: false
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ibis&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ibis.backends.pandas.executor&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ibis.expr.types.relations&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ibis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;multi_engine_stack_ibis.generator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;generate_random_data&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;multi_engine_stack_ibis.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MyExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;checkpoint_parquet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                           &lt;span class="n"&gt;create_table_snowflake&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                           &lt;span class="n"&gt;replace_unbound&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;multi_engine_stack_ibis.connections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;make_ibis_snowflake_connection&lt;/span&gt;



&lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backends&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pandas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PandasExecutor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MyExecutor&lt;/span&gt;
&lt;span class="nf"&gt;setattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkpoint_parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;checkpoint_parquet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;setattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create_table_snowflake&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_table_snowflake&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_backend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;p_staging&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging/staging.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;p_landing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;landing/orders.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;snow_backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;make_ibis_snowflake_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MULTI_ENGINE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PUBLIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;warehouse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COMPUTE_WH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_landing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mutate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="n"&gt;row_number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;row_number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;over&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;order_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;row_number&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;checkpoint_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_staging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_table_snowflake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T_ORDERS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;expr&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An Ibis expression prints itself as a plan akin to a traditional logical plan in databases. A logical plan is a tree of relational algebra operators that describes the computation to be performed. The plan is optimized by a query optimizer and converted into a physical plan that is executed by the query executor. Ibis expressions are similar to logical plans in that they describe the computation to be performed, but they are not executed immediately; instead, they are compiled into SQL and executed on the backend when needed. A logical plan is generally at a higher level of granularity than a DAG produced by a task-scheduling framework like Dask. In theory, this plan could be compiled down to Dask's DAG.&lt;/p&gt;

&lt;p&gt;While pandas is embedded and just a pip install away, it still has well-documented limitations, with plenty of performance improvements left on the table. This is where recent embedded databases like DuckDB fill the gap: they pack the full punch of a SQL engine, with all its optimizations and years of research behind it, yet are as easy to import as pandas. In this world, we can at minimum delegate all the relational and SQL parts of our pipeline to DuckDB and only hand the processed data to complex user-defined Python.&lt;/p&gt;

&lt;p&gt;Now, we are ready to take our Ibisified code and compile the expression above to execute on arbitrary engines, truly realizing the write-once-run-anywhere paradigm: we have successfully decoupled our compute engine from the expression system describing our computation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Engine Data Stack w/ Ibis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DuckDB + pandas + Snowflake
&lt;/h3&gt;

&lt;p&gt;Let's break our expression above into smaller parts and run them across DuckDB, pandas, and Snowflake. Note that once the data lands in Snowflake we do nothing further with it beyond showing that we can select it; what is possible with Snowflake-native features is left to the reader's imagination.&lt;/p&gt;

&lt;p&gt;Notice that our expression above is bound to the pandas backend. First, let's create an &lt;a href="https://ibis-project.org/how-to/extending/unbound_expression" rel="noopener noreferrer"&gt;UnboundTable&lt;/a&gt; expression so that we don't have to depend on a backend when writing our expressions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fbreaking-up-ibis-expressions.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fbreaking-up-ibis-expressions.png" alt="Decomposed Ibis Expressions"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int64&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int64&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;purchase_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float64&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;row_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int64&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;first_expr_for&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mutate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;row_number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;row_number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;over&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;group_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;order_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;row_number&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;first_expr_for&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
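&lt;p&gt;To make the window-function filter concrete before binding it to any backend: a minimal plain-pandas sketch of the same keep-first-row-per-order de-duplication (the rows and values here are made up for illustration, and only a subset of the columns above is used):&lt;/p&gt;

```python
import pandas as pd

# Toy orders data with a duplicated order_id (hypothetical values)
orders = pd.DataFrame(
    {
        "order_id": [1, 1, 2],
        "dt": pd.to_datetime(["2024-01-02", "2024-01-01", "2024-01-01"]),
        "sku": ["a", "b", "c"],
    }
)

# Equivalent of row_number().over(group_by=order_id, order_by=dt) == 0:
# sort by dt, then keep only the earliest row per order_id.
deduped = orders.sort_values("dt").drop_duplicates(subset="order_id", keep="first")
print(deduped)
```

&lt;p&gt;The Ibis expression above does the same thing declaratively, so the de-duplication can later run on whichever engine the expression is bound to.&lt;/p&gt;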



&lt;p&gt;Next, we replace the UnboundTable expression with the DuckDB backend and execute it with the &lt;code&gt;to_parquet&lt;/code&gt; method&lt;sup id="fnref4"&gt;4&lt;/sup&gt;. This step is covered by the &lt;code&gt;checkpoint_parquet&lt;/code&gt; operator that we added to the pandas backend above. Here is an &lt;a href="https://ibis-project.org/posts/into-snowflake/" rel="noopener noreferrer"&gt;excellent blog post&lt;/a&gt; that discusses inserting data into Snowflake from any Ibis backend with the &lt;code&gt;to_pyarrow&lt;/code&gt; functionality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;landing/orders.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;duck_backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duckdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;duck_backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE TABLE orders as SELECT * from data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;bind_to_duckdb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;replace_unbound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_expr_for&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;duck_backend&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;bind_to_duckdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_staging&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;to_sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind_to_duckdb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the above step creates the de-duplicated table, we can send the data to Snowflake using the pandas backend. This functionality is covered by the &lt;code&gt;create_table_snowflake&lt;/code&gt; operator that we added to the pandas backend above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;second_expr_for&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T_ORDERS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# nothing special just a reading the data from orders table
&lt;/span&gt;&lt;span class="n"&gt;snow_backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T_ORDERS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;second_expr_for&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pandas_backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pandas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T_ORDERS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_staging&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
&lt;span class="n"&gt;snow_backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T_ORDERS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pandas_backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_pyarrow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;second_expr_for&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we can select the data from the Snowflake table to verify that the data has been loaded successfully.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;third_expr_for&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ibis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;T_ORDERS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# add you Snowflake ML functions here
&lt;/span&gt;&lt;span class="n"&gt;third_expr_for&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fmulti-engine-data-stack-ibisified.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.letsql.com%2Fposts%2Fmulti-engine-data-stack-ibis%2Fimages%2Fmulti-engine-data-stack-ibisified.png" alt="Ibisified!"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We successfully broke our computation into pieces, albeit manually, and executed them across DuckDB, pandas, and Snowflake. This demonstrates the flexibility and power of a multi-engine data stack, allowing users to leverage the strengths of different engines to optimize their data processing pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Acknowledgments
&lt;/h2&gt;

&lt;p&gt;I'd like to thank Neal Richardson, Dan Lovell, and Daniel Mesejo for providing initial feedback on the post. I greatly appreciate the early review and encouragement from Wes McKinney.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://wesmckinney.com/blog/looking-back-15-years/" rel="noopener noreferrer"&gt;The Road to Composable Data Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://voltrondata.com/codex.html" rel="noopener noreferrer"&gt;The Composable Codex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arrow.apache.org/" rel="noopener noreferrer"&gt;Apache Arrow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Multi-Engine Data Stack Newsletter &lt;a href="https://juhache.substack.com/p/multi-engine-data-stack-v0" rel="noopener noreferrer"&gt;v0&lt;/a&gt; &lt;a href="https://juhache.substack.com/p/multi-engine-data-stack-v1?utm_source=profile&amp;amp;utm_medium=reader2" rel="noopener noreferrer"&gt;v1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ibis-project.org/" rel="noopener noreferrer"&gt;Ibis, the portable dataframe library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.getdbt.com/docs/collaborate/documentation" rel="noopener noreferrer"&gt;dbt Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.dagster.io/getting-started" rel="noopener noreferrer"&gt;Dagster Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lancedb.com/" rel="noopener noreferrer"&gt;LanceDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kuzudb.com/" rel="noopener noreferrer"&gt;KuzuDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://duckdb.org/" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;In this post, we have primarily focused on v0 of the multi-engine data stack. The latest version adds Apache Iceberg as a storage and data format layer, and uses NYC Taxi data instead of the random Orders data used here and in v0. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Orchestration vs. fine-grained scheduling: ↩&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orchestration is left to the reader; it can be done with a tool like Dagster, Prefect, or Apache Airflow.&lt;/li&gt;
&lt;li&gt;Fine-grained scheduling can be done with a tool like Dask, Ray, or Spark.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;Some examples of in-process databases are described in &lt;a href="https://thedataquarry.com/posts/embedded-db-1/" rel="noopener noreferrer"&gt;this post&lt;/a&gt;, which extends the DuckDB example above to newer purpose-built databases like LanceDB and KuzuDB. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;The Ibis &lt;a href="https://ibis-project.org/how-to/extending/unbound_expression" rel="noopener noreferrer"&gt;docs&lt;/a&gt; use &lt;code&gt;backend.to_pandas(expr)&lt;/code&gt; to bind and run an expression in one go. Instead, we use the &lt;code&gt;replace_unbound&lt;/code&gt; method to show a generic way to compile an expression against a backend without executing it. This is just for illustration; from here on, the code uses the &lt;code&gt;backend.to_pyarrow&lt;/code&gt; method. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>python</category>
      <category>sql</category>
      <category>pandas</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
