<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Theodore P.</title>
    <description>The latest articles on DEV Community by Theodore P. (@theodore_p_9749548f7dd03).</description>
    <link>https://dev.to/theodore_p_9749548f7dd03</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2192744%2F3caf6e4b-c4fb-4987-8fe8-6eb07542622f.jpg</url>
      <title>DEV Community: Theodore P.</title>
      <link>https://dev.to/theodore_p_9749548f7dd03</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/theodore_p_9749548f7dd03"/>
    <language>en</language>
    <item>
      <title>Streaming SQL Engine: Lightweight Cross-Data Source Integration for Resource-Constrained Environments.</title>
      <dc:creator>Theodore P.</dc:creator>
      <pubDate>Mon, 05 Jan 2026 11:28:45 +0000</pubDate>
      <link>https://dev.to/theodore_p_9749548f7dd03/streaming-sql-engine-lightweight-cross-data-source-integration-for-resource-constrained-1o2l</link>
      <guid>https://dev.to/theodore_p_9749548f7dd03/streaming-sql-engine-lightweight-cross-data-source-integration-for-resource-constrained-1o2l</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feph870fnrybk2b2fxv3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feph870fnrybk2b2fxv3j.png" alt=" " width="800" height="717"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One month ago, I shared how we deployed a streaming SQL engine that lets you many data sources using standard SQL syntax. &lt;/p&gt;

&lt;p&gt;Original post is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/theodore_p_9749548f7dd03/how-i-built-a-python-library-that-lets-you-join-mysql-postgresql-mongodb-rest-apis-and-files-in-h5d"&gt;https://dev.to/theodore_p_9749548f7dd03/how-i-built-a-python-library-that-lets-you-join-mysql-postgresql-mongodb-rest-apis-and-files-in-h5d&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today, I want to present our research paper: "Streaming SQL Engine: Lightweight Cross-Data Source Integration for Resource-Constrained Environments."&lt;/p&gt;

&lt;p&gt;Read it here: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Ierofantis/streaming_sql_engine/blob/master/Streaming_SQL_Engine_Paper.pdf" rel="noopener noreferrer"&gt;https://github.com/Ierofantis/streaming_sql_engine/blob/master/Streaming_SQL_Engine_Paper.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem we aimed to solve is simple: many organizations need ETL and big data capabilities without the overhead of heavy infrastructure. &lt;/p&gt;

&lt;p&gt;Not every team can spin up large compute clusters or dedicate substantial resources to data pipelines.&lt;/p&gt;

&lt;p&gt;Our solution is a streaming, iterator-based SQL engine optimized for memory efficiency and cross-source integration.&lt;/p&gt;

&lt;p&gt;This work isn’t about claiming the "fastest" solution, it’s about providing a practical, optimized tool that solves real problems for teams with limited resources.&lt;/p&gt;

&lt;p&gt;All comparisons are transparent, with fair benchmarks. &lt;/p&gt;

&lt;p&gt;I believe the best solutions come from honest evaluations of trade-offs, not marketing hype.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>dataengineering</category>
      <category>python</category>
      <category>sql</category>
    </item>
    <item>
      <title>Join Data from Anywhere: The Streaming SQL Engine That Bridges Databases, APIs, and Files</title>
      <dc:creator>Theodore P.</dc:creator>
      <pubDate>Tue, 16 Dec 2025 21:55:43 +0000</pubDate>
      <link>https://dev.to/theodore_p_9749548f7dd03/how-i-built-a-python-library-that-lets-you-join-mysql-postgresql-mongodb-rest-apis-and-files-in-h5d</link>
      <guid>https://dev.to/theodore_p_9749548f7dd03/how-i-built-a-python-library-that-lets-you-join-mysql-postgresql-mongodb-rest-apis-and-files-in-h5d</guid>
      <description>&lt;p&gt;Have you ever needed to join data from a MySQL database with a PostgreSQL database, a MongoDB collection, and a REST API all in one query? Traditional databases can't do this. That's why I built the Streaming SQL Engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Data Lives Everywhere
&lt;/h2&gt;

&lt;p&gt;Modern applications don't store all their data in one place. You might have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User data in PostgreSQL&lt;/li&gt;
&lt;li&gt;Order data in MySQL&lt;/li&gt;
&lt;li&gt;Product catalog in MongoDB&lt;/li&gt;
&lt;li&gt;Pricing information from a REST API&lt;/li&gt;
&lt;li&gt;Inventory data in CSV files&lt;/li&gt;
&lt;li&gt;Product feeds in XML files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The challenge:&lt;/strong&gt; How do you join all this data together?&lt;/p&gt;

&lt;p&gt;Traditional solutions require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exporting data from each system&lt;/li&gt;
&lt;li&gt;Importing into a central database&lt;/li&gt;
&lt;li&gt;Writing complex ETL pipelines&lt;/li&gt;
&lt;li&gt;Maintaining data synchronization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;There had to be a better way.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Streaming SQL Engine
&lt;/h2&gt;

&lt;p&gt;I built a lightweight Python library that lets you join data from &lt;strong&gt;any source&lt;/strong&gt; using standard SQL syntax without exporting, importing, or setting up infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;streaming_sql_engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pymysql&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Register PostgreSQL source (iterator function)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;postgres_users&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mydb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, name, email FROM users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres_users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postgres_users&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register MySQL source (iterator function)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mysql_products&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pymysql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mydb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, name, price FROM products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mysql_products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mysql_products&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register MongoDB source (iterator function)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mongo_inventory&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mongodb://localhost:27017&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mydb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inventory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mongo_inventory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mongo_inventory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register REST API source (iterator function)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;api_prices&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/prices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_prices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_prices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register CSV source (iterator function)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;csv_suppliers&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suppliers.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DictReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csv_suppliers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;csv_suppliers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Join them all in one SQL query!
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    SELECT
        mysql_products.name,
        postgres_users.email,
        mongo_inventory.quantity,
        api_prices.price,
        csv_suppliers.supplier_name
    FROM mysql_products
    JOIN postgres_users ON mysql_products.user_id = postgres_users.id
    JOIN mongo_inventory ON mysql_products.sku = mongo_inventory.sku
    JOIN api_prices ON mysql_products.sku = api_prices.sku
    JOIN csv_suppliers ON mysql_products.supplier_id = csv_suppliers.id
    WHERE api_prices.price &amp;gt; 100
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That's it.&lt;/strong&gt; No clusters, no infrastructure, no data export - just pure Python and SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built This: The Problem That Needed Solving
&lt;/h2&gt;

&lt;p&gt;I was working on a data reconciliation project where I needed to join data from multiple sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MySQL database (product catalog)&lt;/li&gt;
&lt;li&gt;PostgreSQL database (user data)&lt;/li&gt;
&lt;li&gt;MongoDB collection (inventory)&lt;/li&gt;
&lt;li&gt;REST API (pricing information)&lt;/li&gt;
&lt;li&gt;CSV files (supplier data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The challenge:&lt;/strong&gt; Traditional databases can't join across different systems. I had three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Export everything to one database&lt;/strong&gt; - Time-consuming, requires ETL pipelines, data becomes stale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write custom Python code&lt;/strong&gt; - Complex, error-prone, hard to maintain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use existing tools&lt;/strong&gt; - Spark/Flink require clusters, DuckDB requires data import, Presto needs infrastructure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;None of these worked for my use case.&lt;/strong&gt; I needed something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Could join data from different systems without export&lt;/li&gt;
&lt;li&gt;Was simple to use (SQL syntax)&lt;/li&gt;
&lt;li&gt;Required zero infrastructure&lt;/li&gt;
&lt;li&gt;Worked with Python natively&lt;/li&gt;
&lt;li&gt;Processed data efficiently (streaming)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built the Streaming SQL Engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It: Architecture and Design Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Design Philosophy
&lt;/h3&gt;

&lt;p&gt;The engine follows a &lt;strong&gt;pipeline architecture&lt;/strong&gt; inspired by database query execution engines (like PostgreSQL and SQLite), but implemented in pure Python using iterators.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Python's iterator protocol is perfect for streaming data. Each operator in the pipeline is an iterator that processes rows one at a time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Iterator-Based Architecture?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Key advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory efficiency:&lt;/strong&gt; Only one row in memory at a time (except for join indexes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy evaluation:&lt;/strong&gt; Processing starts only when you iterate over results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composability:&lt;/strong&gt; Operators can be chained arbitrarily&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility:&lt;/strong&gt; Easy to add new operators&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python-native:&lt;/strong&gt; Uses standard Python iterator protocol&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance:&lt;/strong&gt; Python iterators are slower than compiled code, but flexibility is worth it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory:&lt;/strong&gt; Join indexes require memory, but this is necessary for efficient joins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity:&lt;/strong&gt; Iterator chains can be complex, but they're composable and testable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Expression Evaluation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How WHERE clauses are evaluated:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The engine uses recursive AST traversal to evaluate expressions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why recursive:&lt;/strong&gt; SQL expressions are trees. Recursive evaluation naturally handles nested expressions like &lt;code&gt;(a &amp;gt; 10 AND b &amp;lt; 20) OR c = 5&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Join Algorithm Selection
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How the engine chooses join algorithms:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The engine follows this priority order when selecting a join algorithm:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check if both sides are sorted&lt;/strong&gt; (&lt;code&gt;ordered_by&lt;/code&gt; metadata)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;If yes, use &lt;code&gt;MergeJoinIterator&lt;/code&gt; (most memory-efficient for sorted data)&lt;/li&gt;
&lt;li&gt;Only used when &lt;code&gt;use_polars=False&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check if right side is a file&lt;/strong&gt; (&lt;code&gt;filename&lt;/code&gt; metadata)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;If yes, use &lt;code&gt;MmapLookupJoinIterator&lt;/code&gt; (memory-mapped, 90-99% memory reduction)&lt;/li&gt;
&lt;li&gt;Only used when &lt;code&gt;use_polars=False&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check if Polars is available&lt;/strong&gt; (&lt;code&gt;use_polars&lt;/code&gt; flag)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;If yes, use &lt;code&gt;PolarsLookupJoinIterator&lt;/code&gt; (vectorized, SIMD-accelerated)&lt;/li&gt;
&lt;li&gt;Used when &lt;code&gt;use_polars=True&lt;/code&gt; is explicitly set&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Default fallback&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;LookupJoinIterator&lt;/code&gt; (Python hash-based, most compatible)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; When &lt;code&gt;use_polars=False&lt;/code&gt; (default), Merge Join and MMAP Join are checked first before falling back to Python Lookup Join. When &lt;code&gt;use_polars=True&lt;/code&gt; is explicitly set, the engine prioritizes Polars over MMAP and Merge Join.&lt;/p&gt;

&lt;h3&gt;
  
  
  Protocol-Based Optimization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The key innovation:&lt;/strong&gt; Automatic optimization detection via function signature inspection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Engine inspects source function signature using &lt;code&gt;inspect.signature()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;If function accepts &lt;code&gt;dynamic_where&lt;/code&gt; or &lt;code&gt;dynamic_columns&lt;/code&gt; parameters, it supports optimizations&lt;/li&gt;
&lt;li&gt;Engine passes optimization parameters automatically&lt;/li&gt;
&lt;li&gt;Source function applies optimizations (filter pushdown, column pruning)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No flags needed:&lt;/strong&gt; Detection is automatic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible:&lt;/strong&gt; Any Python function can be a source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backward compatible:&lt;/strong&gt; Simple sources still work (no optimizations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible:&lt;/strong&gt; Easy to add new optimization parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simple source (no optimizations)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simple_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;

&lt;span class="c1"&gt;# Optimized source (with protocol)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;optimized_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dynamic_where&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dynamic_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dynamic_columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dynamic_columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; FROM table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dynamic_where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; WHERE &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dynamic_where&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;execute_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Both work the same way:
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;simple_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# No optimizations
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimized_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Optimizations apply automatically!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memory Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Key strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Streaming:&lt;/strong&gt; Process one row at a time, never load full tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join indexes:&lt;/strong&gt; Only right side of joins is materialized (necessary for lookups)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-mapped files:&lt;/strong&gt; For large JSONL files, use OS virtual memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column pruning:&lt;/strong&gt; Only extract needed columns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter pushdown:&lt;/strong&gt; Filter at source, reduce data transfer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Memory footprint:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Left side of join:&lt;/strong&gt; O(1) - one row at a time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Right side of join:&lt;/strong&gt; O(n) - hash index in memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; O(n) where n = size of right side&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; In most queries, the right side is smaller (e.g., joining large product table with small category table). The engine is designed to put smaller tables on the right side.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Optimizations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Polars Vectorization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Polars is available, the engine uses vectorized operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Instead of row-by-row filtering:
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;

&lt;span class="c1"&gt;# Use Polars batch filtering:
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;filtered_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filtered_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_rows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;named&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why faster:&lt;/strong&gt; Polars uses SIMD instructions and columnar processing, 10-200x faster than row-by-row Python loops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Memory-Mapped Joins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For large JSONL files, use OS virtual memory instead of loading into RAM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Instead of loading entire file:
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;large_file.jsonl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# 10GB in memory!
&lt;/span&gt;
&lt;span class="c1"&gt;# Use memory-mapped file:
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;large_file.jsonl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fileno&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ACCESS_READ&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# OS handles memory, can process files larger than RAM
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; OS virtual memory allows accessing file data without loading it all into RAM. 90-99% memory reduction for large files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. First Match Only Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For joins where only the first match matters (prevents Cartesian products).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why useful:&lt;/strong&gt; Prevents Cartesian products when right side has duplicate keys. Significantly reduces output size and processing time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Column Pruning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Only extracts columns needed for the query, reducing I/O and memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Query only requests 'name' and 'price'
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT name, price FROM products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Engine automatically requests only these columns from source
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dynamic_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dynamic_columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ['name', 'price']
&lt;/span&gt;    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; FROM table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Filter Pushdown&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pushes WHERE conditions to data sources when possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Query has WHERE clause
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM products WHERE price &amp;gt; 100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Engine automatically pushes filter to source
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dynamic_where&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM table WHERE &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dynamic_where&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Merge Joins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Efficient joins for pre-sorted data (O(n+m) time complexity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Register with ordered_by to enable merge join
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;users_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ordered_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Engine uses merge join algorithm
# Both sides must be sorted by join key
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Design Decisions and Trade-offs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Why Python iterators instead of compiled code?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision:&lt;/strong&gt; Use Python iterators for flexibility&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Slower than compiled code, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with any Python data source&lt;/li&gt;
&lt;li&gt;Easy to extend and customize&lt;/li&gt;
&lt;li&gt;No compilation step needed&lt;/li&gt;
&lt;li&gt;Python-native integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Why separate logical planning from execution?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision:&lt;/strong&gt; Two-phase approach (plan then execute)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Extra step, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enables query optimization&lt;/li&gt;
&lt;li&gt;Easier to test and debug&lt;/li&gt;
&lt;li&gt;Can reuse planning logic&lt;/li&gt;
&lt;li&gt;Clear separation of concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Why protocol-based optimization instead of flags?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision:&lt;/strong&gt; Automatic detection via function signature&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Slightly more complex detection, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No flags to remember&lt;/li&gt;
&lt;li&gt;Automatic optimization&lt;/li&gt;
&lt;li&gt;Backward compatible&lt;/li&gt;
&lt;li&gt;More Pythonic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Why materialize right side of joins?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision:&lt;/strong&gt; Build hash index on right side&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Memory usage, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;O(1) lookup time per left row&lt;/li&gt;
&lt;li&gt;Necessary for efficient joins&lt;/li&gt;
&lt;li&gt;Can use memory-mapped files for large files&lt;/li&gt;
&lt;li&gt;Standard database approach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Why limit SQL features (no GROUP BY, aggregations)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision:&lt;/strong&gt; Focus on joins and filtering&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Less SQL support, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simpler implementation&lt;/li&gt;
&lt;li&gt;Faster execution&lt;/li&gt;
&lt;li&gt;Focuses on core use case (cross-system joins)&lt;/li&gt;
&lt;li&gt;Can add later if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Result
&lt;/h3&gt;

&lt;p&gt;A lightweight Python library that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Joins data from any source using SQL&lt;/li&gt;
&lt;li&gt;Processes data row-by-row (streaming)&lt;/li&gt;
&lt;li&gt;Requires zero infrastructure&lt;/li&gt;
&lt;li&gt;Automatically optimizes when possible&lt;/li&gt;
&lt;li&gt;Works with any Python iterator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Python's iterator protocol is perfect for streaming SQL execution. By combining SQL parsing, logical planning, and iterator-based execution, I created a tool that solves a real problem: joining data from different systems without complex infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works: Streaming Architecture
&lt;/h2&gt;

&lt;p&gt;The engine processes data &lt;strong&gt;row-by-row&lt;/strong&gt;, never loading entire tables into memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SQL Query
    |
Parser -&amp;gt; AST
    |
Planner -&amp;gt; Logical Plan
    |
Executor -&amp;gt; Iterator Pipeline
    |
Results (Generator)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Iterator Pipeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ScanIterator -&amp;gt; FilterIterator -&amp;gt; JoinIterators -&amp;gt; ProjectIterator -&amp;gt; Results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each iterator processes rows incrementally, enabling true streaming execution. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low memory footprint&lt;/li&gt;
&lt;li&gt;Can process data larger than RAM&lt;/li&gt;
&lt;li&gt;Results yielded immediately&lt;/li&gt;
&lt;li&gt;No buffering required&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Supported Data Sources
&lt;/h2&gt;

&lt;p&gt;The engine works with &lt;strong&gt;any Python iterator&lt;/strong&gt;, making it incredibly flexible:&lt;/p&gt;

&lt;h3&gt;
  
  
  Databases
&lt;/h3&gt;

&lt;p&gt;All databases are accessed via Python iterator functions. The engine doesn't use connectors - it works with any Python function that returns an iterator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; - Create iterator function that queries PostgreSQL and yields rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MySQL&lt;/strong&gt; - Create iterator function that queries MySQL and yields rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MongoDB&lt;/strong&gt; - Create iterator function that queries MongoDB and yields documents&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Files
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSV&lt;/strong&gt; - Standard CSV files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSONL&lt;/strong&gt; - JSON Lines format with memory-mapped joins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON&lt;/strong&gt; - Standard JSON files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XML&lt;/strong&gt; - XML parsing with ElementTree&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  APIs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;REST APIs&lt;/strong&gt; - Any HTTP endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GraphQL&lt;/strong&gt; - Via custom functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSockets&lt;/strong&gt; - Streaming data sources&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Custom Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Any Python function&lt;/strong&gt; that returns an iterator&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generators&lt;/strong&gt; - Perfect for streaming data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom transformations&lt;/strong&gt; - Apply Python logic between joins&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  SQL Features
&lt;/h2&gt;

&lt;p&gt;The engine supports standard SQL syntax:&lt;/p&gt;

&lt;h3&gt;
  
  
  Supported Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SELECT&lt;/strong&gt; - Column selection, aliasing, table-qualified columns&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;JOIN&lt;/strong&gt; - INNER JOIN and LEFT JOIN with equality conditions&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;table1&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;table2&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;table3&lt;/span&gt; &lt;span class="n"&gt;t3&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;WHERE&lt;/strong&gt; - Comparisons, boolean logic, NULL checks, IN clauses&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Arithmetic&lt;/strong&gt; - Addition, subtraction, multiplication, division, modulo&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;discount&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;final_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;unit_price&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Not Supported
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GROUP BY and aggregations (COUNT, SUM, AVG)&lt;/li&gt;
&lt;li&gt;ORDER BY&lt;/li&gt;
&lt;li&gt;HAVING&lt;/li&gt;
&lt;li&gt;Subqueries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;These limitations keep the engine focused on joins and filtering - its core strength.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example 1: Microservices Data Integration
&lt;/h3&gt;

&lt;p&gt;In a microservices architecture, data is distributed across services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;streaming_sql_engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pymysql&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Service 1: User service (PostgreSQL) - iterator function
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;users_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, name, email FROM users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;users_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Service 2: Order service (MySQL) - iterator function
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;orders_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pymysql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3306&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, user_id, total FROM orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orders_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Service 3: Payment service (REST API) - iterator function
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;payment_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://payments.service/api/transactions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payment_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Join across services
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    SELECT users.name, orders.total, payments.status
    FROM users
    JOIN orders ON users.id = orders.user_id
    JOIN payments ON orders.id = payments.order_id
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; No need for a shared database or complex ETL pipelines. The engine accepts any Python function that returns an iterator, making it incredibly flexible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: Real-Time Price Comparison
&lt;/h3&gt;

&lt;p&gt;Compare prices from multiple XML feeds and match with MongoDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_xml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ET&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.//product&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xml1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;parse_xml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prices1.xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xml2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;parse_xml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prices2.xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mongo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mongo_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    SELECT
        xml1.ean,
        xml1.price AS price1,
        xml2.price AS price2,
        mongo.sf_sku
    FROM xml1
    JOIN xml2 ON xml1.ean = xml2.ean
    JOIN mongo ON xml1.ean = mongo.ean
    WHERE xml1.price != xml2.price
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 3: Python Processing Between Joins
&lt;/h3&gt;

&lt;p&gt;Apply Python logic (ML models, custom functions) between joins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enriched_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Source that processes data with Python before joining&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mydb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, name, category_id FROM products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
        &lt;span class="c1"&gt;# Apply Python logic
&lt;/span&gt;        &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ml_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;custom_field&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;custom_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;categories_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mydb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id, name FROM categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enriched_products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enriched_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;categories_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    SELECT p.name, p.ml_score, c.category_name
    FROM enriched_products p
    JOIN categories c ON p.category_id = c.id
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Seamless integration with Python ecosystem - use any library, apply any logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison with Alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  vs DuckDB
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Streaming SQL Engine&lt;/th&gt;
&lt;th&gt;DuckDB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cross-system joins&lt;/td&gt;
&lt;td&gt;✅ Direct&lt;/td&gt;
&lt;td&gt;⚠️ Requires import&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API + Database join&lt;/td&gt;
&lt;td&gt;✅ Direct&lt;/td&gt;
&lt;td&gt;❌ Must export API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time streaming&lt;/td&gt;
&lt;td&gt;✅ True streaming&lt;/td&gt;
&lt;td&gt;⚠️ Buffering needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python processing&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;⚠️ Export/import&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GROUP BY / Aggregations&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;td&gt;✅ Full support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance (same DB)&lt;/td&gt;
&lt;td&gt;⚠️ Moderate&lt;/td&gt;
&lt;td&gt;✅ Very fast&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use Streaming SQL Engine when:&lt;/strong&gt; You need to join data from different systems that can't be imported into DuckDB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use DuckDB when:&lt;/strong&gt; All data can be imported and you need aggregations.&lt;/p&gt;

&lt;h3&gt;
  
  
  vs Pandas
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Streaming SQL Engine&lt;/th&gt;
&lt;th&gt;Pandas&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cross-system joins&lt;/td&gt;
&lt;td&gt;✅ Direct&lt;/td&gt;
&lt;td&gt;❌ Must load all data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory efficiency&lt;/td&gt;
&lt;td&gt;✅ Streaming&lt;/td&gt;
&lt;td&gt;❌ Loads all in memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL syntax&lt;/td&gt;
&lt;td&gt;✅ Standard SQL&lt;/td&gt;
&lt;td&gt;❌ DataFrame API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large datasets&lt;/td&gt;
&lt;td&gt;✅ Can exceed RAM&lt;/td&gt;
&lt;td&gt;❌ Limited by RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time data&lt;/td&gt;
&lt;td&gt;✅ Streaming&lt;/td&gt;
&lt;td&gt;❌ Batch only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File formats&lt;/td&gt;
&lt;td&gt;✅ Any Python iterator&lt;/td&gt;
&lt;td&gt;⚠️ Limited formats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance (large data)&lt;/td&gt;
&lt;td&gt;✅ Streaming&lt;/td&gt;
&lt;td&gt;⚠️ Slower for large data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use Streaming SQL Engine when:&lt;/strong&gt; You need to join data from different systems, process data larger than RAM, or use SQL syntax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Pandas when:&lt;/strong&gt; All data fits in memory and you prefer DataFrame API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Are There Other Tools Like This?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Short answer: Not exactly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While there are many tools that do &lt;strong&gt;parts&lt;/strong&gt; of what Streaming SQL Engine does, none combine all these characteristics:&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes Streaming SQL Engine Unique
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Zero Infrastructure + Cross-System Joins&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Most tools require clusters (Spark, Flink, Drill)&lt;/li&gt;
&lt;li&gt;Or require specific infrastructure (ksqlDB needs Kafka)&lt;/li&gt;
&lt;li&gt;Streaming SQL Engine: Just Python&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Any Python Iterator as Data Source&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Most tools require specific connectors&lt;/li&gt;
&lt;li&gt;Streaming SQL Engine: Any Python function works&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Direct API Joins&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Most tools can't join REST APIs directly&lt;/li&gt;
&lt;li&gt;Streaming SQL Engine: Native support&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Python-Native Architecture&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Most tools are Java/Rust with Python wrappers&lt;/li&gt;
&lt;li&gt;Streaming SQL Engine: Pure Python, seamless integration&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Similar Tools (But Different)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Apache Drill&lt;/strong&gt; - Similar cross-system capability, but requires cluster and Java&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ksqlDB&lt;/strong&gt; - Streaming SQL, but Kafka-only and requires infrastructure&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Materialize&lt;/strong&gt; - Streaming database, but requires database server&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DataFusion&lt;/strong&gt; - Fast SQL engine, but limited to Arrow/Parquet data&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Polars SQL&lt;/strong&gt; - Fast SQL, but requires loading data into DataFrames first&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presto/Trino&lt;/strong&gt; - Cross-system SQL, but requires cluster infrastructure&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;None of these&lt;/strong&gt; combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero infrastructure&lt;/li&gt;
&lt;li&gt;Any Python iterator as source&lt;/li&gt;
&lt;li&gt;Direct API joins&lt;/li&gt;
&lt;li&gt;Pure Python implementation&lt;/li&gt;
&lt;li&gt;Simple deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;That's what makes Streaming SQL Engine unique.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Here: The Most Secure Way
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Install and Basic Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streaming-sql-engine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Most Stable Join Option
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lookup Join (Default)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Python hash-based join - the default when no special metadata is provided.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to use:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# use_polars=False (default)
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;products_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;images_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# No special metadata - uses Lookup Join automatically
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Builds hash index on right side (loads all right rows into memory)&lt;/li&gt;
&lt;li&gt;Streams left side row-by-row&lt;/li&gt;
&lt;li&gt;Looks up matches in hash index (O(1) per lookup)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default choice - works with any data&lt;/li&gt;
&lt;li&gt;Small to medium datasets (&amp;lt; 1M rows)&lt;/li&gt;
&lt;li&gt;When right side fits in memory&lt;/li&gt;
&lt;li&gt;When data is not sorted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Most compatible (works with any data types)&lt;/li&gt;
&lt;li&gt;✅ No special requirements&lt;/li&gt;
&lt;li&gt;✅ Reliable and stable&lt;/li&gt;
&lt;li&gt;✅ Good performance for medium datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚠️ Loads entire right side into memory&lt;/li&gt;
&lt;li&gt;⚠️ Not optimal for very large right tables&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Some recommendations from the author.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Recommendation 1: Use Merge Join if Data is Sorted (Fastest + Lowest Memory)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Pre-sorted data or data that can be sorted once&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Merge Join requires use_polars=False
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;products_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ordered_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;images_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ordered_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it's best:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Lowest memory overhead&lt;/li&gt;
&lt;li&gt;✅ True streaming (both sides stream simultaneously)&lt;/li&gt;
&lt;li&gt;✅ Fast execution (comparable to other options)&lt;/li&gt;
&lt;li&gt;✅ Can handle datasets larger than RAM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is already sorted by join key&lt;/li&gt;
&lt;li&gt;Memory is a concern&lt;/li&gt;
&lt;li&gt;You can sort data once (e.g., sort JSONL files before processing)&lt;/li&gt;
&lt;li&gt;Both sides of join are sorted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: Preparing Data for Merge Join&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Step 1: Sort your JSONL files (one-time operation)
# Use a utility script or database ORDER BY
&lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;sort_jsonl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonl&lt;/span&gt; &lt;span class="n"&gt;products_sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonl&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;sort_jsonl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonl&lt;/span&gt; &lt;span class="n"&gt;images_sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonl&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Use sorted files with Merge Join
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;load_jsonl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products_sorted.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ordered_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;load_jsonl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images_sorted.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ordered_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Query - Merge Join will be used automatically
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    SELECT products.product_id, products.title, images.image
    FROM products
    LEFT JOIN images ON products.product_id = images.product_id
    WHERE products.checked = 1
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Recommendation 2: Use MMAP Join if Data is NOT Sorted (Low Memory)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Unsorted data, large files, memory-constrained environments&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# MMAP Join requires use_polars=False
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;products_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;images_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it's best:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Low memory overhead&lt;/li&gt;
&lt;li&gt;✅ Works with unsorted data (no sorting required)&lt;/li&gt;
&lt;li&gt;✅ OS-managed memory mapping (can handle files larger than RAM)&lt;/li&gt;
&lt;li&gt;✅ 90-99% memory reduction compared to loading entire file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is NOT sorted (or sorting is expensive)&lt;/li&gt;
&lt;li&gt;Large files (&amp;gt; 100MB)&lt;/li&gt;
&lt;li&gt;Memory-constrained systems&lt;/li&gt;
&lt;li&gt;Files larger than available RAM&lt;/li&gt;
&lt;li&gt;You want low memory without sorting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;jsonl_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register with filename parameter to enable MMAP Join
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonl_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;images_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Recommendation 3: Use Polars + Optimizations for Best Throughput
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Maximum speed when you can filter early and normalize data types&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_match_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;optimized_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dynamic_where&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dynamic_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Source with all optimizations:
    - Filter pushdown (dynamic_where)
    - Column pruning (dynamic_columns)
    - Data normalization (for Polars)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Build optimized query
&lt;/span&gt;    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dynamic_where&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dynamic_columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;execute_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Normalize types for Polars stability
&lt;/span&gt;        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nf"&gt;normalize_types&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;optimized_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;images_source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it's best:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Fastest execution time &lt;/li&gt;
&lt;li&gt;✅ Low memory overhead &lt;/li&gt;
&lt;li&gt;✅ Early filtering reduces data volume significantly&lt;/li&gt;
&lt;li&gt;✅ Vectorized operations (SIMD acceleration)&lt;/li&gt;
&lt;li&gt;✅ All optimizations combined (filter pushdown + column pruning)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can filter data early (WHERE clause can be pushed to source)&lt;/li&gt;
&lt;li&gt;Data types are consistent (can normalize)&lt;/li&gt;
&lt;li&gt;Speed is priority&lt;/li&gt;
&lt;li&gt;Polars is available&lt;/li&gt;
&lt;li&gt;You want maximum performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚠️ Requires protocol support in source function (&lt;code&gt;dynamic_where&lt;/code&gt;, &lt;code&gt;dynamic_columns&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;⚠️ Requires data normalization for Polars stability&lt;/li&gt;
&lt;li&gt;⚠️ Requires Polars dependency&lt;/li&gt;
&lt;li&gt;⚠️ Processes fewer rows (because of early filtering)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Summary: Choosing the Right Configuration for 100K+ Records
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Follow this decision tree:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Is your data sorted (or can you sort it)?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;YES&lt;/strong&gt; Use &lt;strong&gt;Merge Join&lt;/strong&gt; (&lt;code&gt;ordered_by&lt;/code&gt; parameter)

&lt;ul&gt;
&lt;li&gt;Best memory efficiency &lt;/li&gt;
&lt;li&gt;Fast execution &lt;/li&gt;
&lt;li&gt;True streaming&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Is your data NOT sorted?&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;YES&lt;/strong&gt; Use &lt;strong&gt;MMAP Join&lt;/strong&gt; (&lt;code&gt;filename&lt;/code&gt; parameter)

&lt;ul&gt;
&lt;li&gt;Low memory &lt;/li&gt;
&lt;li&gt;Works with unsorted data&lt;/li&gt;
&lt;li&gt;Good for large files&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Do you need maximum throughput and can filter early?&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;YES&lt;/strong&gt; Use &lt;strong&gt;Polars + Optimizations&lt;/strong&gt; (&lt;code&gt;use_polars=True&lt;/code&gt; + protocols)

&lt;ul&gt;
&lt;li&gt;Fastest execution &lt;/li&gt;
&lt;li&gt;Low memory &lt;/li&gt;
&lt;li&gt;Requires protocol support&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Default fallback:&lt;/strong&gt; If none of the above apply, use &lt;strong&gt;Lookup Join&lt;/strong&gt; (default Python) - it's stable and works with any data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Comparison Table: 100K+ Records (150K+ Joined Rows)
&lt;/h3&gt;

&lt;p&gt;Based on comprehensive benchmarks with &lt;strong&gt;150,048 joined rows&lt;/strong&gt; (50,089 products with ~3 images each, filtered to &lt;code&gt;checked=1&lt;/code&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Rows Processed&lt;/th&gt;
&lt;th&gt;Time (s)&lt;/th&gt;
&lt;th&gt;Memory Overhead (MB)&lt;/th&gt;
&lt;th&gt;CPU %&lt;/th&gt;
&lt;th&gt;Throughput (rows/s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Merge Join (Sorted Data)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150,048&lt;/td&gt;
&lt;td&gt;159.51&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;69.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.9%&lt;/td&gt;
&lt;td&gt;941&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Lookup Join (Default Python)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150,048&lt;/td&gt;
&lt;td&gt;159.44&lt;/td&gt;
&lt;td&gt;236.12&lt;/td&gt;
&lt;td&gt;1.8%&lt;/td&gt;
&lt;td&gt;941&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Polars Join&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150,048&lt;/td&gt;
&lt;td&gt;157.74&lt;/td&gt;
&lt;td&gt;285.64&lt;/td&gt;
&lt;td&gt;4.1%&lt;/td&gt;
&lt;td&gt;951&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4. MMAP Join&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150,048&lt;/td&gt;
&lt;td&gt;165.74&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;77.48&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5.0%&lt;/td&gt;
&lt;td&gt;905&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5. Polars + Column Pruning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150,048&lt;/td&gt;
&lt;td&gt;157.72&lt;/td&gt;
&lt;td&gt;243.85&lt;/td&gt;
&lt;td&gt;3.1%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;951&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;6. Polars + Filter Pushdown&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150,048&lt;/td&gt;
&lt;td&gt;157.74&lt;/td&gt;
&lt;td&gt;239.38&lt;/td&gt;
&lt;td&gt;2.0%&lt;/td&gt;
&lt;td&gt;951&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;7. All Optimizations Combined&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50,089&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;54.30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;46.68&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6.6%&lt;/td&gt;
&lt;td&gt;922&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Configuration 7 processes fewer rows (50,089 vs 150,048) because it applies filter pushdown early, filtering products with &lt;code&gt;checked=1&lt;/code&gt; before the join. This demonstrates the power of early filtering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Insights for 100K+ Records
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Fastest Execution:&lt;/strong&gt; All Optimizations Combined (54.30s)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses Polars Join + Column Pruning + Filter Pushdown + early filtering&lt;/li&gt;
&lt;li&gt;Processes 50,089 rows (filtered early) vs 150,048 without filtering&lt;/li&gt;
&lt;li&gt;Best for speed priority when you can filter early&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lowest Memory (Full Join):&lt;/strong&gt; Merge Join (69.50 MB)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;True streaming, no index needed&lt;/li&gt;
&lt;li&gt;Best for memory-constrained environments with sorted data&lt;/li&gt;
&lt;li&gt;Processes all 150,048 rows with minimal memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lowest Memory (Unsorted Data):&lt;/strong&gt; MMAP Join (77.48 MB)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with unsorted data&lt;/li&gt;
&lt;li&gt;OS-managed memory mapping&lt;/li&gt;
&lt;li&gt;Best for large files when data is not sorted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Highest Throughput:&lt;/strong&gt; Polars + Column Pruning (951 rows/s)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vectorized operations + reduced I/O&lt;/li&gt;
&lt;li&gt;Best for processing large volumes&lt;/li&gt;
&lt;li&gt;Processes all 150,048 rows efficiently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best Balance:&lt;/strong&gt; Merge Join (159.51s, 69.50 MB)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Good speed with lowest memory overhead&lt;/li&gt;
&lt;li&gt;Best for sorted data when memory matters&lt;/li&gt;
&lt;li&gt;True streaming architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  By Dataset Size
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 10K rows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;use_polars=False&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;Fastest, most stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;10K-100K rows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;use_polars=False&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;Still fastest, handles mixed types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;100K-1M rows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;See recommendations below&lt;/td&gt;
&lt;td&gt;Choose based on data characteristics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&amp;gt; 1M rows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All optimizations&lt;/td&gt;
&lt;td&gt;Maximum performance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;For 100K-1M rows, choose based on the data:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data is sorted&lt;/strong&gt; &lt;strong&gt;Merge Join&lt;/strong&gt; (&lt;code&gt;ordered_by&lt;/code&gt; parameter) &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data is NOT sorted&lt;/strong&gt; &lt;strong&gt;MMAP Join&lt;/strong&gt; (&lt;code&gt;filename&lt;/code&gt; parameter) - &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Need maximum speed + can filter early&lt;/strong&gt; &lt;strong&gt;Polars + Optimizations&lt;/strong&gt; (&lt;code&gt;use_polars=True&lt;/code&gt; + protocols)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Using Polars Without Normalization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Mixed types cause schema inference errors
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalized_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;raw_source&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pitfall 2: Using MMAP Without Polars (Very Slow)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Very slow!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MMAP uses Polars internally for index building if available
# But you still need use_polars=False for MMAP Join algorithm
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;use_polars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# MMAP Join requires this
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# MMAP will use Polars internally for faster index building
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pitfall 3: Using MMAP for Small Files
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# MMAP overhead &amp;gt; benefit for small files
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Slower!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# No filename for small files
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Faster for &amp;lt; 100MB
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Start Here (Most Secure)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Default: use_polars=False
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; Most stable, handles all edge cases, works with any data types&lt;/p&gt;

&lt;h3&gt;
  
  
  Then Experiment
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add debug mode&lt;/strong&gt;: &lt;code&gt;Engine(debug=True)&lt;/code&gt; - See what's happening&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try Polars&lt;/strong&gt;: &lt;code&gt;Engine(use_polars=True)&lt;/code&gt; - For large datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try MMAP&lt;/strong&gt;: &lt;code&gt;filename="data.jsonl"&lt;/code&gt; - For large files &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try Merge Join&lt;/strong&gt;: &lt;code&gt;ordered_by="key"&lt;/code&gt; - For sorted data &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Streaming SQL Engine fills a unique niche: &lt;strong&gt;cross-system data integration&lt;/strong&gt;. While it may not match the raw performance of specialized tools for their specific use cases, it excels at joining data from different systems - a problem that traditional databases cannot solve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-system joins (databases, APIs, files)&lt;/li&gt;
&lt;li&gt;Zero infrastructure requirements&lt;/li&gt;
&lt;li&gt;Memory-efficient streaming architecture&lt;/li&gt;
&lt;li&gt;Python-native integration&lt;/li&gt;
&lt;li&gt;Automatic optimizations&lt;/li&gt;
&lt;li&gt;Simple deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best suited for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microservices data aggregation&lt;/li&gt;
&lt;li&gt;Cross-system ETL pipelines&lt;/li&gt;
&lt;li&gt;Real-time data integration&lt;/li&gt;
&lt;li&gt;Memory-constrained environments&lt;/li&gt;
&lt;li&gt;Python-native workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For cross-system data integration, the Streaming SQL Engine provides a unique solution that balances performance, simplicity, and flexibility.&lt;/p&gt;

</description>
      <category>python</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
