<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: vahid Saber</title>
    <description>The latest articles on DEV Community by vahid Saber (@vahid_saber_d47950be99729).</description>
    <link>https://dev.to/vahid_saber_d47950be99729</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2572510%2F93cb0e37-a74d-4fff-809a-505ffaf486ce.png</url>
      <title>DEV Community: vahid Saber</title>
      <link>https://dev.to/vahid_saber_d47950be99729</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vahid_saber_d47950be99729"/>
    <language>en</language>
    <item>
      <title>Add Data Quality Checks to Your Airflow DAG in 5 Minutes</title>
      <dc:creator>vahid Saber</dc:creator>
      <pubDate>Thu, 07 May 2026 20:28:58 +0000</pubDate>
      <link>https://dev.to/vahid_saber_d47950be99729/add-data-quality-checks-to-your-airflow-dag-in-5-minutes-7l5</link>
      <guid>https://dev.to/vahid_saber_d47950be99729/add-data-quality-checks-to-your-airflow-dag-in-5-minutes-7l5</guid>
      <description>&lt;p&gt;Most Airflow DAGs have zero data quality checks. The pipeline runs, data lands in the warehouse, and you find out something is wrong when a stakeholder asks why the dashboard numbers look off. Three days later.&lt;/p&gt;

&lt;p&gt;Adding quality checks feels like a project: pick a tool, configure it, write checks for every table, maintain them as schemas change. So it never happens.&lt;/p&gt;

&lt;p&gt;Here's how to add auto-generated data quality checks to any Airflow DAG in under 5 minutes. No configuration, no writing checks by hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: BashOperator (zero install beyond pip)
&lt;/h2&gt;

&lt;p&gt;If you already have DQLens installed in your Airflow environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.bash&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BashOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2026&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@daily&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;load_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BashOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;load_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bash_command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python load_script.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;quality_check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BashOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bash_command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dqlens init $DATABASE_URL --schema public &amp;amp;&amp;amp; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dqlens profile &amp;amp;&amp;amp; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dqlens run --ci --focus high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql://user:pass@host:5432/db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;load_data&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;quality_check&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. After your data loads, DQLens profiles every table, compares against the previous run, and fails the task if it finds HIGH severity problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 2: DQLensOperator (cleaner, typed)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;airflow-provider-dqlens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dqlens_airflow.operators&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DQLensOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2026&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@daily&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;load_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="n"&gt;quality_check&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DQLensOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conn_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;public&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;focus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;load_data&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;quality_check&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The operator reads your Airflow connection, profiles the database, and fails if problems are found. Results are pushed to XCom so downstream tasks can access them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it catches (without you writing anything)
&lt;/h2&gt;

&lt;p&gt;On the first run, DQLens profiles your tables and stores a baseline. On every subsequent run, it compares and flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Null rate spikes&lt;/strong&gt;: email column went from 0.1% null to 12% null&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Row count anomalies&lt;/strong&gt;: table grew 50% overnight (possible duplicate ingestion)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema drift&lt;/strong&gt;: a column was dropped or changed type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empty strings&lt;/strong&gt;: columns that pass not-null checks but carry no information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freshness&lt;/strong&gt;: data that hasn't been updated recently&lt;/li&gt;
&lt;/ul&gt;
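
&lt;p&gt;To make the comparison concrete, here is a minimal Python sketch of a baseline check for schema drift and row count growth. It is not DQLens's implementation: the profile layout (a dict with &lt;code&gt;row_count&lt;/code&gt; and &lt;code&gt;columns&lt;/code&gt;), the function name, and the 50% growth limit are all assumptions chosen for illustration.&lt;/p&gt;

```python
def compare_profiles(baseline: dict, current: dict, max_growth: float = 0.5) -> list[str]:
    """Return human-readable findings for differences between two table profiles.

    Hypothetical profile shape: {"row_count": int, "columns": {name: type}}.
    """
    findings = []
    old_cols, new_cols = baseline["columns"], current["columns"]

    # Schema drift: a baseline column is gone, or its type changed.
    for name in sorted(old_cols):
        if name not in new_cols:
            findings.append(f"schema drift: column '{name}' was dropped")
        elif new_cols[name] != old_cols[name]:
            findings.append(
                f"schema drift: column '{name}' changed type "
                f"from {old_cols[name]} to {new_cols[name]}"
            )

    # Row count anomaly: relative growth at or beyond the (assumed) limit.
    old_rows = baseline["row_count"]
    if old_rows > 0:
        growth = (current["row_count"] - old_rows) / old_rows
        if growth >= max_growth:
            findings.append(f"row count anomaly: table grew {growth:.0%}")

    return findings


baseline = {"row_count": 1000, "columns": {"id": "integer", "email": "text"}}
current = {"row_count": 1600, "columns": {"id": "text"}}
for finding in compare_profiles(baseline, current):
    print(finding)
```

&lt;p&gt;Comparing a profile against itself returns an empty list, which corresponds to a clean run.&lt;/p&gt;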

&lt;p&gt;Every finding has a severity level (HIGH / MEDIUM / LOW). The &lt;code&gt;focus="high"&lt;/code&gt; parameter means only HIGH severity findings (FK violations, schema changes, major null spikes) fail the task. MEDIUM and LOW findings are logged but don't block the pipeline.&lt;/p&gt;
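
&lt;p&gt;The gating logic described above can be sketched in a few lines of Python. The finding dicts, the &lt;code&gt;BLOCKING_SEVERITIES&lt;/code&gt; mapping, and the behavior of the &lt;code&gt;medium&lt;/code&gt; and &lt;code&gt;low&lt;/code&gt; focus levels are assumptions for illustration, not the actual DQLens data model.&lt;/p&gt;

```python
# Hypothetical mapping from a focus level to the severities that fail the task.
BLOCKING_SEVERITIES = {
    "high": {"HIGH"},
    "medium": {"HIGH", "MEDIUM"},
    "low": {"HIGH", "MEDIUM", "LOW"},
}


def blocking_findings(findings: list[dict], focus: str = "high") -> list[dict]:
    """Return the findings severe enough to fail the task for this focus level."""
    blocking = BLOCKING_SEVERITIES[focus]
    return [f for f in findings if f["severity"] in blocking]


findings = [
    {"severity": "HIGH", "message": "column 'email' was dropped"},
    {"severity": "MEDIUM", "message": "null rate rose from 0.1% to 2%"},
    {"severity": "LOW", "message": "3% empty strings in 'notes'"},
]

# Every finding is logged, whatever its severity.
for finding in findings:
    print(f"[{finding['severity']}] {finding['message']}")

# Only the blocking subset decides whether the task fails.
failed = blocking_findings(findings, focus="high")
if failed:
    print(f"{len(failed)} blocking finding(s): the task would fail here")
```

&lt;p&gt;Inside an Airflow operator, the last step would raise an exception instead of printing, so that Airflow marks the task as failed.&lt;/p&gt;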

&lt;h2&gt;
  
  
  Why not Great Expectations or Soda?
&lt;/h2&gt;

&lt;p&gt;Both require you to write every check by hand. Great Expectations needs Python expectation suites. Soda needs YAML check definitions. For 200 tables, that's days of work and ongoing maintenance as schemas change.&lt;/p&gt;

&lt;p&gt;DQLens generates checks automatically from your data. You add one task to your DAG and get coverage you never had to write.&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing results downstream
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.python&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PythonOperator&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ti&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;xcom_pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tables profiled: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tables_profiled&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Findings: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;findings_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Passed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;passed_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PythonOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;python_callable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;review_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;quality_check&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;review&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Supported databases
&lt;/h2&gt;

&lt;p&gt;PostgreSQL, DuckDB, SQLite, MySQL. The operator reads your Airflow connection type and builds the right connection URL automatically.&lt;/p&gt;
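
&lt;p&gt;As a rough idea of what "builds the right connection URL" involves, here is a Python sketch that turns connection fields into a database URL. The helper name, its parameters, the scheme mapping, and the treatment of file-based engines are assumptions; the operator's real logic may differ.&lt;/p&gt;

```python
from urllib.parse import quote

# Assumed mapping from Airflow connection types to URL schemes.
SCHEMES = {"postgres": "postgresql", "mysql": "mysql"}


def build_url(conn_type, host="", login="", password="", port=None, database=""):
    """Build a database URL from connection fields (hypothetical helper)."""
    if conn_type in ("sqlite", "duckdb"):
        # File-based engines: the host field is assumed to hold the file path.
        return f"{conn_type}:///{host}"
    if conn_type not in SCHEMES:
        raise ValueError(f"unsupported connection type: {conn_type}")

    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    auth = quote(login, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    netloc = f"{auth}@{host}" if auth else host
    if port:
        netloc += f":{port}"
    return f"{SCHEMES[conn_type]}://{netloc}/{database}"


print(build_url("postgres", "db.internal", "etl", "p@ss", 5432, "analytics"))
print(build_url("sqlite", "data/warehouse.db"))
```

&lt;p&gt;Keeping credentials in an Airflow connection rather than in the DAG file also avoids hard-coding a password in source control.&lt;/p&gt;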

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;airflow-provider-dqlens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add one task to your DAG. Run it. See what it finds.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vahid110/airflow-provider-dqlens" rel="noopener noreferrer"&gt;github.com/vahid110/airflow-provider-dqlens&lt;/a&gt;&lt;br&gt;
Core engine: &lt;a href="https://github.com/vahid110/dqlens" rel="noopener noreferrer"&gt;github.com/vahid110/dqlens&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If your DAG loads data but doesn't check it, you're flying blind. One task fixes that.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>dataengineering</category>
      <category>python</category>
      <category>dataquality</category>
    </item>
    <item>
      <title>What dbt Tests Miss (and How to Catch It Automatically)</title>
      <dc:creator>vahid Saber</dc:creator>
      <pubDate>Tue, 05 May 2026 12:42:22 +0000</pubDate>
      <link>https://dev.to/vahid_saber_d47950be99729/what-dbt-tests-miss-and-how-to-catch-it-automatically-il0</link>
      <guid>https://dev.to/vahid_saber_d47950be99729/what-dbt-tests-miss-and-how-to-catch-it-automatically-il0</guid>
      <description>&lt;p&gt;If you use dbt, you probably have some tests. A few &lt;code&gt;not_null&lt;/code&gt; checks, maybe &lt;code&gt;unique&lt;/code&gt; on your primary keys, possibly some &lt;code&gt;accepted_values&lt;/code&gt; on status columns.&lt;/p&gt;

&lt;p&gt;But be honest: how many of your columns actually have tests? 10%? 20%?&lt;/p&gt;

&lt;p&gt;The rest are untested. Not because you don't care, but because writing test YAML for 200 columns across 40 models is tedious work that never makes it to the top of the sprint.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap in dbt testing
&lt;/h2&gt;

&lt;p&gt;dbt tests are rule-based. You write a rule, it checks that rule. If you didn't write a rule, nothing gets checked. This creates three blind spots:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Drift goes undetected.&lt;/strong&gt;&lt;br&gt;
Your &lt;code&gt;email&lt;/code&gt; column had 0.1% nulls last month. Today it's 12%. No dbt test catches this because you never wrote one that says "null rate should stay below X%." You find out when a PM asks why the marketing numbers look off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Structural changes slip through.&lt;/strong&gt;&lt;br&gt;
A column gets dropped upstream. A type changes from integer to text. dbt won't tell you unless you wrote a test for that specific column. By the time your Spark job fails, the damage is downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Nobody tests what they don't know about.&lt;/strong&gt;&lt;br&gt;
Orphaned foreign keys, outlier values 10x beyond normal range, columns that are technically "not null" but 40% empty strings. These are real problems in real databases that nobody writes tests for because they don't know they exist until something breaks.&lt;/p&gt;
&lt;h2&gt;
  
  
  What if dbt tests wrote themselves?
&lt;/h2&gt;

&lt;p&gt;That's what I built. &lt;a href="https://github.com/vahid110/dbt-dqlens" rel="noopener noreferrer"&gt;dbt-dqlens&lt;/a&gt; profiles your models and generates the test YAML for you.&lt;/p&gt;

&lt;p&gt;After your normal &lt;code&gt;dbt run&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dbt-dqlens
dqlens-dbt profile        &lt;span class="c"&gt;# profiles all models using your dbt connection&lt;/span&gt;
dqlens-dbt generate-tests &lt;span class="c"&gt;# outputs _dqlens_tests.yml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It reads your &lt;code&gt;profiles.yml&lt;/code&gt;, connects to the same warehouse dbt uses, profiles every column (nulls, uniqueness, distributions, patterns, foreign keys, percentiles), and generates native dbt tests based on what it finds.&lt;/p&gt;

&lt;p&gt;The output, &lt;code&gt;_dqlens_tests.yml&lt;/code&gt;, is a standard dbt schema file you commit to your repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;orders&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;dqlens&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;id&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;unique&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;not_null&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;dqlens_no_null_drift&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;baseline_pct&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.1&lt;/span&gt;
              &lt;span class="na"&gt;threshold_multiplier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amount&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;dqlens_no_outliers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;lower_bound&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-110.0&lt;/span&gt;
              &lt;span class="na"&gt;upper_bound&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;210.0&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer_id&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;dqlens_no_orphans&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;target_model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ref('customers')&lt;/span&gt;
              &lt;span class="na"&gt;target_column&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then &lt;code&gt;dbt test --select tag:dqlens&lt;/code&gt; runs them as native dbt tests. They show up in dbt docs, dbt Cloud, your CI pipeline. Nothing changes about your workflow except now you have tests you didn't write.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it catches that dbt tests don't
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Standard dbt test&lt;/th&gt;
&lt;th&gt;dbt-dqlens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Null rate increased 10x from last week&lt;/td&gt;
&lt;td&gt;No (unless you wrote a threshold)&lt;/td&gt;
&lt;td&gt;Yes (baseline comparison)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Column dropped upstream&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (schema drift detection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FK references non-existent rows&lt;/td&gt;
&lt;td&gt;Only with relationships test (manual)&lt;/td&gt;
&lt;td&gt;Yes (auto-detected from schema)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40% empty strings masquerading as "not null"&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (empty string rate check)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Values 10x beyond normal range&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (IQR-based outlier detection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Column type changed&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (type change detection)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
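
&lt;p&gt;For readers unfamiliar with IQR-based outlier detection, here is a short Python sketch of the usual approach (Tukey fences). The function names and the conventional 1.5 multiplier are assumptions, not confirmed dbt-dqlens internals. With hypothetical quartiles of 10 and 90, the fences come out to -110 and 210, the same bounds that appear in the YAML example earlier in this post.&lt;/p&gt;

```python
import statistics


def iqr_bounds(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Tukey fences: values outside [q1 - k*IQR, q3 + k*IQR] count as outliers."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)


def profile_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Compute the fences from observed values, as a profiling step might."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return iqr_bounds(q1, q3, k)


def find_outliers(values: list[float], lower: float, upper: float) -> list[float]:
    """Return the values that fall outside the closed interval [lower, upper]."""
    return [v for v in values if v > upper or lower > v]


lower, upper = iqr_bounds(10.0, 90.0)
print(lower, upper)  # -110.0 210.0
print(find_outliers([42.0, 250.0, -200.0], lower, upper))  # [250.0, -200.0]
```

&lt;p&gt;Because the fences are derived from the data's own spread, the same code adapts to a column of cents or a column of millions without a hand-picked threshold.&lt;/p&gt;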

&lt;h2&gt;
  
  
  It's behavior-based, not rule-based
&lt;/h2&gt;

&lt;p&gt;The key difference: dbt tests check static rules you defined. dbt-dqlens checks behavior. It learns what your data looks like (the baseline) and flags when something changes.&lt;/p&gt;

&lt;p&gt;You don't define thresholds. It computes them from your data. If your email column is normally 0.1% null and jumps to 5%, that's a finding. If your orders table normally grows 2-5% daily and suddenly jumps 50%, that's a finding.&lt;/p&gt;

&lt;p&gt;This is the kind of check nobody writes by hand because you'd need to know the baseline first. The tool knows it because it profiled your data.&lt;/p&gt;
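
&lt;p&gt;As an illustration, a null drift check driven by a stored baseline might look like the Python sketch below. The parameter names &lt;code&gt;baseline_pct&lt;/code&gt; and &lt;code&gt;threshold_multiplier&lt;/code&gt; come from the generated YAML shown earlier; the rule itself (flag when the current rate exceeds baseline times multiplier) is an assumption about how they combine.&lt;/p&gt;

```python
def null_pct(values: list) -> float:
    """Percentage of None entries in a column sample."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)


def has_null_drift(current_pct: float, baseline_pct: float,
                   threshold_multiplier: float = 3.0) -> bool:
    """Assumed rule: drift when the current null rate exceeds baseline * multiplier."""
    return current_pct > baseline_pct * threshold_multiplier


# 1 null out of 20 values is a 5% null rate, far above a 0.1% baseline.
emails = ["a@example.com"] * 19 + [None]
current = null_pct(emails)
print(current)
print(has_null_drift(current, baseline_pct=0.1, threshold_multiplier=3))
```

&lt;p&gt;The baseline is the only input a human would otherwise have to look up, which is why profiling first makes this kind of check cheap to generate.&lt;/p&gt;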

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dbt-dqlens
dqlens-dbt run  &lt;span class="c"&gt;# profiles + generates tests in one step&lt;/span&gt;
dbt &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--select&lt;/span&gt; tag:dqlens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It reads your existing &lt;code&gt;profiles.yml&lt;/code&gt;, so there are no new connections to configure. It works with PostgreSQL today, with more databases coming.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vahid110/dbt-dqlens" rel="noopener noreferrer"&gt;github.com/vahid110/dbt-dqlens&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core engine (&lt;a href="https://github.com/vahid110/dqlens" rel="noopener noreferrer"&gt;DQLens&lt;/a&gt;) also works standalone if you don't use dbt. Same profiling, same detection, just a CLI instead of dbt integration.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you've been meaning to add data quality tests but never found the time, this is the shortcut. Three commands, zero YAML writing, and you get coverage you never had.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>dbt</category>
      <category>postgres</category>
      <category>postgressql</category>
    </item>
    <item>
      <title>Nice read</title>
      <dc:creator>vahid Saber</dc:creator>
      <pubDate>Mon, 27 Apr 2026 08:43:53 +0000</pubDate>
      <link>https://dev.to/vahid_saber_d47950be99729/nice-read-2ghc</link>
      <guid>https://dev.to/vahid_saber_d47950be99729/nice-read-2ghc</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/abdumasah/4x-faster-redshift-reads-with-one-line-of-python-l4l" class="crayons-story__hidden-navigation-link"&gt;4x Faster Redshift Reads With One Line of Python&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/abdumasah" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891331%2Fcf728784-c29e-41ec-9980-28f5abd79d4f.jpg" alt="abdumasah profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/abdumasah" class="crayons-story__secondary fw-medium m:hidden"&gt;
              abdu masah
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                abdu masah
                
              
              &lt;div id="story-author-preview-content-3533437" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/abdumasah" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891331%2Fcf728784-c29e-41ec-9980-28f5abd79d4f.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;abdu masah&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/abdumasah/4x-faster-redshift-reads-with-one-line-of-python-l4l" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 21&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/abdumasah/4x-faster-redshift-reads-with-one-line-of-python-l4l" id="article-link-3533437"&gt;
          4x Faster Redshift Reads With One Line of Python
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/aws"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;aws&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/dataengineering"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;dataengineering&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/abdumasah/4x-faster-redshift-reads-with-one-line-of-python-l4l" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/abdumasah/4x-faster-redshift-reads-with-one-line-of-python-l4l#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              1&lt;span class="hidden s:inline"&gt; comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            3 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Introducing sqlxport: Export SQL Query Results to Parquet or CSV and Upload to S3 or MinIO</title>
      <dc:creator>vahid Saber</dc:creator>
      <pubDate>Wed, 04 Jun 2025 19:19:25 +0000</pubDate>
      <link>https://dev.to/vahid_saber_d47950be99729/introducing-sqlxport-export-sql-query-results-to-parquet-or-csv-and-upload-to-s3-or-minio-3i5f</link>
      <guid>https://dev.to/vahid_saber_d47950be99729/introducing-sqlxport-export-sql-query-results-to-parquet-or-csv-and-upload-to-s3-or-minio-3i5f</guid>
      <description>&lt;p&gt;In today’s data pipelines, exporting data from SQL databases into flexible and efficient formats like Parquet or CSV is a frequent need — especially when integrating with tools like AWS Athena, Pandas, Spark, or Delta Lake.&lt;/p&gt;

&lt;p&gt;That’s where sqlxport comes in.&lt;/p&gt;

&lt;h2&gt;
  🚀 What is sqlxport?
&lt;/h2&gt;

&lt;p&gt;sqlxport is a simple, powerful CLI tool that lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run a SQL query against PostgreSQL or Redshift&lt;/li&gt;
&lt;li&gt;Export the results as Parquet or CSV&lt;/li&gt;
&lt;li&gt;Optionally upload the result to S3 or MinIO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s open source, Python-based, and available on PyPI.&lt;/p&gt;

&lt;h2&gt;
  🛠️ Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Export Redshift query results to S3 in a single command&lt;/li&gt;
&lt;li&gt;Prepare Parquet files for data science in DuckDB or Pandas&lt;/li&gt;
&lt;li&gt;Integrate your SQL results into Spark Delta Lake pipelines&lt;/li&gt;
&lt;li&gt;Automate backups or snapshots from your production databases&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  ✨ Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ PostgreSQL and Redshift support&lt;/li&gt;
&lt;li&gt;✅ Parquet and CSV output&lt;/li&gt;
&lt;li&gt;✅ Supports partitioning&lt;/li&gt;
&lt;li&gt;✅ MinIO and AWS S3 support&lt;/li&gt;
&lt;li&gt;✅ CLI-friendly and scriptable&lt;/li&gt;
&lt;li&gt;✅ MIT licensed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  📦 Quickstart
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install sqlxport
sqlxport run \
  --db-url postgresql://user:pass@host:5432/dbname \
  --query "SELECT * FROM sales" \
  --format parquet \
  --output-file sales.parquet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Want to upload it to MinIO or S3?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sqlxport run \
  ... \
  --upload-s3 \
  --s3-bucket my-bucket \
  --s3-key sales.parquet \
  --aws-access-key-id XXX \
  --aws-secret-access-key YYY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  🧪 Live Demo
&lt;/h2&gt;

&lt;p&gt;We provide a full end-to-end demo using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;MinIO (S3-compatible)&lt;/li&gt;
&lt;li&gt;Apache Spark with Delta Lake&lt;/li&gt;
&lt;li&gt;DuckDB for preview&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 See it on GitHub&lt;/p&gt;

&lt;h2&gt;
  🌐 Where to Find It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📦 PyPI: sqlxport&lt;/li&gt;
&lt;li&gt;💻 GitHub: sqlxport&lt;/li&gt;
&lt;li&gt;🐦 Follow updates on Twitter/X&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  🙌 Contributions Welcome
&lt;/h2&gt;

&lt;p&gt;We’re just getting started. Feel free to open issues, submit PRs, or suggest ideas for future features and integrations.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
