<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yu-Chuan Hung</title>
    <description>The latest articles on DEV Community by Yu-Chuan Hung (@cutechuanchuan).</description>
    <link>https://dev.to/cutechuanchuan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3458433%2F40aa6a2a-2065-4d95-b1eb-2e74cb353de7.png</url>
      <title>DEV Community: Yu-Chuan Hung</title>
      <link>https://dev.to/cutechuanchuan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cutechuanchuan"/>
    <language>en</language>
    <item>
      <title>[Apache Iceberg] Iceberg Performance: The Hidden Cost of NULLS FIRST</title>
      <dc:creator>Yu-Chuan Hung</dc:creator>
      <pubDate>Sun, 16 Nov 2025 16:47:45 +0000</pubDate>
      <link>https://dev.to/cutechuanchuan/apache-iceberg-iceberg-performance-the-hidden-cost-of-nulls-first-20cl</link>
      <guid>https://dev.to/cutechuanchuan/apache-iceberg-iceberg-performance-the-hidden-cost-of-nulls-first-20cl</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Apache Iceberg is a widely-used table format in Data Lakehouse architectures. It provides flexibility in how data is written, with two key optimizations: &lt;code&gt;partition&lt;/code&gt;, which splits data into segments, and &lt;code&gt;sort&lt;/code&gt;, which reorders data within those segments. These optimizations can significantly reduce the amount of data scanned by query engines, ultimately boosting query performance.&lt;/p&gt;

&lt;p&gt;When querying data with high-cardinality columns (e.g., IDs or serial numbers), quickly filtering out unnecessary values is crucial. Sorting becomes particularly valuable in these scenarios. The rationale is simple: if data is written in order, query engines can rapidly locate the needed data rather than performing a full table scan and discarding irrelevant rows.&lt;/p&gt;

&lt;p&gt;When configuring Iceberg table sort properties, engineers can specify whether sorting follows ascending or descending order—with ascending as the default. While reading about this configuration, a question came to mind: Is there any performance difference between these two ordering approaches? If so, which one performs better, and why? To answer these questions, I designed an experiment to find out.&lt;/p&gt;

&lt;h1&gt;
  
  
  Experiment
&lt;/h1&gt;

&lt;p&gt;Detailed code and performance analysis can be found in my repo: &lt;a href="https://github.com/CuteChuanChuan/Dive-Into-Iceberg" rel="noopener noreferrer"&gt;https://github.com/CuteChuanChuan/Dive-Into-Iceberg&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Materials
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Generated &lt;strong&gt;1,000,000&lt;/strong&gt; rows with &lt;strong&gt;30% null values&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created two identically configured Iceberg tables with different null sorting orders (i.e., &lt;strong&gt;NULLS FIRST vs. NULLS LAST&lt;/strong&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Queries Executed to Evaluate Performance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;select count(*) from table where value is not null&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;select sum(value) from table where value is not null&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;select avg(value) from table where value is not null&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;select count(*) from table where value is null&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;select count(*) from table&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance Evaluation Metrics
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query plan&lt;/strong&gt;: Whether different sorting orders generate different execution plans&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Execution time with statistical analysis&lt;/strong&gt;: Overall query time comparison&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CPU profiling&lt;/strong&gt;: Detailed CPU usage analysis&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Findings
&lt;/h1&gt;

&lt;p&gt;To obtain a complete picture, I planned to conduct three types of analysis. First, I compared query plans to see whether different null placements generate different plans, which might influence query performance. Second, I conducted statistical analysis on execution times for rigorous examination. Since query time differences are the observable outcome, we need to identify the root cause if significant differences exist. Therefore, if statistical significance is found, CPU profiling will be conducted in the final phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Query Plan
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Details
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;select count(*) from table where value is not null&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1557&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnotnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;508&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
               &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;508&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NOT&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1574&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnotnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;521&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
               &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;521&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NOT&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select sum(value) from table where value is not null&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;886&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3045&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;886&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnotnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;886&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;886&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NOT&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;899&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3064&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;899&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnotnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;899&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;899&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NOT&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select avg(value) from table where value is not null&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1264&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4535&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_avg&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1264&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnotnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1264&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1264&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NOT&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1279&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4554&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_avg&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1279&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnotnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1279&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1279&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NOT&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select count(*) from table where value is null&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6023&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1646&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
               &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1646&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_first&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6040&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Filter&lt;/span&gt; &lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1659&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
               &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;BatchScan&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1659&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;test_nulls_last&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="no"&gt;IS&lt;/span&gt; &lt;span class="no"&gt;NULL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;groupedBy&lt;/span&gt;&lt;span class="o"&gt;=]&lt;/span&gt; &lt;span class="nl"&gt;RuntimeFilters:&lt;/span&gt; &lt;span class="o"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select count(*) from table&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;First&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agg_func_0&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1895L&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7045&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agg_func_0&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1895L&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(*)&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1896L&lt;/span&gt; &lt;span class="no"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;agg_func_0&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1895L&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;LocalTableScan&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(*)&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1896L&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nc"&gt;Null&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;
&lt;span class="nc"&gt;Query&lt;/span&gt; &lt;span class="nf"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NULLS&lt;/span&gt; &lt;span class="nc"&gt;Last&lt;/span&gt;&lt;span class="o"&gt;):&lt;/span&gt;
&lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Physical&lt;/span&gt; &lt;span class="nc"&gt;Plan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt;
&lt;span class="nc"&gt;AdaptiveSparkPlan&lt;/span&gt; &lt;span class="n"&gt;isFinalPlan&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agg_func_0&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1904L&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
   &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Exchange&lt;/span&gt; &lt;span class="nc"&gt;SinglePartition&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;ENSURE_REQUIREMENTS&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7060&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;HashAggregate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=[],&lt;/span&gt; &lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;&lt;span class="n"&gt;partial_sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agg_func_0&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1904L&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt;
         &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(*)&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1905L&lt;/span&gt; &lt;span class="no"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;agg_func_0&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1904L&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
            &lt;span class="o"&gt;+-&lt;/span&gt; &lt;span class="nc"&gt;LocalTableScan&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(*)&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="mi"&gt;1905L&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;For both tables, the execution plans for all queries are identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  File-Level Statistics Analysis
&lt;/h2&gt;

&lt;p&gt;Although the query plans are the same, a deeper look at the Parquet file statistics reveals important differences in how data is physically organized.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partition Statistics Comparison
&lt;/h3&gt;

&lt;p&gt;Below are the min/max statistics for each partition in both configurations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Partition&lt;/th&gt;
&lt;th&gt;NULLS FIRST&lt;/th&gt;
&lt;th&gt;NULLS LAST&lt;/th&gt;
&lt;th&gt;Min Value Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cat_0-2&lt;/td&gt;
&lt;td&gt;All nulls&lt;/td&gt;
&lt;td&gt;All nulls&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cat_3&lt;/td&gt;
&lt;td&gt;min=103, max=993&lt;/td&gt;
&lt;td&gt;min=103, max=993&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cat_4&lt;/td&gt;
&lt;td&gt;min=4, max=994&lt;/td&gt;
&lt;td&gt;min=4, max=994&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cat_5&lt;/td&gt;
&lt;td&gt;min=405, max=995&lt;/td&gt;
&lt;td&gt;min=355, max=995&lt;/td&gt;
&lt;td&gt;-50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cat_6&lt;/td&gt;
&lt;td&gt;min=106, max=996&lt;/td&gt;
&lt;td&gt;min=6, max=996&lt;/td&gt;
&lt;td&gt;-100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cat_7&lt;/td&gt;
&lt;td&gt;min=517, max=997&lt;/td&gt;
&lt;td&gt;min=487, max=997&lt;/td&gt;
&lt;td&gt;-30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cat_8&lt;/td&gt;
&lt;td&gt;min=228, max=998&lt;/td&gt;
&lt;td&gt;min=208, max=998&lt;/td&gt;
&lt;td&gt;-20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cat_9&lt;/td&gt;
&lt;td&gt;min=619, max=999&lt;/td&gt;
&lt;td&gt;min=609, max=999&lt;/td&gt;
&lt;td&gt;-10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Are Statistics Different?
&lt;/h3&gt;

&lt;p&gt;The different min/max values reveal that &lt;strong&gt;physical data layout differs&lt;/strong&gt; between the two configurations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Different File Boundaries&lt;/strong&gt;: When sorting with NULLS FIRST vs. NULLS LAST, Spark writes data in different orders, causing file splits to occur at different points. Even though both tables contain identical data, the way rows are distributed across files differs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;File Organization Pattern&lt;/strong&gt;:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NULLS FIRST&lt;/strong&gt;: Files begin with null values, followed by non-null values. The minimum non-null value appears after skipping nulls within each file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NULLS LAST&lt;/strong&gt;: Files begin with non-null values immediately. The minimum value is at or near the start of the file.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Quality&lt;/strong&gt;: NULLS LAST produces "better" statistics for non-null queries:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In &lt;strong&gt;NULLS FIRST&lt;/strong&gt; (e.g., cat_6): &lt;code&gt;min=106&lt;/code&gt; means the file starts with nulls, and 106 is the first non-null value encountered.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In &lt;strong&gt;NULLS LAST&lt;/strong&gt; (e.g., cat_6): &lt;code&gt;min=6&lt;/code&gt; means the file immediately starts with value 6, providing more accurate bounds.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Impact on Query Execution
&lt;/h3&gt;

&lt;p&gt;For queries with &lt;code&gt;WHERE value IS NOT NULL&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NULLS FIRST&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Files contain nulls at the beginning, causing mixed value distribution&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query engine must scan through null values before reaching non-null data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Statistics indicate the presence of non-null values, but they're not immediately accessible&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NULLS LAST&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Files with non-null data have those values at the beginning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query engine can immediately start processing valid values&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better sequential access pattern for counting non-null values&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This file-level organization difference, combined with CPU microarchitecture optimizations, explains why NULLS LAST performs better for counting non-null values even though logical query plans are identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Execution Time Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data Collection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;5 queries, each executed 100 times&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Statistical Methods
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;T-test&lt;/strong&gt;: Compare whether query times are statistically different&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cohen's d&lt;/strong&gt;: Calculate the effect size of null ordering settings&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Details
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;select count(*) from table where value is not null&lt;/code&gt;: Null Last performs better
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Descriptive&lt;/span&gt; &lt;span class="nx"&gt;Statistics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;FIRST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;41.46&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;8.38&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;31.55&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.40&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;

&lt;span class="nx"&gt;Paired&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;11.9367&lt;/span&gt; 
  &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.000000&lt;/span&gt; 
  &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;8.26&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;11.55&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;***&lt;/span&gt; &lt;span class="nx"&gt;HIGHLY&lt;/span&gt; &lt;span class="nc"&gt;SIGNIFICANT &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="nc"&gt;Size &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Cohen&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;s d):
  d = 1.1937 
  Interpretation: Large 

Summary:
  Mean difference: 9.91 ms
  Percentage difference: 23.90 %
  Winner: NULLS LAST
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select sum(value) from table where value is not null&lt;/code&gt;: Not significantly different
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Descriptive&lt;/span&gt; &lt;span class="nx"&gt;Statistics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;FIRST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;34.14&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.12&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;33.40&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;6.43&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;

&lt;span class="nx"&gt;Paired&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8759&lt;/span&gt; 
  &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.383195&lt;/span&gt; 
  &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.43&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NOT&lt;/span&gt; &lt;span class="nc"&gt;SIGNIFICANT &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select avg(value) from table where value is not null&lt;/code&gt;: Not significantly different
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Descriptive&lt;/span&gt; &lt;span class="nx"&gt;Statistics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;FIRST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;28.84&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.42&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;27.95&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.26&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;

&lt;span class="nx"&gt;Paired&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.9654&lt;/span&gt; 
  &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.052165&lt;/span&gt; 
  &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.80&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NOT&lt;/span&gt; &lt;span class="nc"&gt;SIGNIFICANT &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select count(*) from table where value is null&lt;/code&gt;: Not significantly different
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Descriptive&lt;/span&gt; &lt;span class="nx"&gt;Statistics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;FIRST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;24.00&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;4.64&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;23.16&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3.43&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;

&lt;span class="nx"&gt;Paired&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.3804&lt;/span&gt; 
  &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.170582&lt;/span&gt; 
  &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.37&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.05&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NOT&lt;/span&gt; &lt;span class="nc"&gt;SIGNIFICANT &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;select count(*) from table&lt;/code&gt;: Not significantly different
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Descriptive&lt;/span&gt; &lt;span class="nx"&gt;Statistics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;FIRST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;14.95&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.41&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;NULLS&lt;/span&gt; &lt;span class="nx"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nx"&gt;mean&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;14.39&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.45&lt;/span&gt;&lt;span class="nx"&gt;ms&lt;/span&gt;

&lt;span class="nx"&gt;Paired&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;statistic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.6356&lt;/span&gt; 
  &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.105090&lt;/span&gt; 
  &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;CI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.25&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="nx"&gt;ms&lt;/span&gt;
  &lt;span class="nx"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NOT&lt;/span&gt; &lt;span class="nc"&gt;SIGNIFICANT &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;NULLS LAST is significantly faster than NULLS FIRST when counting non-null values.&lt;/p&gt;

&lt;h2&gt;
  
  
  CPU Profiling: Analyzing Count Non-Null Values Query
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Details
&lt;/h3&gt;

&lt;p&gt;Please refer to the flame graphs in my repo.&lt;/p&gt;

&lt;p&gt;The performance difference observed in execution time analysis can be attributed to both &lt;strong&gt;file-level organization&lt;/strong&gt; and &lt;strong&gt;CPU microarchitecture optimizations&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;File-Level Organization Impact:&lt;/strong&gt; As shown in the file statistics analysis, NULLS LAST creates files where non-null values are positioned at the beginning. This layout means when the query engine scans data with &lt;code&gt;WHERE value IS NOT NULL&lt;/code&gt;, it immediately encounters a continuous block of valid values rather than having to skip over nulls first. This reduces unnecessary I/O operations and deserialization overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;CPU Microarchitecture Optimizations:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;SIMD (Single Instruction, Multiple Data):&lt;/em&gt;
Modern CPUs can process multiple data elements simultaneously using SIMD instructions. When counting non-null values with NULLS LAST, the query engine encounters a continuous block of non-null values at the start of each file. This layout allows SIMD instructions to efficiently process multiple valid values in parallel. For example, when checking &lt;code&gt;isnotnull(value)&lt;/code&gt; on 8 consecutive values that are all non-null, a single SIMD instruction can validate and count them in one operation.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Branch Prediction:&lt;/em&gt;
Modern CPUs use branch predictors to anticipate the outcome of conditional statements (like &lt;code&gt;if (value != null)&lt;/code&gt;). With NULLS LAST, the query engine scans data following a highly predictable pattern: a long sequence of non-null values followed by nulls. This consistency allows the branch predictor to achieve high accuracy, keeping the CPU pipeline running smoothly. In contrast, NULLS FIRST presents a less predictable pattern at file boundaries where nulls transition to non-nulls, potentially causing pipeline stalls.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;The CPU profiling data supports these optimizations: NULLS LAST (2,238 samples) uses approximately 11.7% less CPU time than NULLS FIRST (2,536 samples). This reduction results from the combined effects of better file organization, improved SIMD vectorization, and enhanced branch prediction accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;NULLS LAST occupies less CPU time due to a combination of better file-level data organization and CPU microarchitecture optimizations.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion and Future Exploration
&lt;/h1&gt;

&lt;p&gt;This exploration reveals that while different null value placements do not create different query plans, they significantly impact query performance through &lt;strong&gt;physical data organization&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Findings:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;File-Level Statistics Matter&lt;/strong&gt;: NULLS LAST produces better min/max statistics, with non-null values positioned at file beginnings. This creates more favorable data layouts for queries filtering on non-null values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CPU Microarchitecture Synergy&lt;/strong&gt;: The continuous blocks of non-null values in NULLS LAST enable CPU optimizations including SIMD vectorization and improved branch prediction, resulting in ~11.7% less CPU time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Significant Performance Impact&lt;/strong&gt;: For &lt;code&gt;SELECT COUNT(*) WHERE value IS NOT NULL&lt;/code&gt;, NULLS LAST achieves 23.90% faster execution time—a substantial improvement for such a common OLAP operation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Practical Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If counting non-null values is a frequent operation in your workload—which is common in OLAP scenarios—configuring Iceberg tables with NULLS LAST can provide measurable performance improvements. The benefits stem from both better file organization and CPU-level optimizations working in tandem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Future Exploration:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This experiment tested 5 queries on a 1-million-row dataset with 30% null values. Future investigations could explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Various query patterns frequently used in OLAP scenarios (e.g., window functions like LAG, complex aggregations)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Larger datasets with multiple files per partition to amplify metadata pruning effects&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Different null percentage distributions (10%, 50%, 70%) to understand the threshold where NULLS LAST benefits diminish&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Impact on different data types (strings, decimals) and column cardinalities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Performance with Iceberg's metadata-based filtering in more complex predicates&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These investigations would provide a more complete understanding of optimal Iceberg table sorting configurations across diverse workloads.&lt;/p&gt;

</description>
      <category>apacheiceberg</category>
      <category>apachespark</category>
      <category>opensource</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
