<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Paola Pardo</title>
    <description>The latest articles on DEV Community by Paola Pardo (@paolapardo).</description>
    <link>https://dev.to/paolapardo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F726504%2Ffa5829f5-eacd-476a-97fe-95624706e277.jpg</url>
      <title>DEV Community: Paola Pardo</title>
      <link>https://dev.to/paolapardo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/paolapardo"/>
    <language>en</language>
    <item>
      <title>Datasource enabling multidimensional indexing and sampling pushdown</title>
      <dc:creator>Paola Pardo</dc:creator>
      <pubDate>Wed, 09 Mar 2022 15:23:23 +0000</pubDate>
      <link>https://dev.to/paolapardo/datasource-enabling-multidimensional-indexing-and-sampling-pushdown-29ca</link>
      <guid>https://dev.to/paolapardo/datasource-enabling-multidimensional-indexing-and-sampling-pushdown-29ca</guid>
      <description>&lt;p&gt;Have you ever wondered what a multidimensional index would look like in Spark?&lt;/p&gt;

&lt;p&gt;Recently we launched the Qbeast Open Source Format, a Data Lakehouse enhancement to speed up your queries!&lt;/p&gt;

&lt;p&gt;Based on &lt;a href="https://delta.io/"&gt;Delta Lake&lt;/a&gt; and available for &lt;a href="https://spark.apache.org/"&gt;Apache Spark&lt;/a&gt;, it lets you index your data by multiple columns and read a representative sample directly from storage 🔥&lt;/p&gt;

&lt;p&gt;Here is a quick example of how you can boost your query performance using Qbeast:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is a normal query with Spark and the Delta format.&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MBBDVg2c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bdc1ngafo9jh6s740x33.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MBBDVg2c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bdc1ngafo9jh6s740x33.gif" alt="Normal query" width="666" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the same query, but with Qbeast sampling at 1%.&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--azfeBxsx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ln8duog3i6fzlx3moznv.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--azfeBxsx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ln8duog3i6fzlx3moznv.gif" alt="Qbeast Sample Query" width="666" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The GIFs are cool, right? Let's compare both executions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Execution Time&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Delta&lt;/td&gt;
&lt;td&gt;~ 2.5 min.&lt;/td&gt;
&lt;td&gt;37.869383&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qbeast&lt;/td&gt;
&lt;td&gt;~ 6.6 sec.&lt;/td&gt;
&lt;td&gt;37.856333&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As you can see, &lt;strong&gt;1% sampling&lt;/strong&gt; returns the result about &lt;strong&gt;22 times faster&lt;/strong&gt; than the Delta format, with a relative &lt;strong&gt;error of 0.034%&lt;/strong&gt;.&lt;/p&gt;
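
&lt;p&gt;As a rough check, the speedup and the error can be recomputed from the table. The snippet below is only a sketch: the variable and function names are made up for illustration, and the Delta time is taken as exactly 2.5 minutes even though the table reports it as approximate.&lt;/p&gt;

```python
# Figures copied from the comparison table above.
# Assumption: "~ 2.5 min." is treated as exactly 150 seconds.
delta_seconds = 2.5 * 60
qbeast_seconds = 6.6
delta_result = 37.869383
qbeast_result = 37.856333


def speedup(baseline_seconds, sampled_seconds):
    """Return how many times faster the sampled query ran."""
    return baseline_seconds / sampled_seconds


def relative_error_pct(exact, approx):
    """Return the relative error of approx against exact, as a percentage."""
    return abs(approx - exact) / abs(exact) * 100


print(f"speedup: {speedup(delta_seconds, qbeast_seconds):.1f}x")
print(f"relative error: {relative_error_pct(delta_result, qbeast_result):.3f}%")
# speedup: 22.7x
# relative error: 0.034%
```

&lt;p&gt;Because both timings are approximate, the speedup is best read as "roughly 22x" rather than as an exact figure.&lt;/p&gt;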

&lt;p&gt;If you want to play with it, check out the &lt;a href="https://github.com/Qbeast-io/qbeast-spark"&gt;Qbeast-Spark GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And don't forget to give us a star!&lt;/p&gt;

&lt;p&gt;Your support means a lot ❤️&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>datascience</category>
      <category>news</category>
      <category>github</category>
    </item>
  </channel>
</rss>
