<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kovid Rathee</title>
    <description>The latest articles on DEV Community by Kovid Rathee (@kovidr).</description>
    <link>https://dev.to/kovidr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F391473%2F9b644858-c1b5-4e94-9002-f9654d66a649.jpg</url>
      <title>DEV Community: Kovid Rathee</title>
      <link>https://dev.to/kovidr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kovidr"/>
    <language>en</language>
    <item>
      <title>Optimize Read Performance in Supabase with Postgres Materialized Views</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Tue, 30 Sep 2025 13:51:06 +0000</pubDate>
      <link>https://dev.to/kovidr/optimize-read-performance-in-supabase-with-postgres-materialized-views-12k5</link>
      <guid>https://dev.to/kovidr/optimize-read-performance-in-supabase-with-postgres-materialized-views-12k5</guid>
      <description>&lt;p&gt;A &lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;Postgres&lt;/a&gt; database typically consists of schemas that serve single-table &lt;a href="https://developer.mozilla.org/en-US/docs/Glossary/CRUD" rel="noopener noreferrer"&gt;CRUD (create, read, update, delete) operations&lt;/a&gt;. While CRUD operations handle basic transactional needs, applications also require more complex queries. These often involve multiple joins, nested CTEs, and intricate predicates for read-heavy workloads like embedded reporting and analytics. Such multitable queries can degrade overall database performance and block CRUD operations through locking. Postgres materialized views provide a solution by separating read-heavy workloads from transactional processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://supabase.com/blog/postgresql-views#what-is-a-materialized-view" rel="noopener noreferrer"&gt;Materialized views&lt;/a&gt; are just like &lt;a href="https://supabase.com/blog/postgresql-views#what-is-a-view" rel="noopener noreferrer"&gt;standard views&lt;/a&gt; but with one key difference—materialized views are precomputed and stored on disk. They save the query results in separate database objects, enabling you to run queries on top of a static snapshot instead of live underlying tables. The precomputation can help you so things like serving reports and enabling filtered data exports from multiple tables, where the queries can get quite complex.&lt;/p&gt;

&lt;p&gt;This tutorial explains how I improved my application's read performance using Postgres materialized views in &lt;a href="https://supabase.com/" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Materialized Views in Postgres
&lt;/h2&gt;

&lt;p&gt;Unlike a standard view, a materialized view has an attached storage component; this means that materialized views can be precomputed and stored while a standard view cannot. Under the hood, both types of views follow Postgres's rule system. A table is designed to support all kinds of CRUD operations. In contrast, a materialized view is intended to be refreshed by a prewritten SQL query attached to its definition.&lt;/p&gt;

&lt;p&gt;A materialized view is a point-in-time snapshot of a query result. This snapshot unlocks a wide range of query performance benefits, including near-instant query execution, since the results are precomputed. It also decouples typical CRUD-type operations from read-heavy operations that span multiple tables and complex queries, and it minimizes the long-running lock contentions that read-heavy operations can otherwise cause on live tables.&lt;/p&gt;

&lt;p&gt;Before using materialized views, you should consider that their data storage component will impact the database storage cost. Materialized views also don't provide the most up-to-date data. They're most suitable for use cases that can function with precomputed data. The article covers some other overheads associated with materialized views later, but first, let's look at why I chose to use materialized views to improve my query performance in Supabase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Materialized Views
&lt;/h2&gt;

&lt;p&gt;Supabase uses &lt;a href="https://docs.postgrest.org/en/v13/" rel="noopener noreferrer"&gt;PostgREST&lt;/a&gt; instead of manual SQL-based CRUD programming and traditional &lt;a href="https://www.theserverside.com/definition/object-relational-mapping-ORM" rel="noopener noreferrer"&gt;ORMs (object-relational mappers)&lt;/a&gt;. PostgREST converts your entire database into a RESTful API—the &lt;a href="https://supabase.com/docs/guides/api" rel="noopener noreferrer"&gt;Supabase API&lt;/a&gt;—that allows you to interact with your database through standard API endpoints.&lt;br&gt;
Read-heavy queries slow API response times because they compute results on the fly, requiring the API to wait for database processing. Materialized views address this issue by pre-computing expensive joins and aggregations, eliminating the need for real-time calculations in performance-critical applications.&lt;br&gt;
Materialized views also have security implications. Unlike tables, Postgres doesn't support &lt;a href="https://supabase.com/docs/guides/database/postgres/row-level-security" rel="noopener noreferrer"&gt;RLS (Row Level Security) policies&lt;/a&gt; directly on materialized views, so a materialized view doesn't enforce the RLS policies of its underlying tables. If you expose one through the API, control access with &lt;a href="https://supabase.com/docs/guides/api/securing-your-api" rel="noopener noreferrer"&gt;schema and privilege settings&lt;/a&gt;, for example by revoking access from the &lt;code&gt;anon&lt;/code&gt; role or keeping the view out of the exposed schema.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating and Using Materialized Views in Supabase
&lt;/h2&gt;

&lt;p&gt;Let's look at how to create and use materialized views in Supabase using an example online storefront for an e-commerce business with the following database tables: &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;, &lt;code&gt;orders&lt;/code&gt;, and &lt;code&gt;order_payments&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Consider this scenario: The storefront owner logs into a portal that displays a month-on-month performance report. The summary should only include successful orders (orders that have been paid for) and exclude orders where a refund has been processed. Running this query every time the storefront owner logs in would be slow as it would involve filtering, aggregating, and joining data from at least three of the four tables mentioned above. The query would look something like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH successful_orders AS (
  SELECT o.user_id, 
         o.order_id,
         o.original_amount_usd, 
         o.discount,
         o.final_amount_usd, 
         op.payment_amount_usd,
         o.order_date,
         TO_CHAR(order_date, 'YYYY-MM') order_month
    FROM orders o
    INNER JOIN order_payments op ON o.order_id = op.order_id
   WHERE o.order_status = 'SUCCESSFUL'
     AND op.order_payment_status = 'SUCCESSFUL'
     AND ROUND(o.final_amount_usd,2) = ROUND(op.payment_amount_usd,2)
     AND op.refund_processed = FALSE
)

SELECT 
    order_month,
    COUNT(DISTINCT user_id) unq_customers,
    COUNT(order_id) unq_orders,
    SUM(original_amount_usd) total_original_amount,
    SUM(payment_amount_usd) total_payment_amount
FROM successful_orders
GROUP BY order_month
ORDER BY order_month DESC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this query on demand could be quite expensive and frustrating for the storefront owner if they log in to check the status several times daily. In this situation, having a materialized view based on this query that materializes every few hours will be quicker and cheaper as the materialized view's performance will be similar to that of a regular table. Here's how you can create a materialized view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE MATERIALIZED VIEW mv_successful_orders AS ... ;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
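&lt;p&gt;For the monthly report in this scenario, the elided part of the definition is simply the report query from the previous section. A full version of the statement would look something like this (the view name matches the abbreviated statement above):&lt;br&gt;
&lt;/p&gt;

```sql
CREATE MATERIALIZED VIEW mv_successful_orders AS
WITH successful_orders AS (
  SELECT o.user_id,
         o.order_id,
         o.original_amount_usd,
         o.discount,
         o.final_amount_usd,
         op.payment_amount_usd,
         o.order_date,
         TO_CHAR(o.order_date, 'YYYY-MM') order_month
    FROM orders o
    INNER JOIN order_payments op ON o.order_id = op.order_id
   WHERE o.order_status = 'SUCCESSFUL'
     AND op.order_payment_status = 'SUCCESSFUL'
     AND ROUND(o.final_amount_usd, 2) = ROUND(op.payment_amount_usd, 2)
     AND op.refund_processed = FALSE
)
SELECT order_month,
       COUNT(DISTINCT user_id) unq_customers,
       COUNT(order_id) unq_orders,
       SUM(original_amount_usd) total_original_amount,
       SUM(payment_amount_usd) total_payment_amount
  FROM successful_orders
 GROUP BY order_month
 ORDER BY order_month DESC;
```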



&lt;p&gt;Once this materialized view is created, it will hold the results of the underlying query until the next time it's refreshed. You can access a materialized view using the same &lt;code&gt;SELECT&lt;/code&gt; statement as you do for a standard view or a table. Deleting a materialized view is similar to deleting any other database object, like a table or a standard view, for which you can use the &lt;code&gt;DROP&lt;/code&gt; statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DROP MATERIALIZED VIEW mv_successful_orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creating and dropping a materialized view works in a similar way to creating and dropping tables and regular views. If you want to reload data into a static table, you would need to &lt;code&gt;TRUNCATE&lt;/code&gt; and &lt;code&gt;INSERT&lt;/code&gt; the new records, or you can &lt;code&gt;DROP&lt;/code&gt; and &lt;code&gt;CREATE&lt;/code&gt; the table again. Regular views don't have this requirement as they don't store data on disk. Materialized views, on the other hand, need to be refreshed to provide a fresher copy of the data from the underlying query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refreshing Materialized Views in Supabase
&lt;/h2&gt;

&lt;p&gt;I'll now explain how to keep the data in the materialized view fresh and up-to-date based on your application's needs.&lt;/p&gt;

&lt;p&gt;Running the underlying query (for example, the one defined using a &lt;a href="https://www.postgresql.org/docs/current/queries-with.html" rel="noopener noreferrer"&gt;CTE (common table expression)&lt;/a&gt; in the previous section) brings the up-to-date data into the materialized view. This process is called a refresh. Refreshing a materialized view is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REFRESH MATERIALIZED VIEW mv_successful_orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
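&lt;p&gt;One caveat worth knowing: a plain &lt;code&gt;REFRESH&lt;/code&gt; takes an exclusive lock on the materialized view, blocking reads while it runs. Postgres also supports &lt;code&gt;REFRESH MATERIALIZED VIEW CONCURRENTLY&lt;/code&gt;, which keeps the view readable during the refresh but requires at least one unique index on the view. A sketch (the index name here is illustrative):&lt;br&gt;
&lt;/p&gt;

```sql
-- CONCURRENTLY requires a unique index on the materialized view;
-- order_month is unique here because the view groups by it
CREATE UNIQUE INDEX idx_mv_successful_orders_month
    ON mv_successful_orders (order_month);

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_successful_orders;
```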



&lt;p&gt;There are many ways to refresh materialized views. You can run an on-demand refresh from the command line, refresh based on a database trigger, or refresh based on logic built into the application code. Each of these methods might be suitable for separate roles, such as data engineers, frontend engineers, and so on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supabase Edge Functions
&lt;/h3&gt;

&lt;p&gt;You can call Supabase Edge Functions directly from your application, and you can conditionally refresh the materialized view based on the conditions specified in the function code, preventing unnecessary refreshes. Here's a code snippet showing how to refresh the previously defined &lt;code&gt;mv_successful_orders&lt;/code&gt; view by calling a Supabase Edge Function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  const response = await fetch(`${supabaseUrl}/functions/v1/refresh-materialized-view`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${supabaseAnonKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      mv_name: 'mv_successful_orders',
      use_concurrent: true
    })
  })

  const result = await response.json()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manual Refreshes Using SQL Statements
&lt;/h3&gt;

&lt;p&gt;You can run SQL statements directly in the Supabase dashboard, or wrap them in Postgres functions and call them from your client application through the Supabase SDK via RPCs (remote procedure calls). An RPC call would look something like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  const result = await supabase.rpc('refresh_mv_successful_orders')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
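&lt;p&gt;The &lt;code&gt;supabase.rpc&lt;/code&gt; call above assumes a Postgres function named &lt;code&gt;refresh_mv_successful_orders&lt;/code&gt; already exists in the database. A minimal sketch of such a wrapper function might look like this:&lt;br&gt;
&lt;/p&gt;

```sql
-- SECURITY DEFINER lets callers refresh the view without owning it;
-- only refreshing the view's owner role can otherwise run REFRESH
CREATE OR REPLACE FUNCTION refresh_mv_successful_orders()
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
AS $$
BEGIN
  REFRESH MATERIALIZED VIEW mv_successful_orders;
END;
$$;
```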



&lt;h3&gt;
  
  
  Automatic Refreshes Using Postgres Triggers
&lt;/h3&gt;

&lt;p&gt;You can also trigger materialized view refreshes in response to database changes using Supabase Realtime Broadcast or Supabase Realtime Postgres Changes. This method is configured on the backend rather than the frontend, as it relies on the database change log and database triggers.&lt;/p&gt;
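&lt;p&gt;Another common backend option, assuming the &lt;code&gt;pg_cron&lt;/code&gt; extension is enabled on your Supabase project, is to schedule the refresh as a recurring job rather than reacting to individual changes. A sketch (the job name is illustrative):&lt;br&gt;
&lt;/p&gt;

```sql
-- Assumes the pg_cron extension is enabled on the project
SELECT cron.schedule(
  'refresh-mv-successful-orders',  -- job name (illustrative)
  '0 * * * *',                     -- every hour, on the hour
  'REFRESH MATERIALIZED VIEW mv_successful_orders;'
);
```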

&lt;h2&gt;
  
  
  Defining Your Refresh Strategy
&lt;/h2&gt;

&lt;p&gt;No matter which method you use, you'll need a smart refresh strategy to benefit from the performance and cost gains of materialized views. Refreshing too frequently can cause the same performance issues you might encounter with regular views or on-the-fly SQL queries, so materialized views are most effective when you strike the right balance between data freshness and query performance. To decide on the correct strategy, you need to answer the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data freshness:&lt;/strong&gt; How frequently the application needs the data refreshed is the key factor. For example, if application users typically view a report once an hour, it would make sense to refresh the materialized view once an hour.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost and maintenance:&lt;/strong&gt; You must understand the cost of running frequent refreshes, including query execution, index recreation, and potential blocking of CRUD operations during refreshes. This impact grows with large data sets. To reduce cost and contention, limit the data in the materialized view, such as by filtering by date ranges.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time of refresh:&lt;/strong&gt; It helps to refresh materialized views when the database usage is low in terms of compute and memory or when there are less critical workloads running on the database. This way, even if the refresh takes up a lot of resources, it doesn't impact other important database operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As mentioned earlier, the best way to determine the right strategy is to strike a balance among the three factors mentioned above. All in all, you want to come up with a strategy that fulfills the data freshness needs, doesn't refresh more frequently than required, and times the refreshes intelligently based on existing database workloads.&lt;/p&gt;
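<p>As a concrete illustration of this balancing act, the decision of whether a refresh is due can be reduced to a small helper in application code. This is only a sketch; the names and thresholds are illustrative and not part of Supabase:</p>

```javascript
// Decide whether a materialized view refresh is due, based on how stale the
// data is (freshness requirement) and how busy the database currently is
// (timing requirement). All names and thresholds here are illustrative.
function isRefreshDue(lastRefreshMs, nowMs, freshnessMs, dbLoad, maxLoad = 0.8) {
  const age = nowMs - lastRefreshMs;
  if (age < freshnessMs) return false; // data is still fresh enough
  if (dbLoad > maxLoad) return false;  // defer: database is under heavy load
  return true;                         // stale, and the database is quiet
}
```

<p>For example, with an hourly freshness requirement, a view last refreshed 90 minutes ago on a quiet database is due for a refresh, while the same view on a heavily loaded database would have its refresh deferred.</p>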

&lt;h2&gt;
  
  
  Best Practices for Materialized Views: When and How to Use Them
&lt;/h2&gt;

&lt;p&gt;Materialized views can be a powerful tool for optimizing database performance, but they're not suitable for every scenario. Understanding when and how to use them effectively is crucial for maintaining both performance and system efficiency.&lt;/p&gt;

&lt;p&gt;For instance, if your application needs fresh data all the time or if the resulting data sets are huge, materialized views aren't typically a suitable option. However, materialized views would be a good option for an application that requires a lightweight reporting and analytics layer, but where creating a dedicated data warehouse or data lake would involve excessive cost and maintenance overhead. They can serve predefined reports, dashboards, and data extracts without overburdening the database and the application with repetitive and unnecessary queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use materialized views for the following cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use materialized views for complex queries that deal with multiple tables, especially with large volumes of data.&lt;/li&gt;
&lt;li&gt;Use materialized views when the underlying tables have a high read-to-write ratio. This means the data doesn't change often, so the materialized view doesn't need to be refreshed frequently.&lt;/li&gt;
&lt;li&gt;Use materialized views when your queries have complex logic with filters, aggregates, and window functions as these are operations that can slow down and potentially block crucial CRUD operations of your application.&lt;/li&gt;
&lt;li&gt;Use materialized views for queries that are run frequently. The maximum positive impact of using materialized views occurs when your application uses the precomputed data in the materialized view frequently as it saves computation cost every time the query is executed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Materialized views shouldn't be used for the following:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not use materialized views if you need the data to be refreshed very frequently; performance-wise, the experience will be roughly equivalent to running live complex queries on tables, with little or no cost or performance benefit.&lt;/li&gt;
&lt;li&gt;Do not use materialized views when the underlying data is too small and the queries on the tables or standard views are fast enough for your application's performance needs. Precomputation only helps if the queries are slowing down your application's or the database's performance.&lt;/li&gt;
&lt;li&gt;Do not overuse materialized views. They come with their own complexities around index maintenance and vacuuming, among other things. Overusing them can increase the maintenance overhead faster than it delivers performance benefits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using materialized views, you should also follow some best practices to optimize them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use indexes to enable faster point lookups and aggregates. Just like Postgres tables, materialized views also support &lt;a href="https://www.postgresql.org/docs/current/rules-materializedviews.html" rel="noopener noreferrer"&gt;various types of indexes&lt;/a&gt;, which can significantly speed up point lookups and aggregate queries.&lt;/li&gt;
&lt;li&gt;Implement a refresh strategy that doesn't burden the database with lock contentions—that is, don't concurrently refresh multiple heavy materialized views that are built on the same underlying tables or standard views.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To reiterate, the decision to use materialized views, like many other database features such as triggers, user-defined functions, and stored procedures, depends on your specific business use case, cost considerations, and maintenance overhead tolerance. You need to find the right balance to determine if materialized views make sense for your particular scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article covered some of the key benefits of materialized views, how to create, delete, and refresh materialized views in Supabase, and some of the best practices and recommendations for using them.&lt;/p&gt;

&lt;p&gt;With materialized views, you can offload read-heavy and compute-heavy queries by precomputing the results your application needs. You can schedule or trigger refreshes when database utilization is low, freeing up resources for the application and turbocharging performance with faster, table-like reads. This is especially beneficial for reporting, analytics, and other OLAP (online analytical processing) use cases.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>performance</category>
      <category>supabase</category>
    </item>
    <item>
      <title>Finding missing data in your database with QuestDB</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Sat, 28 Jan 2023 04:26:34 +0000</pubDate>
      <link>https://dev.to/kovidr/finding-missing-data-in-your-database-with-questdb-1phb</link>
      <guid>https://dev.to/kovidr/finding-missing-data-in-your-database-with-questdb-1phb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjqj04rg2znolr10ui0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjqj04rg2znolr10ui0p.png" alt="Finding missing data in your database with QuestDB" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Whether you are just starting to work with a specific data set or monitoring activities and reports based on existing data sets, one of the first things you need to consider is the quality of the data you’re dealing with. Continuity is one of the most critical factors in gauging the quality of time-series data. Time-series systems usually serve use cases where data needs to be consumed, processed, and acted upon with urgency. &lt;/p&gt;

&lt;p&gt;Take the example of a public transport vehicle. For reasons pertaining to the safety of passengers and the timeliness of the service, vehicles depend on a range of sensors, such as GPS, proximity sensors, pressure sensors, and engine diagnostics sensors, working continuously. Continuously using the data from these sensors helps the public transport service guarantee timeliness, safety, and reliability. A break in the data coming from these sensors, however, means that there's a problem.&lt;/p&gt;

&lt;p&gt;Most data access frameworks, including query languages and importable libraries, allow you to filter and see columns or rows where data is missing. Nowhere are data continuity and completeness more relevant than with time-series data. By definition, time-series data needs to be continuous, though the required granularity of that continuity differs between use cases.&lt;/p&gt;

&lt;p&gt;When you have to test your data for completeness in a relational database, you often have to write complex SQL queries paired with intermediate or temporary tables to find missing data. In some cases, these queries can be tedious and non-performant. QuestDB is a time-series database that lets you store and consume your data in tabular form, but it’s not what you would call a traditional relational database. To cater to the time-series workloads, QuestDB extends the standard SQL functionalities using SQL extensions. One of these extensions is the &lt;a href="https://questdb.io/docs/reference/sql/sample-by/" rel="noopener noreferrer"&gt;&lt;code&gt;SAMPLE BY&lt;/code&gt;&lt;/a&gt; extension, which allows you to find and deal with missing data with ease.&lt;/p&gt;

&lt;p&gt;This tutorial will take you through how to use &lt;a href="https://questdb.io/blog/2022/11/23/sql-extensions-time-series-data-questdb-part-ii/" rel="noopener noreferrer"&gt;QuestDB's SQL extensions&lt;/a&gt; to find gaps in your data without any complex queries or overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dataset
&lt;/h2&gt;

&lt;p&gt;To demonstrate finding gaps in time-series data, we'll be using the &lt;code&gt;trades&lt;/code&gt; dataset, which is readily available on the QuestDB demo website. The &lt;code&gt;trades&lt;/code&gt; dataset contains real-time anonymized trades data for Bitcoin and Ethereum in US Dollars from 8th March 2022 to date. Here's the table structure of the &lt;code&gt;trades&lt;/code&gt; dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="s1"&gt;'trades'&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;symbol&lt;/span&gt; &lt;span class="n"&gt;SYMBOL&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt; &lt;span class="k"&gt;CACHE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;side&lt;/span&gt; &lt;span class="n"&gt;SYMBOL&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt; &lt;span class="k"&gt;CACHE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="nb"&gt;DOUBLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="nb"&gt;DOUBLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more details about the dataset and the process used to ingest data into QuestDB, you can go through &lt;a href="https://questdb.io/blog/2022/04/12/demo-live-crypto-data-streamed-with-questdb-and-grafana/" rel="noopener noreferrer"&gt;this article&lt;/a&gt;. Now that you understand the structure and contents of the &lt;code&gt;trades&lt;/code&gt; dataset, let's try to figure out if anything is missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding missing data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using QuestDB
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier in the article, you can use SQL extensions to find missing data in QuestDB. There are three keywords (or SQL keyphrases) you need to know that are unique to QuestDB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://questdb.io/docs/reference/sql/sample-by/" rel="noopener noreferrer"&gt;&lt;code&gt;SAMPLE BY&lt;/code&gt;&lt;/a&gt;   allows you to create groups and buckets of data based on time ranges.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://questdb.io/docs/reference/sql/sample-by#fill-options" rel="noopener noreferrer"&gt;&lt;code&gt;FILL&lt;/code&gt;&lt;/a&gt;   allows you to specify a fill behaviour when using &lt;code&gt;SAMPLE BY&lt;/code&gt;, which, in turn, allows you to perform time-series interpolation on the data. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://questdb.io/docs/reference/sql/sample-by#align-to-calendar" rel="noopener noreferrer"&gt;&lt;code&gt;ALIGN TO CALENDAR&lt;/code&gt;&lt;/a&gt;   allows you to align your time buckets to a calendar date based on a timezone or an offset.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find missing data using a combination of the aforementioned SQL extensions. First, let's look at a basic query using these extensions to get a day-by-day count of trades for December 2022 to date, using the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;trades_in_december_2022&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'trades'&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="s1"&gt;'2022-12'&lt;/span&gt;
&lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;
 &lt;span class="n"&gt;ALIGN&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;CALENDAR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this query gives us the following output when selecting the “Draw” option in the Chart view:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4j00j14mjxbib18wn09.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4j00j14mjxbib18wn09.jpg" alt="Basic query for a day-to-day count of trades for December 2022" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that running such a simple aggregate query at a coarse granularity is possible on any database. It only becomes a problem for other databases when the data is highly granular, especially in real time. Now that it's clear how to use the SQL extensions you need, let's move on to the query that finds missing data.&lt;/p&gt;

&lt;p&gt;In the query, we'll find the volume-weighted average price (VWAP) indicator for the &lt;code&gt;trades&lt;/code&gt; dataset. The key idea is to get all the timestamps where we don't have data to calculate VWAP for all the Bitcoin trades from the starting date of the dataset till now. In the following query, you can see that the trades are sampled by 1 second using the &lt;code&gt;SAMPLE BY 1s&lt;/code&gt; statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;extract&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vwap_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;volume&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trades&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;symbol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'BTC-USD'&lt;/span&gt;
   &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DATEADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'d'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;
 &lt;span class="n"&gt;ALIGN&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;CALENDAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;vwap_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;volume&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;extract&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;vwap_price&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;vwap_price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, the query doesn't result in anything, as shown in the image below:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk8k1leryisqh1fne7rd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk8k1leryisqh1fne7rd.png" alt="Basic query that samples trades by 1 second" width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why? Because, by default, QuestDB won't return anything for a timestamp or timestamp range that contains no data for the &lt;code&gt;SAMPLE BY&lt;/code&gt; aggregator. To surface the missing intervals in the results, you will need to use the &lt;code&gt;FILL&lt;/code&gt; keyword like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;FILL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using this, QuestDB generates a row for every empty sample interval and fills &lt;code&gt;vwap_price&lt;/code&gt; with &lt;code&gt;NULL&lt;/code&gt;. The complete query for finding missing data will look something like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;extract&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vwap_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;volume&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trades&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;symbol&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'BTC-USD'&lt;/span&gt;
   &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DATEADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'d'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="n"&gt;FILL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="n"&gt;ALIGN&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;CALENDAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;vwap_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;volume&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;extract&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;vwap_price&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run the query, you will get all the 1s windows where the data was missing, as shown in the image below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vojk2v92kkt5tmk2ngt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vojk2v92kkt5tmk2ngt.jpg" alt="Query to find missing data based on trades data sampled by 1 second" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Again, there is no denying that running one-off, ad hoc queries that aggregate at coarser granularities, such as 1d or 1M, might not be that hard in other databases. However, if you keep running these queries at scale, they can create performance issues in a traditional relational database. The same approach works at a daily granularity, as shown in the image below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjituo18gow5zonxznr8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnjituo18gow5zonxznr8.png" alt="Query to find missing data based on trades data sampled by 1 day" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you had to perform the same operation in a PostgreSQL database, you'd need to run a &lt;code&gt;generate_series()&lt;/code&gt; function to generate a bunch of data and then join it with the &lt;code&gt;trades&lt;/code&gt; dataset. For the sake of simplicity, let's assume that the timestamp format generated by both systems will be the same. To identify gaps in PostgreSQL, you'll need to write something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;all_seconds&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2022-12-17 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2022-12-17 23:59:59'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 second'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;dummy_timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dummy_timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vwap_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;volume&lt;/span&gt;
          &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;all_seconds&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
          &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;trades&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
            &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dummy_timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;
         &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dummy_timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vwap_price&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostgreSQL has the advantage of a built-in generator function that supports all kinds of dummy data generation use cases, as you saw above. Not all databases have such a function. In MySQL, for instance, you'd have to use recursive common table expressions (CTEs) to get the job done. In some other databases, it might be even more troublesome.&lt;/p&gt;
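For illustration, here is what that recursive-CTE workaround looks like. This runnable sketch uses SQLite (via Python's sqlite3 module) because it is self-contained; MySQL 8's `WITH RECURSIVE` syntax is nearly identical, apart from the date functions:

```python
import sqlite3

# Generate one row per second for a single minute with a recursive CTE,
# standing in for PostgreSQL's generate_series().
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE all_seconds(dummy_timestamp) AS (
        SELECT '2022-12-17 00:00:00'
        UNION ALL
        SELECT datetime(dummy_timestamp, '+1 second')
          FROM all_seconds
         WHERE dummy_timestamp < '2022-12-17 00:00:59'
    )
    SELECT dummy_timestamp FROM all_seconds
""").fetchall()

print(len(rows))  # 60 one-second slots
print(rows[0][0], rows[-1][0])
```

You would then LEFT JOIN this generated series against the trades table and filter for NULL aggregates, just as in the PostgreSQL query above.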

&lt;h2&gt;
  
  
  How does finding missing data help?
&lt;/h2&gt;

&lt;p&gt;Identifying missing data is of utmost importance because missing data can tremendously impact the accuracy and reliability of every system or person that consumes it. When it comes to time-series databases, many use cases come to mind, especially those that involve edge computing and IoT devices, such as sensors and detectors.&lt;/p&gt;

&lt;p&gt;Take the example of sensors that send data about critical systems in industrial machinery, such as vibration, torque, pressure, and so on. Data coming from these sensors not only helps improve machine efficiency but also helps detect early signs of possible machine failures.&lt;/p&gt;

&lt;p&gt;In many cases, this data might help improve safety and reliability too. If the continuous stream of time-series data is broken, i.e., the data is missing, the aforementioned benefits of having real-time data go down the drain, and the gap can cause serious damage, as a lot rides on the reliability of these systems. This is why there's real value in identifying missing data, and QuestDB makes it super easy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Continuing our SQL extensions theme, this tutorial walked you through finding missing data using the &lt;code&gt;SAMPLE BY&lt;/code&gt;, &lt;code&gt;FILL&lt;/code&gt;, and &lt;code&gt;ALIGN TO CALENDAR&lt;/code&gt; keywords with simple and highly performant queries. This article also explored some benefits of identifying missing data, especially in time-series datasets. Now, it's time for you to give this a shot. There's a system ready for you on the &lt;a href="https://demo.questdb.io/" rel="noopener noreferrer"&gt;demo website&lt;/a&gt;. Take it for a ride!&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>database</category>
      <category>questdb</category>
      <category>opensource</category>
    </item>
    <item>
      <title>SQL extensions for time series data in QuestDB part II</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Mon, 12 Dec 2022 02:35:06 +0000</pubDate>
      <link>https://dev.to/kovidr/sql-extensions-for-time-series-data-in-questdb-part-ii-3ahc</link>
      <guid>https://dev.to/kovidr/sql-extensions-for-time-series-data-in-questdb-part-ii-3ahc</guid>
      <description>&lt;p&gt;This tutorial follows up on the one where we introduced &lt;a href="https://towardsdatascience.com/sql-extensions-for-time-series-data-in-questdb-f6b53acf3213" rel="noopener noreferrer"&gt;SQL extensions in QuestDB&lt;/a&gt; that make time-series analysis easier. In this tutorial, you will learn in detail about the &lt;a href="https://questdb.io/docs/reference/sql/sample-by/" rel="noopener noreferrer"&gt;&lt;code&gt;SAMPLE BY&lt;/code&gt; extension&lt;/a&gt; in QuestDB, which will enable you to work with time-series data efficiently because of its simplicity and flexibility.&lt;/p&gt;

&lt;p&gt;To get started with this tutorial, you should know that &lt;code&gt;SAMPLE BY&lt;/code&gt; is a SQL extension in QuestDB that helps you group or bucket your time-series data based on the &lt;a href="https://questdb.io/docs/concept/designated-timestamp" rel="noopener noreferrer"&gt;designated timestamp&lt;/a&gt;. It removes the need for lengthy &lt;code&gt;CASE WHEN&lt;/code&gt; statements and &lt;code&gt;GROUP BY&lt;/code&gt; clauses. Not only that, the &lt;code&gt;SAMPLE BY&lt;/code&gt; extension helps you quickly deal with many other data-related issues, such as &lt;a href="https://questdb.io/docs/reference/sql/select/#fill" rel="noopener noreferrer"&gt;missing data&lt;/a&gt;, &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#align-to-calendar-time-zone" rel="noopener noreferrer"&gt;incorrect timezones&lt;/a&gt;, and &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#align-to-calendar-with-offset" rel="noopener noreferrer"&gt;offsets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This tutorial assumes you have an up-and-running QuestDB instance ready for use. Let's dive straight into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Import sample data
&lt;/h3&gt;

&lt;p&gt;Similar to the previous tutorial, we'll use &lt;a href="https://s3-eu-west-1.amazonaws.com/questdb.io/datasets/grafana_tutorial_dataset.tar.gz" rel="noopener noreferrer"&gt;the NYC taxi riders data for February 2018&lt;/a&gt;. You can use the following script utilizing the &lt;a href="https://questdb.io/docs/guides/importing-data-rest/" rel="noopener noreferrer"&gt;HTTP REST API&lt;/a&gt; to upload data into QuestDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://s3-eu-west-1.amazonaws.com/questdb.io/datasets/grafana_tutorial_dataset.tar.gz &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; grafana_data.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; grafana_data.tar.gz

curl &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;@taxi_trips_feb_2018.csv http://localhost:9000/imp
curl &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="nv"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;@weather.csv http://localhost:9000/imp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can utilize &lt;a href="https://questdb.io/docs/develop/web-console/#import" rel="noopener noreferrer"&gt;the import functionality in the QuestDB console&lt;/a&gt;, as shown in the image below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5mjfga46adif6w4q41m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5mjfga46adif6w4q41m.png" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For &lt;a href="https://questdb.io/docs/guides/importing-data/" rel="noopener noreferrer"&gt;importing large CSV files into partitioned tables&lt;/a&gt;, QuestDB recommends using the &lt;code&gt;COPY&lt;/code&gt; command. This method is especially useful when you are migrating data from another database into QuestDB.&lt;/p&gt;
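As a minimal sketch (the file name comes from the setup above; QuestDB expects the CSV in its server-side import directory, and the `COPY` reference documents further options such as `TIMESTAMP` and `PARTITION BY`):

```sql
COPY taxi_trips FROM 'taxi_trips_feb_2018.csv' WITH HEADER true;
```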

&lt;h3&gt;
  
  
  Create an ordered timestamp column
&lt;/h3&gt;

&lt;p&gt;QuestDB mandates the use of an ordered timestamp column, so you'll have to cast the &lt;code&gt;pickup_datetime&lt;/code&gt; column to &lt;code&gt;TIMESTAMP&lt;/code&gt; in a new table called &lt;code&gt;taxi_trips&lt;/code&gt; with the script below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;taxi_trips&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips_feb_2018.csv'&lt;/span&gt;
   &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;MONTH&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;By converting the &lt;code&gt;pickup_datetime&lt;/code&gt; column to timestamp, you'll allow QuestDB to use it as the &lt;a href="https://questdb.io/docs/concept/designated-timestamp/" rel="noopener noreferrer"&gt;designated timestamp&lt;/a&gt;. Using the designated timestamp column, QuestDB is able to index the table to run time-based queries more efficiently.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If it all goes well, you should see the following data after running a &lt;code&gt;SELECT *&lt;/code&gt; query on the &lt;code&gt;taxi_trips&lt;/code&gt; table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvbjgjxb27fe10xn2gd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvbjgjxb27fe10xn2gd3.png" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the basics of &lt;code&gt;SAMPLE BY&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;SAMPLE BY&lt;/code&gt; extension allows you to create groups and buckets of data based on time ranges. This is especially valuable for time-series data, as you can calculate frequently used aggregates with extreme simplicity. &lt;code&gt;SAMPLE BY&lt;/code&gt; lets you summarize or aggregate data at very fine to very coarse &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#sample-units" rel="noopener noreferrer"&gt;units of time&lt;/a&gt;, from microseconds to months and everything in between: millisecond, second, minute, hour, and day. You can derive other units of time, such as a week, a fortnight, or a year, from the ones provided out of the box.&lt;/p&gt;

&lt;p&gt;Let's look at some examples to understand how to use &lt;code&gt;SAMPLE BY&lt;/code&gt; in different scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hourly count of trips
&lt;/h3&gt;

&lt;p&gt;You can use the &lt;code&gt;SAMPLE BY&lt;/code&gt; keyword with the &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#sample-units" rel="noopener noreferrer"&gt;sample unit&lt;/a&gt; of &lt;code&gt;h&lt;/code&gt; to get an hour-by-hour count of trips for the whole duration of the data set. Running the following query, you'll get results in the console:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;total_trips&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two ways you can read your data in the QuestDB console: using the grid, which has a tabular form factor, or using a chart, where you can draw up a line chart, a bar graph, or an area chart to &lt;a href="https://questdb.io/docs/develop/web-console/#visualizing-results" rel="noopener noreferrer"&gt;visualize your data&lt;/a&gt;. Here's an example of a bar chart drawn from the query mentioned above:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokjd8vnjbt4b99it00pa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fokjd8vnjbt4b99it00pa.png" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Three-hourly holistic summary of trips
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;SAMPLE BY&lt;/code&gt; extension allows you to group data by any arbitrary number of sample units. In the following example, you'll see that the query is calculating a three-hourly summary of trips with multiple aggregate functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;total_trips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;total_passengers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_fare_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_tip_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_earnings&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can view the output of the query in the following grid on the QuestDB console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy1l3cpypry5v92uryzx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy1l3cpypry5v92uryzx.png" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Weekly summary of trips
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier in the tutorial, although there's no sample unit for a week, a fortnight, or a year, you can derive them simply by utilizing the built-in sample units. If you want to sample the data by a week, use &lt;code&gt;7d&lt;/code&gt; as the sampling time, as shown in the query below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;total_trips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;total_passengers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_fare_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_tip_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_earnings&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-28'&lt;/span&gt;
 &lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7tiqt7nk42epo6mq0ob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7tiqt7nk42epo6mq0ob.png" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dealing with missing data
&lt;/h2&gt;

&lt;p&gt;If you've worked a fair bit with data, you already know that data isn't always in a pristine state. One of the most common issues, especially with time-series data, is discontinuity, i.e., scenarios where data is missing for specific time periods. You can quickly identify and deal with missing data using the advanced functionality of the &lt;code&gt;SAMPLE BY&lt;/code&gt; extension.&lt;/p&gt;

&lt;p&gt;QuestDB offers an easy way to generate and fill missing data with the &lt;code&gt;SAMPLE BY&lt;/code&gt; clause. Take the following example: I've deliberately removed data from 4 am to 5 am for the 1st of February 2018. Notice how the &lt;a href="https://questdb.io/docs/reference/sql/fill/" rel="noopener noreferrer"&gt;&lt;code&gt;FILL&lt;/code&gt; keyword&lt;/a&gt;, when used in conjunction with the &lt;code&gt;SAMPLE BY&lt;/code&gt; extension, can generate a row for the hour starting at 4 am and fill it with some data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;total_trips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;total_passengers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_fare_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_tip_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_earnings&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-01T04:00:00'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-01T04:59:59'&lt;/span&gt;
 &lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="n"&gt;FILL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LINEAR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlmi5xu7svxhz9ki4n2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlmi5xu7svxhz9ki4n2v.png" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, we've used an inline &lt;code&gt;WHERE&lt;/code&gt; clause with the &lt;code&gt;NOT BETWEEN&lt;/code&gt; keyword to emulate missing data. Alternatively, you can create a separate table with the missing trips excluded using the same idea, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips_missing'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt; 
  &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-01T04:00:00'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-01T04:59:59'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ideally, you would use &lt;code&gt;DROP PARTITION&lt;/code&gt; to emulate missing data, but because the table is partitioned by &lt;code&gt;MONTH&lt;/code&gt;, its partitions are too coarse to remove a single hour, so you cannot run the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2018-02-01T04:59:59'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2018-02-01T04:00:00'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#fill-options" rel="noopener noreferrer"&gt;&lt;code&gt;FILL&lt;/code&gt;&lt;/a&gt; keyword demands a &lt;code&gt;fillOption&lt;/code&gt; from the following:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;fillOption&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Usage scenario&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NONE&lt;/td&gt;
&lt;td&gt;When you don't want to populate missing data, and leave it as is&lt;/td&gt;
&lt;td&gt;This is the default &lt;code&gt;fillOption&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;When you want to generate rows for missing time periods, but leave all the values as NULLs&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PREV&lt;/td&gt;
&lt;td&gt;When you want to copy the values of the previous row from the summarized data&lt;/td&gt;
&lt;td&gt;This is useful when you expect the numbers to be similar to the preceding time period&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LINEAR&lt;/td&gt;
&lt;td&gt;When you want to interpolate the missing values linearly from the immediately preceding and following rows&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CONST or x&lt;/td&gt;
&lt;td&gt;When you want to hardcode values where data is missing&lt;/td&gt;
&lt;td&gt;Specify one constant per aggregate column: &lt;code&gt;FILL(value_1, value_2, ...)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
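
&lt;p&gt;To build intuition for these options, here's a minimal Python sketch (an illustration of ours, not QuestDB code) that emulates the &lt;code&gt;NULL&lt;/code&gt;, &lt;code&gt;PREV&lt;/code&gt;, and &lt;code&gt;LINEAR&lt;/code&gt; fill behaviors over hourly buckets; the function name and row format are assumptions:&lt;/p&gt;

```python
from datetime import datetime, timedelta

def sample_fill(rows, bucket, fill="NULL"):
    """Group (timestamp, value) rows into fixed-size buckets and fill empty
    buckets in the spirit of SAMPLE BY's fillOption (NULL, PREV, LINEAR).
    LINEAR here averages the neighboring buckets, which matches linear
    interpolation for a single missing bucket."""
    buckets = {}
    for ts, value in rows:
        # Align each timestamp down to the start of its bucket.
        start = datetime.min + ((ts - datetime.min) // bucket) * bucket
        buckets[start] = buckets.get(start, 0) + value

    first = min(buckets)
    steps = (max(buckets) - first) // bucket
    out = []
    for i in range(steps + 1):
        cursor = first + i * bucket
        if cursor in buckets:
            out.append((cursor, buckets[cursor]))
        elif fill == "PREV":
            # Copy the value of the previous summarized row.
            out.append((cursor, out[-1][1]))
        elif fill == "LINEAR":
            # Average the preceding value and the next present bucket.
            nxt = cursor + bucket
            while nxt not in buckets:
                nxt = nxt + bucket
            out.append((cursor, (out[-1][1] + buckets[nxt]) / 2))
        else:  # NULL: generate the row but leave the value empty
            out.append((cursor, None))
    return out
```

&lt;p&gt;For example, with trips at 3 am and 5 am but none at 4 am, &lt;code&gt;LINEAR&lt;/code&gt; produces the average of the two neighboring buckets for the 4 am row, while &lt;code&gt;NULL&lt;/code&gt; still generates the row.&lt;/p&gt;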

&lt;p&gt;Here's another example of hardcoding values using the &lt;code&gt;FILL(x)&lt;/code&gt; &lt;code&gt;fillOption&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fpnoqq0injchsv15lyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fpnoqq0injchsv15lyp.png" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Working with timezones and offsets
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;SAMPLE BY&lt;/code&gt; extension also enables you to change timezones and add or subtract offsets from your timestamp columns to adjust for any issues you might encounter when dealing with different source systems, especially in other geographic areas. It is important to note that, by default, QuestDB aligns its &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#sample-calculation" rel="noopener noreferrer"&gt;sample calculation&lt;/a&gt; based on the &lt;code&gt;FIRST OBSERVATION&lt;/code&gt;, as shown in the example below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;total_trips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;total_passengers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_fare_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_tip_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_earnings&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-01T13:35:52'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2018-02-28'&lt;/span&gt;
 &lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsur8gw899nldyml09kgi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsur8gw899nldyml09kgi.png" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note how the &lt;code&gt;1d&lt;/code&gt; sample calculation starts at &lt;code&gt;13:35:52&lt;/code&gt; and ends at &lt;code&gt;13:35:51&lt;/code&gt; the next day. Apart from this default alignment to the first observation, there are two other ways to align your sample calculations: to the &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#align-to-calendar-time-zone" rel="noopener noreferrer"&gt;&lt;code&gt;calendar time zone&lt;/code&gt;&lt;/a&gt;, and to the &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#align-to-calendar-with-offset" rel="noopener noreferrer"&gt;&lt;code&gt;calendar with offset&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
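
&lt;p&gt;The default alignment can be sketched in a few lines of Python (an illustration, not QuestDB internals): every bucket boundary is computed relative to the first observed timestamp rather than the calendar:&lt;/p&gt;

```python
from datetime import datetime, timedelta

def align_to_first_observation(timestamps, bucket):
    """Map each timestamp to the start of its bucket, where bucket
    boundaries begin at the earliest observation (QuestDB's default
    SAMPLE BY alignment)."""
    origin = min(timestamps)
    return {ts: origin + ((ts - origin) // bucket) * bucket for ts in timestamps}
```

&lt;p&gt;With a first observation at &lt;code&gt;2018-02-01T13:35:52&lt;/code&gt; and a &lt;code&gt;1d&lt;/code&gt; sample, a trip the next morning still falls into the bucket that started at 13:35:52 the previous day.&lt;/p&gt;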

&lt;p&gt;Let's look at the other two alignment methods now.&lt;/p&gt;

&lt;h3&gt;
  
  
  Aligning sample calculation to another timezone
&lt;/h3&gt;

&lt;p&gt;When moving data from one system to another or via a complex pipeline, you can encounter issues with time zones. For the sake of demonstration, let's assume that you've identified that the data set you've loaded into the database is not for New York City but for Melbourne, Australia. These two cities are far apart and are in very different time zones.&lt;/p&gt;

&lt;p&gt;QuestDB allows you to fix this issue by aligning your data to another timezone using the &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#align-to-calendar-time-zone" rel="noopener noreferrer"&gt;&lt;code&gt;ALIGN TO CALENDAR TIME ZONE&lt;/code&gt; option&lt;/a&gt; with the &lt;code&gt;SAMPLE BY&lt;/code&gt; extension. In the example below, you can see how &lt;code&gt;ALIGN TO CALENDAR TIME ZONE ('AEST')&lt;/code&gt; aligns &lt;code&gt;pickup_datetime&lt;/code&gt;, i.e., the designated timestamp column, to the AEST timezone for Melbourne.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;total_trips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;total_passengers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_fare_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_tip_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_earnings&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
 &lt;span class="n"&gt;ALIGN&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;CALENDAR&lt;/span&gt; &lt;span class="nb"&gt;TIME&lt;/span&gt; &lt;span class="k"&gt;ZONE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'AEST'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmm6fnz4mzcdv5i3q72th.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmm6fnz4mzcdv5i3q72th.png" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Aligning sample calculation with offsets
&lt;/h3&gt;

&lt;p&gt;Similar to the previous example, you can also align your sample calculation by &lt;a href="https://questdb.io/docs/reference/sql/sample-by/#align-to-calendar-with-offset" rel="noopener noreferrer"&gt;offsetting the designated timestamp&lt;/a&gt; column manually by any &lt;code&gt;hh:mm&lt;/code&gt; value between -23:59 and 23:59. In the following example, we're offsetting the sample calculation by -05:30, i.e., negative five hours and thirty minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;total_trips&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;total_passengers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;avg_trip_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_fare_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_tip_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fare_amount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tip_amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;total_earnings&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'taxi_trips'&lt;/span&gt;
 &lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;
 &lt;span class="n"&gt;ALIGN&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;CALENDAR&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="s1"&gt;'-05:30'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz6hjvojz5o45efijhda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz6hjvojz5o45efijhda.png" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you learned how to use the &lt;a href="https://questdb.io/docs/reference/sql/sample-by" rel="noopener noreferrer"&gt;&lt;code&gt;SAMPLE BY&lt;/code&gt; extension&lt;/a&gt; in QuestDB to work efficiently with time-series data, especially in aggregated form. The &lt;code&gt;SAMPLE BY&lt;/code&gt; extension also lets you fix common problems with time-series data attributable to complex data pipelines, disparate source systems in different geographic areas, software bugs, and so on. All in all, SQL extensions like &lt;code&gt;SAMPLE BY&lt;/code&gt; provide a significant advantage when working with time-series data by enabling you to achieve more in fewer lines of SQL.&lt;/p&gt;

</description>
      <category>database</category>
      <category>tutorial</category>
      <category>opensource</category>
      <category>sql</category>
    </item>
    <item>
      <title>Alerting Dashboard for Tesla's Stock Price with QuestDB and Grafana</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Thu, 11 Mar 2021 14:12:26 +0000</pubDate>
      <link>https://dev.to/kovidr/alerting-dashboard-for-tesla-s-stock-price-with-questdb-and-grafana-3lk7</link>
      <guid>https://dev.to/kovidr/alerting-dashboard-for-tesla-s-stock-price-with-questdb-and-grafana-3lk7</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;There are many reasons why reacting to time-series data is useful, and usually, the quicker you can respond to changes in this data, the better. The best tool for this job is a time-series database, a type of database designed to write and read large amounts of measurements that change over time.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will learn how to read data from a REST API and stream it to QuestDB, an open-source time-series database. We will use Grafana to visualize the data and alerting to notify Slack about changes that interest us. We will use Python to fetch data from the API and stream it to QuestDB, and you can easily customize the scripts to check different stocks or even different APIs.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuration
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before getting started with the tutorial, you will need the following things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.docker.com/products/docker-desktop" rel="noopener noreferrer"&gt;Docker desktop&lt;/a&gt;&lt;/strong&gt; - we have created a  &lt;a href="https://github.com/questdb/questdb-slack-grafana-alerts" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;  that will enable you to run Grafana and QuestDB in a Docker container. The project README also documents setup steps for Grafana, QuestDB, and Python.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://pypi.org/project/iexfinance/" rel="noopener noreferrer"&gt;IexFinance Account&lt;/a&gt;&lt;/strong&gt; - we will use the IexFinance API for polling stock prices, note that a free account on IexFinance has a limit of 50,000 API calls per month.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://slack.com/intl/en-au/help/articles/206845317-Create-a-Slack-workspace" rel="noopener noreferrer"&gt;Slack workspace&lt;/a&gt;&lt;/strong&gt; - to deliver alerts about Stock prices from Grafana, you'd need a Slack workspace with the ability to create incoming webhooks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploy QuestDB &amp;amp; Grafana containers using Docker
&lt;/h2&gt;

&lt;p&gt;First, &lt;a href="https://github.com/bsmth/questdb-slack-grafana-alerts" rel="noopener noreferrer"&gt;clone the repository from GitHub&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone git@github.com:questdb/questdb-slack-grafana-alerts.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;docker-compose up&lt;/code&gt; will bring up two containers that are networked together; Grafana runs on &lt;code&gt;localhost:3000&lt;/code&gt;, and QuestDB has a web console available on &lt;code&gt;localhost:9000&lt;/code&gt; as well as port &lt;code&gt;8812&lt;/code&gt; open for the Postgres wire protocol.&lt;/p&gt;

&lt;p&gt;To check if your QuestDB and Grafana containers are working, please visit the aforementioned URLs. Alternatively, you can check the status using &lt;code&gt;docker-compose ps&lt;/code&gt; on the command line, which should show you the following output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4uy99njdgtr5b3dtpyks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4uy99njdgtr5b3dtpyks.png" alt="Screen Shot 2021-03-06 at 12.25.45 am.png" width="800" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;docker-compose&lt;/code&gt; also provides Grafana with the default connection credentials for Postgres authentication. This means you can use QuestDB as a default data source in Grafana right away, without manual configuration steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Python Libraries
&lt;/h2&gt;

&lt;p&gt;All the Python libraries required for this tutorial are listed in the &lt;code&gt;requirements.txt&lt;/code&gt; file. Install them using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Ingest mock data into QuestDB
&lt;/h2&gt;

&lt;p&gt;We need some data in QuestDB to create visualizations and alerts. We can use the IexFinance API to fetch stock prices and  &lt;a href="https://github.com/questdb/questdb-slack-grafana-alerts/blob/main/python/mock_stock_data_example.py" rel="noopener noreferrer"&gt;an additional script to generate dummy data&lt;/a&gt;. The IexFinance API has a cap of 50,000 requests per month on the free account, so the mock script generates random prices instead, ensuring we don't max out the quota during testing. To start ingesting mock data into QuestDB, run the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;python
python mock_stock_data_example.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script will automatically create a table &lt;code&gt;stock_prices&lt;/code&gt; and start ingesting mock data into it. The table contains three columns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;stock&lt;/strong&gt; - listed name of the stock, e.g., TSLA for Tesla. QuestDB has an optimized data type, &lt;a href="https://questdb.io/docs/concept/symbol/" rel="noopener noreferrer"&gt;symbol&lt;/a&gt;, for text columns with repetitive values. Read more about it in QuestDB's official documentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;stockPrice&lt;/strong&gt; - price of the stock in USD, stored as a double.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;createdDateTime&lt;/strong&gt; - timestamp at which the stockPrice was ingested into QuestDB.&lt;/li&gt;
&lt;/ol&gt;
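
&lt;p&gt;If you'd like to see the shape of the data without running the repo's script, here's a hypothetical Python sketch of a mock-data generator producing rows matching the three columns above (the function name and price model are our own, not taken from the repository):&lt;/p&gt;

```python
import random
from datetime import datetime, timezone

def mock_stock_rows(symbol="TSLA", start_price=700.0, n=5, seed=42):
    """Generate (stock, stockPrice, createdDateTime) tuples using a small
    random walk around the starting price."""
    rng = random.Random(seed)
    price = start_price
    rows = []
    for _ in range(n):
        # Nudge the price by up to 1% in either direction.
        price = round(price * (1 + rng.uniform(-0.01, 0.01)), 2)
        rows.append((symbol, price, datetime.now(timezone.utc)))
    return rows
```

&lt;p&gt;Each tuple maps one-to-one onto a row of the &lt;code&gt;stock_prices&lt;/code&gt; table.&lt;/p&gt;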

&lt;p&gt;In the following screenshot, you can see the data ingested into QuestDB:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgwl9biow15tfvp4wqey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgwl9biow15tfvp4wqey.png" alt="Screen Shot 2021-03-06 at 1.52.21 am.png" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Configure the IexFinance API
&lt;/h2&gt;

&lt;p&gt;Once you have tested the ingestion, you can start using the API with real data. Using this API, you can query stock prices in real time. As mentioned earlier, there is a cap of 50,000 free API calls per month, so make sure you don't cross that limit while on the free plan. To configure the IexFinance API, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a free account on IexFinance.&lt;/li&gt;
&lt;li&gt;Create an API token.&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;Reveal Secret Token&lt;/code&gt; and copy the &lt;code&gt;SECRET&lt;/code&gt; token.&lt;/li&gt;
&lt;li&gt;Create a new file &lt;code&gt;.env&lt;/code&gt; in the &lt;code&gt;./python&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt;Paste the token into the &lt;code&gt;.env&lt;/code&gt; file in the format → &lt;code&gt;IEX_TOKEN=Skwf93hD&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
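
&lt;p&gt;A &lt;code&gt;.env&lt;/code&gt; file is just &lt;code&gt;KEY=VALUE&lt;/code&gt; lines; the repo's scripts may load it with a library such as python-dotenv, but a minimal reader looks like this (the helper below is a sketch of ours, not the repository's code):&lt;/p&gt;

```python
import os

def load_iex_token(path=".env"):
    """Read KEY=VALUE pairs from a .env file into os.environ and
    return the IEX_TOKEN value, if present."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks and comments; keep only KEY=VALUE lines.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    return os.environ.get("IEX_TOKEN")
```

&lt;p&gt;Keeping the token in &lt;code&gt;.env&lt;/code&gt; (and out of version control) avoids leaking your secret key in the repository.&lt;/p&gt;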

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotzgitwdl4ndqgqupypb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotzgitwdl4ndqgqupypb.png" alt="Screen Shot 2021-03-06 at 2.01.49 am.png" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Configure a Slack Incoming Webhook
&lt;/h2&gt;

&lt;p&gt;Next, we need to create a Slack webhook for sending alert messages from Grafana:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to  &lt;a href="https://api.slack.com/apps?new_app=1" rel="noopener noreferrer"&gt;https://api.slack.com/apps?new_app=1&lt;/a&gt;  &lt;/li&gt;
&lt;li&gt;Name your Slack app &lt;em&gt;QuestDB Stock Price Alerts&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;Features and functionality&lt;/code&gt;, choose &lt;code&gt;Incoming Webhooks&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Activate incoming webhooks and click &lt;code&gt;Add New Webhook to Workspace&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Select the channel to allow the app to post to and click &lt;code&gt;Allow&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Copy the Webhook URL which is in the following format → &lt;code&gt;https://hooks.slack.com/services/T123/B0123/2Fb...&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Create a Notification channel in Grafana
&lt;/h2&gt;

&lt;p&gt;Go to &lt;code&gt;localhost:3000&lt;/code&gt; in your browser. To enable connectivity between Grafana and Slack for alerting, click &lt;code&gt;Add Channel&lt;/code&gt; in the &lt;code&gt;Alerting &amp;gt; Notification channels&lt;/code&gt; section as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tmy0dccs99hyaek7ah7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7tmy0dccs99hyaek7ah7.png" alt="Screen Shot 2021-03-06 at 12.19.37 am.png" width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paste the Slack Incoming Webhook URL in the Url field while creating a new notification channel as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6aumcdfm85tpp1c0ntu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6aumcdfm85tpp1c0ntu.png" alt="Screen Shot 2021-03-06 at 12.21.27 am.png" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can quickly test if the webhook is working fine by pressing the Test button on the screen above. This will trigger a notification from Grafana to be published on Slack. You can see an example notification below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d8jl1s4r6364ri528ky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d8jl1s4r6364ri528ky.png" alt="Screen Shot 2021-03-06 at 12.22.52 am.png" width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a Grafana Panel &amp;amp; Setup the Alert
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://grafana.com/docs/grafana/latest/panels/add-a-panel/" rel="noopener noreferrer"&gt;Set up a Grafana panel&lt;/a&gt;  that hosts the real-time graph of TSLA stock price using the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to &lt;strong&gt;+ Create&lt;/strong&gt; and select &lt;code&gt;Dashboard&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;+&lt;/strong&gt; &lt;code&gt;Add new panel&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In the panel, click the pencil icon or click &lt;code&gt;Edit SQL&lt;/code&gt; and paste the following example query:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;createdDatetime&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
       &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stockPrice&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;avgPrice&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stock_prices&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;createdDatetime&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;stock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'TSLA'&lt;/span&gt;
&lt;span class="n"&gt;SAMPLE&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feh6tgs5nd6u8w0we09bn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feh6tgs5nd6u8w0we09bn.png" alt="Screen Shot 2021-03-06 at 3.46.12 am.png" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After creating the Grafana panel with the query shown in the image above, save the dashboard. To create an alert on the &lt;code&gt;TSLA&lt;/code&gt; stock price, perform the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Edit the panel in the dashboard.&lt;/li&gt;
&lt;li&gt;Go to the &lt;code&gt;Alert&lt;/code&gt; tab and name the alert &lt;em&gt;Tesla Stock Price alert&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Set &lt;em&gt;Evaluate every 10 seconds for 30 seconds&lt;/em&gt; (&lt;em&gt;Evaluate every&lt;/em&gt; specifies how often the scheduler evaluates the alert rule, and &lt;em&gt;for&lt;/em&gt; specifies how long the query needs to violate the thresholds before the alert notification is triggered).&lt;/li&gt;
&lt;li&gt;Set the condition to &lt;em&gt;WHEN min() OF query(5-second Avg. of TSLA, 30s, now()) IS BELOW 762&lt;/em&gt;. In other words, the alert fires if the minimum value of the query named &lt;em&gt;5-second Avg. of TSLA&lt;/em&gt; has been below 762 over the last 30 seconds.&lt;/li&gt;
&lt;li&gt;In the No Data &amp;amp; Error Handling section, use the defaults.&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;Notifications → Send to&lt;/code&gt;, select the notification channel that we set up earlier, named &lt;em&gt;Stock Price Alerts&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Add the message &lt;em&gt;The 5-second bucketed average of the TSLA stock price has gone below 762 in the last 30 seconds&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Save the Panel.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can see the steps in action below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87luoifckbrlp1pkcf52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87luoifckbrlp1pkcf52.png" alt="Screen Shot 2021-03-06 at 4.20.30 am.png" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the condition is met, Grafana will trigger an alert and send a notification to Slack. The notification will be something like the screenshot below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6e9k0xqa9v6cluiakoch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6e9k0xqa9v6cluiakoch.png" alt="Screen Shot 2021-03-06 at 4.33.08 am.png" width="800" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To understand the alert status changes more deeply, you can visit the State history, which shows the timeline of transitions from one status to another. You can see an example of state history below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy76mbpgl5pe4x7h42aa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhy76mbpgl5pe4x7h42aa.png" alt="Screen Shot 2021-03-06 at 4.38.14 am.png" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To learn more about building dashboards for time series data with Grafana, there's  &lt;a href="https://questdb.io/tutorial/2020/10/19/grafana/" rel="noopener noreferrer"&gt;another tutorial on QuestDB's website&lt;/a&gt;  with a link to example data to try out more features in detail.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this tutorial, you learned how Grafana can integrate with QuestDB using the PostgreSQL endpoint. Using data ingested from an API into QuestDB, you learned how to visualize that data in a Grafana dashboard and set up alerts based on predefined conditions. You also learned how to publish alert messages to external tools like Slack. For more information on any of the topics covered in this tutorial, please visit QuestDB's official documentation.&lt;/p&gt;

</description>
      <category>database</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>SQL Extensions for Time-Series Data in QuestDB</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Mon, 18 Jan 2021 13:24:36 +0000</pubDate>
      <link>https://dev.to/kovidr/sql-extensions-for-time-series-data-in-questdb-li8</link>
      <guid>https://dev.to/kovidr/sql-extensions-for-time-series-data-in-questdb-li8</guid>
      <description>&lt;p&gt;In this tutorial, you are going to learn about QuestDB SQL extensions which prove to be very useful with time-series data. Using some sample data sets, you will learn how designated timestamps work, and how to use extended SQL syntax to write queries on time-series data.&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Traditionally, SQL has been used for relational databases and data warehouses. In recent years there has been an exponential increase in the amount of data that connected systems produce, which has brought about a need for new ways to store and analyze such information. For this reason, time-series analytics have proved critical for making sense of real-time market data in financial services, sensor data from IoT devices, and application metrics.&lt;/p&gt;

&lt;p&gt;This explosion in the volume of time-series data led to the development of specialized databases designed to ingest and process time-series data as efficiently as possible. QuestDB achieves this while supporting standard ANSI SQL with native extensions for time series analysis.&lt;/p&gt;

&lt;p&gt;Apart from that, QuestDB also makes the syntax easier by implementing implicit clauses. It also includes a random data generation feature, which is extremely useful for exploring the functionality of the database as well as for database testing. Although there is much to talk about in QuestDB’s SQL dialect, in this tutorial you will learn about its SQL extensions.&lt;/p&gt;

&lt;p&gt;Throughout this tutorial, we’ll be using two data sets. The first one is taxi trips data for New York City for the month of February 2018. It contains information about the number of passengers, trip fare, tip amount and the start datetime of the trip. You can find out average earnings per number of passengers, tipping behaviour of NYC taxi riders, busiest times of the day, and so on.&lt;/p&gt;

&lt;p&gt;The second data set contains weather information for 10 years starting from 1st January 2010 to 1st January 2020. This dataset contains information about temperature, windspeed, rainfall, depth of snow, visibility, and more. You can use this data to analyse how the weather patterns emerge over long periods of time. You can also compare weather during the same time of the year for different years. To get started, you can install the aforementioned data sets using the following shell script:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
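
&lt;p&gt;A minimal sketch of such an import script follows; the download URLs below are placeholders, and the loading is done through QuestDB's REST &lt;code&gt;/imp&lt;/code&gt; endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Download the two sample data sets (placeholder URLs)
curl -L -o trips.csv   https://example.com/nyc-taxi-trips-feb-2018.csv
curl -L -o weather.csv https://example.com/weather-2010-2020.csv

# Import them into QuestDB via the REST /imp endpoint
# (QuestDB listens on port 9000 by default)
curl -F data=@trips.csv   "http://localhost:9000/imp?name=trips"
curl -F data=@weather.csv "http://localhost:9000/imp?name=weather"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;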


&lt;h1&gt;
  
  
  SQL Extensions
&lt;/h1&gt;

&lt;p&gt;While implementing ANSI SQL to the greatest extent, QuestDB has introduced some time-series-specific SQL extensions to enhance performance and improve the query reading and writing experience for database users and developers. Let’s look into the SQL extensions one by one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timestamp Search
&lt;/h3&gt;

&lt;p&gt;A time-series database isn’t complete if it doesn’t provide a method to search across time. In QuestDB, you can partition tables by time intervals. Each partition will be saved in a separate set of files on disk. To provide a relational database-like optimization of pruning partitions, QuestDB offers the feature of Timestamp search.&lt;/p&gt;

&lt;p&gt;To benefit from this feature, a table should have a designated timestamp column. Any timestamp column can be marked as the designated timestamp column either while creating the table or while creating temporary sub-tables within a query. The designated timestamp column forces the table to keep records in increasing time order. Hence, it implicitly enforces a constraint that rejects any out-of-order inserts, although QuestDB is &lt;a href="https://github.com/questdb/questdb/issues/172" rel="noopener noreferrer"&gt;already working on accepting delayed records out of order&lt;/a&gt;. Timestamp search can also be performed using the normal ≥, ≤, &amp;lt;, and &amp;gt; operators, but it is not as efficient as with designated timestamps.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Another benefit of the designated timestamp column is that it enables the efficient use of ASOF joins, which are specialized joins that match tables on timestamps that don’t align exactly. A prerequisite for getting deterministic results from an ASOF join is that the data in the table should be ordered by time. Designated timestamp columns enforce time ordering in a table.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The two sample data sets were imported directly from a CSV file and a table was created on-the-fly. Although you can &lt;a href="https://questdb.io/docs/reference/api/rest/#overview" rel="noopener noreferrer"&gt;create a designated timestamp while importing the data&lt;/a&gt;, it is important to understand how to deal with tables that don’t have a designated timestamp. So, let’s create the designated timestamp now and partition the two tables by month.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
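
&lt;p&gt;A sketch of those statements follows. Since the imported tables have no designated timestamp, the usual approach is to create ordered copies with &lt;code&gt;timestamp()&lt;/code&gt; and &lt;code&gt;PARTITION BY&lt;/code&gt;; the table and column names here are assumptions based on the data set descriptions above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Recreate the taxi trips table with a designated timestamp, partitioned by month
CREATE TABLE trips_ts AS (
    SELECT * FROM trips ORDER BY pickupDatetime
) TIMESTAMP(pickupDatetime) PARTITION BY MONTH;

-- Do the same for the weather table
CREATE TABLE weather_ts AS (
    SELECT * FROM weather ORDER BY timestamp
) TIMESTAMP(timestamp) PARTITION BY MONTH;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;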


&lt;p&gt;Using designated timestamp search notation, you can simplify your timestamp-based searches on tables. The following example queries the weather dataset. In this example, you can see that the same operator can be used to query many different time ranges. The first part of the UNION will give you the count of records for the whole year of 2019 while the second part of the UNION will give you the count of records for the month of December in 2019, and so on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp4v5xxtq3qx2qvept5dy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp4v5xxtq3qx2qvept5dy.png" alt="Alt Text" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
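
&lt;p&gt;A sketch of that query, assuming the weather table's designated timestamp column is called &lt;code&gt;timestamp&lt;/code&gt;: the same &lt;code&gt;IN&lt;/code&gt; interval notation covers a year, a month, or a day.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Count of records for the whole of 2019
SELECT '2019' period, count() FROM weather WHERE timestamp IN '2019'
UNION
-- Count of records for December 2019
SELECT '2019-12', count() FROM weather WHERE timestamp IN '2019-12'
UNION
-- Count of records for 23 December 2019
SELECT '2019-12-23', count() FROM weather WHERE timestamp IN '2019-12-23';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;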


&lt;h3&gt;
  
  
  &lt;a href="https://questdb.io/docs/reference/sql/latest-by/" rel="noopener noreferrer"&gt;LATEST BY&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This SQL extension finds the latest entry for a given key or combination of keys by timestamp. The functionality of LATEST BY is similar to functions like FIRST, FIRST_VALUE, etc. which are available in traditional relational databases and data warehouses.&lt;/p&gt;

&lt;p&gt;In a relational database, you’d either have to find the latest timestamp first and then use a subquery to fetch the farePlusTip amount for each passengerCount, or you’d have to use one of the aforementioned analytic functions like FIRST_VALUE. QuestDB makes life easier for database users and developers by providing a dedicated clause for finding the latest records per group.&lt;/p&gt;

&lt;p&gt;In the following example, you will see that by using the LATEST BY clause, we can find out, for each passengerCount, what the farePlusTip amount was for the latest completed trip.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foxkg6o9r90zs0z4jboci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foxkg6o9r90zs0z4jboci.png" alt="Alt Text" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
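
&lt;p&gt;A sketch of the query from the screenshot, with column names assumed from the description above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Latest fare-plus-tip amount for each passenger count,
-- taken from the most recent trip per group
SELECT passengerCount, pickupDatetime, farePlusTip
FROM trips
LATEST BY passengerCount;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;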


&lt;h3&gt;
  
  
  &lt;a href="https://questdb.io/docs/reference/sql/sample-by/" rel="noopener noreferrer"&gt;SAMPLE BY&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This is another extension that is optimal for time-series data as it allows the grouping of data based on timestamp without explicitly providing timestamp ranges in the where clause. You can bucket your data into chunks of time using this extension.&lt;/p&gt;

&lt;p&gt;In regular SQL, you’d need to use a combination of CASE WHEN statements, GROUP BY clause, and WHERE clause to get similar results. In QuestDB, SAMPLE BY does the trick. To use this SQL extension, you need to make sure that the table has a designated timestamp column.&lt;/p&gt;

&lt;p&gt;In the following example, you’ll see the data is sampled or grouped by a day using 24h as the SAMPLE_SIZE in the SAMPLE BY clause. Depending upon the frequency of data ingested into the table, you might need to adjust the size of the bucket by adjusting SAMPLE_SIZE.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv3oxjor0jicddw93h17u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv3oxjor0jicddw93h17u.png" alt="Alt Text" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is common for time-series databases to store data at a really fine granularity. Hence, it is also common to have data grouped by intervals of time ranging from seconds to years. Here are some more examples demonstrating how to go about using the SAMPLE BY clause for different sample sizes:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
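
&lt;p&gt;A few illustrative variations, assuming a temperature column named &lt;code&gt;tempF&lt;/code&gt; in the weather table; only the SAMPLE_SIZE changes between them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Average temperature per hour
SELECT timestamp, avg(tempF) FROM weather SAMPLE BY 1h;

-- Average temperature per day
SELECT timestamp, avg(tempF) FROM weather SAMPLE BY 1d;

-- Average temperature per month
SELECT timestamp, avg(tempF) FROM weather SAMPLE BY 1M;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;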


&lt;h1&gt;
  
  
  Changes to the usual SQL Syntax
&lt;/h1&gt;

&lt;p&gt;Apart from the SQL extensions, there are a few changes to the usual SQL syntax to enhance the database user experience. The changes are related to the GROUP BY and HAVING clauses. The idea behind them is to simplify query writing, improve readability and ease of use of the SQL dialect, and reduce SQL verbosity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optional &lt;a href="https://questdb.io/docs/reference/sql/group-by/" rel="noopener noreferrer"&gt;GROUP BY&lt;/a&gt; clause
&lt;/h3&gt;

&lt;p&gt;Because of the widespread use of aggregation functions in time-series databases, QuestDB implicitly groups aggregation results to make the query writing experience better. While the GROUP BY keyword is supported by QuestDB, including it in your query makes no difference to the result set. Let’s see an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fngy5c9pf7mahnpqnco88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fngy5c9pf7mahnpqnco88.png" alt="Alt Text" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
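
&lt;p&gt;The two queries below return identical results in QuestDB; the second simply drops the GROUP BY clause (the column names are assumptions based on the data set description):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Standard SQL, with an explicit GROUP BY
SELECT passengerCount, avg(farePlusTip)
FROM trips
GROUP BY passengerCount;

-- QuestDB, with GROUP BY omitted: non-aggregated columns
-- in the SELECT list are grouped implicitly
SELECT passengerCount, avg(farePlusTip)
FROM trips;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;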


&lt;h3&gt;
  
  
  Implicit &lt;a href="https://questdb.io/docs/concept/sql-extensions/#implicit-having" rel="noopener noreferrer"&gt;HAVING&lt;/a&gt; clause
&lt;/h3&gt;

&lt;p&gt;As HAVING is only ever used together with the GROUP BY clause, the HAVING clause automatically becomes implicit along with the optional GROUP BY clause mentioned above. Let’s see an example of this too:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmxsfpgn0cjfjjsp5qaks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmxsfpgn0cjfjjsp5qaks.png" alt="Alt Text" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
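
&lt;p&gt;Since there is no explicit GROUP BY to attach a HAVING clause to, filtering on an aggregate is done by wrapping the aggregation in a subquery and applying a plain WHERE on the alias, along these lines (column names assumed as before):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Equivalent of GROUP BY passengerCount HAVING avg(farePlusTip) &amp;gt; 10:
-- aggregate in the inner query, filter on the alias in the outer WHERE
(
    SELECT passengerCount, avg(farePlusTip) avgAmount
    FROM trips
)
WHERE avgAmount &amp;gt; 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;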


&lt;h3&gt;
  
  
  Optional &lt;a href="https://questdb.io/docs/concept/sql-extensions/#optionality-of-select--from" rel="noopener noreferrer"&gt;SELECT * FROM&lt;/a&gt; phrase
&lt;/h3&gt;

&lt;p&gt;QuestDB goes a step further by making the SELECT * FROM phrase optional. This helps reduce verbosity considerably when nested subqueries are involved. In QuestDB, just writing the name of the table and executing the statement acts as a SELECT * FROM TABLE_NAME statement. Please look at the example below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyro0s2ylono1jvve5pt3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyro0s2ylono1jvve5pt3.png" alt="Alt Text" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All of these improvements help reduce the effort required to write and maintain queries in a time-series database like QuestDB at scale.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this tutorial, you learned how QuestDB supports SQL and enhances performance and developer experience by writing custom SQL extensions specially designed for time-series databases. You also learned about a few syntactical changes in QuestDB’s SQL dialect. If you are interested in knowing more about QuestDB, please visit QuestDB’s &lt;a href="https://questdb.io/docs/introduction/" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This tutorial was originally published on &lt;a href="https://towardsdatascience.com/sql-extensions-for-time-series-data-in-questdb-f6b53acf3213" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>database</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>When To Cache?</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Tue, 02 Jun 2020 16:50:46 +0000</pubDate>
      <link>https://dev.to/kovidr/when-to-cache-34gp</link>
      <guid>https://dev.to/kovidr/when-to-cache-34gp</guid>
      <description>&lt;p&gt;In Spark, memory is used for two purposes -&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; - store/cache the data that will be used later. This data usually occupies memory for a long time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt; - memory space used for certain operations like joins, sorts, and aggregations, and especially shuffles (when data is moved between executors, for example because of partition skew)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One has to be careful with caching. You want to cache only something you're sure is going to be needed later for a transformation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Resource Allocation and Caching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The idea behind dynamic allocation is that you define the initial, minimum, and maximum number of executors. Spark starts with the initial executors and then, based on the load, decides whether or not to start more. Once the heavy lifting of the processing is done, Spark asks for the executors to be released. This is a great way to increase efficiency, but what if you had cached something?&lt;/p&gt;

&lt;p&gt;Because you had cached something, now the executors can't be freed unless that cached data is moved to another set of executors. And that's what you need to bear in mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache vs. Persist&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We'll just talk about RDDs (and not Datasets). There's no real difference between &lt;code&gt;cache()&lt;/code&gt; and a no-argument &lt;code&gt;persist()&lt;/code&gt;: both cache your data at the &lt;code&gt;MEMORY_ONLY&lt;/code&gt; storage level, which is the default for RDDs. For Datasets, the default is &lt;code&gt;MEMORY_AND_DISK&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can obviously go and change the storage level depending on the use case. Also, there's no &lt;code&gt;uncache()&lt;/code&gt; function. There's just &lt;code&gt;unpersist()&lt;/code&gt;.&lt;/p&gt;
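<p/>
&lt;p&gt;A minimal PySpark sketch of the above, assuming a local Spark session; the storage levels shown are the defaults discussed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

rdd = spark.sparkContext.parallelize(range(1000))
rdd.cache()        # same as rdd.persist() -&amp;gt; MEMORY_ONLY for RDDs
rdd.count()        # caching is lazy; an action materializes it

df = spark.range(1000)
df.persist(StorageLevel.MEMORY_AND_DISK)   # explicit storage level for a Dataset

rdd.unpersist()    # there is no uncache(); unpersist() frees the space
df.unpersist()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;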

&lt;p&gt;Databricks has improved caching by quite a bit by introducing Delta caching but it is limited to some file formats.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>pyspark</category>
      <category>spark</category>
    </item>
    <item>
      <title>Defensive SQL Query Writing</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Mon, 01 Jun 2020 15:44:59 +0000</pubDate>
      <link>https://dev.to/kovidr/defensive-sql-query-writing-3jm5</link>
      <guid>https://dev.to/kovidr/defensive-sql-query-writing-3jm5</guid>
      <description>&lt;p&gt;Derived from defensive programming, defencive query writing is a practice which tries to make sure that a query run doesn't fail. Just like in application development, try to remove the scope for silly mistakes and control unforseen circumstances. Sometimes the decision comes down to whether you want your query to fail or you want it to run even if with some incorrect data. I'll share a couple of simple examples where we can employ these practices while writing queries. &lt;/p&gt;

&lt;h2&gt;
  
  
  Checking If Objects Exist
&lt;/h2&gt;

&lt;p&gt;Take the very basic example of creating and dropping database objects. Rather than using a &lt;code&gt;CREATE TABLE xyz&lt;/code&gt;, use &lt;code&gt;CREATE TABLE IF NOT EXISTS xyz (id int)&lt;/code&gt; or if you want to recreate the table losing all the data you can run &lt;code&gt;DROP TABLE IF EXISTS xyz&lt;/code&gt; and then &lt;code&gt;CREATE TABLE IF NOT EXISTS xyz (id int)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The same practice can be used with the creation and deletion of databases, views, indexes, triggers, procedures, functions and more. I have come to realize that in most cases, using this is helpful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using database and column aliases
&lt;/h2&gt;

&lt;p&gt;Prevent yourself from getting &lt;em&gt;ambiguous column&lt;/em&gt; errors. In the example below, the column &lt;code&gt;city&lt;/code&gt; might be present both in &lt;code&gt;TABLE_1&lt;/code&gt; and &lt;code&gt;TABLE_2&lt;/code&gt;. How do you expect the database to know which field you want it to pick up?&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
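
&lt;p&gt;A sketch of the ambiguous query and its aliased fix (the table and column names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Fails with an "ambiguous column" error: city exists in both tables
SELECT city
FROM TABLE_1
JOIN TABLE_2 ON TABLE_1.id = TABLE_2.id;

-- Aliased version: every column reference is unambiguous
SELECT t1.city
FROM TABLE_1 t1
JOIN TABLE_2 t2 ON t1.id = t2.id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;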


&lt;p&gt;It's generally a very good practice to create aliases for database objects and then access those objects and their child objects using the alias rather than the complete name. Obviously, to do this efficiently, you'd need to follow a SQL style guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using LIMIT
&lt;/h2&gt;

&lt;p&gt;No, I'm not talking about using &lt;code&gt;LIMIT&lt;/code&gt; to restrict the number of records in your final query. Rather, I am talking about queries like this.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
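
&lt;p&gt;A sketch of the kind of query meant here (names illustrative): without the LIMIT, the scalar subquery fails as soon as it returns more than one row.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- The scalar subquery must return at most one row;
-- LIMIT 1 guarantees that even if TABLE_2 has duplicates
SELECT t1.id,
       (SELECT t2.city
        FROM TABLE_2 t2
        WHERE t2.id = t1.id
        ORDER BY t2.updated_at DESC
        LIMIT 1) AS city
FROM TABLE_1 t1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;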


&lt;p&gt;The &lt;code&gt;LIMIT&lt;/code&gt; clause in the subquery returning a column is important because it prevents the query from failing if the subquery returns more than one row, that is, if there is more than one record in &lt;code&gt;TABLE_2&lt;/code&gt; for every record in &lt;code&gt;TABLE_1&lt;/code&gt;. This is a really useful trick to write better queries.&lt;/p&gt;

&lt;p&gt;These are three of the most common scenarios which, if not taken care of, can prevent your query from running at all. Obviously, all of these come with an asterisk. More on that later.&lt;/p&gt;

&lt;p&gt;Please feel free to share other practices that you have followed!&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Complete Data Engineer's Vocabulary</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Sun, 31 May 2020 08:59:27 +0000</pubDate>
      <link>https://dev.to/kovidr/complete-data-engineer-s-vocabulary-kio</link>
      <guid>https://dev.to/kovidr/complete-data-engineer-s-vocabulary-kio</guid>
      <description>&lt;p&gt;I recently wrote a post on for Towards Data Science summarizing most of the technologies and basic concepts that a data engineer should know about - &lt;a href="https://towardsdatascience.com/complete-data-engineers-vocabulary-87967e374fad" rel="noopener noreferrer"&gt;Complete Data Engineer's Vocabulary&lt;/a&gt;. I say that it's the complete vocabulary but it really isn't - it tries to cover most of the tech that people are using these days. I'll make sure to keep it updated.&lt;/p&gt;

&lt;p&gt;The whole idea was to summarize every technology or concept in 10 words or less - an activity that I thoroughly enjoyed. It's one thing to use a technology or know a concept, but it really gets tough when you have to explain these things to someone else and that too using the minimum amount of words possible.&lt;/p&gt;

&lt;p&gt;I have already had a lot of suggestions for additions on the original post. More suggestions are welcome!&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>python</category>
      <category>sql</category>
    </item>
    <item>
      <title>Platform For Online Concerts</title>
      <dc:creator>Kovid Rathee</dc:creator>
      <pubDate>Sat, 30 May 2020 14:56:56 +0000</pubDate>
      <link>https://dev.to/kovidr/platform-for-online-concerts-2570</link>
      <guid>https://dev.to/kovidr/platform-for-online-concerts-2570</guid>
      <description>&lt;p&gt;Amidst the current Covid-19 crisis, the world is suffering from all kinds of problems, but the one I wanted to ask pertains to independent musicians, especially classical musicians, who don't have any gigs now because everything is shut down. Is there a good platform which can provide these arists a great way to share their music with their audience?&lt;/p&gt;

&lt;p&gt;Livestreams on Facebook and Instagram don't really work because of bad quality and bad monetization options. YouTube does have a subscription option that could work, but it doesn't really make sense just for concerts. How about a per-gig kind of deal where people could pay to listen to their favorite artists perform?&lt;/p&gt;

&lt;p&gt;Stuff like Patreon, Ko-fi, and YouTube's SuperChat definitely helps, but I'm thinking more about a dedicated platform for niche audiences - one on which some of these classical arts could also survive. On Facebook, Instagram, etc., there's just too much noise. You can't play a Mozart concert on MTV for the same reason.&lt;/p&gt;

&lt;p&gt;Suggestions/ideas are welcome! Stay safe!&lt;/p&gt;

</description>
      <category>startup</category>
      <category>music</category>
    </item>
  </channel>
</rss>
