<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Edipo Vinicius da Silva</title>
    <description>The latest articles on DEV Community by Edipo Vinicius da Silva (@edipox).</description>
    <link>https://dev.to/edipox</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F68304%2F2b03c27f-5b57-4e6f-9ce6-566a617f5187.png</url>
      <title>DEV Community: Edipo Vinicius da Silva</title>
      <link>https://dev.to/edipox</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/edipox"/>
    <language>en</language>
    <item>
      <title>Handling PostgreSQL slow counting on Elixir</title>
      <dc:creator>Edipo Vinicius da Silva</dc:creator>
      <pubDate>Thu, 19 Apr 2018 18:10:04 +0000</pubDate>
      <link>https://dev.to/edipox/handling-postgresql-slow-counting-onelixir-mkc</link>
      <guid>https://dev.to/edipox/handling-postgresql-slow-counting-onelixir-mkc</guid>
      <description>



&lt;p&gt;Some time ago I found an unexpected issue when working on a simple &lt;a href="http://jsonapi.org/"&gt;JSON API&lt;/a&gt; project that should provide paginated resources. &lt;/p&gt;

&lt;p&gt;Most JSON API friendly libraries perform a record count in order to provide the pagination links on each response, especially for the &lt;strong&gt;last page link&lt;/strong&gt;. But under some conditions a simple count query can be &lt;strong&gt;very&lt;/strong&gt; slow:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Specifically, in this case, this query was taking over 6sec when performed on a table with over 2.7M records.&lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding the issue
&lt;/h2&gt;

&lt;p&gt;This count query is high time consuming due to &lt;a href="http://momjian.us/main/writings/pgsql/mvcc.pdf"&gt;PostgreSQL's MVCC implementation&lt;/a&gt;. The MVCC (Multiversion Concurrency Control) is a method commonly used to provide data consistency when multiple processes are accessing the same data. This is usually done by taking snapshots of the data, so when concurrent transactions are working on the same data, each one will read a "version" of a given database object based on the transaction time.&lt;/p&gt;

&lt;p&gt;In order to count the table rows, PostgreSQL needs to ensure that they exist. But given the multiple snapshots/versions of each record, PostgreSQL is unable to summarize the whole table directly. So instead PostgreSQL reads each row, performing a &lt;strong&gt;sequential scan.!&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to fix this?
&lt;/h2&gt;

&lt;p&gt;There are different approaches to fix this issue, including a &lt;a href="http://www.varlena.com/GeneralBits/120.php"&gt;trigger-based mechanism&lt;/a&gt;. In my case, it was acceptable to use an estimated count of the rows, and luckily, &lt;a href="https://www.postgresql.org/docs/current/static/catalog-pg-class.html"&gt;PostgreSQL provides exactly that via &lt;strong&gt;pg_class reltuples&lt;/strong&gt;&lt;/a&gt;:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;reltuples&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;estimate&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="o"&gt;=&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Wait.. What is reltuples?
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;[It is the] Number of rows in the table. This is only an estimate used by the planner.&lt;/em&gt; - &lt;a href="https://www.postgresql.org/docs/current/static/catalog-pg-class.html"&gt;PostgreSQL: Documentation: pg_class&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The accuracy of this estimate depends on how frequently the planner updates pg_class reltuples:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It is updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.&lt;/em&gt; - &lt;a href="https://www.postgresql.org/docs/current/static/catalog-pg-class.html"&gt;PostgreSQL: Documentation: pg_class&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You are probably not executing CREATE and INDEX that often, but don't worry, ANALYZE will also update this value. VACUUM and ANALYZE are automatically executed by the &lt;a href="https://www.postgresql.org/docs/current/static/routine-vacuuming.html#AUTOVACUUM"&gt;autovacuum daemon&lt;/a&gt; which is enabled by default.&lt;/p&gt;

&lt;p&gt;Notice this solution is based on the &lt;strong&gt;total&lt;/strong&gt; estimate record count, so it won't work for cases when you need to filter query results.&lt;/p&gt;




&lt;h2&gt;
  
  
  OK, let's write some Elixir code!
&lt;/h2&gt;

&lt;p&gt;In our implementation, we were using &lt;a href="https://github.com/drewolson/scrivener_ecto"&gt;scrivener_ecto&lt;/a&gt; which is a very nice package that helps you to add pagination to your Ecto queries.&lt;/p&gt;

&lt;p&gt;Recently we added the &lt;a href="https://github.com/drewolson/scrivener_ecto/pull/42"&gt;total_entries&lt;/a&gt; option to &lt;code&gt;scrivener_ecto&lt;/code&gt; so any custom logic can be used to calculate the total_entries and fix similar situations.&lt;/p&gt;

&lt;p&gt;This is the function that handles the pagination based on a given &lt;code&gt;conn&lt;/code&gt; and page parameters. We pass the query to &lt;code&gt;Scrivener.paginate&lt;/code&gt; with a config struct containing the &lt;code&gt;total_entries&lt;/code&gt; option:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="nv"&gt;@spec&lt;/span&gt; &lt;span class="n"&gt;handle_pagination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="no"&gt;Plug&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;atom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_size&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;atom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;
        &lt;span class="no"&gt;Plug&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Conn&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;handle_pagination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;%&lt;/span&gt;&lt;span class="no"&gt;Scrivener&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;module:&lt;/span&gt; &lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;page_number:&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;page_size:&lt;/span&gt; &lt;span class="n"&gt;page_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;options:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;total_entries:&lt;/span&gt; &lt;span class="n"&gt;total_entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Scrivener&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paginate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Notice the call to &lt;code&gt;total_entries&lt;/code&gt; on the previous function. We get the estimated value by executing SQL code to retrieve &lt;code&gt;reltuples&lt;/code&gt; directly:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Retrieves the estimated record count&lt;/span&gt;
&lt;span class="nv"&gt;@spec&lt;/span&gt; &lt;span class="n"&gt;total_entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;total_entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="no"&gt;Ecto&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;Adapters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;SQL&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sd"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT reltuples::BIGINT AS estimate FROM pg_class WHERE relname=$1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We have a missing piece on this puzzle, given that we don’t know the table name to query &lt;em&gt;pg_class’ reltuples&lt;/em&gt;. To fix this we read the &lt;code&gt;__schema__(:source)&lt;/code&gt; from the query:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Determines the table name based on a given query&lt;/span&gt;
&lt;span class="nv"&gt;@spec&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;module_from_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__schema__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# Retrieves the model module from a given query&lt;/span&gt;
&lt;span class="nv"&gt;@spec&lt;/span&gt; &lt;span class="n"&gt;module_from_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;module_from_query&lt;/span&gt;&lt;span class="p"&gt;(%{&lt;/span&gt;&lt;span class="ss"&gt;from:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;}}),&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;
&lt;span class="k"&gt;defp&lt;/span&gt; &lt;span class="n"&gt;module_from_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;In certain cases, when it’s possible to use relatively accurate values, we can take advantage of PostgreSQL estimated count capabilities. This plus the very open elixir community that allows us to add new options to existing packages makes an almost effortless transition from an unacceptable issue to a &lt;strong&gt;very&lt;/strong&gt; efficient endpoint: taking less than 0.5 ms in my computer with half a million records.&lt;/p&gt;

&lt;p&gt;Suggestions and comments are more than welcome!&lt;/p&gt;


</description>
      <category>elixir</category>
      <category>postgres</category>
      <category>phoenixframework</category>
      <category>jsonapi</category>
    </item>
  </channel>
</rss>
