<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Haki</title>
    <description>The latest articles on DEV Community by Haki (@haki).</description>
    <link>https://dev.to/haki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F163874%2F99cf1c38-5d6f-41e9-bb23-95c99166a4f6.png</url>
      <title>DEV Community: Haki</title>
      <link>https://dev.to/haki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/haki"/>
    <language>en</language>
    <item>
      <title>How to Get the First or Last Value in a Group Using Group By in SQL</title>
      <dc:creator>Haki</dc:creator>
      <pubDate>Mon, 12 Aug 2019 21:00:00 +0000</pubDate>
      <link>https://dev.to/haki/how-to-get-the-first-or-last-value-in-a-group-using-group-by-in-sql-3j0o</link>
      <guid>https://dev.to/haki/how-to-get-the-first-or-last-value-in-a-group-using-group-by-in-sql-3j0o</guid>
      <description>&lt;p&gt;I recently had to produce reports on a table containing events of a user's account balance. The user can deposit and withdraw from their account, and support personnel can set the account's credit, which is the maximum amount the user can overdraw.&lt;/p&gt;

&lt;p&gt;The table looked roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT * FROM event;
 id | account |    type    |      happened_at       |               data
----+---------+------------+------------------------+-----------------------------------
  1 |       1 | created    | 2019-08-01 15:14:13+03 | {"credit": 0, "delta_balance": 0}
  2 |       1 | deposited  | 2019-08-01 15:15:15+03 | {"delta_balance": 100}
  3 |       1 | withdraw   | 2019-08-02 09:35:33+03 | {"delta_balance": -50}
  4 |       1 | credit_set | 2019-08-03 16:14:12+03 | {"credit": 50}
  5 |       1 | withdraw   | 2019-08-03 14:45:44+03 | {"delta_balance": -30}
  6 |       1 | credit_set | 2019-08-03 16:14:12+03 | {"credit": 100}
  7 |       1 | withdraw   | 2019-08-03 16:15:09+03 | {"delta_balance": -50}
(7 rows)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---DcvcHFj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/v1/%7Bstatic%7D/images/00-sql-group-by-first-last-value.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---DcvcHFj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/v1/%7Bstatic%7D/images/00-sql-group-by-first-last-value.jpg" alt='What deposit boxes used to look like. Photo by &amp;lt;a href="https://unsplash.com/@tjevans"&amp;gt;Tim Evans&amp;lt;/a&amp;gt;'&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To get the current balance of an account, we sum the changes in &lt;code&gt;delta_balance&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT
    account,
    SUM((data-&amp;gt;'delta_balance')::int) AS balance
FROM
    event
GROUP BY
    account;

 account | balance
---------+---------
       1 |     -30
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;data&lt;/code&gt; field contains information specific to each type of event. To extract the value of &lt;code&gt;delta_balance&lt;/code&gt; from the &lt;code&gt;data&lt;/code&gt; column we use the &lt;a href="https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-JSON-OP-TABLE"&gt;arrow operator provided by PostgreSQL&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The result of the query shows that the current balance of account 1 is -30. This means the account is in overdraft. To check if this is within the allowed range, we need to compare it to the credit set for this account. The credit for account 1 was initially set to 0 when the account was created. The credit was then adjusted twice, and is currently set to 100.&lt;/p&gt;

&lt;p&gt;To get the current state of an account, we need its aggregated balance and the latest credit that was set for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;In Oracle there is a function called &lt;a href="https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions071.htm"&gt;&lt;code&gt;last&lt;/code&gt;&lt;/a&gt; we can be use to get the last &lt;code&gt;credit_set&lt;/code&gt; event. A query using &lt;code&gt;last&lt;/code&gt; might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Oracle&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'credit_set'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;KEEP&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DENSE_RANK&lt;/span&gt; &lt;span class="k"&gt;LAST&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;credit&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;PostgreSQL also has a &lt;a href="https://www.postgresql.org/docs/current/functions-window.html#id-1.5.8.26.6.2.2.10.1.1"&gt;&lt;code&gt;LAST_VALUE&lt;/code&gt;&lt;/a&gt; analytic function. Analytics functions cannot be used in a group by the way aggregate functions do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT
    account,
    LAST_VALUE(data) OVER (
        PARTITION BY account ORDER BY id
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS credit
FROM
    event
WHERE
    type = 'credit_set'
GROUP BY
    account;

ERROR:  column "event.data" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3:     LAST_VALUE(data) OVER (
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The error tells us that the &lt;code&gt;data&lt;/code&gt; field used in the analytic function must be used in the group by. This is not really what we want. To use PostgreSQL's &lt;code&gt;LAST_VALUE&lt;/code&gt; function, we need to remove the group by:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT
    account,
    LAST_VALUE(data) OVER (
        PARTITION BY account ORDER BY id
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS credit
FROM
    event
WHERE
    type = 'credit_set';

 account |     credit
---------+-----------------
       1 | {"credit": 100}
       1 | {"credit": 100}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;These are not exactly the results we need. Analytic, or window functions, operate on a set of rows, and not in a group by.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL doesn't have a built-in function to obtain the first or last value in a group using group by.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To get the last value in a group by, we need to get creative!&lt;/p&gt;




&lt;h2&gt;
  
  
  Old Fashioned SQL
&lt;/h2&gt;

&lt;p&gt;The plain SQL solution is to divide and conquer. We already have a query to get the current balance of an account. If we write another query to get the credit for each account, we can join the two together and get the complete state of an account.&lt;/p&gt;

&lt;p&gt;To get the last event for each account in PostgreSQL we can use &lt;a href="https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT"&gt;&lt;code&gt;DISTINCT ON&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT DISTINCT ON (account)
    account,
    data-&amp;gt;'credit' AS credit
FROM
    event
WHERE
    type = 'credit_set'
ORDER BY
    account,
    id DESC;

account | credit
---------+--------
       1 | 100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Great! Using &lt;code&gt;DISTINCT ON&lt;/code&gt; we got the last credit set for each account.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;DISTINCT ON: I've written about &lt;a href="https://hakibenita.com/the-many-faces-of-distinct-in-postgre-sql"&gt;the many faces of DISTINCT in PostgreSQL&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The next step is to join the two queries, and get the complete account state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT
    a.account,
    a.balance,
    b.credit
FROM
    (
    SELECT
        account,
        SUM((data-&amp;gt;'delta_balance')::int) AS balance
    FROM
        event
    GROUP BY
        account

    ) AS a
    JOIN
    (

    SELECT DISTINCT ON (account)
        account,
        data-&amp;gt;'credit' AS credit
    FROM
        event
    WHERE
        type = 'credit_set'
    ORDER BY
        account,
        id DESC

    ) AS b

    ON a.account = b.account;

 account | balance | credit
---------+---------+--------
       1 |     -30 | 100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We got the expected result.&lt;/p&gt;

&lt;p&gt;Before we move on, let's take a glance at the execution plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Hash Join  (cost=44.53..49.07 rows=4 width=44)
   Hash Cond: (event.account = b.account)
   -&amp;gt;  HashAggregate  (cost=25.00..27.00 rows=200 width=12)
         Group Key: event.account
         -&amp;gt;  Seq Scan on event  (cost=0.00..17.50 rows=750 width=36)
   -&amp;gt;  Hash  (cost=19.49..19.49 rows=4 width=36)
         -&amp;gt;  Subquery Scan on b  (cost=19.43..19.49 rows=4 width=36)
               -&amp;gt;  Unique  (cost=19.43..19.45 rows=4 width=40)
                     -&amp;gt;  Sort  (cost=19.43..19.44 rows=4 width=40)
                           Sort Key: event_1.account, event_1.id DESC
                           -&amp;gt;  Seq Scan on event event_1  (cost=0.00..19.39 rows=4 width=40)
                                 Filter: (type = 'credit_set'::text)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The event table is being scanned twice, once for each subquery. The &lt;code&gt;DISTINCT ON&lt;/code&gt; subquery also requires a sort by &lt;code&gt;account&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt;. The two subqueries are then joined using a hash-join.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL is unable to combine the two subqueries into a single scan of the table.&lt;/strong&gt; If the event table is very large, performing two full table scans, and a sort and a hash join, might become slow and consume a lot of memory.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Common Table Expression: It's tempting to use common table expression (CTE) to make the query more readable. But, &lt;a href="https://hakibenita.com/be-careful-with-cte-in-postgre-sql"&gt;CTE's are currently optimization fences&lt;/a&gt;, and using it here will most definitely prevent PostgreSQL from performing any optimization that involves both subqueries.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Array Trick
&lt;/h2&gt;

&lt;p&gt;Using good ol' SQL got us the answer, but it took two passes on the table. We can do better with the following trick:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db#=&amp;gt;SELECT
    account,
    SUM((data-&amp;gt;'delta_balance')::int) AS balance,
    (MAX(ARRAY[id, (data-&amp;gt;'credit')::int]) FILTER (WHERE type = 'credit_set'))[2] AS credit
FROM
    event
GROUP BY
    account;

 account | balance | credit
---------+---------+--------
       1 |     -30 |    100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is so much simpler than the previous question, so let's break it down.&lt;/p&gt;

&lt;h3&gt;
  
  
  How PostgreSQL Compares Arrays
&lt;/h3&gt;

&lt;p&gt;To understand what exactly is going on here, we first need to understand &lt;a href="https://www.postgresql.org/docs/current/functions-array.html"&gt;how PostgreSQL compares arrays&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Array comparisons compare the array contents element-by-element, using the default B-tree comparison function for the element data type.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When comparing arrays, PostgreSQL will go element by element and compare the values according to their type. To demonstrate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT greatest(ARRAY[1, 200], ARRAY[2, 100]);
 greatest
----------
 {2,100}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The first element of the second array (2) is larger than the first element of the first array (1), so it's the greatest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT greatest(ARRAY[1, 200], ARRAY[1, 201]);
 greatest
----------
 {1,201}
(1 row)

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The first elements of both arrays are equal (1), so PostgreSQL moves on to the next element. In this case, the second element of the second array (201) is the greatest.&lt;/p&gt;

&lt;h3&gt;
  
  
  Max by Key...
&lt;/h3&gt;

&lt;p&gt;Using this feature of PostgreSQL, we construct an array where the first element is the value to sort by, and the second element is the value we want to keep. In our case, we want to get the credit by the max &lt;code&gt;id&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Not all events set credit, so we need to restrict the result to &lt;code&gt;credit_set&lt;/code&gt; events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;FILTER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'credit_set'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The result of this expression is an array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db#=&amp;gt;SELECT
    account,
    MAX(ARRAY[id, (data-&amp;gt;'credit')::int]) FILTER (WHERE type = 'credit_set'))
FROM
    event
GROUP BY
    account;

 account |   max
---------+---------
       1 | {6,100}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We only want the second element, the value of &lt;code&gt;credit&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;FILTER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'credit_set'&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And this is it! This way we can get the last credit set for each account.&lt;/p&gt;

&lt;p&gt;Next, let's examine the execution plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=32.50..34.50 rows=200 width=16)
   Group Key: account
   -&amp;gt;  Seq Scan on event  (cost=0.00..17.50 rows=750 width=72)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Simple plan for a simple query!&lt;/p&gt;

&lt;p&gt;The main benefit of this approach is that it only needs one pass of the table and no sort.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;This approach is very useful, but it has some restrictions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All elements must be of the same type.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL does not support &lt;a href="https://www.postgresql.org/docs/current/arrays.html#ARRAYS-DECLARATION"&gt;arrays with different types of elements&lt;/a&gt;. As an example, if we wanted to get the last credit set by date, and not by id:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT
    account,
    SUM((data-&amp;gt;'delta_balance')::int) AS balance,
    (MAX(
        ARRAY[happened_at, (data-&amp;gt;'credit')::int]
    ) FILTER (WHERE type = 'credit_set'))[2] AS credit
FROM
    event
GROUP BY
    account;

ERROR:  ARRAY types timestamp with time zone and integer cannot be matched
LINE 4:     (MAX(ARRAY[happened_at, (data-&amp;gt;'credit')::int]) FILTER...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;PostgreSQL tells us that an array cannot contain both timestamps and integers.&lt;/p&gt;

&lt;p&gt;We can overcome this restriction in some cases by casting one of the elements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT
    account,
    SUM((data-&amp;gt;'delta_balance')::int) AS balance,
    (MAX(
        ARRAY[EXTRACT('EPOCH' FROM happened_at), (data-&amp;gt;'credit')::int]
    ) FILTER (WHERE type = 'credit_set'))[2] AS credit
FROM
    event
GROUP BY
    account;

 account | balance | credit
---------+---------+--------
       1 |     -30 |    100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We &lt;a href="https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT"&gt;converted the timestamp to epoch&lt;/a&gt;, which is the number of seconds since 1970. Once both elements are of the same type, we can use the array trick.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query might consume a little more memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one if a bit of a stretch, but as &lt;a href="https://dev.tohow-we-solved-a-storage-problem-in-postgre-sql-without-adding-a-single-bytes-of-storage"&gt;we demonstrated in the&lt;br&gt;
past&lt;/a&gt;,&lt;br&gt;
large group and sort keys consume more memory in joins and sorts. Using the array trick, the group key is an array, which is a bit larger than the plain fields we usually sort by.&lt;/p&gt;

&lt;p&gt;Also, the array trick can be used to "piggyback" more than one value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'EPOCH'&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;happened_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="n"&gt;FILTER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'credit_set'&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This query will return both the last credit set, and the date in which it was set. The entire array is used for sorting, so the more values we put in the array, the larger the group key gets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The array trick is very useful, and can significantly simplify complicated queries and improve performance. We use it to produce reports on time series data, and to generate read models from event tables.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Call out to my readers:&lt;/strong&gt; I'm pretty sure using &lt;a href="https://www.postgresql.org/docs/current/xaggr.html"&gt;user-defined aggregate function in PostgreSQL&lt;/a&gt; it should be possible to create a function with the signature &lt;code&gt;MAX_BY(key, value)&lt;/code&gt;. I haven't had time to dig deep into custom aggregate functions, but if any of the readers do, please share your implementation and i'll be happy to post it here.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;UPDATED: Aug 17, 2019&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Comments From Readers
&lt;/h2&gt;

&lt;p&gt;In the few days following the publication of this article, I received several suggestions and comments from readers. This is a summary of the comments I received, and my thoughts on them.&lt;/p&gt;




&lt;p&gt;One &lt;a href="https://www.reddit.com/r/PostgreSQL/comments/cpskf7/how_to_get_the_first_or_last_value_in_a_group/ewsbdqo/"&gt;commenter on Reddit&lt;/a&gt; suggested using &lt;code&gt;ARRAY_AGG&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT
    account,
    ((ARRAY_AGG(data ORDER BY happened_at DESC)
        FILTER (WHERE type = 'credit_set'))[1] -&amp;gt; 'credit')::int AS credit
FROM
    event
GROUP BY account;

 account | credit
---------+--------
       1 |     50
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This approach obviously works, and it doesn't require the key and the value to be of the same type, which is a big limitation of the array trick.&lt;/p&gt;

&lt;p&gt;The downside to this approach is that it requires a sort which might become expensive with very large data sets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                            QUERY PLAN
------------------------------------------------------------------
 GroupAggregate  (cost=1.17..1.36 rows=7 width=8)
   Group Key: account
   -&amp;gt;  Sort  (cost=1.17..1.19 rows=7 width=76)
         Sort Key: account
         -&amp;gt;  Seq Scan on event  (cost=0.00..1.07 rows=7 width=76)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;






&lt;p&gt;Another &lt;a href="https://www.reddit.com/r/PostgreSQL/comments/cpskf7/how_to_get_the_first_or_last_value_in_a_group/ewu6juf/"&gt;commenter on Reddit&lt;/a&gt; suggested using window functions in combination with &lt;code&gt;DISTINCT ON&lt;/code&gt;. This is the original suggestion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;FIRST_VALUE&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;LAST_VALUE&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'credit_set'&lt;/span&gt;
&lt;span class="n"&gt;WINDOW&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;account&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="n"&gt;UNBOUNDED&lt;/span&gt; &lt;span class="n"&gt;PRECEDING&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;UNBOUNDED&lt;/span&gt; &lt;span class="n"&gt;FOLLOWING&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The query uses both &lt;code&gt;DISTINCT ON&lt;/code&gt; and window functions. It works by calculating the aggregates using the window function on the entire set (all rows of the account), and then fetch the first or last row using &lt;code&gt;DISTINCT ON&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To make the window functions behave like a "group by" and calculate the aggregates on the entire set, the bound is defined as &lt;code&gt;BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING&lt;/code&gt;, meaning "for the entire partition".&lt;/p&gt;

&lt;p&gt;To avoid repeating the window for every aggregate, the query uses &lt;a href="https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW"&gt;a WINDOW clause&lt;/a&gt; to define a named window that can be used multiple times in the query.&lt;/p&gt;

&lt;p&gt;This query however, is not really working because the where clause is restricted to events with type &lt;code&gt;credit_set&lt;/code&gt;. To get the complete status of the account, we also need to aggregate the balance of &lt;em&gt;all&lt;/em&gt; events.&lt;/p&gt;

&lt;p&gt;To actually make this approach work, we need to make the following adjustments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db=# SELECT DISTINCT ON (account)
    account,
    LAST_VALUE((data-&amp;gt;'credit')::int) OVER (
        PARTITION BY account
        ORDER BY (
            CASE
                WHEN type = 'credit_set' THEN happened_at
                ELSE null
            END
        ) ASC NULLS FIRST
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as credit,
    SUM(COALESCE((data-&amp;gt;'delta_balance')::int, 0)) OVER (PARTITION BY account) AS balance
FROM
    event;

 account | credit | balance
---------+--------+---------
       1 |    100 |     -30
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What changes did we make:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We had to ditch the where clause so all events are processed.&lt;/li&gt;
&lt;li&gt;We also had to do some "creative sorting" to get the &lt;code&gt;last_credit&lt;/code&gt; set event.&lt;/li&gt;
&lt;li&gt;We removed the named window because it was no longer reused.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The plan also gotten more complicated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Unique  (cost=1.19..1.55 rows=7 width=24)
   -&amp;gt;  WindowAgg  (cost=1.19..1.54 rows=7 width=24)
         -&amp;gt;  WindowAgg  (cost=1.19..1.38 rows=7 width=48)
               -&amp;gt;  Sort  (cost=1.19..1.20 rows=7 width=44)
                     Sort Key: account, (CASE WHEN (type = 'credit_set'::text)
                                         THEN happened_at ELSE NULL::timestamp with time zone
                                         END) NULLS FIRST
                     -&amp;gt;  Seq Scan on event  (cost=0.00..1.09 rows=7 width=44)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Two sorts, and several  aggregates. The bottom line, in my opinion, is that this approach is harder to maintain and it yields a significantly more complicated plan. I wouldn't use it in this case. However, like the previous approach, it is not restricted by the type of the key and value.&lt;/p&gt;




&lt;p&gt;In response to &lt;a href="https://twitter.com/mdevanr/status/1161275759786418177"&gt;my tweet&lt;/a&gt;, a reader pointed me to &lt;a href="https://wiki.postgresql.org/wiki/First/last_(aggregate)"&gt;an old wiki page&lt;/a&gt; with an implementation of two custom aggregate functions &lt;code&gt;FIRST&lt;/code&gt; and &lt;code&gt;LAST&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After creating the custom aggregates in the database as instructed in the wiki page, the query can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'delta_balance'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;FILTER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'credit_set'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;credit&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The main issue I found with this approach is that the order seems to be arbitrary. First and last can only be defined in the context of some order. I couldn't find a way to provide a field to sort by, so I consider this approach flawed for this use case.&lt;/p&gt;




&lt;p&gt;As I suspected, this use case is ideal for custom aggregates and extensions, and indeed, &lt;a href="https://www.reddit.com/r/PostgreSQL/comments/cpskf7/how_to_get_the_first_or_last_value_in_a_group/ewudq11/"&gt;another reader on Reddit&lt;/a&gt; pointed me &lt;a href="https://pgxn.org/dist/first_last"&gt;the extension "first_last"&lt;/a&gt;. The API is roughly similar to the custom aggregate above, but is also offers a way to sort the results so the first and last are not arbitrary.&lt;/p&gt;

&lt;p&gt;I did not install the extension, but the query should look something like that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'delta_balance'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;LAST&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'credit'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;happened_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;FILTER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'credit_set'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;credit&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



</description>
      <category>postgres</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
