<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Craig Kerstiens</title>
    <description>The latest articles on DEV Community by Craig Kerstiens (@craigkerstiens).</description>
    <link>https://dev.to/craigkerstiens</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F11870%2FxntmISMa.jpg</url>
      <title>DEV Community: Craig Kerstiens</title>
      <link>https://dev.to/craigkerstiens</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/craigkerstiens"/>
    <language>en</language>
    <item>
      <title>Getting started with GitHub event data on Citus</title>
      <dc:creator>Craig Kerstiens</dc:creator>
      <pubDate>Tue, 19 Sep 2017 17:37:10 +0000</pubDate>
      <link>https://dev.to/craigkerstiens/getting-started-with-github-event-data-on-citus</link>
      <guid>https://dev.to/craigkerstiens/getting-started-with-github-event-data-on-citus</guid>
      <description>&lt;p&gt;Getting an example schema and data is often one of the more time consuming parts of testing a database. To make that easier for you, we're going to walk through Citus with an open data set which almost any developer can relate to–github event data. If you already have your own schema, data, and queries you want to test with, by all means use it. If you need any help with getting setup, join us in our &lt;a href="https://slack.citusdata.com" rel="noopener noreferrer"&gt;Slack channel&lt;/a&gt; and we'll be happy to talk through different data modeling options for your own data. &lt;/p&gt;

&lt;h3&gt;
  
  
  An overview of the schema and queries
&lt;/h3&gt;

&lt;p&gt;The data model we're going to work with here is simple, we have &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;events&lt;/code&gt;. An event can be a fork or a commit related to an organization and of course many more.&lt;/p&gt;

&lt;p&gt;To get started we're going to login to &lt;a href="https://console.citusdata.com/users/sign_up" rel="noopener noreferrer"&gt;Citus Cloud&lt;/a&gt; and provision a production cluster. &lt;em&gt;You can absolutely use the dev plan which only costs ~ $3 a day, but in this case we're going to use the production instance so we can easily resize it towards the ends&lt;/em&gt;. Once you've provisioned your cluster you can connect to it with your standard &lt;a href="http://www.craigkerstiens.com/2013/02/13/How-I-Work-With-Postgres/" rel="noopener noreferrer"&gt;Postgres psql&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;psql&lt;/span&gt; &lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;citus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;nHmf5NObkfOsKmvfni0Fg&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fnq7xkf34cjb6vfdubz3cp6427a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citusdata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;citus&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="n"&gt;sslmode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;require&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd3vv6lp55qjaqc.cloudfront.net%2Fitems%2F412o3i1S25441s0s3M02%2FCraigs_Github_Test___Citus_Console.png%3FX-CloudApp-Visitor-Id%3De4475d145dcf11ebcffabf840edcc11f%26v%3D7ee6cbe7" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd3vv6lp55qjaqc.cloudfront.net%2Fitems%2F412o3i1S25441s0s3M02%2FCraigs_Github_Test___Citus_Console.png%3FX-CloudApp-Visitor-Id%3De4475d145dcf11ebcffabf840edcc11f%26v%3D7ee6cbe7"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we're going to set up our two tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt;                                                                   
&lt;span class="p"&gt;(&lt;/span&gt;                                                                                            
    &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                         
    &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                         
    &lt;span class="n"&gt;event_public&lt;/span&gt; &lt;span class="nb"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                    
    &lt;span class="n"&gt;repo_id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                          
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                           
    &lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                              
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                          
    &lt;span class="n"&gt;org&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                               
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;                                                                     
&lt;span class="p"&gt;);&lt;/span&gt;                                                                                           

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;github_users&lt;/span&gt;                                                                    
&lt;span class="p"&gt;(&lt;/span&gt;                                                                                            
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                          
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                                
    &lt;span class="n"&gt;login&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                              
    &lt;span class="n"&gt;avatar_url&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                         
    &lt;span class="n"&gt;gravatar_id&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                                                        
    &lt;span class="n"&gt;display_login&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;                                                                       
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the &lt;code&gt;payload&lt;/code&gt; field of events we have a &lt;a href="https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/" rel="noopener noreferrer"&gt;JSONB&lt;/a&gt; datatype. JSONB is the JSON datatype in binary form in Postgres. This makes it easy to store a more flexible schema in a single column and with Postgres we can create a &lt;code&gt;GIN&lt;/code&gt; index on this which will index every key and value within it. With a &lt;code&gt;GIN&lt;/code&gt; index it becomes fast and easy to query with various conditions directly on that payload. So we'll go ahead and create a couple of indexes before we load our data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX event_type_index ON github_events (event_type);                                                  
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next we’ll actually take those standard Postgres tables and tell Citus to shard them out. To do so we’ll run a query for each table. With this query we’ll specify the table we want to shard, as well as the key we want to shard it on. In this case we’ll shard both the events and users table on &lt;code&gt;user_id&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;create_distributed_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'github_events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                                 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;create_distributed_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'github_users'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we're ready to load some data. You can download the two example files &lt;a href="https://examples.citusdata.com/users.csv" rel="noopener noreferrer"&gt;users.csv&lt;/a&gt; and &lt;a href="https://examples.citusdata.com/events.csv" rel="noopener noreferrer"&gt;events.csv&lt;/a&gt;. We also have a &lt;a href="https://examples.citusdata.com/large_events.csv" rel="noopener noreferrer"&gt;large_events.csv&lt;/a&gt; available, which may be more interesting to try out, though admittedly takes longer to download and load. Once downloaded connect with psql and load the data with \copy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="n"&gt;github_users&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;CSV&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Querying
&lt;/h3&gt;

&lt;p&gt;Now we're all setup for the fun part, actually running some queries. Let's start with something really basic. A simple &lt;code&gt;count (*)&lt;/code&gt; to see how much data we loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT count(*) from github_events;
 count
--------
 126245
(1 row)

Time: 177.491 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So a nice simple count works. We'll come back to that sort of aggregation in a bit, but for now let’s look at a few other queries. Within the JSONB &lt;code&gt;payload&lt;/code&gt; column, we've got a good bit of data, but it varies based on event type. For a &lt;code&gt;PushEvent&lt;/code&gt; type there is a size associated with it which includes the number of distinct commits in each push. With this we could perform something like the total number of commits per hour:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'hour'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'distinct_size'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;num_commits&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;github_events&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'PushEvent'&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;hour&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;num_commits&lt;/span&gt;
&lt;span class="c1"&gt;---------------------+-------------&lt;/span&gt;
 &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt; &lt;span class="mi"&gt;05&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;       &lt;span class="mi"&gt;22160&lt;/span&gt;
 &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt; &lt;span class="mi"&gt;06&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;       &lt;span class="mi"&gt;53562&lt;/span&gt;
 &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt; &lt;span class="mi"&gt;07&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;       &lt;span class="mi"&gt;46540&lt;/span&gt;
 &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt; &lt;span class="mi"&gt;08&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;       &lt;span class="mi"&gt;35002&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;186&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;176&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But we also had our &lt;code&gt;users&lt;/code&gt; table. Since we sharded both users and events on the same id it means that data is &lt;a href="https://www.citusdata.com/blog/2016/12/22/scaling_out_sql_with-colocation/" rel="noopener noreferrer"&gt;co-located&lt;/a&gt; together and can easily be joined. &lt;em&gt;In certain cases, like for &lt;a href="https://www.citusdata.com/blog/2016/10/03/designing-your-saas-database-for-high-scalability/" rel="noopener noreferrer"&gt;multi-tenant&lt;/a&gt; data models you'll find sharding on a tenant_id makes scaling out quite straight-forward.&lt;/em&gt; If we join on the &lt;code&gt;user_id&lt;/code&gt; it should pass down to all the distributed shards without us having to do any extra work. An example of something we may want to do with this, is to find the users who created the most repositories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;login&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;github_users&lt;/span&gt; &lt;span class="n"&gt;gu&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'CreateEvent'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;@&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'{"ref_type": "repository"}'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;login&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
             &lt;span class="n"&gt;login&lt;/span&gt;              &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="c1"&gt;---------------------------------+-------&lt;/span&gt;
 &lt;span class="n"&gt;atomist&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;                &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;60&lt;/span&gt;
 &lt;span class="n"&gt;isisliu&lt;/span&gt;                         &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;60&lt;/span&gt;
 &lt;span class="n"&gt;atomist&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;staging&lt;/span&gt;        &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;55&lt;/span&gt;
 &lt;span class="n"&gt;direwolf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;github&lt;/span&gt;                 &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;50&lt;/span&gt;
 &lt;span class="n"&gt;circle&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;                 &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;40&lt;/span&gt;
 &lt;span class="n"&gt;uncoil&lt;/span&gt;                          &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;23&lt;/span&gt;
 &lt;span class="n"&gt;kvo91&lt;/span&gt;                           &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;14&lt;/span&gt;
 &lt;span class="n"&gt;ranasarikaya&lt;/span&gt;                    &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;10&lt;/span&gt;
 &lt;span class="n"&gt;Alexgallo91&lt;/span&gt;                     &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;9&lt;/span&gt;
 &lt;span class="n"&gt;marcvl&lt;/span&gt;                          &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;9&lt;/span&gt;
 &lt;span class="n"&gt;Joshua&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;Zheng&lt;/span&gt;                    &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="mi"&gt;8&lt;/span&gt;
 &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scaling out
&lt;/h3&gt;

&lt;p&gt;One of the major benefits of Citus, is that when you need to, you can scale out your database as opposed to scaling up. This means a more horizontal path to scaling your database and you won't run into some ceiling of the largest instance you can find. Of course there are a &lt;a href="https://www.citusdata.com/blog/2017/01/17/parallel-indexing-with-citus/" rel="noopener noreferrer"&gt;few other benefits&lt;/a&gt; to scaling out as opposed to up as well. When you do need to scale out, on Citus Cloud it's as simple as going to the settings and resizing your formation. Once you change your size and it takes effect, you then need to rebalance your data so it's distributed across all nodes. &lt;/p&gt;

&lt;p&gt;First let's look to see how the data resides:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;nodename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_dist_shard_placement&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;nodename&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                 &lt;span class="n"&gt;nodename&lt;/span&gt;                 &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="c1"&gt;------------------------------------------+-------&lt;/span&gt;
 &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;32&lt;/span&gt;
 &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;32&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;83&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;659&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;As you can see there is an equal number of shards on each node&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now let's hop back into the console in our settings area and resize the cluster. To do this login to Citus Cloud, click on the settings tab and hit resize. You'll now see the slider that allows you to resize your cluster, scale to what you desire and click Resize. Give it a few minutes and all your nodes will now be available.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd3vv6lp55qjaqc.cloudfront.net%2Fitems%2F2b2f133M3o2u1B1f1s0L%2Fresizer.gif%3FX-CloudApp-Visitor-Id%3De4475d145dcf11ebcffabf840edcc11f%26v%3Df0fb03bd" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fd3vv6lp55qjaqc.cloudfront.net%2Fitems%2F2b2f133M3o2u1B1f1s0L%2Fresizer.gif%3FX-CloudApp-Visitor-Id%3De4475d145dcf11ebcffabf840edcc11f%26v%3Df0fb03bd"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But&lt;/strong&gt;, nothing has changed to your data. To begin taking effect of your new nodes you'll want to run the rebalancer. When you run the rebalancer we move shards from one physical instance to another so your data is more evenly distributed. As this happens, writes continue to flow as normal and writes are held at the coordinator. Once the operation completes those writes on the coordinator continue to flow through. This means no reads were delayed and no writes were lost throughout the entire operation. Even better, for multi-tenant apps we move all co-located shards in concert with each other so joins between them continue to operate as you'd expect. &lt;/p&gt;

&lt;p&gt;Now, let's run the rebalancer. When you run it you'll see output for each shard that is moved from one node to another:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;rebalance_table_shards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'github_events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;NOTICE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;00000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Moving&lt;/span&gt; &lt;span class="n"&gt;shard&lt;/span&gt; &lt;span class="mi"&gt;102072&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;197&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;188&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;CONTEXT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;PL&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;pgSQL&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;rebalance_table_shards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;real&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt; &lt;span class="k"&gt;at&lt;/span&gt; &lt;span class="n"&gt;RAISE&lt;/span&gt;
&lt;span class="k"&gt;LOCATION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;exec_stmt_raise&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pl_exec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3165&lt;/span&gt;
&lt;span class="n"&gt;NOTICE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;00000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Moving&lt;/span&gt; &lt;span class="n"&gt;shard&lt;/span&gt; &lt;span class="mi"&gt;102073&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;197&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;247&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;111&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="n"&gt;CONTEXT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;PL&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;pgSQL&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;rebalance_table_shards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;real&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt; &lt;span class="k"&gt;at&lt;/span&gt; &lt;span class="n"&gt;RAISE&lt;/span&gt;
&lt;span class="k"&gt;LOCATION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;exec_stmt_raise&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pl_exec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3165&lt;/span&gt;
&lt;span class="n"&gt;NOTICE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;00000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Moving&lt;/span&gt; &lt;span class="n"&gt;shard&lt;/span&gt; &lt;span class="mi"&gt;102074&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;197&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;188&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can re-run the query to show us all the shard placements and see our new even distribution of shards:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;nodename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_dist_shard_placement&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;nodename&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                  &lt;span class="n"&gt;nodename&lt;/span&gt;                  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="c1"&gt;--------------------------------------------+-------&lt;/span&gt;
 &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;197&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;188&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;16&lt;/span&gt;
 &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;    &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;16&lt;/span&gt;
 &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;198&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;16&lt;/span&gt;
 &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;197&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;247&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;111&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compute&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amazonaws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;    &lt;span class="mi"&gt;16&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And with this we can go back to our &lt;code&gt;count (*)&lt;/code&gt; query as well. Now that we've doubled the resources in our cluster, queries that can be parallelized will be performed much faster. Running our basic &lt;code&gt;count (*)&lt;/code&gt; query we'll see that the query time is now nearly half of what it was before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="c1"&gt;--------&lt;/span&gt;
 &lt;span class="mi"&gt;126245&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;97&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;792&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Get started today
&lt;/h3&gt;

&lt;p&gt;If you're in need of a dataset to give to kick the tires with Citus I hope this helps. Feel free to download Citus for free and or sign-up on &lt;a href="https://console.citusdata.com/users/sign_up" rel="noopener noreferrer"&gt;Citus Cloud&lt;/a&gt; and get started.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://www.citusdata.com/blog/2017/01/27/getting-started-with-github-events-data/" rel="noopener noreferrer"&gt;Originally posted on the Citus Data blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>Sharding a multi-tenant app with Postgres</title>
      <dc:creator>Craig Kerstiens</dc:creator>
      <pubDate>Wed, 19 Jul 2017 15:06:28 +0000</pubDate>
      <link>https://dev.to/craigkerstiens/sharding-a-multi-tenant-app-with-postgresql</link>
      <guid>https://dev.to/craigkerstiens/sharding-a-multi-tenant-app-with-postgresql</guid>
      <description>&lt;p&gt;Whether you’re building marketing analytics, a portal for e-commerce sites, or an application to cater to schools, if you’re building an app and your customer is another business then a multi-tenant approach is the norm. The same code runs for all customers, but each customer sees their own private data set, &lt;em&gt;except in the cases of holistic internal reporting&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Early in your application’s life, customer data has a simple structure which evolves organically. Typically all information relates to a central customer/user/tenant table. With a smaller amount of data (10’s of GB) it’s easy to scale the application by &lt;a href="http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/" rel="noopener noreferrer"&gt;throwing more hardware&lt;/a&gt; at it, but what happens when you’ve had so much success that your data no longer fits in memory on a single box, or you need more concurrency? You scale out by re-architecting your application–and it’s often painful (and expensive.) &lt;/p&gt;

&lt;h2&gt;
  
  
  Options for scaling out your database
&lt;/h2&gt;

&lt;p&gt;This scale-out model for databases has worked well for the likes of &lt;a href="http://research.google.com/pubs/pub41344.html" rel="noopener noreferrer"&gt;Google&lt;/a&gt; and &lt;a href="http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;, but doesn't have to be as complicated as you might think.&lt;/p&gt;

&lt;p&gt;If you're able to model your multi-tenant data in the right way &lt;a href="http://www.craigkerstiens.com/2012/11/30/sharding-your-database/" rel="noopener noreferrer"&gt;sharding&lt;/a&gt; can be much simpler–you do not need to re-architect your application to scale out, and youcan keep the power you need from a database including joins, indexing, and more. I work at &lt;a href="https://www.citusdata.com/" rel="noopener noreferrer"&gt;Citus Data&lt;/a&gt;, where we’ve created a database that scales out Postgres (an extension to Postgres, actually): we’ve done the hard work of sharding so you don’t have to. While Citus lets you scale out your processing power and memory and storage, how you model your data will determine the ease and flexibility you get from the system. If you're building a multi-tenant SaaS application, hopefully the following example highlights how you can plan early for scaling without having to contort too much of your application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling out Postgres with ease by adopting a multi-tenant data model
&lt;/h3&gt;

&lt;p&gt;At the core of most SaaS applications, tenancy is already built in, whether you realize it or not. By “tenancy”, we mean the notion that your SaaS application has multiple customers (“tenants”) who are all sharing the same application but whose data needs to be kep separate from each other. (The same way that multiple tenants can live in the same building, but each have their own separate apartment.)  &lt;/p&gt;

&lt;p&gt;Anyway, as we mentioned above, you may have a users table. Let's look at a very basic SaaS schema that highlights this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;stores&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;owner_email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;owner_password&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;last_login_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;store_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;purchases&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;store_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;purchased_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above schema highlights an &lt;em&gt;overly simplified&lt;/em&gt; multi-tenant e-commerce site. Say for example, someone like an Etsy or Shopify. And of course there are a number of queries you would run against this:&lt;/p&gt;

&lt;p&gt;List the products for a particular store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
       &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;store_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or let’s say you want to compute how many purchases exist weekly for a given store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;purchased_at&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;purchases&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;stores&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stores_id&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;store_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;â€˜&lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here you could envision how to give each store its own presence and analytics. Now if we fast-forward a bit and start to look at scaling this out then we have a choice to make on how we'll shard the data.&lt;/p&gt;

&lt;p&gt;The easiest level to do this at is the tenant level or in this case on &lt;code&gt;store_id&lt;/code&gt;. With the above data model the largest tables over time are likely to be products and purchases, we could shard on both of these. Though if we choose products or purchases, the difficulty lies in the fact that we may want to do queries that focus on some high level item such as store. If we choose &lt;code&gt;store_id&lt;/code&gt; then all data for a particular store would exist on the same node, this would allow you push down all computations directly to the a single node.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-tenancy and co-location, a perfect pair
&lt;/h3&gt;

&lt;p&gt;Co-locating data within the same physical instance avoids sending data over the network during joins. This can result in much faster operations. With Citus, there are a number of ways to move your data around so you can join and query it in a flexible manner, but for this class of multi-tenant SaaS apps it’s simple if you can ensure data ends up on the shard. To do this though we need to push down our &lt;code&gt;store_id&lt;/code&gt; to all of our tables. &lt;/p&gt;

&lt;p&gt;The key that makes this all possible is including your &lt;code&gt;store_id&lt;/code&gt; on all tables. By doing this you can easily shard out all your data so it’s located on the same shard. In the above data model we coincidentally had &lt;code&gt;store_id&lt;/code&gt; on all of our tables, but if it weren’t there you could add it. This would put you in a good position to distribute all your data so it’s stored on the same nodes. So now let’s try sharding our tenants, in this case stores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;master_create_distributed_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stores'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'hash'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;master_create_distributed_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'products'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'store_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'hash'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;master_create_distributed_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'purchases'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'store_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'hash'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;master_create_worker_shards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stores'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;master_create_worker_shards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'products'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;master_create_worker_shards&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'purchases'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you’re all set. &lt;em&gt;Again, you’ll notice that we shard everything by store_id–this allows all queries to be routed to a single Postgres instance.&lt;/em&gt; The same queries as before should work just fine for you as long as you have store_id on your query. An example layout of your data now may look something like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5jsbkzwafbil1d1gs5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5jsbkzwafbil1d1gs5m.png" alt="Example of multi-tenant sharded tables" width="715" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The alternative to colocation is to choose some lower-level shard key such as orders or products. This has a trade-off of making joins and querying more difficult because you have to send more data over the network and make sure things work in a distributed way. This lower-level key can be useful for consumer focused datasets, if your analytics are always against the entire data set as is often the case in metrics-focused use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  To scale out your Postgres database, you have more good choices than you think.
&lt;/h3&gt;

&lt;p&gt;It’s important to note as we did in our &lt;a href="https://www.citusdata.com/blog/2016/07/25/sharding-json-in-postgres-and-performance/" rel="noopener noreferrer"&gt;sharding JSON in Postgres post&lt;/a&gt; that different distribution models can have different benefits and trade-offs. In some cases modeling on a lower-level entity id such as products or purchases can be the right choice– you gain more parallelism for analytics and trade off simplicity in querying a single store. Either choice of picking a multi-tenant data model or adopting a more &lt;a href="https://www.citusdata.com/blog/2016/07/25/sharding-json-in-postgres-and-performance/" rel="noopener noreferrer"&gt;distributed document model&lt;/a&gt; can be made to scale, but each comes with its own trade-offs. If you have the need today to scale out your multi-tenant app then give &lt;a href="https://console.citusdata.com" rel="noopener noreferrer"&gt;Citus Cloud&lt;/a&gt; a try or if you have any questions on which type of scale-out data model might work best for your application,  please don’t hesitate to &lt;a href="https://www.citusdata.com/about/contact_us" rel="noopener noreferrer"&gt;reach out to my team at Citus&lt;/a&gt;. We can help. (And did we mention that Citus is available as open source, as a database service in AWS, and on-prem?)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://www.citusdata.com/blog/2016/08/10/sharding-for-a-multi-tenant-app-with-postgres/" rel="noopener noreferrer"&gt;Originally posted on the Citus Data blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>Working with time in Postgres</title>
      <dc:creator>Craig Kerstiens</dc:creator>
      <pubDate>Mon, 10 Jul 2017 17:06:58 +0000</pubDate>
      <link>https://dev.to/craigkerstiens/working-with-time-in-postgres</link>
      <guid>https://dev.to/craigkerstiens/working-with-time-in-postgres</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="http://www.craigkerstiens.com/2017/06/08/working-with-time-in-postgres/"&gt;Originally posted on my personal blog at craigkerstiens.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A massive amount of reporting queries, whether really intensive data analysis or just basic insights into your business involving looking at data over a certain time period. Postgres has really rich support for dealing with time out of the box, something that's often very underweighted when dealing with a database. Sure, if you have a time-series database it's implied, but even then how flexible and friendly is it from a query perspective? With Postgres there's a lot of key items available to you, let's dig in at the things that make your life easier when querying. &amp;lt;!--more--&amp;gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Date math
&lt;/h3&gt;

&lt;p&gt;The most common thing I find myself doing is looking at users that have done something within some specific time window. If I'm executing this all from my app I can easily inject specific dates, but Postgres makes this really easy for you. Within Postgres you have a type called an interval that is some window of time. And fortunately, Postgres takes care of the heavy lifting of how might something translate to or from hours/seconds/milliseconds/etc. Here are just a few examples of things you could do with intervals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;'1 day'::interval&lt;/li&gt;
&lt;li&gt;'5 days'::interval&lt;/li&gt;
&lt;li&gt;'1 week'::interval&lt;/li&gt;
&lt;li&gt;'30 days'::interval&lt;/li&gt;
&lt;li&gt;'1 month'::interval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;A note that if you're looking to remove something like a full month, you actually want to use 1 month instead of trying to calculate yourself.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With a given interval you can easily shift some window of time, such as finding all users that have signed up for your service within the past week:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'1 week'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Date functions
&lt;/h3&gt;

&lt;p&gt;Date math makes it pretty easy for you to go and find some specific set of data that applies, but what do you do when you want a broader report around time? There's a few options here. One is to leverage the built-in Postgres functions that help you work with dates and times. &lt;code&gt;date_trunc&lt;/code&gt; is one of the most used ones that will truncate a date down to some interval level. Here you can use the same general values as the above, but simply pass in the type of interval it will be. So if we wanted to find the count of users that signed up per week:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT date_trunc('week', created_at), 
       count(*)
FROM users
GROUP BY 1
ORDER BY 1 DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us a nice roll-up of how many users signed up each week. What's missing here though is if you have a week that has no users. In that case because no users signed up there is no count of 0, it just simply doesn't exist. If you did want something like this you could generate some range of time and then do a cross join with it against users to see which week they fell into. To do this first you'd generate a series of dates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2017-01-01'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()::&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'1 week'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;weeks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we're going to join this against the actual users table and check that the &lt;code&gt;created_at&lt;/code&gt; falls within the right range.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with weeks as (
  select week
  from generate_series('2017-01-01'::date, now()::date, '1 week'::interval) week
)

SELECT weeks.week,
       count(*)
FROM weeks,
     users
WHERE users.created_at &amp;gt; weeks.week
  AND users.created_at &amp;lt;= (weeks.week - '1 week'::interval)
GROUP BY 1
ORDER BY 1 DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Timestamp vs. Timestamptz
&lt;/h3&gt;

&lt;p&gt;What about storing the times themselves? Postgres has two types of timestamps. It has a generic timestamp and one with timezone embedded in it. In most cases you should generally opt for timestamptz. Why not timestamp? What happens if you move a server, or your server somehow swaps its configuration. Or perhaps more practically what about daylight savings time? In general you might think that you can simply just put in the time as you see it, but when different countries around the world observe things like daylight savings time differently it introduces complexities into your application. &lt;/p&gt;

&lt;p&gt;With timestamptz it'll be aware of the extra parts of your timezone as it comes in. Then when you query from one timezone that accounts for daylights savings you're all covered. There's a &lt;a href="http://phili.pe/posts/timestamps-and-time-zones-in-postgresql/"&gt;number of articles&lt;/a&gt; that cover a bit more in depth on the logic between timestamp and timestamp with timezone, so if you're curious I encourage you to check them out, but by default you mostly just need to use timestamptz.&lt;/p&gt;

&lt;h3&gt;
  
  
  More
&lt;/h3&gt;

&lt;p&gt;There's a number of other functions and capabilities when it comes to dealing with time in Postgres. You can &lt;code&gt;extract&lt;/code&gt; various parts of a timesetamp or interval such as hour of the day or the month. You can grab the day of the week with &lt;code&gt;dow&lt;/code&gt;. And one of my favorites which is when we celebrate happy hour at Citus, there's a literal for UTC 00:00:00 00:00:00 which is &lt;a href="https://www.postgresql.org/message-id/20050124200645.GA6126%40winnie.fuhr.org"&gt;&lt;code&gt;allballs()&lt;/code&gt;&lt;/a&gt;. If you need to work with dates and times in Postgres I encourage you to check out the &lt;a href="https://www.postgresql.org/docs/current/static/functions-datetime.html"&gt;docs&lt;/a&gt; before you try to re-write something of your own, chances are what you need may already be there.&lt;/p&gt;

</description>
      <category>postgres</category>
    </item>
    <item>
      <title>Getting started with JSONB in Postgres</title>
      <dc:creator>Craig Kerstiens</dc:creator>
      <pubDate>Sun, 02 Apr 2017 18:49:26 +0000</pubDate>
      <link>https://dev.to/craigkerstiens/getting-started-with-jsonb-in-postgres</link>
      <guid>https://dev.to/craigkerstiens/getting-started-with-jsonb-in-postgres</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="http://www.craigkerstiens.com/2017/03/12/getting-started-with-jsonb-in-postgres/"&gt;Originally posted on my blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;JSONB is an awesome datatype in Postgres. I find myself using it on a weekly basis these days. Often in using some API (such as &lt;a href="https://www.clearbit.com"&gt;clearbit&lt;/a&gt;) I'll get a JSON response back, instead of parsing that out into a table structure it's really easy to throw it into a JSONB then query for various parts of it. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you're not familiar with JSONB, it's a binary representation of JSON in your database. You can read a bit more about it vs. JSON &lt;a href="https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In working with JSONB here's a few quick tips to get up and running with it even faster: &lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing
&lt;/h3&gt;

&lt;p&gt;For the most part you don't have to think to much about this. With Postgres powerful indexing types you can add one index and have everything within the JSON document, all the keys and all the values, automatically indexed. The key here is to add a &lt;code&gt;GIN&lt;/code&gt; index. Once this is done queries should be much faster where you're searching for some value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_data&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;companies&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Querying
&lt;/h3&gt;

&lt;p&gt;Querying is a little bit more work, but once you get the basics it can be pretty straight forward. There's a few new operators you'll want to quickly ramp up on and from there querying becomes easy.&lt;/p&gt;

&lt;p&gt;For the most basic part you now have an operator so traverse down the various keys. First let's get some idea of what the JSON looks like so we can have something to work with. Here's a sample set of data that we get back from Clearbit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "domain": "citusdata.com",
  "company": {
    "id": "b1ff2bdf-0d8d-4d6d-8bcc-313f6d45996a",
    "url": "http:\/\/citusdata.com",
    "logo": "https:\/\/logo.clearbit.com\/citusdata.com",
    "name": "Citus Data",
    "site": {
      "h1": null,
      "url": "http:\/\/citusdata.com",
      "title": "Citus Data",
    },
    "tags": [
      "SAAS",
      "Enterprise",
      "B2B",
      "Information Technology &amp;amp; Services",
      "Technology",
      "Software"
    ],
    "domain": "citusdata.com",
    "twitter": {
      "id": "304455171",
      "bio": "Builders of Citus, the extremely scalable PostgreSQL database.",
      "site": "https:\/\/t.co\/hKpZjIy7Ej",
      "avatar": "https:\/\/pbs.twimg.com\/profile_images\/630900468995108865\/GJFCCXrv_normal.png",
      "handle": "citusdata",
      "location": "San Francisco, CA",
      "followers": 3770,
      "following": 570
    },
    "category": {
      "sector": "Information Technology",
      "industry": "Internet Software &amp;amp; Services",
      "subIndustry": "Internet Software &amp;amp; Services",
      "industryGroup": "Software &amp;amp; Services"
    },
    "emailProvider": false
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sorry it's a bit long, but it gives us a good example to work with. &lt;/p&gt;

&lt;h3&gt;
  
  
  Basic lookups
&lt;/h3&gt;

&lt;p&gt;Now let's query something fairly basic, the domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'domain'&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;companies&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'citusdata.com'&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="c1"&gt;----------------------&lt;/span&gt;
 &lt;span class="nv"&gt;"citusdata.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-&amp;gt;&lt;/code&gt; is likely the first operator you'll use in JSONB. It's helpful to traverse the JSON. Though of you're looking to get the value as text you'll actually want to use &lt;code&gt;-&amp;gt;&amp;gt;&lt;/code&gt;. Instead of giving you some quoted response back or JSON object you're going to get it as text which will be a bit cleaner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'domain'&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;companies&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'citusdata.com'&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="c1"&gt;----------------------&lt;/span&gt;
 &lt;span class="n"&gt;citusdata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Filtering for values
&lt;/h3&gt;

&lt;p&gt;Now with something like clearbit you may want to filter out for only certain type of companies. We can see in the example data that there's a bunch of tags. If we wanted to find only companies that had the tag B2B we could use the &lt;code&gt;?&lt;/code&gt; operator once we've targetted down to that part of the JSON. The &lt;code&gt;?&lt;/code&gt; operator will tell us if some part of JSON has a top level key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;companies&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'company'&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'tags'&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'B2B'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JSONB but pretty
&lt;/h3&gt;

&lt;p&gt;In querying JSONB you'll typically get a nice compressed set of JSON back. While this is all fine if you're putting it into your application, if you're manually debugging and testing things you probably want something a bit more readable. Of course Postgres has your back here and you can wrap your JSONB with a pretty print function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;jsonb_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;companies&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Much more
&lt;/h3&gt;

&lt;p&gt;There's a lot more in the &lt;a href="https://www.postgresql.org/docs/9.5/static/functions-json.html"&gt;docs&lt;/a&gt; that you can find handy for the specialized cases when you need them. &lt;code&gt;jsonb_each&lt;/code&gt; will expand a JSONB document into individual rows. So if you wanted to count the number of occurences of every tag for a company, this would help. Want to parse out a JSONB to a row/record in Postgres there's &lt;code&gt;jsonb_to_record&lt;/code&gt;. The docs are your friends for about everything you want to do but hopefully these few steps help kick start things if you want to get started with &lt;code&gt;JSONB&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>postgres</category>
    </item>
  </channel>
</rss>
