<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kashyap</title>
    <description>The latest articles on DEV Community by Kashyap (@kgrz).</description>
    <link>https://dev.to/kgrz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F198648%2Ff1530c76-8786-43ca-9573-b12f9305da8c.jpeg</url>
      <title>DEV Community: Kashyap</title>
      <link>https://dev.to/kgrz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kgrz"/>
    <language>en</language>
    <item>
      <title>PostgreSQL backup notes</title>
      <dc:creator>Kashyap</dc:creator>
      <pubDate>Mon, 16 Mar 2020 06:26:51 +0000</pubDate>
      <link>https://dev.to/kgrz/postgresql-backup-notes-3c1g</link>
      <guid>https://dev.to/kgrz/postgresql-backup-notes-3c1g</guid>
      <description>&lt;p&gt;(Originally posted at: &lt;a href="https://kgrz.io/postgresql-backup-notes.html"&gt;https://kgrz.io/postgresql-backup-notes.html&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Some of my very rough notes when going nearly cover-to-cover of the&lt;br&gt;
PostgreSQL &lt;a href="https://www.postgresql.org/docs/current/backup.html"&gt;backup and&lt;br&gt;
restore&lt;/a&gt;&lt;br&gt;
documentation section. It's one of the most detailed pieces of&lt;br&gt;
documentation I've ever read, so this acts as a very high-level&lt;br&gt;
summary. Most of the content is useful for general context surrounding&lt;br&gt;
data backups of any kind.&lt;/p&gt;

&lt;p&gt;I'm using the &lt;code&gt;/data&lt;/code&gt; directory to signify the data storage directory.&lt;br&gt;
The actual location depends on the chosen configuration, however.&lt;/p&gt;

&lt;h2&gt;Backups&lt;/h2&gt;

&lt;p&gt;Three main types of backup strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL dump/load (stop-the-world)&lt;/li&gt;
&lt;li&gt;Backup &lt;code&gt;/data&lt;/code&gt; directory (stop-the-world)&lt;/li&gt;
&lt;li&gt;Continuous Archiving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first two typically need lots of extra space on the database server&lt;br&gt;
to store the backup before you can upload it to some off-site storage.&lt;/p&gt;

&lt;p&gt;"stop-the-world" in this context is not an official nomenclature. In&lt;br&gt;
these strategies, &lt;em&gt;most likely&lt;/em&gt;, the database needs to be shut down at&lt;br&gt;
some point.&lt;/p&gt;

&lt;p&gt;Continuous archive-based backups are used for a leader-follower setup, or&lt;br&gt;
even delta backups—where there's a base backup, and subsequent data as&lt;br&gt;
deltas that can be used to restore the entire database. Shipping the&lt;br&gt;
archives to remote storage is quite simple, so this is the best strategy&lt;br&gt;
if the database servers are space-constrained.&lt;/p&gt;

&lt;h3&gt;SQL Dump/Load&lt;/h3&gt;

&lt;p&gt;This is the easiest strategy. &lt;code&gt;pg_dump&lt;/code&gt; takes a backup, while the&lt;br&gt;
&lt;code&gt;pg_restore&lt;/code&gt; command consumes the output of that backup. This strategy&lt;br&gt;
is the simplest to cron-ify, with no external dependencies:&lt;br&gt;
take a backup, upload the files to remote storage, test the backup on a&lt;br&gt;
different machine, and do this every night.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pg_dump&lt;/code&gt; saves the database as a &lt;code&gt;.sql&lt;/code&gt; script. Requires enough&lt;br&gt;
space to hold both the database and the backup script.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;File sizes might be limited by the kernel/OS, so that's something to&lt;br&gt;
check before deciding to use this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Restoring from the &lt;code&gt;pg_dump&lt;/code&gt; output might also need some configuration&lt;br&gt;
tweaking around connection timeouts: too short, and the server might&lt;br&gt;
close the connection before the entire script has run.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
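&lt;p&gt;As a sketch, a cron-able nightly version of this might look as follows. The&lt;br&gt;
paths, database name, and bucket are assumptions, not from the original notes:&lt;/p&gt;

```shell
#!/bin/sh
# Nightly logical backup sketch. Assumes the script runs as a user that
# can connect to the database; the bucket name is hypothetical.
set -eu

STAMP=$(date +%Y%m%d)
DUMP="/backups/mydb-$STAMP.dump"

# -Fc writes pg_dump's compressed custom format, which pg_restore consumes
pg_dump -Fc -f "$DUMP" mydb

# Ship the dump off the database server
aws s3 cp "$DUMP" "s3://example-backups/postgres/"
```

&lt;p&gt;On a different machine, running &lt;code&gt;pg_restore --list&lt;/code&gt; on the downloaded file&lt;br&gt;
is a cheap sanity check before a full test restore.&lt;/p&gt;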

&lt;h3&gt;Copy &lt;code&gt;/data&lt;/code&gt; directory&lt;/h3&gt;

&lt;p&gt;PostgreSQL's directory layout is straightforward—once you get to know&lt;br&gt;
it. Most of the data is put in one directory, and this includes the two&lt;br&gt;
main components needed for any future restore: the data files, and the&lt;br&gt;
write-ahead log (WAL) files. If the database is shut down, you're&lt;br&gt;
free to copy over the &lt;code&gt;data&lt;/code&gt; directory to another machine, and start off&lt;br&gt;
from it. On some setups, configuration files aren't placed in the &lt;code&gt;data&lt;/code&gt;&lt;br&gt;
directory, so they might need to be copied separately.&lt;/p&gt;

&lt;p&gt;These strategies rely on the file-system layout of&lt;br&gt;
PostgreSQL, or on features provided by the file system itself.&lt;/p&gt;

&lt;p&gt;Two routes here: frozen snapshots of the file system, or using tools&lt;br&gt;
like rsync, tar etc.&lt;/p&gt;

&lt;h4&gt;Frozen snapshots&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If the underlying file system supports atomic volume snapshots (btrfs,&lt;br&gt;
zfs, Apple's APFS for example), one can snapshot the entire &lt;code&gt;data&lt;/code&gt;&lt;br&gt;
directory. There are lots of caveats around how well the snapshot&lt;br&gt;
mechanism is implemented.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The backup can be taken without stopping the server. During restore,&lt;br&gt;
this strategy requires replaying the logs, as there might be some&lt;br&gt;
commits that weren't turned into data files from the append-only log.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;&lt;code&gt;rsync&lt;/code&gt;, &lt;code&gt;tar&lt;/code&gt;, et al.&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;rsync&lt;/code&gt;, &lt;code&gt;gzip&lt;/code&gt;, &lt;code&gt;tar&lt;/code&gt; the &lt;code&gt;data&lt;/code&gt; directory. These utilities don't&lt;br&gt;
take consistent snapshots of the disk, so it's best to shut down the&lt;br&gt;
server. Shutting down the server forces a full flush of the data to&lt;br&gt;
disk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An example two step process with &lt;code&gt;rsync&lt;/code&gt;:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
rsync -a /data /backup/data              # first pass, server still running
pg_ctl stop                              # shut down the server
rsync -a --checksum /data /backup/data   # second pass picks up the rest
&lt;/pre&gt;

&lt;p&gt;What's interesting is that this two-step process is similar to the one&lt;br&gt;
used in the online backups section. If we stretch it enough, this is a&lt;br&gt;
single-shot delta backup: the first &lt;code&gt;rsync&lt;/code&gt; is a base backup that&lt;br&gt;
contains the data committed till that point, and the second &lt;code&gt;rsync&lt;/code&gt;&lt;br&gt;
takes the delta and copies that over.&lt;/p&gt;

&lt;h3&gt;Continuous Archiving&lt;/h3&gt;

&lt;p&gt;This system can be used to set up a replicated system, consisting of a&lt;br&gt;
leader and potentially multiple followers. The data from the leader is&lt;br&gt;
pushed, and each of the followers might pull the data. Where this data&lt;br&gt;
is stored is customisable. There are many ways to set up replication in&lt;br&gt;
PostgreSQL, and the documentation for it is exhaustive. The archival&lt;br&gt;
part deals with the first half: taking the backup and shipping it&lt;br&gt;
somewhere.&lt;/p&gt;

&lt;p&gt;This strategy piggybacks on the fact that a WAL log may be used to&lt;br&gt;
replay and restore a database. There are many caveats and configuration&lt;br&gt;
tweaks for how long the WAL log files are retained, the size of those log&lt;br&gt;
files, and the naming of the files. It's best to ship the log files as&lt;br&gt;
and when they are created to an external storage service. Rather than do&lt;br&gt;
this manually via &lt;code&gt;rsync&lt;/code&gt; et al., PostgreSQL provides a way: the&lt;br&gt;
&lt;code&gt;archive_command&lt;/code&gt; setting in the configuration, which takes a shell command.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;WAL logs should be secured during transmission and in remote storage,&lt;br&gt;
because these contain the actual data. (That goes for the main&lt;br&gt;
database too, fwiw.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;archive_command&lt;/code&gt; should exit with code &lt;code&gt;0&lt;/code&gt; on success. Otherwise, the&lt;br&gt;
command gets retried. The &lt;code&gt;pg_wal&lt;/code&gt; directory &lt;em&gt;may&lt;/em&gt; then fill up, and&lt;br&gt;
cause the server to crash!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;archive_command&lt;/code&gt; should be designed to ensure it doesn't overwrite&lt;br&gt;
existing files on the remote system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Missing WAL logs in the archive might hamper a future restore, so&lt;br&gt;
regular base backups will help keep the error surface area&lt;br&gt;
small.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An old base backup plus too many WAL logs to replay increases the&lt;br&gt;
restore time. It's important to do the maths here to figure out&lt;br&gt;
how much downtime you might need, and tweak the base backup frequency&lt;br&gt;
and WAL file size accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
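&lt;p&gt;Putting the last few caveats together, a minimal &lt;code&gt;archive_command&lt;/code&gt; sketch&lt;br&gt;
(the archive path is a placeholder). The &lt;code&gt;test ! -f&lt;/code&gt; guard refuses to&lt;br&gt;
overwrite an existing segment, and a non-zero exit makes the server retry:&lt;/p&gt;

```conf
# postgresql.conf (sketch; the archive destination is an assumption)
wal_level = replica
archive_mode = on
# %p expands to the path of the WAL segment, %f to its file name
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'
```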

&lt;p&gt;&lt;em&gt;General mechanism&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One base backup as a starting point&lt;/li&gt;
&lt;li&gt;Continuous deltas in the form of the append-only log (WAL) files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The base backup marks the point where the backup would start&lt;br&gt;
(&lt;a href="https://www.postgresql.org/docs/current/sql-checkpoint.html"&gt;checkpoint&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;base backup&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Two ways to take a base backup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use the &lt;code&gt;pg_basebackup&lt;/code&gt; command from an external machine (or the same&lt;br&gt;
machine, with a different data directory setting), providing the&lt;br&gt;
connection info to connect to the leader.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple commands can be run from multiple machines, but this might
depend on the replication slots configuration on the leader.&lt;/li&gt;
&lt;li&gt;Might use one or two connections depending on the variant of backup
used: copy WAL logs at the end (1) or stream WAL logs in parallel (2).&lt;/li&gt;
&lt;li&gt;Does not run if the target &lt;code&gt;/data&lt;/code&gt; directory is not empty.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Two-step process via &lt;code&gt;rsync&lt;/code&gt;. PostgreSQL provides two SQL functions&lt;br&gt;
for signalling to the server that the user is taking a backup, and that&lt;br&gt;
a checkpoint has to be created: &lt;code&gt;pg_start_backup&lt;/code&gt;, &lt;code&gt;pg_stop_backup&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
SELECT pg_start_backup('some_label');
rsync -a /data /backup/data
SELECT * FROM pg_stop_backup();
&lt;/pre&gt;
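&lt;p&gt;For comparison, the &lt;code&gt;pg_basebackup&lt;/code&gt; route from above is a single command.&lt;br&gt;
The host, user, and directory below are placeholders:&lt;/p&gt;

```shell
# Take a base backup over a replication connection. -X stream opens a
# second connection that receives WAL in parallel with the data copy;
# the target directory (-D) must be empty.
pg_basebackup -h leader.example.com -U replicator \
  -D /var/lib/postgresql/standby-data \
  -X stream -P
```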

&lt;h2&gt;Restore&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;stop-the-world restores&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Data dumps taken with &lt;code&gt;pg_dump&lt;/code&gt;, or the file system strategy mentioned&lt;br&gt;
above, can be restored by &lt;code&gt;pg_restore&lt;/code&gt; or just starting the server.&lt;br&gt;
Needless to say, this strategy entails either data loss or&lt;br&gt;
downtime, depending on the operations chosen.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If a system has a simple &lt;code&gt;pg_dump&lt;/code&gt; cron job that ships the archive to&lt;br&gt;
remote storage, then when the leader crashes or dies, the downtime is the&lt;br&gt;
sum of the time to detection, the time to copy the archive to the&lt;br&gt;
follower, and the &lt;code&gt;pg_restore&lt;/code&gt; completion time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the cron job is configured to run at a certain time of day, and the&lt;br&gt;
crash happens later, the delta in the data written on the leader until&lt;br&gt;
the crash is a potential loss.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the leader crashes/dies, but you still have access to the&lt;br&gt;
physical data disks, recovery using file system snapshot is possible,&lt;br&gt;
and that may potentially recover all the data up till the point of the&lt;br&gt;
last commit. Because this recovery would also have the WAL files&lt;br&gt;
handy, the replay will make sure as much data as possible is&lt;br&gt;
recovered.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Continuous Archive restores&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the system is set up with continuous archiving, it may be possible to&lt;br&gt;
recover all the data. Restore times depend on how fast the base backup&lt;br&gt;
archive and WAL logs can be copied over to the new server, and on the&lt;br&gt;
WAL log replay speed.&lt;/p&gt;
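&lt;p&gt;A rough shape of such a restore, assuming a v12-era server and placeholder&lt;br&gt;
paths (older versions use &lt;code&gt;recovery.conf&lt;/code&gt; instead of &lt;code&gt;recovery.signal&lt;/code&gt;):&lt;/p&gt;

```shell
# 1. Unpack the base backup into a fresh data directory
tar -xzf /mnt/archive/base.tar.gz -C /var/lib/postgresql/data

# 2. Tell the server how to fetch archived WAL segments, e.g. in
#    postgresql.conf:  restore_command = 'cp /mnt/archive/%f "%p"'

# 3. Ask for recovery and start; the server replays WAL until the
#    archive runs out (or a configured recovery target is reached)
touch /var/lib/postgresql/data/recovery.signal
pg_ctl -D /var/lib/postgresql/data start
```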

&lt;h2&gt;Replication&lt;/h2&gt;

&lt;p&gt;There are many ways to do this, too, depending on the underlying infra:&lt;br&gt;
shared disk (two machines accessing the same disk), file system&lt;br&gt;
replication (a write to one drive is mirrored to a different machine&lt;br&gt;
atomically), side-car middlewares that execute a given statement on&lt;br&gt;
multiple machines simultaneously, or even application-level middlewares&lt;br&gt;
that do this. Streaming/point-in-time replication is one preferred&lt;br&gt;
approach that can piggyback on the continuous archive backup strategy.&lt;/p&gt;

&lt;p&gt;The streaming/point-in-time replication strategy uses the WAL logs&lt;br&gt;
shipped to a remote server via &lt;code&gt;archive_command&lt;/code&gt; on the leader to&lt;br&gt;
continuously replay the logs on a follower.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Note that streaming replication is possible without using
&lt;code&gt;archive_command&lt;/code&gt;, provided the data ingestion throughput never
exceeds the rate at which the follower can stream the logs directly from
the leader and apply them locally (this also depends on the network
latency).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the follower is not able to keep up with the logs, the logs on the&lt;br&gt;
leader might get recycled, and the follower will keep waiting for a&lt;br&gt;
now-non-existent WAL file. Force-starting the follower in case of&lt;br&gt;
failure will result in data loss.&lt;/p&gt;
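&lt;p&gt;A follower for this setup might be configured roughly like this (the&lt;br&gt;
connection string and archive path are placeholders). The &lt;code&gt;restore_command&lt;/code&gt;&lt;br&gt;
fallback is what saves the follower when the leader has already recycled a&lt;br&gt;
segment:&lt;/p&gt;

```conf
# On the follower (v12+ sketch; standby mode is enabled by creating a
# standby.signal file in the data directory)
primary_conninfo = 'host=leader.example.com user=replicator'
# Fall back to the WAL archive for segments the leader no longer has
restore_command = 'cp /mnt/archive/%f "%p"'
```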

</description>
      <category>postgres</category>
    </item>
    <item>
      <title>Promise.race for timeouts, fetch and avoiding memory leaks</title>
      <dc:creator>Kashyap</dc:creator>
      <pubDate>Fri, 20 Sep 2019 07:15:39 +0000</pubDate>
      <link>https://dev.to/kgrz/promise-race-for-timeouts-fetch-and-avoiding-memory-leaks-ckc</link>
      <guid>https://dev.to/kgrz/promise-race-for-timeouts-fetch-and-avoiding-memory-leaks-ckc</guid>
      <description>&lt;p&gt;Using &lt;code&gt;Promise.race&lt;/code&gt; is a popular option for getting a cancellation/timeout behaviour for &lt;code&gt;fetch&lt;/code&gt; calls, when the new &lt;code&gt;AbortController&lt;/code&gt; is not available for use. However, there are ways to shoot yourself in the foot with this technique. I wrote a post which goes through:&lt;/p&gt;

&lt;p&gt;1. How to use &lt;code&gt;Promise.race&lt;/code&gt; to achieve timeouts for &lt;code&gt;fetch&lt;/code&gt; calls&lt;br&gt;
2. 👾 A buggy implementation&lt;br&gt;
3. ✅ A safer implementation, using newer features in JavaScript, that avoids the problems of the one in 2.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kgrz.io/avoiding-memory-leaks-timing-out-fetch.html"&gt;https://kgrz.io/avoiding-memory-leaks-timing-out-fetch.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
    </item>
    <item>
      <title>Listing out the history of a particular file in Git</title>
      <dc:creator>Kashyap</dc:creator>
      <pubDate>Sun, 21 Jul 2019 05:35:33 +0000</pubDate>
      <link>https://dev.to/kgrz/listing-out-the-history-of-a-particular-file-in-git-4ka9</link>
      <guid>https://dev.to/kgrz/listing-out-the-history-of-a-particular-file-in-git-4ka9</guid>
      <description>&lt;p&gt;There are multiple ways to follow a file's history in&lt;br&gt;
Git. You can use a double-dash and pass file names that you want to list&lt;br&gt;
out the log for. But there's a limitation to what it can do. A better&lt;br&gt;
option would be to use the --follow flag for git log command. I wrote about this in a little more detail here:&lt;/p&gt;
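&lt;p&gt;A throwaway demo of the difference (the file names are made up):&lt;/p&gt;

```shell
# Create a scratch repo in which a file gets renamed
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo
echo 'v1' > notes.txt && git add notes.txt && git commit -qm 'add notes'
git mv notes.txt NOTES.md && git commit -qm 'rename notes'

# The double-dash form stops at the rename that created the file
git log --oneline -- NOTES.md

# --follow tracks the file across the rename and shows both commits
git log --follow --oneline -- NOTES.md
```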

&lt;p&gt;&lt;a href="https://kgrz.io/use-git-log-follow-for-file-history.html"&gt;https://kgrz.io/use-git-log-follow-for-file-history.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>git</category>
    </item>
  </channel>
</rss>
