<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Blitz</title>
    <description>The latest articles on DEV Community by Blitz (@pblitz).</description>
    <link>https://dev.to/pblitz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F487125%2Fe9531e59-a33c-4fde-860a-0846df9fbaac.jpeg</url>
      <title>DEV Community: Blitz</title>
      <link>https://dev.to/pblitz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pblitz"/>
    <language>en</language>
    <item>
      <title>Timestream Data Design &amp; Ingestion</title>
      <dc:creator>Blitz</dc:creator>
      <pubDate>Fri, 06 Nov 2020 21:39:20 +0000</pubDate>
      <link>https://dev.to/pblitz/timestream-data-design-ingestion-25c5</link>
      <guid>https://dev.to/pblitz/timestream-data-design-ingestion-25c5</guid>
      <description>&lt;p&gt;After the &lt;a href="https://dev.to/pblitz/aws-timestream-an-intro-4i1j"&gt;first introduction&lt;/a&gt; to AWS Timestream, let's get into it. &lt;/p&gt;

&lt;p&gt;Let's first spin up some data. As an example, let's assume that you have an IoT device that sends you the following event every so often:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"event_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created_time"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1604355422089&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"device_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test_device"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"usecase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"retail"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"device_battery"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;60.34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;So how do you get this into &lt;code&gt;AWS Timestream&lt;/code&gt;? &lt;/p&gt;

&lt;p&gt;As with every database, you start with your schema.&lt;br&gt;
AWS Timestream splits its data into &lt;code&gt;measures&lt;/code&gt; and &lt;code&gt;dimensions&lt;/code&gt;. How should we break our sample event into these categories?&lt;br&gt;
As the &lt;a href="https://docs.aws.amazon.com/timestream/latest/developerguide/concepts.html"&gt;docs&lt;/a&gt; say, fields that "describe" the data should be the &lt;code&gt;dimensions&lt;/code&gt;. In the event above, I would treat &lt;code&gt;event_id&lt;/code&gt;, &lt;code&gt;device_id&lt;/code&gt;, and &lt;code&gt;usecase&lt;/code&gt; as the &lt;em&gt;dimensions&lt;/em&gt;. Conversely, &lt;code&gt;device_battery&lt;/code&gt;, &lt;code&gt;temperature&lt;/code&gt;, and &lt;code&gt;weight&lt;/code&gt; would be the &lt;em&gt;measures&lt;/em&gt;. And &lt;code&gt;created_time&lt;/code&gt; is the &lt;em&gt;time&lt;/em&gt; dimension, of course.&lt;/p&gt;

&lt;p&gt;The weird thing about AWS Timestream is that you add this event as three rows, one for each of its three measurements: &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;event_id&lt;/th&gt;
&lt;th&gt;device_id&lt;/th&gt;
&lt;th&gt;usecase&lt;/th&gt;
&lt;th&gt;measure_value::BIGINT&lt;/th&gt;
&lt;th&gt;measure_value::DOUBLE&lt;/th&gt;
&lt;th&gt;measure_name&lt;/th&gt;
&lt;th&gt;time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;123456&lt;/td&gt;
&lt;td&gt;test_device&lt;/td&gt;
&lt;td&gt;retail&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;device_battery&lt;/td&gt;
&lt;td&gt;1604355422089&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;123456&lt;/td&gt;
&lt;td&gt;test_device&lt;/td&gt;
&lt;td&gt;retail&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;60.34&lt;/td&gt;
&lt;td&gt;temperature&lt;/td&gt;
&lt;td&gt;1604355422089&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;123456&lt;/td&gt;
&lt;td&gt;test_device&lt;/td&gt;
&lt;td&gt;retail&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;weight&lt;/td&gt;
&lt;td&gt;1604355422089&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
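
&lt;p&gt;As a minimal sketch of that split in Python, assuming the dimension/measure assignment chosen above: the helper name &lt;code&gt;event_to_records&lt;/code&gt; is made up for this example, and the record keys mirror the shape that boto3's &lt;code&gt;write_records&lt;/code&gt; call expects, but nothing is sent to AWS here.&lt;/p&gt;

```python
# Assumption: which fields are dimensions and which are measures follows
# the split chosen in the text above.
DIMENSION_KEYS = ("event_id", "device_id", "usecase")
MEASURE_KEYS = ("device_battery", "temperature", "weight")


def event_to_records(event):
    """Hypothetical helper: return one record per measure in the event."""
    # Dimension values are sent as strings.
    dimensions = [
        {"Name": key, "Value": str(event[key])} for key in DIMENSION_KEYS
    ]
    records = []
    for key in MEASURE_KEYS:
        value = event[key]
        # Whole numbers become BIGINT, everything else DOUBLE.
        value_type = "BIGINT" if isinstance(value, int) else "DOUBLE"
        records.append({
            "Dimensions": dimensions,
            "MeasureName": key,
            "MeasureValue": str(value),
            "MeasureValueType": value_type,
            "Time": str(event["created_time"]),
            "TimeUnit": "MILLISECONDS",
        })
    return records


event = {
    "event_id": 123456,
    "created_time": 1604355422089,
    "device_id": "test_device",
    "usecase": "retail",
    "device_battery": 80,
    "temperature": 60.34,
    "weight": 20,
}

records = event_to_records(event)
for record in records:
    print(record["MeasureName"], record["MeasureValueType"], record["MeasureValue"])
```

&lt;p&gt;The resulting list could be passed as &lt;code&gt;Records&lt;/code&gt; to &lt;code&gt;write_records&lt;/code&gt; on a boto3 &lt;code&gt;timestream-write&lt;/code&gt; client; the database and table names would be your own.&lt;/p&gt;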

</description>
    </item>
    <item>
      <title>AWS Timestream - an Intro</title>
      <dc:creator>Blitz</dc:creator>
      <pubDate>Sat, 10 Oct 2020 23:25:21 +0000</pubDate>
      <link>https://dev.to/pblitz/aws-timestream-an-intro-4i1j</link>
      <guid>https://dev.to/pblitz/aws-timestream-an-intro-4i1j</guid>
      <description>&lt;p&gt;Working on serverless IoT platforms with event sourcing (yeah, buzzwords...) you quickly have to solve the issue of data storage. You probably wanna back up your events somewhere, you probably want a bus, but you most definitively want to have a database for them as well, especially if you're going to do any kind of analytics. &lt;/p&gt;

&lt;p&gt;AWS (finally) released a new database to do just that this month: &lt;a href="https://aws.amazon.com/timestream/"&gt;Timestream&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;This article is an intro to Timestream. In the next articles, I'll write more about the details of the system.&lt;/p&gt;

&lt;p&gt;Timestream is a time-series database, similar to InfluxDB and Graphite. And, let's face it, Elasticsearch (it's not really a DB, but... )&lt;br&gt;
As such, you can add events/rows easily, but you normally don't edit your data. &lt;/p&gt;

&lt;h2&gt;
  
  
  Core concepts
&lt;/h2&gt;

&lt;p&gt;In Timestream, data is stored in a table that is part of a database. Standard so far.&lt;br&gt;
The data you're storing has three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dimensions&lt;/strong&gt;: the &lt;em&gt;metadata&lt;/em&gt; of your event. Which device triggered it, that kind of stuff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure&lt;/strong&gt;: the actual datapoint you measure. It has a name (&lt;code&gt;measure_name&lt;/code&gt;), a value and a pre-defined type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt;: The &lt;code&gt;time&lt;/code&gt; your event occurred at. The key sorting point. Every event has a timestamp. You can have more timestamps as &lt;strong&gt;Dimensions&lt;/strong&gt; of course, but &lt;code&gt;time&lt;/code&gt; is similar to a key for your record.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quirks
&lt;/h2&gt;

&lt;p&gt;In typical AWS fashion, Timestream has a couple of quirks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Retention Management
&lt;/h3&gt;

&lt;p&gt;For larger systems, this is pretty awesome: when creating a table, you set a memory and a magnetic storage window. If the &lt;code&gt;time&lt;/code&gt; of an event is older than your memory storage window, it's automatically offloaded to magnetic storage (still accessible, albeit slower). Once it hits the end of the additional magnetic storage window, the data is deleted. Timestream only accepts new events that fall within the memory storage window. Common values would be 6-12 months for the memory storage and 2-3 years for the magnetic storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: at least right now, it seems that Timestream has a bug where you have to configure the memory storage to be slightly more than 12 months (as in, 6 hours more) to accept any event older than 30 days. &lt;/p&gt;
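
&lt;p&gt;To make the memory window rule concrete, here is a rough Python sketch. The window sizes are example values only, and &lt;code&gt;is_within_memory_window&lt;/code&gt; is a hypothetical helper that just encodes the rule as described in this section; the dictionary mirrors the &lt;code&gt;RetentionProperties&lt;/code&gt; argument of boto3's &lt;code&gt;create_table&lt;/code&gt; call, but no AWS call is made.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

# Assumed example windows, for illustration only.
MEMORY_RETENTION_HOURS = 24 * 180   # about 6 months
MAGNETIC_RETENTION_DAYS = 365 * 2   # about 2 years

# Shape of the RetentionProperties argument of boto3's create_table call.
retention_properties = {
    "MemoryStoreRetentionPeriodInHours": MEMORY_RETENTION_HOURS,
    "MagneticStoreRetentionPeriodInDays": MAGNETIC_RETENTION_DAYS,
}


def is_within_memory_window(event_time, now):
    """Hypothetical check: is the event recent enough to be accepted?"""
    age = now - event_time
    return timedelta(hours=MEMORY_RETENTION_HOURS) >= age


now = datetime(2020, 11, 6, tzinfo=timezone.utc)
recent = now - timedelta(days=30)    # well inside the memory window
stale = now - timedelta(days=400)    # older than the memory window

print(is_within_memory_window(recent, now))
print(is_within_memory_window(stale, now))
```

&lt;p&gt;A check like this before writing can help route events that are too old to a separate backfill path instead of having the write rejected.&lt;/p&gt;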

&lt;h3&gt;
  
  
  Measure Cardinality
&lt;/h3&gt;

&lt;p&gt;To be honest, this is still giving me headaches: Timestream only allows A SINGLE MEASURE per record. You can have a high number of "describing" dimensions, but only a single measure. If you have an IoT device that produces an event that has multiple measurements (for example, the fuel consumption of a motor at a certain RPM) this will result in multiple records. &lt;br&gt;
I have so far not had any negative consequences from that (turns out that you actually very seldom need both values at the same time), but it's a weird way to design your data if you come from either document stores or classical SQL databases. &lt;/p&gt;
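
&lt;p&gt;For the rare case where you do need both values together, the single-measure records can be merged back into one row per event on the client side. A small Python sketch, assuming made-up field names and values for the motor example (&lt;code&gt;rpm&lt;/code&gt;, &lt;code&gt;fuel_consumption&lt;/code&gt;, &lt;code&gt;motor_1&lt;/code&gt;) and a hypothetical &lt;code&gt;pivot&lt;/code&gt; helper:&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical records for one motor event, one record per measure.
records = [
    {"device_id": "motor_1", "time": 1604355422089,
     "measure_name": "rpm", "value": 3000},
    {"device_id": "motor_1", "time": 1604355422089,
     "measure_name": "fuel_consumption", "value": 7.5},
]


def pivot(records):
    """Merge records that share device_id and time into one row each."""
    rows = defaultdict(dict)
    for record in records:
        # Dimension plus timestamp identify the original event.
        key = (record["device_id"], record["time"])
        rows[key][record["measure_name"]] = record["value"]
    return dict(rows)


print(pivot(records))
```

&lt;p&gt;The grouping key here is just the device and the timestamp; with more dimensions you would add them to the key as well.&lt;/p&gt;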

&lt;h2&gt;
  
  
  Why even bother then?
&lt;/h2&gt;

&lt;p&gt;I can't really compare it to the top dog, InfluxDB, since I haven't used it in great detail. &lt;br&gt;
BUT Timestream has a great benefit: it doesn't need a container, EC2 instance or anything similar. It's a fully managed AWS service, where you only pay for usage. &lt;br&gt;
If you're working on &lt;code&gt;serverless&lt;/code&gt; environments, it's worth a look. And that's what I'm doing at the moment :) &lt;/p&gt;

&lt;p&gt;In the next article, I'll look into setting it up and getting data into it with &lt;code&gt;python&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;(cover image from &lt;a href="https://www.flickr.com/photos/58314390@N08/15937475583"&gt;https://www.flickr.com/photos/58314390@N08/15937475583&lt;/a&gt;)&lt;/span&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>database</category>
    </item>
  </channel>
</rss>
