<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: javi santana</title>
    <description>The latest articles on DEV Community by javi santana (@javisantana).</description>
    <link>https://dev.to/javisantana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F894616%2Fee54abb7-30d3-423b-9f9e-5bd42b554bb3.jpg</url>
      <title>DEV Community: javi santana</title>
      <link>https://dev.to/javisantana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/javisantana"/>
    <language>en</language>
    <item>
      <title>[11/40 things about data] Learn SQL please</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Fri, 02 Sep 2022 12:18:21 +0000</pubDate>
      <link>https://dev.to/javisantana/1140-things-about-data-learn-sql-please-48lc</link>
      <guid>https://dev.to/javisantana/1140-things-about-data-learn-sql-please-48lc</guid>
      <description>&lt;p&gt;You may not like SQL based databases but the probability of dealing with a SQL based system in your career is so high that learning it as soon as possible will compound.&lt;/p&gt;

&lt;p&gt;I didn’t like SQL, I still don’t like it even though I work with it every single day but I have to recognize it’s a really handy tool.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>data</category>
    </item>
    <item>
      <title>[10/40 things about data] Try to use the simplest possible data structure</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Mon, 08 Aug 2022 16:57:25 +0000</pubDate>
      <link>https://dev.to/javisantana/1040-things-about-data-try-to-use-the-simplest-possible-data-structure-k6j</link>
      <guid>https://dev.to/javisantana/1040-things-about-data-try-to-use-the-simplest-possible-data-structure-k6j</guid>
      <description>&lt;p&gt;A few years ago, one of the websites I was working on went on the front page of Google (yes, that small blue link). The traffic it gets is pretty high.&lt;/p&gt;

&lt;p&gt;I had to develop search functionality. The first thing you’d think is to use the database you are currently using or maybe use a special one, like elastic.&lt;/p&gt;

&lt;p&gt;But in this case, I needed to use the database as little as possible to be able to cope with the load.&lt;/p&gt;

&lt;p&gt;So I went the simplest way: an index made of an in-memory array storing all the words, and a linear search over it. Yes, a simple for loop with the search logic.&lt;/p&gt;
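
&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of a flat in-memory index searched with a plain loop. It is not the code behind that website: the {url: text} input, the prefix matching and the build_index and search names are all assumptions for illustration.&lt;/p&gt;

```python
# Sketch of a flat in-memory search index (hypothetical names and input).


def build_index(pages):
    """Flatten a {url: text} dict into one flat list of (word, url) pairs."""
    index = []
    for url, text in pages.items():
        for word in sorted(set(text.lower().split())):
            index.append((word, url))
    return index


def search(index, query):
    """Linear search: a plain for loop over every (word, url) pair."""
    query = query.lower()
    results = []
    for word, url in index:
        # Prefix match; any other matching rule is a one-line change here.
        if word.startswith(query) and url not in results:
            results.append(url)
    return results


pages = {
    "/postgres": "When in doubt use Postgres",
    "/sql": "Learn SQL please",
    "/formats": "There is no best data format",
}
index = build_index(pages)
print(search(index, "post"))  # ['/postgres']
print(search(index, "p"))     # ['/postgres', '/sql']
```

&lt;p&gt;No tree, no hashing, no external service: changing how matching works means editing a single loop.&lt;/p&gt;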

&lt;p&gt;Was it the best index structure? Not if you only think about performance. But it worked, and it was simple, easy to maintain and easy to change.&lt;/p&gt;

&lt;p&gt;There is always time to make it more advanced later. Over time, you end up loving simple, flat arrays.&lt;/p&gt;

</description>
      <category>database</category>
      <category>data</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>[9/40 things about data] It’s better to master just one database than be bad at two</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Wed, 03 Aug 2022 08:25:07 +0000</pubDate>
      <link>https://dev.to/javisantana/940-things-about-dataits-better-to-master-just-one-database-than-be-bad-at-two-538l</link>
      <guid>https://dev.to/javisantana/940-things-about-dataits-better-to-master-just-one-database-than-be-bad-at-two-538l</guid>
      <description>&lt;p&gt;It’s tempting to start using another database when you run into a performance problem or the lack of a feature.&lt;/p&gt;

&lt;p&gt;There are always ways to make it perform better or solve the problem with a workaround.&lt;/p&gt;

&lt;p&gt;You’d be surprised how good your database can perform when you understand the internals. It’s not that bad to do that thing in two steps instead of one.&lt;/p&gt;

&lt;p&gt;If you go after the shiny new thing just because you find a small roadblock, you’ll never understand the actual limits of your database and you may never know when there is a real reason to change.&lt;/p&gt;

</description>
      <category>database</category>
      <category>data</category>
    </item>
    <item>
      <title>[8/40 things about data] Analytics is a product, not a department</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Mon, 01 Aug 2022 18:12:54 +0000</pubDate>
      <link>https://dev.to/javisantana/840-things-about-data-analytics-its-a-product-not-a-department-32a4</link>
      <guid>https://dev.to/javisantana/840-things-about-data-analytics-its-a-product-not-a-department-32a4</guid>
      <description>&lt;p&gt;When you have people asking for metrics and people extracting them from data. For the same metric, you’ll have as many definitions for a metric as people you have in the company.&lt;/p&gt;

&lt;p&gt;Reporting is something that requires the same thing a digital product needs: owners, maintenance, clear definitions, improvements and you know, gives people what they want in a way that is useful for everybody in the company.&lt;/p&gt;

&lt;p&gt;Many companies don’t consider analytics as a first-class citizen and end up spending more to have less quality.&lt;/p&gt;

</description>
      <category>data</category>
      <category>analytics</category>
    </item>
    <item>
      <title>[7/40 things about data] When I try to understand data I always end up using a histogram</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Fri, 29 Jul 2022 14:33:00 +0000</pubDate>
      <link>https://dev.to/javisantana/640-things-about-data-when-i-try-to-understand-data-i-always-end-up-using-a-histogram-4nd9</link>
      <guid>https://dev.to/javisantana/640-things-about-data-when-i-try-to-understand-data-i-always-end-up-using-a-histogram-4nd9</guid>
      <description>&lt;p&gt;When visualizing data you have to pick the right visualization type but before that, you need to understand the data.&lt;/p&gt;

&lt;p&gt;I start using an avg, then avg plus stddev, then min-max and finally I go with a histogram.&lt;/p&gt;

&lt;p&gt;It captures min, max, avg and most importantly, the data distribution.&lt;/p&gt;
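
&lt;p&gt;A small Python sketch of that progression, using made-up latency values (the describe and print_histogram helpers are hypothetical): the average lands between two clusters, in a range where there is no data at all, and only the histogram makes both groups visible.&lt;/p&gt;

```python
import statistics
from collections import Counter


def describe(values, bucket_width):
    """Return basic stats and a fixed-width histogram {bucket_start: count}."""
    stats = {
        "avg": statistics.mean(values),
        "stddev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }
    # Each value goes to the bucket that starts at floor(value / width) * width.
    counts = Counter((v // bucket_width) * bucket_width for v in values)
    return stats, dict(sorted(counts.items()))


def print_histogram(histogram, bucket_width):
    """One text bar per non-empty bucket."""
    for start, count in histogram.items():
        print(f"{start:4}-{start + bucket_width - 1:4} | " + "#" * count)


# Made-up response times in ms: two clusters that the average hides.
latencies = [10, 12, 11, 13, 9, 10, 11, 95, 98, 102, 97]
stats, histogram = describe(latencies, bucket_width=10)
print(stats)
print_histogram(histogram, bucket_width=10)
```

&lt;p&gt;The average (about 42.5 ms) falls into a bucket that contains no values, and min and max only tell you the range.&lt;/p&gt;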

</description>
      <category>data</category>
      <category>datavisualization</category>
    </item>
    <item>
      <title>[6/40 things about data] Behind every null value there is a story</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Wed, 27 Jul 2022 06:49:16 +0000</pubDate>
      <link>https://dev.to/javisantana/640-things-about-data-behind-every-null-value-there-is-a-story-30d8</link>
      <guid>https://dev.to/javisantana/640-things-about-data-behind-every-null-value-there-is-a-story-30d8</guid>
      <description>&lt;p&gt;When you join a company, just ask about it, you’ll learn a lot&lt;/p&gt;

</description>
      <category>data</category>
      <category>database</category>
      <category>sql</category>
    </item>
    <item>
      <title>[5/40 things about data] When in doubt, use Postgres as your database.</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Tue, 26 Jul 2022 15:27:00 +0000</pubDate>
      <link>https://dev.to/javisantana/540-things-about-data-when-in-doubt-use-postgres-as-your-database-4j85</link>
      <guid>https://dev.to/javisantana/540-things-about-data-when-in-doubt-use-postgres-as-your-database-4j85</guid>
      <description>&lt;p&gt;It’s quite typical when you start a project to decide what DBMS to use. Elastic, Mongo, some key/value store like Redis, funny things like Neo4J. If you have a use case that clearly fits with a database, fine, otherwise, use Postgres or anything relational. Of course, there will be someone that says “but it does not scale”. Anyone who has worked with a system at scale knows there is no storage system that scales well (except if it’s as simple as hell and is eventually consistent, but not even then).&lt;/p&gt;

&lt;p&gt;I love Postgres because of many things: solid, battle-tested, supports transactions (I will write about them), feature-complete, fast, it’s not owned by a VC backed company, guided by the community, calm and steady progress, great tooling, cloud services providing infra, companies with expertise…&lt;/p&gt;

&lt;p&gt;When you pick something funny, you end up developing half of the features a solid RDBMS system provides but just worse.&lt;/p&gt;

&lt;p&gt;I decided to use Redis as the storage for Tinybird and it’s working great but as the project evolves you miss many of the built-in features Postgres provides. Probably a mistake.&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
    </item>
    <item>
      <title>[4/40 things about data] The second most important rule of working with data: the fastest data is the one you don’t read</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Mon, 25 Jul 2022 08:44:00 +0000</pubDate>
      <link>https://dev.to/javisantana/440-things-about-data-the-second-most-important-rule-of-working-with-data-the-fastest-data-is-the-one-you-dont-read-2l07</link>
      <guid>https://dev.to/javisantana/440-things-about-data-the-second-most-important-rule-of-working-with-data-the-fastest-data-is-the-one-you-dont-read-2l07</guid>
      <description>&lt;p&gt;As simple as it sounds, most people forget about using one of the most important database features: indexes. Well, you also need to think about what’s the data you actually need, a lot of apps are full of &lt;code&gt;select * from table&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The problem is, as your system grows, so does the amount and complexity of queries. Knowing what data you need becomes harder. To avoid that you need… yes, data about how you query your data.&lt;/p&gt;
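
&lt;p&gt;A minimal sketch with Python’s built-in sqlite3 module, assuming a hypothetical events table: EXPLAIN QUERY PLAN shows a full scan turning into an index search once the index exists, and asking only for indexed columns lets SQLite answer from the index alone (a covering index) without reading the table rows. The exact plan text varies between SQLite versions.&lt;/p&gt;

```python
import sqlite3

# Hypothetical table: 10,000 events spread across 100 users.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, kind, payload) VALUES (?, ?, ?)",
    [(i % 100, "click", "x" * 200) for i in range(10_000)],
)


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


wide = "SELECT * FROM events WHERE user_id = 42"
narrow = "SELECT user_id FROM events WHERE user_id = 42"

before = plan(wide)  # a full SCAN of events: every row is read
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(wide)  # SEARCH ... USING INDEX idx_events_user
covering = plan(narrow)  # SEARCH ... USING COVERING INDEX idx_events_user

for label, detail in (("before", before), ("after", after), ("covering", covering)):
    print(label, detail)
```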

</description>
      <category>data</category>
      <category>database</category>
    </item>
    <item>
      <title>[3/40 things about data] Good data models make good products</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Fri, 22 Jul 2022 08:10:13 +0000</pubDate>
      <link>https://dev.to/javisantana/340-things-about-data-good-data-models-make-good-products-3pl</link>
      <guid>https://dev.to/javisantana/340-things-about-data-good-data-models-make-good-products-3pl</guid>
      <description>&lt;p&gt;When the data model is not well designed, everything that goes after feels wrong. You feel like you are doing hacks and tweaks all the time.&lt;/p&gt;

&lt;p&gt;When the data model is the right one everything flows, it’s easy to explain, when you make a change it just fits like a good Tetris play. Only time will tell if the data model was the right one. If after some years you still use the same data model (maybe not the same database or same code) you did it right. It’s not that different to cars, buildings, companies…&lt;/p&gt;

&lt;p&gt;Designing a good data model takes time, prototypes and a good understanding of the reality you are modelling (see point 1 for more info).&lt;/p&gt;

</description>
      <category>data</category>
    </item>
    <item>
      <title>[2/40 things about data] There is no “the best data format”</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Thu, 21 Jul 2022 07:42:56 +0000</pubDate>
      <link>https://dev.to/javisantana/140-things-about-data-there-is-no-the-best-data-format-4nb6</link>
      <guid>https://dev.to/javisantana/140-things-about-data-there-is-no-the-best-data-format-4nb6</guid>
      <description>&lt;p&gt;We format the data to move it around. It could be hundreds of kilometers or a few nanometers but we always need to encode information somehow. I’ve never found the “El Dorado” of data formats.&lt;/p&gt;

&lt;p&gt;Text formats are easy to read by a human but harder and slower to parse.&lt;/p&gt;

&lt;p&gt;Binary formats are fast to parse but hard to debug.&lt;/p&gt;

&lt;p&gt;XML is a good container but it’s too verbose.&lt;/p&gt;

&lt;p&gt;JSON is easy but lacks basic data types: it has no date type, and it doesn’t distinguish integers from floating-point numbers.&lt;/p&gt;

&lt;p&gt;Serialization formats are not good for keeping data in memory, while formats designed for in-memory operations are often not binary-compatible with other languages.&lt;/p&gt;
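
&lt;p&gt;A rough Python illustration of the text vs binary trade-off, using hypothetical (id, temperature) records: JSON is readable but larger and has no date type, while a fixed struct layout is compact and cheap to decode but meaningless unless you know the layout.&lt;/p&gt;

```python
import json
import struct
from datetime import datetime, timezone

# Hypothetical records: (sensor id, temperature) pairs.
records = [(i, 20.0 + i / 10) for i in range(1000)]

# Text: self-describing and human-readable, but larger and slower to parse.
as_json = json.dumps([{"id": i, "temp": t} for i, t in records]).encode()

# Binary: "!qd" = big-endian int64 + float64, exactly 16 bytes per record.
# Compact and cheap to decode, but opaque without knowing that layout.
RECORD = struct.Struct("!qd")
as_binary = b"".join(RECORD.pack(i, t) for i, t in records)

print("json bytes:  ", len(as_json))
print("binary bytes:", len(as_binary))  # 16 * 1000 = 16000
assert list(RECORD.iter_unpack(as_binary)) == records  # lossless round trip

# JSON has no date type: a datetime must be turned into a string or number first.
try:
    json.dumps({"ts": datetime(2022, 7, 21, tzinfo=timezone.utc)})
except TypeError as err:
    print("JSON:", err)
```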

&lt;p&gt;The most important thing I learned is: you need to find the right balance between speed, flexibility, compatibility and the human-computer interface.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>[1/40 things about data] It’s hard to capture reality with data</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Wed, 20 Jul 2022 10:08:55 +0000</pubDate>
      <link>https://dev.to/javisantana/140-things-about-data-its-hard-to-capture-reality-with-data-44k2</link>
      <guid>https://dev.to/javisantana/140-things-about-data-its-hard-to-capture-reality-with-data-44k2</guid>
      <description>&lt;p&gt;Trying to recreate an accurate version of reality, no matter what that is or how simple it looks, is hard.&lt;/p&gt;

&lt;p&gt;Another way of seeing it: modelling reality always gets complex. There are always small nuances, special conditions, things that changed, edge cases and, of course, errors (which sometimes become features).&lt;/p&gt;

&lt;p&gt;The only models I’ve found easy to understand and work with are the ones that represent computer things.&lt;/p&gt;

</description>
      <category>data</category>
      <category>beginners</category>
    </item>
    <item>
      <title>4 keys to analyzing data fast</title>
      <dc:creator>javi santana</dc:creator>
      <pubDate>Tue, 19 Jul 2022 16:44:52 +0000</pubDate>
      <link>https://dev.to/javisantana/4-keys-of-analyzing-data-fast-1ond</link>
      <guid>https://dev.to/javisantana/4-keys-of-analyzing-data-fast-1ond</guid>
      <description>&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Don't store the data you don't need. It sounds silly, but a lot of the data you end up reading is not useful.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Don't read the data you don't need. Discard it using indices or any other tool your database or framework provides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run heavy operations later. For example, filtering data is cheaper than aggregating it, so when processing data, always filter first and leave the heavy operations (joins, aggregations and so on) for later.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sort your data before storing it. Sorted data compresses much better, and you get to use all the power of current hardware (sequential reads can be around 100x faster than random access).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Following these 4 rules, I process large datasets 100-1000x faster than I used to.&lt;/p&gt;
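
&lt;p&gt;A minimal Python sketch of rules 3 and 4 on synthetic data (the rows, country codes and helper names are made up): filter before aggregating so the grouping step only sees the surviving rows, and compare how well zlib compresses the same column unsorted and sorted.&lt;/p&gt;

```python
import random
import zlib
from collections import defaultdict

random.seed(42)
# Synthetic event log: (country, amount) rows.
COUNTRIES = ["ES", "FR", "DE", "IT", "PT"]
rows = [(random.choice(COUNTRIES), random.randint(1, 100)) for _ in range(100_000)]


def revenue_by_country(rows, wanted):
    """Rule 3: a cheap per-row filter first, the grouping work afterwards."""
    filtered = [row for row in rows if row[0] in wanted]
    totals = defaultdict(int)
    for country, amount in filtered:  # only the surviving rows are aggregated
        totals[country] += amount
    return dict(totals), len(filtered)


def compressed_size(values):
    """Rule 4: bytes needed to store a column of ints, zlib-compressed."""
    return len(zlib.compress(",".join(map(str, values)).encode()))


totals, aggregated = revenue_by_country(rows, {"ES", "PT"})
print(totals, f"(aggregated {aggregated} of {len(rows)} rows)")

amounts = [amount for _, amount in rows]
print("unsorted column:", compressed_size(amounts), "bytes")
print("sorted column:  ", compressed_size(sorted(amounts)), "bytes")
```

&lt;p&gt;Sorting puts equal values next to each other, so the compressor sees long repeated runs; on random-looking data like this, the sorted column should come out many times smaller.&lt;/p&gt;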

&lt;p&gt;(image from craiyon.com generated with "f1 going fast")&lt;/p&gt;

</description>
      <category>database</category>
      <category>data</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
