<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ernesto Campohermoso</title>
    <description>The latest articles on DEV Community by Ernesto Campohermoso (@ernestomar).</description>
    <link>https://dev.to/ernestomar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2012359%2Feb594fd7-3ead-4a6d-93e6-dd16aa17b9d5.jpeg</url>
      <title>DEV Community: Ernesto Campohermoso</title>
      <link>https://dev.to/ernestomar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ernestomar"/>
    <language>en</language>
    <item>
      <title>Do you think schema flexibility justifies using NoSQL? Think twice.</title>
      <dc:creator>Ernesto Campohermoso</dc:creator>
      <pubDate>Fri, 27 Dec 2024 14:05:55 +0000</pubDate>
      <link>https://dev.to/ernestomar/do-you-think-schema-flexibility-justifies-using-nosql-think-twice-131p</link>
      <guid>https://dev.to/ernestomar/do-you-think-schema-flexibility-justifies-using-nosql-think-twice-131p</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the world of software development, there is a common belief that schema flexibility alone justifies adopting a NoSQL database. This perspective can be misleading if we overlook fundamental aspects such as the CAP theorem and the differences between &lt;em&gt;Schema on Write&lt;/em&gt; and &lt;em&gt;Schema on Read&lt;/em&gt;. As Martin Kleppmann explains in his book &lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt;, the choice of a database should rest on a clear understanding of the requirements for consistency, availability, and partition tolerance, as well as of how the data’s schema will be defined and enforced.&lt;/p&gt;

&lt;p&gt;In this article, you will learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the CAP theorem entails and how it influences your application's architecture.&lt;/li&gt;
&lt;li&gt;Why eventual consistency is not always desirable, especially when handling critical information like financial transactions.&lt;/li&gt;
&lt;li&gt;The role partition tolerance plays and the trade-offs in consistency within large-scale distributed systems.&lt;/li&gt;
&lt;li&gt;The real difference between document-oriented and relational databases, and how &lt;em&gt;Schema on Read&lt;/em&gt; and &lt;em&gt;Schema on Write&lt;/em&gt; are not as distinct as they seem, even at the code level.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. CAP Theorem
&lt;/h2&gt;

&lt;p&gt;The CAP theorem states that a distributed system cannot guarantee Consistency, Availability, and Partition Tolerance all at once. Because network partitions cannot be ruled out in practice, the real decision is which of the other two to give up while a partition lasts. The three properties are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt;: All nodes see the same data at the same time.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability&lt;/strong&gt;: The system always responds, even if some nodes fail.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition Tolerance&lt;/strong&gt;: The system continues to function despite communication failures between nodes.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When designing a large-scale application, you must decide which of these elements are critical and which can be sacrificed. This balance guides the choice between relational databases (emphasizing strong consistency) and many NoSQL databases (leaning toward availability and partition tolerance, but with eventual consistency).&lt;/p&gt;
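
&lt;p&gt;The trade-off can be made concrete with a toy model. The following is a minimal sketch, not a real database: &lt;code&gt;TinyStore&lt;/code&gt;, &lt;code&gt;Replica&lt;/code&gt;, and the &lt;code&gt;'CP'&lt;/code&gt; / &lt;code&gt;'AP'&lt;/code&gt; mode flag are hypothetical names introduced only for illustration. It shows how a store might either reject a write during a partition (keeping replicas identical) or accept it locally (letting replicas diverge).&lt;/p&gt;

```javascript
// Hypothetical two-replica store used only to illustrate the CAP trade-off.
class Replica {
  constructor(name) {
    this.name = name;
    this.value = null;
  }
}

class TinyStore {
  // mode is 'CP' (reject writes during a partition) or 'AP' (accept them locally).
  constructor(mode) {
    this.mode = mode;
    this.partitioned = false;
    this.replicas = [new Replica('a'), new Replica('b')];
  }

  write(replicaIndex, value) {
    if (!this.partitioned) {
      // Healthy network: every replica receives the write.
      for (const replica of this.replicas) {
        replica.value = value;
      }
      return 'ok';
    }
    if (this.mode === 'CP') {
      // Consistency first: reject the write rather than let replicas diverge.
      return 'unavailable';
    }
    // Availability first: accept the write on the reachable replica only.
    this.replicas[replicaIndex].value = value;
    return 'ok';
  }

  read(replicaIndex) {
    return this.replicas[replicaIndex].value;
  }
}

// Write once on a healthy network, then once more during a partition.
function demo(mode) {
  const store = new TinyStore(mode);
  store.write(0, 'v1');
  store.partitioned = true;
  const status = store.write(0, 'v2');
  return { mode, status, replicaA: store.read(0), replicaB: store.read(1) };
}

console.log(demo('CP'));
console.log(demo('AP'));
```

&lt;p&gt;In the &lt;code&gt;'CP'&lt;/code&gt; run both replicas keep the old value and the caller is told the write failed; in the &lt;code&gt;'AP'&lt;/code&gt; run the write succeeds but the two replicas now disagree until they are reconciled.&lt;/p&gt;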

&lt;h2&gt;
  
  
  2. Eventual Consistency and When It’s Not Enough
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Eventual consistency&lt;/strong&gt; means that, once updates stop arriving, all nodes in a distributed system converge to the same state, although reads in the meantime may return stale data. It works well for social networks or applications where slight delays in data updates do not compromise business integrity.&lt;/p&gt;

&lt;p&gt;However, when handling &lt;strong&gt;money&lt;/strong&gt; or banking operations, eventual consistency becomes a risk. In such cases, &lt;strong&gt;strong consistency&lt;/strong&gt; is required, ensuring that every transaction is immediately reflected across the system without the possibility of temporary discrepancies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Case Study
&lt;/h3&gt;

&lt;p&gt;Imagine a financial institution using MongoDB configured for eventual consistency (for example, serving reads from secondary nodes). A user withdraws all the funds in their account, leaving it at zero, but that update doesn’t propagate to all nodes immediately. If a credit card payment is processed at the same moment against a node that still reports the old balance, the payment is approved and the customer’s balance goes negative.&lt;/p&gt;

&lt;p&gt;For this reason, systems managing money typically require strong consistency as an essential prerequisite.&lt;/p&gt;
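
&lt;p&gt;The race in the case study can be reproduced with a small simulation. This is a minimal sketch that assumes two hypothetical in-memory nodes (&lt;code&gt;primary&lt;/code&gt; and &lt;code&gt;secondary&lt;/code&gt;) and made-up helper functions; it does not use any MongoDB API and only illustrates why a stale read is dangerous here.&lt;/p&gt;

```javascript
// Hypothetical in-memory nodes; no real database driver is involved.
function createAccountCluster(initialBalance) {
  return {
    primary: { balance: initialBalance },
    secondary: { balance: initialBalance },
  };
}

// Apply a debit on the primary only; replication to the secondary is delayed.
function withdrawOnPrimary(cluster, amount) {
  cluster.primary.balance -= amount;
}

// Replication eventually copies the primary state to the secondary.
function replicate(cluster) {
  cluster.secondary.balance = cluster.primary.balance;
}

// Approve a card payment using the balance reported by the node that was read.
function approvePayment(node, amount) {
  return node.balance >= amount;
}

const cluster = createAccountCluster(100);
withdrawOnPrimary(cluster, 100); // the user withdraws everything

// Stale read: the secondary still reports 100, so the payment is approved.
const approvedOnStaleRead = approvePayment(cluster.secondary, 30);
// Read from the primary: the balance is 0, so the payment is rejected.
const approvedOnPrimary = approvePayment(cluster.primary, 30);

// Once replication catches up, the secondary rejects the payment as well.
replicate(cluster);
const approvedAfterReplication = approvePayment(cluster.secondary, 30);

console.log({ approvedOnStaleRead, approvedOnPrimary, approvedAfterReplication });
```

&lt;p&gt;Only the read that hit the lagging node approves the payment, which is exactly the window that strong consistency is meant to close.&lt;/p&gt;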

&lt;h2&gt;
  
  
  3. Large Data Volumes and Trade-offs for Partition Tolerance
&lt;/h2&gt;

&lt;p&gt;When dealing with &lt;strong&gt;large volumes of data&lt;/strong&gt;, many distributed architectures choose to sacrifice a certain degree of consistency so that they stay available when partitions occur. If a node fails to respond or there are network issues, the application can still function with the remaining nodes.&lt;/p&gt;

&lt;p&gt;This trade-off is essential for services with millions of simultaneous users or global systems spanning multiple geographic regions. But it’s not always the right choice: if your application demands absolute precision and cannot tolerate outdated data, adopting a model that sacrifices consistency can be detrimental.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Document-Oriented vs Relational: &lt;em&gt;Schema on Read&lt;/em&gt; vs &lt;em&gt;Schema on Write&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;In a &lt;strong&gt;document-oriented&lt;/strong&gt; model (typically NoSQL), the term &lt;em&gt;Schema on Read&lt;/em&gt; is used: the structure of documents is not enforced when data is written, and the schema is applied when the data is read or processed. In a &lt;strong&gt;relational&lt;/strong&gt; model, by contrast, &lt;em&gt;Schema on Write&lt;/em&gt; is used: a rigid schema is defined before any record is entered into the database, and every write must conform to it.&lt;/p&gt;

&lt;p&gt;While document-oriented systems offer more flexibility, the reality is that &lt;strong&gt;there is always a schema&lt;/strong&gt;, in one form or another. The code processing the data must know what fields exist and how to interpret them. For example, if your application expects a "price" field to calculate a total, it cannot “guess” where that field is if it wasn’t predefined. Thus, the supposed freedom of schema doesn’t eliminate the need for a coherent design and careful evolution.&lt;/p&gt;
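
&lt;p&gt;The “price” example can be sketched in a few lines. The documents and the &lt;code&gt;lineTotal&lt;/code&gt; / &lt;code&gt;safeTotal&lt;/code&gt; helpers below are hypothetical and exist only to illustrate the point: the expected shape lives in the reading code, and a document that does not match it has to be handled explicitly.&lt;/p&gt;

```javascript
// Hypothetical order-line documents, as they might come back from a document store.
const lines = [
  { name: 'keyboard', price: 50, quantity: 2 },
  { name: 'mouse', cost: 20, quantity: 1 }, // written by code that used another field name
];

// The implicit schema: each line needs a numeric price and a numeric quantity.
function lineTotal(line) {
  if (typeof line.price !== 'number' || typeof line.quantity !== 'number') {
    throw new TypeError('line "' + line.name + '" does not match the expected shape');
  }
  return line.price * line.quantity;
}

// Sum the lines that match the expected shape and report the ones that do not.
function safeTotal(documents) {
  let total = 0;
  const rejected = [];
  for (const doc of documents) {
    try {
      total += lineTotal(doc);
    } catch (error) {
      rejected.push(doc.name);
    }
  }
  return { total, rejected };
}

console.log(safeTotal(lines));
```

&lt;p&gt;The database accepted both documents, yet only one of them is usable by this code; the schema was never absent, it was simply checked later.&lt;/p&gt;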

&lt;h3&gt;
  
  
  Schema Modification Example
&lt;/h3&gt;

&lt;h4&gt;
  
  
  In a Relational Database
&lt;/h4&gt;

&lt;p&gt;If we need to add a new column to store the last update date, we must &lt;strong&gt;alter the table&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;last_update_date&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any application querying this table must account for this new column. If there’s a process using the &lt;code&gt;last_update_date&lt;/code&gt; information, the code will need to be updated to read it.&lt;/p&gt;

&lt;h4&gt;
  
  
  In a Document-Oriented Database
&lt;/h4&gt;

&lt;p&gt;Let’s imagine a &lt;strong&gt;products&lt;/strong&gt; collection in MongoDB, where the last update date wasn’t previously stored. From a specific date, we decide to add this field. The new “schema” is handled in the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Starting from 2024-01-01, we add new logic:&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processProduct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;referenceDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2024-01-01&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;referenceDate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;last_update_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Add the field to the document&lt;/span&gt;
      &lt;span class="nx"&gt;product&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;last_update_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Process the rest of the product&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this scenario, no formal structure has to be altered. The code, however, must know about the new field and decide what to do when it is missing (e.g., create it with a default value). In other words, with &lt;em&gt;Schema on Read&lt;/em&gt; the schema is managed by the application, not by the database.&lt;/p&gt;
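
&lt;p&gt;One possible way to keep that logic in a single place is to normalize documents as they are read. The sketch below is only an assumption about how this might look: &lt;code&gt;normalizeProduct&lt;/code&gt; and the fallback date are hypothetical, and the function returns a copy instead of modifying the stored document.&lt;/p&gt;

```javascript
// Hypothetical read-time normalizer; the field name follows the example above.
function normalizeProduct(raw, fallbackDate) {
  // Copy the document so the shape that was read is left untouched.
  return {
    ...raw,
    last_update_date: raw.last_update_date ?? fallbackDate,
  };
}

const fallback = new Date('2024-01-01T00:00:00Z');
const oldDocument = { name: 'lamp', price: 30 }; // written before the field existed
const newDocument = {
  name: 'desk',
  price: 120,
  last_update_date: new Date('2024-06-15T00:00:00Z'),
};

// Every consumer downstream can now rely on last_update_date being present.
const products = [oldDocument, newDocument].map((doc) => normalizeProduct(doc, fallback));

console.log(products.map((product) => product.last_update_date.toISOString()));
```

&lt;p&gt;Whether old documents are also rewritten in the database is a separate decision; until they are, every reader needs the same fallback rule, which is the hidden cost of managing the schema in the application.&lt;/p&gt;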

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choosing the right database model&lt;/strong&gt; requires considering the CAP theorem and your application’s specific needs. If you need scalability and partition tolerance to handle large data volumes with eventual consistency, a &lt;strong&gt;document-oriented&lt;/strong&gt; or &lt;strong&gt;NoSQL&lt;/strong&gt; database may be the best option. On the other hand, if data accuracy and integrity are critical, &lt;strong&gt;relational&lt;/strong&gt; databases with strong consistency often excel.&lt;/p&gt;

&lt;p&gt;Don’t forget that other models, such as &lt;strong&gt;graph databases&lt;/strong&gt;, are ideal for handling complex (many-to-many) relationships and exploring deep connections between entities. The type of relationship (one-to-many, many-to-many, etc.) and the required level of consistency should guide your decision. With this approach, you’ll build a reliable, scalable, and coherent system without falling into the misconception that schema flexibility is the only reason to use NoSQL.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Image from: &lt;a href="https://www.commitstrip.com/en/2012/04/10/what-do-you-mean-its-oversized/" rel="noopener noreferrer"&gt;https://www.commitstrip.com/en/2012/04/10/what-do-you-mean-its-oversized/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>nosql</category>
      <category>database</category>
      <category>mongodb</category>
    </item>
  </channel>
</rss>
