<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matan Shidlov</title>
    <description>The latest articles on DEV Community by Matan Shidlov (@mshidlov).</description>
    <link>https://dev.to/mshidlov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F716305%2F732221a1-6a11-40ad-bef8-c623598d876b.jpg</url>
      <title>DEV Community: Matan Shidlov</title>
      <link>https://dev.to/mshidlov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mshidlov"/>
    <language>en</language>
    <item>
      <title>How to Fix the “Record to Delete Does Not Exist” Error in Prisma</title>
      <dc:creator>Matan Shidlov</dc:creator>
      <pubDate>Tue, 14 Jan 2025 11:26:52 +0000</pubDate>
      <link>https://dev.to/mshidlov/how-to-fix-the-record-to-delete-does-not-exist-error-in-prisma-5fo3</link>
      <guid>https://dev.to/mshidlov/how-to-fix-the-record-to-delete-does-not-exist-error-in-prisma-5fo3</guid>
      <description>&lt;p&gt;When you use Prisma to interact with your database, you might run into an error that says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An operation failed because it depends on one or more records that were required but not found. Record to delete does not exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In plain English, Prisma throws this when a &lt;code&gt;delete&lt;/code&gt; finds no record matching your &lt;code&gt;where&lt;/code&gt; clause: either the record genuinely doesn’t exist, or the clause doesn’t reference a &lt;em&gt;unique&lt;/em&gt; key. Below, we’ll walk through how this happens and the steps to fix it, using a &lt;strong&gt;blog post&lt;/strong&gt; example model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Error Occurs
&lt;/h2&gt;

&lt;p&gt;In Prisma, when you delete a record like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* Some criteria */&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the &lt;code&gt;where&lt;/code&gt; clause &lt;strong&gt;must&lt;/strong&gt; reference a &lt;em&gt;unique&lt;/em&gt; key—e.g., a primary key (&lt;code&gt;@id&lt;/code&gt;), a field marked with &lt;code&gt;@unique&lt;/code&gt;, or a group of fields defined with &lt;code&gt;@@unique([...])&lt;/code&gt; or &lt;code&gt;@@id([...])&lt;/code&gt; (a composite key).&lt;/p&gt;

&lt;p&gt;If you include fields that aren’t guaranteed to be unique, Prisma won’t accept (or won’t find) the record because the filter doesn’t match a unique constraint, and the operation fails with the “Record to delete does not exist” error (Prisma error code &lt;code&gt;P2025&lt;/code&gt;).&lt;/p&gt;
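&lt;p&gt;If you can’t guarantee the record exists at all, you can also handle the error defensively. A minimal sketch (the &lt;code&gt;client&lt;/code&gt; argument stands in for a real &lt;code&gt;PrismaClient&lt;/code&gt;; &lt;code&gt;P2025&lt;/code&gt; is Prisma’s “record not found” error code):&lt;/p&gt;

```javascript
// Delete a post, treating Prisma's P2025 "record not found" error
// as a no-op instead of an unhandled crash.
async function deletePostIfExists(client, id) {
  try {
    await client.post.delete({ where: { id } });
    return true;              // the record existed and was deleted
  } catch (e) {
    if (e && e.code === 'P2025') {
      return false;           // the record was already gone
    }
    throw e;                  // any other error is re-thrown
  }
}
```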




&lt;h2&gt;
  
  
  Example Scenario
&lt;/h2&gt;

&lt;p&gt;Let’s say you have this model in your &lt;code&gt;schema.prisma&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model Post {
  id        Int      @id @default(autoincrement())
  title     String
  content   String
  published Boolean   @default(false)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;id&lt;/code&gt; is the only unique field by default (the primary key).&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Deleting by Primary Key Alone
&lt;/h3&gt;

&lt;p&gt;Because &lt;code&gt;id&lt;/code&gt; is unique, you can safely delete by &lt;code&gt;id&lt;/code&gt; alone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;deletePost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works perfectly if all you need is the &lt;code&gt;id&lt;/code&gt; to identify which Post to delete.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Checking Other Fields Before Deleting
&lt;/h3&gt;

&lt;p&gt;What if you need to ensure that the post’s &lt;code&gt;title&lt;/code&gt; matches before deleting it? Since &lt;code&gt;title&lt;/code&gt; is &lt;strong&gt;not&lt;/strong&gt; unique in the schema, you can’t just do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This might fail because (id, title) isn't a unique combination in the schema&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// valid unique field&lt;/span&gt;
    &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// not unique&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead, you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Find&lt;/strong&gt; the record first by &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;title&lt;/code&gt; (without requiring it to be unique).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throw&lt;/strong&gt; an error if it doesn’t match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delete&lt;/strong&gt; it by the primary key (&lt;code&gt;id&lt;/code&gt;).
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;deletePostByIdAndTitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 1. Find the post by `id` and `title`&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;post&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findFirst&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No post found with that ID and title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Delete by the primary key&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sequence ensures that you only delete a post if both &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;title&lt;/code&gt; match the record you expect. Note that the check and the delete are two separate queries, so under concurrent access the record could change between them; wrapping both calls in &lt;code&gt;prisma.$transaction()&lt;/code&gt; avoids that window.&lt;/p&gt;
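&lt;p&gt;Alternatively, &lt;code&gt;deleteMany&lt;/code&gt; accepts non-unique fields in its &lt;code&gt;where&lt;/code&gt; clause and returns a count instead of throwing, which collapses the find-then-delete into a single query. A sketch (&lt;code&gt;client&lt;/code&gt; stands in for a real &lt;code&gt;PrismaClient&lt;/code&gt;):&lt;/p&gt;

```javascript
// deleteMany tolerates non-unique filters: it deletes whatever matches
// and reports how many rows were removed via `count`.
async function deletePostByIdAndTitle(client, id, title) {
  const { count } = await client.post.deleteMany({
    where: { id, title },   // both must match; no unique key required
  });
  if (count === 0) {
    throw new Error('No post found with that ID and title');
  }
  return count;
}
```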




&lt;h3&gt;
  
  
  3. Using a Composite Unique Key
&lt;/h3&gt;

&lt;p&gt;If your business logic requires &lt;code&gt;(id, title)&lt;/code&gt; to be a unique pair, you can define a composite constraint in the schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model Post {
  id        Int      @default(autoincrement())
  title     String
  content   String
  published Boolean   @default(false)

  @@unique([id, title]) // or @@id([id, title]) if you want them as a composite primary key
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, Prisma recognizes &lt;code&gt;(id, title)&lt;/code&gt; as a unique combination. You can then do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id_title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;(The exact syntax—&lt;code&gt;id_title: { id, title }&lt;/code&gt;—comes from how Prisma interprets composite keys in your schema.)&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;To avoid the &lt;strong&gt;“Record to delete does not exist”&lt;/strong&gt; error:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Delete by a unique field&lt;/strong&gt; (e.g., &lt;code&gt;id&lt;/code&gt;) if that’s sufficient.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check first, then delete&lt;/strong&gt; if you need to match non-unique fields.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define composite unique keys&lt;/strong&gt; if multiple fields must uniquely identify your record.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By aligning your &lt;code&gt;where&lt;/code&gt; clause with a valid unique or primary key—either by using existing unique fields or by creating a composite constraint—you’ll ensure your Prisma &lt;code&gt;.delete()&lt;/code&gt; operations work without error.&lt;/p&gt;

</description>
      <category>prisma</category>
      <category>database</category>
      <category>node</category>
      <category>development</category>
    </item>
    <item>
      <title>SSO Gone Wrong: Insights from a Real Breach</title>
      <dc:creator>Matan Shidlov</dc:creator>
      <pubDate>Thu, 09 Jan 2025 13:36:10 +0000</pubDate>
      <link>https://dev.to/mshidlov/sso-gone-wrong-insights-from-a-real-breach-g3</link>
      <guid>https://dev.to/mshidlov/sso-gone-wrong-insights-from-a-real-breach-g3</guid>
      <description>&lt;p&gt;In one of my past projects, we experienced a critical security breach when a white-hat security researcher reported a vulnerability that allowed unauthorized access to one of our root system admin accounts. This superuser account was designed to facilitate support operations, granting access to other accounts for internal use only. While the situation could have been catastrophic, we were fortunate for two key reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Researcher’s Intentions Were Ethical&lt;/strong&gt;: The security researcher acted with goodwill and a bounty incentive. They demonstrated that they could log in without exploiting the access to harm or compromise client data. Notably, the researcher did not realize that the compromised superuser account could access other user accounts, limiting potential damage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Limited Exposure to the Vulnerability&lt;/strong&gt;: The vulnerability, known as "nOAuth," was rooted in Microsoft Azure Active Directory’s Single Sign-On (SSO) feature. This feature had been deployed in our application only 2-3 days prior to the breach. Once notified, we acted quickly, disabling the Microsoft Azure AD SSO login feature to close the vulnerability until a proper fix could be implemented.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Understanding the "nOAuth" Vulnerability
&lt;/h2&gt;

&lt;p&gt;The "nOAuth" vulnerability was a critical flaw in Microsoft Azure Active Directory (Azure AD) applications utilizing the "Log in with Microsoft" feature. This security gap allowed attackers to impersonate users by modifying the email attribute in their Azure AD profile to match the target victim’s email. If an application relied solely on the email claim for user identification, it could inadvertently grant unauthorized access to attackers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Attack Flow:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Preparation&lt;/strong&gt;: With administrative privileges in their Azure AD account, the attacker altered their "Email" attribute to mimic the target victim’s email address.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploitation&lt;/strong&gt;: Using the "Log in with Microsoft" feature on a vulnerable application, the attacker could successfully log in. The application identified users based solely on the email claim without additional verification.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Why It Happened:
&lt;/h3&gt;

&lt;p&gt;The vulnerability exploited a design flaw in applications that depended on mutable and unverified claims, such as the email attribute, for user identification or authorization.&lt;/p&gt;

&lt;p&gt;We aimed to give existing users the option to log in with different authentication methods to the same account. For example, a user signing up with a Google account could access their account using another method, provided the same email address was used. However, using the email address received in the OAuth2 callback as the primary means of identifying the user exposed our application to this vulnerability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mitigation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Immediate Actions Taken:
&lt;/h3&gt;

&lt;p&gt;Once the vulnerability was reported, we acted according to a pre-determined protocol designed to handle severe security risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disabled the Microsoft Azure AD SSO feature, effectively eliminating the risk.&lt;/li&gt;
&lt;li&gt;Informed all client-facing representatives and company executives about the incident.&lt;/li&gt;
&lt;li&gt;Held an all-hands meeting with the development team to decide on a course of action.&lt;/li&gt;
&lt;li&gt;Analyzed login data to gauge the severity and scope of the incident.&lt;/li&gt;
&lt;li&gt;Sent targeted notifications to users affected by the Microsoft SSO feature, offering guidance and points of contact for support.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;To prevent similar vulnerabilities, applications must implement robust security practices and &lt;strong&gt;use immutable identifiers&lt;/strong&gt;: instead of relying on mutable claims like the email attribute, use identifiers such as the &lt;code&gt;sub&lt;/code&gt; and &lt;code&gt;iss&lt;/code&gt; ID Token claims to uniquely identify users and their identity providers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example Code:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Extracting immutable 'sub' claim for user identification  &lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jwt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;jsonwebtoken&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifyToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;  
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;decoded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Immutable user identifier&lt;/span&gt;

   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  
       &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid token: Missing subject claim&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
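&lt;p&gt;The &lt;code&gt;sub&lt;/code&gt; claim is only guaranteed unique within a single identity provider, so a robust account key combines it with &lt;code&gt;iss&lt;/code&gt;. A minimal sketch (the function name is illustrative):&lt;/p&gt;

```javascript
// Derive a stable account key from the immutable issuer + subject pair.
// 'sub' is only unique within one identity provider, so 'iss' is needed
// to scope it globally.
function accountKey(payload) {
  if (!payload.iss || !payload.sub) {
    throw new Error('Invalid token: missing iss or sub claim');
  }
  return `${payload.iss}|${payload.sub}`;
}
```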






&lt;h2&gt;
  
  
  Product Options
&lt;/h2&gt;

&lt;p&gt;When a user logs in with different methods, there are several ways a product can address associated security and usability challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Restrict Cross-Provider Logins&lt;/strong&gt;: Require users to be uniquely identified by an immutable identifier, such as Azure AD’s &lt;code&gt;oid&lt;/code&gt; claim, ensuring email addresses do not overlap or lead to account hijacking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Separate Accounts per Provider&lt;/strong&gt;: Create distinct accounts for each authentication provider. This ties accounts to their unique identifiers (&lt;code&gt;oid&lt;/code&gt;) rather than emails, avoiding issues with email duplication.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Account Merging Based on Email&lt;/strong&gt;: Automatically merge user accounts across SSO providers if they share the same email address. However, relying solely on email for identification is insufficient; additional verification methods are necessary during the merge process to mitigate risks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Key Security Enhancements:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User Notifications&lt;/strong&gt;: Notify users whenever a new login method or additional SSO provider is linked to their account. Real-time alerts enable prompt responses to unauthorized changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Factor Authentication (MFA)&lt;/strong&gt;: Require multiple verification steps, such as email confirmations, SMS codes, or app-based authenticators. MFA is particularly crucial when merging accounts or adding new authentication methods.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitoring and Auditing&lt;/strong&gt;: Continuously monitor login flows to detect anomalies. Regularly audit authentication logs to ensure adherence to security protocols and identify potential vulnerabilities. These logs are invaluable for analyzing the impact of security incidents and improving future response strategies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The "nOAuth" vulnerability underscored the importance of robust security measures and proactive responses to emerging threats. Team education and pre-established security protocols played a pivotal role in our ability to respond effectively. Combining multiple measures—such as immutable identifiers, enhanced user verification, and continuous monitoring—proved essential in mitigating risks. Additionally, it is critical to approach third-party information and integrations with a grain of salt, as these can introduce unforeseen vulnerabilities. By maintaining vigilance, investing in ongoing education, and adapting to the evolving threat landscape, organizations can secure their systems and safeguard user data effectively.&lt;/p&gt;

</description>
      <category>security</category>
      <category>node</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How Closures Can Cause Memory Leaks and What You Can Do About It</title>
      <dc:creator>Matan Shidlov</dc:creator>
      <pubDate>Tue, 07 Jan 2025 13:44:01 +0000</pubDate>
      <link>https://dev.to/mshidlov/how-closures-can-cause-memory-leaks-and-what-you-can-do-about-it-fjd</link>
      <guid>https://dev.to/mshidlov/how-closures-can-cause-memory-leaks-and-what-you-can-do-about-it-fjd</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Memory leaks are a developer’s nightmare, especially when they occur in production. Despite our best efforts to write clean, efficient code, subtle issues like improper use of closures can lead to memory leaks that are difficult to detect and resolve. This article focuses on understanding closures and their interaction with the garbage collector (GC), recounting my experience with an accidental memory leak caused by closures. We’ll explore how closures hold references to memory, why this can prevent the GC from reclaiming it, and the lessons learned along the way.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Problem: A Gradual Memory Increase in Production
&lt;/h3&gt;

&lt;p&gt;Everything seemed fine during development and testing. However, a few days after deploying our application to production, our monitoring system flagged an unusual memory usage pattern. The memory consumption of our Node.js application was steadily increasing over time, eventually causing performance degradation and even crashes.&lt;/p&gt;

&lt;p&gt;Initially, I suspected external factors, such as database connection issues or unoptimized third-party libraries. But after isolating the application and reproducing the issue locally, I realized the problem was within our codebase.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Investigation: A Challenging Path
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Understanding Closures and the Garbage Collector&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Closures are functions that "close over" their lexical scope, retaining references to variables defined in their outer scope. While this behavior is incredibly powerful, it can lead to memory leaks if developers are unaware of what variables the closure is holding onto. The garbage collector cannot release memory for variables referenced by closures, even if those variables are no longer needed elsewhere in the application.&lt;/p&gt;
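&lt;p&gt;A tiny, self-contained illustration of this retention (names are illustrative):&lt;/p&gt;

```javascript
// `count` lives in makeCounter's scope, but it survives every call
// because the returned closure still references it.
function makeCounter() {
  let count = 0;
  return function () {
    count += 1;
    return count;
  };
}

const counter = makeCounter();
counter(); // 1
counter(); // 2 - `count` is still alive, retained by the closure
```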

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Analyzing the Symptoms&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Memory leaks often manifest as memory that is no longer needed but is not released. In this case, the garbage collector was unable to reclaim memory, indicating that something in our code was retaining references to unused objects. The challenge was identifying what.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Analyzing the Heap&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;I turned to &lt;strong&gt;Node.js Heap Snapshots&lt;/strong&gt; to capture and analyze memory usage. By taking snapshots of the heap at different intervals, I observed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A growing number of retained objects.&lt;/li&gt;
&lt;li&gt;Certain closures holding references to variables long after their usefulness had ended.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;The Culprit: A Closure Holding Large Data&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;After meticulously going through the heap analysis, I discovered that a closure was unintentionally holding references to a large object in its outer scope. Because the closure itself remained reachable, the garbage collector could never reclaim the memory associated with that object.&lt;/p&gt;

&lt;p&gt;Here’s a concrete example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createLeak&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;largeObject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;leaky data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Simulating a large object.&lt;/span&gt;

    &lt;span class="c1"&gt;// The closure retains a reference to `largeObject`.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;leakyFunction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;largeObject&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;// Accessing `largeObject` in the closure.&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;leakyClosure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createLeak&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Even if `createLeak` is no longer called, `largeObject` remains in memory due to the closure.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What Happens in the Code:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Creation of &lt;code&gt;largeObject&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
Inside &lt;code&gt;createLeak&lt;/code&gt;, a large array &lt;code&gt;largeObject&lt;/code&gt; is created. This array uses a significant amount of memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Closure Retains Reference:&lt;/strong&gt;&lt;br&gt;
The inner function &lt;code&gt;leakyFunction&lt;/code&gt; forms a closure over the outer function’s scope, which includes the &lt;code&gt;largeObject&lt;/code&gt; variable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Return of the Closure:&lt;/strong&gt;&lt;br&gt;
The closure &lt;code&gt;leakyFunction&lt;/code&gt; is returned and assigned to &lt;code&gt;leakyClosure&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory Leak:&lt;/strong&gt;&lt;br&gt;
Even if &lt;code&gt;createLeak&lt;/code&gt; completes execution, the &lt;code&gt;largeObject&lt;/code&gt; is not garbage collected because the closure &lt;code&gt;leakyFunction&lt;/code&gt; still holds a reference to it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This prevents the &lt;code&gt;largeObject&lt;/code&gt; from being freed from memory.&lt;/p&gt;
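&lt;p&gt;You can observe the retention directly with &lt;code&gt;process.memoryUsage()&lt;/code&gt;: as long as the returned closures stay reachable, each one pins its own copy of the million-element array. A sketch:&lt;/p&gt;

```javascript
function createLeak() {
  const largeObject = new Array(1000000).fill('leaky data');
  return function leakyFunction() {
    console.log(largeObject[0]);
  };
}

const retained = [];
const before = process.memoryUsage().heapUsed;
for (let i = 0; i < 5; i++) {
  retained.push(createLeak()); // each closure keeps ~8 MB of array alive
}
const after = process.memoryUsage().heapUsed;
console.log(`Heap grew by roughly ${Math.round((after - before) / 1e6)} MB`);
```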




&lt;h3&gt;
  
  
  The Resolution: Fixing the Leak
&lt;/h3&gt;

&lt;p&gt;To resolve the issue, I redesigned the code so that each closure retains only the values it actually needs rather than entire large objects. Here’s the revised example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createFixed&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;largeObject&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;leaky data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Use the required value, not the entire object.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;importantValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;largeObject&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="c1"&gt;// Only keep the necessary data in the closure.&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fixedFunction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;importantValue&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fixedClosure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createFixed&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Now, `largeObject` can be garbage collected since the closure does not retain it.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What Changed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only the necessary part of the &lt;code&gt;largeObject&lt;/code&gt; (&lt;code&gt;importantValue&lt;/code&gt;) is retained in the closure.&lt;/li&gt;
&lt;li&gt;The large array &lt;code&gt;largeObject&lt;/code&gt; is no longer referenced by the closure, allowing the garbage collector to free its memory once &lt;code&gt;createFixed&lt;/code&gt; finishes execution.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Lessons Learned
&lt;/h3&gt;

&lt;p&gt;This experience taught me several valuable lessons about closures and memory management:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Understand Closures and the Garbage Collector:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Closures retain references to variables in their outer scope. If those references are no longer needed but not explicitly released, the garbage collector cannot reclaim the associated memory, leading to leaks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monitor Production Applications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up robust monitoring to detect memory anomalies early. Memory leaks often manifest gradually, so monitoring trends can help catch issues before they become critical.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Minimize Captured Variables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design closures to capture only the variables they truly need, reducing the likelihood of retaining unnecessary data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
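&lt;p&gt;As a minimal sketch of the second lesson, heap usage in a Node.js process can be sampled with the built-in &lt;code&gt;process.memoryUsage()&lt;/code&gt; and compared against a threshold. The threshold and interval below are illustrative values, not a production monitoring setup:&lt;/p&gt;

```javascript
// Sample the V8 heap and warn when usage crosses a threshold.
// The 250 MB threshold and 60 s interval are illustrative assumptions.
const HEAP_WARN_BYTES = 250 * 1024 * 1024;

function checkHeap() {
    const { heapUsed } = process.memoryUsage();
    if (heapUsed > HEAP_WARN_BYTES) {
        console.warn(`Heap usage high: ${(heapUsed / 1024 / 1024).toFixed(1)} MB`);
    }
    return heapUsed;
}

// In production this would run periodically, e.g.:
// setInterval(checkHeap, 60_000);
```

&lt;p&gt;Tracking these samples over time is what exposes the gradual upward trend that a leaking closure produces.&lt;/p&gt;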




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Memory leaks can be elusive, especially when they’re caused by subtle issues like closures. Understanding how closures interact with the garbage collector is crucial to writing efficient and leak-free code. With the right tools and practices, such leaks can be identified and resolved effectively. By being vigilant about cleaning up resources and mindful of what closures are capturing, you can avoid similar pitfalls and ensure your applications run smoothly in production.&lt;/p&gt;

</description>
      <category>coding</category>
      <category>programming</category>
      <category>javascript</category>
      <category>node</category>
    </item>
    <item>
      <title>Using Apache Parquet to Optimize Data Handling in a Real-Time Ad Exchange Platform</title>
      <dc:creator>Matan Shidlov</dc:creator>
      <pubDate>Tue, 07 Jan 2025 12:48:12 +0000</pubDate>
      <link>https://dev.to/mshidlov/using-apache-parquet-to-optimize-data-handling-in-a-real-time-ad-exchange-platform-2l37</link>
      <guid>https://dev.to/mshidlov/using-apache-parquet-to-optimize-data-handling-in-a-real-time-ad-exchange-platform-2l37</guid>
      <description>&lt;p&gt;A few years back, while working at cignal.io, I led the development of a real-time bidding platform for ad opportunities. This smart ad exchange managed a process called Real-Time Bidding (RTB). RTB is an automated system where advertisers bid in real time for the chance to display their ads to specific users visiting websites. When a partner sent an ad opportunity, our platform processed it through a series of real-time machine learning (ML) models to predict which advertising partner should receive the opportunity to bid. These models performed tasks like fraud detection, auction-winning prediction, matching advertising partners based on buying patterns, and identifying repeating opportunities. Ultimately, this system ensured that the highest bidder's ad was displayed, optimizing efficiency and relevance for advertisers and users alike.&lt;/p&gt;

&lt;p&gt;The scale of the platform was staggering, handling 100,000 to 150,000 ad opportunities per second. Each opportunity was represented as a large JSON object of up to 2-3 KB in size. Not every opportunity received a bid; in fact, around 40-50% were filtered out by predictive models and never sent forward. For the remaining opportunities, if a bid was placed and won the auction, a notification was generated. This activity resulted in over 1 TB of data every hour. The sheer volume of data posed significant challenges for training ML models, especially when more than 90% of the data consisted of opportunities without bids.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial Steps to Manage Data Volume
&lt;/h3&gt;

&lt;p&gt;To address the data explosion, we implemented a selective data writing approach. Only a small percentage of the ad opportunities were written to storage, focusing primarily on those that resulted in bids. For these, we added a flag to indicate whether the opportunity was part of the reduced write set. This allowed us to maintain balanced statistical information—for example, the number of ad opportunities originating from New York—while significantly reducing the volume of stored data.&lt;/p&gt;
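&lt;p&gt;The idea can be sketched in a few lines of JavaScript. This is a toy illustration, not the production code: the field names (&lt;code&gt;hasBid&lt;/code&gt;, &lt;code&gt;sampled&lt;/code&gt;) and the 1% sample rate are assumptions for the example:&lt;/p&gt;

```javascript
// Selective data writing: persist every opportunity that received a bid,
// but only a flagged sample of the rest, so statistics can be re-weighted.
const SAMPLE_RATE = 0.01;

function selectForStorage(opportunity) {
    if (opportunity.hasBid) {
        return { ...opportunity, sampled: false }; // always keep bids
    }
    if (SAMPLE_RATE > Math.random()) {
        return { ...opportunity, sampled: true };  // keep a weighted sample
    }
    return null;                                   // drop the rest
}

// Each sampled no-bid record stands in for 1 / SAMPLE_RATE originals,
// so aggregate counts (e.g. opportunities from New York) stay balanced.
function estimateTotal(storedRecords) {
    return storedRecords.reduce(
        (sum, r) => sum + (r.sampled ? 1 / SAMPLE_RATE : 1), 0);
}
```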

&lt;p&gt;This strategy improved the preprocessing workflow for Spark, which was used to join data fragments and prepare it for ML tasks. However, as the platform scaled, the demands on Spark clusters grew, increasing processing time. Delays in updating the models with new data affected the quality of real-time predictions, and the rising resource costs reduced the platform’s return on investment (ROI).&lt;/p&gt;

&lt;h3&gt;
  
  
  Transitioning to Apache Parquet
&lt;/h3&gt;

&lt;p&gt;To solve these issues, we transitioned to storing all our data in Apache Parquet. Parquet is an open-source, columnar storage file format optimized for large-scale data processing and analytics. Developed collaboratively by Twitter and Cloudera and inspired by Google’s Dremel paper, Parquet became a top-level Apache project in 2015. Its columnar structure and support for efficient compression and encoding schemes made it an ideal choice for our use case.&lt;/p&gt;

&lt;p&gt;We chose Snappy as the compression algorithm for Parquet, which balanced speed and efficiency. Parquet’s columnar format allowed us to store similar data types together, significantly improving compression ratios and reducing storage requirements. Because Parquet applies compression per column chunk, the Snappy-compressed files remained splittable and could be processed in a distributed manner, letting us use our large Spark clusters effectively. The columnar design also enabled selective reading of only the relevant columns during query execution, drastically reducing I/O operations and speeding up data processing.&lt;/p&gt;
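&lt;p&gt;A toy JavaScript sketch (not the Parquet format itself) shows why the columnar layout pays off: answering a query over one field touches a single contiguous array instead of every record, and values of the same type sitting next to each other are also what compress so well:&lt;/p&gt;

```javascript
// Row-oriented layout: every query touches whole records.
const rows = [
    { geo: 'NY', price: 1.2, hasBid: true },
    { geo: 'CA', price: 0.8, hasBid: false },
    { geo: 'NY', price: 2.1, hasBid: true },
];

// Columnar re-layout: one array per field, similar values stored together.
const columns = {
    geo: rows.map(r => r.geo),
    price: rows.map(r => r.price),
    hasBid: rows.map(r => r.hasBid),
};

// "How many opportunities came from NY?" reads only the geo column.
const nyCount = columns.geo.filter(g => g === 'NY').length;
```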

&lt;h3&gt;
  
  
  Benefits of Using Parquet
&lt;/h3&gt;

&lt;p&gt;The switch to Parquet had a transformative impact on our platform:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reduced Resource Usage:&lt;/strong&gt; The improved storage efficiency and compression reduced the amount of hardware and computational resources required for data processing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Faster Data Processing:&lt;/strong&gt; By storing data in Parquet, we dramatically decreased the processing time for Spark jobs. This allowed us to update ML models more frequently, improving their real-time prediction accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced Scalability:&lt;/strong&gt; As our data flow grew, Parquet’s efficient format allowed us to handle increased volumes without proportional increases in infrastructure costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Empowered Data Scientists:&lt;/strong&gt; The ability to process larger volumes of data during research and testing enabled our data scientists to refine and enhance all our ML models. Parquet’s schema evolution feature also allowed for seamless updates to data structures without breaking existing workflows.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;By adopting Apache Parquet and following its best practices, we not only overcame the challenges of scaling our ad exchange platform but also improved the overall efficiency and quality of our ML models. The shift to Parquet enhanced our ability to react to real-time changes in data, optimized resource usage, and provided our data science team with the tools to innovate further. This experience underscored the value of choosing the right data storage format for high-scale, data-intensive applications.&lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>dataengineering</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Optimizing Data Pipelines for Fiix Dating App</title>
      <dc:creator>Matan Shidlov</dc:creator>
      <pubDate>Sun, 05 Jan 2025 14:07:50 +0000</pubDate>
      <link>https://dev.to/mshidlov/optimizing-data-pipelines-for-fiix-dating-app-2b0p</link>
      <guid>https://dev.to/mshidlov/optimizing-data-pipelines-for-fiix-dating-app-2b0p</guid>
      <description>&lt;p&gt;Working for Fiix, formerly known as Jfiix, a mobile dating application, I took on the task of refining a crucial data pipeline. This pipeline served as the backbone of the app’s user engagement strategy. The pipeline analyzed user interactions—likes and messages—to generate match suggestions that would be delivered via push notifications. These suggestions aimed to reduce churn, boost active users, improve retention rates, increase conversion rates, and ultimately elevate the lifetime value (LTV) of the platform's user base.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenges Faced
&lt;/h3&gt;

&lt;p&gt;The project presented two significant challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Technical Challenge:&lt;/strong&gt;&lt;br&gt;
The process took over 15 hours to complete each day, and processing time was trending upward as the user base grew. Preparing the pipeline to scale with continued user growth became crucial.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Business Challenge:&lt;/strong&gt;&lt;br&gt;
Despite the intensive computations, the number of actual matches produced was limited, resulting in a poor return on investment (ROI). This inefficiency jeopardized the business goal of driving user re-engagement and retention.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Root Cause Analysis
&lt;/h3&gt;

&lt;p&gt;At the core of the problem was the method used to identify potential matches. The logic involved a multi-step relationship analysis:&lt;/p&gt;

&lt;p&gt;For example, if User A likes or messages User B, and User B then interacts with User C, this suggests User C might have interests similar to User A. This chain of interactions can help identify potential matches based on shared connections and interests. Building on this, if User C interacts with User D, there is a possibility that User D could also be a good match for User A.&lt;/p&gt;

&lt;p&gt;The computational burden stemmed from a series of Cartesian product operations, which combine every row of one dataset with every row of another. Each chained join multiplied the volume of data being processed, resulting in excessive memory usage, data spills to disk, heavy I/O, and intensive CPU demands.&lt;/p&gt;
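&lt;p&gt;The chain logic itself is cheap when expressed over an adjacency map instead of row-by-row joins. The following is an illustrative JavaScript sketch of the two-hop case (A likes B, B likes C, so suggest C to A), not the production SQL; the work stays proportional to the actual interactions rather than a full Cartesian product:&lt;/p&gt;

```javascript
// interactions: array of [from, to] pairs, e.g. ['A', 'B'] means A liked
// or messaged B. Returns a Map of user -> suggested candidates two hops away.
function twoHopSuggestions(interactions) {
    const adj = new Map();
    for (const [from, to] of interactions) {
        if (!adj.has(from)) adj.set(from, new Set());
        adj.get(from).add(to);
    }
    const suggestions = new Map();
    for (const [user, targets] of adj) {
        const found = new Set();
        for (const mid of targets) {
            for (const candidate of adj.get(mid) ?? []) {
                // Skip the user themselves and anyone already interacted with.
                if (candidate === user || targets.has(candidate)) continue;
                found.add(candidate);
            }
        }
        if (found.size) suggestions.set(user, [...found]);
    }
    return suggestions;
}
```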

&lt;h3&gt;
  
  
  The Solution: A Data-First Mindset
&lt;/h3&gt;

&lt;p&gt;To address these challenges, I devised and implemented a new pipeline with a data-first approach, emphasizing efficiency at each stage. Here are the steps I took:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Database Optimization:&lt;/strong&gt; Configured the MySQL database for write optimization by fine-tuning database internals and optimizing the host server settings, ensuring smoother data ingestion and retrieval.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dedicated Interaction Tables:&lt;/strong&gt; Set up dedicated tables to store daily user interactions, isolating this data for streamlined processing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Layered Processing Workflow:&lt;/strong&gt; Broke the pipeline into seven distinct processing steps. For example, the first step identified unique user interactions and stored them in a dedicated temporary table; subsequent steps layered additional insights, such as filtering for active users, mapping interaction chains, and prioritizing based on engagement metrics. Each step wrote its results to a temporary table, which reduced the need to hold large datasets in memory and allowed the database to perform efficient, incremental operations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Indexing and Partitioning:&lt;/strong&gt; Leveraged indexing and partitioning to accelerate query performance and reduce I/O operations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Incremental Data Processing:&lt;/strong&gt; Designed the pipeline to process only new data each day, minimizing redundant computations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
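&lt;p&gt;The incremental step can be sketched with a high-water mark: each daily run handles only rows newer than the last id already processed. This is an illustrative JavaScript sketch, not the production schema or SQL:&lt;/p&gt;

```javascript
// High-water-mark incremental processing: remember the last processed id
// and only handle rows beyond it on the next run.
let lastProcessedId = 0;

function processNewInteractions(allInteractions) {
    // Rows are assumed to arrive ordered by a monotonically increasing id.
    const fresh = allInteractions.filter(r => r.id > lastProcessedId);
    if (fresh.length) {
        lastProcessedId = fresh[fresh.length - 1].id;
    }
    return fresh;
}
```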

&lt;h4&gt;
  
  
  Results Achieved
&lt;/h4&gt;

&lt;p&gt;The revamped pipeline delivered transformative results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance Improvement:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced processing time from over 15 hours to under 2 hours.&lt;/li&gt;
&lt;li&gt;Enabled the system to handle significantly larger datasets without resource bottlenecks.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Increased Matches:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boosted the number of matches produced by approximately 600%, as measured by the total count of successful user connections each day compared to the previous pipeline. This increase led to a noticeable improvement in user engagement, with more users returning to the app after receiving match suggestions.&lt;/li&gt;
&lt;li&gt;Enhanced the relevance of match suggestions, leading to higher user satisfaction.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Business Impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Achieved the primary goal of reducing churn and increasing user engagement.&lt;/li&gt;
&lt;li&gt;Contributed to improved retention rates, higher conversion rates, and greater LTV.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reflections
&lt;/h3&gt;

&lt;p&gt;This project provided several key insights that were instrumental in its success:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Incremental and Modular Design:&lt;/strong&gt; Breaking down complex problems into smaller, manageable steps was critical for achieving both efficiency and scalability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Effective Database Optimization:&lt;/strong&gt; Leveraging features like indexing, partitioning, and write optimization resulted in substantial performance improvements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understanding User Interaction Patterns:&lt;/strong&gt; A deep analysis of user relationships and interactions was central to building an effective match suggestion system.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These insights highlight the value of adopting a data-first mindset and engineering solutions that align technical efficiency with business objectives. By embracing a structured and incremental approach, we were able to overcome significant challenges and deliver measurable value to the Fiix platform and its users.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>performance</category>
      <category>datapipelines</category>
    </item>
    <item>
      <title>I'm Joining Amplication with a Vision to Democratize Engineering</title>
      <dc:creator>Matan Shidlov</dc:creator>
      <pubDate>Thu, 07 Oct 2021 11:41:53 +0000</pubDate>
      <link>https://dev.to/amplication/i-m-joining-amplication-with-a-vision-to-democratize-engineering-385</link>
      <guid>https://dev.to/amplication/i-m-joining-amplication-with-a-vision-to-democratize-engineering-385</guid>
      <description>&lt;p&gt;I've spent the last decade leading technology development, creating real-time low latency AI-based systems, processing billions of requests each day, and helping companies with their software architectures. Through the years, I've been working alongside highly talented software engineers and cloud architects. I've learned to appreciate the value of sound engineering that can make the difference in creating reliable, maintainable, flexible, and scalable software.&lt;br&gt;
Now I'm excited to try and make software architecture and good engineering practices more accessible to all by joining Amplication as VP of engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unicorns
&lt;/h2&gt;

&lt;p&gt;In today's economy, the once-unreachable title of "unicorn" has become just another milestone on a company's path to success. Companies are experiencing hyper-growth, which leads to a severe shortage of talented developers. Crises throughout history, including COVID-19, accelerate evolution and create new opportunities. The shortage of skilled developers created a unique opportunity for newcomers to join the engineering community, and they have been integrated into organizations both large and small.&lt;/p&gt;

&lt;h2&gt;
  
  
  Good Engineering Is a Scarce Commodity
&lt;/h2&gt;

&lt;p&gt;Unfortunately, tapping new sources of talent doesn't fill the shortage of experienced engineers who have extensive knowledge and understanding. Senior developers who can mentor inexperienced programmers and create intelligent, robust, scalable, maintainable, and flexible solutions are especially tough to find. &lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost of Lack of Good Engineering
&lt;/h2&gt;

&lt;p&gt;I've had the privilege of consulting for many companies on how to architect for scale. In many of them, I noticed that low engineering standards and poor decisions were hurting the business. Too many developers today write code that "works" but doesn't stand the test of time, mostly due to lack of knowledge and insufficient training. Code that merely "works" will not withstand any of the following dynamics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Logic changes:&lt;/strong&gt; adding, removing, and updating business logic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data changes:&lt;/strong&gt; adding, removing, and updating fields and entities&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data accumulation:&lt;/strong&gt; what was once thousands of entries grows to hundreds of thousands, millions, and billions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Usage changes:&lt;/strong&gt; ever-growing traffic and usage spikes that lead to uneven loads&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Infrastructure changes:&lt;/strong&gt; changes in the APIs of third-party services or products (for example, databases)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Staff changes:&lt;/strong&gt; original committers leave, resulting in knowledge loss&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Democratization of Engineering Is the Need of the Hour
&lt;/h2&gt;

&lt;p&gt;I have a vision that with Amplication, an open-source low-code platform, we can empower developers. Amplication can take care of repetitive code and function as a developer's private architect, helping implement great software on a solid foundation. In my vision, senior developers will appreciate Amplication as an uncompromising shortcut, and developers at the beginning of their careers will find it extremely useful as an enabler of quality engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amplication
&lt;/h2&gt;

&lt;p&gt;Amplication is an open-source low-code platform for backend and full-stack developers. Its goal is to help developers and empower them through code generation. The developer remains the owner of the application code, while Amplication supports the project by providing a robust base. When developing with Amplication, developers can create great applications without being "bogged down" by building and maintaining the application's infrastructure and architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open as a Policy
&lt;/h2&gt;

&lt;p&gt;What better way to democratize software engineering than to open-source it and build a community around the principles of open engineering?&lt;br&gt;
I'm excited to start this journey with Amplication. I will use this platform to share our architecture decisions, best practices, and coding standards along the way, and to build Amplication publicly with this great community.&lt;/p&gt;

&lt;p&gt;Please share your experience using Amplication and your vision of how Amplication can help you in your next project.&lt;/p&gt;

&lt;p&gt;I'm always available, along with the rest of the team, on our Discord channel, so please join and talk to me about anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/amplication/amplication" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, &lt;a href="https://discord.com/invite/Z2CG3rUFnu" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;
&lt;/h3&gt;

</description>
      <category>opensource</category>
      <category>programming</category>
      <category>architecture</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
