<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ujjawal Tyagi</title>
    <description>The latest articles on DEV Community by Ujjawal Tyagi (@ujjawaltyagi).</description>
    <link>https://dev.to/ujjawaltyagi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1273744%2Fa67c260b-0ce4-454a-aecb-fcc1f0caaf1d.png</url>
      <title>DEV Community: Ujjawal Tyagi</title>
      <link>https://dev.to/ujjawaltyagi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ujjawaltyagi"/>
    <language>en</language>
    <item>
      <title>Decoding Amazon's Recommendation Engine</title>
      <dc:creator>Ujjawal Tyagi</dc:creator>
      <pubDate>Fri, 01 Mar 2024 17:16:15 +0000</pubDate>
      <link>https://dev.to/ujjawaltyagi/decoding-amazons-recommendation-engine-3gpo</link>
      <guid>https://dev.to/ujjawaltyagi/decoding-amazons-recommendation-engine-3gpo</guid>
      <description>&lt;p&gt;I am sure that while browsing &lt;a href="https://www.amazon.com/" rel="noopener noreferrer"&gt;Amazon&lt;/a&gt;, you must have experienced this while casually looking at something, and then you get bombarded with suggestions for "&lt;strong&gt;similar items you might like&lt;/strong&gt;"? It's almost like the website can read your mind!&lt;/p&gt;

&lt;p&gt;Well, while it may not be telepathy, there's a &lt;strong&gt;powerful recommendation engine&lt;/strong&gt; behind the scenes, carefully crafting personalized suggestions just for you. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;But the question is&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How does it work?&lt;/li&gt;
&lt;li&gt;How does Amazon balance speed and accuracy in delivering personalized recommendations?&lt;/li&gt;
&lt;li&gt;How do they handle user privacy while utilizing their data for personalized recommendations?&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;So let's try to understand the &lt;strong&gt;inner workings&lt;/strong&gt; of Amazon's Recommendation Engine, and don't worry, I won't make it complicated!&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond "Customers Who Bought This Also Bought"
&lt;/h2&gt;

&lt;p&gt;While "Customers Who Bought This Also Bought" is a familiar sight, it's just one piece of the puzzle. We've all encountered those appealing product suggestions while browsing Amazon. &lt;strong&gt;But have you ever wondered&lt;/strong&gt; how Amazon curates these recommendations amidst its vast inventory? How do they filter or what technique do they use for leveraging user behavior data to predict preferences?&lt;/p&gt;

&lt;p&gt;Well, the answer lies in two primary techniques:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Collaborative Filtering&lt;/strong&gt;: This method analyzes the behavior of similar users. Imagine a giant network where users and items are connected based on their interactions. By analyzing the buying habits and ratings of users with similar tastes, the engine predicts what you might like based on what others like you have chosen.
Here's the technical breakdown:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User-item matrix&lt;/strong&gt;: This matrix represents interactions 
(purchases, ratings, etc.) between users and items. Each cell 
holds a value signifying the interaction strength.
(e.g. purchase = 1, no interaction = 0)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity measures&lt;/strong&gt;: Techniques like &lt;a href="https://naomy-gomes.medium.com/the-cosine-similarity-and-its-use-in-recommendation-systems-cb2ebd811ce1" rel="noopener noreferrer"&gt;cosine similarity&lt;/a&gt; or 
&lt;a href="https://new.pythonforengineers.com/blog/machine-learning-with-an-amazon-like-recommendation-engine/" rel="noopener noreferrer"&gt;Pearson correlation&lt;/a&gt; coefficients measure the similarity between 
user profiles based on their interaction patterns within the 
matrix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nearest neighbor algorithms&lt;/strong&gt;: These algorithms identify 
users with the highest similarity scores to the target user. 
Their past interactions are then used to recommend items they 
haven't encountered yet but might enjoy based on their similar 
preferences.
 
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqskf8tk99ul5minj3uq.png" alt="Similar preferences"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content-Based Filtering&lt;/strong&gt;: This technique focuses on the item itself. The engine analyzes features, descriptions, and categories of products you've interacted with, and then recommends similar items based on these characteristics.
It can involve: 

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Item-item matrix&lt;/strong&gt;: This matrix represents the relationships 
between items based on shared features, categories, or 
descriptions. Each cell holds a similarity score between items.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature engineering&lt;/strong&gt;: Techniques like &lt;a href="https://towardsdatascience.com/sentiment-analysis-and-product-recommendation-on-amazons-electronics-dataset-reviews-part-2-de71649de42b" rel="noopener noreferrer"&gt;TF-IDF &lt;/a&gt;(Term 
Frequency-Inverse Document Frequency) are employed to extract 
relevant features and represent them numerically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nearest neighbor algorithms&lt;/strong&gt;: Similar to collaborative 
filtering, these algorithms identify items with the highest 
similarity scores to items the user has interacted with. These 
similar items are then presented as recommendations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
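&lt;p&gt;To make the collaborative-filtering steps concrete, here's a toy Python sketch (the users, items, and matrix are entirely made up; Amazon's real system is far more sophisticated):&lt;/p&gt;

```python
import math

# Toy user-item matrix: rows are users, columns are items (1 = purchased).
items = ["book", "lamp", "mug", "rug"]
ratings = {
    "alice": [1, 1, 0, 0],
    "bob":   [1, 1, 1, 0],
    "carol": [0, 0, 1, 1],
}

def cosine(u, v):
    """Cosine similarity between two interaction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(target, k=1):
    """Suggest items the target hasn't bought, taken from the k nearest users."""
    scored = [(cosine(ratings[target], vec), name)
              for name, vec in ratings.items() if name != target]
    scored.sort(reverse=True)  # highest-similarity neighbors first
    recs = set()
    for _, neighbor in scored[:k]:
        for i, bought in enumerate(ratings[neighbor]):
            if bought and not ratings[target][i]:
                recs.add(items[i])
    return recs

print(recommend("alice"))  # bob is alice's nearest neighbor, and he bought the mug
```

&lt;p&gt;Content-based filtering ends the same way, except the vectors being compared describe item features (e.g., TF-IDF weights of product descriptions) rather than user interactions.&lt;/p&gt;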

&lt;h2&gt;
  
  
  Hybridization:
&lt;/h2&gt;

&lt;p&gt;With millions of products and customers, efficiently sorting through all that data is a huge challenge. To deal with this, Amazon employs a technique called &lt;strong&gt;Matrix Factorization&lt;/strong&gt;.&lt;br&gt;
Also, Amazon doesn't rely solely on one technique. It often employs a &lt;strong&gt;hybrid approach&lt;/strong&gt;, combining the strengths of collaborative and content-based filtering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weighted combination&lt;/strong&gt;: The recommendations from both 
techniques are combined using weights based on their individual 
effectiveness for the specific user or item.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Matrix factorization&lt;/strong&gt;: Advanced techniques like matrix 
factorization can be used to create a lower-dimensional 
representation of the user-item and item-item matrices, capturing 
latent factors influencing user preferences and item 
relationships. This allows for more efficient and accurate 
recommendations.&lt;/li&gt;
&lt;/ul&gt;
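&lt;p&gt;Here's a minimal sketch of matrix factorization trained with stochastic gradient descent on a made-up 3x3 rating matrix (illustrative only, not Amazon's implementation):&lt;/p&gt;

```python
import random

random.seed(0)

# Toy 3x3 rating matrix; 0 means "unknown", the value we want to predict.
R = [
    [5, 3, 0],
    [4, 0, 1],
    [1, 1, 5],
]
n_users, n_items, k = 3, 3, 2   # k latent factors per user and per item
P = [[random.random() for _ in range(k)] for _ in range(n_users)]  # user factors
Q = [[random.random() for _ in range(k)] for _ in range(n_items)]  # item factors

def predict(u, i):
    """Predicted rating = dot product of user and item factor vectors."""
    return sum(P[u][f] * Q[i][f] for f in range(k))

# Stochastic gradient descent over the observed entries only.
lr, reg = 0.01, 0.02
for _ in range(5000):
    for u in range(n_users):
        for i in range(n_items):
            if R[u][i]:                       # skip the unknown cells
                err = R[u][i] - predict(u, i)
                for f in range(k):
                    P[u][f] += lr * (err * Q[i][f] - reg * P[u][f])
                    Q[i][f] += lr * (err * P[u][f] - reg * Q[i][f])

print(round(predict(0, 2), 2))  # the model's guess for the missing rating R[0][2]
```

&lt;p&gt;The low-dimensional factors are the "latent" preferences: once learned, every empty cell of the matrix gets a prediction for free.&lt;/p&gt;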



&lt;h2&gt;
  
  
  What about data?
&lt;/h2&gt;

&lt;p&gt;These algorithms are only as good as the data they are fed. Amazon leverages a vast amount of user data to personalize recommendations, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explicit feedback&lt;/strong&gt;: It includes purchase history, ratings, 
reviews, and wish list additions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implicit feedback&lt;/strong&gt;: It involves browsing behavior, search 
queries, clicks on product images, and time spent on product 
pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual data&lt;/strong&gt;: Location, time of day, and device type can 
be used to tailor recommendations to specific situations (e.g., 
suggesting raincoats during a rainfall).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Personalization
&lt;/h2&gt;

&lt;p&gt;Amazon employs additional techniques to personalize the recommendation experience:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Time-based recommendations&lt;/strong&gt;: Products are suggested based 
on seasonal trends or upcoming events (e.g., recommending 
cookbooks around holidays).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time recommendations&lt;/strong&gt;: User behavior is analyzed in 
real-time to dynamically adjust recommendations on the fly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A/B testing&lt;/strong&gt;: Different recommendation strategies are 
tested on different user segments to identify the most effective 
approach for each individual.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb22rhsr9z77ngtnoog16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb22rhsr9z77ngtnoog16.png" alt="personalization"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But don't you think that &lt;strong&gt;scaling&lt;/strong&gt; this recommendation engine to serve &lt;strong&gt;millions of users&lt;/strong&gt; requires more than just clever algorithms? Yes, it demands a robust infrastructure. Amazon's recommendation engine operates atop a &lt;strong&gt;distributed computing framework&lt;/strong&gt;, where data is partitioned across multiple servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But&lt;/strong&gt; what happens if a server fails under the weight of &lt;strong&gt;user queries?&lt;/strong&gt; For that Amazon has implemented fault-tolerant mechanisms, ensuring uninterrupted service by replicating data across redundant servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's the role of Caching?
&lt;/h2&gt;

&lt;p&gt;Amazon utilizes &lt;a href="https://www.cloudflare.com/en-in/learning/cdn/what-is-caching/" rel="noopener noreferrer"&gt;caching&lt;/a&gt; to store frequently accessed data closer to users, reducing the need to fetch information from the main database repeatedly. By keeping popular data in a cache, Amazon &lt;strong&gt;minimizes&lt;/strong&gt; the computational overhead and latency associated with retrieving data, thus enhancing the overall user experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffae53cgiayg2nap9sc4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffae53cgiayg2nap9sc4.jpg" alt="caching"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reducing Load Times&lt;/strong&gt;: Caching strategies enable Amazon to load web pages and display product information more quickly, leading to shorter wait times for users. With cached data readily available, users experience &lt;strong&gt;faster page load times&lt;/strong&gt;, allowing for smoother browsing and quicker access to desired products.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced User Experience&lt;/strong&gt;: By optimizing data retrieval with caching, Amazon ensures a seamless and efficient shopping experience for its users. Reduced latency and faster access to information contribute to a more responsive website, improving user satisfaction and encouraging increased engagement and sales.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
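&lt;p&gt;The idea can be illustrated with a tiny LRU (least-recently-used) cache. The product names are hypothetical, and production systems use dedicated caches like Redis or Memcached rather than an in-process dictionary:&lt;/p&gt;

```python
from collections import OrderedDict

class LRUCache:
    """Tiny cache: keeps the most recently used entries, evicts the oldest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as recently used
            return self.data[key]
        return None  # cache miss: the caller falls back to the database

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("product:1", "Echo Dot")
cache.put("product:2", "Kindle")
cache.get("product:1")             # touch product:1 so it stays hot
cache.put("product:3", "Fire TV")  # evicts product:2, the coldest entry
print(cache.get("product:2"))      # None, so this request would hit the database
```

&lt;p&gt;Popular ("hot") products stay in the cache and are served instantly; rarely viewed ones fall out and cost a slower database round trip.&lt;/p&gt;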

&lt;h2&gt;
  
  
  What about user privacy &amp;amp; data?
&lt;/h2&gt;

&lt;p&gt;Even in the name of a personalized experience, the vast amount of user data collected by Amazon raises concerns about &lt;strong&gt;potential misuse&lt;/strong&gt; or &lt;strong&gt;unauthorized access&lt;/strong&gt;. &lt;br&gt;
In particular, personalized recommendations can inadvertently create &lt;strong&gt;filter bubbles&lt;/strong&gt;, limiting users' exposure to diverse information and viewpoints, which can perpetuate existing &lt;strong&gt;biases&lt;/strong&gt; and lead to discriminatory recommendations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So, what does Amazon have to say about this?&lt;/strong&gt; Well, Amazon outlines its data collection and usage practices in its privacy policy, allowing users to make informed choices, manage their data, and &lt;strong&gt;opt out&lt;/strong&gt; of personalized recommendations altogether.&lt;br&gt;
Also, Amazon &lt;strong&gt;anonymizes data&lt;/strong&gt; before using it for recommendation purposes, and trends and patterns are analyzed using &lt;strong&gt;aggregated data sets&lt;/strong&gt;, minimizing the use of individual user information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But still&lt;/strong&gt; the balance between personalization and privacy remains a complex and evolving debate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Final Verdict
&lt;/h2&gt;

&lt;p&gt;Amazon's recommendation engine is a complex &lt;strong&gt;combination&lt;/strong&gt; of algorithms, data analysis, and machine learning, constantly evolving and improving. &lt;strong&gt;While the specifics remain proprietary&lt;/strong&gt;, understanding the interplay between user behavior, data analysis, and recommendation algorithms gives us a glimpse of how things work behind the scenes.&lt;/p&gt;

&lt;p&gt;I wonder if other e-commerce giants like &lt;strong&gt;eBay&lt;/strong&gt; or &lt;strong&gt;Walmart&lt;/strong&gt; employ similar recommendation strategies, or if they have methods of their own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do you think about it? Do let me know in the comments.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;If you enjoyed this blog, you can follow me on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ujjawaltyagii" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/ujjawal_tyagiii" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/ujjawal-tyagi/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you'd like to support me, you can &lt;a href="https://github.com/sponsors/ujjawaltyagii" rel="noopener noreferrer"&gt;sponsor me on GitHub&lt;/a&gt; or &lt;a href="https://www.buymeacoffee.com/codewithuj" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>aws</category>
      <category>programming</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Why YouTube Never Runs Out of Storage? It's NOT just CLOUD!</title>
      <dc:creator>Ujjawal Tyagi</dc:creator>
      <pubDate>Thu, 22 Feb 2024 20:19:15 +0000</pubDate>
      <link>https://dev.to/ujjawaltyagi/why-youtube-never-runs-out-of-storage-its-not-just-cloud-225f</link>
      <guid>https://dev.to/ujjawaltyagi/why-youtube-never-runs-out-of-storage-its-not-just-cloud-225f</guid>
      <description>&lt;p&gt;Have you ever wondered, despite all these years and an &lt;strong&gt;absolutely insane amount of video data&lt;/strong&gt; being generated. Why &lt;a href="https://www.youtube.com/" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; haven't run out of space? &lt;strong&gt;Especially&lt;/strong&gt; with hits like these:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fringjbwpyt5kb02g2pbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fringjbwpyt5kb02g2pbs.png" alt="Absurd yt"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3377y1jlf7kvhjq1vd1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3377y1jlf7kvhjq1vd1h.png" alt="more absurd"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;This is insane&lt;/strong&gt;, right? Imagine a platform bursting with millions of videos, yet &lt;strong&gt;never facing a space crunch&lt;/strong&gt;. And even if you try to counter it with &lt;strong&gt;cloud computing&lt;/strong&gt;, at the end of the day, it's just physical hardware or hard disks sitting somewhere in a data center in the name of the cloud :)&lt;br&gt;
 &lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From Petabytes to Exabytes:
&lt;/h2&gt;

&lt;p&gt;YouTube operates at an unprecedented scale, storing &lt;strong&gt;petabytes&lt;/strong&gt; and &lt;strong&gt;exabytes&lt;/strong&gt; of video content to cater to its vast user base. To put this into perspective, &lt;strong&gt;a single petabyte is equivalent to one million gigabytes&lt;/strong&gt;, while &lt;strong&gt;an exabyte is one billion gigabytes&lt;/strong&gt;. Managing such immense volumes of data is insane🤯.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So, the question arises:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What's the limit? 🤔&lt;/li&gt;
&lt;li&gt;How do they never lose anything?&lt;/li&gt;
&lt;li&gt;How can any data be accessed instantly from anywhere in the world?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Let's&lt;/strong&gt; delve into a deeper, more fascinating story behind YouTube's seemingly infinite storage capabilities.&lt;/p&gt;

&lt;p&gt;And, &lt;strong&gt;don't worry&lt;/strong&gt;, I'm not gonna fool you into cloud computing XD&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond the Cloud
&lt;/h2&gt;

&lt;p&gt;Well, it made some sense back when the maximum quality was 720p, but now many videos need to be stored in 4K. They must have developed &lt;strong&gt;some special&lt;/strong&gt; compression algorithms or &lt;strong&gt;methods&lt;/strong&gt; to minimize the size.  &lt;/p&gt;

&lt;p&gt;If they were to rely &lt;strong&gt;solely on cloud storage&lt;/strong&gt;, it would require enormous space and be costly, regardless of the company's size, especially considering that anyone can upload vast amounts of data for free.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  First take: Compression Magic
&lt;/h2&gt;

&lt;p&gt;The only reasonable explanation involves &lt;strong&gt;data compression&lt;/strong&gt;. Videos are compressed before storage using cutting-edge codecs like &lt;strong&gt;VP9, H.264, H.265 (HEVC)&lt;/strong&gt;, and &lt;strong&gt;AV1&lt;/strong&gt;. This reduces file size by up to &lt;strong&gt;50%&lt;/strong&gt;, significantly stretching storage capacity without compromising quality. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;However&lt;/strong&gt;, this must be done in a way that barely compromises quality. Even so, no matter how effective the compression is, there is still &lt;strong&gt;some minimal loss&lt;/strong&gt;, a trade-off made to maintain performance and speed.&lt;/p&gt;

&lt;p&gt;This does sound like Pied Piper's revolutionary compression algorithm from the series "Silicon Valley" XD &lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5270nrt6ftol9e5kv5zb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5270nrt6ftol9e5kv5zb.png" alt="Pied piper"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
In addition, YouTube utilizes &lt;strong&gt;advanced transcoding&lt;/strong&gt; and &lt;strong&gt;optimization&lt;/strong&gt; techniques to encode uploaded videos into multiple formats and resolutions, catering to various devices and network conditions. &lt;strong&gt;Adaptive bitrate streaming&lt;/strong&gt; further enhances the user experience by dynamically adjusting video quality based on available bandwidth and device capabilities.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Second take: Storage Tiers
&lt;/h2&gt;

&lt;p&gt;Tiered storage is one of the main factors: videos aren't stored in one &lt;strong&gt;monolithic cloud&lt;/strong&gt;. YouTube employs a tiered system, where frequently accessed content resides in &lt;strong&gt;high-performance, readily accessible storage&lt;/strong&gt; (think lightning-fast SSDs), while &lt;strong&gt;less-viewed videos&lt;/strong&gt; migrate to colder, more cost-effective tiers (like hard drives). This optimizes latency, performance, and storage costs.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
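&lt;p&gt;A toy sketch of tier selection (the thresholds are entirely made up; real systems tier on much richer signals than raw view counts):&lt;/p&gt;

```python
# Hypothetical thresholds: real systems use far richer signals than raw views.
def pick_tier(views_last_30_days):
    """Map a video's recent popularity to a storage tier."""
    if views_last_30_days > 100_000:
        return "hot (SSD)"       # frequently watched: low-latency storage
    if views_last_30_days > 1_000:
        return "warm (HDD)"      # occasional views: cheaper spinning disks
    return "cold (archive)"      # rarely watched: cheapest, slowest tier

for views in (2_500_000, 40_000, 12):
    print(views, "->", pick_tier(views))
```

&lt;p&gt;A background job would periodically re-evaluate each video and migrate it between tiers as its popularity changes.&lt;/p&gt;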

&lt;h2&gt;
  
  
  Third take: Content Lifecycle Management
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content Assessment:&lt;/strong&gt; YouTube constantly analyzes videos to understand their popularity and engagement. Videos with low viewership or engagement are flagged for archival or removal, freeing up space for fresh content.&lt;br&gt;
(&lt;strong&gt;But&lt;/strong&gt; still there are tons of inactive accounts with all their old videos)&lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partner Programs:&lt;/strong&gt; YouTube offers monetization options for creators. Videos enrolled in such programs are typically retained longer due to their potential revenue generation. &lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Technology Advancements:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Emerging Technologies&lt;/strong&gt;: YouTube actively explores cutting-edge technologies like &lt;a href="https://www.popularmechanics.com/technology/infrastructure/a29008852/dna-storage-future/" rel="noopener noreferrer"&gt;DNA storage&lt;/a&gt;, which offers exponentially denser storage compared to traditional methods. While still in its early stages, it holds vast potential for the future.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Moore's Law&lt;/strong&gt;: Storage capacity consistently increases, driven by advancements in hardware technology. This allows YouTube to accommodate growing video libraries while maintaining cost-effectiveness.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  What about availability?
&lt;/h2&gt;

&lt;p&gt;Well, if we're talking just about the availability of this huge amount of data, it comes down to: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Global Network:&lt;/strong&gt; YouTube's storage infrastructure isn't confined to a single location. It's distributed across data centers worldwide, ensuring redundancy and resilience. If one data center experiences an outage, others can seamlessly take over, preventing service interruptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Replication:&lt;/strong&gt; Popular content is replicated across different data centers. This ensures it's readily available to viewers near them, minimizing latency and buffering issues.

&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's the available information?
&lt;/h2&gt;

&lt;p&gt;Google uses &lt;strong&gt;Google File System (GFS)&lt;/strong&gt; and &lt;strong&gt;BigTable&lt;/strong&gt; to manage this large amount of data. They have millions of disks in a &lt;strong&gt;RAID&lt;/strong&gt; configuration across multiple data centers. I found an answer on Twitter from 'TechWelthEngine' that sounds plausible. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"At 4.3 petabytes a day, it takes just over 232 days to get to an exabyte. If we assume that they have 15 EB of storage, then that means it'll take them 9.5 years to fill it all at this pace."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;But&lt;/strong&gt; if this is true, do they have to build a &lt;strong&gt;new 15 EB&lt;/strong&gt; facility &lt;strong&gt;every 9.5 years&lt;/strong&gt;?&lt;br&gt;
I am not really sure. Maybe they will just &lt;strong&gt;dedupe&lt;/strong&gt; any redundant data?&lt;br&gt;
And don't forget that the &lt;strong&gt;4.3 petabytes&lt;/strong&gt; a day will only &lt;strong&gt;increase&lt;/strong&gt; over the coming years, especially with a huge number of videos being created and narrated by &lt;strong&gt;AI&lt;/strong&gt;!&lt;/p&gt;
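&lt;p&gt;The tweet's arithmetic checks out, assuming decimal units (1 EB = 1,000 PB):&lt;/p&gt;

```python
# Sanity-checking the quoted figures, using decimal units (1 EB = 1,000 PB).
PB_PER_EB = 1_000
daily_pb = 4.3                      # ingest per day, as quoted in the tweet

days_per_eb = PB_PER_EB / daily_pb
years_for_15_eb = 15 * PB_PER_EB / daily_pb / 365

print(round(days_per_eb))           # 233, i.e. "just over 232 days" per exabyte
print(round(years_for_15_eb, 1))    # 9.6 years to fill 15 EB at this pace
```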

&lt;p&gt;&lt;strong&gt;And if they are really just constantly upgrading their servers(which obviously they are not) then it explains why we have to watch 2 ads, then 1.5 minutes of the actual video, then 2 ads, then 3 minutes, then the process repeats :)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I believe there must be a way because they can't keep building server farms forever and ever....&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I tried to contact&lt;/strong&gt; YouTube and some senior developers at YouTube to get a clearer view on this, but so far there has been &lt;strong&gt;no response&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hence&lt;/strong&gt;, the questions remain unanswered: just &lt;strong&gt;how long can YouTube hold onto our data in the cloud&lt;/strong&gt;, and &lt;strong&gt;what are YouTube's archival processes?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What do you think about it? Do let me know in the comments.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inspired by twitter/X talks with Ben Weddle.&lt;/p&gt;

&lt;p&gt;If you enjoyed this blog, you can follow me on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ujjawaltyagii" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/ujjawal_tyagiii" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/ujjawal-tyagi/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you'd like to support me, you can &lt;a href="https://github.com/sponsors/ujjawaltyagii" rel="noopener noreferrer"&gt;sponsor me on GitHub&lt;/a&gt; or &lt;a href="https://www.buymeacoffee.com/codewithuj" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>discuss</category>
      <category>cloudstorage</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>Unmasking LinkedIn's Connection Logic 🤯</title>
      <dc:creator>Ujjawal Tyagi</dc:creator>
      <pubDate>Wed, 14 Feb 2024 14:45:19 +0000</pubDate>
      <link>https://dev.to/ujjawaltyagi/unmasking-linkedins-connection-logic-29ff</link>
      <guid>https://dev.to/ujjawaltyagi/unmasking-linkedins-connection-logic-29ff</guid>
      <description>&lt;p&gt;Have you ever wondered how &lt;a href="https://www.linkedin.com/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; knows you and your potential connections are just a few clicks away? It's neither magic, nor some clever engineering hidden behind the scenes. What if I tell you that it's just &lt;strong&gt;&lt;a href="https://www.geeksforgeeks.org/graph-data-structure-and-algorithms/" rel="noopener noreferrer"&gt;Graph&lt;/a&gt;&lt;/strong&gt;? &lt;strong&gt;Yes It is&lt;/strong&gt;!&lt;br&gt;
So let's dive into the fascinating world of &lt;strong&gt;graph algorithms&lt;/strong&gt; and see how LinkedIn connects you to your professional network!&lt;br&gt;
And don't worry It's not going to be too complex.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond 1st, 2nd, and 3rd Degrees:
&lt;/h2&gt;

&lt;p&gt;We all know those little icons next to LinkedIn profiles indicating our connection level (1st, 2nd, or 3rd degree). But how does LinkedIn calculate these connections? It all starts with a powerful tool called a &lt;strong&gt;graph algorithm&lt;/strong&gt;. Imagine a giant map where people are represented as dots and connections are lines. This complex map, known as a graph, stores information about who's connected to whom.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxueqtg1y6htcgyyh5tc2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxueqtg1y6htcgyyh5tc2.png" alt="1st, 2nd, 3rd connections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How is it possible with millions of users and connections?
&lt;/h2&gt;

&lt;p&gt;The challenge of scale is real: with millions of users and connections, navigating this map efficiently becomes hard. To overcome this, LinkedIn uses a special type of graph algorithm called &lt;strong&gt;bi-directional BFS&lt;/strong&gt; &lt;strong&gt;(Breadth-First Search)&lt;/strong&gt;. This algorithm searches simultaneously from you and from your potential connection, meeting somewhere in the middle to determine your connection level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3iq5ay6mexz65rc3a7k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3iq5ay6mexz65rc3a7k.png" alt="connections"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;But wait&lt;/strong&gt;, there's more! As you interact and your network grows, constantly searching the entire graph becomes impractical. So, LinkedIn employs a clever caching strategy. It stores your &lt;strong&gt;second-degree connections&lt;/strong&gt; (friends of your friends) locally, allowing for &lt;strong&gt;faster lookups&lt;/strong&gt; without needing to traverse the entire network every time.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
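&lt;p&gt;Here's a toy bidirectional BFS over a made-up network (the names are invented; LinkedIn's production graph store is, of course, far more elaborate):&lt;/p&gt;

```python
from collections import deque

# Toy network: who is directly connected to whom.
graph = {
    "you":   ["aman", "neha"],
    "aman":  ["you", "ravi"],
    "neha":  ["you"],
    "ravi":  ["aman", "priya"],
    "priya": ["ravi"],
}

def connection_degree(src, dst):
    """Bidirectional BFS: expand from both ends, meet in the middle."""
    if src == dst:
        return 0
    dist_src, dist_dst = {src: 0}, {dst: 0}
    q_src, q_dst = deque([src]), deque([dst])
    while q_src and q_dst:
        # Real implementations expand the smaller frontier; here we alternate.
        for q, dist, other in ((q_src, dist_src, dist_dst),
                               (q_dst, dist_dst, dist_src)):
            for _ in range(len(q)):
                node = q.popleft()
                for nb in graph.get(node, []):
                    if nb in other:               # the two frontiers met
                        return dist[node] + 1 + other[nb]
                    if nb not in dist:
                        dist[nb] = dist[node] + 1
                        q.append(nb)
    return None  # not connected at all

print(connection_degree("you", "priya"))  # 3, i.e. a 3rd-degree connection
```

&lt;p&gt;Because each side only explores half the distance, the visited portion of the graph stays dramatically smaller than with a one-sided search.&lt;/p&gt;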

&lt;h2&gt;
  
  
  Scaling to Millions:
&lt;/h2&gt;

&lt;p&gt;Imagine storing everyone's second-degree connections on a single server! Not feasible. That's why LinkedIn distributes this data across multiple servers, dividing it based on user IDs. &lt;strong&gt;But what happens if a server crashes?&lt;/strong&gt; To ensure redundancy, each shard (portion of data) is replicated on different servers.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real twist: Speed vs Efficiency
&lt;/h2&gt;

&lt;p&gt;Now comes the real twist. While replicating data ensures availability, it also adds complexity. To avoid hitting every server for each query, LinkedIn uses a technique called &lt;strong&gt;set cover&lt;/strong&gt;. This fancy term basically means finding the smallest number of servers that hold all the information needed for your query, minimizing the number of hops and maximizing speed.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Secret Sauce: Greedy Set Cover:
&lt;/h2&gt;

&lt;p&gt;LinkedIn uses a modified version of the greedy set cover algorithm, prioritizing servers that hold connections most relevant to your search. Think of it like finding the shortest route on a map by visiting only the essential points. This clever approach reduces the number of servers needed, making queries faster and more efficient.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
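&lt;p&gt;The greedy heuristic is easy to sketch; the servers and user IDs below are invented for illustration:&lt;/p&gt;

```python
# Toy example: which replicas (servers) cover all the user IDs we need?
needed = {1, 2, 3, 4, 5}
servers = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5},
    "s4": {1, 5},
}

def greedy_set_cover(universe, sets):
    """Repeatedly pick the server covering the most still-uncovered IDs."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(sets, key=lambda name: len(uncovered.intersection(sets[name])))
        if not uncovered.intersection(sets[best]):
            return None  # some IDs aren't held by any server
        chosen.append(best)
        uncovered.difference_update(sets[best])
    return chosen

print(greedy_set_cover(needed, servers))  # ['s1', 's3'] covers all five IDs
```

&lt;p&gt;Exact set cover is NP-hard, so the greedy approximation is the standard practical choice: it answers the query by contacting two servers instead of four.&lt;/p&gt;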

&lt;h2&gt;
  
  
  The End Result: A Connected us!
&lt;/h2&gt;

&lt;p&gt;Thanks to these complex algorithms and clever caching strategies, LinkedIn can efficiently navigate its massive network and show you relevant connections within milliseconds. So, the next time you see those degree icons, remember the invisible technology working tirelessly to connect you with your professional world!&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  And for the tech-savvy:
&lt;/h2&gt;

&lt;p&gt;A question: &lt;strong&gt;Does Facebook's 'Friends of friends' feature work the same way?&lt;/strong&gt;&lt;br&gt;
Or do they use TAO (Facebook's graph store, built over a Memcached-style cache) or something else entirely?&lt;br&gt;
What do you think about it?&lt;/p&gt;

&lt;p&gt;If you're curious about the nitty-gritty, the &lt;a href="https://engineering.linkedin.com/real-time-distributed-graph/using-set-cover-algorithm-optimize-query-latency-large-scale-distributed" rel="noopener noreferrer"&gt;research paper&lt;/a&gt; linked in the original content delves deeper into the specific algorithms and optimizations used by LinkedIn. &lt;/p&gt;

&lt;p&gt;But for everyone else, hopefully, this blog has shed some light on the magic behind those connection degrees!&lt;br&gt;
There still might be things I'm missing, so do let me know in the comments.&lt;/p&gt;

&lt;p&gt;Inspired by various online discussions and Gaurav Sen.&lt;/p&gt;

&lt;p&gt;If you enjoyed this blog, you can follow me on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ujjawaltyagii" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/ujjawal_tyagiii" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/ujjawal-tyagi/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;It's Valentine's Day and I'm not feeling lonely because my keyboard is definitely getting touched tonight XD&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

#SingleCodersDay


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you'd like to support me, you can &lt;a href="https://github.com/sponsors/ujjawaltyagii" rel="noopener noreferrer"&gt;sponsor me on GitHub&lt;/a&gt; or &lt;a href="https://www.buymeacoffee.com/codewithuj" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>algorithms</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Docker alternative Podman on rise 🚀: The Future of DevOps?</title>
      <dc:creator>Ujjawal Tyagi</dc:creator>
      <pubDate>Mon, 05 Feb 2024 22:33:43 +0000</pubDate>
      <link>https://dev.to/ujjawaltyagi/docker-alternative-podman-on-rise-the-future-of-devops-31i2</link>
      <guid>https://dev.to/ujjawaltyagi/docker-alternative-podman-on-rise-the-future-of-devops-31i2</guid>
      <description>&lt;p&gt;As a developer, I've long relied on &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; for its robust features and ease of use. However, with the emergence of &lt;a href="https://podman.io/" rel="noopener noreferrer"&gt;Podman&lt;/a&gt;, a new player in the containerization arena, the landscape is shifting. In this article, I'll delve into my experiences with both Docker and Podman, highlighting their key differences, advantages, and potential impact on the future of DevOps.&lt;br&gt;
Sit tight &amp;amp; explore!&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's wrong with Docker?
&lt;/h2&gt;

&lt;p&gt;For years, Docker has been my go-to tool for containerizing applications. Its intuitive interface, extensive community support, and seamless integration with orchestration tools like Kubernetes have made it an indispensable part of my workflow. From developing microservices to deploying scalable applications, Docker has been my trusted companion.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbxzq1gld1ni02ihbw0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbxzq1gld1ni02ihbw0f.png" alt="docker pic"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;But is it enough?&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What about:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the root access you need every time?&lt;/li&gt;
&lt;li&gt;the dependence on a daemon process?&lt;/li&gt;
&lt;li&gt;the attack surface it exposes to vulnerabilities?&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;



&lt;h2&gt;
  
  
  Podman: The New Kid on the Block
&lt;/h2&gt;

&lt;p&gt;Enter Podman, a container engine that promises a fresh perspective on containerization. Initially, I was skeptical about Podman's capabilities, especially given its lack of a dedicated desktop application and limited compatibility with orchestration tools. &lt;strong&gt;However&lt;/strong&gt;, recent updates have transformed my perception.&lt;/p&gt;

&lt;p&gt;Despite my allegiance to Docker, I couldn't ignore the buzz surrounding Podman. Intrigued by its security features, lightweight architecture, and rootless operation, I decided to give it a try. To my surprise, &lt;strong&gt;transitioning from Docker to Podman&lt;/strong&gt; was smoother than expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vlfqejbabis4oak9mca.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vlfqejbabis4oak9mca.jpg" alt="docker vs podman"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Is this transition worth it?
&lt;/h2&gt;

&lt;p&gt;Well, you don't have to worry much about that: Podman's CLI is designed to be compatible with Docker's, so the vast majority of Docker commands work unchanged, and most tooling that talks to a Docker-style CLI works with Podman as well.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
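&lt;p&gt;For example (illustrative commands only; the image name and flags are arbitrary):&lt;/p&gt;

```shell
# The Podman CLI mirrors Docker's almost verbatim:
docker build -t myapp .           # Docker
podman build -t myapp .           # Podman: same subcommand, same flags

docker run -d -p 8080:80 myapp    # Docker
podman run -d -p 8080:80 myapp    # Podman equivalent

# Many people simply alias one to the other:
alias docker=podman
```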

&lt;h2&gt;
  
  
  Security Matters:
&lt;/h2&gt;

&lt;p&gt;One of Podman's standout features is its enhanced security model. Unlike Docker, which requires root access for container management, Podman operates in a rootless fashion, significantly reducing the attack surface and minimizing security risks. For organizations prioritizing security, Podman offers a compelling alternative.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring Pods: Kubernetes Integration
&lt;/h2&gt;

&lt;p&gt;While Docker's orchestration capabilities have long been lauded, Podman introduces a new concept: Pods. As in Kubernetes, a Podman Pod lets multiple containers share the same network namespace, volumes, and even port mappings, simplifying complex deployments and enabling seamless scaling.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
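&lt;p&gt;A quick illustrative sketch (the pod name and images are made up for the example):&lt;/p&gt;

```shell
# Containers in one pod share the pod's network namespace and ports.
podman pod create --name mypod -p 8080:80   # ports are published on the pod
podman run -d --pod mypod nginx             # joins mypod's network namespace
podman run -d --pod mypod redis             # reachable from nginx via localhost
```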

&lt;h2&gt;
  
  
  The Future of Containerization
&lt;/h2&gt;

&lt;p&gt;Whether you are team Docker or team Podman, one thing is clear: adaptability is key. For seasoned Docker enthusiasts and curious newcomers alike, exploring what Podman offers can lead to new insights, improved workflows, and enhanced security. In the dynamic world of DevOps, embracing innovation is not just a choice; it's a necessity.&lt;br&gt;
&lt;br&gt;&lt;br&gt;
Now, you've got everything you need to start your Podman journey.&lt;br&gt;
There are more things to cover but that's a story for another time!&lt;/p&gt;

&lt;p&gt;I'll be back with more technicalities about Podman; till then, see ya!&lt;br&gt;
&lt;/p&gt;
&lt;p&gt;If you enjoyed this blog, you can follow me on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/ujjawaltyagii" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://twitter.com/ujjawal_tyagiii" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/ujjawal-tyagi/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you'd like to support me, you can &lt;a href="https://github.com/sponsors/ujjawaltyagii" rel="noopener noreferrer"&gt;sponsor me on GitHub&lt;/a&gt; or &lt;a href="https://www.buymeacoffee.com/codewithuj" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>opensource</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
