<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 👨🏻‍💻</title>
    <description>The latest articles on DEV Community by 👨🏻‍💻 (@decipher).</description>
    <link>https://dev.to/decipher</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1383391%2Fada855b7-95c6-4014-959b-4997adbc6d33.png</url>
      <title>DEV Community: 👨🏻‍💻</title>
      <link>https://dev.to/decipher</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/decipher"/>
    <language>en</language>
    <item>
      <title>Facilitating Real-Time Competitive Analysis</title>
      <dc:creator>👨🏻‍💻</dc:creator>
      <pubDate>Tue, 30 Apr 2024 15:28:53 +0000</pubDate>
      <link>https://dev.to/decipher/enabling-real-time-competitive-analysis-56d0</link>
      <guid>https://dev.to/decipher/enabling-real-time-competitive-analysis-56d0</guid>
      <description>&lt;p&gt;When I joined the team, it had recently launched a digital financial portal that helps consumers find suitable financial and insurance products.&lt;/p&gt;

&lt;h2&gt;Issue Description&lt;/h2&gt;

&lt;p&gt;One goal was to understand how our offerings compared with our competitor's. This required focusing on two aspects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Comparing our offerings with the competitor's offerings sourced from the same vendors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring changes in vendor offerings available through our competitor.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Existing Approach and Obstacles&lt;/h2&gt;

&lt;p&gt;The existing method relied on a handful of user profiles representing different potential user types. Twice a day, a team member manually entered these details into both our platform and the competitor's, then compiled the results into a standardized Google Sheet for our analysts. This approach faced several challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Limited Insights due to Fixed User Profiles&lt;/strong&gt;: A small set of fictitious personas restricted our view to those specific cases, leaving broader user segments unexamined.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lack of Scalability&lt;/strong&gt;: The manual process was not scalable, making it hard to increase the frequency of data collection or expand the number of user profiles beyond a few. Consequently, our ability to capture real-time market dynamics and adapt to evolving user profiles was limited.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Resolution&lt;/h2&gt;

&lt;p&gt;To overcome these challenges, I introduced the following system:&lt;/p&gt;

&lt;h3&gt;Using Real User Profiles and an Auto-Scaling Crawler&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3601jphbyxfw53auapj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3601jphbyxfw53auapj.png" alt="System Design" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of relying on a handful of artificial personas, I proposed using pseudonymized real user data drawn from our portal. An auto-scaling crawler then fetched competitor offers in real time, eliminating both the manual data entry and the cap on collection frequency. This ensured our analysis reflected current market conditions and the competitive landscape.&lt;/p&gt;
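&lt;p&gt;As a sketch of the pseudonymization step (the field names and salting scheme here are illustrative assumptions, not our actual schema), each user identifier can be replaced with a salted hash token before profiles enter the crawler pipeline:&lt;/p&gt;

```scala
import java.security.MessageDigest

// Illustrative pseudonymization: field names and the salting scheme
// are assumptions for this sketch, not the portal's actual schema.
case class UserProfile(userId: String, age: Int, income: Int)
case class PseudonymizedProfile(token: String, age: Int, income: Int)

object Pseudonymizer {
  // A per-deployment secret salt makes tokens hard to reverse by
  // brute-forcing known user ids.
  private val salt = "example-secret-salt"

  // Deterministic SHA-256 token: the same user always maps to the
  // same token, so repeated crawls can be joined downstream.
  def tokenize(userId: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    digest.digest((salt + userId).getBytes("UTF-8")).map(b => f"$b%02x").mkString
  }

  def pseudonymize(p: UserProfile): PseudonymizedProfile =
    PseudonymizedProfile(tokenize(p.userId), p.age, p.income)
}
```

&lt;p&gt;Because the token is deterministic, repeated crawls for the same user can be joined in the warehouse without ever exposing the raw identifier.&lt;/p&gt;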

&lt;h3&gt;Benefits&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced Data Accuracy&lt;/strong&gt;: Leveraging real user data improved the relevance and accuracy of our analysis, enabling more informed decision-making.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Improved Scalability&lt;/strong&gt;: Automation let data collection scale to cover the full set of user profiles rather than a handful.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-Time Insights&lt;/strong&gt;: Real-time offer fetching provided immediate visibility into competitor strategies and market conditions.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Obstacles&lt;/h3&gt;

&lt;p&gt;The system faced two significant challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;IP Address Blocking&lt;/strong&gt;: The competitor's website blocked the IP range of our Lambda workers, halting the crawler. We routed requests through a rotating IP proxy service so that no single address could be banned.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dynamic Website Updates&lt;/strong&gt;: The crawler broke whenever the competitor changed its page elements or network request contracts. We routed failed crawl attempts to a dead letter queue and monitored it, so breaking changes were identified and adapted to quickly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
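&lt;p&gt;The dead-letter pattern above can be sketched as follows. In production the queue was a Kafka topic; here an in-memory buffer (and the illustrative type names) stand in:&lt;/p&gt;

```scala
import scala.collection.mutable
import scala.util.{Failure, Success, Try}

// Sketch of the dead-letter pattern: failed crawl attempts are
// captured with their error instead of being lost, so a monitor can
// alert on queue growth. Names here are illustrative assumptions.
case class CrawlJob(profileToken: String, url: String)
case class DeadLetter(job: CrawlJob, error: String)

class CrawlerRunner(fetch: CrawlJob => String) {
  val deadLetters: mutable.Buffer[DeadLetter] = mutable.Buffer.empty

  // Run one crawl job; on failure, record it in the dead letter
  // buffer and return None instead of propagating the exception.
  def run(job: CrawlJob): Option[String] =
    Try(fetch(job)) match {
      case Success(body) => Some(body)
      case Failure(e) =>
        deadLetters += DeadLetter(job, e.getMessage)
        None
    }
}
```

&lt;p&gt;A monitor alerting on dead-letter growth then turns silent crawler breakage into an actionable signal.&lt;/p&gt;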

&lt;p&gt;Despite these obstacles, we successfully crawled the competitor's portal for 95% of our user profiles, with an end-to-end latency of 90 seconds at the 99th percentile.&lt;/p&gt;
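&lt;p&gt;For reference, a 99th-percentile figure like the one quoted above can be computed with the standard nearest-rank method over the collected end-to-end latency samples (a generic sketch, not our monitoring code):&lt;/p&gt;

```scala
// Nearest-rank percentile over latency samples (in seconds).
// Generic metric sketch, not the actual monitoring implementation.
def percentile(samples: Seq[Double], p: Double): Double = {
  require(samples.nonEmpty)
  val sorted = samples.sorted
  // rank is 1-based; clamp it to a valid index of the sorted sequence
  val rank = math.ceil(p / 100.0 * sorted.size).toInt
  sorted(math.min(math.max(rank - 1, 0), sorted.size - 1))
}
```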

&lt;h3&gt;Technology Stack&lt;/h3&gt;

&lt;p&gt;The solution incorporated the following components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://www.confluent.io/confluent-cloud/"&gt;&lt;strong&gt;Kafka&lt;/strong&gt;&lt;/a&gt;, for real-time communication.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.scala-lang.org/"&gt;&lt;strong&gt;Scala&lt;/strong&gt;&lt;/a&gt;, for powering our stream processor.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.cypress.io/"&gt;&lt;strong&gt;Cypress&lt;/strong&gt;&lt;/a&gt;, for web crawling.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://proxymesh.com/"&gt;&lt;strong&gt;Proxymesh&lt;/strong&gt;&lt;/a&gt;, for a rotating IP proxy service to bypass IP ban.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.docker.com/"&gt;&lt;strong&gt;Docker&lt;/strong&gt;&lt;/a&gt;, for containerization.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/lambda/"&gt;&lt;strong&gt;AWS Lambda&lt;/strong&gt;&lt;/a&gt;, for enabling serverless execution of the crawler.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.snowflake.com/en/"&gt;&lt;strong&gt;Snowflake&lt;/strong&gt;&lt;/a&gt;, for data warehousing.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.getdbt.com/"&gt;&lt;strong&gt;dbt&lt;/strong&gt;&lt;/a&gt;, for automating data transformation pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://argoproj.github.io/workflows/"&gt;&lt;strong&gt;Argo Workflows&lt;/strong&gt;&lt;/a&gt;, for orchestrating DBT jobs.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/looker/docs"&gt;&lt;strong&gt;Looker&lt;/strong&gt;&lt;/a&gt;, for business intelligence and data visualization.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>systemdesign</category>
      <category>kafka</category>
      <category>dataengineering</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
