<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Siddhant Sharma</title>
    <description>The latest articles on DEV Community by Siddhant Sharma (@technicaldost).</description>
    <link>https://dev.to/technicaldost</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2313992%2Ff655e3ad-4254-4a02-a8c4-418281ebc83c.jpg</url>
      <title>DEV Community: Siddhant Sharma</title>
      <link>https://dev.to/technicaldost</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/technicaldost"/>
    <language>en</language>
    <item>
      <title>How to automatically monitor new ML research papers on arXiv by keyword</title>
      <dc:creator>Siddhant Sharma</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:20:12 +0000</pubDate>
      <link>https://dev.to/technicaldost/how-to-automatically-monitor-new-ml-research-papers-on-arxiv-by-keyword-1ll8</link>
      <guid>https://dev.to/technicaldost/how-to-automatically-monitor-new-ml-research-papers-on-arxiv-by-keyword-1ll8</guid>
      <description>&lt;h2&gt;
  
  
  Staying on Top of ML Research
&lt;/h2&gt;

&lt;p&gt;With ~10,000 new papers posted to arXiv every month, staying current in your specific niche through manual browsing is nearly impossible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Automation
&lt;/h2&gt;

&lt;p&gt;I built an arXiv scraper on Apify with the following features:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Keyword search&lt;/strong&gt;: Define the topics you care about (e.g., "diffusion models", "LLM alignment", "RLHF")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled runs&lt;/strong&gt;: Set it to check daily or hourly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured output&lt;/strong&gt;: Returns paper title, authors, abstract, arXiv URL, PDF link, and categories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy integration&lt;/strong&gt;: JSON output works with any webhook, Slack bot, or Notion database&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Example use: Slack Bot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Run the scraper
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.apify.com/v2/acts/technicaldost~arxiv-paper-scraper/run-sync&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
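    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keywords&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diffusion models&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;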
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Post to Slack
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;paper&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_SLACK_WEBHOOK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*New paper*: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;paper&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
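
&lt;p&gt;Posting one message per paper can get noisy in a busy channel. Here is a minimal sketch of batching the results into a single digest message, assuming each item carries &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;url&lt;/code&gt; keys as in the loop above. &lt;code&gt;build_digest&lt;/code&gt; is a hypothetical helper for illustration, not part of the scraper.&lt;/p&gt;

```python
def build_digest(papers, limit=10):
    """Build one Slack webhook payload from a list of paper dicts.

    Assumes each paper has "title" and "url" keys (an assumption about
    the scraper's output); items missing either key are skipped.
    """
    lines = []
    for paper in papers[:limit]:
        title = paper.get("title")
        url = paper.get("url")
        if not title or not url:
            continue
        # Collapse whitespace, since titles may wrap across lines.
        clean_title = " ".join(title.split())
        lines.append(f"- {clean_title}\n  {url}")
    if not lines:
        return {"text": "No new papers today."}
    header = f"*{len(lines)} new paper(s)*"
    return {"text": header + "\n" + "\n".join(lines)}
```

&lt;p&gt;With this in place, the loop could be swapped for a single call such as &lt;code&gt;requests.post("YOUR_SLACK_WEBHOOK", json=build_digest(result.json()))&lt;/code&gt;.&lt;/p&gt;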



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Researchers and engineers waste hours browsing arXiv. An automated pipeline means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Far fewer missed papers in your niche&lt;/li&gt;
&lt;li&gt;Daily digest delivered to your preferred platform&lt;/li&gt;
&lt;li&gt;Easy collaboration with teams (shared paper feeds)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Try it on the &lt;a href="https://apify.com/technicaldost/arxiv-paper-scraper" rel="noopener noreferrer"&gt;Apify Store&lt;/a&gt; — free tier available.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>research</category>
      <category>automation</category>
    </item>
    <item>
      <title>How to automatically detect any company's tech stack and logo from just their domain name</title>
      <dc:creator>Siddhant Sharma</dc:creator>
      <pubDate>Wed, 24 Jun 2026 16:04:21 +0000</pubDate>
      <link>https://dev.to/technicaldost/how-to-automatically-detect-any-companys-tech-stack-and-logo-from-just-their-domain-name-25f5</link>
      <guid>https://dev.to/technicaldost/how-to-automatically-detect-any-companys-tech-stack-and-logo-from-just-their-domain-name-25f5</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When you're doing sales prospecting, competitor research, or lead generation, one of the most tedious tasks is manually visiting each company's website to figure out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What tech stack do they use?&lt;/li&gt;
&lt;li&gt;What's their company logo for your CRM?&lt;/li&gt;
&lt;li&gt;Where are they on social media?&lt;/li&gt;
&lt;li&gt;How do I contact them?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;I built a simple API that takes a domain name and returns all this data automatically. Let me walk you through how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt;: A company domain name (e.g., &lt;code&gt;example.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process&lt;/strong&gt;: The scraper visits the website and analyzes its HTML, headers, and scripts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Tech stack detection, company logo URL, social profiles, contact info, and industry classification&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Tech Stack Detection
&lt;/h3&gt;

&lt;p&gt;The API uses pattern matching against 50+ known indicators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Framework-specific meta tags and script patterns&lt;/li&gt;
&lt;li&gt;CDN and hosting headers&lt;/li&gt;
&lt;li&gt;Analytics and tracking scripts&lt;/li&gt;
&lt;li&gt;CMS signatures&lt;/li&gt;
&lt;/ul&gt;
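
&lt;p&gt;To make the idea concrete, here is a minimal sketch of indicator-based detection. The &lt;code&gt;INDICATORS&lt;/code&gt; table and the &lt;code&gt;detect_tech&lt;/code&gt; function are illustrative assumptions, not the API's actual rule set, and cover only a handful of well-known signatures.&lt;/p&gt;

```python
import re

# Illustrative indicators only; the real list is not shown in this post.
# Each rule is (source, regex), where source is "html" or "header:NAME".
INDICATORS = {
    "WordPress": [("html", r"wp-content|wp-includes")],
    "Next.js": [("html", r"/_next/static/|__NEXT_DATA__")],
    "Google Analytics": [("html", r"googletagmanager\.com/gtag/js|google-analytics\.com")],
    "Cloudflare": [("header:server", r"cloudflare")],
    "Vercel": [("header:x-vercel-id", r".+")],
}


def detect_tech(html, headers):
    """Return the sorted names of technologies whose patterns match."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    found = set()
    for tech, rules in INDICATORS.items():
        for source, pattern in rules:
            if source == "html":
                haystack = html
            else:
                haystack = lowered.get(source.split(":", 1)[1], "")
            if re.search(pattern, haystack, re.IGNORECASE):
                found.add(tech)
                break  # one matching rule is enough for this technology
    return sorted(found)
```

&lt;p&gt;Keeping the rules in a plain table means a new signature is one extra entry rather than new code.&lt;/p&gt;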

&lt;h3&gt;
  
  
  Logo Extraction
&lt;/h3&gt;

&lt;p&gt;Multiple fallback strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Graph image tags&lt;/li&gt;
&lt;li&gt;Apple touch icons&lt;/li&gt;
&lt;li&gt;JSON-LD structured data&lt;/li&gt;
&lt;li&gt;Clearbit API fallback&lt;/li&gt;
&lt;/ol&gt;
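
&lt;p&gt;The fallback order above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the API's implementation: &lt;code&gt;find_logo&lt;/code&gt; and its &lt;code&gt;fallback_url&lt;/code&gt; parameter are hypothetical names, and &lt;code&gt;fallback_url&lt;/code&gt; stands in for the final external lookup step.&lt;/p&gt;

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LogoCandidates(HTMLParser):
    """Collect logo candidates from meta, link, and JSON-LD script tags."""

    def __init__(self):
        super().__init__()
        self.og_image = None
        self.touch_icon = None
        self.jsonld_logo = None
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:image":
            self.og_image = self.og_image or attr.get("content")
        elif tag == "link" and "apple-touch-icon" in (attr.get("rel") or ""):
            self.touch_icon = self.touch_icon or attr.get("href")
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buffer))
            except ValueError:
                return  # ignore malformed JSON-LD
            logo = doc.get("logo") if isinstance(doc, dict) else None
            # schema.org allows "logo" to be a URL string or an ImageObject.
            if isinstance(logo, dict):
                logo = logo.get("url")
            if isinstance(logo, str):
                self.jsonld_logo = self.jsonld_logo or logo


def find_logo(html, base_url, fallback_url=None):
    """Return the first logo URL found, trying each strategy in order."""
    parser = _LogoCandidates()
    parser.feed(html)
    parser.close()
    for candidate in (parser.og_image, parser.touch_icon, parser.jsonld_logo):
        if candidate:
            # Resolve relative paths such as "/icon.png" against the site URL.
            return urljoin(base_url, candidate)
    return fallback_url
```

&lt;p&gt;Each strategy only runs when the earlier ones return nothing, so the most specific source wins.&lt;/p&gt;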

&lt;h2&gt;
  
  
  API Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.apify.com/v2/acts/technicaldost~company-intelligence-api/run-sync&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
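    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domains&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;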
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;techStack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
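
&lt;p&gt;Because the input takes a list of domains, the response may come back as a list of results rather than a single object. Here is a small defensive sketch that handles both shapes. The &lt;code&gt;summarize&lt;/code&gt; helper and the &lt;code&gt;domain&lt;/code&gt; key are assumptions for illustration; &lt;code&gt;techStack&lt;/code&gt; and &lt;code&gt;logo&lt;/code&gt; are the keys used in the snippet above.&lt;/p&gt;

```python
def summarize(data):
    """Return (domain, tech stack, logo) tuples from a response body.

    Accepts either a single result dict or a list of them, since the
    exact response shape is an assumption here.
    """
    results = data if isinstance(data, list) else [data]
    rows = []
    for item in results:
        if not isinstance(item, dict):
            continue  # skip anything that is not a result object
        # .get() avoids a KeyError when a field could not be detected.
        rows.append((item.get("domain"), item.get("techStack", []), item.get("logo")))
    return rows
```

&lt;p&gt;Calling &lt;code&gt;summarize(response.json())&lt;/code&gt; then yields one row per domain, with an empty tech stack or a missing logo showing up as &lt;code&gt;[]&lt;/code&gt; or &lt;code&gt;None&lt;/code&gt; instead of raising.&lt;/p&gt;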



&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;As someone who builds web scrapers and automation tools, I found myself repeatedly writing the same domain-analysis code for different projects. This API consolidates all that into one endpoint.&lt;/p&gt;

&lt;p&gt;Check it out on the &lt;a href="https://apify.com/technicaldost/company-intelligence-api" rel="noopener noreferrer"&gt;Apify Store&lt;/a&gt; — free tier available with 1000 results/month.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>api</category>
      <category>scraping</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
