<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sumit Sapkota</title>
    <description>The latest articles on DEV Community by Sumit Sapkota (@sumit_sapkota_ba51d1ae503).</description>
    <link>https://dev.to/sumit_sapkota_ba51d1ae503</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3694712%2F3d31c211-0f1b-4a4a-bfa3-6a94ab346345.jpg</url>
      <title>DEV Community: Sumit Sapkota</title>
      <link>https://dev.to/sumit_sapkota_ba51d1ae503</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sumit_sapkota_ba51d1ae503"/>
    <language>en</language>
    <item>
      <title>Building a Roboflow Universe Search Agent: Automating ML Model Discovery</title>
      <dc:creator>Sumit Sapkota</dc:creator>
      <pubDate>Mon, 05 Jan 2026 16:43:46 +0000</pubDate>
      <link>https://dev.to/sumit_sapkota_ba51d1ae503/building-a-roboflow-universe-search-agent-automating-ml-model-discovery-2hj1</link>
      <guid>https://dev.to/sumit_sapkota_ba51d1ae503/building-a-roboflow-universe-search-agent-automating-ml-model-discovery-2hj1</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;As a machine learning enthusiast, I often find myself browsing &lt;a href="https://universe.roboflow.com" rel="noopener noreferrer"&gt;Roboflow Universe&lt;/a&gt; looking for pre-trained models. But manually searching, clicking through pages, and copying API endpoints is tedious. I wanted a way to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for models by keywords&lt;/li&gt;
&lt;li&gt;Extract detailed information (metrics, classes, API endpoints)&lt;/li&gt;
&lt;li&gt;Get structured data I could use programmatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built a Python web scraper that does exactly that! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Roboflow Universe Search Agent&lt;/strong&gt; is a Python tool that:&lt;/p&gt;

&lt;p&gt;✅ Searches Roboflow Universe with custom keywords&lt;br&gt;&lt;br&gt;
✅ Extracts model details (title, author, metrics, classes)&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Finds API endpoints using multiple extraction strategies&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
✅ Outputs structured JSON data&lt;br&gt;&lt;br&gt;
✅ Handles retries and errors gracefully  &lt;/p&gt;
&lt;h2&gt;
  
  
  The Challenge: Finding API Endpoints
&lt;/h2&gt;

&lt;p&gt;The trickiest part was reliably extracting API endpoints. Roboflow displays them in various places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JavaScript code snippets&lt;/li&gt;
&lt;li&gt;Model ID variables&lt;/li&gt;
&lt;li&gt;Input fields&lt;/li&gt;
&lt;li&gt;Page text&lt;/li&gt;
&lt;li&gt;Legacy endpoint formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I needed a robust solution that wouldn't break if the website structure changed.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Solution: Multi-Strategy Extraction
&lt;/h2&gt;

&lt;p&gt;Instead of relying on a single method, I implemented &lt;strong&gt;6 different extraction strategies&lt;/strong&gt; with fallbacks:&lt;/p&gt;
&lt;h3&gt;
  
  
  Strategy 1: JavaScript Code Blocks
&lt;/h3&gt;

&lt;p&gt;The most reliable source - API endpoints appear in code snippets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;js_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url:\s*[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']https://serverless\.roboflow\.com/([^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\'?\s]+)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="s"&gt;https://serverless\.roboflow\.com/([^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\'?\s]+)&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://serverless\.roboflow\.com/([a-z0-9\-_]+/\d+)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Strategy 2: Model ID Patterns
&lt;/h3&gt;

&lt;p&gt;Extract from JavaScript variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model_id_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_id[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']?\s*[:=]\s*[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']([a-z0-9\-_]+/\d+)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MODEL_ENDPOINT[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']?\s*[:=]\s*[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']([a-z0-9\-_]+/\d+)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\']&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Strategy 3: Input Fields &amp;amp; Textareas
&lt;/h3&gt;

&lt;p&gt;Check form elements and code blocks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;input_selectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input[value*=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;serverless.roboflow.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;textarea&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Strategy 4: Page Text Search
&lt;/h3&gt;

&lt;p&gt;Fallback to visible text on the page&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategy 5: Legacy Endpoints
&lt;/h3&gt;

&lt;p&gt;Support older endpoint formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;detect.roboflow.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;classify.roboflow.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;segment.roboflow.com&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Strategy 6: URL Construction
&lt;/h3&gt;

&lt;p&gt;Build endpoint from page URL structure if all else fails&lt;/p&gt;

&lt;p&gt;This multi-strategy approach ensures we find the API endpoint even if the page structure changes!&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt;: Browser automation (more reliable than requests for dynamic content)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.7+&lt;/strong&gt;: Core language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regex&lt;/strong&gt;: Pattern matching for extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basic Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Search for basketball detection models&lt;/span&gt;
&lt;span class="nv"&gt;SEARCH_KEYWORDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"basketball model object detection"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;MAX_PROJECTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5 &lt;span class="se"&gt;\&lt;/span&gt;
python roboflow_search_agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JSON Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get structured JSON output&lt;/span&gt;
&lt;span class="nv"&gt;SEARCH_KEYWORDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"soccer ball instance segmentation"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;OUTPUT_JSON&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
python roboflow_search_agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"project_title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Basketball Detection"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://universe.roboflow.com/workspace/basketball-detection"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"John Doe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"project_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Object Detection"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"has_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mAP"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"85.2%"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"precision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"87.1%"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"recall"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"83.5%"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"training_images"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"classes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"basketball"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"player"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"api_endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://serverless.roboflow.com/basketball-detection/1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model_identifier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"workspace/basketball-detection"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Intelligent Search
&lt;/h3&gt;

&lt;p&gt;The tool applies the "Has a Model" filter automatically and handles keyword prioritization.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Comprehensive Data Extraction
&lt;/h3&gt;

&lt;p&gt;Extracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance metrics (mAP@50, Precision, Recall)&lt;/li&gt;
&lt;li&gt;Training data info (image count, classes)&lt;/li&gt;
&lt;li&gt;Project metadata (author, update time, tags)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API endpoints&lt;/strong&gt; (the hard part!)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Robust Error Handling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatic retries (3 attempts)&lt;/li&gt;
&lt;li&gt;Graceful failure handling&lt;/li&gt;
&lt;li&gt;Timeout management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Flexible Output
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Human-readable console output&lt;/li&gt;
&lt;li&gt;JSON format for programmatic use&lt;/li&gt;
&lt;li&gt;Configurable via environment variables&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Technical Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Browser Automation with Playwright
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect_browser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;playwright&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;playwright&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--no-sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--disable-setuid-sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;viewport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1440&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;playwright&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Smart Scrolling
&lt;/h3&gt;

&lt;p&gt;Instead of fixed waits, the scraper detects when content stops loading:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scroll_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_scrolls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;last_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_scrolls&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window.scrollBy(0, window.innerHeight)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;new_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document.body.scrollHeight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_height&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;last_height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;last_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_height&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multiple Strategies &amp;gt; Single Strategy&lt;/strong&gt;: Having fallbacks makes the scraper much more reliable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright &amp;gt; Requests&lt;/strong&gt;: For dynamic sites, browser automation is essential&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern Matching&lt;/strong&gt;: Regex patterns need careful testing with real data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling&lt;/strong&gt;: Web scraping is fragile - always have retry logic&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Research&lt;/strong&gt;: Quickly find models for specific tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Discovery&lt;/strong&gt;: Extract endpoints for integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Comparison&lt;/strong&gt;: Compare metrics across multiple models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: Integrate into ML pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https:https://github.com/SumitS10/Roboflow-.git
&lt;span class="nb"&gt;cd &lt;/span&gt;roboflow

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Install Playwright browsers&lt;/span&gt;
playwright &lt;span class="nb"&gt;install &lt;/span&gt;chromium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Future Improvements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Add filtering by metrics (e.g., mAP &amp;gt; 80%)&lt;/li&gt;
&lt;li&gt;[ ] Support for batch processing multiple searches&lt;/li&gt;
&lt;li&gt;[ ] Export to CSV/Excel&lt;/li&gt;
&lt;li&gt;[ ] Add model comparison features&lt;/li&gt;
&lt;li&gt;[ ] Cache results to avoid re-scraping&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building this scraper taught me a lot about web scraping, browser automation, and handling edge cases. The multi-strategy approach for API extraction was key to making it reliable.&lt;/p&gt;

&lt;p&gt;If you're working with Roboflow models or need to automate model discovery, give it a try! Contributions and feedback are welcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/SumitS10/Roboflow-.git" rel="noopener noreferrer"&gt;https://github.com/SumitS10/Roboflow-.git&lt;/a&gt;&lt;br&gt;
🌐 &lt;strong&gt;Roboflow Universe&lt;/strong&gt;: &lt;a href="https://universe.roboflow.com" rel="noopener noreferrer"&gt;universe.roboflow.com&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;: &lt;code&gt;#python&lt;/code&gt; &lt;code&gt;#webscraping&lt;/code&gt; &lt;code&gt;#machinelearning&lt;/code&gt; &lt;code&gt;#roboflow&lt;/code&gt; &lt;code&gt;#playwright&lt;/code&gt; &lt;code&gt;#automation&lt;/code&gt; &lt;code&gt;#api&lt;/code&gt; &lt;code&gt;#ml&lt;/code&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
