<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mohammad Ehsan Ansari</title>
    <description>The latest articles on DEV Community by Mohammad Ehsan Ansari (@mohammad_ehsanansari_671).</description>
    <link>https://dev.to/mohammad_ehsanansari_671</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3067582%2Fd2c3c12b-ade4-438e-bbe6-a1b7207b7cdc.jpg</url>
      <title>DEV Community: Mohammad Ehsan Ansari</title>
      <link>https://dev.to/mohammad_ehsanansari_671</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mohammad_ehsanansari_671"/>
    <language>en</language>
    <item>
      <title>I Tried Cleaning 5,000+ Rows of Messy Excel Data… and Realized the Problem Is Bigger Than I Thought</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Tue, 09 Dec 2025 06:39:00 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/i-tried-cleaning-5000-rows-of-messy-excel-data-and-realized-the-problem-is-bigger-than-i-thought-1lml</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/i-tried-cleaning-5000-rows-of-messy-excel-data-and-realized-the-problem-is-bigger-than-i-thought-1lml</guid>
      <description>&lt;p&gt;I’ve spent the last few months cleaning spreadsheets — not because I love it (I don’t), but because I was building something that required understanding &lt;em&gt;how bad real-world data actually is&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If you’ve ever worked with CSVs exported from different tools, you already know the pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;male&lt;/code&gt;, &lt;code&gt;Male&lt;/code&gt;, &lt;code&gt;MALE&lt;/code&gt;, &lt;code&gt;m&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;₹20,000&lt;/code&gt;, &lt;code&gt;20k&lt;/code&gt;, &lt;code&gt;20000.00&lt;/code&gt;, &lt;code&gt;₹ 20,000/-&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Dates like &lt;code&gt;12/03/24&lt;/code&gt;, &lt;code&gt;2024-03-12&lt;/code&gt;, and &lt;code&gt;03-12-2024&lt;/code&gt; all in the same sheet
&lt;/li&gt;
&lt;li&gt;Category columns where “Electronics”, “Electronic”, “Elec.” all refer to the same thing
&lt;/li&gt;
&lt;li&gt;Rows where values literally belong to different columns
&lt;/li&gt;
&lt;li&gt;Two sheets that &lt;em&gt;should&lt;/em&gt; merge cleanly… but don’t&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is new. But when you try automating it, you realize something:&lt;/p&gt;

&lt;p&gt;Cleaning data isn’t “just clean the data.”&lt;br&gt;&lt;br&gt;
It’s a chain of dependent decisions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For context, I’m building a tool called &lt;strong&gt;RowTidy&lt;/strong&gt; — an AI-powered data cleaning system — mostly because of these discoveries.&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://rowtidy.com" rel="noopener noreferrer"&gt;https://rowtidy.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post is not promotional — it’s the story of what I learned while trying to make sense of messy spreadsheets.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 1. No two spreadsheets are ever the same
&lt;/h2&gt;

&lt;p&gt;Even inside the same organization, I’ve seen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Qty, quantity, QTY, QTY. , QTY ( )&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Name&lt;/strong&gt;, &lt;strong&gt;Client Name&lt;/strong&gt;, &lt;strong&gt;Name&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Price&lt;/strong&gt;, &lt;strong&gt;Rate&lt;/strong&gt;, &lt;strong&gt;Cost&lt;/strong&gt;, &lt;strong&gt;Total Price&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spreadsheet headers evolve based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who created them
&lt;/li&gt;
&lt;li&gt;which department exported them
&lt;/li&gt;
&lt;li&gt;which software wrote them
&lt;/li&gt;
&lt;li&gt;how fast the person writing it was in a rush
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spreadsheets reflect human chaos.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧹 2. Hardcoded rules always break on the next file
&lt;/h2&gt;

&lt;p&gt;You can write rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trim spaces
&lt;/li&gt;
&lt;li&gt;fix capitalization
&lt;/li&gt;
&lt;li&gt;remove symbols
&lt;/li&gt;
&lt;li&gt;normalize dates
&lt;/li&gt;
&lt;li&gt;normalize currency
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…but then you get a file that uses one of those “dirty” formats intentionally.&lt;br&gt;&lt;br&gt;
Or a system where &lt;code&gt;L&lt;/code&gt; means lakhs, but another team uses &lt;code&gt;k&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
Or numerical fields that sometimes contain text — but intentionally.&lt;/p&gt;

&lt;p&gt;Static rules don’t work.&lt;br&gt;&lt;br&gt;
Dynamic rules do.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 3. LLMs help — but passing the &lt;em&gt;entire&lt;/em&gt; sheet is a disaster
&lt;/h2&gt;

&lt;p&gt;If you pass 5,000 rows:&lt;/p&gt;

&lt;p&gt;❌ hallucinations&lt;br&gt;&lt;br&gt;
❌ inconsistent corrections&lt;br&gt;&lt;br&gt;
❌ smashed formats&lt;br&gt;&lt;br&gt;
❌ some rows get cleaned differently than others  &lt;/p&gt;

&lt;p&gt;But if you pass:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;20–30 representative rows
&lt;/li&gt;
&lt;li&gt;column types
&lt;/li&gt;
&lt;li&gt;sheet type (Sales Sheet / Product Catalog / Property Listing etc.)
&lt;/li&gt;
&lt;li&gt;patterns in the data
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs suddenly behave like extremely consistent rule generators.&lt;/p&gt;

&lt;p&gt;They say things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;“Column Price seems to contain currency in mixed formats. Normalize as plain float with 2 decimals.”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;“Column Category contains variants. Canonicalize based on semantic similarity.”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“Column BHK in this dataset is numeric but appears in words. Convert consistently.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you take those rules and deterministically apply them across the whole dataset.&lt;/p&gt;

&lt;p&gt;This works far better than the naïve “ask LLM to clean the whole sheet.”&lt;/p&gt;




&lt;h2&gt;
  
  
  🔄 4. The hardest part: conflicts &amp;amp; dependencies
&lt;/h2&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Gender&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ehsan&lt;/td&gt;
&lt;td&gt;Male&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ehsan&lt;/td&gt;
&lt;td&gt;Female&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Which one is correct?&lt;/p&gt;

&lt;p&gt;Or:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mouse&lt;/td&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mouse&lt;/td&gt;
&lt;td&gt;Peripheral&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Or dates like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;02/03/24&lt;/code&gt; (DD/MM/YY?)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;02/03/24&lt;/code&gt; (MM/DD/YY?)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only reliable way I found:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect potential conflicts
&lt;/li&gt;
&lt;li&gt;Ask AI to &lt;strong&gt;propose resolutions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Let the user &lt;em&gt;approve or discard&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;After approval, future sheets auto-resolve the same conflict&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Human-in-the-loop.&lt;br&gt;&lt;br&gt;
But only when necessary.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 5. Trying to merge two sheets is where everything truly breaks
&lt;/h2&gt;

&lt;p&gt;A friend shared this experience:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“We receive a sheet every hour with 3–4K rows.&lt;br&gt;&lt;br&gt;
It looks similar… but merging it with our master sheet always breaks.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because two seemingly identical sheets may have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;different casing
&lt;/li&gt;
&lt;li&gt;different punctuation
&lt;/li&gt;
&lt;li&gt;different abbreviations
&lt;/li&gt;
&lt;li&gt;different spacing
&lt;/li&gt;
&lt;li&gt;different numeric formats
&lt;/li&gt;
&lt;li&gt;different canonical values
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data is “the same” — but not &lt;em&gt;identical&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is exactly the kind of problem that made me build RowTidy:&lt;br&gt;
&lt;a href="https://rowtidy.com" rel="noopener noreferrer"&gt;https://rowtidy.com&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 What I learned
&lt;/h2&gt;

&lt;p&gt;Data cleaning is not about formulas.&lt;br&gt;&lt;br&gt;
It’s not about regex.&lt;br&gt;&lt;br&gt;
It’s not even about machine learning.&lt;/p&gt;

&lt;p&gt;It’s about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding intent
&lt;/li&gt;
&lt;li&gt;catching inconsistency
&lt;/li&gt;
&lt;li&gt;normalizing semantics
&lt;/li&gt;
&lt;li&gt;detecting dependencies
&lt;/li&gt;
&lt;li&gt;handling edge cases without destroying valid data
&lt;/li&gt;
&lt;li&gt;doing all this without annoying the user
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The more spreadsheets I cleaned, the more I realized:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The problem isn’t that data is messy.&lt;br&gt;&lt;br&gt;
The problem is that every dataset is messy &lt;em&gt;in a different way&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that’s exactly why automation is both difficult and meaningful.&lt;/p&gt;




&lt;p&gt;If you deal with messy CSVs, duplications, conflicting columns, or imports that constantly break — I’d love to hear about your workflow.&lt;/p&gt;

&lt;p&gt;Also, if you're curious, you can check what I’m building here:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://rowtidy.com" rel="noopener noreferrer"&gt;https://rowtidy.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>saas</category>
      <category>msexcel</category>
      <category>googlesheet</category>
      <category>csv</category>
    </item>
    <item>
      <title>What is Ctrl+H in Excel? Master Find &amp; Replace for Data Cleaning</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Sun, 16 Nov 2025 12:05:40 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/what-is-ctrlh-in-excel-master-find-replace-for-data-cleaning-37mm</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/what-is-ctrlh-in-excel-master-find-replace-for-data-cleaning-37mm</guid>
      <description>&lt;p&gt;Ctrl+H opens Excel’s Find and Replace dialog — a commonly used tool for locating and updating values in bulk. It is helpful for tasks like correcting text, formatting, or cleaning large datasets quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Ctrl+H Does
&lt;/h2&gt;

&lt;p&gt;By pressing Ctrl+H, you can:&lt;br&gt;
    • Search for specific text, characters, or numbers&lt;br&gt;
    • Replace values individually or in bulk&lt;br&gt;
    • Search inside formulas, values, or comments&lt;br&gt;
    • Use wildcards to match patterns&lt;br&gt;
    • Work within a selected range, sheet, or full workbook&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Usage
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1.  Press Ctrl+H
2.  Enter text under Find what
3.  Enter text under Replace with
4.  Use Replace or Replace All
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Example:&lt;br&gt;
Replace "USA" → "United States" across a sheet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Match Case&lt;/td&gt;
&lt;td&gt;Finds text with same capitalization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Match Entire Cell Contents&lt;/td&gt;
&lt;td&gt;Replaces only when full cell matches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search Within&lt;/td&gt;
&lt;td&gt;Selection / Sheet / Workbook&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Look In&lt;/td&gt;
&lt;td&gt;Formulas / Values / Comments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Common Data Cleaning Examples
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Find&lt;/th&gt;
&lt;th&gt;Replace&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Remove extra spaces&lt;/td&gt;
&lt;td&gt;"  "&lt;/td&gt;
&lt;td&gt;" "&lt;/td&gt;
&lt;td&gt;Run multiple times until no changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standardize abbreviations&lt;/td&gt;
&lt;td&gt;"St."&lt;/td&gt;
&lt;td&gt;"Street"&lt;/td&gt;
&lt;td&gt;Useful for addresses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Format dates&lt;/td&gt;
&lt;td&gt;"/"&lt;/td&gt;
&lt;td&gt;"-"&lt;/td&gt;
&lt;td&gt;Changes separator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove characters&lt;/td&gt;
&lt;td&gt;"@"&lt;/td&gt;
&lt;td&gt;&lt;em&gt;empty&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Deletes unwanted symbols&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove commas in numbers&lt;/td&gt;
&lt;td&gt;","&lt;/td&gt;
&lt;td&gt;""&lt;/td&gt;
&lt;td&gt;Useful before numeric conversion&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Wildcards Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Wildcard&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Example Input&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;Matches any characters&lt;/td&gt;
&lt;td&gt;John *&lt;/td&gt;
&lt;td&gt;Replaces variable endings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;td&gt;Matches one character&lt;/td&gt;
&lt;td&gt;???&lt;/td&gt;
&lt;td&gt;For fixed length patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;~&lt;/td&gt;
&lt;td&gt;Escape character&lt;/td&gt;
&lt;td&gt;~*&lt;/td&gt;
&lt;td&gt;Finds literal asterisk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Sample Cleaning Workflow
&lt;/h2&gt;

&lt;p&gt;Step 1 — Reduce Extra Spaces&lt;/p&gt;

&lt;p&gt;Repeat Ctrl+H using "  " → " " until no more replacements.&lt;/p&gt;

&lt;p&gt;Step 2 — Standardize Terms&lt;/p&gt;

&lt;p&gt;Use Match Case if needed (e.g., inc. → Inc.).&lt;/p&gt;

&lt;p&gt;Step 3 — Format Dates&lt;/p&gt;

&lt;p&gt;Find / → Replace with - (Look in: Values).&lt;/p&gt;

&lt;h2&gt;
  
  
  Shortcuts Inside the Dialog
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Shortcut&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Open Find &amp;amp; Replace&lt;/td&gt;
&lt;td&gt;Ctrl+H&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Find Next&lt;/td&gt;
&lt;td&gt;Alt+F&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replace&lt;/td&gt;
&lt;td&gt;Alt+R&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replace All&lt;/td&gt;
&lt;td&gt;Alt+A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Close dialog&lt;/td&gt;
&lt;td&gt;Esc&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Tips &amp;amp; Mistakes
&lt;/h2&gt;

&lt;p&gt;Tips&lt;br&gt;
    • Use Find Next to verify before replacing&lt;br&gt;
    • Keep a backup copy&lt;br&gt;
    • Limit scope using selection&lt;br&gt;
    • Use wildcards for patterns&lt;/p&gt;

&lt;p&gt;Mistakes to Avoid&lt;br&gt;
    • Replacing without previewing&lt;br&gt;
    • Unintentionally modifying formulas&lt;br&gt;
    • Ignoring case sensitivity&lt;br&gt;
    • Running across entire workbook without intent&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Ctrl+H vs Other Tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;th&gt;Limitations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ctrl+H&lt;/td&gt;
&lt;td&gt;Simple bulk replacements &amp;amp; quick text formatting&lt;/td&gt;
&lt;td&gt;Not ideal for conditional or complex patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formulas&lt;/td&gt;
&lt;td&gt;Logic-based, conditional replacements&lt;/td&gt;
&lt;td&gt;Creates new columns, needs cleanup later&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Power Query&lt;/td&gt;
&lt;td&gt;Large structured transformations&lt;/td&gt;
&lt;td&gt;Setup time &amp;amp; learning curve&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://rowtidy.com/" rel="noopener noreferrer"&gt;RowTidy.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Pattern learning, repetitive multi-sheet cleaning&lt;/td&gt;
&lt;td&gt;External tool, not native Excel&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note: RowTidy.com is an external tool that helps automate data-cleaning workflows. It can be useful for complex, repetitive, or multi-file cleanup situations when manual Find &amp;amp; Replace becomes time-consuming.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Ctrl+H is a fast and effective option for bulk editing and data cleanup in Excel. It works best for straightforward replacements and formatting updates. For more advanced, repetitive, or large-scale cleaning tasks, additional tools, including Excel formulas, Power Query, or AI-based platforms like &lt;a href="https://rowtidy.com/" rel="noopener noreferrer"&gt;RowTidy.com&lt;/a&gt;, can help streamline the process.&lt;/p&gt;

</description>
      <category>msexcel</category>
      <category>rowtidy</category>
      <category>datacleaning</category>
    </item>
    <item>
      <title>Cleaning Supplier Excel Sheets: Best Practices for Small Businesses</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Wed, 12 Nov 2025 11:58:37 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/cleaning-supplier-excel-sheets-best-practices-for-small-businesses-135l</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/cleaning-supplier-excel-sheets-best-practices-for-small-businesses-135l</guid>
      <description>&lt;h1&gt;
  
  
  Cleaning Supplier Excel Sheets: Best Practices for Small Businesses
&lt;/h1&gt;

&lt;p&gt;For small businesses, supplier Excel sheets are the backbone of &lt;strong&gt;inventory management, pricing, and purchasing&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
But here’s the problem: these files are often &lt;strong&gt;messy, inconsistent, and hard to work with&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;From merged cells to inconsistent headers, cleaning supplier spreadsheets can feel like an endless chore.&lt;br&gt;&lt;br&gt;
This guide outlines &lt;strong&gt;best practices small businesses can use&lt;/strong&gt; to clean supplier data efficiently — without wasting valuable time.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛑 Why Supplier Excel Sheets Are Always Messy
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Different Formats Per Supplier&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ One uses &lt;code&gt;Product Name&lt;/code&gt;, another uses &lt;code&gt;Item Description&lt;/code&gt;.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Merged Cells &amp;amp; Broken Headers&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ Makes sorting, filtering, and formulas unreliable.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mixed Currencies&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ INR, USD, GBP, EUR scattered across sheets.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Duplicated SKUs&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ Same product listed in multiple ways.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Manual Updates&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ Suppliers send monthly updates in different templates.  &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  ✅ Best Practices for Cleaning Supplier Excel Sheets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Standardize Column Headers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Decide on a &lt;strong&gt;Golden Schema&lt;/strong&gt;: SKU, Product Name, Category, Price, Currency, Stock.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;consistent naming&lt;/strong&gt; across all suppliers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Normalize Dates &amp;amp; Numbers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Convert all dates to &lt;code&gt;YYYY-MM-DD&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;Format prices with &lt;strong&gt;2 decimal places&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Strip units like "10 pcs" → &lt;code&gt;10&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Remove Extra Spaces &amp;amp; Hidden Characters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use Excel’s &lt;code&gt;TRIM()&lt;/code&gt; or automated tools to clean whitespace.
&lt;/li&gt;
&lt;li&gt;Remove line breaks or non-printable characters.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Handle Missing Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Replace blanks with &lt;code&gt;N/A&lt;/code&gt; or &lt;code&gt;0&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;Flag missing SKU or price fields for review.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Merge Multiple Supplier Sheets
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;consistent schema&lt;/strong&gt; so files can be merged without chaos.
&lt;/li&gt;
&lt;li&gt;Remove duplicates across suppliers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Automate Repetitive Tasks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Save cleaning &lt;strong&gt;recipes&lt;/strong&gt; you can reuse monthly.
&lt;/li&gt;
&lt;li&gt;Automate column mapping for each supplier.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📊 Example: Before &amp;amp; After
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
| Item Code | Unit Cost | Qty | Currency |&lt;br&gt;&lt;br&gt;
|-----------|-----------|-----|----------|&lt;br&gt;&lt;br&gt;
| P001      | 10 pcs    | USD | $20      |&lt;br&gt;&lt;br&gt;
| P002      |           | 5   | INR      |  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After (Golden Schema)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
| SKU  | Product Name | Stock | Price (USD) |&lt;br&gt;&lt;br&gt;
|------|--------------|-------|-------------|&lt;br&gt;&lt;br&gt;
| P001 | N/A          | 10    | 20.00       |&lt;br&gt;&lt;br&gt;
| P002 | N/A          | 5     | 0.00        |  &lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 How RowTidy Helps Small Businesses
&lt;/h2&gt;

&lt;p&gt;Instead of spending hours cleaning supplier sheets:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload messy Excel/CSV files.
&lt;/li&gt;
&lt;li&gt;AI automatically &lt;strong&gt;detects headers, trims spaces, removes duplicates, and normalizes formats&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Save mappings per supplier → one-click reuse.
&lt;/li&gt;
&lt;li&gt;Export to Excel, CSV, JSON, or even Google Sheets.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means &lt;strong&gt;less time on admin, more time on growth&lt;/strong&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  📌 Conclusion
&lt;/h2&gt;

&lt;p&gt;Supplier Excel sheets don’t need to be a bottleneck for small businesses.&lt;br&gt;&lt;br&gt;
By following these best practices (and using automation tools like &lt;strong&gt;RowTidy&lt;/strong&gt;), you can &lt;strong&gt;cut cleanup time from hours to minutes&lt;/strong&gt; and ensure your business runs smoothly.  &lt;/p&gt;




&lt;p&gt;✍️ &lt;em&gt;Ready to stop wasting time on messy supplier files?&lt;/em&gt;&lt;br&gt;&lt;br&gt;
👉 Try &lt;strong&gt;&lt;span&gt;&lt;a href="https://rowtidy.com" rel="noopener noreferrer"&gt;RowTidy&lt;/a&gt;&lt;/span&gt;&lt;/strong&gt; today and get your first month free.&lt;/p&gt;

</description>
      <category>excel</category>
      <category>datacleaning</category>
      <category>smallbusiness</category>
      <category>suppliers</category>
    </item>
    <item>
      <title>Convert Blobs of Text into Rows in Excel Without Manual Work</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Wed, 03 Sep 2025 06:11:35 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/convert-blobs-of-text-into-rows-in-excel-without-manual-work-em9</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/convert-blobs-of-text-into-rows-in-excel-without-manual-work-em9</guid>
      <description>&lt;h1&gt;
  
  
  Convert Blobs of Text into Rows in Excel Without Manual Work
&lt;/h1&gt;

&lt;p&gt;Have you ever received a spreadsheet where entire &lt;strong&gt;paragraphs of information&lt;/strong&gt; are crammed into one cell? Instead of structured rows, you're left with blobs of text that are impossible to analyze.  &lt;/p&gt;

&lt;p&gt;Example:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cell A1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OrderID: 1234, Date: 2024-01-10, Customer: John Doe, Amount: $250&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What you &lt;em&gt;really&lt;/em&gt; want is:  &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OrderID&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Customer&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1234&lt;/td&gt;
&lt;td&gt;2024-01-10&lt;/td&gt;
&lt;td&gt;John Doe&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This post will show you how to &lt;strong&gt;automatically split blobs of text into rows and columns&lt;/strong&gt; in Excel — no manual copy-paste required.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚨 Why Text Blobs Are a Problem
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Impossible to analyze&lt;/strong&gt; → Pivot tables and charts won't work.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error-prone&lt;/strong&gt; → Manual splitting leads to mistakes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-consuming&lt;/strong&gt; → Cleaning hundreds of rows by hand takes hours.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent formatting&lt;/strong&gt; → Vendor or export files rarely use the same separators.
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🛠 Method 1: Text-to-Columns (Quick Fix)
&lt;/h2&gt;

&lt;p&gt;Excel's built-in &lt;strong&gt;Text-to-Columns&lt;/strong&gt; is the fastest option if your text is consistently separated by commas, semicolons, or tabs.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select the column with text blobs.
&lt;/li&gt;
&lt;li&gt;Go to &lt;strong&gt;Data → Text to Columns&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;Delimited&lt;/strong&gt; → pick the right delimiter (comma, tab, etc.).
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Finish&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;✔️ Works well for consistent separators.&lt;br&gt;&lt;br&gt;
❌ Breaks if the text uses inconsistent patterns.  &lt;/p&gt;


&lt;h2&gt;
  
  
  🛠 Method 2: Using Formulas (Flexible)
&lt;/h2&gt;

&lt;p&gt;If text contains key-value pairs (like &lt;code&gt;OrderID: 1234&lt;/code&gt;), you can extract fields with formulas.  &lt;/p&gt;

&lt;p&gt;For example, to pull the &lt;strong&gt;OrderID&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=MID(A1, FIND("OrderID:", A1) + 8, FIND(",", A1) - FIND("OrderID:", A1) - 8)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✔️ Customizable for structured text.&lt;br&gt;
❌ Complex for large datasets.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠 Method 3: Power Query (Advanced)
&lt;/h2&gt;

&lt;p&gt;Power Query is excellent for large blobs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to Data → Get &amp;amp; Transform → From Table/Range.&lt;/li&gt;
&lt;li&gt;Split the column by delimiter or by text pattern.&lt;/li&gt;
&lt;li&gt;Transform into rows automatically.&lt;/li&gt;
&lt;li&gt;Load back into Excel.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;✔️ Handles thousands of rows.&lt;br&gt;
❌ Steeper learning curve for beginners.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 Method 4: Automate with RowTidy
&lt;/h2&gt;

&lt;p&gt;If you often deal with vendor exports, invoices, or messy reports where blobs of text appear in single cells, RowTidy automates the cleanup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detects blobs of text inside cells&lt;/li&gt;
&lt;li&gt;Splits structured fields into proper columns&lt;/li&gt;
&lt;li&gt;Converts paragraphs into clean, row-based data&lt;/li&gt;
&lt;li&gt;Exports directly into Excel/CSV/Google Sheets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of wasting hours, just upload your messy file and get a structured dataset in seconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Always backup your original data before splitting text.&lt;/li&gt;
&lt;li&gt;Define a standard schema (e.g., always OrderID, Date, Amount).&lt;/li&gt;
&lt;li&gt;Test splitting on a sample set first.&lt;/li&gt;
&lt;li&gt;Use consistent delimiters in future exports if possible.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📌 Conclusion
&lt;/h2&gt;

&lt;p&gt;Blobs of text in Excel are one of the biggest productivity killers. Instead of manually retyping or copy-pasting, use Text-to-Columns, formulas, Power Query, or RowTidy to turn messy text into rows.&lt;/p&gt;

&lt;p&gt;👉 By automating this, you'll save hours every week and ensure your data is always analysis-ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✍️ Tired of splitting text blobs manually?
&lt;/h2&gt;

&lt;p&gt;👉 Try &lt;span&gt;&lt;a href="https://rowtidy.com/" rel="noopener noreferrer"&gt;RowTidy&lt;/a&gt;&lt;/span&gt; for AI-powered text-to-rows conversion and data cleaning.&lt;/p&gt;

</description>
      <category>excel</category>
      <category>googlesheet</category>
      <category>cleandata</category>
    </item>
    <item>
      <title>Building Intelligent Agents with ScrapeGraph: The Complete Guide</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Tue, 24 Jun 2025 16:51:39 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/building-intelligent-agents-with-scrapegraph-the-complete-guide-4ldf</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/building-intelligent-agents-with-scrapegraph-the-complete-guide-4ldf</guid>
      <description>&lt;h1&gt;
  
  
  Empowering Intelligent Agents with ScrapeGraphAI
&lt;/h1&gt;

&lt;p&gt;In today's rapidly evolving digital world, intelligent agents need immediate access to accurate and structured online data to make smart decisions. This is where &lt;strong&gt;ScrapeGraphAI&lt;/strong&gt; comes in—transforming from a simple scraping tool into an essential component for agents. By integrating ScrapeGraphAI, your agents can automatically fetch, validate, and process web data in real time, bridging the gap between raw information and actionable insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Intelligent Agents Need ScrapeGraphAI
&lt;/h2&gt;

&lt;p&gt;Intelligent agents depend on up-to-date data to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enhance Decision-Making&lt;/strong&gt;: Accessing real-time web data enables agents to respond quickly to changing environments.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize Automation&lt;/strong&gt;: With structured data in hand, agents can automate workflows and execute tasks more efficiently.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drive Innovation&lt;/strong&gt;: Agents empowered by reliable data can unlock new insights, driving better strategies and competitive advantages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a tool like ScrapeGraphAI, agents would struggle to access the wealth of data available on the internet—limiting their ability to learn, adapt, and make data-driven decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ScrapeGraphAI Becomes a Tool for Agents
&lt;/h2&gt;

&lt;p&gt;ScrapeGraphAI not only automates web scraping but also integrates seamlessly with intelligent agent frameworks. It serves as a dedicated tool agents can invoke to fetch data whenever needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔑 Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated Data Extraction&lt;/strong&gt;: ScrapeGraphAI handles the complexity of scraping and delivers structured data using predefined schemas.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema Validation&lt;/strong&gt;: Ensures agents receive consistent and reliable information.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Integration&lt;/strong&gt;: Easily bind ScrapeGraphAI to your agent, enabling web scraping as part of its decision-making process.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧠 Example Integration Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MessagesState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemorySaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tools_condition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_scraper_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrapegraph_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SyncClient&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrapegraph_py.logger&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_logger&lt;/span&gt;
    &lt;span class="nf"&gt;get_logger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;sgai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SyncClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SCRAPEGRAPH_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sgai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;smartscraper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;website_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;request_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sgai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_scraper_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrapegraph_py&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrapegraph_py.logger&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sgai_logger&lt;/span&gt;
    &lt;span class="n"&gt;sgai_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_logging&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;sgai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SCRAPEGRAPH_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sgai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;searchscraper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;request_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sgai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;smart_scraper_func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_scraper_func&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;llm_with_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sys_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant tasked with performing scraping scripts with scrapegraphai. Use the tool asked from the user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assistant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessagesState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;llm_with_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;sys_msg&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])]}&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MessagesState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;assistant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools_condition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How It Fits into the Agent Workflow
&lt;/h3&gt;

&lt;p&gt;ScrapeGraphAI becomes a module in your agent's toolkit. Instead of manually coding web data extraction every time, your agent can simply call this function to retrieve the latest data. This integration allows the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automate Web Data Retrieval&lt;/strong&gt;: Call the scraping tool on-demand during various tasks.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process and Analyze Data&lt;/strong&gt;: Use the structured output for further analysis or to trigger other actions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhance Responsiveness&lt;/strong&gt;: Make decisions based on current, accurate data pulled directly from the web.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are intelligent agents and how do they use ScrapeGraphAI?
&lt;/h3&gt;

&lt;p&gt;Intelligent agents are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated systems that make decisions
&lt;/li&gt;
&lt;li&gt;Use real-time data for insights
&lt;/li&gt;
&lt;li&gt;Integrate with tools like ScrapeGraphAI
&lt;/li&gt;
&lt;li&gt;Process and analyze web data
&lt;/li&gt;
&lt;li&gt;Adapt to changing conditions
&lt;/li&gt;
&lt;li&gt;Learn from interactions
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How does ScrapeGraphAI enhance agent capabilities?
&lt;/h3&gt;

&lt;p&gt;ScrapeGraphAI enhances agents by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Providing structured web data
&lt;/li&gt;
&lt;li&gt;Enabling real-time data collection
&lt;/li&gt;
&lt;li&gt;Offering schema validation
&lt;/li&gt;
&lt;li&gt;Supporting multiple data sources
&lt;/li&gt;
&lt;li&gt;Automating data extraction
&lt;/li&gt;
&lt;li&gt;Ensuring data accuracy
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What types of data can agents collect with ScrapeGraphAI?
&lt;/h3&gt;

&lt;p&gt;Agents can collect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product information
&lt;/li&gt;
&lt;li&gt;Market trends
&lt;/li&gt;
&lt;li&gt;Competitor data
&lt;/li&gt;
&lt;li&gt;User reviews
&lt;/li&gt;
&lt;li&gt;Price data
&lt;/li&gt;
&lt;li&gt;Industry insights
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How do I integrate ScrapeGraphAI with my existing agents?
&lt;/h3&gt;

&lt;p&gt;Integration steps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing required packages
&lt;/li&gt;
&lt;li&gt;Setting up API authentication
&lt;/li&gt;
&lt;li&gt;Configuring data schemas
&lt;/li&gt;
&lt;li&gt;Implementing error handling
&lt;/li&gt;
&lt;li&gt;Setting up monitoring
&lt;/li&gt;
&lt;li&gt;Testing integration
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What are the best practices for agent-based scraping?
&lt;/h3&gt;

&lt;p&gt;Best practices include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementing rate limiting
&lt;/li&gt;
&lt;li&gt;Using proper error handling
&lt;/li&gt;
&lt;li&gt;Validating extracted data
&lt;/li&gt;
&lt;li&gt;Monitoring agent performance
&lt;/li&gt;
&lt;li&gt;Maintaining data quality
&lt;/li&gt;
&lt;li&gt;Following platform policies
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How can I scale my agent operations?
&lt;/h3&gt;

&lt;p&gt;Scaling strategies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using distributed processing
&lt;/li&gt;
&lt;li&gt;Implementing load balancing
&lt;/li&gt;
&lt;li&gt;Managing resource allocation
&lt;/li&gt;
&lt;li&gt;Optimizing data storage
&lt;/li&gt;
&lt;li&gt;Monitoring performance
&lt;/li&gt;
&lt;li&gt;Handling concurrent requests
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What are common challenges in agent integration?
&lt;/h3&gt;

&lt;p&gt;Common challenges include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data validation issues
&lt;/li&gt;
&lt;li&gt;Rate limiting concerns
&lt;/li&gt;
&lt;li&gt;Authentication handling
&lt;/li&gt;
&lt;li&gt;Error management
&lt;/li&gt;
&lt;li&gt;Performance optimization
&lt;/li&gt;
&lt;li&gt;Resource allocation
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How do I handle errors in agent operations?
&lt;/h3&gt;

&lt;p&gt;Error handling includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementing retry logic
&lt;/li&gt;
&lt;li&gt;Logging error details
&lt;/li&gt;
&lt;li&gt;Setting up alerts
&lt;/li&gt;
&lt;li&gt;Managing timeouts
&lt;/li&gt;
&lt;li&gt;Validating responses
&lt;/li&gt;
&lt;li&gt;Maintaining fallbacks
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What security measures should I implement?
&lt;/h3&gt;

&lt;p&gt;Security measures include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API key protection
&lt;/li&gt;
&lt;li&gt;Data encryption
&lt;/li&gt;
&lt;li&gt;Access control
&lt;/li&gt;
&lt;li&gt;Audit logging
&lt;/li&gt;
&lt;li&gt;Error handling
&lt;/li&gt;
&lt;li&gt;Compliance monitoring
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How can I monitor agent performance?
&lt;/h3&gt;

&lt;p&gt;Monitoring includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tracking success rates
&lt;/li&gt;
&lt;li&gt;Measuring response times
&lt;/li&gt;
&lt;li&gt;Monitoring resource usage
&lt;/li&gt;
&lt;li&gt;Analyzing error patterns
&lt;/li&gt;
&lt;li&gt;Checking data quality
&lt;/li&gt;
&lt;li&gt;Evaluating efficiency
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What are the costs involved?
&lt;/h3&gt;

&lt;p&gt;Cost considerations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API usage fees
&lt;/li&gt;
&lt;li&gt;Computing resources
&lt;/li&gt;
&lt;li&gt;Storage requirements
&lt;/li&gt;
&lt;li&gt;Maintenance costs
&lt;/li&gt;
&lt;li&gt;Development time
&lt;/li&gt;
&lt;li&gt;Monitoring tools
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How do I maintain my agent system?
&lt;/h3&gt;

&lt;p&gt;Maintenance tasks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular updates
&lt;/li&gt;
&lt;li&gt;Performance monitoring
&lt;/li&gt;
&lt;li&gt;Error checking
&lt;/li&gt;
&lt;li&gt;Data validation
&lt;/li&gt;
&lt;li&gt;System optimization
&lt;/li&gt;
&lt;li&gt;Documentation updates
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What development skills are needed?
&lt;/h3&gt;

&lt;p&gt;Required skills include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python programming
&lt;/li&gt;
&lt;li&gt;API integration
&lt;/li&gt;
&lt;li&gt;Data processing
&lt;/li&gt;
&lt;li&gt;Error handling
&lt;/li&gt;
&lt;li&gt;System architecture
&lt;/li&gt;
&lt;li&gt;Performance optimization
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How can I ensure data quality?
&lt;/h3&gt;

&lt;p&gt;Quality assurance includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema validation
&lt;/li&gt;
&lt;li&gt;Data cleaning
&lt;/li&gt;
&lt;li&gt;Error checking
&lt;/li&gt;
&lt;li&gt;Format verification
&lt;/li&gt;
&lt;li&gt;Consistency checks
&lt;/li&gt;
&lt;li&gt;Regular testing
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What are the limitations of agent-based scraping?
&lt;/h3&gt;

&lt;p&gt;Limitations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limiting
&lt;/li&gt;
&lt;li&gt;Resource constraints
&lt;/li&gt;
&lt;li&gt;Platform restrictions
&lt;/li&gt;
&lt;li&gt;Data availability
&lt;/li&gt;
&lt;li&gt;Processing speed
&lt;/li&gt;
&lt;li&gt;Accuracy concerns
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Integrating ScrapeGraphAI into your intelligent agents is a game changer. It provides a seamless bridge between the vast amount of web data and the sophisticated decision-making capabilities of your agents. With ScrapeGraphAI as a dedicated tool, your agents can operate with real-time information—driving innovation, efficiency, and strategic advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embrace ScrapeGraphAI, empower your agents, and unlock the true potential of data-driven automation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Happy coding and innovating!&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Resources
&lt;/h2&gt;

&lt;p&gt;Want to learn more about intelligent agents and web scraping? Explore these guides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com/blog/101-scraping" rel="noopener noreferrer"&gt;Web Scraping 101&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/how-to-create-agent-without-frameworks" rel="noopener noreferrer"&gt;Building Agents Without Frameworks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/multi-agent" rel="noopener noreferrer"&gt;Multi-Agent Systems&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/ai-agent-webscraping" rel="noopener noreferrer"&gt;AI Agent Web Scraping&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/scrapegraphai-crewai-integration" rel="noopener noreferrer"&gt;ScrapeGraphAI CrewAI Integration&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/scrapegraphai-llamaindex-integration" rel="noopener noreferrer"&gt;LlamaIndex Integration&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/integrating-scrapegraph-into-intelligent-agents" rel="noopener noreferrer"&gt;Building Intelligent Agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/pre-ai-to-post-ai-scraping" rel="noopener noreferrer"&gt;Pre-AI to Post-AI Scraping&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapegraphai.com//blog/legality-of-web-scraping-of-web-scraping" rel="noopener noreferrer"&gt;Web Scraping Legality&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>brightdatachallenge</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Monitor Website Changes with Python, Streamlit, Slack &amp; Olostep</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Tue, 06 May 2025 16:19:49 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/monitor-website-changes-with-python-streamlit-slack-olostep-3h87</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/monitor-website-changes-with-python-streamlit-slack-olostep-3h87</guid>
      <description>&lt;p&gt;&lt;a href="https://gist.github.com/mdehsan873/1a3e7a40f19c5693d1b8eb4f1aa9ac47" rel="noopener noreferrer"&gt;Souce code&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  🛠️ Monitor Website Changes with Python, Streamlit, Slack &amp;amp; Olostep
&lt;/h1&gt;

&lt;p&gt;Keeping tabs on your competitors or your own brand pages is crucial — changes in pricing, content, or job openings can have a direct impact on your strategy. This blog shows how to monitor &lt;strong&gt;any public webpage&lt;/strong&gt; for updates using:&lt;/p&gt;

&lt;p&gt;✅ Python&lt;br&gt;&lt;br&gt;
✅ Streamlit&lt;br&gt;&lt;br&gt;
✅ Slack for alerts&lt;br&gt;&lt;br&gt;
✅ Olostep API for web scraping&lt;/p&gt;


&lt;h2&gt;
  
  
  🎯 What You’ll Learn
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How to detect &lt;strong&gt;pricing updates&lt;/strong&gt;, &lt;strong&gt;new jobs&lt;/strong&gt;, &lt;strong&gt;copy changes&lt;/strong&gt;, &lt;strong&gt;new articles/pages&lt;/strong&gt;, &lt;strong&gt;logos&lt;/strong&gt;, and more&lt;/li&gt;
&lt;li&gt;How to use &lt;strong&gt;Olostep’s scraping API&lt;/strong&gt; to retrieve website content&lt;/li&gt;
&lt;li&gt;How to compare two content versions and find changes&lt;/li&gt;
&lt;li&gt;How to notify your team via &lt;strong&gt;Slack&lt;/strong&gt; when a change is detected&lt;/li&gt;
&lt;li&gt;How to visualize and interact with the app using &lt;strong&gt;Streamlit&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  🔧 Step 1: Install Dependencies
&lt;/h2&gt;

&lt;p&gt;Start by installing the necessary packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit requests python-dotenv slack_sdk difflib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll also need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Slack bot token&lt;/li&gt;
&lt;li&gt;Your Slack channel ID&lt;/li&gt;
&lt;li&gt;Your Olostep API key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file to securely store them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OLOSTEP_API_KEY=your_olostep_key
SLACK_BOT_TOKEN=xoxb-your-slack-token
SLACK_CHANNEL_ID=your-channel-id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🌐 Step 2: Set Up Web Scraping with Olostep
&lt;/h2&gt;

&lt;p&gt;We use Olostep’s &lt;code&gt;/scrapes&lt;/code&gt; endpoint to fetch the raw HTML or markdown content of any URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formats&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url_to_scrape&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/scrapes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🧠 Why markdown? It's easier to analyze than raw HTML and more structured for identifying changes.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Step 3: Compare New vs Old Content
&lt;/h2&gt;

&lt;p&gt;We compare the newly scraped content with the last saved version using Python’s &lt;code&gt;difflib&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compare_versions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;difflib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unified_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;old_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
        &lt;span class="n"&gt;new_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
        &lt;span class="n"&gt;lineterm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We store the previous version in a local file (&lt;code&gt;latest.md&lt;/code&gt;) and load it when the app runs again.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔔 Step 4: Notify via Slack
&lt;/h2&gt;

&lt;p&gt;When a change is detected, send a message to your Slack workspace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;slack_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebClient&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_slack_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;change_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SLACK_BOT_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔍 *Website Change Detected!*
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
{change_text[:3000]}&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    client.chat_postMessage(channel=os.getenv("SLACK_CHANNEL_ID"), text=message)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 Slack messages are capped to ~3000 chars for readability. You can link to a diff file if needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  🖼️ Step 5: Build the Streamlit App
&lt;/h2&gt;

&lt;p&gt;We use Streamlit for a clean UI to trigger checks and view results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔎 Website Change Monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter the website URL to monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check for Changes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;new_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;scrape_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;old_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;diff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compare_versions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;send_slack_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Changes detected and sent to Slack!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No changes found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;First time check. Saving initial version.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧪 Example: Monitoring a Competitor's Pricing Page
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Add your competitor's pricing page: &lt;code&gt;https://example.com/pricing&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run the Streamlit app&lt;/li&gt;
&lt;li&gt;It scrapes content, checks for diff, and posts to Slack if any changes are found&lt;/li&gt;
&lt;li&gt;Useful for:

&lt;ul&gt;
&lt;li&gt;Pricing wars&lt;/li&gt;
&lt;li&gt;Detecting seasonal offers&lt;/li&gt;
&lt;li&gt;New plans or structure&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Monitor your own site for accidental changes&lt;/li&gt;
&lt;li&gt;✅ Keep an eye on competitors&lt;/li&gt;
&lt;li&gt;✅ Detect new job postings on a careers page&lt;/li&gt;
&lt;li&gt;✅ Track new testimonials, logos, or articles&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛠️ Future Enhancements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Save diffs to a database&lt;/li&gt;
&lt;li&gt;Visualize changes over time&lt;/li&gt;
&lt;li&gt;Monitor multiple pages in parallel&lt;/li&gt;
&lt;li&gt;Email alerts as a fallback&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This simple automation stack — built with &lt;strong&gt;Python + Streamlit + Slack + Olostep&lt;/strong&gt; — empowers you to stay ahead of competitors or monitor any site changes. Customize it to track any page that matters.&lt;/p&gt;

&lt;p&gt;Happy hacking 🚀&lt;/p&gt;

</description>
      <category>programming</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Build an AI-Powered Stock Analyzer Using Streamlit, Olostep, and OpenAI</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Fri, 02 May 2025 05:52:47 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/build-an-ai-powered-stock-analyzer-using-streamlit-olostep-and-openai-4a3c</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/build-an-ai-powered-stock-analyzer-using-streamlit-olostep-and-openai-4a3c</guid>
      <description>&lt;h2&gt;
  
  
  Build an AI-Powered Stock Analyzer Using Streamlit, Olostep, and OpenAI
&lt;/h2&gt;

&lt;p&gt;Ever wanted to analyze stocks using AI without manually going through dozens of websites?&lt;/p&gt;

&lt;p&gt;In this tutorial, you’ll build a &lt;strong&gt;Streamlit app&lt;/strong&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses &lt;strong&gt;Olostep's scraping API&lt;/strong&gt; to fetch stock data from &lt;a href="https://www.marketwatch.com" rel="noopener noreferrer"&gt;MarketWatch&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Sends the content to &lt;strong&gt;OpenAI GPT-4&lt;/strong&gt; to rate stocks based on their performance and description&lt;/li&gt;
&lt;li&gt;Visualizes the investment score with interactive charts&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://gist.github.com/mdehsan873/99502a755e4554873e602ac1ba7733c6" rel="noopener noreferrer"&gt;Source Code&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Requirements
&lt;/h2&gt;

&lt;p&gt;Make sure to install the following dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit openai requests python-dotenv matplotlib beautifulsoup4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file and add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=your_openai_key
OLOSTEP_API_KEY=your_olostep_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔧 Step 1: Scrape Stock Pages with Olostep
&lt;/h2&gt;

&lt;p&gt;Use Olostep’s &lt;code&gt;scrape&lt;/code&gt; endpoint to fetch the latest data for stock tickers from MarketWatch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape_stock_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formats&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url_to_scrape&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.marketwatch.com/investing/stock/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/scrapes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧠 Step 2: Analyze Stock Content with OpenAI GPT-4
&lt;/h2&gt;

&lt;p&gt;We ask GPT-4 to evaluate and assign a score from 0 to 100 for each stock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_stocks_with_gpt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tickers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdowns&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a stock analyst. Based on the following markdowns from MarketWatch, rate each stock on a scale of 0–100 as an investment opportunity. Return only JSON format like:

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;{
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scores&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [
    {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AAPL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 92},
    {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stock_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MSFT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 88}
  ]
}&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;


    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tickers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdowns&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ticker: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Content:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎨 Step 3: Visualize Results in Streamlit
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;plot_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;stock_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stock_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stock_names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;green&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Investment Score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI-Scored Stocks by OpenAI GPT-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pyplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚀 Step 4: Complete Streamlit App
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_page_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Stock Analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_icon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;📈&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;📈 AI Stock Analyzer with Olostep + OpenAI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tickers_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter comma-separated stock tickers (e.g., AAPL,MSFT,TSLA)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tickers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tickers_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="n"&gt;markdowns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;scrape_stock_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ticker&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tickers&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;markdowns&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No data found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyzing stocks with GPT...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_stocks_with_gpt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tickers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdowns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analysis complete!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;plot_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;scores&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ✅ Example Output
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stock&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AAPL&lt;/td&gt;
&lt;td&gt;91&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TSLA&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MSFT&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  💼 Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Retail investors screening multiple stocks&lt;/li&gt;
&lt;li&gt;Analysts comparing high-volume tickers&lt;/li&gt;
&lt;li&gt;Developers building AI-based investing tools&lt;/li&gt;
&lt;li&gt;Newsletter writers looking to automate stock ideas&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔮 What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Export scores to a Google Sheet or PDF&lt;/li&gt;
&lt;li&gt;Use historical stock data and financial ratios&lt;/li&gt;
&lt;li&gt;Add GPT explanation for each score&lt;/li&gt;
&lt;li&gt;Deploy on &lt;a href="https://streamlit.io/cloud" rel="noopener noreferrer"&gt;Streamlit Cloud&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💡 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This app combines &lt;strong&gt;real-time scraping&lt;/strong&gt;, &lt;strong&gt;AI rating&lt;/strong&gt;, and &lt;strong&gt;visual insights&lt;/strong&gt; into one seamless experience. It’s ideal for anyone looking to use AI to make smarter investment decisions at scale.&lt;/p&gt;

&lt;p&gt;Happy investing! 📈&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Analyze GitHub Profiles Using Olostep API and GPT-4 in Streamlit</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Fri, 02 May 2025 03:45:13 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/analyze-github-profiles-using-olostep-api-and-gpt-4-in-streamlit-6gg</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/analyze-github-profiles-using-olostep-api-and-gpt-4-in-streamlit-6gg</guid>
      <description>&lt;h2&gt;
  
  
  Analyze GitHub Profiles Using Olostep API and GPT-4 in Streamlit
&lt;/h2&gt;

&lt;p&gt;Want to extract meaningful insights from a developer’s GitHub profile automatically? In this guide, we’ll walk through how to build a &lt;strong&gt;Streamlit web app&lt;/strong&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scrapes GitHub profile content using &lt;strong&gt;Olostep’s Scrape API&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Uses &lt;strong&gt;OpenAI GPT-4&lt;/strong&gt; to analyze and summarize profile insights&lt;/li&gt;
&lt;li&gt;Displays the results neatly using &lt;strong&gt;Streamlit&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Objectives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Scrape public GitHub profiles using Olostep API
&lt;/li&gt;
&lt;li&gt;Generate analysis using OpenAI GPT-4
&lt;/li&gt;
&lt;li&gt;Build an intuitive UI with Streamlit
&lt;/li&gt;
&lt;li&gt;Present insights like skills, contributions, and collaboration
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit openai requests python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file in your root directory with your API credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OLOSTEP_API_KEY=your_olostep_api_key
OPENAI_API_KEY=your_openai_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;github-analyzer/
├── app.py
├── .env
└── requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1: Import Libraries and Load Environment Variables
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;olostep_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Scrape GitHub Profile using Olostep
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape_profile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formats&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url_to_scrape&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://github.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/scrapes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Analyze Markdown Content with GPT-4
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_with_gpt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Analyze the GitHub profile below for insights:
    - Professional background (company, location, role)
    - Activity (repos, stars, streaks)
    - Tech stack
    - Community engagement

    Username: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Markdown Content:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 4: Build Streamlit UI
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_page_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GitHub Analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_icon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🐙&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🐙 GitHub Profile Analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter a GitHub username (e.g., torvalds)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze Profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scraping GitHub profile...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;scrape_profile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Could not scrape the profile.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generating AI analysis...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyze_with_gpt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;## 🧠 GPT-4 Generated Insights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Example Output
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub Profile Analysis: torvalds

## 1. Professional Background
• Location: Portland, Oregon
• Works on the Linux kernel

## 2. Activity Analysis
• Top repository: linux
• Thousands of contributions
• Maintains core system software

## 3. Tech Stack
• C, C++, Shell scripting

## 4. Community Engagement
• Collaborates with hundreds of devs
• Active in pull request reviews
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Hiring teams evaluating developer contributions
&lt;/li&gt;
&lt;li&gt;Candidates generating portfolio summaries
&lt;/li&gt;
&lt;li&gt;Open-source communities reviewing contributors
&lt;/li&gt;
&lt;li&gt;AI agents assessing technical depth of profiles
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Add export to PDF
&lt;/li&gt;
&lt;li&gt;Add batch processing for multiple usernames
&lt;/li&gt;
&lt;li&gt;Deploy to &lt;a href="https://streamlit.io/cloud" rel="noopener noreferrer"&gt;Streamlit Cloud&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This full-stack project combines &lt;strong&gt;web scraping&lt;/strong&gt;, &lt;strong&gt;AI summarization&lt;/strong&gt;, and &lt;strong&gt;interactive frontend&lt;/strong&gt; to deliver structured GitHub insights. With Olostep + OpenAI + Streamlit, you can automate what once required hours of manual review.&lt;/p&gt;

&lt;p&gt;Happy hacking! 🧙‍♂️&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Find and Extract All URLs from a Website Using Olostep Maps API and Streamlit</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Sun, 27 Apr 2025 12:05:09 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/how-to-find-and-extract-all-urls-from-a-website-using-olostep-maps-api-and-streamlit-1cbp</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/how-to-find-and-extract-all-urls-from-a-website-using-olostep-maps-api-and-streamlit-1cbp</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When building web crawlers, competitive analysis, SEO audits, or AI agents, one of the &lt;strong&gt;first critical tasks&lt;/strong&gt; is finding all the URLs on a website.&lt;/p&gt;

&lt;p&gt;While traditional methods like Google search tricks, sitemap exploration, and SEO tools work, there's a &lt;strong&gt;faster, modern way&lt;/strong&gt;: using &lt;strong&gt;Olostep Maps API&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this guide, we'll:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduce the challenge of URL discovery&lt;/li&gt;
&lt;li&gt;Show how to build a &lt;strong&gt;live Streamlit app&lt;/strong&gt; to scrape all URLs&lt;/li&gt;
&lt;li&gt;Compare it with traditional techniques (like sitemap.xml and robots.txt)&lt;/li&gt;
&lt;li&gt;Provide complete runnable Python code&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Target Audience:&lt;/strong&gt; Developers, Growth Engineers, Data Scientists, SEO specialists, and Founders who need structured, scalable scraping.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Extract All URLs?
&lt;/h2&gt;

&lt;p&gt;Finding every page on a website can help you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analyze site structure&lt;/strong&gt; (for SEO)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scrape website content&lt;/strong&gt; efficiently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find hidden gems&lt;/strong&gt; like orphan pages&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitor website changes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prepare data&lt;/strong&gt; for AI agents and automation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Traditional Methods (Before Olostep)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Sitemaps (XML Files)
&lt;/h3&gt;

&lt;p&gt;Webmasters often create XML sitemaps to help Google index their sites. Here's an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;urlset&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;url&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;loc&amp;gt;&lt;/span&gt;https://example.com&lt;span class="nt"&gt;&amp;lt;/loc&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/url&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;url&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;loc&amp;gt;&lt;/span&gt;https://example.com/about&lt;span class="nt"&gt;&amp;lt;/loc&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/url&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/urlset&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To find sitemaps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit &lt;code&gt;/sitemap.xml&lt;/code&gt; (e.g., &lt;a href="https://example.com/sitemap.xml" rel="noopener noreferrer"&gt;https://example.com/sitemap.xml&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Check &lt;code&gt;/robots.txt&lt;/code&gt; (it usually links to the sitemap)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other possible sitemap locations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/sitemap.xml.gz&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/sitemap_index.xml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/sitemap.php&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also Google:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;site:example.com filetype:xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some websites don't maintain updated sitemaps.&lt;/li&gt;
&lt;li&gt;Not all pages may be listed.&lt;/li&gt;
&lt;li&gt;Dynamic websites (heavy JavaScript) often leave out many pages.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Robots.txt
&lt;/h3&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User-agent: *
Sitemap: https://example.com/sitemap.xml
Disallow: /admin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for finding disallowed URLs and sitemap links, but again &lt;strong&gt;not comprehensive&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Modern Solution: Olostep Maps API
&lt;/h2&gt;

&lt;p&gt;✅ Find &lt;strong&gt;up to 100,000 URLs&lt;/strong&gt; in seconds.&lt;br&gt;&lt;br&gt;
✅ No need to manually find sitemap or robots.txt.&lt;br&gt;&lt;br&gt;
✅ Simple API call.&lt;br&gt;&lt;br&gt;
✅ No server maintenance or IP bans.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://gist.github.com/mdehsan873/b67246777f0e3085e5db304dbaacf3f6" rel="noopener noreferrer"&gt;Full code Gist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's &lt;strong&gt;build a full Streamlit app&lt;/strong&gt; to demo this!&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Full Project: Website URL Extractor with Olostep Maps API + Streamlit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install Requirements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Python Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_urls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;target_url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/maps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to fetch URLs. Status code: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔎 Website URL Scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use Olostep Maps API to instantly extract all discovered URLs from any website. Great for SEO, scraping, site analysis, and more!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter your Olostep API Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;url_to_scrape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter Website URL (e.g., https://example.com)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find URLs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;url_to_scrape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetching URLs...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_urls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url_to_scrape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
            &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; URLs!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;](&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;📄 Download URLs as Text File&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discovered_urls.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;mime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/plain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  📸 Example Output
&lt;/h2&gt;

&lt;p&gt;✅ Found 35 URLs from &lt;code&gt;https://docs.olostep.com&lt;/code&gt;&lt;br&gt;&lt;br&gt;
📥 Saved as &lt;code&gt;discovered_urls.txt&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚡ Why Olostep Maps API Beats Traditional Methods
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Sitemap/Robots.txt&lt;/th&gt;
&lt;th&gt;SEO Spider&lt;/th&gt;
&lt;th&gt;Olostep Maps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instant Response&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handles JS-heavy Sites&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ (Partial)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handles Big Sites&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌ (Limit)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No Setup Needed&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Easy Pagination&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  📈 Conclusion
&lt;/h2&gt;

&lt;p&gt;Using Olostep Maps API + a few lines of Streamlit code, you can build powerful &lt;strong&gt;website discovery tools&lt;/strong&gt; in minutes.&lt;/p&gt;

&lt;p&gt;No more worrying about sitemaps, robots.txt, or getting blocked by firewalls.&lt;/p&gt;

&lt;p&gt;✅ Super fast&lt;br&gt;&lt;br&gt;
✅ Reliable&lt;br&gt;&lt;br&gt;
✅ Perfect for Growth Engineering, SEO, Scraping, and Automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Ready to try?
&lt;/h2&gt;

&lt;p&gt;Register at 👉 &lt;a href="https://olostep.com" rel="noopener noreferrer"&gt;Olostep.com&lt;/a&gt; and start building your own data pipelines today!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Written by:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Mohammad Ehsan Ansari&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Growth Engineer @ Olostep&lt;/p&gt;

&lt;p&gt;Happy scraping! 🚀&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>api</category>
      <category>growthengineering</category>
    </item>
    <item>
      <title>Build a Website Knowledge Chatbot Using Streamlit, ChromaDB, Olostep, and OpenAI</title>
      <dc:creator>Mohammad Ehsan Ansari</dc:creator>
      <pubDate>Fri, 25 Apr 2025 13:43:53 +0000</pubDate>
      <link>https://dev.to/mohammad_ehsanansari_671/build-a-website-knowledge-chatbot-using-streamlit-chromadb-olostep-and-openai-21dl</link>
      <guid>https://dev.to/mohammad_ehsanansari_671/build-a-website-knowledge-chatbot-using-streamlit-chromadb-olostep-and-openai-21dl</guid>
      <description>&lt;p&gt;Have you ever wanted a smart AI assistant that understands your entire website and can answer questions like ChatGPT? In this tutorial, we’ll show you how to build it — without training your own LLM or managing any backend.&lt;/p&gt;

&lt;p&gt;We’ll use:&lt;/p&gt;

&lt;p&gt;✅ &lt;a href="https://www.olostep.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Olostep&lt;/strong&gt;&lt;/a&gt; to crawl and extract website content&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;ChromaDB&lt;/strong&gt; to store and search content embeddings with metadata&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;OpenAI&lt;/strong&gt; (v1.7.6) for embeddings and GPT-4 summarization&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Streamlit&lt;/strong&gt; to build a live chatbot UI&lt;/p&gt;

&lt;p&gt;Perfect for product sites, documentation portals, and landing pages.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔧 What You'll Need
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit &lt;span class="nv"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;1.7.6 chromadb requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/signup" rel="noopener noreferrer"&gt;OpenAI API key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://olostep.com" rel="noopener noreferrer"&gt;Olostep API key&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Crawl website pages using Olostep’s API
&lt;/li&gt;
&lt;li&gt;Clean content and extract Markdown
&lt;/li&gt;
&lt;li&gt;Embed each page with OpenAI embeddings
&lt;/li&gt;
&lt;li&gt;Store everything in ChromaDB (including metadata)
&lt;/li&gt;
&lt;li&gt;Let users ask questions via Streamlit
&lt;/li&gt;
&lt;li&gt;Query top matches and summarize answers with GPT&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🧩 Step-by-Step Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Crawl Website
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_crawl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;include_urls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/**&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/crawls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Wait and Retrieve Pages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wait_for_crawl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crawl_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/crawls/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;crawl_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_pages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crawl_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/crawls/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;crawl_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Clean Markdown Content
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#+ |\* |\&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\[(.*?)\]\(.*?\)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;`|\*\*|_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\n{2,}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Initialize ChromaDB
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;chromadb.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;embedding_functions&lt;/span&gt;

&lt;span class="n"&gt;chroma_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;openai_embed_fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddingFunction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-ada-002&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chroma_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_or_create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;website-content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openai_embed_fn&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Index Content
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieve_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.olostep.com/v1/retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_OLOSTEP_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retrieve_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formats&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}],&lt;/span&gt;
                    &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Indexed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️ Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Summarize with GPT
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summarize_with_gpt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sorry, I couldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t find enough information.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;&lt;span class="s"&gt;
Use the following website content to answer this question:

&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Q: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
A:
&lt;/span&gt;&lt;span class="sh"&gt;'''&lt;/span&gt;

    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  7. Streamlit Frontend
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;

&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_page_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Website Chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_icon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;💬&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;💬 Website Chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask anything based on your website content.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask your question...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Thinking...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;query_website&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_with_gpt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ✅ Live Demo Preview
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ask: &lt;em&gt;What services do you offer?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Ask: &lt;em&gt;Where is your pricing page?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Ask: &lt;em&gt;How can I contact support?&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The assistant will generate answers using real indexed content from your website.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Save/load your ChromaDB collection&lt;/li&gt;
&lt;li&gt;Split large documents into smaller chunks&lt;/li&gt;
&lt;li&gt;Include source URLs in GPT responses&lt;/li&gt;
&lt;li&gt;Add memory to handle multi-turn chat&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯 Conclusion
&lt;/h2&gt;

&lt;p&gt;Congratulations! You've now built a fully functional AI-powered chatbot that can answer questions from your website using ChromaDB, Olostep, and OpenAI — all wrapped in a beautiful Streamlit app.&lt;/p&gt;

&lt;p&gt;Whether for internal docs, support, or public knowledge, this gives you ChatGPT power without managing any LLMs.&lt;/p&gt;

&lt;p&gt;Happy building! 🚀&lt;br&gt;
&lt;a href="https://gist.github.com/mdehsan873/f69481997f487e23b1d1282c82ce00f5" rel="noopener noreferrer"&gt;https://gist.github.com/mdehsan873/f69481997f487e23b1d1282c82ce00f5&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
