<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vinicius Pereira</title>
    <description>The latest articles on DEV Community by Vinicius Pereira (@vinimabreu).</description>
    <link>https://dev.to/vinimabreu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4010065%2Fd51bbccb-85bd-431c-8ae0-02f0ccfc120c.jpg</url>
      <title>DEV Community: Vinicius Pereira</title>
      <link>https://dev.to/vinimabreu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vinimabreu"/>
    <language>en</language>
    <item>
      <title>Sonnet 5 dropped today. Watch the other hand.</title>
      <dc:creator>Vinicius Pereira</dc:creator>
      <pubDate>Tue, 30 Jun 2026 22:04:06 +0000</pubDate>
      <link>https://dev.to/vinimabreu/sonnet-5-dropped-today-watch-the-other-hand-p50</link>
      <guid>https://dev.to/vinimabreu/sonnet-5-dropped-today-watch-the-other-hand-p50</guid>
      <description>&lt;p&gt;Sonnet 5 landed today and everyone's busy benchmarking it. Fair enough: it looks like a strong, cheap, very agentic model, close to Opus 4.8 at a fraction of the cost. But the most important model news this month isn't Sonnet 5. It's that Fable 5, Anthropic's most powerful model, has been switched off worldwide since June 12, and it's still down.&lt;/p&gt;

&lt;p&gt;Not rate-limited. Not deprecated. Export-controlled. Anthropic received an emergency directive from the US Commerce Department, citing national security, after a way to jailbreak Fable 5's safeguards surfaced. Since it couldn't verify the nationality of every request in real time, Anthropic pulled the model for everyone, Americans included. Mythos 5 got a partial reprieve for a handful of cyber-defense orgs. Fable 5 is still dark.&lt;/p&gt;

&lt;p&gt;Sit with that. The most capable model on the market right now isn't gated by price; it's gated by law. "Available to everyone who can pay" quietly became "available to whoever the government allows." That's a different universe to build a company in.&lt;/p&gt;

&lt;p&gt;Now look at what shipped in the same window: Sonnet 5, explicitly the cheaper, more agentic, good-enough model. The industry's answer to "the frontier just got pulled" is "you probably didn't need the frontier anyway." And honestly, that has been true for most of us for a while: most production AI fails on reliability, eval, and retrieval, not on raw model IQ. A bigger brain was never the thing standing between your demo and prod.&lt;/p&gt;

&lt;p&gt;So here's the take: chasing the newest frontier was always a weak moat, and now it's a fragile one too. It can be switched off by a government letter on a Tuesday, and it literally just was. The teams that win the next year are the ones who (a) made their systems reliable enough to run a cheaper or open model without flinching, and (b) didn't bet the company on a single vendor in a single jurisdiction. Reliability and independence beat raw capability the moment capability becomes a policy decision.&lt;/p&gt;

&lt;p&gt;Tell me I'm wrong. Is Sonnet 5 a real step change for your use case, or is the actual headline that the frontier now ships with an export-control letter attached?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
    <item>
      <title>What changed since the last scrape? A small change-detection layer (stdlib only)</title>
      <dc:creator>Vinicius Pereira</dc:creator>
      <pubDate>Tue, 30 Jun 2026 20:23:53 +0000</pubDate>
      <link>https://dev.to/vinimabreu/what-changed-since-the-last-scrape-a-small-change-detection-layer-stdlib-only-6lc</link>
      <guid>https://dev.to/vinimabreu/what-changed-since-the-last-scrape-a-small-change-detection-layer-stdlib-only-6lc</guid>
      <description>&lt;p&gt;Most of my scrapers answer one question: what's on the site right now. But that's almost never the question I actually have. What I care about is what changed since the last run. A new listing showed up, a price dropped, a product disappeared, a status flipped from open to closed. The current snapshot on its own doesn't tell me any of that.&lt;/p&gt;

&lt;p&gt;For a while I rebuilt the same thing on every project: load last run's JSON, compare it to this run, work out what's new, what's gone, and what changed. It's never hard, but it's fiddly, and I kept getting the same details wrong (more on that below). So I pulled it into one small reusable piece and stopped rewriting it. It's called scrape-sentinel.&lt;/p&gt;

&lt;p&gt;This post is about the design more than the tool, because the interesting part is the handful of decisions that make change detection annoying to get right.&lt;/p&gt;

&lt;h2&gt;The core idea&lt;/h2&gt;

&lt;p&gt;You give it the records from this run and a key, and it tells you what was added, removed, and changed since last time. For changed records, it tells you which fields moved and from what to what.&lt;/p&gt;

&lt;p&gt;The diff itself is a plain function with no I/O:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrape_sentinel&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;diff&lt;/span&gt;

&lt;span class="n"&gt;cs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;previous_records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ignore_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraped_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;added&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;changed&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;changed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;changed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deltas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;changed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which prints something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new: W-104
W-101 price 39.0 -&amp;gt; 35.0
W-101 in_stock True -&amp;gt; False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;The details that kept biting me&lt;/h2&gt;

&lt;p&gt;A few decisions are the whole reason this is worth extracting instead of rewriting inline every time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Match by key, not by position.&lt;/strong&gt; This is the big one. If you diff two lists positionally, a re-sorted page or a reordered API response looks like every row changed. Matching on a stable key (one field or a few) means a reordered run shows zero changes, which is correct.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The first run is a baseline.&lt;/strong&gt; With no previous snapshot, everything looks new. The first run just records state and stays quiet instead of firing an alert for all 4,000 items.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignore the noisy fields.&lt;/strong&gt; A scraped_at timestamp or a session token changes every single run. You drop those from the comparison, or restrict it to an allow-list of fields you actually care about.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write snapshots atomically.&lt;/strong&gt; The new snapshot is written to a temp file and then renamed over the old one, so a run that dies halfway can't leave behind a corrupted snapshot that breaks the next comparison.&lt;/li&gt;
&lt;/ul&gt;
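
&lt;p&gt;The key-matching and ignored-fields points are easier to see in code. Below is a minimal, hypothetical stdlib sketch of the same idea. It is not scrape-sentinel's implementation. The names diff_records, record_key, Delta, Changed, and ChangeSet are illustrative and only loosely mirror the diff() output shown earlier, and the sample records are made up.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical stdlib sketch of key-based diffing, not scrape-sentinel's code.
# diff_records, record_key, Delta, Changed and ChangeSet are illustrative names.
from typing import NamedTuple


class Delta(NamedTuple):
    field: str    # name of the field that moved
    old: object   # value in the previous run
    new: object   # value in the current run


class Changed(NamedTuple):
    key: tuple    # the record's key values, e.g. ("W-101",)
    deltas: list  # one Delta per field that differs


class ChangeSet(NamedTuple):
    added: list     # records whose key is new this run
    removed: list   # records whose key disappeared
    changed: list   # Changed entries for keys present in both runs
    unchanged: int  # keys present in both runs with no deltas


def record_key(record, key_fields):
    # A tuple covers both a single key field and a composite key.
    return tuple(record[f] for f in key_fields)


def diff_records(previous, current, key_fields, ignore_fields=()):
    # Index both runs by key so row order never matters.
    # If two records share a key, the later one silently wins in this sketch.
    prev = {record_key(r, key_fields): r for r in previous}
    curr = {record_key(r, key_fields): r for r in current}
    ignored = set(ignore_fields)

    added = [rec for k, rec in curr.items() if k not in prev]
    removed = [rec for k, rec in prev.items() if k not in curr]
    changed, unchanged = [], 0
    for k, new in curr.items():
        old = prev.get(k)
        if old is None:
            continue  # already counted as added
        # Compare every field either side has, minus the noisy ones.
        # In this sketch a missing field reads as None.
        names = sorted((set(old) | set(new)) - ignored)
        deltas = [
            Delta(n, old.get(n), new.get(n))
            for n in names
            if old.get(n) != new.get(n)
        ]
        if deltas:
            changed.append(Changed(k, deltas))
        else:
            unchanged += 1
    return ChangeSet(added, removed, changed, unchanged)


before = [
    {"sku": "W-101", "price": 39.0, "scraped_at": "09:00"},
    {"sku": "W-102", "price": 12.0, "scraped_at": "09:00"},
]
after = [  # same items, reordered, one price drop, fresh timestamps
    {"sku": "W-102", "price": 12.0, "scraped_at": "10:00"},
    {"sku": "W-101", "price": 35.0, "scraped_at": "10:00"},
]
cs = diff_records(before, after, key_fields=("sku",), ignore_fields=("scraped_at",))
for c in cs.changed:
    for d in c.deltas:
        print(c.key, d.field, d.old, d.new)  # ('W-101',) price 39.0 35.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because both runs are indexed by key before anything is compared, the reordered run reports only the real price drop. The ignore list keeps the fresh scraped_at values from showing up as changes. A real tool would probably also want to flag duplicate keys instead of letting the last record win.&lt;/p&gt;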

&lt;h2&gt;Using it for real&lt;/h2&gt;

&lt;p&gt;In practice you want the diff plus the I/O around it: load the last snapshot, run your scraper, compare, alert, save the new snapshot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrape_sentinel&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CallableSource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PipelineConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SnapshotStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ConsoleAlerter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebhookAlerter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_once&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# your requests / Playwright / API code, returns a list of dicts
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch_products&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PipelineConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ignore_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraped_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;alerters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;ConsoleAlerter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;catalog&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,)),&lt;/span&gt;
        &lt;span class="nc"&gt;WebhookAlerter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SLACK_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;catalog&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,)),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;changes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_once&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CallableSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nc"&gt;SnapshotStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./.state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;   &lt;span class="c1"&gt;# {'added': 1, 'removed': 1, 'changed': 1, 'unchanged': 2}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
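
&lt;p&gt;The pipeline above hides the snapshot I/O behind SnapshotStore and run_once. To illustrate the two I/O details from the earlier list, the quiet first run and atomic snapshot writes, here is a hypothetical stdlib sketch. It is not how SnapshotStore actually works. The function names and the single-JSON-file layout are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch of the snapshot side, not scrape-sentinel's SnapshotStore.
# load_snapshot, save_snapshot and run_with_snapshot are illustrative names.
import json
import os
import tempfile


def load_snapshot(path):
    # No snapshot yet means this is the first run.
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_snapshot(path, records):
    # Write to a temp file in the same directory, then swap it into place.
    # os.replace is atomic on POSIX when both paths share a filesystem,
    # so a run that dies mid-write leaves the previous snapshot intact.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(records, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # don't leave a half-written temp file behind
        raise


def run_with_snapshot(path, current, diff_fn):
    previous = load_snapshot(path)
    if previous is None:
        # Baseline run: record state and report nothing.
        save_snapshot(path, current)
        return None
    changes = diff_fn(previous, current)
    # Save only after diffing (and, in a real pipeline, after alerting),
    # so a crash before this point reruns against the same baseline.
    save_snapshot(path, current)
    return changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here diff_fn can be any function of (previous, current), such as a small wrapper around the diff() call shown earlier. Keeping the temp file in the same directory as the snapshot matters, because os.replace may fail when the source and destination are on different filesystems.&lt;/p&gt;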



&lt;p&gt;Alerts go to the console, a webhook (Slack or Telegram), or a JSON change log. There's also a CLI with a --fail-on-change flag that ties the exit code to whether anything changed, so you can run it from cron or as a CI step and have the next step run only when something actually moved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scrape-sentinel run &lt;span class="nt"&gt;--source&lt;/span&gt; json:catalog.json &lt;span class="nt"&gt;--key&lt;/span&gt; sku &lt;span class="nt"&gt;--state&lt;/span&gt; ./.state &lt;span class="nt"&gt;--webhook&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SLACK_WEBHOOK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
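
&lt;p&gt;The webhook destination doesn't need a dependency either. Below is a hypothetical stdlib sketch that posts a one-line summary in the {"text": ...} JSON shape that Slack incoming webhooks accept. The function names and message format are assumptions, not WebhookAlerter's real output. Telegram's Bot API expects a different payload (a chat_id plus text sent to sendMessage), so it would need its own formatter.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical Slack-style webhook alert using only the stdlib.
# Not scrape-sentinel's WebhookAlerter; names and message format are assumptions.
import json
import urllib.request


def format_summary(title, added, removed, changed):
    # Slack incoming webhooks accept a JSON body with a "text" field.
    return {"text": f"[{title}] {added} added, {removed} removed, {changed} changed"}


def post_webhook(url, payload, timeout=10):
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on error status codes,
    # so a failed alert is loud rather than silent.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def alert_if_changed(url, title, added, removed, changed):
    # Stay quiet when nothing moved, so the channel only fires on real changes.
    if not (added or removed or changed):
        return None
    return post_webhook(url, format_summary(title, added, removed, changed))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;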



&lt;h2&gt;What it is not&lt;/h2&gt;

&lt;p&gt;It's not a scraper. It doesn't crawl anything for you. You bring your own requests, Playwright, or API client and hand it a list of dicts; it owns the diff, the alert, and the snapshot. It's also standard-library only, with no dependencies, so dropping it into an existing project doesn't pull in a dependency tree. Keeping the diff a pure function is what made it easy to test heavily, and that's where most of the test suite lives.&lt;/p&gt;

&lt;h2&gt;Repo&lt;/h2&gt;

&lt;p&gt;MIT licensed: &lt;a href="https://github.com/vinimabreu/scrape-sentinel" rel="noopener noreferrer"&gt;https://github.com/vinimabreu/scrape-sentinel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Honestly curious how other people handle this. Do you diff inside the database, keep snapshots on disk like this, hash each record, or something cleaner? It feels like the kind of thing everyone quietly rebuilds, so I'd like to know what I missed.&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>showdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
