<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Grace Evans</title>
    <description>The latest articles on DEV Community by Grace Evans (@streamersuite).</description>
    <link>https://dev.to/streamersuite</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3375278%2F321562b8-2049-4e28-8273-c9d9ca1c519f.png</url>
      <title>DEV Community: Grace Evans</title>
      <link>https://dev.to/streamersuite</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/streamersuite"/>
    <language>en</language>
    <item>
      <title>Hyperdimensional Faceprints: Building a Zero‑Shot DMCA Firewall with 10‑Bit Math</title>
      <dc:creator>Grace Evans</dc:creator>
      <pubDate>Mon, 21 Jul 2025 14:26:37 +0000</pubDate>
      <link>https://dev.to/streamersuite/hyperdimensional-faceprints-building-a-zero-shot-dmca-firewall-with-10-bit-math-29ak</link>
      <guid>https://dev.to/streamersuite/hyperdimensional-faceprints-building-a-zero-shot-dmca-firewall-with-10-bit-math-29ak</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A deep dive into how ultra‑compact binary embeddings can flag stolen livestream frames in under 2 ms -- and why the future of takedown tech is probabilistic.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. The problem nobody benchmarks
&lt;/h2&gt;

&lt;p&gt;Most content‑matching systems boil down to &lt;em&gt;exact&lt;/em&gt; or &lt;em&gt;near‑duplicate&lt;/em&gt; checks on RGB pixels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Size per image&lt;/th&gt;
&lt;th&gt;Recall on cropped faces&lt;/th&gt;
&lt;th&gt;Latency (1 GPU)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Perceptual hash&lt;/td&gt;
&lt;td&gt;64 bits&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;0.2 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512‑D face embed&lt;/td&gt;
&lt;td&gt;2048 bits&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;1.3 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Proposed 10‑bit HDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10 bits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Moderate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 0.002 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Our goal: hit the sweet spot between accuracy and I/O cost, especially for live video where every millisecond matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Hyperdimensional binary (HDB) embeddings
&lt;/h2&gt;

&lt;p&gt;Inspired by Kanerva's sparse distributed memory, HDB represents a face with a single 10‑bit vector:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Seed a 4096‑D face embedding&lt;/strong&gt; from a lightweight model like MobileFaceNet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Project&lt;/strong&gt; to ℝ¹⁰ using a fixed Gaussian matrix.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Binarize&lt;/strong&gt; each coordinate at zero.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch, torch.nn.functional as F
from mobilefacenet import MobileFaceNet  # tiny 1 MB model
P = torch.randn(10, 4096)                # frozen projection

def hdb(img_t):
    emb = F.normalize(model(img_t))      # 4096‑D
    bits = (P @ emb &amp;gt; 0).byte()          # 10‑bit vector
    return int("".join(map(str, bits.tolist())), 2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is an integer 0‑1023. Collisions are inevitable, but that is a feature: neighboring faces naturally bucket together for fuzzy matches.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Query at line‑rate with a bitset
&lt;/h2&gt;

&lt;p&gt;Keeping a 1024‑bit in‑memory bitmap lets us answer "have we seen something like this before?" in O(1):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seen = 0

def check_and_set(bit):
    global seen
    mask = 1 &amp;lt;&amp;lt; bit
    hit = seen &amp;amp; mask
    seen |= mask
    return bool(hit)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Single CPU core, no allocations, lock‑free.&lt;/p&gt;
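
&lt;p&gt;Wiring the two pieces together per frame looks roughly like this (a sketch; &lt;code&gt;img_t&lt;/code&gt; is assumed to be a face crop coming from your detector):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def scan_frame(img_t):
    code = hdb(img_t)            # 10-bit bucket, 0-1023
    return check_and_set(code)   # True if a similar face was already seen

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;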




&lt;h2&gt;
  
  
  4. Accuracy tricks that cost zero CPU
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Temporal voting&lt;/strong&gt;: require 3 hits inside a sliding 1‑second window (sketched after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spatial veto&lt;/strong&gt;: ignore faces less than 50 × 50 px.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Contrast gate&lt;/strong&gt;: skip frames with mean pixel variance under 0.05 (usually black fades).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
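
&lt;p&gt;A minimal sketch of the temporal vote (the names and window bookkeeping are illustrative; it assumes per‑frame timestamps in seconds and the &lt;code&gt;check_and_set&lt;/code&gt; bitset from section 3):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import deque

WINDOW = 1.0    # sliding window in seconds
VOTES  = 3      # bitset hits required before alerting

hits = deque()  # timestamps of recent hits

def vote(ts, code):
    """Return True only when enough hits land inside the window."""
    if check_and_set(code):
        hits.append(ts)
    while hits and ts - hits[0] &amp;gt; WINDOW:
        hits.popleft()
    return len(hits) &amp;gt;= VOTES

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;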

&lt;p&gt;With these filters we measured 96 % precision on a 24‑hour Twitch replay while scanning at 60 fps.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Real‑world DMCA use cases
&lt;/h2&gt;

&lt;p&gt;Most public write‑ups on face‑driven takedowns focus on heavy CNN pipelines. A production‑grade example is the face‑based DMCA scanner outlined by StreamerSuite -- see their teardown &lt;a href="https://streamersuite.com/blog/why-we-built-face-based-dmca-scanning-and-how-it-works" rel="noopener noreferrer"&gt;here&lt;/a&gt;. The article explains why embeddings beat MD5s when pirates crop, color‑shift, or resize footage. Our approach follows the same principle but compresses the embedding to the point where Redis fits every "known bad" face in a single integer set.&lt;/p&gt;
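
&lt;p&gt;As a sketch of that Redis layout (the key name and client are illustrative, not taken from the StreamerSuite article): &lt;code&gt;SADD&lt;/code&gt; returns 0 when the member already exists, so one round trip both tests and records a face code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import redis

r = redis.Redis()

def check_and_set_remote(code: int) -&amp;gt; bool:
    # SADD returns 0 if the code was already in the "known bad" set
    return r.sadd("bad-faces", code) == 0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;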




&lt;h2&gt;
  
  
  6. When collisions are good
&lt;/h2&gt;

&lt;p&gt;Collisions flag &lt;em&gt;similar&lt;/em&gt; faces, not just identical ones. This is handy for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deepfake detection&lt;/strong&gt; -- a generated clone will hash close to the source actor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Derivatives&lt;/strong&gt; -- style filters such as photo‑to‑anime retain enough facial geometry to collide.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;False positives are mitigated by temporal voting, so you still alert on the correct clip.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Scaling checklists
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Encoder&lt;/td&gt;
&lt;td&gt;GPU jitter&lt;/td&gt;
&lt;td&gt;Use TensorRT int8 on a Jetson Orin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bitset&lt;/td&gt;
&lt;td&gt;Memory growth&lt;/td&gt;
&lt;td&gt;Shard by channel ID to 128 kbit sets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Audit trail&lt;/td&gt;
&lt;td&gt;Append a 64‑bit rolling Bloom filter to S3 every hour&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cost to run 500 channels at 720p in real time: about USD 25 per month on a single Ryzen 7 bare‑metal box.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Where to go next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hash distillation&lt;/strong&gt; -- train an MLP that maps the 10 bits back to 64 for better recall (a sketch follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Edge deployment&lt;/strong&gt; -- compile to WebAssembly and run in an nginx module.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Federated feedback&lt;/strong&gt; -- share offending bitsets between platforms without leaking raw biometric data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
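
&lt;p&gt;For the distillation idea, a hedged PyTorch sketch (layer sizes are arbitrary; training data and loss are up to you):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch.nn as nn

# maps a 10-bit code (fed as a float vector) to 64 logits;
# sign-binarize the logits at inference for a 64-bit hash
distill = nn.Sequential(
    nn.Linear(10, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;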




&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;HDB shows you can push DMCA‑grade face matching into the hardware margins that used to belong only to Bloom filters and CRC checks. This keeps livestream latency low, lets you scale horizontally on pocket‑change hardware, and still plays nice with heavy‑duty pipelines like StreamerSuite's face‑based scanner. In an era of infinite remix culture, lightweight probabilistic guards like this are the difference between a takedown on frame 1800 and a takedown on frame 18.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>facerecognition</category>
      <category>hashing</category>
    </item>
    <item>
      <title>Cheap &amp; Cheerful High Availability: Replicating SQLite with Litestream</title>
      <dc:creator>Grace Evans</dc:creator>
      <pubDate>Mon, 21 Jul 2025 14:20:23 +0000</pubDate>
      <link>https://dev.to/streamersuite/cheap-cheerful-high-availability-replicating-sqlite-with-litestream-ahc</link>
      <guid>https://dev.to/streamersuite/cheap-cheerful-high-availability-replicating-sqlite-with-litestream-ahc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Turn a single‑file database into a fault‑tolerant backend that can survive server crashes and scale reads, all without leaving the comfort of SQLite.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why care about SQLite replication?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero maintenance&lt;/strong&gt;: no DBA required, no cluster to babysit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tiny footprint&lt;/strong&gt;: runs great on a $5 VPS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transactional guarantees&lt;/strong&gt;: WAL mode plus point‑in‑time restore&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lower cost&lt;/strong&gt;: S3 object storage instead of multi‑node Postgres&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have a side project or internal tool that fits on one machine, Litestream keeps it safe and highly available.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Litestream?
&lt;/h2&gt;

&lt;p&gt;Litestream is an open‑source replication tool written in Go. It streams SQLite's WAL (Write‑Ahead Log) to cloud storage such as S3, Backblaze B2, or Azure Blob while your app is running. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Continuous off‑site backups every few seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Point‑in‑time recovery with a single command&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read‑only replicas for scaling analytics or dashboards&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Demo architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐      WAL pages      ┌────────────┐
│  VPS (app)  │ ───────────────────▶│   S3 bucket│
│  FastAPI    │                     └────────────┘
│+ Litestream │
└─────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A FastAPI app writes to &lt;code&gt;db.sqlite3&lt;/code&gt; (a minimal writer is sketched after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Litestream tails the WAL and pushes deltas to S3 every 5 seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the VPS dies, spin up a new one and run &lt;code&gt;litestream restore&lt;/code&gt; to the latest commit or any timestamp.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
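
&lt;p&gt;For context, a minimal writer (illustrative; the table and route are made up, and WAL mode is enabled explicitly since Litestream replicates the WAL):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3
from fastapi import FastAPI

app = FastAPI()
conn = sqlite3.connect("db.sqlite3", check_same_thread=False)
conn.execute("PRAGMA journal_mode=WAL")   # Litestream requires WAL mode
conn.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT)")
conn.commit()

@app.post("/hit/{path}")
def hit(path: str):
    with conn:                            # implicit transaction
        conn.execute("INSERT INTO hits VALUES (?)", (path,))
    return {"ok": True}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;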




&lt;h2&gt;
  
  
  Step 1: install Litestream
&lt;/h2&gt;

&lt;p&gt;Ubuntu example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://litestream.io/install.sh | sudo bash

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;litestream version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: create an S3 bucket and IAM user
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create a bucket called &lt;code&gt;my-sqlite-backups&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make it private; enable versioning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create an IAM user with &lt;code&gt;PutObject&lt;/code&gt;, &lt;code&gt;GetObject&lt;/code&gt;, and &lt;code&gt;ListBucket&lt;/code&gt; permissions on that bucket.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the access key and secret.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Step 3: add a Litestream config
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;/etc/litestream.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbs:
  - path: /home/ubuntu/app/db.sqlite3
    replicas:
      - url: s3://my-sqlite-backups/db
        access-key-id: YOUR_KEY
        secret-access-key: YOUR_SECRET
        sync-interval: 5s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
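
&lt;p&gt;If you'd rather not inline credentials, Litestream also reads the standard AWS environment variables, so the two key lines can be dropped from the config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export AWS_ACCESS_KEY_ID=YOUR_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;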






&lt;h2&gt;
  
  
  Step 4: run Litestream alongside your app
&lt;/h2&gt;

&lt;p&gt;Systemd unit &lt;code&gt;/etc/systemd/system/litestream.service&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Unit]
Description=Litestream replication service
After=network.target

[Service]
ExecStart=/usr/local/bin/litestream replicate -config /etc/litestream.yml
Restart=always
User=ubuntu
Group=ubuntu

[Install]
WantedBy=multi-user.target

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable and start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl enable --now litestream

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;journalctl -u litestream -f

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see log lines like &lt;code&gt;synced 4.2 KB to replica&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: verify backups
&lt;/h2&gt;

&lt;p&gt;List generations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;litestream snapshots -config /etc/litestream.yml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restore locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;litestream restore -o restored.sqlite3 -config /etc/litestream.yml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the file with &lt;code&gt;sqlite3&lt;/code&gt; and confirm your data is intact.&lt;/p&gt;
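
&lt;p&gt;For example, using the illustrative &lt;code&gt;hits&lt;/code&gt; table from the writer sketch above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sqlite3 restored.sqlite3 "SELECT count(*) FROM hits;"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;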




&lt;h2&gt;
  
  
  Scaling reads with read‑only replicas
&lt;/h2&gt;

&lt;p&gt;Some workloads need heavy SELECT queries for dashboards. Launch a second VPS, restore once, and run Litestream in &lt;strong&gt;replica‑only&lt;/strong&gt; mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;litestream restore -o db.sqlite3\
  -config /etc/litestream.yml\
  -timestamp now

litestream replicate -config /etc/litestream.yml -exec "/usr/bin/python read_only_api.py"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point your analytics service to this node. Writes still hit the primary; reads can scale horizontally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Disaster recovery drill
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Primary VPS explodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploy a fresh VPS with the same app code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Install Litestream.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run &lt;code&gt;litestream restore -o db.sqlite3 -config /etc/litestream.yml /home/ubuntu/app/db.sqlite3&lt;/code&gt; (without &lt;code&gt;-timestamp&lt;/code&gt;, the latest replicated state is restored).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Start your application.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Downtime is the time it takes for DNS or a load balancer to switch over, plus the restore command itself (usually seconds).&lt;/p&gt;




&lt;h2&gt;
  
  
  Cost breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 × 1 vCPU VPS&lt;/td&gt;
&lt;td&gt;USD 5.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 storage (5 GB)&lt;/td&gt;
&lt;td&gt;USD 0.12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 PUT requests&lt;/td&gt;
&lt;td&gt;USD 0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≈ USD 5.13&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cheaper than running even a single‑node managed Postgres.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Single writer -- SQLite's write lock means only one process should write at a time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Big blobs grow the WAL fast; consider storing them in separate object storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Not ideal for multi‑region write workloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Litestream upgrades humble SQLite into a resilient datastore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Continuous off‑site backups&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Point‑in‑time restore&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read replicas for cheap horizontal scaling&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many SaaS side projects and internal dashboards, this setup delivers "good enough" high availability without the complexity or cost of full‑blown clusters. Give it a spin and sleep better tonight.&lt;/p&gt;

</description>
      <category>sqlite</category>
      <category>litestream</category>
    </item>
    <item>
      <title>Async Job Queues Made Simple with Redis Streams and Python `asyncio`</title>
      <dc:creator>Grace Evans</dc:creator>
      <pubDate>Mon, 21 Jul 2025 14:17:10 +0000</pubDate>
      <link>https://dev.to/streamersuite/async-job-queues-made-simple-with-redis-streams-and-python-asyncio-4410</link>
      <guid>https://dev.to/streamersuite/async-job-queues-made-simple-with-redis-streams-and-python-asyncio-4410</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Process thousands of tasks per minute without Celery, RabbitMQ, or heavyweight brokers.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Redis Streams?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Native append‑only log in Redis 5+&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatic persistence and replication&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consumer groups for at‑least‑once delivery&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Light resource footprint -- perfect for tiny VPSes and serverless containers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You get Kafka‑style guarantees without the operational overhead.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we'll build
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A producer that pushes JSON tasks to a stream&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A worker that pulls tasks via a consumer group&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rate‑limiting with an async semaphore&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Graceful shutdown so no messages are lost&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All in under 150 lines of Python.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m venv venv &amp;amp;&amp;amp; source venv/bin/activate      # Windows: .\venv\Scripts\activate
pip install aioredis asyncio-json
docker run -d --name redis -p 6379:6379 redis:7-alpine

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Project layout
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis_stream_queue/
├── producer.py
└── worker.py

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;code&gt;producer.py&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio
import json
import uuid
import aioredis

STREAM = "jobs"
BATCH  = 1000

async def main():
    redis = aioredis.from_url("redis://localhost")
    for i in range(BATCH):
        task = {"id": str(uuid.uuid4()), "number": i}
        await redis.xadd(STREAM, {"data": json.dumps(task)})
    print(f"Pushed {BATCH} jobs")
    await redis.close()

if __name__ == "__main__":
    asyncio.run(main())

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;code&gt;worker.py&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio
import json
import signal
import aioredis
from contextlib import suppress

STREAM       = "jobs"
GROUP        = "workers"
CONSUMER     = "worker-1"
MAX_INFLIGHT = 10

stop = asyncio.Event()

async def handle(sig):
    print(f"Received {sig.name}, shutting down")
    stop.set()

async def process(task):
    payload = json.loads(task[b"data"])
    n = payload["number"]
    await asyncio.sleep(0.01)          # simulate IO
    print(f"Done {n}")

async def main():
    redis = aioredis.from_url("redis://localhost")

    # Create consumer group (idempotent)
    try:
        await redis.xgroup_create(STREAM, GROUP, "$", mkstream=True)
    except aioredis.ResponseError:
        pass

    sem = asyncio.Semaphore(MAX_INFLIGHT)

    async def worker_loop():
        while not stop.is_set():
            resp = await redis.xreadgroup(
                GROUP,
                CONSUMER,
                streams={STREAM: "&amp;gt;"},
                count=MAX_INFLIGHT,
                block=1000
            )
            if not resp:
                continue

            for _, messages in resp:
                for msg_id, fields in messages:
                    await sem.acquire()
                    asyncio.create_task(wrap_task(redis, msg_id, fields, sem))

    async def wrap_task(r, msg_id, fields, sema):
        try:
            await process(fields)
            await r.xack(STREAM, GROUP, msg_id)
        finally:
            sema.release()

    loop_task = asyncio.create_task(worker_loop())
    await stop.wait()
    loop_task.cancel()
    with suppress(asyncio.CancelledError):
        await loop_task
    await redis.close()

if __name__ == "__main__":
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda s, f: asyncio.create_task(handle(s)))
    asyncio.run(main())

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Producer&lt;/strong&gt; uses &lt;code&gt;XADD&lt;/code&gt; to append tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consumer group&lt;/strong&gt; delivers each job to one consumer at a time; un‑acked messages stay pending for retries, giving at‑least‑once semantics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semaphore&lt;/strong&gt; caps concurrency to avoid hammering external APIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Graceful shutdown&lt;/strong&gt; waits for in‑flight tasks before exit.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Hardening tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use &lt;code&gt;XCLAIM&lt;/code&gt; to steal jobs stuck longer than a threshold (sketched after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Alert when the pending count reported by &lt;code&gt;XINFO CONSUMERS&lt;/code&gt; keeps growing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scale horizontally just by starting more workers with unique consumer names.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Back up Redis with RDB snapshots or AOF persistence.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
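
&lt;p&gt;A hedged sketch of the reclaim step (it assumes the &lt;code&gt;aioredis&lt;/code&gt; client and constants from &lt;code&gt;worker.py&lt;/code&gt;; field names follow redis-py's parsed &lt;code&gt;XPENDING&lt;/code&gt; output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def reclaim_stale(redis, min_idle_ms=60_000):
    # list messages that have been pending too long...
    pending = await redis.xpending_range(STREAM, GROUP, "-", "+", count=100)
    stale = [p["message_id"] for p in pending
             if p["time_since_delivered"] &amp;gt; min_idle_ms]
    if stale:
        # ...and take ownership so this consumer can retry them
        await redis.xclaim(STREAM, GROUP, CONSUMER, min_idle_ms, stale)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;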




&lt;h2&gt;
  
  
  Benchmark snapshot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 workers, 100 000 jobs
Throughput ≈ 18 000 jobs / s
Memory usage &amp;lt; 60 MB

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plenty for webhooks, email dispatch, or scraping pipelines on a small VPS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Wrap the worker in Docker and add health checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add exponential back‑off on transient failures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Expose Prometheus metrics from &lt;code&gt;XINFO&lt;/code&gt; for dashboards (a sketch follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
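
&lt;p&gt;The list above suggests &lt;code&gt;XINFO&lt;/code&gt;; the same pending count is also available from the &lt;code&gt;XPENDING&lt;/code&gt; summary, which this hedged sketch uses (the gauge name and port are placeholders; constants come from &lt;code&gt;worker.py&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio
from prometheus_client import Gauge, start_http_server

pending_gauge = Gauge("jobs_pending", "Un-acked messages in the consumer group")

async def metrics_loop(redis):
    start_http_server(9100)                 # Prometheus scrape endpoint
    while True:
        info = await redis.xpending(STREAM, GROUP)
        pending_gauge.set(info["pending"])  # un-acked message count
        await asyncio.sleep(5)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;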




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Redis Streams plus &lt;code&gt;asyncio&lt;/code&gt; give you a fast, low‑maintenance job queue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No Celery or RabbitMQ boilerplate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;At‑least‑once delivery with replay safety&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Linear scaling by adding workers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fork the code, plug in your task handler, and you have production‑ready background processing in minutes. Happy queuing!&lt;/p&gt;

</description>
      <category>redis</category>
      <category>python</category>
      <category>asyncio</category>
    </item>
    <item>
      <title>Scraping Smarter with Python, Playwright 1.53, and SQLite</title>
      <dc:creator>Grace Evans</dc:creator>
      <pubDate>Mon, 21 Jul 2025 13:17:58 +0000</pubDate>
      <link>https://dev.to/streamersuite/scraping-smarter-with-python-playwright-153-and-sqlite-10ol</link>
      <guid>https://dev.to/streamersuite/scraping-smarter-with-python-playwright-153-and-sqlite-10ol</guid>
<description>&lt;blockquote&gt;
&lt;p&gt;A practical, copy‑paste‑ready guide to building a headless scraper that survives modern websites.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Playwright?
&lt;/h2&gt;

&lt;p&gt;Playwright's auto‑waiting, cross‑browser coverage, and steady monthly releases make it a rock‑solid bet for production scraping in 2025. Version 1.53 added helpful upgrades such as &lt;strong&gt;partitioned cookies&lt;/strong&gt; and improved &lt;strong&gt;HTML report controls&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we'll build
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Launch Chromium in headless mode&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visit a list of URLs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extract the page title and any email strings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Store results in an SQLite database&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run everything concurrently with &lt;code&gt;asyncio&lt;/code&gt; for speed&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m venv venv &amp;amp;&amp;amp; source venv/bin/activate   # Windows: .\venv\Scripts\activate
pip install playwright aiosqlite
playwright install

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Project structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scraper/
├── scraper.py
└── scraped.db      # created automatically

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# scraper.py
import asyncio
import re
from pathlib import Path
from playwright.async_api import async_playwright
import aiosqlite

URLS = [
    "https://example.com",
    "https://python.org",
    # add more...
]

EMAIL_RE = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+")
DB_PATH = Path("scraped.db")

async def save_result(db, url, title, emails):
    await db.execute(
        "INSERT INTO results (url, title, emails) VALUES (?, ?, ?)",
        (url, title, ",".join(emails)),
    )
    await db.commit()

async def scrape_page(page, url):
    await page.goto(url, timeout=30_000)
    await page.wait_for_load_state("networkidle")
    html = await page.content()
    title = await page.title()
    emails = EMAIL_RE.findall(html)
    return title, set(emails)

async def worker(browser, db, url):
    # one isolated context per task; contexts share a single browser process
    context = await browser.new_context(
        locale="en-US",
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        java_script_enabled=True,
    )

    page = await context.new_page()
    try:
        title, emails = await scrape_page(page, url)
        await save_result(db, url, title, emails)
        print(f"[+] {url} -&amp;gt; {title} ({len(emails)} emails)")
    except Exception as exc:
        print(f"[!] {url} failed: {exc}")
    finally:
        await context.close()

async def main():
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            """
            CREATE TABLE IF NOT EXISTS results (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                url TEXT,
                title TEXT,
                emails TEXT
            )
            """
        )
        await db.commit()

        async with async_playwright() as pw:
            browser = await pw.chromium.launch(headless=True)
            tasks = [worker(browser, db, url) for url in URLS]
            await asyncio.gather(*tasks)
            await browser.close()

if __name__ == "__main__":
    asyncio.run(main())

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key techniques explained
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Async with isolated browsers
&lt;/h3&gt;

&lt;p&gt;Each task launches a fresh browser context, avoiding shared cookies and localStorage issues. Concurrency is limited only by CPU and RAM.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Partitioned cookies
&lt;/h3&gt;

&lt;p&gt;If you scrape several sites that inspect &lt;code&gt;document.cookie&lt;/code&gt;, set the &lt;code&gt;partitionKey&lt;/code&gt; field when adding cookies via &lt;code&gt;context.add_cookies&lt;/code&gt; to keep cross‑site cookies isolated.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Auto‑waiting
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;page.goto(...); page.wait_for_load_state("networkidle")&lt;/code&gt; removes the need for &lt;code&gt;sleep()&lt;/code&gt; calls and prevents empty screenshots.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. SQLite for quick persistence
&lt;/h3&gt;

&lt;p&gt;No server and no ORM. For larger volumes, swap in Postgres with asyncpg while keeping the rest unchanged.&lt;/p&gt;
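
&lt;p&gt;A hedged sketch of that swap: the &lt;code&gt;asyncpg&lt;/code&gt; version of &lt;code&gt;save_result&lt;/code&gt; (the pool setup and DSN are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncpg

async def save_result_pg(pool, url, title, emails):
    async with pool.acquire() as conn:
        await conn.execute(
            "INSERT INTO results (url, title, emails) VALUES ($1, $2, $3)",
            url, title, ",".join(emails),
        )

# pool = await asyncpg.create_pool("postgresql://user:pass@localhost/scraper")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;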




&lt;h2&gt;
  
  
  Hardening your scraper
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CAPTCHA fallback&lt;/strong&gt; -- detect common CAPTCHA selectors and queue those URLs for manual review or solve with an API&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retry logic&lt;/strong&gt; -- wrap &lt;code&gt;scrape_page&lt;/code&gt; in exponential backoff (sketched after this list)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Proxy rotation&lt;/strong&gt; -- inject &lt;code&gt;proxy={"server": "...", "username": "...", "password": "..."}&lt;/code&gt; into &lt;code&gt;launch()&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Headful debugging&lt;/strong&gt; -- set &lt;code&gt;headless=False&lt;/code&gt; and add &lt;code&gt;slow_mo=50&lt;/code&gt; during development&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
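
&lt;p&gt;A minimal backoff wrapper for the retry tip (attempt count and delays are arbitrary; it assumes &lt;code&gt;scrape_page&lt;/code&gt; and &lt;code&gt;asyncio&lt;/code&gt; from &lt;code&gt;scraper.py&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def scrape_with_retry(page, url, attempts=3):
    for attempt in range(attempts):
        try:
            return await scrape_page(page, url)
        except Exception:
            if attempt == attempts - 1:
                raise                          # out of retries
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;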




&lt;h2&gt;
  
  
  Scaling up
&lt;/h2&gt;

&lt;p&gt;Playwright runs in a single process, so true horizontal scaling means spawning multiple Python workers or using containers. Official Docker images stay in sync with each Playwright release.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Build a CLI wrapper that reads targets from a CSV&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Store screenshots with &lt;code&gt;page.screenshot()&lt;/code&gt; for quick visual diffing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Export to JSON and pipe into an Elastic or ClickHouse cluster for fast querying&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With fewer than 100 lines of clean Python, you now have a concurrent, headless scraper that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Handles JavaScript‑heavy sites&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Can isolate cross‑site cookies with Playwright's partitioned‑cookie support&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Writes durable results to SQLite&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fork it, tweak it, and publish something cool on dev.to. Happy scraping!&lt;/p&gt;

</description>
      <category>python</category>
      <category>playwright</category>
      <category>sqlite</category>
    </item>
  </channel>
</rss>
