<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Hnyk</title>
    <description>The latest articles on DEV Community by Daniel Hnyk (@hnykda).</description>
    <link>https://dev.to/hnykda</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3779755%2Fce8849eb-5dc8-43b3-9d66-a183ca61dffb.jpeg</url>
      <title>DEV Community: Daniel Hnyk</title>
      <link>https://dev.to/hnykda</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hnykda"/>
    <language>en</language>
    <item>
      <title>The Self-Optimizing SEO Pipeline</title>
      <dc:creator>Daniel Hnyk</dc:creator>
      <pubDate>Fri, 20 Mar 2026 19:07:58 +0000</pubDate>
      <link>https://dev.to/hnykda/the-self-optimizing-seo-pipeline-2jfm</link>
      <guid>https://dev.to/hnykda/the-self-optimizing-seo-pipeline-2jfm</guid>
      <description>&lt;p&gt;&lt;em&gt;These posts are somewhere between a case study and a forkable example. We open-sourced the skills, agents, and Python utilities at &lt;a href="https://github.com/futuresearch/example-cc-cronjob" rel="noopener noreferrer"&gt;github.com/futuresearch/example-cc-cronjob&lt;/a&gt; - they won't work as-is (you'll need your own API keys and sources), but they show all the important bits we use in production. We build &lt;a href="https://futuresearch.ai" rel="noopener noreferrer"&gt;FutureSearch&lt;/a&gt; - forecast, score, classify, or research every row of a dataset - and these pipelines are how we market it.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update (March 2026):&lt;/strong&gt; When this post was written, we had two separate domains — futuresearch.ai for research articles and everyrow.io for product pages and docs. We've since consolidated everything onto futuresearch.ai. The pipeline is simpler now: one domain, one GSC property.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;SEO for a small product is a treadmill. We have 75 pages and our top page has 14,757 impressions and 7 clicks - 0.05% CTR. Thousands of people see that listing and scroll past it every week. Figuring out which titles to change, what to change them to, and whether the last change helped or hurt is a spreadsheet job nobody does consistently. But it compounds: a title change that lifts CTR from 0.03% to 0.1% on a 14,000-impression page means 10 more clicks per week.&lt;/p&gt;
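&lt;p&gt;The compounding math is worth making concrete - a two-line helper, using the numbers from this paragraph:&lt;/p&gt;

```python
# Back-of-envelope: what a CTR lift is worth on a high-impression page.
# Numbers are the ones from this post; the helper is plain arithmetic.

def weekly_click_gain(impressions, ctr_before, ctr_after):
    """Extra clicks per week from a CTR change (CTRs as fractions)."""
    return impressions * (ctr_after - ctr_before)

gain = weekly_click_gain(14_000, 0.0003, 0.001)  # 0.03% to 0.1%
print(round(gain))  # 10
```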

&lt;p&gt;The &lt;a href="https://futuresearch.ai/blog/marketing-pipeline-using-claude-code" rel="noopener noreferrer"&gt;marketing pipeline from Post 3&lt;/a&gt; scans communities for people with data problems. This pipeline does something narrower: it reads our own search data and proposes changes to improve what we already have. It reads a week of Google Search Console data, spawns an Opus-model agent for every page, and proposes title and description changes. Each agent reads the history of every change we've made to that page, what the search data looked like before and after, and whether the outcome improved. The next suggestion comes from that history, and it gets better over time.&lt;/p&gt;

&lt;h2&gt;The Pipeline&lt;/h2&gt;

&lt;p&gt;Five phases. 330 lines of markdown, running on the infrastructure from &lt;a href="https://futuresearch.ai/blog/claude-code-kubernetes-cronjob" rel="noopener noreferrer"&gt;Post 1&lt;/a&gt; using the workflow patterns from &lt;a href="https://futuresearch.ai/blog/claude-code-workflow-engine" rel="noopener noreferrer"&gt;Post 2&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Collect GSC Data
  └── MCP server fetches from Google Search Console (both domains)
       ↓ 6 API calls → raw JSON on disk
Phase 2: Prepare Per-Page Inputs
  └── Python script computes deltas, matches queries to pages
       ↓ 75 per-page JSON files
Phase 3: Analyze All Pages
  └── seo-page-analyzer agents (batches of 10) + seo-new-page-proposer
       ↓ each agent writes suggestion back to its input file
Phase 4: Record Proposed Changes
  └── Collect all suggestions into changes JSON
       ↓
Phase 5: Report + PR
  └── Markdown report with performance table + proposed changes
       ↓ branch, commit, push, PR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;How It Collects Data&lt;/h2&gt;

&lt;p&gt;The pipeline reads from Google Search Console via an MCP server - &lt;a href="https://github.com/AminForou/mcp-gsc" rel="noopener noreferrer"&gt;mcp-server-gsc&lt;/a&gt;. One-time setup: a &lt;code&gt;.mcp.json&lt;/code&gt; in the project root (the credentials file mounts as a Kubernetes secret):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"google-search-console"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp-server-gsc"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"GOOGLE_APPLICATION_CREDENTIALS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./gsc-credentials.json"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code discovers the tool automatically. The skill file says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;mcp__google-search-console__search_analytics:
  siteUrl: "sc-domain:futuresearch.ai"
  startDate: "{start}"
  endDate: "{end}"
  dimensions: "query,page"
  rowLimit: 25000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six API calls total - page performance, query-page mappings, and all queries, for each domain. Raw JSON lands on disk.&lt;/p&gt;
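&lt;p&gt;The call plan is easy to sketch in code. The helper and output paths below are ours for illustration, not from the repo:&lt;/p&gt;

```python
# Sketch of the Phase 1 collection plan: three report types for each
# of the two domains, six calls total, each landing as raw JSON on
# disk. Names and paths are hypothetical.

DOMAINS = ["futuresearch.ai", "everyrow.io"]
REPORTS = {
    "pages": ["page"],                 # per-page performance
    "query_pages": ["query", "page"],  # query-to-page mappings
    "queries": ["query"],              # all queries
}

def build_call_plan(start, end):
    plan = []
    for domain in DOMAINS:
        for name, dims in REPORTS.items():
            plan.append({
                "siteUrl": f"sc-domain:{domain}",
                "startDate": start,
                "endDate": end,
                "dimensions": dims,
                "rowLimit": 25000,
                "out": f"data/raw/{domain}/{name}.json",
            })
    return plan

print(len(build_call_plan("2026-03-09", "2026-03-15")))  # 6
```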

&lt;h2&gt;How It Decides What to Change&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;lib/seo_prepare.py&lt;/code&gt; transforms the raw GSC data into per-page input files. Each file has everything an agent needs to make a judgment call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-revenue-forecast"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"futuresearch.ai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"research"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"current_metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI's Financial Forecast 2025-2027"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"gsc_current"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"clicks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"impressions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14480&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ctr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;7.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"queries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai revenue 2026"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"impressions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;716&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"gsc_diff"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"clicks_delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"impressions_delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2961&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"experiment_history"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://futuresearch.ai/blog/claude-code-workflow-engine" rel="noopener noreferrer"&gt;lib + agent pattern&lt;/a&gt; from Post 2: Python handles the mechanical work (parsing JSON, computing deltas, matching queries to pages), and the agent handles the judgment (is this title working? did last week's experiment improve CTR?).&lt;/p&gt;
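&lt;p&gt;A minimal sketch of the mechanical half - the helper name is an assumption; the real &lt;code&gt;lib/seo_prepare.py&lt;/code&gt; is in the example repo:&lt;/p&gt;

```python
# Compute week-over-week deltas and attach the queries that hit this
# page. Helper name is hypothetical; the shape mirrors the per-page
# JSON shown above.

def prepare_page_input(slug, current, previous, query_rows):
    """Build one per-page input dict from raw GSC rows."""
    return {
        "slug": slug,
        "gsc_current": current,
        "gsc_diff": {
            "clicks_delta": current["clicks"] - previous["clicks"],
            "impressions_delta": current["impressions"] - previous["impressions"],
        },
        # rows came from the query,page report; keep this page's queries
        "queries": [r for r in query_rows if r["page"].endswith(slug)],
    }

page = prepare_page_input(
    "openai-revenue-forecast",
    {"clicks": 5, "impressions": 14480},
    {"clicks": 0, "impressions": 11519},
    [{"page": "https://futuresearch.ai/openai-revenue-forecast",
      "query": "openai revenue 2026", "impressions": 716}],
)
print(page["gsc_diff"])  # {'clicks_delta': 5, 'impressions_delta': 2961}
```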

&lt;p&gt;The skill runs agents in batches of 10. Each &lt;code&gt;seo-page-analyzer&lt;/code&gt; - running Opus, because judgment matters here - gets one page and makes one decision: suggest a title change, a description change, a content change, or nothing. Eight batches cover all pages. A separate &lt;code&gt;seo-new-page-proposer&lt;/code&gt; reads unmatched queries and flags gaps where we're missing traffic entirely.&lt;/p&gt;
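&lt;p&gt;The batching itself is the boring part - one helper covers it:&lt;/p&gt;

```python
# Chunk the per-page input files into groups of 10; each group becomes
# one round of parallel seo-page-analyzer agents.

def batches(items, size=10):
    return [items[i:i + size] for i in range(0, len(items), size)]

pages = [f"page-{n}.json" for n in range(75)]
print(len(batches(pages)))  # 8
```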

&lt;p&gt;The agents follow a decision framework in the agent definition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product pages (&lt;a href="https://futuresearch.ai/docs/reference/DEDUPE" rel="noopener noreferrer"&gt;Dedupe&lt;/a&gt;, &lt;a href="https://futuresearch.ai/docs/reference/MERGE" rel="noopener noreferrer"&gt;Merge&lt;/a&gt;, &lt;a href="https://futuresearch.ai/docs/reference/RANK" rel="noopener noreferrer"&gt;Rank&lt;/a&gt;, &lt;a href="https://futuresearch.ai/docs/reference/SCREEN" rel="noopener noreferrer"&gt;Screen&lt;/a&gt;) always get experiments, even at zero impressions. Low traffic is a reason to experiment.&lt;/li&gt;
&lt;li&gt;Research pages with CTR above 2% and good position get left alone unless the top queries clearly don't match the title.&lt;/li&gt;
&lt;li&gt;Title formats rotate - question, how-to, keyword-colon-descriptor, direct imperative - so the site doesn't turn formulaic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On one run, the same pipeline proposed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"How to Search Government Websites at Scale, for Investors" → "Which Texas Cities Have the Fastest Permit Approval Times?" - question format, specific geography&lt;/li&gt;
&lt;li&gt;"Using LLMs for Data Cleaning At Scale" → "LLM Deduplication at 20,000 Rows: F1=0.996 for $1.12 per 1k Rows" - specific numbers for a developer audience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output of a single run is a PR. Two real excerpts from the March 18th report - one routine, one where the history caught a mistake:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gs"&gt;**forecasting-top-ai-lab-2026**&lt;/span&gt; - description
&lt;span class="p"&gt;-&lt;/span&gt; Was: (empty)
&lt;span class="p"&gt;-&lt;/span&gt; Proposed: "We ranked OpenAI, Anthropic, Google DeepMind, xAI, and Meta across
  model quality, data, compute, talent, and R&amp;amp;D automation. See who is winning
  the AI race in 2026 and where each lab stands heading into Q2."
&lt;span class="p"&gt;-&lt;/span&gt; Why: 14,757 impressions, 7 clicks (0.05% CTR) despite ranking position 1-5 for
  many queries. Description is empty - Google is writing its own snippet. Adding
  a concrete description is the lowest-effort lever left on this page.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gs"&gt;**lead-scoring-without-crm**&lt;/span&gt; - title
&lt;span class="p"&gt;-&lt;/span&gt; Was: "How to Score Leads with AI When You Don't Have a CRM"
&lt;span class="p"&gt;-&lt;/span&gt; Proposed: "AI Lead Scoring Without Clay: Rank 500 Prospects for $28"
&lt;span class="p"&gt;-&lt;/span&gt; Why: Previous experiment removed 'Clay' from the title. Result: clay lead
  scoring impressions dropped from 39 to 1, all Clay-related queries lost.
  History shows this was a clear regression. Adding it back with specific
  numbers targets the audience that was converting.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing gets applied automatically. A human reviews the proposals, picks the ones worth trying, and applies them. The whole review takes about 20 minutes.&lt;/p&gt;

&lt;h2&gt;How It Gets Better&lt;/h2&gt;

&lt;p&gt;Every page's input file includes &lt;code&gt;experiment_history&lt;/code&gt; - every change we've made, when we made it, the search data before and after, and whether the outcome improved, stayed flat, or regressed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"experiment_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"change_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"old_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI Revenue Report"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"new_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OpenAI's Revenue in 2027: A Comprehensive Forecast"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data_before"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"clicks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"impressions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ctr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;8.2&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data_after"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"clicks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"impressions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;22039&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ctr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;7.5&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outcome"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"improved"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The analyzer reads this before suggesting the next change. A title that improved CTR informs the next experiment. One that regressed is a "don't repeat this" marker. It's closer to a consultant who keeps notes than anything resembling ML. The JSON file is the notebook. Each run reads it before writing in it.&lt;/p&gt;
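&lt;p&gt;One plausible way to derive the &lt;code&gt;outcome&lt;/code&gt; field - the real rule may use a noise band or more signals; this is just the shape, run on the numbers above:&lt;/p&gt;

```python
# Hypothetical labeling rule for experiment_history entries: classify
# an experiment by the direction of its CTR movement.

def label_outcome(before, after):
    """Return 'improved', 'flat', or 'regressed' from CTR movement."""
    change = round(after["ctr"] - before["ctr"], 4)
    if change == 0:
        return "flat"
    return "improved" if change == abs(change) else "regressed"

before = {"clicks": 5, "impressions": 18000, "ctr": 0.03, "position": 8.2}
after = {"clicks": 10, "impressions": 22039, "ctr": 0.05, "position": 7.5}
print(label_outcome(before, after))  # improved
```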

&lt;p&gt;The agents don't share history across pages. The learning is per-page: what was tried, what happened, what to try next. After six runs across two months, some patterns are clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Question-format titles outperform statement titles for research articles&lt;/li&gt;
&lt;li&gt;Specific numbers in case study titles ("F1=0.996 for $1.12 per 1k Rows") lift CTR on developer-focused pages&lt;/li&gt;
&lt;li&gt;Empty descriptions on high-impression pages are a recurring catch - our top page ran for weeks with no meta description while Google wrote one for us&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Where It Stands&lt;/h2&gt;

&lt;p&gt;Pages analyzed grew from 35 to 80 over the first few runs. From the March 18th run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;80 pages tracked&lt;/li&gt;
&lt;li&gt;14,757 impressions on our top page (forecasting-top-ai-lab-2026)&lt;/li&gt;
&lt;li&gt;69 changes proposed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The docs pages are still early. The &lt;a href="https://futuresearch.ai/docs/reference/DEDUPE" rel="noopener noreferrer"&gt;Dedupe&lt;/a&gt; reference page has 12 impressions. The &lt;a href="https://futuresearch.ai/docs/reference/MERGE" rel="noopener noreferrer"&gt;Merge&lt;/a&gt; reference page has 0. The pipeline tracks them alongside the 14,757-impression research articles, but applies different rules: always experiment on product pages, leave well-performing research pages alone. We're building product page SEO while the research articles carry traffic.&lt;/p&gt;

&lt;p&gt;A non-technical person on the team opens the PR, reads through the proposed changes, and applies the ones that make sense. The pipeline produces 69 suggestions with reasoning and data. The human spends 20 minutes deciding which ones to run. Neither does this alone - the human wouldn't compute deltas across 80 pages every week, and the pipeline doesn't get to change titles on a 14,000-impression page without someone reviewing it first.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: &lt;a href="https://futuresearch.ai/blog/pipeline-uses-its-own-product" rel="noopener noreferrer"&gt;An LLM Pipeline That Uses Its Own Product&lt;/a&gt; - the pipeline that finds today's news, calls our own product, and generates sardonic data visualizations about Microsoft Copilot's dignity.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We build &lt;a href="https://futuresearch.ai" rel="noopener noreferrer"&gt;FutureSearch&lt;/a&gt; - forecast, score, classify, or research every row of a dataset. This pipeline is how we optimize its SEO.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://futuresearch.ai/" rel="noopener noreferrer"&gt;FutureSearch&lt;/a&gt; lets you run your own team of AI researchers and forecasters on any dataset. &lt;a href="https://futuresearch.ai" rel="noopener noreferrer"&gt;Try it for yourself.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Marketing Pipeline Using Claude Code</title>
      <dc:creator>Daniel Hnyk</dc:creator>
      <pubDate>Wed, 11 Mar 2026 16:06:53 +0000</pubDate>
      <link>https://dev.to/hnykda/marketing-pipeline-using-claude-code-3ll8</link>
      <guid>https://dev.to/hnykda/marketing-pipeline-using-claude-code-3ll8</guid>
      <description>&lt;p&gt;&lt;em&gt;These posts are somewhere between a case study and a forkable example. We open-sourced the skills, agents, and Python utilities at &lt;a href="https://github.com/futuresearch/example-cc-cronjob" rel="noopener noreferrer"&gt;github.com/futuresearch/example-cc-cronjob&lt;/a&gt; - they won't work as-is (you'll need your own API keys and sources), but they show all the important bits we use in production. We build &lt;a href="https://everyrow.io" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt; - forecast, score, classify, or research every row of a dataset - and these pipelines are how we market it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;People who need &lt;a href="https://futuresearch.ai" rel="noopener noreferrer"&gt;futuresearch.ai&lt;/a&gt; are out there - scattered across Reddit, StackOverflow, HubSpot forums, Salesforce communities, Make.com, Airtable, Shopify, GitHub, and a dozen others. Someone deduplicating a CRM where "IBM" and "International Business Machines" are the same company. Someone joining two tables that share no common key. Someone ranking leads by criteria a spreadsheet formula can't express. We narrowed it down to 18 sources where these conversations happen most often. The problem is that maybe 2-3% of posts are actually relevant. Manually scanning hundreds of posts every morning to find two or three good ones is not something a human is going to keep doing.&lt;/p&gt;

&lt;p&gt;So we built a pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Scan
  └── 18 Python scanners fetch posts from Reddit, StackOverflow, HubSpot, ...
       ↓ dedup against seen.txt
Phase 2: Enrich
  └── Fetch full thread content: comments, replies, author info, vote counts
       ↓
Phase 3: Classify
  └── 13-question rubric per thread, assign score 1-5
       ↓ filter to score 4-5
Phase 4: Propose
  └── Select strategy, match to demo catalog, draft forum response
       ↓
Phase 5: Report + PR
  └── Markdown report with metrics, draft responses. Branch, commit, push, PR.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every weekday at 08:00 UTC, a CronJob runs this end-to-end, unattended, in about 14 minutes. The output is a pull request someone on the team opens over coffee. It runs on the infrastructure from &lt;a href="https://futuresearch.ai/blog/claude-code-kubernetes-cronjob" rel="noopener noreferrer"&gt;Post 1&lt;/a&gt;, using the workflow patterns from &lt;a href="https://futuresearch.ai/blog/claude-code-workflow-engine" rel="noopener noreferrer"&gt;Post 2&lt;/a&gt;. This post puts the concepts from those two posts to use.&lt;/p&gt;
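&lt;p&gt;The &lt;code&gt;seen.txt&lt;/code&gt; dedup in Phase 1 is what keeps a daily cron from re-proposing the same threads. A sketch - the file name is from the diagram, the helper name is ours:&lt;/p&gt;

```python
# Skip anything a previous run already processed, then record today's
# finds so tomorrow's run skips them too.
import pathlib

SEEN = pathlib.Path("seen.txt")

def filter_new(posts):
    seen = set(SEEN.read_text().splitlines()) if SEEN.exists() else set()
    fresh = [p for p in posts if p["url"] not in seen]
    with SEEN.open("a") as f:
        f.writelines(p["url"] + "\n" for p in fresh)
    return fresh

posts = [{"url": "https://stackoverflow.com/q/12345"}]
print(len(filter_new(posts)))  # 1 on the first run, 0 on a rerun
```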

&lt;h2&gt;Dealing with Signal vs Noise&lt;/h2&gt;

&lt;p&gt;A typical run from February: 57 opportunities scanned, 35 enriched with full thread content, 35 classified. Score distribution: 1 scored 5, 1 scored 4, 33 scored 1-2. Ninety-four percent of the classified threads are noise. And that's fine - those two good ones are what the whole pipeline exists for.&lt;/p&gt;

&lt;p&gt;The noise is varied and no keyword filter catches it. About 50% of Reddit "opportunities" turn out to be competitor marketing posts dressed up as questions - someone promoting their deduplication tool while pretending to ask for advice. Discussion threads that start with "What's your favorite..." are never opportunities. Platform configuration bugs dressed as data problems - someone's Make.com aggregator is misconfigured; they aren't facing a data quality issue. Career questions on Snowflake forums. "Show HN" builder posts. Exact-match problems where VLOOKUP works fine and the person just hasn't tried it yet.&lt;/p&gt;

&lt;p&gt;So we run our own LLM-powered classifier, which works through a rubric of 13 structured questions per thread. Not all of them are interesting (you can see all of them in the example repo), but these are the ones that carry the most weight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;canonical&lt;/strong&gt;: Is this a common problem others face daily, or bespoke? A canonical problem means a response helps thousands of future readers, not just one person.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tools_tried&lt;/strong&gt;: What have they already tried? If they've tried fuzzy matching and it failed, they already understand why their problem is hard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tried_llms&lt;/strong&gt;: Have they tried ChatGPT for this? If they tried and it didn't work, they need a tool that actually scales.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;importance&lt;/strong&gt;: Does this look important? Business process blocked? "Our admin is drowning" is a different signal than "just curious."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;commenter_solutions&lt;/strong&gt;: What are commenters saying? If someone already solved it with a native platform feature - and the poster accepted the answer - there's no opportunity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;person_importance&lt;/strong&gt;: Does the person look important? A StackOverflow user with 700k reputation answering "there's no solution" makes the thread more visible, not less.&lt;/li&gt;
&lt;/ul&gt;
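&lt;p&gt;The classifier assigns the 1-5 score by judgment, not by formula, but a mechanical baseline shows the shape of the aggregation - the weights below are illustrative, not the real ones:&lt;/p&gt;

```python
# Illustrative only: collapse rubric answers into a 1-5 score. The
# real rubric has 13 questions and the classifier weighs them by
# judgment; True here means the answer favors an opportunity.

WEIGHTS = {
    "canonical": 2.0,
    "tools_tried": 1.5,
    "tried_llms": 1.0,
    "importance": 1.5,
    "commenter_solutions": 2.0,
    "person_importance": 1.0,
}

def score(answers):
    """Map boolean rubric answers to a 1-5 opportunity score."""
    total = sum(w for q, w in WEIGHTS.items() if answers.get(q))
    return max(1, round(5 * total / sum(WEIGHTS.values())))

print(score({"canonical": True, "tools_tried": True,
             "importance": True, "commenter_solutions": True}))  # 4
```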

&lt;p&gt;The classifier's instructions include: "At no point should you Write() a Python script. If you think you need one, it's because you misunderstood these instructions." We added this after a classifier tried to write a sentiment analysis script instead of just reading the thread and thinking about it.&lt;/p&gt;

&lt;h2&gt;Examples: Three Real Finds&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Brazilian cities.&lt;/strong&gt; Someone on StackOverflow was manually fixing about 5,000 Brazilian city name variants with SQL &lt;code&gt;UPDATE&lt;/code&gt; statements. Bill Karwin - one of the highest-reputation answerers on StackOverflow - wrote: "there's no solution to correct 100% of the variations." SOUNDEX fails on Portuguese phonetics. The pattern table approach from another answer still requires manually enumerating every variation.&lt;/p&gt;

&lt;p&gt;The pipeline found this at 8am scanning the &lt;code&gt;record-linkage&lt;/code&gt; tag. The classifier scored it 5. The proposer matched it to demo C11 (Challenging + Messy) and drafted a response showing the &lt;a href="https://github.com/futuresearch/everyrow-sdk" rel="noopener noreferrer"&gt;&lt;code&gt;everyrow&lt;/code&gt; SDK&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;everyrow.ops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dedupe&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;dedupe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cities_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;equivalence_relation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Same Brazilian city, accounting for:
        - Accent differences (Florianopolis vs Florianópolis)
        - Abbreviations (Sto Andre vs Santo André, S Jose vs São José)
        - Typos and spacing variations
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;select&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;equivalence_relation&lt;/code&gt; is natural language - you describe what counts as a match and the model handles the linguistic reasoning. No regex, no phonetic algorithm, no pattern table. We reviewed the draft, tweaked a sentence, and posted it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Make.com 75K-row CSV.&lt;/strong&gt; A user on Make.com had a 75,000-row CSV and needed both exact AND similar matches. Make.com's AI agent can't handle that scale - it's designed for conversational Q&amp;amp;A, not batch processing. The only commenter suggested exact-match approaches (map/aggregator), which completely miss the semantic similarity requirement. The pipeline classified it as a score-4 opportunity and drafted a response showing how &lt;code&gt;everyrow dedupe&lt;/code&gt; handles the full 75K rows in one pass, with instructions for getting results back into a Make workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Agentforce problem.&lt;/strong&gt; "We bought Agentforce but can't use it because our Salesforce data is a mess." Company names listed 3-4 different ways, contacts missing emails, opportunities linked to wrong accounts. 58 upvotes, 35 comments. This represents a category the pipeline keeps discovering - AI-readiness problems, where companies buy AI tools and find their data isn't ready. The pipeline found it, classified it, and we posted a response showing CRM &lt;a href="https://everyrow.io/docs/reference/DEDUPE" rel="noopener noreferrer"&gt;deduplication&lt;/a&gt; with the SDK: 210 records in, 42 duplicates found, 52 seconds, $0.23.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;everyrow.ops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dedupe&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;dedupe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;crm_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;equivalence_relation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Two entries are duplicates if they represent &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;the same company, accounting for abbreviations, typos, and subsidiaries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# 210 rows → 168 unique entities, 42 duplicates identified
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Works and What Doesn't
&lt;/h2&gt;

&lt;p&gt;After two months of daily runs, the source-level data is clear:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Hit rate&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reddit&lt;/td&gt;
&lt;td&gt;1.5-3%&lt;/td&gt;
&lt;td&gt;Consistently highest signal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Databricks&lt;/td&gt;
&lt;td&gt;~40%&lt;/td&gt;
&lt;td&gt;Low volume (1-2/run) but when it hits, it hits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;StackExchange&lt;/td&gt;
&lt;td&gt;2-5% on classic tags&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;record-linkage&lt;/code&gt;, &lt;code&gt;string-matching&lt;/code&gt; work. &lt;code&gt;excel&lt;/code&gt;, &lt;code&gt;google-sheets&lt;/code&gt; yield 0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Make.com&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Workflow builders who need AI at one step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Salesforce&lt;/td&gt;
&lt;td&gt;Occasional&lt;/td&gt;
&lt;td&gt;High-quality finds when they appear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;132 posts across 7 runs. Zero data problems.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retool&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;300+ posts. Platform support only.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We kept scanning n8n for seven consecutive runs hoping something would turn up. Every run found posts about workflow configuration, OAuth setup, and version upgrade bugs. The learnings file eventually said what we already knew: discontinue.&lt;/p&gt;

&lt;p&gt;Cost: $5-8 per run in API usage for our own utilities. A full run takes about fourteen minutes, and the Claude Code usage itself is covered by our $200 Anthropic Max plan.&lt;/p&gt;

&lt;p&gt;The pipeline also surfaces other interesting findings as it analyzes historical questions, such as these market shifts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLM adoption inflection&lt;/strong&gt;: People who tried LLMs before asking for help went from 6-8% (2020-2023) to 33% in 2025. A third of our prospects have already tried ChatGPT and found it doesn't scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StackOverflow collapse&lt;/strong&gt;: StackOverflow went from 23% of our opportunities in 2020 to 3% in 2025. Reddit grew from 6% to 36%. Technical Q&amp;amp;A has fragmented into product-specific communities - which is exactly why we need 18 scanners instead of one.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Response Strategy
&lt;/h2&gt;

&lt;p&gt;For opportunities scoring 4 or 5, product-specific proposer agents take over. Each proposer reads our product docs and a catalog of 29 existing demos, then generates a response using one of these strategies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PROVE_CAPABILITY&lt;/td&gt;
&lt;td&gt;Default (~80%). Show a demo proving we solve the problem.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHOW_SDK_CODE&lt;/td&gt;
&lt;td&gt;Technical audience. Lead with a code snippet.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHOW_INTEGRATION&lt;/td&gt;
&lt;td&gt;Workflow platform users. Show how results fit their pipeline.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EXPLAIN_APPROACH&lt;/td&gt;
&lt;td&gt;Audience wants to understand &lt;em&gt;why&lt;/em&gt; LLMs beat fuzzy matching.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OFFER_HANDS_ON&lt;/td&gt;
&lt;td&gt;Recent post, engaged OP. Offer to run their data.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The proposer matches each problem to the closest of our 29 existing demos, which are organized by difficulty: it reads the catalog and picks. When the poster provides sample data, it shows results on &lt;em&gt;their&lt;/em&gt; data. When they don't, it shows results on the closest demo we have.&lt;/p&gt;

&lt;p&gt;The test for every draft: if someone stripped the product mention, would this answer still be useful?&lt;/p&gt;

&lt;p&gt;This is where the loop closes. As we described in &lt;a href="https://dev.to/blog/claude-code-workflow-engine"&gt;Post 2&lt;/a&gt;, the output of the whole system is a pull request. A non-technical person on the team opens it, reads the report, and sees the draft responses with working code snippets and real results. They adjust the tone, maybe add a sentence from their own experience, and post it. The person on the other end gets a genuinely helpful answer to a problem they were stuck on. That's the point - not to pollute forums with product links, but to find people who are actually struggling with something our tools solve and help them. All told, it takes about 15 minutes of human time for what would otherwise be a full day of research.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pipeline Teaches Itself
&lt;/h2&gt;

&lt;p&gt;After each run, the pipeline can update a learnings file. These aren't logs - they're instructions for future runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- "Remove 'duplication' tag - returns feature posts, not data problems"
- "Databricks: low volume but 40% conversion. Worth keeping."
- "If native platform feature exists and author accepts it → score 1-2"
- "Christmas Eve: 50% false positives. Likely holiday effect."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next run reads the learnings before it starts. Over 6 weeks: 642 proposals in the database, 3,800+ URLs processed. The pipeline gets better because it remembers what didn't work.&lt;/p&gt;
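&lt;p&gt;To make that concrete, here is a minimal sketch of what "reads the learnings before it starts" can look like. The file path and helper names are our own, not the pipeline's:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path

def load_learnings(path="data/learnings.md"):
    """Collect the bullet-point instructions accumulated by past runs."""
    p = Path(path)
    if not p.exists():
        return []
    lines = p.read_text().splitlines()
    return [line.strip() for line in lines if line.strip().startswith("- ")]

def build_prompt(base_prompt, learnings):
    """Prepend prior learnings so a new run starts where the last one left off."""
    if not learnings:
        return base_prompt
    return "Learnings from previous runs:\n" + "\n".join(learnings) + "\n\n" + base_prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Everything actionable in the file is a bullet, so a dumb line filter is enough - the model does the interpreting.&lt;/p&gt;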

&lt;p&gt;The best finds aren't always in new threads. Thread archaeology - checking old discussions for unanswered or poorly answered questions - turned up some of the strongest opportunities. The Agentforce post was months old when the pipeline found it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Simplified Example
&lt;/h2&gt;

&lt;p&gt;We put together a runnable version of this pipeline at &lt;a href="https://github.com/futuresearch/example-cc-cronjob" rel="noopener noreferrer"&gt;github.com/futuresearch/example-cc-cronjob&lt;/a&gt; - the same repo from Post 1, now with a &lt;code&gt;community-scanner&lt;/code&gt; skill alongside the original &lt;code&gt;add-numbers&lt;/code&gt; example. It has the full structure: a skill with all five phases, a classifier agent with the 13-question rubric, a proposer agent with the strategy taxonomy and SDK examples, a Python scanner that fetches from Reddit's public JSON API, and a learnings file the pipeline updates after each run. It scans a few subreddits instead of 18 sources, and runs in a single process instead of fanning out to parallel subagents, but the pipeline logic is the same. Fork it, point it at your subreddits, see what it finds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Know Now
&lt;/h2&gt;

&lt;p&gt;The infrastructure is the easy part. The know-how - which sources to scan, what questions to ask, how to draft a response that genuinely helps someone - is what those daily runs teach you. If this stopped working tomorrow, we'd manually check a few subreddits once a week. Like we did before December.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We build &lt;a href="https://futuresearch.ai" rel="noopener noreferrer"&gt;futuresearch.ai&lt;/a&gt; - forecast, score, classify, or research every row of a dataset. This pipeline is how we find the people who need it.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;FutureSearch lets you run your own team of AI researchers and forecasters on any dataset. &lt;a href="https://futuresearch.ai/blog" rel="noopener noreferrer"&gt;Try it for yourself.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Using Claude Code as a Workflow Engine</title>
      <dc:creator>Daniel Hnyk</dc:creator>
      <pubDate>Fri, 27 Feb 2026 13:14:01 +0000</pubDate>
      <link>https://dev.to/hnykda/using-claude-code-as-a-workflow-engine-403f</link>
      <guid>https://dev.to/hnykda/using-claude-code-as-a-workflow-engine-403f</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 2 of a series on using Claude Code as a production runtime. Originally published on &lt;a href="https://everyrow.io/blog/claude-code-workflow-engine" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Our marketing pipeline scans 18 community sources, enriches threads with full content, classifies opportunities with a 20-question rubric, generates draft forum responses, and creates a pull request - every weekday at 08:00 UTC. The whole pipeline definition is not, say, Python functions wired into a workflow manager and executor like Prefect or Dagster (both of which are cool), but - yeah, you guessed it - a markdown file in plain English, written by my boss.&lt;/p&gt;

&lt;p&gt;I don't mean my boss &lt;em&gt;specified&lt;/em&gt; it and an engineer implemented it. I mean he opened &lt;code&gt;SKILL.md&lt;/code&gt; in his editor and typed the pipeline in English. Or more precisely - in the spirit of this series - he asked Claude Code to write it together with him. It's a markdown file that says things like "spawn 18 scanners in background" and "after phase 1, do phase 2." It's not a formal task DAG and isn't specified in code. And it all runs &lt;em&gt;inside&lt;/em&gt; Claude Code, as described in our &lt;a href="https://everyrow.io/blog/claude-code-kubernetes-cronjob" rel="noopener noreferrer"&gt;first post in the series&lt;/a&gt;. This post is a general comparison of such systems; subsequent posts will dig into specific instances.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rough Comparison
&lt;/h2&gt;

&lt;p&gt;We're not going to pretend this is better than Prefect or Dagster. For a lot of workloads, it's worse. But "a lot" isn't "all," and we think the tradeoff space is genuinely interesting. Here is a somewhat naive comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Prefect / Dagster&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Task definition&lt;/td&gt;
&lt;td&gt;Python functions, objects, decorators, ...&lt;/td&gt;
&lt;td&gt;Markdown files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAG&lt;/td&gt;
&lt;td&gt;Explicit dependency graph&lt;/td&gt;
&lt;td&gt;"after scanning, enrich"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Workers&lt;/td&gt;
&lt;td&gt;Containerized functions&lt;/td&gt;
&lt;td&gt;Subagents with their own context windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry logic&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@task(retries=3)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"if Python enrichment fails, try WebFetch instead"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding a new integration&lt;/td&gt;
&lt;td&gt;Install plugin, configure IO manager, write config schema&lt;/td&gt;
&lt;td&gt;"read from BigQuery"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaffolding&lt;/td&gt;
&lt;td&gt;Specific decorators, YAML, &lt;code&gt;definitions.py&lt;/code&gt;, webserver config, user code, ...&lt;/td&gt;
&lt;td&gt;Markdown files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;webserver, usercode containers, UIs, DB, ...&lt;/td&gt;
&lt;td&gt;one (cron)job as per Part 1 of this series&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;Dashboards, metrics, alerts, orchestration UIs&lt;/td&gt;
&lt;td&gt;none?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who writes it&lt;/td&gt;
&lt;td&gt;Software engineer&lt;/td&gt;
&lt;td&gt;anyone, in English&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging&lt;/td&gt;
&lt;td&gt;Stack traces, breakpoints&lt;/td&gt;
&lt;td&gt;Absolutely horrendous&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I am not going to pretend this comparison isn't skewed towards Claude Code - it heavily is, because Claude Code is not a full replacement for these tools. Dagster gives you sensors, queues, concurrency limits, work pools and more, and if you need those, go for it. What we cover here is mostly the job runtime and basic orchestration (which could still be plugged into such frameworks to get the best of both worlds).&lt;/p&gt;

&lt;p&gt;I want to write something like "it's all markdown files", which is a slight exaggeration, but not much of one! The whole setup is one skill (the orchestrator), a handful of subagent definitions, and some Python libraries for the mechanical stuff. Compare that to, say, Dagster scaffolding. Dagster is pretty opinionated and you &lt;em&gt;really&lt;/em&gt; want to do things the way it wants you to - &lt;code&gt;definitions.py&lt;/code&gt;, YAML config, webserver, user code server, and if you want to read from GCS, the right IO manager plugin configured through Dagster's abstraction layer instead of just... asking Claude to use &lt;code&gt;gsutil&lt;/code&gt;. It's all legitimate infrastructure for production workloads. If tomorrow we need to read from BigQuery, we write "query BigQuery for the last 7 days of page analytics" in the skill file and Claude figures out the &lt;code&gt;bq&lt;/code&gt; command or the MCP tool or whatever's available (setting &lt;em&gt;those&lt;/em&gt; up, plus permissions, is still some annoying boilerplate though).&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The pipeline is one skill that orchestrates six phases. Most of the heavy lifting is fanned out to subagents running in parallel. We will get into the details of the pipeline in a separate post, but just to give you an idea of what we're talking about:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Scan
  ├── Python search script → produces shards
  ├── 18 scanner subagents (one per source: reddit, hubspot, shopify, ...)
  └── N search-scanner subagents (one per shard)
       ↓ poll filesystem for .json / .error files
Phase 2: Enrich
  └── Python enrichment (fetch full thread content, WebFetch fallback)
       ↓
Phase 3: Classify
  └── N classifier subagents (one per enriched file, 20 questions, score 1-5)
       ↓ poll filesystem again
Phase 4: Propose
  └── proposer-{product} subagents (one per product with score 4-5 hits)
       ↓
Phase 5: Report
  └── markdown report with metrics, top opportunities, draft responses
       ↓
Phase 6: Git
  └── branch, commit, push, open PR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator - the main Claude Code process - reads the skill, spawns the subagents via the &lt;code&gt;Task&lt;/code&gt; tool, and coordinates them. The subagents write results to disk. The orchestrator polls for output files rather than collecting agent output directly (we'll get to why in the filesystem section below). The "dependency graph" is just document order: phase 2 comes after phase 1 because it's written after phase 1.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Accidental Resilience
&lt;/h2&gt;

&lt;p&gt;Here's one aspect that the comparison table doesn't capture: Claude Code is accidentally resilient in ways that traditional orchestrators are not.&lt;/p&gt;

&lt;p&gt;When a Python script hits an unexpected error, it crashes. By default the state is lost; you dig through a logging tool and try to find the bug. If you're lucky, you fix it - but often the failure is hard to reproduce (Reddit blocks IPs from GCP), so you re-run the whole thing and hope. Orchestrators try to help with retry mechanisms, which are good but far from ideal against unknown unknowns: how many retries do you need, what should the backoff period be, which error types are even worth retrying, and so on.&lt;/p&gt;

&lt;p&gt;When Claude Code hits an error, it &lt;em&gt;reads the error message and decides what to do.&lt;/em&gt; A library isn't installed in the container? It runs &lt;code&gt;apt-get install&lt;/code&gt; (scary, but awesome). An API returns an unexpected format? It adapts the parsing. The enrichment script returns fewer results than expected? The pipeline instruction says "use WebFetch for the failed URLs" - and it does, for just the ones that failed, preserving everything that already worked.&lt;/p&gt;

&lt;p&gt;This is not magic. It's just that the "retry logic" has access to the same reasoning that wrote the original attempt. It can distinguish between "the server is down, try again" and "this approach won't work, try a different one," in a way traditional retries cannot.&lt;/p&gt;

&lt;p&gt;And the state preservation is a great feature on its own. When running locally and phase 3 of our pipeline fails, phases 1 and 2's results are still on disk &lt;em&gt;and&lt;/em&gt; in the conversation context. If we're running interactively, we can &lt;code&gt;--resume&lt;/code&gt; and say "phase 2 worked fine, start from phase 3 and here's what went wrong." The agent just remembers everything - no checkpoint files, no serialization, no cache key configuration.&lt;/p&gt;

&lt;p&gt;Prefect and Dagster have caching, and it's a real feature. But getting it right is real engineering work: hash the inputs properly for the cache key, make sure the task-level cache interacts correctly with the flow-level cache, handle the case where a cached task succeeds but the next task fails, decide where the cache is stored... We've been through this, and sometimes it's just not worth the effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Skill Looks Like
&lt;/h2&gt;

&lt;p&gt;This is a real excerpt from our pipeline definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Phase 1: Scan&lt;/span&gt;

&lt;span class="gu"&gt;### Step 1b: Run Domain Scanners&lt;/span&gt;

Spawn all 18 domain scanners in background.
Track each task_id with its source name.

Each: Task (subagent_type: scanner, run_in_background: true): "Scan {source}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's basically the DAG: "Spawn 18 things. Track them." Claude Code reads this, spawns 18 subagents, and tracks the task IDs. The "dependency graph" is the document order: Phase 2 comes after Phase 1 because it's written after Phase 1 - which is how humans naturally work and think anyway.&lt;/p&gt;

&lt;p&gt;Running it is what we covered in &lt;a href="https://dev.to/blog/claude-code-kubernetes-cronjob"&gt;Part 1&lt;/a&gt;. You pass &lt;code&gt;"execute scan-and-classify skill"&lt;/code&gt; as a prompt and it runs. Again, you don't have to think about deployments or flags or whether to use &lt;code&gt;deploy()&lt;/code&gt; or &lt;code&gt;serve()&lt;/code&gt; - it's just a CLI command.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an Agent Looks Like
&lt;/h2&gt;

&lt;p&gt;We do have specialized agents that do specific jobs. Agents are spawned with their own context, so the context of the main orchestrating agent doesn't explode. Each subagent is a markdown file with YAML frontmatter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scanner&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Scan a community source for marketing opportunities.&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bash, Read, Write, Glob, Grep&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonnet&lt;/span&gt;
&lt;span class="na"&gt;permissionMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bypassPermissions&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have 23 of these agent definitions. Scanners, classifiers, proposers, graphics generators, dataset finders, SEO analyzers. Each one is a markdown file describing what the agent should do, what tools it has, and what model to use.&lt;/p&gt;
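&lt;p&gt;Loading such a definition needs no framework - the frontmatter is flat key-value pairs. A dependency-free sketch (our own helper, not Claude Code internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def parse_agent_file(text):
    """Split a subagent markdown file into frontmatter fields and the prompt body.

    Minimal sketch: assumes flat 'key: value' frontmatter delimited by '---',
    which is all these agent definitions use.
    """
    _, frontmatter, body = text.split("---", 2)
    fields = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;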

&lt;h2&gt;
  
  
  Python Does Mechanics, Claude Does Judgment
&lt;/h2&gt;

&lt;p&gt;One of the design principles is &lt;strong&gt;putting mechanics in code and letting Claude make judgments.&lt;/strong&gt; Specifically, it's this separation of concerns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lib/scanners/reddit.py     → Fetches posts, parses JSON, handles rate limits
.claude/agents/scanner.md  → Reads posts, decides "is this a real data problem?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a strict separation - it's totally fine for the agents to write some code. But anything that can be reused and standardized belongs in the library; it's quite wasteful, resource-wise, to let agents re-derive mechanics like scanning API endpoints on every run.&lt;/p&gt;
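&lt;p&gt;As an illustration of the mechanical half, here's roughly what the scanner library's parsing step could look like. The field names follow Reddit's public JSON API, but the record shape and function name are our own sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def parse_listing(payload):
    """Flatten a Reddit-style JSON listing into plain records.

    Purely mechanical: no judgment here about whether a post is an
    opportunity - that's the scanner agent's job.
    """
    records = []
    for child in payload["data"]["children"]:
        post = child["data"]
        records.append({
            "url": "https://reddit.com" + post["permalink"],
            "title": post["title"],
            "body": post.get("selftext", ""),
            "num_comments": post["num_comments"],
        })
    return records
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;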

&lt;p&gt;This is yet another part where running inside Claude Code shines - it can &lt;strong&gt;develop and improve itself while running in production.&lt;/strong&gt; No, really, let that sink in for a second, because you cannot just gloss over it: development and runtime blend together. You tell it to run the scanner for a site and it tells you it can't because of X, but presents a workaround for X that it can incorporate into the skill or lib for future runs. When it discovers that the environment has changed - say, a remote API's schema is different - it can self-correct at runtime, and even commit that fix as an improvement for all further runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Filesystem Is the Message Bus
&lt;/h2&gt;

&lt;p&gt;Here's where it gets ugly. In Prefect, the orchestrator backend manages dependencies. Claude can of course query the state of its agents natively, and that worked fine for around 4 agents. When we scaled to 18, the orchestrator's context window filled up with all the returned output - it seems Claude cannot check an agent's state without also ingesting its output. The orchestrator started forgetting earlier results and producing incomplete reports.&lt;/p&gt;

&lt;p&gt;The fix: &lt;code&gt;run_in_background: true&lt;/code&gt; + filesystem polling. The orchestrator's context went from O(n * output_size) to O(n * filename). The agents write their results to disk and the orchestrator only reads file paths. Specifically, when a scanner agent finishes, it writes &lt;code&gt;data/scans/reddit/2026-02-17-run1.json&lt;/code&gt;. If it fails, it writes &lt;code&gt;data/scans/reddit/2026-02-17-run1.error&lt;/code&gt;. The orchestrator polls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$ELAPSED&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; &lt;span class="nv"&gt;$TIMEOUT&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;SUCCESS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; data/scans/&lt;span class="k"&gt;*&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TODAY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-run&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;.json 2&amp;gt;/dev/null | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;ERRORS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; data/scans/&lt;span class="k"&gt;*&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TODAY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-run&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;.error 2&amp;gt;/dev/null | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;SUCCESS &lt;span class="o"&gt;+&lt;/span&gt; ERRORS&lt;span class="k"&gt;))&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-ge&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EXPECTED&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then &lt;/span&gt;&lt;span class="nb"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;fi
  &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;10
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;.json&lt;/code&gt; means success. &lt;code&gt;.error&lt;/code&gt; means failure. &lt;code&gt;ls&lt;/code&gt; is the health check. This is not elegant.&lt;/p&gt;
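&lt;p&gt;The same poll in Python, for readers who prefer it - a sketch with our own names, equivalent to the shell loop above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time
from pathlib import Path

def wait_for_scanners(scan_dir, today, expected, poll_seconds=10, max_polls=60):
    """Poll until every scanner has written a .json or .error file, or give up."""
    root = Path(scan_dir)
    for _ in range(max_polls):
        successes = sorted(root.glob(f"*/{today}-run*.json"))
        errors = sorted(root.glob(f"*/{today}-run*.error"))
        done = len(successes) + len(errors)
        if min(done, expected) == expected:  # i.e. at least `expected` results landed
            break
        time.sleep(poll_seconds)
    return successes, errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;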

&lt;h2&gt;
  
  
  Handling Timeouts
&lt;/h2&gt;

&lt;p&gt;Agents will run forever if you let them, and given how imprecise and informal this setup is, you need to impose limits. There is nothing especially interesting here, but for completeness, we implemented this in four layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: &lt;code&gt;max_turns&lt;/code&gt; per agent.&lt;/strong&gt; A hard limit on API round-trips. When a news-finder hits 30 turns, Claude Code stops it and returns whatever it has. We tuned these empirically - 30 was too few for news-finder, 20 was right for dataset-finder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Wall-clock cap per phase.&lt;/strong&gt; 10 minutes. If a batch of agents hasn't finished, move on with whatever completed. Mark the stragglers as "timeout" in the report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Bash &lt;code&gt;timeout 10800&lt;/code&gt;.&lt;/strong&gt; After 3 hours, a second Claude wakes up to salvage partial results (see &lt;a href="https://dev.to/blog/claude-code-kubernetes-cronjob"&gt;Post 1&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 4: activeDeadlineSeconds.&lt;/strong&gt; And finally the hard limit enforced by Kubernetes, set to 4 hours in our case.&lt;/p&gt;
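&lt;p&gt;Layers 3 and 4 both live in the Kubernetes manifest. A minimal CronJob fragment might look like this - the names, image, and script are placeholders; only the schedule and deadline fields carry the point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: batch/v1
kind: CronJob
metadata:
  name: scan-and-classify            # name assumed
spec:
  schedule: "0 8 * * 1-5"            # weekdays 08:00 UTC
  jobTemplate:
    spec:
      activeDeadlineSeconds: 14400   # Layer 4: Kubernetes hard limit, 4 hours
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: pipeline
              image: claude-code-runner:latest   # image name assumed
              command: ["bash", "-c", "timeout 10800 ./run.sh"]   # Layer 3: 3-hour bash timeout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;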

&lt;h2&gt;
  
  
  Debugging Experience Is Bad
&lt;/h2&gt;

&lt;p&gt;I want to be honest about this. Debugging a Claude Code pipeline is pretty subpar. There are no breakpoints or stack traces. When a subagent fails silently, you see nothing - you just notice a &lt;code&gt;.error&lt;/code&gt; file appeared, if you remembered to implement &lt;code&gt;.error&lt;/code&gt; files in the first place (we didn't).&lt;/p&gt;

&lt;p&gt;And since there's no formal verification of any of it - no tests for the pipeline, no type checking on the DAG - you end up interacting with the system &lt;em&gt;through&lt;/em&gt; Claude Code, because a human genuinely cannot handle the throughput of many parallel scanners writing enriched JSON. It's vibecoding at its finest.&lt;/p&gt;

&lt;p&gt;Also, the &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;? The flag is named that for a reason, and there's no way to guarantee that Claude Code won't - to paraphrase a famous Haskell tutorial - go outside and scratch your car with a potato. We run it in an ephemeral container with limited credentials to reduce the blast radius. But if you're someone who needs formal guarantees about what your code will do at runtime, this approach should give you hives. We acknowledge the associated risks and tradeoffs - and given what this pipeline does, the stakes are low.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Quirks From the Skill File
&lt;/h2&gt;

&lt;p&gt;And of course, writing pipelines in English produces some quirks you'd never see in a traditional codebase. Here are three picks from ours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The one-sentence retry policy.&lt;/strong&gt; This is the entire fallback logic for when enrichment fails:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;If Python enrichment returns fewer opportunities than scanned,
use WebFetch for the failed URLs.
Add successful results with "enrichment_method": "webfetch".
Log URLs that fail both methods.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Prefect, that's a custom retry handler with conditional logic. Here it's a paragraph and Claude figures it out.&lt;/p&gt;
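&lt;p&gt;For a sense of scale, here is roughly what that paragraph expands to when written out as ordinary Python (the function names are hypothetical - nothing like this exists in the repo; Claude improvises the behavior from the English):&lt;/p&gt;

```python
def enrich_all(urls, enrich_python, enrich_webfetch, log):
    """Enrich every URL, falling back from the Python path to WebFetch,
    and logging URLs that fail both - mirroring the one-paragraph policy."""
    results, failed = [], []
    for url in urls:
        item = enrich_python(url)
        if item is None:
            # Python enrichment failed: try the WebFetch fallback and tag it.
            item = enrich_webfetch(url)
            if item is not None:
                item["enrichment_method"] = "webfetch"
        if item is None:
            # Both methods failed: log and carry on with the rest.
            log(f"enrichment failed for {url}")
            failed.append(url)
        else:
            results.append(item)
    return results, failed
```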

&lt;p&gt;&lt;strong&gt;The anti-coding instruction.&lt;/strong&gt; The classifier agent's instructions include this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;At no point should you Write() a Python script. If you think you
need one, it's because you misunderstood these instructions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We added this after a classifier tried to write a sentiment analysis script instead of just... reading the thread and thinking about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "never fully fail" rule.&lt;/strong&gt; The last section of the skill file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; Scanner fails: Log failure, continue with others
&lt;span class="p"&gt;-&lt;/span&gt; Python enrichment fails: Try WebFetch fallback, then continue
&lt;span class="p"&gt;-&lt;/span&gt; Classifier fails: Log failure, continue with other sources
&lt;span class="p"&gt;-&lt;/span&gt; Proposer fails: Log failure, keep intermediate files

Never fail the entire skill due to individual component failures.
Always produce a pipeline report.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last line is doing a lot of work. Even a completely botched run produces a report saying "everything broke" - which is still more useful than a silent crash.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Is the UI
&lt;/h2&gt;

&lt;p&gt;I genuinely enjoy nerding out about subagent orchestration and filesystem-based message buses as much as anyone. But &lt;a href="https://everyrow.io" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt; is a startup that needs to survive, and the cool architecture means nothing if it doesn't change something in the real world beyond the boundaries of the company. Our pipeline mostly generates reports, and it also needs some state, like which URLs it has already seen. The natural move would be a database: define a schema, set up credentials, ... Instead, we use GitHub as the store (together with LFS).&lt;/p&gt;

&lt;p&gt;Every pipeline run creates a PR with a markdown report and the necessary state files, like a &lt;code&gt;seen.txt&lt;/code&gt; of already-listed URLs. Our non-technical person opens it, reads the results, expands a draft response, tweaks a sentence, and responds to an opportunity. Or they open the news pipeline PR, pick the better graphic from two variations, download the PNG, and share it. GitHub is the database, the UI, and the delivery mechanism.&lt;/p&gt;
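&lt;p&gt;The &lt;code&gt;seen.txt&lt;/code&gt; pattern is about as simple as state management gets. A sketch of the idea in Python (the helper name is hypothetical - in practice Claude just reads and appends the file):&lt;/p&gt;

```python
from pathlib import Path

def filter_unseen(urls, seen_file: Path):
    """Drop URLs already recorded in seen.txt and append the new ones,
    so the next pipeline run skips them."""
    seen = set(seen_file.read_text().splitlines()) if seen_file.exists() else set()
    new = [u for u in urls if u not in seen]
    if new:
        with seen_file.open("a") as f:
            f.write("\n".join(new) + "\n")
    return new
```

&lt;p&gt;Because the file travels in the PR, the state is reviewable and revertable like everything else.&lt;/p&gt;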

&lt;p&gt;This bridges the gap between what an AI can do and a human can do - neither on their own is as good as both together. The AI produces 80% of the work, the human fixes the last hardest 20% and takes action, and together they ship something neither could individually.&lt;/p&gt;

&lt;p&gt;The flexibility matters more than the reliability here. When the classifier catches that a Reddit post is actually a competitor's marketing campaign, that's judgment no &lt;code&gt;@task(retries=3)&lt;/code&gt; gives you. When news breaks about tariffs and the pipeline routes to QuantGov for regulatory data, that's not something you hard-code in a DAG. And when your boss can read the pipeline definition and say "add Snowflake to the sources" and it just works because the instruction is in English - that's the point.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We build &lt;a href="https://everyrow.io" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt; - forecast, score, classify, or research every row of a dataset. This pipeline is how we find the people who need it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tooling</category>
      <category>kubernetes</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Running Claude Code as a Kubernetes Job</title>
      <dc:creator>Daniel Hnyk</dc:creator>
      <pubDate>Fri, 27 Feb 2026 12:03:32 +0000</pubDate>
      <link>https://dev.to/hnykda/running-claude-code-as-a-kubernetes-job-25d1</link>
      <guid>https://dev.to/hnykda/running-claude-code-as-a-kubernetes-job-25d1</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 1 of a series on using Claude Code as a production runtime. Originally published on &lt;a href="https://everyrow.io/blog/claude-code-kubernetes-cronjob" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;We run Claude Code in Kubernetes for a set of long-running marketing CronJobs. One scans communities like subreddits and support forums, another searches for news and generates relevant content, and the last one optimizes SEO for &lt;a href="https://everyrow.io" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt;, our data processing product.&lt;/p&gt;

&lt;p&gt;This originally sounded like a terrible idea, but after running it for a few months, we think it's a genuinely valid engineering approach - for the right kind of work. Everything is a tradeoff, and this series is a short journey through the practical engineering, actual use cases, and some beautiful metaphysics.&lt;/p&gt;

&lt;p&gt;Our infrastructure for &lt;a href="https://everyrow.io" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt; and &lt;a href="https://futuresearch.ai/" rel="noopener noreferrer"&gt;futuresearch.ai&lt;/a&gt; runs on Google Kubernetes Engine, so that's where we'll start - here's what you need to make Claude Code work as a K8s CronJob, gotchas included.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;

&lt;p&gt;For reasons explained in the next posts, we need both Python and Node. Claude is excellent at writing Python glue code (Python has been preparing for this time all its life), and we write in Python as well. Whenever Claude produces something useful for itself, we ask it to add it to the &lt;code&gt;lib&lt;/code&gt; module for future reference. More on that later.&lt;/p&gt;

&lt;p&gt;We put together a minimal runnable example at &lt;a href="https://github.com/futuresearch/example-cc-cronjob" rel="noopener noreferrer"&gt;github.com/futuresearch/example-cc-cronjob&lt;/a&gt; - a Dockerfile, entrypoint, a trivial skill, and both a plain CronJob manifest and a Helm chart. Everything below is from our production setup, but if you just want to get something running, start there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dockerfile
&lt;/h2&gt;

&lt;p&gt;All right, let's start with a pretty standard Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build stage: install Python dependencies with uv&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;ghcr.io/astral-sh/uv:python3.13-bookworm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; pyproject.toml uv.lock ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--no-sources&lt;/span&gt;

&lt;span class="c"&gt;# Runtime: Python + Node.js (Claude CLI needs Node)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; nikolaik/python-nodejs:python3.13-nodejs22&lt;/span&gt;

&lt;span class="c"&gt;# jq for our "monitoring stack", librsvg2-bin for SVG→PNG, gh for PR creation&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; jq librsvg2-bin git-lfs gh &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;useradd &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/bash claudie
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; claudie&lt;/span&gt;

&lt;span class="c"&gt;# Install Claude CLI as non-root&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://claude.ai/install.sh | bash

&lt;span class="c"&gt;# Skip the interactive onboarding. Claude CLI won't start without this.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"hasCompletedOnboarding": true}'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /home/claudie/.claude.json

&lt;span class="c"&gt;# Copy venv from build stage, copy project files, set PATH&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/.venv /home/claudie/.venv&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /home/claudie/claudie&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; deploy/entrypoint.sh /home/claudie/entrypoint.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; claudie:claudie /home/claudie
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; claudie&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PATH="/home/claudie/.venv/bin:/home/claudie/.local/bin:$PATH"&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["/home/claudie/entrypoint.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A couple of things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We use a multi-stage build, installing Python dependencies in one stage and copying the venv into the runtime image - not strictly necessary, but a nice space optimization.&lt;/li&gt;
&lt;li&gt;Claude Code requires Node.js - it's a Node app under the hood, hence the &lt;code&gt;python-nodejs&lt;/code&gt; base image.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;hasCompletedOnboarding&lt;/code&gt; line: without it, Claude tries to walk you through an interactive setup wizard. Since this runs in a container without a TTY, that's obviously not what you want, hence this mini-hack.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Entrypoint
&lt;/h2&gt;

&lt;p&gt;The entrypoint is where you set up prerequisites for your workflow - credentials for MCP servers, SSH keys, and so on. In our case, one of the more important ones is &lt;code&gt;gh&lt;/code&gt; (GitHub CLI), since we use GitHub as the place to store results and create PRs (more on that in the later posts).&lt;/p&gt;

&lt;p&gt;The actual Claude Code process is spawned like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--verbose&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output-format&lt;/span&gt; stream-json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SKILL_PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's unpack this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;-p&lt;/code&gt; simply means non-interactive mode.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; is what it sounds like - the agent can do whatever it wants. We appreciate this is controversial and that sysadmins are screaming somewhere, but empirically, we haven't seen anything bad happen with the tasks we run.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--verbose&lt;/code&gt; together with &lt;code&gt;--output-format stream-json&lt;/code&gt; gets the output out of Claude Code. By default, it only outputs the final message and you have no visibility into what it's doing. These two parameters make sure everything gets logged to &lt;code&gt;stdout&lt;/code&gt;. There is a &lt;em&gt;lot&lt;/em&gt; of detail - see the next section for filtering.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;--&lt;/code&gt; separator before the prompt is important if you use &lt;code&gt;--add-dir&lt;/code&gt;. Without it, the prompt gets consumed as another directory path.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;SKILL_PROMPT&lt;/code&gt; is literally something like &lt;code&gt;execute scan-and-classify skill&lt;/code&gt;, optionally with &lt;code&gt;--add-dir &amp;lt;some-path&amp;gt;&lt;/code&gt; if you need additional directories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Filtering logs with &lt;code&gt;jq&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;When Claude runs with &lt;code&gt;--output-format stream-json --verbose&lt;/code&gt;, you get one JSON object per line - every thought, every tool call, every result... You'll want to filter this to something more sensible. We pipe it to &lt;code&gt;jq&lt;/code&gt; and by trial and error found the following to be a sensible tradeoff between verbosity and volume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude ... | &lt;span class="nb"&gt;tee&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RAW_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;--unbuffered&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'
if .type == "assistant" then
  .message.content[]? |
  if .type == "text" then "&amp;gt;&amp;gt;&amp;gt; " + .text[0:5000]
  elif .type == "tool_use" then "[" + .name + "] " + ((.input | tostring)[0:3000])
  else empty end
elif .type == "result" then
  "[done] " + (.result // "complete")[0:5000]
else empty end'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt; for Claude's thoughts. &lt;code&gt;[Read]&lt;/code&gt; or &lt;code&gt;[Bash]&lt;/code&gt; for tool calls. &lt;code&gt;[done]&lt;/code&gt; for completion.&lt;/p&gt;

&lt;p&gt;The raw JSONL goes to &lt;code&gt;/tmp/&lt;/code&gt; for when you need to debug.&lt;/p&gt;
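&lt;p&gt;When you do need the raw JSONL, the same filtering is easy to replicate in Python for ad-hoc digging. A sketch, assuming the event shapes the &lt;code&gt;jq&lt;/code&gt; filter above relies on:&lt;/p&gt;

```python
import json

def summarize_stream(jsonl_text: str):
    """Reduce Claude Code's stream-json output to the same one-line summaries
    the jq filter produces: thoughts, tool calls, and the final result."""
    lines = []
    for raw in jsonl_text.splitlines():
        event = json.loads(raw)
        if event.get("type") == "assistant":
            for block in event["message"].get("content", []):
                if block.get("type") == "text":
                    lines.append(">>> " + block["text"][:5000])
                elif block.get("type") == "tool_use":
                    lines.append("[" + block["name"] + "] "
                                 + json.dumps(block["input"])[:3000])
        elif event.get("type") == "result":
            lines.append("[done] " + (event.get("result") or "complete")[:5000])
    return lines
```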

&lt;h2&gt;
  
  
  Timeout - The Safety Net
&lt;/h2&gt;

&lt;p&gt;If you open the example entrypoint in the repository, you'll notice we wrap the execution with &lt;code&gt;timeout 10800 bash -c 'claude ...'&lt;/code&gt;. Why isn't the Kubernetes job's &lt;code&gt;activeDeadlineSeconds&lt;/code&gt; enough? Because when that fires, the whole pod is simply killed and nothing gets salvaged - we want a catch-all inside the pod if things go wrong. Three hours (10800 seconds) is the timeout for just the Claude Code part. If Claude hangs - and it will, eventually - &lt;code&gt;timeout&lt;/code&gt; kills it with exit code 124, and then a second Claude instance wakes up to collect whatever was created so far for debugging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CLAUDE_EXIT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 124 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;timeout &lt;/span&gt;600 claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="s2"&gt;"The pipeline timed out. Check what partial results exist.
       Write a report. Commit to a branch. Create a PR with [PARTIAL] prefix."&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So... the CronJob spawns backup Claudes to clean up after a failed Claude. Not sure if this is robust engineering or a cry for help (both?), but it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CronJob
&lt;/h2&gt;

&lt;p&gt;The CronJob manifest is relatively simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claudie-scan-classify&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;8&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1-5"&lt;/span&gt;          &lt;span class="c1"&gt;# 8am UTC weekdays&lt;/span&gt;
  &lt;span class="na"&gt;concurrencyPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Forbid&lt;/span&gt;
  &lt;span class="na"&gt;jobTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;backoffLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;activeDeadlineSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;14400&lt;/span&gt;  &lt;span class="c1"&gt;# 4 hours - longer than the Claude timeout&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
          &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claudie&lt;/span&gt;
              &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;your container registry&amp;gt;/claudie:latest&lt;/span&gt;
              &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SKILL_NAME&lt;/span&gt;
                  &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scan-and-classify"&lt;/span&gt;
              &lt;span class="na"&gt;envFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claudie-secrets&lt;/span&gt;
              &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100m&lt;/span&gt;
                  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512Mi&lt;/span&gt;
                &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
                  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole thing. &lt;code&gt;SKILL_NAME&lt;/code&gt; tells the entrypoint which skill to run. &lt;code&gt;concurrencyPolicy: Forbid&lt;/code&gt; prevents overlap. Secrets go in via &lt;code&gt;envFrom&lt;/code&gt; - the Anthropic API key, GitHub token, and whatever MCP servers need. We have three of these (scan, news, SEO) with different schedules. We wrap this in a lightweight Helm template, so adding a new skill is just an entry in &lt;code&gt;values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;daily-news&lt;/span&gt;
    &lt;span class="na"&gt;skillName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;daily-news-content&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;14&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1-5"&lt;/span&gt;  &lt;span class="c1"&gt;# Weekdays only (Mon-Fri)&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scan-classify&lt;/span&gt;
    &lt;span class="na"&gt;skillName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scan-and-classify&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;8&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1-5"&lt;/span&gt;  &lt;span class="c1"&gt;# Weekdays only (Mon-Fri)&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;seo-pipeline&lt;/span&gt;
    &lt;span class="na"&gt;skillName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;seo-pipeline&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1,3,5"&lt;/span&gt;  &lt;span class="c1"&gt;# Mon/Wed/Fri at 10:00 UTC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  GitHub as a Database
&lt;/h2&gt;

&lt;p&gt;One pattern worth calling out: we use GitHub as our entire storage and delivery layer. Every pipeline run creates a branch, commits results, pushes, and opens a PR. The PR is the output - our cofounder opens it, reads a markdown report, and acts on it. There's no database, no dashboard, no custom UI. Much more on this in the later posts.&lt;/p&gt;

&lt;p&gt;To make this work from a container, the entrypoint sets up git and the GitHub CLI before Claude starts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git config &lt;span class="nt"&gt;--global&lt;/span&gt; user.email &lt;span class="s2"&gt;"claudie-bot@example.com"&lt;/span&gt;
git config &lt;span class="nt"&gt;--global&lt;/span&gt; user.name &lt;span class="s2"&gt;"Claudie Bot"&lt;/span&gt;

&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.ssh
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SSH_PRIVATE_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/.ssh/id_ed25519
&lt;span class="nb"&gt;chmod &lt;/span&gt;600 ~/.ssh/id_ed25519
ssh-keyscan github.com &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.ssh/known_hosts 2&amp;gt;/dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SSH_PRIVATE_KEY&lt;/code&gt; is a deploy key with write access to the repo. &lt;code&gt;GH_TOKEN&lt;/code&gt; (passed as an env var) lets &lt;code&gt;gh&lt;/code&gt; create PRs. Both go into the Kubernetes secret. The skill then just tells Claude to commit and create a PR - it knows how to use &lt;code&gt;git&lt;/code&gt; and &lt;code&gt;gh&lt;/code&gt; out of the box.&lt;/p&gt;

&lt;p&gt;Our &lt;a href="https://github.com/futuresearch/example-cc-cronjob" rel="noopener noreferrer"&gt;example repo&lt;/a&gt; demonstrates this: the &lt;code&gt;add-numbers&lt;/code&gt; skill computes a result, writes it to a file, commits to a branch, and opens a PR. A toy example, but it's the same pattern our production pipelines use every day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Do This?
&lt;/h2&gt;

&lt;p&gt;Probably not for anything important. I would resign if we used this for a payment pipeline. But for discovering that someone on &lt;code&gt;r/salesforce&lt;/code&gt; needs help deduplicating 5000 company records? Take my money.&lt;/p&gt;

&lt;p&gt;The next post covers what actually runs inside these CronJobs - specifically, why a 398-line markdown file replaced what would normally be a non-trivial orchestration job.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We build &lt;a href="https://everyrow.io" rel="noopener noreferrer"&gt;everyrow.io&lt;/a&gt; - tools for semantic deduplication, entity resolution, and qualitative ranking of datasets. This pipeline is how we find people who need them.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: &lt;a href="https://everyrow.io/blog/claude-code-workflow-engine" rel="noopener noreferrer"&gt;Using Claude Code as a Workflow Engine&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ai</category>
    </item>
    <item>
      <title>5 DataFrame Operations LLMs Handle Better Than Code</title>
      <dc:creator>Daniel Hnyk</dc:creator>
      <pubDate>Thu, 19 Feb 2026 09:21:01 +0000</pubDate>
      <link>https://dev.to/hnykda/5-dataframe-operations-llms-handle-better-than-code-436a</link>
      <guid>https://dev.to/hnykda/5-dataframe-operations-llms-handle-better-than-code-436a</guid>
      <description>&lt;p&gt;There are things I do with DataFrames all the time that pandas was never built for. Filtering by subjective criteria. Joining tables that don't share a key. Looking up information that only exists on the web. Recently I've been using LLMs, and the results have been surprisingly cheap and accurate.&lt;/p&gt;

&lt;p&gt;Here are five operations I now handle with LLMs (with working code).&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Filter by Qualitative Criteria
&lt;/h2&gt;

&lt;p&gt;You have 3,616 job postings and want only the ones that are remote-friendly, senior-level, AND disclose salary. &lt;code&gt;df[df['posting'].str.contains('remote')]&lt;/code&gt; matches "No remote work available."&lt;/p&gt;
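&lt;p&gt;The trap is easy to reproduce even without pandas - substring matching has no notion of negation:&lt;/p&gt;

```python
postings = [
    "Fully remote, Senior Data Engineer, $180k-$210k",
    "No remote work available. On-site only.",
]
# Naive keyword filter: both postings "match", including the explicit rejection.
matches = [p for p in postings if "remote" in p.lower()]
assert len(matches) == 2  # the second match is a false positive
```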

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $4.24 for 3,616 rows (9.9 minutes)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;everyrow.ops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;screen&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JobScreenResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;qualifies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;True if meets ALL criteria&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;screen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A job posting qualifies if it meets ALL THREE criteria:
    1. Remote-friendly: Explicitly allows remote work
    2. Senior-level: Title contains Senior/Staff/Lead/Principal
    3. Salary disclosed: Specific compensation numbers mentioned
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;JobScreenResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;216 of 3,616 passed (6%). Interestingly, the pass rate has climbed from 1.7% in 2020 to 14.5% in 2025 as more companies are offering remote work and disclosing salaries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://everyrow.io/docs/filter-dataframe-with-llm" rel="noopener noreferrer"&gt;Full guide with dataset&lt;/a&gt; · See it applied to real job postings: &lt;a href="https://everyrow.io/docs/case-studies/screen-job-postings-by-criteria" rel="noopener noreferrer"&gt;Screening job postings by criteria&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Classify Rows Into Categories
&lt;/h2&gt;

&lt;p&gt;You need to label 200 job postings into categories (backend, frontend, data, ML/AI, devops, etc.). Keyword matching misses anything that's not an exact match, but training a classifier is overkill for a one-off task like this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $1.74 for 200 rows (2.1 minutes). At scale: ~$9 for 1,000 rows, ~$90 for 10,000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;everyrow.ops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agent_map&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JobClassification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;backend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frontend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fullstack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ml_ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;devops_sre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mobile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;other&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Primary role category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Why this category was chosen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;agent_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify this job posting by primary role...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;JobClassification&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Literal&lt;/code&gt; type constrains the LLM to your predefined set, so there's no post-processing needed. You can add confidence scores and multi-label support by extending the Pydantic model.&lt;/p&gt;
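&lt;p&gt;For example, extending the model could look like this — a sketch, where &lt;code&gt;confidence&lt;/code&gt; and &lt;code&gt;secondary_categories&lt;/code&gt; are field names I'm inventing for illustration:&lt;/p&gt;

```python
from typing import Literal
from pydantic import BaseModel, Field

Category = Literal[
    "backend", "frontend", "fullstack", "data",
    "ml_ai", "devops_sre", "mobile", "security", "other",
]

class JobClassification(BaseModel):
    category: Category = Field(description="Primary role category")
    # Extensions: a bounded confidence score and optional extra labels
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence from 0 to 1")
    secondary_categories: list[Category] = Field(
        default_factory=list, description="Other categories that also apply"
    )
    reasoning: str = Field(description="Why this category was chosen")
```

Pass it as `response_model` exactly as before; the extra fields come back as extra columns.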

&lt;p&gt;&lt;a href="https://everyrow.io/docs/classify-dataframe-rows-llm" rel="noopener noreferrer"&gt;Full guide with dataset&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Add a Column Using Web Research
&lt;/h2&gt;

&lt;p&gt;You have a list of 246 SaaS products and need the annual price of each one's lowest paid tier. There's no API for this kind of problem because it requires visiting pricing pages that all present information differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $6.68 for 246 rows (15.7 minutes), 99.6% success rate&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;everyrow.ops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agent_map&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PricingInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;lowest_paid_tier_annual_price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Annual price in USD for the lowest paid tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tier_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name of the tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;agent_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Find the pricing for this SaaS product&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s lowest paid tier.
    Visit the product&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s pricing page.
    Report the annual price in USD and the tier name.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PricingInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each result comes with a &lt;code&gt;research&lt;/code&gt; column showing how the agent found the answer, with citations. For example, Slack's entry references slack.com/pricing/pro and shows the math: $7.25/month × 12 = $87/year.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://everyrow.io/docs/add-column-web-lookup" rel="noopener noreferrer"&gt;Full guide with dataset&lt;/a&gt; · See it applied to vendor matching: &lt;a href="https://everyrow.io/docs/case-studies/match-software-vendors-to-requirements" rel="noopener noreferrer"&gt;Matching software vendors to requirements&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Join DataFrames Without a Shared Key
&lt;/h2&gt;

&lt;p&gt;You have two tables of S&amp;amp;P 500 data — one with company names and market caps, the other with stock tickers and fair values. Without a shared column across both datasets, &lt;code&gt;pd.merge()&lt;/code&gt; is useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $1.00 for 438 rows (~30 seconds), 100% accuracy&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;everyrow.ops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;merge&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Match companies to their stock tickers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;left_table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;companies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# has: company, price, mkt_cap
&lt;/span&gt;    &lt;span class="n"&gt;right_table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;valuations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# has: ticker, fair_value
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# 3M → MMM, Alphabet Inc. → GOOGL, etc.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, it uses a cascade: exact match → fuzzy match → LLM reasoning → web search. In this run, 99.8% of rows matched via the LLM alone. Even with 10% character-level noise injected ("Alphaeet Iqc." instead of "Alphabet Inc."), it still hit 100% accuracy at $0.44. I'd much rather manually review a few unmatched rows than chase false positives.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://everyrow.io/docs/fuzzy-join-without-keys" rel="noopener noreferrer"&gt;Full guide with dataset&lt;/a&gt; · See it applied at scale: &lt;a href="https://everyrow.io/docs/case-studies/llm-powered-merging-at-scale" rel="noopener noreferrer"&gt;LLM-powered merging at scale&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Rank by a Metric That's Not in Your Data
&lt;/h2&gt;

&lt;p&gt;You have 300 PyPI packages and want to rank them by days since last release and number of GitHub contributors. This data is on PyPI and GitHub (not in your DataFrame).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; $3.90 for days-since-release, $4.13 for GitHub contributors (300 rows each, ~5 minutes)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;everyrow.ops&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rank by number of days since the last PyPI release&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;packages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;days_since_release&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK sends a web research agent per row to look up the metric, then ranks by the result. And it works for any metric you can describe in natural language, as long as it's findable on the web.&lt;/p&gt;
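&lt;p&gt;Once the metric column exists, the ranking itself is ordinary pandas. A sketch with invented release dates, just to show the arithmetic that happens after the lookups:&lt;/p&gt;

```python
from datetime import date
import pandas as pd

today = date(2026, 3, 20)  # pinned so the example is reproducible
packages = pd.DataFrame({
    "package": ["requests", "abandonedlib", "flask"],  # invented examples
    "last_release": [date(2026, 1, 10), date(2021, 6, 1), date(2025, 11, 3)],
})

packages["days_since_release"] = [(today - d).days for d in packages["last_release"]]
ranked = packages.sort_values("days_since_release", ascending=False, ignore_index=True)
print(ranked["package"].tolist())  # stalest first
```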

&lt;p&gt;&lt;a href="https://everyrow.io/docs/rank-by-external-metric" rel="noopener noreferrer"&gt;Full guide with dataset&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Cost Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Rows&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Filter job postings&lt;/td&gt;
&lt;td&gt;3,616&lt;/td&gt;
&lt;td&gt;$4.24&lt;/td&gt;
&lt;td&gt;9.9 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Classify into categories&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;$1.74&lt;/td&gt;
&lt;td&gt;2.1 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web research (pricing)&lt;/td&gt;
&lt;td&gt;246&lt;/td&gt;
&lt;td&gt;$6.68&lt;/td&gt;
&lt;td&gt;15.7 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fuzzy join (no key)&lt;/td&gt;
&lt;td&gt;438&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;30 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rank by external metric&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;$3.90&lt;/td&gt;
&lt;td&gt;4.3 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All of these are one function call on a pandas DataFrame. The orchestration (batching, parallelism, retries, rate limiting, model selection) is handled by &lt;a href="https://everyrow.io" rel="noopener noreferrer"&gt;everyrow&lt;/a&gt;, an open-source Python SDK. New accounts get $20 in free credit, which covers all five examples above with room to spare.&lt;/p&gt;
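&lt;p&gt;For a sense of what that orchestration replaces, here's the kind of boilerplate you'd otherwise hand-roll — a minimal concurrency-cap-plus-retry sketch, not everyrow's actual internals:&lt;/p&gt;

```python
import asyncio

async def call_with_retry(row_id: int, sem: asyncio.Semaphore, attempts: int = 3) -> str:
    async with sem:  # cap concurrent in-flight LLM calls
        for attempt in range(attempts):
            try:
                # Stand-in for a real LLM call; every fifth row fails once
                if attempt == 0 and row_id % 5 == 0:
                    raise TimeoutError("simulated transient failure")
                return f"result-{row_id}"
            except TimeoutError:
                await asyncio.sleep(0.01 * 2 ** attempt)  # exponential backoff
        raise RuntimeError(f"row {row_id} failed after {attempts} attempts")

async def main() -> list[str]:
    sem = asyncio.Semaphore(8)
    return await asyncio.gather(*(call_with_retry(i, sem) for i in range(20)))

results = asyncio.run(main())
print(len(results))  # 20 — every row succeeded, some after a retry
```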

&lt;p&gt;The full code and datasets for each example are linked above.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
