Stefano Casafranca

How I Built an Autonomous PR Agent with SerpApi, LangGraph, and LangSmith

Building in public: an autonomous marketing agent that uses SerpApi, LangGraph, and LangSmith to find Reddit threads and promote my GitHub project.

Most "AI agent" tutorials show you a ReAct loop that calls tools until the LLM decides it's done. That works for demos. It breaks in production.

I built an autonomous PR/marketing agent called Doug that discovers relevant Reddit threads via SerpApi, drafts contextual comments, and posts them on a schedule, all without human intervention. It runs on GitHub Actions, traces every decision in LangSmith, and has been operating in production for two weeks now.

Here's how I designed it to actually work, right up to the point of drafting the response for each Reddit thread.

Disclaimer: the agent's final step was supposed to be posting to Reddit itself, but it seems Reddit blocked that. So each morning I copy the draft and post it on the thread Doug found for me. That's how I market my plugin and help more devs stop using WordPress as the CMS for their clients, in favor of the open source tool I created, Build_Script, which turns Google Docs into the new CMS.


Why ReAct Loops Break in Production

The standard agent pattern gives the LLM full control. It picks which tools to call, how many times, and when to stop. For a chatbot? Fine. For an autonomous agent running on a cron with real API keys and real Reddit credentials? That's asking for trouble.

What happens when the LLM decides to retry a failed API call 30 times? Or ranks 0 threads and drafts 0 comments because it "wasn't confident enough"? Or ignores your subreddit allowlist because the prompt said "use your best judgment"?

I learned this the hard way. Doug's architecture is the result: deterministic control plane, bounded semantic reasoning.

The Split That Makes It Work

Doug has two layers, and they never cross:

  1. Control plane (pure code): Scheduling, policy checks, posting caps, subreddit allowlists, deduplication, idempotency. Zero LLM involvement.
  2. Semantic layer (bounded LLM calls): Exactly two LLM calls per cycle. One ranks threads. One drafts comments. That's it. No loops, no retries, no "let the model decide."
START
  |
  v
load_context ............. (deterministic: read brain files, load posted URLs)
  |
  v
collect_candidates ....... (SerpApi + PRAW: dual discovery)
  |
  v
merge_and_dedup .......... (deterministic: combine, deduplicate by URL)
  |
  v
policy_filter ............ (deterministic: allowlist, age, score, already-posted)
  |
  v
rank_candidates_llm ...... (LLM CALL #1: rank top 5 by relevance)
  |
  v
draft_comments_llm ....... (LLM CALL #2: draft a comment for each)
  |
  v
apply_post_policy ........ (deterministic: check mode + daily cap)
  |
  +----> post_comments ... (conditional: only if policy says yes)
  |
  v
finalize_run ............. (persist state, email summary)
  |
  v
END

The LLM never decides whether to post. Code does. The LLM never decides how many comments to write. Caps do. Its job is purely semantic: "which threads match?" and "what's a helpful comment?"
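
Here's roughly what that policy gate looks like. This is a simplified sketch, not the exact implementation; the state field names and the cap value are illustrative assumptions:

DAILY_POST_CAP = 5  # illustrative; pick whatever cap you're comfortable with

def apply_post_policy(state: dict) -> dict:
    """Deterministic gate: pure code decides whether posting is allowed."""
    allowed = (
        state.get("mode") == "live"                     # shadow mode never posts
        and state.get("daily_post_count", 0) < DAILY_POST_CAP
        and len(state.get("drafted_comments", [])) > 0  # nothing to post otherwise
    )
    return {"should_post": allowed}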

Step 1: Discovery with SerpApi

The first real problem is finding threads worth commenting on. Reddit's own search is slow to index new content and misses a lot. SerpApi flips this by searching Google's index of Reddit, which is way more comprehensive.

import os
import re

from langchain_core.tools import tool
from serpapi import GoogleSearch

@tool
def search_serpapi(keywords: str, max_results: int = 20) -> list[dict]:
    """Search Google for Reddit threads via SerpApi."""
    params = {
        "engine": "google",
        "q": f"site:reddit.com {keywords}",
        "tbs": "qdr:w",  # last week only
        "num": max_results,
        "api_key": os.environ["SERPAPI_API_KEY"],
    }
    search = GoogleSearch(params)
    results = search.get_dict().get("organic_results", [])

    candidates = []
    for r in results:
        url = r.get("link", "")
        # Pull the subreddit name out of the URL, e.g. reddit.com/r/webdev/...
        subreddit_match = re.search(r"reddit\.com/r/(\w+)", url)
        candidates.append({
            "title": r.get("title", ""),
            "url": url,
            "subreddit": subreddit_match.group(1) if subreddit_match else "",
            "snippet": r.get("snippet", ""),
            "source": "serpapi",
        })
    return candidates

Why SerpApi over Reddit's API? Google already ranks Reddit threads by relevance. When someone posts "best CLI tool for content management" on r/webdev, Google's index surfaces it faster and more accurately than Reddit's native search. SerpApi gives me structured access to that ranking.

I also run PRAW (the Python Reddit API Wrapper) as a fallback to catch threads less than an hour old that Google hasn't indexed yet:

import os
import time

import praw
from langchain_core.tools import tool

@tool
def search_reddit(keywords: str, max_results: int = 20) -> list[dict]:
    """Fallback: catch fresh threads PRAW finds before Google."""
    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="doug-agent/1.0",
    )
    candidates = []
    for submission in reddit.subreddit("all").search(
        keywords, sort="new", limit=max_results
    ):
        candidates.append({
            "title": submission.title,
            "url": f"https://reddit.com{submission.permalink}",
            "subreddit": submission.subreddit.display_name,
            "score": submission.score,
            "age_hours": (time.time() - submission.created_utc) / 3600,
            "source": "praw",
        })
    return candidates

After both sources return, a deterministic merge-and-dedup step combines them by URL, giving priority to SerpApi when both find the same thread.
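
That merge step is a few lines of plain Python. A minimal sketch, assuming the candidate dicts shown above (real code would likely also normalize URLs first, e.g. strip www. and trailing slashes):

def merge_and_dedup(serpapi_results: list[dict], praw_results: list[dict]) -> list[dict]:
    """Combine both sources and dedupe by URL, with SerpApi winning on shared fields."""
    merged: dict[str, dict] = {}
    # PRAW first, SerpApi second: for duplicate URLs, SerpApi overwrites the
    # shared fields while PRAW-only fields (score, age_hours) are preserved.
    for c in praw_results + serpapi_results:
        url = c.get("url", "")
        if url:
            merged[url] = {**merged.get(url, {}), **c}
    return list(merged.values())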

Step 2: Policy Filtering (No LLM Needed)

Before the LLM touches anything, hard constraints filter out garbage. This is pure code, no ambiguity:

SUBREDDIT_ALLOWLIST = {
    "ClaudeAI", "LocalLLaMA", "programming", "Python",
    "OpenAI", "Anthropic", "MachineLearning", "webdev",
    "coding", "learnprogramming",
}
MAX_THREAD_AGE_DAYS = 14
MIN_THREAD_SCORE = 2

def filter_candidates(candidates, posted_urls):
    """Pure code. No LLM. No hallucination risk."""
    filtered = []
    for c in candidates:
        subreddit = c.get("subreddit", "").replace("r/", "")
        if subreddit not in SUBREDDIT_ALLOWLIST:
            continue
        if c.get("age_hours", 0) > MAX_THREAD_AGE_DAYS * 24:
            continue
        if c.get("url", "") in posted_urls:
            continue  # already commented on this one
        if c.get("score", 0) < MIN_THREAD_SCORE and c.get("source") != "serpapi":
            continue
        filtered.append(c)
    return filtered

This is the part most tutorials skip. You don't need a prompt to say "only post in approved subreddits." You need an allowlist. You don't need the model to "remember" which threads you already commented on. You need a set lookup.
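
That set lookup is as boring as it sounds. Here's a sketch of how the posted-URL memory could be persisted between runs; the file path is an assumption, but keeping it inside the repo is what lets GitHub Actions commit it back later:

from pathlib import Path

POSTED_URLS_FILE = Path("memory/posted_urls.txt")  # hypothetical path

def load_posted_urls() -> set[str]:
    """One URL per line -> a set, for O(1) 'already commented?' checks."""
    if not POSTED_URLS_FILE.exists():
        return set()
    return set(POSTED_URLS_FILE.read_text().splitlines())

def record_posted_url(url: str) -> None:
    """Append after a successful post so future runs skip this thread."""
    with POSTED_URLS_FILE.open("a") as f:
        f.write(url + "\n")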

Step 3: Building the Graph with LangGraph

Here's where LangGraph earns its keep. Instead of a flat script, the workflow is a compiled graph with explicit nodes, edges, and one conditional branch:

from langgraph.graph import StateGraph, END

def build_reddit_hunt_graph():
    graph = StateGraph(DougState)

    graph.add_node("load_context", load_context)
    graph.add_node("collect_candidates", collect_candidates)
    graph.add_node("merge_and_dedup", merge_and_dedup)
    graph.add_node("policy_filter", policy_filter)
    graph.add_node("rank_candidates_llm", rank_candidates_llm)
    graph.add_node("draft_comments_llm", draft_comments_llm)
    graph.add_node("apply_post_policy", apply_post_policy)
    graph.add_node("post_comments", post_comments)
    graph.add_node("finalize_run", finalize_run)

    graph.set_entry_point("load_context")
    graph.add_edge("load_context", "collect_candidates")
    graph.add_edge("collect_candidates", "merge_and_dedup")
    graph.add_edge("merge_and_dedup", "policy_filter")
    graph.add_edge("policy_filter", "rank_candidates_llm")
    graph.add_edge("rank_candidates_llm", "draft_comments_llm")
    graph.add_edge("draft_comments_llm", "apply_post_policy")

    # The only conditional edge: post or skip
    graph.add_conditional_edges("apply_post_policy", route_after_policy, {
        "post_comments": "post_comments",
        "finalize_run": "finalize_run",
    })

    graph.add_edge("post_comments", "finalize_run")
    graph.add_edge("finalize_run", END)

    return graph.compile()

Why a graph instead of a script? Three reasons:

  1. LangSmith traces every node. I see exactly where time goes, what each node produced, and where failures happen.
  2. Conditional routing is explicit. route_after_policy returns either "post_comments" or "finalize_run" (sketched just after this list). No magic.
  3. State flows cleanly. Each node reads from and writes to a typed state object. No globals, no side effects.
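
For completeness, here's a minimal sketch of that routing function, plus a stripped-down version of the typed state it reads. The real DougState carries more fields; these are just the ones the route needs, reusing the should_post flag from the policy sketch earlier:

from typing import TypedDict

class DougState(TypedDict, total=False):
    """Stripped-down state schema; the real one has more fields."""
    should_post: bool
    drafted_comments: list[dict]

def route_after_policy(state: DougState) -> str:
    """Return the name of the next node. Pure code, no LLM."""
    if state.get("should_post") and state.get("drafted_comments"):
        return "post_comments"
    return "finalize_run"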

Step 4: LLM Ranking (One Call, Bounded)

This is LLM call #1. One call. No ReAct loop. No retries. The model sees at most 15 candidates and returns its top 5:

import json

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

def rank_candidates_llm(state: DougState) -> dict:
    candidates = state.get("filtered_candidates", [])
    if not candidates:
        return {"ranked_candidates": [], "threads_ranked": 0}

    llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0, max_tokens=4096)

    prompt = (
        "Rank these Reddit threads by how well BUILD_SCRIPT answers "
        "the poster's question. Return a JSON array of the top 5 "
        "thread URLs, ordered best to worst. Only include threads "
        "where BUILD_SCRIPT is a genuine, helpful answer.\n\n"
        f"Candidates:\n{json.dumps(candidates[:15], indent=2)}"
    )

    response = llm.invoke([
        SystemMessage(content=state["system_prompt"]),
        HumanMessage(content=prompt),
    ])

    # Map returned URLs back to candidate dicts, silently dropping any
    # hallucinated URLs that weren't in the input
    ranked_urls = _extract_json_array(response.content)
    url_to_candidate = {c["url"]: c for c in candidates}
    ranked = [url_to_candidate[u] for u in ranked_urls if u in url_to_candidate]

    return {"ranked_candidates": ranked, "threads_ranked": len(ranked)}

The key decision here: I don't let the model explain why. I don't ask for reasoning chains. I ask for a ranked list and move on. Explanations are for debugging in LangSmith, not for runtime.

Step 5: Drafting Comments (One Call, Done)

LLM call #2. Again, single call, bounded output:

import json
from datetime import datetime
from pathlib import Path

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

MEMORY_DIR = Path("memory")  # defined once elsewhere in the real code

def draft_comments_llm(state: DougState) -> dict:
    ranked = state.get("ranked_candidates", [])
    if not ranked:
        return {"drafted_comments": [], "drafts_created": 0}

    llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0, max_tokens=4096)

    prompt = (
        "Draft a genuine, helpful Reddit comment for each thread. "
        "Answer the poster's question and naturally mention BUILD_SCRIPT "
        "as a solution where relevant. Keep each under 200 words. "
        "Return a JSON array with 'thread_url' and 'comment_body'.\n\n"
        f"Threads:\n{json.dumps(ranked[:5], indent=2)}"
    )

    response = llm.invoke([
        SystemMessage(content=state["system_prompt"]),
        HumanMessage(content=prompt),
    ])

    drafts = _extract_json_array(response.content)

    # Persist drafts for review, dated so each run gets its own folder
    draft_dir = MEMORY_DIR / "drafts" / datetime.now().strftime("%Y-%m-%d")
    draft_dir.mkdir(parents=True, exist_ok=True)
    (draft_dir / f"drafts-{datetime.now().strftime('%H%M%S')}.json").write_text(
        json.dumps(drafts, indent=2)
    )

    return {"drafted_comments": drafts, "drafts_created": len(drafts)}

Each draft gets persisted to disk regardless of whether it gets posted. This is important: even in shadow mode (more on that below), I can review every draft the agent produced and tweak the system prompt accordingly.

The JSON Extraction Problem

LLMs don't always return clean JSON. Sometimes there's a markdown fence around it. Sometimes there are literal newlines inside strings. Doug includes a robust parser that handles the messiest outputs:

import json
import re

def _extract_json_array(text: str) -> list:
    # Grab everything from the first '[' to the last ']' (skips markdown fences)
    match = re.search(r'\[.*\]', text, re.DOTALL)
    if not match:
        return []

    raw = match.group()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Fix unescaped newlines inside string values
    fixed = re.sub(
        r'(?<=": ")(.*?)(?="[,\}\]])',
        lambda m: m.group().replace('\n', '\\n'),
        raw, flags=re.DOTALL,
    )
    try:
        return json.loads(fixed)
    except json.JSONDecodeError:
        pass

    # Last resort: extract individual objects
    return [
        json.loads(m.group())
        for m in re.finditer(r'\{[^{}]*\}', raw, re.DOTALL)
        if _is_valid_json(m.group())
    ]
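The _is_valid_json helper in that last resort can be as simple as a try/except wrapper:

import json

def _is_valid_json(text: str) -> bool:
    """True if the fragment parses as standalone JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False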

This saved me from at least 3 failed runs where the model wrapped JSON in triple backticks or used literal line breaks inside comment bodies.

Shadow Mode: Don't Ship Blind

Doug starts in shadow mode. For three days, it runs the entire pipeline: SerpApi discovery, LLM ranking, LLM drafting, email summaries. Everything except posting to Reddit.

def check_shadow_promotion(state) -> bool:
    return (
        state.mode == "shadow"
        and state.healthy_shadow_days >= 3
        and state.health_status == "healthy"
    )

After three healthy days, mode transitions to live automatically. I almost shipped without this. During shadow mode, I caught a draft where the LLM hallucinated a feature that Build_Script doesn't have. That would have been embarrassing on Reddit.
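
The promotion itself is just another deterministic step. A sketch of how it could be applied at the end of each run, say inside finalize_run (whether the counter resets is my assumption, not the actual behavior):

def maybe_promote(state) -> None:
    """Flip shadow -> live once check_shadow_promotion passes. No LLM involved."""
    if check_shadow_promotion(state):
        state.mode = "live"
        state.healthy_shadow_days = 0  # assumption: reset the counter on promotion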

Seeing Inside with LangSmith

Every graph.invoke() is traced with structured metadata:

result = graph.invoke({"runtime_state": state}, config={
    "run_name": "doug:reddit_hunt",
    "tags": ["reddit_hunt", state.mode, f"day-{state.mission_day}"],
    "metadata": {
        "run_id": state.run_id,
        "mode": state.mode,
        "brain_hash": state.brain_hash,
    },
})

I can open LangSmith and see: which threads SerpApi returned, how the LLM ranked them, what comments it drafted, whether the policy layer allowed posting. Every run. Retroactively. When a draft reads weird, I trace back through the ranking node to understand why that thread was selected. No guessing.

Running on GitHub Actions

Two cron workflows. No servers.

name: Doug Reddit Hunt
on:
  schedule:
    - cron: '7 15 * * 1,3,5'  # Mon/Wed/Fri 10:07 AM CT

permissions:
  contents: write  # lets the workflow push memory commits back to the repo

jobs:
  hunt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: python -m agents.doug
        env:
          SERPAPI_API_KEY: ${{ secrets.SERPAPI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          REDDIT_CLIENT_ID: ${{ secrets.REDDIT_CLIENT_ID }}
          REDDIT_CLIENT_SECRET: ${{ secrets.REDDIT_CLIENT_SECRET }}
          LANGCHAIN_API_KEY: ${{ secrets.LANGCHAIN_API_KEY }}
          LANGCHAIN_TRACING_V2: 'true'
          LANGCHAIN_PROJECT: o1-swarm-doug
      - name: Commit memory
        run: |
          git config user.name "Doug [bot]"
          git config user.email "doug-bot@users.noreply.github.com"
          git add memory/
          git diff --cached --quiet || git commit -m "doug: reddit_hunt cycle"
          git push

Costs: effectively zero. GitHub Actions free tier covers it. SerpApi handles 100 searches/month on the free plan. Two Claude API calls per cycle at ~$0.01 each.

After every run, Doug commits its memory (drafts, metrics, run artifacts) back to the repo. Full audit trail, version controlled.

What I Actually Do Every Morning

Since Reddit blocked automated posting, here's my workflow now:

  1. Wake up. Check the email Doug sent me overnight with the thread list and drafted comments.
  2. Open each thread URL.
  3. Read the draft. If it's good (it usually is), I copy-paste it as my comment.
  4. If a draft needs tweaking, I edit it. Takes 30 seconds.

Total time: ~5 minutes per morning to promote Build_Script across 3-5 relevant threads. Without Doug, finding those threads alone would take 30+ minutes, and I probably wouldn't do it consistently.

What I'd Do Differently

Start with shadow mode from day one. Three days of validation caught a hallucinated feature description before it hit Reddit.

Use SerpApi as primary, not fallback. I originally had PRAW as the main discovery source. Google's index is more comprehensive, and SerpApi's structured output is cleaner to parse. PRAW is better as the "catch fresh threads" supplement.

Keep LLM calls bounded and countable. Every time I was tempted to add "one more LLM call" to handle an edge case, I wrote a deterministic check instead. Two calls per cycle. The agent got more reliable every time I said no to a third.

The Stack

  • Python 3.12
  • LangGraph for workflow orchestration
  • LangSmith for tracing and observability
  • SerpApi for Google-powered Reddit discovery
  • PRAW for real-time Reddit fallback
  • Claude Sonnet for ranking and drafting
  • GitHub Actions for scheduling (Mon/Wed/Fri + daily health check)
  • Gmail SMTP for operational reporting

Full source: o1-swarm on GitHub


If you're building agents that need to run without you watching, the architecture matters more than the model. Deterministic control, bounded LLM calls, and full observability are what separate a demo from something you actually trust to run on a cron.

Doug finds me 3-5 relevant threads every Monday, Wednesday, and Friday morning. I copy-paste and post. That's the whole workflow. The boring, reliable kind.
