I Cut My Twitter/X Triage Time by 71% with a gpt-4o-mini + Python Auto-Mute Bot (GitHub Actions, 2026)

#python #openai #githubactions #automation

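⚠️ This article contains affiliate advertising (promotion). A portion of any revenue generated through the links is paid to the site operator, but this has no effect on the price readers pay.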

I run three side-business accounts, and I was losing ~50 minutes every morning scrolling X (Twitter) to find the 4-5 posts that actually matter for affiliate trends. This article gives you a working Python pipeline that pulls your timeline, scores each post with gpt-4o-mini as a relevance judge, builds a reviewable list of mute candidates, and posts a 5-bullet digest to Discord, all on a free GitHub Actions cron. Copy the three files, add three secrets, and it runs every morning at 7:00 JST without you touching it.

Why keyword mutes failed me: 312 muted words, still 80% noise

I started where everyone starts: X's native muted-words list. After six months I had 312 muted keywords (giveaway, RT to win, airdrop, 🚀, you name it). It still let through about 80% noise, because the spam that hurts a side-business account isn't keyword-shaped — it's topic-shaped. A post that says "just hit my goal, here's the link in bio 🔗" has zero muteable keywords but is pure noise for someone tracking, say, year-end ふるさと納税 (hometown tax) deadline changes.

The real insight: muting is a classification problem, not a string-match problem. Native filters do exact substring matching; I needed semantic judgment. So I replaced the keyword list with an LLM judge that reads each post and answers one question: "Is this signal for my affiliate niche, yes or no, and why?" I let a cheap model do it. gpt-4o-mini costs about $0.15 / 1M input tokens, and my whole morning timeline (~200 posts, ~40 tokens each) is 8,000 tokens of post text ≈ $0.0012 per run. That's roughly ¥0.18 a day, or ¥5.4 a month, at about ¥150 to the dollar. The system prompt is resent with every call and the JSON replies are billed as output tokens, so the real bill is higher than this post-text-only figure, but it stays at pocket-change level. Cheaper than the coffee I used to drink while doom-scrolling.
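
The cost arithmetic can be checked with a few lines of Python. This is a minimal sketch: estimate_daily_cost is a hypothetical helper, the ¥150/USD rate is inferred from the yen figures quoted here, and the 100-token per-call overhead in the second call is an assumed placeholder for the system prompt, not a measured value.

```python
def estimate_daily_cost(
    posts: int = 200,
    tokens_per_post: int = 40,
    overhead_tokens_per_call: int = 0,     # assumed: system prompt resent per call
    usd_per_million_input: float = 0.15,   # input price quoted in the article
    yen_per_usd: float = 150.0,            # assumed exchange rate
) -> dict:
    """Estimate input-token cost for one morning run (output tokens ignored)."""
    input_tokens = posts * (tokens_per_post + overhead_tokens_per_call)
    usd = input_tokens * usd_per_million_input / 1_000_000
    return {"input_tokens": input_tokens, "usd": usd, "yen": usd * yen_per_usd}


# Post text only, matching the 8,000-token figure.
base = estimate_daily_cost()
# Same run with a hypothetical 100-token system prompt on every call.
with_prompt = estimate_daily_cost(overhead_tokens_per_call=100)

print(base)
print(with_prompt)
```

Even with the assumed overhead the run stays well under one yen per day, so the conclusion holds; only the exact figure moves.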

The Critic-Refiner scoring loop with gpt-4o-mini (the part that actually works)

My first version just asked "rate 0-10 relevance." It was useless: the model gave everything a 6 or 7. Scores clustered in the middle and I couldn't threshold anything. The fix that doubled precision: force a binary decision plus a one-line reason, then refine borderline cases in a second pass. Binary kills the fence-sitting; the reason field lets me audit why later (and I caught the model muting a genuine Rakuten point-campaign post twice, which is how I tuned the system prompt).

Here's the core scorer. It's a real, runnable module — drop in your OpenAI key and a list of post strings and it returns keep/mute decisions.

# scorer.py
import os, json
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

NICHE = "affiliate side-business: credit cards, ふるさと納税, つみたてNISA, AI tools"

SYSTEM = f"""You are a strict relevance judge for an account whose niche is:
{NICHE}
For each post decide keep=true ONLY if it contains actionable signal
(a rule change, a deadline, a new product, a concrete number, a deal).
Generic motivation, giveaways, 'link in bio', and vague hype = keep=false.
Return JSON: {{"keep": bool, "reason": "<=12 words"}}"""

def judge(post: str) -> dict:
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": post[:500]},
        ],
    )
    return json.loads(r.choices[0].message.content)

def triage(posts: list[str]) -> dict:
    keep, mute = [], []
    for p in posts:
        d = judge(p)
        (keep if d["keep"] else mute).append({"text": p, "why": d["reason"]})
    return {"keep": keep, "mute": mute}

if __name__ == "__main__":
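    sample = [
        # "Furusato nozei: portals may not award points from Oct 2025; apply by September"
        "ふるさと納税、2025年10月からポータルのポイント付与が禁止に。9月までに申込み推奨",
        # "Good morning! Let's give it our all today 🔥 Link in profile"
        "おはよう！今日も一日がんばろう🔥 リンクはプロフから",
        # "Rakuten Card: 8,000 points for new sign-ups right now, no annual fee ever"
        "楽天カード新規入会で今だけ8,000ポイント、年会費は永年無料",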
    ]
    out = triage(sample)
    print(f"KEEP {len(out['keep'])} / MUTE {len(out['mute'])}")
    for k in out["keep"]:
        print("  +", k["why"], "::", k["text"][:40])

Running that sample prints KEEP 2 / MUTE 1 and correctly mutes only the "おはよう" (good morning) hype post. temperature=0 is non-negotiable here: at 0.7 the same post flipped between keep and mute on 17% of reruns in my testing, which makes the mute list unstable. Zero is not a hard guarantee of identical output on every call, but it removes sampling randomness and gives you near-deterministic, reproducible triage.
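
One fragility in the scorer above: triage() indexes d["keep"] and d["reason"] directly, so a reply that is missing a key or is not valid JSON would stop the run. A minimal sketch of a defensive parser follows. parse_verdict and its keep-by-default fallback are hypothetical additions, not part of the original scorer.py; the fallback assumes that wrongly showing a post is cheaper than wrongly hiding one.

```python
import json


def parse_verdict(raw) -> dict:
    """Validate the judge's reply; fall back to keep=True on anything malformed."""
    fallback = {"keep": True, "reason": "unparseable verdict, kept for review"}
    try:
        d = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback
    # The prompt asks for {"keep": bool, "reason": str}; reject other shapes.
    if not isinstance(d, dict) or not isinstance(d.get("keep"), bool):
        return fallback
    reason = d.get("reason")
    return {"keep": d["keep"], "reason": reason if isinstance(reason, str) else ""}


print(parse_verdict('{"keep": false, "reason": "generic hype"}'))
print(parse_verdict("not json at all"))
```

To use it, judge() would return parse_verdict(r.choices[0].message.content) instead of calling json.loads directly.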

Failure I hit: I originally fed the full post including URLs. The model started keeping anything with a link because links look actionable. Truncating to post[:500] and, in a later version, stripping https://\S+ before judging dropped my false-keep rate noticeably. Strip the URL, judge the words.
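
The URL stripping described above can be sketched as a small pre-processing step. clean_post is a hypothetical name, and the pattern is widened from https://\S+ to https?://\S+ so plain http links are removed too; collapsing whitespace afterwards is an extra assumption that keeps the 500-character budget for actual words.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def clean_post(post: str, max_chars: int = 500) -> str:
    """Remove URLs, collapse leftover whitespace, then truncate."""
    no_urls = URL_RE.sub("", post)
    collapsed = " ".join(no_urls.split())  # also flattens newlines
    return collapsed[:max_chars]


print(clean_post("deal ends Friday https://t.co/abc123 apply now"))
# prints: deal ends Friday apply now
```

In judge(), the user message would then be clean_post(post) rather than post[:500].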

Wiring it to a real X timeline and a Discord digest with Python

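The scorer is the brain; here's the body. This pulls your home timeline via tweepy (X API v2), runs triage, then ships a digest to a Discord webhook. The home-timeline endpoint requires user-context authentication, so X_BEARER_TOKEN must hold an OAuth 2.0 user access token rather than an app-only bearer token, and you should confirm that your X API access tier includes timeline reads. The mute bucket is written to mute_candidates.json so you can review before mass-muting. Never auto-mute blindly; see the section on the auto-mute trap for why.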

# run.py
import os, json, requests, tweepy
from scorer import triage

client = tweepy.Client(bearer_token=os.environ["X_BEARER_TOKEN"])
DISCORD = os.environ["DISCORD_WEBHOOK"]

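def fetch_timeline(max_results=100) -> list[str]:
    # The home timeline needs user-context auth. user_auth=False tells tweepy
    # to send the token above as an OAuth 2.0 user access token instead of
    # looking for OAuth 1.0a keys, so X_BEARER_TOKEN must be a user token.
    tl = client.get_home_timeline(
        max_results=max_results,  # the endpoint allows at most 100 per page
        tweet_fields=["text"],
        user_auth=False,
    )
    return [t.text for t in (tl.data or [])]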

def send_digest(keep: list[dict]):
    if not keep:
        requests.post(DISCORD, json={"content": "No signal today. Enjoy your morning."}, timeout=10)
        return
    lines = [f"**{len(keep)} signals today**"]
    for k in keep[:5]:
        lines.append(f"• {k['why']} — {k['text'][:90]}")
    requests.post(DISCORD, json={"content": "\n".join(lines)}, timeout=10)

if __name__ == "__main__":
    posts = fetch_timeline()
    print(f"fetched {len(posts)} posts")
    out = triage(posts)
    send_digest(out["keep"])
    with open("mute_candidates.json", "w", encoding="utf-8") as f:
        json.dump(out["mute"], f, ensure_ascii=False, indent=2)
    print(f"digest sent: {len(out['keep'])} kept, {len(out['mute'])} flagged to mute")

The digest format matters more than I expected. My first version dumped all kept posts; I still had to scroll. Capping at keep[:5] and leading each line with the reason (not the post text) is what dropped my actual reading time. I now read five reasons in about 20 seconds and open maybe one. Measured over two weeks with a stopwatch: my morning triage went from ~50 min to ~14.5 min — a 71% cut. The number that surprised me wasn't time saved, it was that I stopped opening X impulsively at all, because I trusted the 7am digest would catch anything real.

Running it free on GitHub Actions cron at 7:00 JST (22:00 UTC)

No server, no cost. GitHub Actions gives 2,000 free minutes/month on private repos; this job runs in ~40 seconds, so about 20 minutes of runtime a month. GitHub rounds each job up to a whole minute for billing, which makes it 30 billed minutes, or 1.5% of the free tier. Cron in Actions is UTC-only, so 7:00 JST is 0 22 * * * (22:00 UTC the previous day). Commit this as .github/workflows/triage.yml, add OPENAI_API_KEY, X_BEARER_TOKEN, and DISCORD_WEBHOOK as repo secrets, and you're done:

name: morning-triage
on:
  schedule:
    - cron: "0 22 * * *"   # 07:00 JST
  workflow_dispatch:        # lets you run it manually to test
jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install openai tweepy requests
      - run: python run.py
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          X_BEARER_TOKEN: ${{ secrets.X_BEARER_TOKEN }}
          DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK }}

Use workflow_dispatch to fire it once manually from the Actions tab before trusting the schedule. I wasted a full day thinking the cron was broken when my X_BEARER_TOKEN was actually missing the scope needed to read the timeline. The manual run surfaced the 403 in 40 seconds.
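
If you want a different local time, the JST-to-UTC shift behind the cron line is easy to get wrong by hand. A minimal sketch, assuming a daily schedule and JST's fixed UTC+9 offset (Japan has no daylight saving time); jst_to_utc_cron is a hypothetical helper name:

```python
def jst_to_utc_cron(hour: int, minute: int = 0) -> str:
    """Return a daily GitHub Actions cron expression (UTC) for a JST wall-clock time."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    utc_hour = (hour - 9) % 24  # JST is UTC+9 all year
    return f"{minute} {utc_hour} * * *"


print(jst_to_utc_cron(7))       # 0 22 * * *
print(jst_to_utc_cron(12, 30))  # 30 3 * * *
```

For JST times before 09:00 the UTC run falls on the previous calendar day. That makes no difference to a daily schedule, but it would if you restricted the day-of-week field.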

The auto-mute trap that cost me a Rakuten campaign, and how I gate it

Here's the failure that made me write mute_candidates.json instead of muting directly. My v1 called X's mute endpoint on the author of every keep=false post. Within a week I'd muted an account that usually posts hype but occasionally drops the best timing for the Rakuten Okaimono Marathon (楽天お買い物マラソン) sale, and I missed a campaign worth real points because I never saw their good post. Judging at the post level with an LLM but acting at the account level is a category error.

The fix: I never auto-mute accounts. I only ever (a) hide posts from my reading by not showing them in the digest, and (b) write a review queue of mute candidates that I approve weekly. The LLM decides what I read today, not who I follow forever. If you want a hard auto-mute, gate it on 3+ consecutive mute verdicts across different days for the same author — a single bad post is noise, a pattern is signal.
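
The 3-day gate can be sketched as follows. This is one possible reading of the rule, not the pipeline's actual code: it assumes you log a (day, author, keep) tuple per judged post, treats a day as a mute day only if nothing from that author was kept, and counts the streak over the days the author actually appeared. authors_over_streak is a hypothetical name.

```python
from collections import defaultdict
from datetime import date


def authors_over_streak(verdicts, min_streak: int = 3) -> list[str]:
    """Return authors whose most recent `min_streak` active days were all mute days."""
    kept_on_day = defaultdict(dict)  # author -> {day: was any post kept?}
    for day, author, keep in verdicts:
        kept_on_day[author][day] = kept_on_day[author].get(day, False) or keep

    flagged = []
    for author, days in kept_on_day.items():
        streak = 0
        for day in sorted(days):
            # Any kept post resets the streak; a fully muted day extends it.
            streak = 0 if days[day] else streak + 1
        if streak >= min_streak:
            flagged.append(author)
    return sorted(flagged)


d1, d2, d3 = date(2026, 1, 5), date(2026, 1, 6), date(2026, 1, 7)
log = [
    (d1, "hype_only", False), (d2, "hype_only", False), (d3, "hype_only", False),
    (d1, "mixed", False), (d2, "mixed", True), (d3, "mixed", False),
    (d1, "one_bad_day", False), (d1, "one_bad_day", False), (d1, "one_bad_day", False),
]
print(authors_over_streak(log))  # ['hype_only']
```

Three muted posts on a single day do not trip the gate, which matches the idea that one bad day is noise while a multi-day pattern is signal.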

One more measured gotcha: gpt-4o-mini rate-limits at the free tier around 3 requests/sec in my runs; your limit depends on your OpenAI usage tier. At 200 posts that's fine, but max_results tops out at 100 per request, so reaching something like 800 posts means paginating, and at that volume you'll hit 429s. I added a 0.3s sleep between judge() calls only above 300 posts. Below that, don't bother; it just slows your run for no reason.
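
A minimal sketch of that conditional throttle, assuming the same verdict shape as scorer.py. triage_throttled is a hypothetical variant of triage(); the judge and sleep functions are passed in as parameters purely so the example runs without an API key.

```python
import time


def triage_throttled(posts, judge, threshold=300, delay=0.3, sleep=time.sleep):
    """Like triage(), but pause between calls only when the batch is large."""
    pause = delay if len(posts) > threshold else 0.0
    keep, mute = [], []
    for i, post in enumerate(posts):
        if pause and i > 0:
            sleep(pause)  # no pause before the first call
        d = judge(post)
        (keep if d["keep"] else mute).append({"text": post, "why": d["reason"]})
    return {"keep": keep, "mute": mute}


if __name__ == "__main__":
    # Stub judge standing in for the real gpt-4o-mini call.
    def stub(post):
        return {"keep": "deadline" in post, "reason": "stub"}

    print(triage_throttled(["deadline Friday", "gm"], stub))
```

A fixed sleep is the simplest option; retrying on 429 with backoff would be more robust, but that goes beyond what the pipeline described here does.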

What this actually changed for a side account at ¥0 budget

The honest result after three weeks: I haven't suddenly made ¥50,000. What changed is that I now act on trends the morning they break instead of two days late after scrolling catches up. I posted the deadline warning about the ふるさと納税 point-reward ban (ふるさと納税ポイント付与禁止) within hours of it surfacing in my digest, and that single timely post outperformed my prior month's engagement. The pipeline costs ~¥5/month plus ~20 min/month of GitHub Actions runtime, and it turned X from a 50-minute time sink into a ~15-minute routine that starts with a 20-second read. If your side business depends on being early, the cheapest leverage isn't posting more; it's deleting the noise between you and the one post that matters. Clone the three files, add your keys, and let the 7am cron do the scrolling you've been doing by hand.

Set up your own free Discord webhook in 2 minutes, grab an OpenAI API key, and if you need a credit card that earns while you build side income, a no-annual-fee card with 1.5%+ return is the boring-but-correct first step.