Lars Winstand

Posted on • Originally published at standardcompute.com

I went looking for flashy agents and found 5 boring automations people actually keep

I started this rabbit hole looking for the fun stuff.

Browser agents. OpenClaw setups doing ten things at once. Claude Opus running inboxes. GPT-5 fixing bugs. Maybe Qwen or Llama doing the cheap background work.

The kind of automation stack that looks amazing in a demo and becomes a maintenance problem by Thursday.

Then I found a thread on r/openclaw where someone asked a much better question: what does your agent actually do on a normal day?

That question is way more useful than most "agentic workflow" discourse.

Because on a normal Wednesday, nobody cares whether your stack can theoretically book flights, compare vendors, and redesign a landing page. They care whether it saved them 45 minutes before lunch without doing anything stupid.

And the answers were refreshingly boring.

One user said they use it for inbox triage, draft replies on their phone, warehouse pick lists, shipment tracking, and scheduling. Another said they try to keep most of it deterministic to avoid hallucinations.

That second answer is the whole game.

The automations people keep are usually:

  1. Triggered by structured data
  2. Deterministic in the plumbing
  3. Using the model only for the fuzzy part
  4. Keeping a human in the loop for anything risky

That pattern shows up over and over in Make, n8n, Zapier, OpenClaw, and custom scripts.

So if you're building AI automations and want something that survives past the demo phase, here are the 5 boring workflows I'd build first.

The rule: deterministic pipes, fuzzy model step

Before the list, here's the architecture rule that keeps showing up in real systems:

  • Use code, webhooks, cron, queues, and APIs for state changes
  • Use the LLM for classification, summarization, extraction, and draft generation
  • Do not let the model invent state
  • Do not let the model take irreversible actions without approval

In code, the split looks something like this:

# deterministic trigger + data fetch
email = gmail.get_message(message_id)
thread = gmail.get_thread(email.thread_id)
customer = hubspot.find_contact(email.from_address)

# fuzzy step
prompt = f"""
Classify this email into one of:
- urgent
- fyi
- scheduling
- customer_issue
- vendor
- spam

Then draft a reply in this tone: concise, helpful, direct.

Email:
{email.body}

Thread summary:
{thread.last_5_messages}

Customer context:
{customer}
"""

# fuzzy model call (assumes the raw response is parsed into
# .classification and .draft fields before the next step)
result = llm.responses.create(
    model="gpt-5.4",
    input=prompt
)

# deterministic output handling
save_draft_to_gmail(result.draft)
label_thread(result.classification)
notify_human_for_review(message_id)

That's the pattern. Let the model do the text work. Let software handle the rest.

1) Inbox triage and draft replies are still king

This one is boring in the best way.

Email is where work goes to become sludge. Too many messages, too much context switching, too many replies that are 80% the same.

That is why inbox triage keeps surviving contact with reality.

Why it works

The scope is tight:

  • Pull new messages from Gmail or Outlook
  • Classify them
  • Summarize the thread
  • Draft a response
  • Let a human approve the send

If you keep approval on the final send, the risk drops a lot while the time savings stay high.

Minimal implementation

If you're using n8n or Make, this is a very standard flow:

Gmail trigger
  -> fetch thread context
  -> fetch CRM/contact context
  -> LLM classify + summarize + draft
  -> create Gmail draft
  -> Slack/Telegram approval notification

If you're doing it in code:

# poll unread messages every 2 minutes
*/2 * * * * /usr/bin/python3 /opt/agents/inbox_triage.py

And your safety checks are straightforward:

if result.classification == "spam":
    archive(email.id)
elif result.classification in ["urgent", "customer_issue"]:
    create_draft(email.id, result.draft)
    send_slack_alert(email.id)
else:
    create_draft(email.id, result.draft)

Why developers should care

This is the rare AI workflow that is:

  • easy to measure
  • easy to constrain
  • immediately useful
  • cheap to validate

You can tell in one week whether it saved time.

2) Calendar briefings are absurdly high leverage

This one surprised me.

I expected inbox triage to be useful. I didn't expect meeting briefings to be one of the most consistently loved workflows.

One Reddit user described a setup that checks Google Calendar and sends a Telegram message with the day's events. Another had a system that sends a briefing before meetings with relevant context from notes and prior follow-ups.

That is an excellent use of an LLM.

Why it works

The inputs are already structured:

  • Google Calendar
  • Notion or Obsidian notes
  • HubSpot or Salesforce records
  • Zoom transcripts
  • previous email threads

The model's job is not to act. It's to compress context.

Example briefing payload

{
  "meeting": "Vendor review with Acme Logistics",
  "starts_in": "20 minutes",
  "attendees": ["Sam Lee", "Priya Raman"],
  "last_touchpoint": "Discussed delayed shipments and SLA credits",
  "open_items": [
    "Confirm revised delivery window",
    "Review penalty terms",
    "Decide on backup carrier"
  ],
  "recommended_prep": [
    "Check last 3 shipment incidents",
    "Pull contract renewal date"
  ]
}
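Assembling a payload like that doesn't need a model at all. As a rough sketch, assuming an `event` dict loosely shaped like a Google Calendar event (the field names here are illustrative), with `last_note` and `open_items` pulled from notes or CRM beforehand:

```python
from datetime import datetime, timezone

def build_briefing(event: dict, last_note: str, open_items: list[str]) -> dict:
    """Assemble a briefing payload from already-structured inputs.
    A model is only needed upstream, if open_items must be extracted
    from raw notes; the assembly itself stays deterministic."""
    start = datetime.fromisoformat(event["start"])
    minutes = int((start - datetime.now(timezone.utc)).total_seconds() // 60)
    return {
        "meeting": event["summary"],
        "starts_in": f"{minutes} minutes",
        "attendees": [a["email"] for a in event.get("attendees", [])],
        "last_touchpoint": last_note,
        "open_items": open_items,
    }
```

Everything the model might hallucinate (times, attendees) comes straight from the calendar; the fuzzy step only ever fills in the prose fields.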

Delivery options

  • Slack DM
  • Telegram
  • iMessage bridge
  • email summary
  • calendar note update

Why this survives

Low risk, high frequency, real value.

Nobody gets fired because an AI-generated meeting brief was a little awkward. But people absolutely save time when they stop context-switching before every call.

3) Shipment tracking and ops alerts get practical fast

This is where AI automation stops sounding like a productivity toy and starts sounding like actual operations.

One of the better Reddit examples mentioned warehouse pick lists, shipment tracking, scheduling, SharePoint, and inbox workflows all tied together. That's not a toy setup. That's a real system.

If you work in ops, fulfillment, field service, or logistics-adjacent workflows, you already know the problem:

The data exists. It's just spread across carrier events, spreadsheets, ERP records, emails, and internal notes.

A good automation doesn't need to solve logistics. It just needs to notice what changed and explain what matters.

| Workflow | Why it's worth building first |
| --- | --- |
| Inbox triage + draft replies | High reliability when limited to classify, summarize, and draft. High ROI for anyone with daily email volume. Low risk if a human approves sends. |
| Calendar or meeting briefings | High reliability when sourced from Google Calendar, CRM data, notes, and transcripts. High ROI for managers, founders, sales, and client teams. Low risk because the output is informational. |
| Shipment tracking and operational alerts | Medium-high reliability when based on carrier events and internal systems. High ROI for ops teams and warehouse workflows. Medium risk if actions are automated without review. |

Good pattern

Use system events as ground truth.

For example:

shipment = carrier_api.get_status(tracking_number)
order = erp.get_order(order_id)
customer = crm.get_account(order.customer_id)

if shipment.status in ["delayed", "exception", "returned"]:
    summary = llm.responses.create(
        model="claude-opus-4.6",
        input=f"""
        Explain this shipment issue for an ops manager.
        Include likely impact, urgency, and next step.

        Shipment event: {shipment}
        Order context: {order}
        Customer tier: {customer.tier}
        """
    )

    post_to_slack(channel="#ops-alerts", text=summary.output_text)

The model is not inventing shipment state. It is translating system state into a useful alert.

That's a much safer pattern.

4) Article clipping and summarization quietly becomes a second brain

This category sounds optional until you use it for a couple weeks.

Then you don't want to lose it.

The Reddit example that stuck with me used Obsidian Web Clipper to save articles, summarize them, and build a retrievable wiki. That's not a chatbot trick. That's memory infrastructure.

Good workflow

  • Save an article from Chrome or Safari
  • Extract metadata
  • Summarize the core argument
  • Tag it by company, topic, or project
  • Store it in Obsidian, Notion, or Drive
  • Link it to related notes

Example CLI pipeline

python clip_article.py "https://example.com/long-article"
python summarize_article.py --model qwen2.5:14b --input article.md
python tag_article.py --input summary.json
python push_to_obsidian.py --vault ~/ObsidianVault
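The last step is simpler than it sounds: an Obsidian vault is just a folder of markdown files, so a hypothetical `push_to_obsidian.py` can be little more than writing a note with YAML frontmatter. A minimal sketch (the `Clippings` folder name is my assumption, not an Obsidian requirement):

```python
from pathlib import Path

def write_obsidian_note(vault: Path, title: str, summary: str,
                        tags: list[str], url: str) -> Path:
    """Write a clipped article into the vault as plain markdown.
    Obsidian indexes .md files on disk, so no API call is needed."""
    frontmatter = "\n".join(
        ["---", f'source: "{url}"', "tags:"]
        + [f"  - {t}" for t in tags]
        + ["---"]
    )
    note = vault / "Clippings" / f"{title}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text(f"{frontmatter}\n\n# {title}\n\n{summary}\n", encoding="utf-8")
    return note
```

Because it's just files, the whole pipeline stays debuggable with `cat` and `grep`.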

Why this is better than it sounds

The summary is not the main value.

The main value is that your future self can actually find and reuse what you read.

This is also a good place to use cheaper models or local inference for first-pass work.

For example:

  • Qwen or Llama for extraction and rough summaries
  • GPT-5.4 or Claude Opus 4.6 for synthesis across multiple sources

That split matters if you're running these pipelines all day.
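A sketch of that routing split, using the model names above purely as placeholders (your actual tiers and task names will differ):

```python
# first-pass work goes to a cheap/local model; cross-source
# synthesis goes to a frontier model
CHEAP_TASKS = {"extract", "summarize_single", "tag"}

def pick_model(task: str) -> str:
    """Return the model for a pipeline step based on task type."""
    return "qwen2.5:14b" if task in CHEAP_TASKS else "gpt-5.4"
```

The point is that routing is a deterministic lookup, not a judgment call the model makes about itself.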

If you're paying per token, background automations like this get expensive faster than people expect.

That's one reason predictable-cost infrastructure matters for agent workflows. If an automation is useful only when you feel safe letting it run constantly, pricing changes the architecture.

5) Product research and draft generation beats full autonomy

This is where people usually overreach.

They want: go choose the best vendor, negotiate, and place the order.

What they should probably build first: compare options, summarize tradeoffs, and generate a recommendation draft.

Those are very different risk profiles.

One of the better failure stories I ran into was a grocery-buying agent that ended up ordering 2 kg of garlic instead of 2 heads because the product page changed after working fine for months.

That is exactly why execution-heavy agents are dangerous.

Better pattern

Automate the read-heavy and draft-heavy part:

  • compare product specs across vendor pages
  • summarize reviews
  • extract pricing into a CSV
  • build a shortlist
  • draft a recommendation memo

Example output schema

{
  "vendors": [
    {
      "name": "Vendor A",
      "price": "$129/month",
      "pros": ["API access", "SOC 2", "fast onboarding"],
      "cons": ["no SSO on lower tier"]
    },
    {
      "name": "Vendor B",
      "price": "$99/month",
      "pros": ["cheaper", "better docs"],
      "cons": ["rate limits", "weak support"]
    }
  ],
  "recommendation": "Vendor A for production use, Vendor B for internal prototypes"
}
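Since a model produces that JSON, it's worth a deterministic validation step before anything reaches a human. A minimal stdlib-only sketch, assuming the field names from the schema above:

```python
import json

REQUIRED_VENDOR_KEYS = {"name", "price", "pros", "cons"}

def parse_recommendation(raw: str) -> dict:
    """Parse and validate the model's JSON output before surfacing it.
    Raises ValueError rather than letting a malformed draft through."""
    data = json.loads(raw)
    vendors = data.get("vendors")
    if not isinstance(vendors, list) or not vendors:
        raise ValueError("missing vendors list")
    for v in vendors:
        missing = REQUIRED_VENDOR_KEYS - v.keys()
        if missing:
            raise ValueError(f"vendor missing fields: {missing}")
    if not isinstance(data.get("recommendation"), str):
        raise ValueError("missing recommendation")
    return data
```

A failed parse goes back for a retry or into a dead-letter queue; it never lands in someone's inbox as a half-formed memo.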

Then let a human make the decision.

That is not less ambitious. It's just less reckless.

Why flashy agents keep disappointing people

Because the failure mode is usually not dramatic at first.

It kind of works.

Then it fails on an edge case.

Then you spend hours debugging prompts, browser state, selectors, retries, and context windows.

Then you realize the thing is also expensive.

A couple of OpenClaw threads were pretty blunt about this. One user reported burning a huge amount on Opus tokens for software upgrades, bug fixes, server management, and form filling. Another described months of effort, thousands of hours, and billions of tokens before deciding the setup was too fragile for serious work.

That doesn't mean ambitious automation is fake.

It means the cost of being wrong gets ugly fast.

And cost matters more than people like to admit.

If you're running always-on workflows, pricing affects design:

  • whether you can afford retries
  • whether you can batch requests
  • whether you can route simple work to cheaper models
  • whether you feel safe running background jobs 24/7
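One way to make the retry question concrete is to put the budget in code instead of hoping the loop terminates. A small sketch:

```python
import time

def call_with_budget(fn, max_attempts=3, base_delay=1.0):
    """Retry a flaky call with exponential backoff, but stop at a hard
    attempt cap so a stuck job can't silently burn tokens all night."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Whether `max_attempts` is 1 or 5 is an economic decision as much as a reliability one.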

That is exactly why I think per-token pricing pushes people toward under-automation. They start optimizing around fear instead of utility.

If you're building agents for real work, predictable cost is not a nice-to-have. It's part of reliability.

That's also why services like Standard Compute are interesting for this category of workload. If you're already using OpenAI-compatible SDKs, you can swap the endpoint and run the same automations with flat monthly pricing instead of watching token spend every time a cron job fires.

For Make, n8n, Zapier, OpenClaw, or custom Python workers, that changes the economics of background AI tasks a lot.

If you build one thing this week, build the thing you already repeat

Not the coolest thing.

The most repeated thing.

The task you do at least five times a week and hate every time.

  • If it's email, build inbox triage.
  • If it's meetings, build calendar briefings.
  • If it's ops, build shipment alerts.
  • If it's reading, build article capture into Obsidian or Notion.
  • If it's vendor comparison, build research + draft output.

And keep the stack boring.

A cron job. A webhook. A queue. A spreadsheet. A few prompts. Maybe OpenClaw if you want agent orchestration. Maybe Make or n8n if you want visual flows. Maybe Ollama if you want local inference.

That's enough.

Basic sanity checks

If you're experimenting with local or agent setups, these are still useful:

# OpenClaw logs
openclaw logs --follow

# local model inventory
ollama list

# local inference health check
curl http://localhost:11434/

And for a boring but effective worker process:

# run every 5 minutes
*/5 * * * * /usr/bin/bash /opt/automations/run_pipeline.sh >> /var/log/automations.log 2>&1

My actual take

The people getting real value from AI automation are usually not the ones posting the wildest demos.

They're the ones who quietly removed three annoying tasks from every day.

That sounds less impressive.

It is also what actually ships.

If your automation can survive a normal Wednesday, you're onto something.

If it only works in a demo, you built a demo.
