Lars Winstand

Posted on • Originally published at standardcompute.com

I went looking for flashy agents and found 5 boring automations people actually keep

I started this rabbit hole looking for the fun stuff.

Browser agents. OpenClaw setups doing ten things at once. Claude Opus running inboxes. GPT-5 fixing bugs. Maybe Qwen or Llama doing the cheap background work.

The kind of automation stack that looks amazing in a demo and becomes a maintenance problem by Thursday.

Then I found a thread on r/openclaw where someone asked a much better question: what does your agent actually do on a normal day?

That question is way more useful than most "agentic workflow" discourse.

Because on a normal Wednesday, nobody cares whether your stack can theoretically book flights, compare vendors, and redesign a landing page. They care whether it saved them 45 minutes before lunch without doing anything stupid.

And the answers were refreshingly boring.

One user said they use it for inbox triage, draft replies on their phone, warehouse pick lists, shipment tracking, and scheduling. Another said they try to keep most of it deterministic to avoid hallucinations.

That second answer is the whole game.

The automations people keep are usually:

  1. Triggered by structured data
  2. Deterministic in the plumbing
  3. Using the model only for the fuzzy part
  4. Keeping a human in the loop for anything risky

That pattern shows up over and over in Make, n8n, Zapier, OpenClaw, and custom scripts.

So if you're building AI automations and want something that survives past the demo phase, here are the 5 boring workflows I'd build first.

The rule: deterministic pipes, fuzzy model step

Before the list, here's the architecture rule that keeps showing up in real systems:

  • Use code, webhooks, cron, queues, and APIs for state changes
  • Use the LLM for classification, summarization, extraction, and draft generation
  • Do not let the model invent state
  • Do not let the model take irreversible actions without approval

In code, the split looks something like this:

# deterministic trigger + data fetch
email = gmail.get_message(message_id)
thread = gmail.get_thread(email.thread_id)
customer = hubspot.find_contact(email.from_address)

# fuzzy step
prompt = f"""
Classify this email into one of:
- urgent
- fyi
- scheduling
- customer_issue
- vendor
- spam

Then draft a reply in this tone: concise, helpful, direct.

Email:
{email.body}

Thread summary:
{thread.last_5_messages}

Customer context:
{customer}
"""

# fuzzy model call (assumes the raw response is parsed into
# .classification and .draft fields before the next step)
result = llm.responses.create(
    model="gpt-5.4",
    input=prompt
)

# deterministic output handling
save_draft_to_gmail(result.draft)
label_thread(result.classification)
notify_human_for_review(message_id)

That's the pattern. Let the model do the text work. Let software handle the rest.

1) Inbox triage and draft replies are still king

This one is boring in the best way.

Email is where work goes to become sludge. Too many messages, too much context switching, too many replies that are 80% the same.

That is why inbox triage keeps surviving contact with reality.

Why it works

The scope is tight:

  • Pull new messages from Gmail or Outlook
  • Classify them
  • Summarize the thread
  • Draft a response
  • Let a human approve the send

If you keep approval on the final send, the risk drops a lot while the time savings stay high.

Minimal implementation

If you're using n8n or Make, this is a very standard flow:

Gmail trigger
  -> fetch thread context
  -> fetch CRM/contact context
  -> LLM classify + summarize + draft
  -> create Gmail draft
  -> Slack/Telegram approval notification

If you're doing it in code:

# poll unread messages every 2 minutes
*/2 * * * * /usr/bin/python3 /opt/agents/inbox_triage.py

And your safety checks are straightforward:

if result.classification == "spam":
    archive(email.id)
elif result.classification in ["urgent", "customer_issue"]:
    create_draft(email.id, result.draft)
    send_slack_alert(email.id)
else:
    create_draft(email.id, result.draft)

Why developers should care

This is the rare AI workflow that is:

  • easy to measure
  • easy to constrain
  • immediately useful
  • cheap to validate

You can tell in one week whether it saved time.

2) Calendar briefings are absurdly high leverage

This one surprised me.

I expected inbox triage to be useful. I didn't expect meeting briefings to be one of the most consistently loved workflows.

One Reddit user described a setup that checks Google Calendar and sends a Telegram message with the day's events. Another had a system that sends a briefing before meetings with relevant context from notes and prior follow-ups.

That is an excellent use of an LLM.

Why it works

The inputs are already structured:

  • Google Calendar
  • Notion or Obsidian notes
  • HubSpot or Salesforce records
  • Zoom transcripts
  • previous email threads

The model's job is not to act. It's to compress context.

Example briefing payload

{
  "meeting": "Vendor review with Acme Logistics",
  "starts_in": "20 minutes",
  "attendees": ["Sam Lee", "Priya Raman"],
  "last_touchpoint": "Discussed delayed shipments and SLA credits",
  "open_items": [
    "Confirm revised delivery window",
    "Review penalty terms",
    "Decide on backup carrier"
  ],
  "recommended_prep": [
    "Check last 3 shipment incidents",
    "Pull contract renewal date"
  ]
}
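Assembling a payload like that doesn't need a model at all. As a rough sketch, assuming an `event` dict loosely shaped like a Google Calendar event (the field names here are illustrative), with `last_note` and `open_items` pulled from notes or CRM beforehand:

```python
from datetime import datetime, timezone

def build_briefing(event: dict, last_note: str, open_items: list[str]) -> dict:
    """Assemble a briefing payload from already-structured inputs.
    A model is only needed upstream, if open_items must be extracted
    from raw notes; the assembly itself stays deterministic."""
    start = datetime.fromisoformat(event["start"])
    minutes = int((start - datetime.now(timezone.utc)).total_seconds() // 60)
    return {
        "meeting": event["summary"],
        "starts_in": f"{minutes} minutes",
        "attendees": [a["email"] for a in event.get("attendees", [])],
        "last_touchpoint": last_note,
        "open_items": open_items,
    }
```

Everything the model might hallucinate (times, attendees) comes straight from the calendar; the fuzzy step only ever fills in the prose fields.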

Delivery options

  • Slack DM
  • Telegram
  • iMessage bridge
  • email summary
  • calendar note update

Why this survives

Low risk, high frequency, real value.

Nobody gets fired because an AI-generated meeting brief was a little awkward. But people absolutely save time when they stop context-switching before every call.

3) Shipment tracking and ops alerts get practical fast

This is where AI automation stops sounding like a productivity toy and starts sounding like actual operations.

One of the better Reddit examples mentioned warehouse pick lists, shipment tracking, scheduling, SharePoint, and inbox workflows all tied together. That's not a toy setup. That's a real system.

If you work in ops, fulfillment, field service, or logistics-adjacent workflows, you already know the problem:

The data exists. It's just spread across carrier events, spreadsheets, ERP records, emails, and internal notes.

A good automation doesn't need to solve logistics. It just needs to notice what changed and explain what matters.

| Workflow | Why it's worth building first |
| --- | --- |
| Inbox triage + draft replies | High reliability when limited to classify, summarize, and draft. High ROI for anyone with daily email volume. Low risk if a human approves sends. |
| Calendar or meeting briefings | High reliability when sourced from Google Calendar, CRM data, notes, and transcripts. High ROI for managers, founders, sales, and client teams. Low risk because the output is informational. |
| Shipment tracking and operational alerts | Medium-high reliability when based on carrier events and internal systems. High ROI for ops teams and warehouse workflows. Medium risk if actions are automated without review. |

Good pattern

Use system events as ground truth.

For example:

shipment = carrier_api.get_status(tracking_number)
order = erp.get_order(order_id)
customer = crm.get_account(order.customer_id)

if shipment.status in ["delayed", "exception", "returned"]:
    summary = llm.responses.create(
        model="claude-opus-4.6",
        input=f"""
        Explain this shipment issue for an ops manager.
        Include likely impact, urgency, and next step.

        Shipment event: {shipment}
        Order context: {order}
        Customer tier: {customer.tier}
        """
    )

    post_to_slack(channel="#ops-alerts", text=summary.output_text)

The model is not inventing shipment state. It is translating system state into a useful alert.

That's a much safer pattern.

4) Article clipping and summarization quietly becomes a second brain

This category sounds optional until you use it for a couple weeks.

Then you don't want to lose it.

The Reddit example that stuck with me used Obsidian Web Clipper to save articles, summarize them, and build a retrievable wiki. That's not a chatbot trick. That's memory infrastructure.

Good workflow

  • Save an article from Chrome or Safari
  • Extract metadata
  • Summarize the core argument
  • Tag it by company, topic, or project
  • Store it in Obsidian, Notion, or Drive
  • Link it to related notes

Example CLI pipeline

python clip_article.py "https://example.com/long-article"
python summarize_article.py --model qwen2.5:14b --input article.md
python tag_article.py --input summary.json
python push_to_obsidian.py --vault ~/ObsidianVault
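The last step is simpler than it sounds: an Obsidian vault is just a folder of markdown files, so a hypothetical `push_to_obsidian.py` can be little more than writing a note with YAML frontmatter. A minimal sketch (the `Clippings` folder name is my assumption, not an Obsidian requirement):

```python
from pathlib import Path

def write_obsidian_note(vault: Path, title: str, summary: str,
                        tags: list[str], url: str) -> Path:
    """Write a clipped article into the vault as plain markdown.
    Obsidian indexes .md files on disk, so no API call is needed."""
    frontmatter = "\n".join(
        ["---", f'source: "{url}"', "tags:"]
        + [f"  - {t}" for t in tags]
        + ["---"]
    )
    note = vault / "Clippings" / f"{title}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text(f"{frontmatter}\n\n# {title}\n\n{summary}\n", encoding="utf-8")
    return note
```

Because it's just files, the whole pipeline stays debuggable with `cat` and `grep`.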

Why this is better than it sounds

The summary is not the main value.

The main value is that your future self can actually find and reuse what you read.

This is also a good place to use cheaper models or local inference for first-pass work.

For example:

  • Qwen or Llama for extraction and rough summaries
  • GPT-5.4 or Claude Opus 4.6 for synthesis across multiple sources

That split matters if you're running these pipelines all day.
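A sketch of that routing split, using the model names above purely as placeholders (your actual tiers and task names will differ):

```python
# first-pass work goes to a cheap/local model; cross-source
# synthesis goes to a frontier model
CHEAP_TASKS = {"extract", "summarize_single", "tag"}

def pick_model(task: str) -> str:
    """Return the model for a pipeline step based on task type."""
    return "qwen2.5:14b" if task in CHEAP_TASKS else "gpt-5.4"
```

The point is that routing is a deterministic lookup, not a judgment call the model makes about itself.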

If you're paying per token, background automations like this get expensive faster than people expect.

That's one reason predictable-cost infrastructure matters for agent workflows. If an automation is useful only when you feel safe letting it run constantly, pricing changes the architecture.

5) Product research and draft generation beats full autonomy

This is where people usually overreach.

They want: go choose the best vendor, negotiate, and place the order.

What they should probably build first: compare options, summarize tradeoffs, and generate a recommendation draft.

Those are very different risk profiles.

One of the better failure stories I ran into was a grocery-buying agent that ended up ordering 2 kg of garlic instead of 2 heads because the product page changed after working fine for months.

That is exactly why execution-heavy agents are dangerous.

Better pattern

Automate the read-heavy and draft-heavy part:

  • compare product specs across vendor pages
  • summarize reviews
  • extract pricing into a CSV
  • build a shortlist
  • draft a recommendation memo

Example output schema

{
  "vendors": [
    {
      "name": "Vendor A",
      "price": "$129/month",
      "pros": ["API access", "SOC 2", "fast onboarding"],
      "cons": ["no SSO on lower tier"]
    },
    {
      "name": "Vendor B",
      "price": "$99/month",
      "pros": ["cheaper", "better docs"],
      "cons": ["rate limits", "weak support"]
    }
  ],
  "recommendation": "Vendor A for production use, Vendor B for internal prototypes"
}
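Since a model produces that JSON, it's worth a deterministic validation step before anything reaches a human. A minimal stdlib-only sketch, assuming the field names from the schema above:

```python
import json

REQUIRED_VENDOR_KEYS = {"name", "price", "pros", "cons"}

def parse_recommendation(raw: str) -> dict:
    """Parse and validate the model's JSON output before surfacing it.
    Raises ValueError rather than letting a malformed draft through."""
    data = json.loads(raw)
    vendors = data.get("vendors")
    if not isinstance(vendors, list) or not vendors:
        raise ValueError("missing vendors list")
    for v in vendors:
        missing = REQUIRED_VENDOR_KEYS - v.keys()
        if missing:
            raise ValueError(f"vendor missing fields: {missing}")
    if not isinstance(data.get("recommendation"), str):
        raise ValueError("missing recommendation")
    return data
```

A failed parse goes back for a retry or into a dead-letter queue; it never lands in someone's inbox as a half-formed memo.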

Then let a human make the decision.

That is not less ambitious. It's just less reckless.

Why flashy agents keep disappointing people

Because the failure mode is usually not dramatic at first.

It kind of works.

Then it fails on an edge case.

Then you spend hours debugging prompts, browser state, selectors, retries, and context windows.

Then you realize the thing is also expensive.

A couple of OpenClaw threads were pretty blunt about this. One user reported burning a huge amount on Opus tokens for software upgrades, bug fixes, server management, and form filling. Another described months of effort, thousands of hours, and billions of tokens before deciding the setup was too fragile for serious work.

That doesn't mean ambitious automation is fake.

It means the cost of being wrong gets ugly fast.

And cost matters more than people like to admit.

If you're running always-on workflows, pricing affects design:

  • whether you can afford retries
  • whether you can batch requests
  • whether you can route simple work to cheaper models
  • whether you feel safe running background jobs 24/7
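One way to make the retry question concrete is to put the budget in code instead of hoping the loop terminates. A small sketch:

```python
import time

def call_with_budget(fn, max_attempts=3, base_delay=1.0):
    """Retry a flaky call with exponential backoff, but stop at a hard
    attempt cap so a stuck job can't silently burn tokens all night."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Whether `max_attempts` is 1 or 5 is an economic decision as much as a reliability one.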

That is exactly why I think per-token pricing pushes people toward under-automation. They start optimizing around fear instead of utility.

If you're building agents for real work, predictable cost is not a nice-to-have. It's part of reliability.

That's also why services like Standard Compute are interesting for this category of workload. If you're already using OpenAI-compatible SDKs, you can swap the endpoint and run the same automations with flat monthly pricing instead of watching token spend every time a cron job fires.

For Make, n8n, Zapier, OpenClaw, or custom Python workers, that changes the economics of background AI tasks a lot.

If you build one thing this week, build the thing you already repeat

Not the coolest thing.

The most repeated thing.

The task you do at least five times a week and hate every time.

  • If it's email, build inbox triage.
  • If it's meetings, build calendar briefings.
  • If it's ops, build shipment alerts.
  • If it's reading, build article capture into Obsidian or Notion.
  • If it's vendor comparison, build research + draft output.

And keep the stack boring.

A cron job. A webhook. A queue. A spreadsheet. A few prompts. Maybe OpenClaw if you want agent orchestration. Maybe Make or n8n if you want visual flows. Maybe Ollama if you want local inference.

That's enough.

Basic sanity checks

If you're experimenting with local or agent setups, these are still useful:

# OpenClaw logs
openclaw logs --follow

# local model inventory
ollama list

# local inference health check
curl http://localhost:11434/

And for a boring but effective worker process:

# run every 5 minutes
*/5 * * * * /usr/bin/bash /opt/automations/run_pipeline.sh >> /var/log/automations.log 2>&1

My actual take

The people getting real value from AI automation are usually not the ones posting the wildest demos.

They're the ones who quietly removed three annoying tasks from every day.

That sounds less impressive.

It is also what actually ships.

If your automation can survive a normal Wednesday, you're onto something.

If it only works in a demo, you built a demo.
