Most AI coding advice still assumes a human is sitting inside an IDE waiting for autocomplete. That is useful, but it is not the most powerful workflow.
The real unlock is a small coding agent that can run on a VPS, inspect a repository, make a plan, edit files, run tests, summarize failures, and keep working while you are away.
This article shows a practical setup you can build without enterprise infrastructure.
## The goal
The goal is not to create a magical senior engineer. The goal is to automate the boring middle of development:
- create boilerplate
- refactor repetitive files
- write first-pass tests
- update documentation
- scan logs
- open small pull requests
- summarize what changed
If the task requires product judgment, architecture tradeoffs, or security approval, the agent should stop and ask. If the task is mechanical, it should execute.
## Minimal architecture
A useful 24/7 coding agent needs five parts:
- A task queue: where work items live
- A planner: turns a request into steps
- A tool layer: file edits, shell commands, git, test runners
- A memory layer: project notes, conventions, previous failures
- A review gate: prevents unsafe deploys
You can run all of this on a small VPS. The expensive part is usually not compute; it is token usage. That means the design should minimize unnecessary context.
## Recommended stack
For a lean version, I would use:
- Python for orchestration
- GitHub Issues or a local SQLite table as the task queue (see the sketch after this list)
- Docker for isolated execution
- ripgrep for code search
- pytest, npm test, or your normal test command
- one strong model for planning
- one cheaper model for summaries and repetitive transformations
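To make the queue concrete, here is a minimal sketch of the SQLite variant. The table schema (id, repo, request, status) and the helper name are assumptions for illustration, not a fixed API:

```python
# Minimal SQLite task queue. The schema (id, repo, request, status)
# is an illustrative assumption -- adapt it to whatever fields you track.
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    id: int
    repo: str
    request: str

def get_next_task(db_path: str = "agent/tasks.db") -> Optional[Task]:
    """Fetch the oldest queued task and mark it in progress."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT id, repo, request FROM tasks "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE tasks SET status = 'in_progress' WHERE id = ?", (row[0],)
        )
        conn.commit()
        return Task(*row)
    finally:
        conn.close()
```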
A simple folder structure works:
```
agent/
  tasks.db
  main.py
  tools/
    shell.py
    files.py
    git.py
  memory/
    project_rules.md
    failure_log.md
  workspaces/
```
The agent loop is straightforward:
```python
from time import sleep

while True:
    task = get_next_task()
    if not task:
        sleep(60)
        continue

    repo = checkout_repo(task.repo)
    context = collect_relevant_context(repo, task)
    plan = ask_model_for_plan(task, context)

    for step in plan.steps:
        result = execute_step(step)
        if result.failed:
            fix = ask_model_to_debug(step, result.logs)
            execute_step(fix)

    run_tests()
    create_summary()
    open_pull_request_or_request_review()
```
This is not glamorous, but it works.
## The most important rule: give the agent less context
A common mistake is dumping the whole repository into the prompt. That is slow, expensive, and often worse.
Instead, build context in layers:
- project rules
- relevant file tree
- files found by search
- failing test output
- exact task request
For example, if the task is “add CSV export to invoices,” the agent probably needs the invoice model, invoice routes, export utilities, tests, and coding rules. It does not need your entire frontend.
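A sketch of that layered collection, assuming ripgrep is on the PATH and cached rules live in memory/project_rules.md. The helper name matches the loop above, though here it takes a plain search term (e.g. "invoice") instead of a task object, and the file cap is an arbitrary starting point:

```python
# Layered context: project rules first, then only the files that
# actually match the task, found with ripgrep (--files-with-matches
# lists matching paths without printing their contents).
import subprocess
from pathlib import Path

def collect_relevant_context(repo: Path, query: str, max_files: int = 5) -> str:
    rules = Path("memory/project_rules.md").read_text()
    hits = subprocess.run(
        ["rg", "--files-with-matches", query, str(repo)],
        capture_output=True, text=True,
    ).stdout.splitlines()[:max_files]
    sources = "\n\n".join(f"# {path}\n{Path(path).read_text()}" for path in hits)
    return f"Project rules:\n{rules}\n\nRelevant files:\n{sources}"
```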
## Safety gates I recommend
A coding agent with shell access needs boundaries. I use rules like:
- no production credentials in the workspace
- no destructive shell commands unless explicitly approved
- no direct deploys without review
- always run tests before creating a PR
- summarize every file changed
- if tests fail twice, stop and ask for help
The best agent is not the one that never fails. It is the one that fails safely.
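To make the "no destructive shell commands" rule concrete, here is a minimal sketch of a command gate. The blocklist is illustrative and deliberately incomplete; Docker isolation is still the real safety net:

```python
# Reject shell commands whose first token is on a blocklist and
# escalate to a human instead of executing them.
import shlex

BLOCKED = {"rm", "dd", "mkfs", "shutdown", "curl", "wget"}

def execute_step_safely(command: str) -> None:
    tokens = shlex.split(command)
    if not tokens or tokens[0] in BLOCKED:
        raise PermissionError(f"Needs explicit approval: {command!r}")
    # ... run the approved command inside the Docker workspace here
```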
## High-ROI tasks to automate first
Start with tasks where correctness is easy to verify:
1. Test generation
Give the agent one module and ask it to create missing tests. This is great because the test runner provides a clear signal.
2. Documentation updates
Ask it to update README sections, API examples, changelogs, or migration notes after a change.
3. Mechanical refactors
Renaming functions, updating imports, replacing deprecated APIs, and applying formatting rules are perfect agent tasks.
4. Bug reproduction
Give the agent an error message and ask it to write a failing test before attempting a fix.
5. Dependency maintenance
The agent can inspect outdated packages, read changelogs, update versions, run tests, and report risks.
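For a Python project, the first step of that dependency check can be this small. The pip flag and JSON fields shown here are real; what the agent does with the report is up to your loop:

```python
# Ask pip for its JSON report of outdated packages and print the deltas.
import json
import subprocess

report = subprocess.run(
    ["pip", "list", "--outdated", "--format=json"],
    capture_output=True, text=True, check=True,
).stdout

for pkg in json.loads(report):
    print(f"{pkg['name']}: {pkg['version']} -> {pkg['latest_version']}")
```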
## A simple prompt template
Use a consistent system prompt:
```
You are a cautious coding agent working inside a disposable repo checkout.
Your job is to complete the task with the smallest safe change.
Before editing, inspect relevant files.
After editing, run tests.
If you are uncertain or need secrets, stop and ask.
Return a summary with files changed, commands run, and remaining risks.
```
Then pass task-specific context:
```
Task: Add CSV export for invoice list.
Repo rules: Use existing service pattern. No new dependency without approval.
Relevant files: ...
Test command: pytest tests/invoices
```
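Assembling the two halves is plain string work. A sketch, assuming the common chat-messages shape and a hypothetical memory/system_prompt.md file holding the template above:

```python
# Combine the fixed system prompt with task-specific context.
from pathlib import Path

SYSTEM_PROMPT = Path("memory/system_prompt.md").read_text()

def build_messages(task: str, rules: str, files: str, test_cmd: str) -> list[dict]:
    user = (
        f"Task: {task}\n"
        f"Repo rules: {rules}\n"
        f"Relevant files: {files}\n"
        f"Test command: {test_cmd}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]
```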
## Cost control
To keep the monthly bill low:
- use a cheaper model for log summaries
- cache project rules
- cap each task at a maximum number of model calls
- truncate logs aggressively
- run small tasks instead of giant tasks
- store previous solutions in memory
In practice, many useful tasks cost cents, not dollars, if the agent retrieves only relevant context.
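Two of those controls fit in a few lines. A sketch; the limits are arbitrary starting points to tune against your own bill:

```python
# Hard cap on model calls per task, plus tail-biased log truncation.
MAX_CALLS_PER_TASK = 12
MAX_LOG_CHARS = 4000

class CallBudget:
    def __init__(self, limit: int = MAX_CALLS_PER_TASK):
        self.remaining = limit

    def spend(self) -> None:
        """Call this before every model request; raises when the cap is hit."""
        if self.remaining <= 0:
            raise RuntimeError("Model-call budget exhausted; stopping task.")
        self.remaining -= 1

def truncate_logs(logs: str, limit: int = MAX_LOG_CHARS) -> str:
    """Keep the tail of the logs, where the failure usually is."""
    return logs if len(logs) <= limit else "...[truncated]...\n" + logs[-limit:]
```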
## Final thought
A 24/7 coding agent is not a replacement for engineering judgment. It is a tireless junior teammate for repetitive work. The winning workflow is human direction plus agent execution plus automated tests.
Start small: one repository, one task type, one review gate. Once that loop is reliable, add more tools.