Lokesh Mure

Posted on • Originally published at autonomi.dev

Loki Mode at 20K developers: 15 releases in 4 days, and what we learned about verified vs live autonomous coding

I was halfway through a coffee when our self-update telemetry ticked over 20,000 unique developers. Six months ago, Loki Mode was a side project to scratch a personal itch. I wanted an autonomous coding agent I would actually trust to ship a diff into my own repo. Not a Replit-style cloud sandbox. Not a Lovable-style preview. Not a Cursor-style editor pane. A loop I could leave running overnight, walk back to in the morning, and trust the result on the git diff.

This week we shipped 15 minor releases in 4 days, and I think we finally landed the thing.

This post is the honest engineering writeup. Architecture, a real comparison table with check marks, a hands-on walkthrough using a real spec (not the training-wheels quickstart), the issue-to-merged-PR workflow with screenshots, and the parts we got wrong on the way. If you skim, the comparison table in section 4 is where the punchline lives.

20,000 developers, 6,000 weekly active, 500,000 CLI sessions across Norway, USA, Hong Kong, UK and India

1. The problem we set out to solve

When Replit Agent and Lovable started landing late last year, every founder I know lit up. Spec to deployed app in 90 seconds. Public preview URL. Done.

But a quiet thing kept happening on my own builds. The preview would load. The Cmd+R refresh would show "Welcome to your todo app." I would click "add todo," type something, hit submit, and watch the request post to a function that swallowed errors silently. The agent had marked the run "complete." The preview showed something running. The preview was lying.

The thing I wanted was simple: an autonomous loop that refuses to call work done on an empty diff or a failing test. A real gate. The same gate I would want around a junior engineer's first solo PR. Not "this looks running" but "this would pass code review."

That is what Loki Mode is. I built it locally first, released the source, and the user base found it.

2. The numbers, honest version

From PostHog (anonymous, opt-out via LOKI_TELEMETRY_DISABLED=true or DO_NOT_TRACK=1, never captures prompts or PRD content or source code):

| Metric | Value |
| --- | --- |
| Cumulative developers installed | 20,000+ |
| Weekly active developers | ~6,000 |
| Cumulative CLI sessions | 500,000+ |
| Top country (absolute count) | United States |
| Top country (per-capita) | Norway |
| Trending up fast (last 30 days) | Hong Kong, United Kingdom, India |
| Long-tail markets | Germany, Brazil, Singapore, Australia |
| Top install channel | Bun (51%), npm (38%), Homebrew (7%), Docker (4%) |
| Median time-to-first-verified-build | ~47 minutes from loki start ./prd.md |

What is honest: this is one person (me) maintaining the project. Two open GitHub issues right now. Bus factor of one. The trade is fast release cadence, fast PR review, and a maintainer who replies on Discord within hours.

What the data tells me: the developers pulled hardest to Loki are the ones who would not let a hosted spec-to-app tool touch their repo at all. Self-hosted, your-keys, no proxy. That market is bigger than the hosted-tool TAM and BSL-1.1 source-available was built to serve exactly it.

3. How the architecture actually works

Skip this if you only care about the product comparison. Read it if you want to know what is under the hood before you trust the thing with your repo.

Inputs (the "spec" surface)

A spec is any of:

  • A PRD markdown file: ./prd.md
  • A GitHub, GitLab, Jira, or Azure DevOps issue (URL or shorthand). GitHub accepts 123, #123, owner/repo#123, and full issue URLs. GitLab takes gitlab.com/owner/repo/-/issues/N. Jira takes PROJ-123 or the Atlassian URL. Azure DevOps takes the work-item URL.
  • An OpenAPI document (YAML or JSON)
  • An OpenSpec change directory: ./openspec/changes/feature-x/
  • A plain text or YAML one-liner: loki start "build a markdown editor with file sync"

The CLI auto-detects the kind of spec from the file extension or URL pattern, and treats anything that matches neither as a plain-text prompt. The unified entry point is loki start; loki run <issue-ref> still works as a deprecated alias.

The execution loop: RARV-C

Every run iterates through a five-phase closure loop, with the model tier rotating per phase:

| Phase | Job | Default model tier |
| --- | --- | --- |
| Reason | Architectural decisions, task decomposition, planning | Planning (Claude Opus by default) |
| Act | Code generation, file writes, tool use | Development (Opus or Sonnet) |
| Reflect | Self-critique, 3-reviewer blind council vote on the current diff | Development |
| Verify | Automated quality gates (tests, lint, security, coverage, held-out evals) | Fast (Sonnet or Haiku) |
| Compound | Episodic memory write, learning extraction for future runs | Fast |

The C is what makes the loop compound across sessions. After every iteration, the agent records what it tried, what worked, what failed, and which files were touched into .loki/memory/episodic/. Future runs in the same project (or sibling projects, when cross-project memory is on) get the agent's accumulated context as a "PAST FAILURES TO AVOID" block in the next prompt.
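To make the Compound step concrete, here is a minimal sketch of how recorded episodes could be folded into a "PAST FAILURES TO AVOID" block. The episode fields (`tried`, `outcome`, `files`) and the `limit` parameter are assumptions for illustration, not the actual schema under .loki/memory/episodic/.

```python
from typing import Any, Dict, List


def past_failures_block(episodes: List[Dict[str, Any]], limit: int = 5) -> str:
    """Render the most recent failed episodes as a prompt block.

    The episode shape here is hypothetical: {"tried", "outcome", "files"}.
    Returns an empty string when there is nothing to warn about.
    """
    failures = [ep for ep in episodes if ep.get("outcome") == "failed"]
    if not failures:
        return ""
    lines = ["PAST FAILURES TO AVOID"]
    for ep in failures[-limit:]:  # keep only the most recent `limit` failures
        files = ", ".join(ep.get("files", []))
        lines.append(f"- {ep['tried']} (files: {files})")
    return "\n".join(lines)
```

Keeping the block empty when there are no failures avoids padding the next prompt with a header that carries no information.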

The trust layer

Loki refuses to call a run done on:

  1. An empty diff against the run-start commit. Always blocks.
  2. A red test run when a test runner was detected and ran. Always blocks.
  3. A failing held-out spec eval (section 9 walks this through). Always blocks.
  4. A council REJECT verdict from the 3-reviewer blind review. Always blocks.

Every gate writes machine-readable evidence to .loki/verify/evidence.json and a human-readable report to .loki/verify/report.md. Every completed run also emits a portable .loki/proofs/<run_id>/proof.json + index.html you can hand to a teammate, an auditor, or a PR reviewer.
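The four blocking conditions can be read as a pure function over a run summary. The sketch below assumes a hypothetical `RunSummary` record; the field names are illustrative and are not Loki's evidence schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunSummary:
    """Hypothetical summary of one run; field names are illustrative only."""
    files_changed: int             # size of the diff against the run-start commit
    tests_ran: bool                # a test runner was detected and actually ran
    tests_passed: Optional[bool]   # None when no tests ran
    heldout_failures: int          # number of failing held-out spec items
    council_verdict: str           # "APPROVE" or "REJECT"


def completion_blockers(run: RunSummary) -> List[str]:
    """Return every reason the run may not be called done (empty list = may complete)."""
    blockers = []
    if run.files_changed == 0:
        blockers.append("empty diff")
    if run.tests_ran and run.tests_passed is False:
        blockers.append("red test run")
    if run.heldout_failures > 0:
        blockers.append("failing held-out eval")
    if run.council_verdict == "REJECT":
        blockers.append("council REJECT")
    return blockers
```

Returning the full list rather than stopping at the first hit matches the idea of evidence: a reviewer sees every gate that fired, not just one.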

Provider routing

Loki dispatches to one of these underlying agent CLIs:

| Provider | Tier | Notes |
| --- | --- | --- |
| Claude Code | Tier 1, E2E-verified primary | Default. Deepest SDK integration. |
| OpenAI Codex | Tier 2, Experimental | Works end-to-end. |
| Cline | Tier 2, Experimental | Via the -y autonomous flag. |
| Aider | Tier 3, Experimental | Best for narrower file-edit tasks. |

Plus, via ANTHROPIC_BASE_URL, any LLM that speaks the Anthropic API:

# Route the Claude provider through local Ollama (qwen2.5-coder:32b)
export ANTHROPIC_BASE_URL=http://localhost:11434/v1
export ANTHROPIC_API_KEY=ollama
export LOKI_MODEL_OVERRIDE=qwen2.5-coder:32b
loki start ./prd.md

LOKI_MODEL_OVERRIDE only takes effect when ANTHROPIC_BASE_URL is also set, so you can never accidentally reroute an Anthropic-native run.
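That guard is small enough to sketch. The function name `resolve_model` and the `default_model` argument are hypothetical; only the two environment variable names come from the setup above.

```python
from typing import Mapping


def resolve_model(env: Mapping[str, str], default_model: str) -> str:
    """Pick the model to dispatch.

    The override is honored only when a custom base URL is also set,
    so an Anthropic-native run keeps its default model.
    """
    base_url = env.get("ANTHROPIC_BASE_URL", "").strip()
    override = env.get("LOKI_MODEL_OVERRIDE", "").strip()
    if base_url and override:
        return override
    return default_model
```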

Full multi-provider setup: autonomi.dev/docs/multi-provider-setup.

4. The comparison table, with check marks

This is the part the README will not tell you because it is hard to write honestly. Here is what each tool is good at, and where each tool is the right pick.

Capability Replit Agent Lovable Cursor Loki Mode
Instant cloud sandbox + URL
Multiplayer collaboration ⚠️
Marketing-page / landing-page taste ⚠️ ⚠️
Editor integration (inline) ⚠️
Runs locally (no cloud upload)
Your own provider keys (no proxy) ⚠️
Fullstack with backend + database ⚠️ ⚠️ ⚠️
Compose-first multi-service (healthchecks)
Background runs (delegate + notify)
CI-gateable verification (exit 0/1/2)
Held-out spec evals (anti-reward-hacking)
Reviewer subcalls cannot mutate code
Machine-readable evidence per run ⚠️
Shareable proof-of-run artifact
Issue-to-PR autonomous workflow
Provider-agnostic (4 + any LLM) ⚠️
Source-available license ✅ (BSL 1.1, → Apache 2.0 in 2030)
Air-gappable
Mobile browser editor

Pick Replit if the goal is to learn, demo, teach, or prototype-with-a-URL fast. They earned that lane honestly.

Pick Lovable if the spec is mostly visual and what you ship is a landing page or design-heavy frontend. The taste of their output is genuinely ahead for that work.

Pick Cursor if you want AI assistance inside the editor where you already work. The Composer + Tab autocomplete are well-tuned to existing muscle memory.

Pick Loki Mode if you are shipping into an existing private codebase, you need a deterministic CI gate on the diff before merge, your provider keys cannot leave your machine, or you want a council that physically cannot edit the code it is reviewing.

These tools can coexist. Use Cursor while you write the spec. Use Replit when teaching the team. Use Lovable for the marketing site for whatever Loki is building. The case for picking Loki Mode is specifically "verified diff before merge into my repo."

5. Workflow 1: a real PRD to a running fullstack app

The shortest path that matters. Drop a markdown spec, get a verified Git repo with a running app.

# Install (Bun recommended; v8 will be Bun-only)
bun install -g loki-mode

# Verify the install
loki version
loki doctor

Write a real spec. The agent reads it as markdown, so be explicit about acceptance criteria. Here is the one I used to test the v7.26.0 compose-first support:

# TaskFlow

A task tracker with user auth, full-text search, and tags.

## Stack
- Backend: Node + Express + Postgres
- Cache + sessions: Redis
- Frontend: React + Vite, served by the same backend

## Acceptance criteria
- POST /api/auth/register creates a user (bcrypt hashed password)
- POST /api/auth/login returns a session cookie
- GET /api/tasks returns the logged-in user's tasks
- POST /api/tasks creates a task with title, body, tags[], due_date
- PATCH /api/tasks/:id updates a task
- DELETE /api/tasks/:id soft-deletes a task
- GET /api/search?q=... returns tasks matching the query (Postgres FTS)
- 401 when the session is missing or expired
- All endpoints return JSON; 422 on invalid input

## Run
- One command: docker compose up
- Healthcheck on the web service must reflect actual readiness

Save that as ./prd.md and run:

loki start ./prd.md

What you will see:

  1. Plan auto-shown. Before the agent does anything, Loki prints a complexity tier, cost estimate, iteration cap, and time estimate. The estimate is real -- it uses the actual model pricing table. Declining costs nothing.
  2. Dashboard auto-opens. A new browser tab opens at http://localhost:57374. (Skipped on CI, with --detach/--background, over SSH without a TTY, with piped stdin, or with LOKI_NO_AUTO_OPEN=1.)
  3. The agent starts iterating. Reason, Act, Reflect, Verify, Compound.
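The skip conditions for the dashboard auto-open in step 2 amount to a small predicate. This is a sketch under assumptions: the function and its parameters are hypothetical, and detecting SSH through the `SSH_CONNECTION` / `SSH_TTY` variables is my guess at one reasonable approach, not a statement of how Loki does it.

```python
from typing import Mapping


def should_auto_open(env: Mapping[str, str], detached: bool,
                     stdin_is_tty: bool, stdout_is_tty: bool) -> bool:
    """Return True when opening a browser tab is likely welcome."""
    if env.get("LOKI_NO_AUTO_OPEN") == "1":
        return False  # explicit opt-out
    if env.get("CI", "").lower() == "true":
        return False  # CI runners have no browser
    if detached:
        return False  # --detach / --background runs
    if not stdin_is_tty:
        return False  # piped stdin
    # Assumption: OpenSSH sets these variables for remote sessions.
    over_ssh = bool(env.get("SSH_CONNECTION") or env.get("SSH_TTY"))
    if over_ssh and not stdout_is_tty:
        return False  # SSH without a TTY
    return True
```

In a real CLI the two TTY flags would come from `sys.stdin.isatty()` and `sys.stdout.isatty()`; passing them in keeps the predicate easy to test.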

In the dashboard, the left sidebar shows your project. The main area has tabs for Overview, Tasks, RARV Timeline, Quality, Cost, and Live App.

The Live App tab is the workflow change that pulled us past 20K. Before v7.24, you had to cd to the project, run the dev server in another terminal, and pray it talked to the right port. Now the agent is still writing files and the app is already running in the iframe. You click "Add Task," type something, hit submit. You watch the bug get fixed in real time over the next iteration.

Live App Preview iframe showing the in-progress app embedded inside the dashboard. The agent is still writing iteration 6 (FTS index and search route) while the app is already running and serving the search query

For multi-service specs (like the one above), v7.26.0 ships compose-first detection. The agent gets a RUN_CONTRACT instruction telling it to generate a 12-factor docker-compose.yml with a clearly-named primary web service (either web/app by name, or labeled loki.primary=true), healthchecks on every service, depends_on wiring, env-var config, and a committed .env.example. The runner identifies the primary web service by that label and surfaces THAT in the iframe rather than accidentally surfacing a Postgres port.

How Loki turns a 3-line spec into a verified running compose stack
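A minimal sketch of primary-service selection over an already-parsed compose file follows. The function names are hypothetical, and the precedence (explicit `loki.primary=true` label first, then the `web` / `app` names) is an assumption on my part. Compose permits labels as either a mapping or a list of `key=value` strings, so the sketch handles both.

```python
from typing import Any, Dict, Optional


def _has_primary_label(service: Dict[str, Any]) -> bool:
    """Check for loki.primary=true in either Compose label syntax."""
    labels = service.get("labels") or {}
    if isinstance(labels, dict):
        return str(labels.get("loki.primary", "")).lower() == "true"
    return any(str(item).strip().lower() == "loki.primary=true" for item in labels)


def pick_primary_service(compose: Dict[str, Any]) -> Optional[str]:
    """Return the service name to surface in the preview, or None."""
    services = compose.get("services") or {}
    # 1. An explicit label wins (assumed precedence).
    for name, svc in services.items():
        if _has_primary_label(svc or {}):
            return name
    # 2. Fall back to the conventional names.
    for name in ("web", "app"):
        if name in services:
            return name
    return None
```

Returning None instead of guessing is what keeps a database port out of the iframe when nothing is clearly marked.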

Behind the scenes the council reviews each iteration. The Council tab shows the 3-reviewer blind verdicts with the evidence each reviewer raised (not just APPROVED/REJECTED badges):

A real council verdict from iteration 6: Reviewer 1 APPROVES with cited line numbers, Reviewer 2 raises a CONCERN about bcrypt rounds being hardcoded to 10 when the spec required >=12, and Reviewer 3, the devil's advocate, APPROVES after running 5 adversarial attacks

The Cost tab tracks per-iteration spend. v7.11.0 added a pre-cap warning at 80% (not just the existing hard stop at 100%); the warning broadcasts over WebSocket so a persistent amber banner appears on every dashboard page if you walked away from the terminal:

Cost panel showing $1.60 of $2.00 spent at iteration 6 with the persistent 80% warning banner across the top, a per-iteration breakdown table with the council subcall just pushing the total over the warn threshold, and the cost-honesty contract showing the estimator quote and dispatched model agree
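The two thresholds reduce to a three-state check. This is a sketch: the `budget_state` name and its return strings are hypothetical; only the 80% warning and 100% hard stop come from the description above.

```python
def budget_state(spent_usd: float, cap_usd: float, warn_ratio: float = 0.80) -> str:
    """Classify spend against the cap as "ok", "warn", or "stop"."""
    if cap_usd <= 0:
        raise ValueError("cap_usd must be positive")
    if spent_usd >= cap_usd:
        return "stop"   # hard stop at 100% of the cap
    if spent_usd >= warn_ratio * cap_usd:
        return "warn"   # pre-cap warning at 80% by default
    return "ok"
```

With the figures from the screenshot, `budget_state(1.60, 2.00)` lands in the warning state. A production version would probably track integer cents to sidestep floating-point edge cases at the boundary.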

When the run completes, your project directory contains a working app:

docker compose up           # the stack
curl http://localhost:3000/health   # healthy
npm test                    # 47/47 passing

And a portable proof:

loki proof list             # all proofs for this project
loki proof show <run-id>    # render the HTML in the terminal
loki proof open <run-id>    # open in your browser
loki proof share <run-id>   # publish as a GitHub gist (after redaction preview + confirm)

The proof leads with the itemized bill (cost USD, tokens, per-model breakdown), then files-changed with the diffstat, then per-reviewer council verdicts with evidence, then quality gates, then wall-clock, provider/model, plus an integrity hash. A single chokepoint at autonomy/lib/proof_redact.py runs once before serialization and refuses to emit if it did not run. It scrubs Anthropic/OpenAI/Google/GitHub/AWS/Slack keys, Bearer tokens, JWTs, PEM private-key blocks, named secret assignments, DB URI credentials, and absolute user paths from both the JSON and the rendered HTML.

6. Workflow 2: GitHub issue to merged PR, hands-free

The thing that gets us past "demo tool" and into "production engineering tool." Hand Loki a real issue from your tracker and walk away.

Loki issue-to-PR flow: fetch -> isolate -> RARV-C build -> verify -> ship

# Issue-driven, foreground
loki start owner/repo#123

# Issue-driven, background, auto-PR + auto-merge when verified
loki start 123 --ship --bg

What each flag does (the cascade is documented in loki start --help):

| Flag | Behavior |
| --- | --- |
| --worktree, -w | Git worktree isolation. Branch: loki/issue-<n>. Working tree never touched. |
| --pr | Implies --worktree. Auto-creates a PR via gh pr create when the run verifies. |
| --ship | Implies --pr. Auto-merges via gh pr merge once the PR's CI passes. |
| --bg, --detach, -d | Background mode. Implies --worktree. Local desktop notification on completion (v7.22.0). |
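The "implies" cascade is a transitive closure over a small graph. A sketch, with the `IMPLIES` and `ALIASES` tables transcribed from the flag descriptions and the `expand_flags` helper being hypothetical:

```python
from typing import Iterable, Set

# Each flag maps to the flags it implies, per the descriptions above.
IMPLIES = {
    "--ship": {"--pr"},
    "--pr": {"--worktree"},
    "--bg": {"--worktree"},
}
# Short and alternate spellings normalized to one canonical flag.
ALIASES = {"-w": "--worktree", "--detach": "--bg", "-d": "--bg"}


def expand_flags(flags: Iterable[str]) -> Set[str]:
    """Return the given flags plus everything they transitively imply."""
    pending = [ALIASES.get(flag, flag) for flag in flags]
    resolved: Set[str] = set()
    while pending:
        flag = pending.pop()
        if flag in resolved:
            continue
        resolved.add(flag)
        pending.extend(IMPLIES.get(flag, ()))
    return resolved
```

So `--ship` alone is enough to get worktree isolation and a PR; there is no need to spell out the whole chain.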

Supported issue refs (auto-detected):

  • GitHub: 123, #123, owner/repo#123, full issue URL
  • GitLab: https://gitlab.com/owner/repo/-/issues/42
  • Jira: PROJ-123, https://org.atlassian.net/browse/PROJ-123
  • Azure DevOps: https://dev.azure.com/org/project/_workitems/edit/456
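Auto-detection over these shapes can be sketched with a handful of regular expressions. The patterns are my own approximation of the formats listed above, and `classify_issue_ref` is a hypothetical helper, not the CLI's actual parser.

```python
import re
from typing import Optional

# (tracker, pattern) pairs, tried in order. Patterns are illustrative.
_PATTERNS = [
    ("github", re.compile(r"^#?\d+$")),
    ("github", re.compile(r"^[\w.-]+/[\w.-]+#\d+$")),
    ("github", re.compile(r"^https://github\.com/[^/]+/[^/]+/issues/\d+$")),
    ("gitlab", re.compile(r"^https://gitlab\.com/.+/-/issues/\d+$")),
    ("jira", re.compile(r"^[A-Z][A-Z0-9]*-\d+$")),
    ("jira", re.compile(r"^https://[^/]+\.atlassian\.net/browse/[A-Z][A-Z0-9]*-\d+$")),
    ("azure_devops", re.compile(r"^https://dev\.azure\.com/[^/]+/[^/]+/_workitems/edit/\d+$")),
]


def classify_issue_ref(ref: str) -> Optional[str]:
    """Return the tracker name for an issue reference, or None if unrecognized."""
    ref = ref.strip()
    for tracker, pattern in _PATTERNS:
        if pattern.match(ref):
            return tracker
    return None
```

A bare number is ambiguous in principle; this sketch follows the list above and reads it as a GitHub issue in the current repository.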

When you delegate with --bg, v7.22.0's "delegate then notify" writes a durable completion summary to .loki/COMPLETION.txt and .loki/state/completion.json and fires a local OS notification (macOS osascript, Linux notify-send). Every terminal state notifies and records a summary -- success, max-iterations, stopped, failed, genuinely-blocking pauses. The perpetual-mode auto-clear pause is correctly NOT treated as terminal, so a mid-run pause never produces a false "done" record. Zero network egress.

Opt-in LOKI_DELEGATE_BRANCH=1 isolates a run on a dedicated loki/delegate-<timestamp> branch. Opt-in LOKI_DELEGATE_PR=1 opens a pull request on completion through a gh call made from your own machine, never from CI.

7. Workflow 3: gate the diff in CI before it merges

This is the third workflow, and the one I think actually moves the needle on enterprise adoption.

loki verify is a standalone verification module that does NOT enter the autonomous loop. It is the deterministic gate. Five checks scoped to the diff:

loki verify pipeline: five deterministic checks merge into one verdict and one evidence document

Run it locally:

# Verify against the default base ref
loki verify

# Or against a specific ref
loki verify origin/main

# Or for CI as machine-readable JSON
loki verify origin/main --output-json > verify-result.json

Real output from a run on a 14-file diff:

loki verify (run id: a7c2-...)
=============================

Diff base:            merge-base(origin/main, HEAD)..HEAD
Files changed:        14
Lines added:          892
Lines removed:        47

Build         pass    (12.4s)
Tests         pass    47/47 passing  (1.8s)
Static        pass    eslint clean, tsc strict ok  (3.1s)
Secrets       pass    no secrets in diff
Dependencies  pass    no critical CVE in changed packages
Held-out      pass    5 of 5 reserved spec items satisfied

Verdict:      VERIFIED  (exit 0)
Evidence:     .loki/verify/evidence.json
Report:       .loki/verify/report.md

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | VERIFIED -- all checks pass with conclusive evidence |
| 1 | CONCERNS -- inconclusive evidence, empty diff, or non-blocking warnings |
| 2 | BLOCKED -- red test, secret leak, critical CVE, failing held-out item |
| 3 | Verifier error (could not complete; never silently passes) |

The diff base resolution is merge-base(base, HEAD)..HEAD -- proper PR semantics, not HEAD~1. Inconclusive evidence is never reported VERIFIED. Empty diffs yield CONCERNS, not green. Bare root-level test files are detected so discoverable tests are never silently skipped.
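One way to picture how per-check results collapse into a single verdict and exit code is the sketch below. The status strings (`pass`, `warn`, `inconclusive`, `fail`, `error`) and the precedence of error over fail are assumptions for illustration; the verdict names and exit codes are the ones listed above.

```python
from typing import Dict, Tuple


def verdict(checks: Dict[str, str], files_changed: int) -> Tuple[str, int]:
    """Collapse check statuses into (verdict, exit_code).

    Hypothetical statuses: "pass", "warn", "inconclusive", "fail", "error".
    """
    statuses = set(checks.values())
    if "error" in statuses:
        return ("ERROR", 3)      # verifier could not complete; never a silent pass
    if "fail" in statuses:
        return ("BLOCKED", 2)
    # Empty diffs, missing evidence, and warnings are never reported as green.
    if files_changed == 0 or not checks or statuses & {"warn", "inconclusive"}:
        return ("CONCERNS", 1)
    return ("VERIFIED", 0)
```

The ordering matters: VERIFIED is the fall-through case, so anything the function does not positively recognize as clean ends up at CONCERNS or worse.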

Important scope note: the v7.27.0 MVP is deterministic-only. No LLM in the gate path. The LLM did its work upstream in the RARV-C build loop. A single-reviewer LLM stage and the blind council are sequenced for future releases per the verification spec. This is stated honestly in loki verify --help and in the evidence document (llm_review.status = "skipped").

Drop the same command into GitHub Actions:

# .github/workflows/loki-verify.yml
name: Loki Verify
on:
  pull_request:
    branches: [main]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history for merge-base

      - uses: oven-sh/setup-bun@v1

      - name: Install Loki Mode
        run: bun install -g loki-mode

      - name: Run verification
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: loki verify origin/${{ github.base_ref }}

      - name: Upload evidence
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: loki-verify-evidence
          path: .loki/verify/

The job exits 0/1/2/3. The evidence is a structured artifact you can inspect from the PR view. The deterministic checks make it a real gate, not a vibe check.

To see how trust evolves on YOUR repo over time:

loki trust              # one-line verdict + per-axis direction
loki trust --json       # machine-readable trajectory
loki trust-metrics      # block rate, p90 failure, council rejection, cost-per-verified-task

loki trust-metrics aggregates from a durable append-only log at .loki/metrics/trust-events.jsonl. Un-instrumented projects report available: false, never fabricated zeros.
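The "never fabricated zeros" rule is easy to get wrong, so here is a sketch of one metric computed from an append-only JSONL log. The event shape (`type`, `verdict`) is hypothetical and is not the real trust-events schema.

```python
import json
from pathlib import Path
from typing import Any, Dict


def block_rate(log_path: Path) -> Dict[str, Any]:
    """Compute the share of verify events that ended BLOCKED.

    Returns {"available": False} when there is no log or no matching
    events, rather than reporting a misleading 0.0.
    """
    if not log_path.is_file():
        return {"available": False}
    total = blocked = 0
    for line in log_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "verify":  # hypothetical event type
            continue
        total += 1
        if event.get("verdict") == "BLOCKED":
            blocked += 1
    if total == 0:
        return {"available": False}
    return {"available": True, "runs": total, "block_rate": blocked / total}
```

A block rate of 0.0 from zero runs and a block rate of 0.0 from fifty clean runs mean very different things, which is why the unavailable case is kept separate.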

8. Workflow 4 (optional): loki quickstart -- the training-wheels mode

If you have never used the tool before and want a guaranteed-working first run with zero PRD-writing, loki quickstart is a guided 4-step interview that lands the bundled Todo app on four Enter presses. Setup check, idea (default: Todo app), template pick (deterministic offline scorer over the bundled templates, no LLM at this step), plan review.

loki quickstart

It is genuinely just for the first 10 minutes. The real workflows are the three above.

9. Internals: how the held-out gate stops reward-hacking

The technical bit I get the most questions about, and the one I am most proud of from this release wave.

The failure mode

Once an autonomous build loop has access to the spec's acceptance checklist, an aggressive optimizer can tune to that exact checklist. The visible items pass. The spec does not. You ship something that satisfies the letter of the checklist and not the intent.

This is the same failure mode that has plagued ML benchmarks for years (BLEU, ROUGE, leaderboards). For autonomous coding it is worse because the optimizer has access to the test runner and can iterate against it.

The fix

Before the first verification, a deterministic selector reserves a slice (real impl in autonomy/prd-checklist.sh):

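# Simplified; see the real bash impl linked above
import hashlib

def select_heldout(checklist_items):
    """Reserve a deterministic slice of items; each item needs a string .id."""
    n = len(checklist_items)
    if n < 4:
        return []  # too small to reserve from
    # Clamp to 1..5. Note: Python's round() sends exact halves to the even integer.
    count = max(1, min(5, round(0.25 * n)))
    # Rank by sha256(id) so the selection is reproducible, not random
    ranked = sorted(checklist_items, key=lambda c: hashlib.sha256(c.id.encode()).hexdigest())
    return ranked[:count]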

Selection is reproducible (sha256(id)-ranked, not random), idempotent (only written once to .loki/checklist/held-out.json), and bounded (clamped to 1-5 items).

What the build agent sees

Everything the build loop reads is filtered to exclude the held-out IDs. The build agent literally cannot see them in its context window. It can pass every item it can see and still get blocked at the ship gate if the held-out items fail.

Concretely, the filter removes the IDs from:

  • The visible checklist summary in the run prompt
  • The per-iteration checklist progress gate
  • The completion-prompt count of "N/M items satisfied"
  • The dashboard's task panel
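The filtering itself is simple; the discipline lies in applying it on every surface the build agent reads. A sketch, assuming checklist items are dicts with an `id` key and using hypothetical helper names:

```python
from typing import Dict, Iterable, List, Set


def visible_items(checklist: Iterable[Dict[str, str]],
                  heldout_ids: Set[str]) -> List[Dict[str, str]]:
    """Drop held-out items before anything is shown to the build agent."""
    return [item for item in checklist if item["id"] not in heldout_ids]


def progress_line(checklist: Iterable[Dict[str, str]],
                  heldout_ids: Set[str],
                  satisfied_ids: Set[str]) -> str:
    """Build the "N/M items satisfied" count over visible items only."""
    visible = visible_items(checklist, heldout_ids)
    done = sum(1 for item in visible if item["id"] in satisfied_ids)
    return f"{done}/{len(visible)} items satisfied"
```

Note that both the numerator and the denominator exclude held-out items. If only the list were filtered but the total still counted them, the agent could infer how many items were hidden.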

The completion council reads them at the ship gate

At the ship gate (called from both the standard completion route and the force-review route), council_heldout_gate (in autonomy/completion-council.sh) reads .loki/checklist/held-out.json, runs each item against the current diff with a dedicated evaluation prompt, and writes a heldout_eval trust event to .loki/metrics/trust-events.jsonl. A held-out item whose status comes back failing and is not explicitly waived blocks completion like any other critical failure.
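The blocking rule in that last sentence can be sketched as follows. The `heldout_gate` helper, its status strings, and the shape of the returned event are assumptions; the real gate is the bash function named above.

```python
from typing import Any, Dict, Set


def heldout_gate(results: Dict[str, str], waived: Set[str]) -> Dict[str, Any]:
    """Decide whether held-out evaluation results block completion.

    `results` maps held-out item id -> status ("passing" or "failing").
    An item blocks only if it is failing and not explicitly waived.
    """
    blocking = sorted(
        item_id for item_id, status in results.items()
        if status == "failing" and item_id not in waived
    )
    return {"event": "heldout_eval", "blocking": blocking, "blocked": bool(blocking)}
```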

Honest limits

This guards the prompt feed, not the filesystem. The reservation lives on disk at .loki/checklist/held-out.json. An agent with read access to the working tree could open that file and learn which items were held out. The guarantee is that no prompt or summary the build agent reads ever names a held-out item. For the most realistic attack-shape (an LLM tuning to its visible context window), that is the right defense.

For a stronger guarantee, an opt-in mode could ship that places the held-out file outside the working tree, with the tradeoff that you cannot rerun verification offline without the file path. We chose the on-disk default and named the limit explicitly.

Opt out entirely with LOKI_HELDOUT_GATE=0.

10. What we learned shipping 15 minor releases in 4 days

The release cadence is not a marketing stunt; it came out of internal practice. A few things that turned out to matter.

Coordinated arcs beat feature dumps

The previous wave (v7.9 through v7.17) was the same shape: 9 minor releases over 2 days, sequenced R1 through R10. The v7.20 through v7.35 wave was 15 minors over 4 days. Each release closes one specific user-facing problem, and the arc has a narrative the README can sustain.

If we had shipped the same functionality as a single v8.0.0 release, we would have spent weeks on integration testing and the user-facing communication would have been a mess.

A council that cannot edit code reviews more honestly

The single biggest quality improvement of the week was v7.33.0's --disallowedTools. Reviewer subcalls now physically cannot use Edit, Write, NotebookEdit, or destructive git (the list includes the git -C / --git-dir / -c flag-prefixed forms too).

Before this, we observed the council occasionally "improving" the diff under review, which technically satisfied the review goal (the new diff was now passable) but defeated the gate's purpose. The fix was small, the impact was large. This is the kind of thing you find by reading your own internal traces, not by running a benchmark.

Opt out with LOKI_REVIEW_TOOL_GUARD=0.

Honest provider labels build trust faster than full-stack promises

When v7.27.0 dropped, the README labels for Codex, Cline, and Aider went from "Supported" to "Experimental." Loki Mode now claims "Tier 1 E2E-verified primary" only for Claude Code.

This was uncomfortable to ship. The README looks less impressive. The marketing surface got smaller. But Discord activity went up, not down, the week after. The audience pulled to a tool like this is allergic to "supports five providers" marketing copy. Saying the smaller true thing builds more trust than saying the larger fuzzy thing.

Cost honesty enforced in code beats cost honesty as a marketing claim

v7.31.0 and v7.32.0 shipped the cost-honesty contract: the loki plan quote, the dashboard's reported model, and the actually-dispatched model agree across every model lever. A Sonnet session pin that routes through the development tier to Opus now quotes Opus, not Sonnet. The old behavior underquoted by about 1.7x.

The work was three days of internal plumbing. It does not show up on the README feature list. Users noticed within hours because their cost dashboards stopped lying.

Auto-open the dashboard

The single highest-leverage UX change of the week was the one-line "loki start auto-opens the dashboard." We resisted it for months on the grounds of "developers don't want surprise browser windows." The data was unambiguous: with auto-open, the hit rate of users finishing their first build went up a lot.

Lesson: respect for the user's environment matters less than removing one barrier between them and a successful first run. The opt-out (LOKI_NO_AUTO_OPEN=1, plus auto-skip on CI=true / SSH-no-TTY / piped stdin) is enough.

11. The roadmap and where contributions move the needle fastest

Near-term (next 4 weeks):

  • LLM single-reviewer stage in loki verify. v7.27.0 MVP is deterministic-only. The single-reviewer stage is sequenced next per the verification spec, with the blind council after that.
  • Public hosted backend for loki proof share --hosted. Today the --hosted flag publishes to a user-supplied LOKI_HOSTED_ENDPOINT and prints an honest "no official hosted backend yet" message when unset. We are building the hosted endpoint. Opt-in. The free-forever CLI commitment in docs/OPEN-CORE-BOUNDARY.md stays.
  • Mobile dashboard polish. The dashboard is web-based but assumes a desktop browser. Mobile responsiveness needs work.
  • More benchmark task adapters. We ship loki bench with real adapters for Aider and Claude Code. We need adapters for more competitors. Cleanest contribution surface for an external PR.

Medium-term (next quarter):

  • Replay re-execution mode (loki memory replay --apply). Today loki memory replay is read-only. Re-execution needs proper sandboxing and confirmation; not shipping until that is right.
  • Embedding layer for cross-project memory. Today's retrieval uses token overlap. An embedding layer would catch synonym mismatches the keyword scorer misses.
  • 10k-episode memory index at p95 < 500ms.
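To show why an embedding layer is on the list, here is a sketch of a token-overlap scorer and the synonym case it misses. Using Jaccard similarity is my assumption; the post only says retrieval uses token overlap.

```python
import re
from typing import Set


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, episode: str) -> float:
    """Jaccard similarity between the token sets of query and episode."""
    q, e = _tokens(query), _tokens(episode)
    if not q or not e:
        return 0.0
    return len(q & e) / len(q | e)
```

A query of "auth" scores 0.0 against an episode that only says "authentication", even though a human would treat them as the same topic. That is the mismatch a keyword scorer cannot see.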

Where contributions land fastest right now:

  • Benchmark task adapters for any AI coding tool that has a CLI. The contract is clean, the integration is small, and we will land any well-formed PR within 48 hours.
  • Agent and template marketplace packs for loki agent install. Install is data-only by construction (manifests are never eval'd, exec'd, or imported), so contributions are safe to land without security review for each one.
  • Language server coverage. We auto-spawn an lsp-proxy MCP for TypeScript, Python, Go, Rust. Adding Ruby, PHP, Kotlin, Swift, Elixir is small and well-scoped.
  • Dashboard panels and i18n. The dashboard is Web Components + Tailwind. Adding panels is straightforward.

If any of these interest you, drop into Discord and say hello. I respond within hours.

12. Try it in two minutes

# Install (Bun recommended; v8 will be Bun-only)
bun install -g loki-mode

# Verify the install
loki version
loki doctor

# Workflow 1: from a PRD
loki start ./prd.md

# Workflow 2: from any GitHub/GitLab/Jira/Azure DevOps issue
loki start owner/repo#123 --ship --bg

# Workflow 3: CI gate on any branch or PR diff
loki verify origin/main

# Inspect trust trajectory across all your runs
loki trust

# Per-iteration cost visibility
loki cost --last 10

# First-time exploration
loki quickstart

If you build something with it, drop a screenshot in Discord. I will boost it.


If you read this far, thank you. Tell me in the comments which part of the verification surface you would push back on, or what would make you trust an autonomous agent to ship a diff into your own codebase. That feedback is the input I most need.


Loki Mode (also called Autonomi) is built and maintained by @asklokesh. Source-available under BSL 1.1; converts to Apache 2.0 on March 19, 2030. We never proxy your provider keys, never collect prompts or code, and the telemetry that produced the numbers above is opt-out via LOKI_TELEMETRY_DISABLED=true or DO_NOT_TRACK=1.
