DEV Community: gen99

I built a local-first AI desktop where chat turns into cron jobs, parallel batches, and editable .pptx

gen99 — Tue, 23 Jun 2026 11:59:12 +0000

TL;DR

Praxia Desktop is what I wanted when I got tired of:

Pasting "do this every Monday" into ChatGPT and then forgetting to actually do it every Monday
Manually running the same prompt against 50 PDFs because nothing scaled my one-PDF chat workflow
Asking an LLM for "a deck" and getting markdown back when I needed an actual editable PowerPoint to ship to a stakeholder

So I built a desktop app where chat is the only interface, and the agent on the other side:

Schedules itself — say "every Monday at 9 AM summarize the diffs in Documents/" and a POSIX cron job appears
Fans out in parallel — say "for each of these 50 PDFs, extract the action items" and 50 agents run concurrently with live progress
Writes native, editable files — say "draft a Q3 retrospective deck" and a real .pptx lands in your workspace (charts, takeaways, the lot — not screenshots, not Markdown)

Everything runs locally — your files and chat history live in your Praxia folder, not on a server I control. LLM calls go directly from your machine to whichever provider you pick (OpenAI, Anthropic, Azure, Gemini, or your own Ollama / LM Studio instance). I never see your prompts.

Free, open source (Apache 2.0), and available for Windows 10/11 x64.

The rest of this post walks through each of those three things in detail — what it looks like, why it matters, how it's built, and what's interesting about the implementation.

Thing 1: "Schedule this for me, weekly"

Cron is one of the great computing primitives and basically nobody outside of sysadmins uses it directly. The interface is hostile. The mental model ("five fields separated by spaces representing minute / hour / day / month / weekday, all of which can be * or 1-5 or */15 …") is something I always have to re-derive from crontab -e.

I wanted: I say it, it gets scheduled.

In Praxia, this works:

me: every Monday at 9 AM, summarize what changed in Documents/team-notes/
    since the previous Monday, and write the summary to
    workspace/weekly-team-summary.md

The agent on the other side reads that, infers:

This is a recurring task (not a one-shot)
The schedule is 0 9 * * 1 in POSIX cron
The action involves a diff over a watched folder + a write to the workspace
The write needs a user approval gate at execution time, not at schedule time

A schedule gets registered. The Schedules tab is where it lives.

You can inspect the cron expression, the prompt, the next run time, the last run result. You can pause it, edit the cron, edit the prompt, delete it. The same agent that registered it can also modify it — "actually, change that to every weekday morning."
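To make the `0 9 * * 1` expression and the "next run time" column concrete, here is a minimal stdlib-only sketch of how a next run could be computed from a five-field expression. It is an illustration, not Praxia's scheduler: `expand` and `next_run` are hypothetical names, and the parser only handles `*`, single values, ranges, lists, and steps.

```python
from datetime import datetime, timedelta

# (min, max) for minute, hour, day-of-month, month, day-of-week (0 = Sunday)
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]

def expand(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field ('*', '5', '1-5', '*/15', '1,3') into a set of ints."""
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            end = hi if step else start
        values.update(range(start, end + 1, int(step) if step else 1))
    return values

def next_run(expr: str, after: datetime) -> datetime:
    """Return the first minute strictly after `after` that matches `expr`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    minute, hour, dom, month, dow = (
        expand(f, lo, hi) for f, (lo, hi) in zip(fields, BOUNDS)
    )
    dom_any, dow_any = fields[2] == "*", fields[4] == "*"
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # scan at most one year, minute by minute
        cron_dow = (t.weekday() + 1) % 7  # Python: Monday=0, cron: Sunday=0
        if dom_any or dow_any:
            day_ok = t.day in dom and cron_dow in dow
        else:
            # cron rule: if both day fields are restricted, either may match
            day_ok = t.day in dom or cron_dow in dow
        if t.minute in minute and t.hour in hour and t.month in month and day_ok:
            return t
        t += timedelta(minutes=1)
    raise ValueError("no matching time within a year")
```

For example, `next_run("0 9 * * 1", datetime(2026, 6, 23, 11, 59))` lands on the following Monday at 09:00. A minute-by-minute scan is slow but easy to verify, which is the point of the sketch.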

This isn't "natural language cron parsing as a feature." That's a regex with extra steps. What makes it useful is that the execution context at schedule-fire time is the same one you'd get from a fresh chat: same memory, same connectors, same approval-gate semantics, same provider config.

Why it matters: lots of LLM tools can describe a workflow. Few can execute one on a schedule with all the actual side effects (file writes, API calls) properly gated and observable.

Thing 2: Fan out a chat across 50 files

The first time I tried to use an LLM at non-trivial scale was when I had 50 candidate résumés to triage. ChatGPT was great for one résumé. For 50, my options were:

Open 50 tabs (this is what I actually did the first time, regrettably)
Write a Python script (the right answer, but I was on a deadline)
Use a SaaS that does parallel agent runs (paid, locks me in)

Praxia's answer:

me: for each PDF in Documents/candidates/, extract strengths, weaknesses,
    tech stack, and a recommended team. Put each candidate's analysis in
    workspace/candidates/<name>.md and write a comparison matrix to
    workspace/candidates/MATRIX.md.

The agent:

Lists the PDFs in the folder
Spawns N parallel sub-agents (where N respects the rate limits of whatever LLM provider you've configured — OpenAI's TPM/RPM, Anthropic's per-minute budget, etc.)
Watches them concurrently in the Batches tab
Collects results, writes per-candidate analyses, builds the comparison matrix
Surfaces failures individually so you can retry the 3 that timed out without re-running the 47 that succeeded

The user-facing UX is "I asked the agent to do a thing and watched it happen." The agent-facing UX is a fan_out tool that takes a list of input items and a sub-prompt template.
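As a rough sketch of what a `fan_out`-style tool could look like, the code below runs one sub-agent per input item under a concurrency cap and records failures per item so they can be retried on their own. Everything except the name `fan_out` is an assumption: the signature, `ItemResult`, and the `fake_agent` stand-in are hypothetical and not taken from Praxia's code.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

@dataclass
class ItemResult:
    item: str
    output: Optional[str] = None
    error: Optional[str] = None

async def fan_out(
    items: list[str],
    template: str,
    run_agent: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list[ItemResult]:
    """Run one sub-agent per item, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(item: str) -> ItemResult:
        async with sem:
            try:
                prompt = template.format(item=item)
                return ItemResult(item, output=await run_agent(prompt))
            except Exception as exc:  # one failure must not sink the batch
                return ItemResult(item, error=str(exc))

    # gather preserves input order, so results line up with items
    return list(await asyncio.gather(*(run_one(i) for i in items)))

async def fake_agent(prompt: str) -> str:
    """Stand-in for a real LLM call; fails on prompts mentioning 'corrupt'."""
    await asyncio.sleep(0)
    if "corrupt" in prompt:
        raise TimeoutError("timed out")
    return f"analysis of: {prompt}"

results = asyncio.run(
    fan_out(
        ["a.pdf", "corrupt.pdf", "c.pdf"],
        "Extract action items from {item}",
        fake_agent,
        max_concurrency=2,
    )
)
failed = [r.item for r in results if r.error]  # retry only these
```

Catching the exception inside each task is what makes "retry the 3 that timed out" possible: the batch always completes, and the failed items are just a filter over the results.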

Why it matters: parallel execution is the gap between "useful for one document" and "useful for my actual workload." Once you have it, you find use cases everywhere — translating a backlog of strings, classifying support tickets, extracting fields from a stack of invoices, comparing N proposals against a rubric.

Why it's hard to do well: rate-limit awareness. Praxia configures concurrency based on the active LLM provider's published limits, so you don't have to remember that GPT-4o-mini's TPM ceiling is different from Claude Sonnet's, or that your own Ollama instance is bottlenecked on local GPU.
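One way to reason about a safe concurrency level is Little's law: requests in flight are roughly throughput times latency. The sketch below applies that to a requests-per-minute and a tokens-per-minute budget and takes the tighter of the two. The function name, the `hard_cap` parameter, and the numbers in the usage line are hypothetical; they are not any provider's published limits and this is not necessarily how Praxia derives its setting.

```python
def safe_concurrency(
    rpm: float,
    tpm: float,
    tokens_per_request: float,
    avg_latency_s: float,
    hard_cap: int = 32,
) -> int:
    """Estimate how many requests can be in flight without exceeding
    either the requests-per-minute or the tokens-per-minute budget."""
    by_requests = rpm * avg_latency_s / 60
    by_tokens = (tpm / tokens_per_request) * avg_latency_s / 60
    # Always allow at least one request; never exceed the hard cap.
    return max(1, min(hard_cap, int(min(by_requests, by_tokens))))

# Illustrative numbers only: the token budget is the binding limit here.
n = safe_concurrency(rpm=500, tpm=200_000, tokens_per_request=4_000, avg_latency_s=12)
```

With these made-up inputs the request budget alone would allow 100 in flight, but the token budget allows only 10, so the token limit decides.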

Thing 3: Native, editable `.pptx` from chat

This is the one I'm most proud of and it's the most technically interesting bit.

When LLMs "generate slides," what you typically get is Markdown, or an HTML preview, or a screenshot, or — at best — a Reveal.js page. What you almost never get is a real .pptx file you can open in PowerPoint and edit.

Praxia does the real thing:

me: draft a Q3 retrospective deck from Documents/sales/. Three charts,
    one summary slide, one next-actions slide. Corporate colors are
    navy and white.

What happens under the hood:

Plan — the LLM drafts an outline: 5 slides, each with a title, content type (bullets / chart / text), and source citations from the indexed sales folder.
Code-gen — the LLM writes Python code that uses python-pptx to construct the deck. Not Markdown that gets converted. Actual python-pptx calls: shapes.add_chart(), shapes.add_textbox(), text_frame.paragraphs[0].font.color.rgb = RGBColor(0x1f, 0x2a, 0x5e).
Render — the bundled Python sidecar executes that code. A .pptx file lands on disk.
Vision review — the deck is converted to PNGs (one per slide) and sent to a vision-capable LLM (GPT-4o, Claude 3.5 Sonnet, etc.) with the prompt: "Look at these slides. Are titles legible? Do text boxes fit? Are colors consistent? Are charts misaligned?"
Iterate — if the vision pass flags issues ("slide 3 title overflows", "chart 2 has overlapping labels"), the LLM regenerates the offending parts and the vision review (step 4) runs again. Usually one or two iterations are enough.
Approve — Praxia surfaces the final deck in an approval dialog. You click Apply. The .pptx lands in your workspace folder.
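The generate, render, review, iterate steps above amount to a bounded feedback loop. A minimal sketch, assuming the three stages are passed in as plain callables (`generate`, `render`, and `review` are hypothetical names, and the iteration budget of 3 is an assumption):

```python
from typing import Callable

def build_with_review(
    generate: Callable[[list[str]], str],
    render: Callable[[str], list[bytes]],
    review: Callable[[list[bytes]], list[str]],
    max_iterations: int = 3,
) -> tuple[str, list[str]]:
    """Generate, render, and review until the reviewer reports no issues
    or the iteration budget runs out. Returns (code, remaining_issues)."""
    issues: list[str] = []
    code = ""
    for _ in range(max_iterations):
        code = generate(issues)        # previous issues steer the rewrite
        slide_images = render(code)    # e.g. one PNG per slide
        issues = review(slide_images)  # empty list means the deck looks fine
        if not issues:
            break
    return code, issues
```

Returning the remaining issues along with the code lets the caller show them in the approval dialog when the budget runs out, instead of silently shipping a deck the reviewer still objects to.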

You can open the result in PowerPoint and edit it normally. Text boxes are real text boxes. Charts are real charts (driven by embedded data, not pasted images). Color schemes are consistent because the vision pass explicitly checks for inconsistency.

Why this is hard: LLMs are bad at spatial reasoning. They cheerfully generate text_box(left=Inches(5), top=Inches(3), width=Inches(8), …) on a 10-inch-wide slide and don't notice the box runs off the edge. The vision-review loop catches this — the LLM can't "see" the slide it just generated through code alone, but it can see the rendered PNG. That second pass closes the loop.
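The off-the-edge case is simple arithmetic once the geometry is written down: a box at left 5 in with width 8 in ends at 13 in, past a 10 in slide. The sketch below shows that check in plain Python. It is an illustration of the failure mode, not part of Praxia's pipeline; `Box`, `overflow_issues`, and the 7.5 in slide height are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    name: str
    left: float    # inches
    top: float     # inches
    width: float   # inches
    height: float  # inches

def overflow_issues(
    boxes: list[Box], slide_w: float = 10.0, slide_h: float = 7.5
) -> list[str]:
    """Report every box whose edges fall outside the slide."""
    issues = []
    for b in boxes:
        if b.left < 0 or b.left + b.width > slide_w:
            issues.append(f"{b.name}: horizontal overflow")
        if b.top < 0 or b.top + b.height > slide_h:
            issues.append(f"{b.name}: vertical overflow")
    return issues

boxes = [
    Box("title", left=0.5, top=0.3, width=9.0, height=1.0),
    Box("body", left=5.0, top=3.0, width=8.0, height=2.0),  # 5 + 8 = 13 > 10
]
problems = overflow_issues(boxes)
```

A geometric check like this only catches boxes that leave the slide. Text that overflows its own box, or labels that overlap, still need the rendered image, which is why the vision pass matters.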

Why it matters: the difference between "I have a Markdown outline of a deck" and "I have a polished PowerPoint I can send to my CFO" is roughly an hour of fiddly work that AI tools have historically not closed. This closes it.

The shape of the app

Three main tabs (in addition to the Schedules and Batches tabs mentioned above):

Chat — where you talk to the agent
Documents — folders Praxia watches and indexes (RAG-style retrieval, fully local)
Workspace — where Praxia writes files for you, all gated by an approval dialog

Every disk-touching action (file write, file delete, file overwrite) goes through a per-operation approval dialog. The agent never silently writes. If it tries to overwrite an existing file, you see the diff and decide.
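A minimal sketch of an approval-gated write, assuming the approval dialog is modeled as a callback that receives a unified diff and returns a yes/no decision. `gated_write` and `approve` are hypothetical names; the real dialog lives in the desktop shell.

```python
import difflib
from pathlib import Path
from typing import Callable

def gated_write(path: Path, new_text: str, approve: Callable[[str], bool]) -> bool:
    """Write new_text to path only if approve(preview) returns True.

    The preview is a unified diff against the current file contents
    (or against an empty file when the path does not exist yet).
    """
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    preview = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{path} (current)",
            tofile=f"{path} (proposed)",
        )
    )
    if not approve(preview):
        return False  # nothing touched on disk
    path.write_text(new_text, encoding="utf-8")
    return True
```

The important property is that the write happens after the decision and nowhere else, so a hallucinated path shows up in the diff header before anything is written.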

This is one of the harder parts of building an agentic desktop app, and it's where I disagree most strongly with cloud-first AI tools. Trust comes from being able to say no. A model that occasionally hallucinates a wrong file path is fine if you have a dialog telling you what it's about to do. A model with the same hallucination rate writing silently to disk is a disaster.

Local-first, for real

Your documents and chat history live in ~/Praxia/ (or wherever you point it). I do not operate a backend that holds your data. When you send a chat message, Praxia routes it to whichever LLM provider you configured — OpenAI, Anthropic, Azure, Gemini, Ollama, LM Studio — and that provider sees the request directly from your machine.

If you want strict on-device operation: configure Ollama or LM Studio as your provider. No HTTPS calls leave your machine. The agent loop, retrieval, scheduling, batch fan-out, and .pptx rendering all happen locally.

This is a different stance than most "local AI" desktop apps, which still phone home for telemetry or model registry checks. Praxia does neither. The only outbound traffic is to the LLM provider you selected; if you selected a local one, there's no outbound traffic.

The interesting bits under the hood

Architecture (this is the only place I'll go nerdy):

Tauri 2 shell (Rust + Svelte 4 + WebView2)
        ↓ spawn / localhost HTTP
PyInstaller-frozen Python sidecar
  └─ FastAPI + litellm + chromadb + python-pptx + matplotlib + pypdfium2 + ...

The shell is a Tauri 2 app. The agent backend is a Python FastAPI server bundled as a single praxia-server.exe via PyInstaller, started as a child process at app launch. They talk over localhost HTTP.

Why this shape:

The same Python code path runs as pip install praxia for CLI users AND as the desktop sidecar. One codebase, two distribution channels.
WebView2 relies on a shared system runtime instead of bundling a private copy of Chromium the way Electron does, which keeps the installer far smaller.
The Rust shell handles the system-integration parts (file dialogs, OAuth callbacks, OS notifications) where Tauri 2's plugin ecosystem already has the right abstractions.

A few specific things I think are worth flagging:

5-layer memory stack. Personal memory (auto-extracted from chats) → Sleep-time consolidation → Shared org memory → Frozen Markdown layer (git-managed) → optional graph layer. Three independent promotion paths (frequency / outcome / self-eval) decide which personal observations get elevated to organizational knowledge. Most agent platforms paywall this. It ships in Praxia's OSS.

Verifier loop. A CommandedAgent mode wraps the free-running agent loop with pre-retrieval + grounding verification + bounded retry + an explicit abstain path. Calibrated against an in-house multi-hop RAG harness. The difference between this and the un-verified AutonomousAgent mode is roughly 20 points of factual accuracy on private-corpus QA — at the cost of slower responses.

Per-user OAuth across 20+ SaaS connectors. When Alice connects Notion to Praxia, Praxia stores Alice's OAuth token and queries Notion as Alice. Bob's Notion view is different because Bob's OAuth token is different. This sounds obvious; most AI tools use a single service account and leak data across users.

MCP support, both stdio and HTTP/SSE. Any Model Context Protocol server you wrote for Claude Desktop or Cursor works in Praxia unchanged.

If any of those are interesting, the GitHub repo has architecture docs and the actual implementation. The backend is all Python with type hints, formatted with Ruff, with ~780 tests passing.

Bonus: how it got onto the Microsoft Store

Short version: I wrapped the Tauri 2 build in MSIX, declared runFullTrust with a five-bullet justification ("spawn Python sidecar, read user folders, localhost loopback, outbound HTTPS to user-configured LLM API, native document generation"), and submitted to Partner Center. Cert passed on the first try in 4 days. Microsoft re-signs the MSIX with the Store identity, so SmartScreen no longer flags installation.

Try it

Microsoft Store: https://apps.microsoft.com/detail/9P9LSR34HZF3 (Windows 10/11, free)
GitHub (Apache 2.0): https://github.com/praxia-dev/praxia
PyPI (CLI version): pip install praxia
4-minute demo: https://youtu.be/Z3DFa2saHJg
Website: https://praxia.tools/
Discussions (GitHub): https://github.com/praxia-dev/praxia/discussions

If you build something interesting on top of it, please drop a note in Discussions or ping me on X at @praxia_dev. I want to know what people make.

I spent 5 weeks building an open-source multi-agent orchestrator. The hard part wasn't the agents — it was the memory.

gen99 — Tue, 02 Jun 2026 15:36:47 +0000

This is the launch post in a series on building Praxia, an Apache-2.0 multi-agent orchestrator. Later posts go deep on the TiDB Vector memory backend and a Japanese-specialized STT integration.

TL;DR

This spring I built and shipped Praxia, a multi-agent orchestrator OS, from scratch in about 5 weeks of nights and weekends (Apache-2.0).

🚀 PyPI: pip install praxia — https://pypi.org/project/praxia/
📦 GitHub: https://github.com/praxia-dev/praxia
🎬 60-second demo: https://youtu.be/o_6NbjJU1AA
🌐 Landing: https://praxia.tools/

The differentiator is automatic personal → organizational memory promotion. The "prompts that actually work," which a senior engineer painstakingly tunes, usually stay locked in that one person's head. Praxia tries to solve that with a 5-layer memory stack + a 3-path promotion engine.

What I actually started — the decision

I'd been using LangChain / CrewAI / AutoGen at work since late 2025, and one structural discomfort kept nagging at me:

What really separates a good agent from a bad one isn't the library or the model — it's the accumulated domain-specific trial and error.

And that accumulation almost always lives in one senior person's head, or in their private Cursor / VS Code / Obsidian setup. It is tacit knowledge that evaporates the day they leave. General-purpose frameworks are powerless against that.

In April 2026 I started writing code around my day job. A month later I shipped v0.1.0 to PyPI and GitHub.

Why existing frameworks didn't cut it

Four walls I hit in practice:

Wall	What was happening
Setup complexity	2-3 days just to get something running. Can't make a production call.
Tacit knowledge doesn't propagate	The prompts that work stay in one person's private space.
No evidence for evaluation	"It runs" doesn't guarantee "it works."
Agents stagnate	Build it once, and there's no feedback loop.

The second one was the killer. General agent frameworks give you strong primitives, but the process of accumulating knowledge into the organization is left entirely to the implementer.

The core design — 5 layers + 3 paths

The 5-layer memory stack

L1 PersonalMemory   Per-user (6 backends: JSON / Mem0 / Letta / Zep / Hindsight / LangMem)
L2 PromotionEngine  Nightly batch. Decides L1 → L3 promotion
L3 SharedMemory     Org-wide. RBAC gating, time decay
L4 MarkdownStore    Git-managed, PR review required, immutable
L5 GraphLayer       Optional (Zep / Graphiti), relation extraction

The key is that L1 → L3 is not manual. A sleep-time consolidator (nightly batch) scans personal memory and auto-promotes the right parts into organizational knowledge.

The 3-path promotion engine

Memory promotion is evaluated in parallel across three independent signals:

Frequency — facts repeated across N+ people
Outcome correlation — co-occurrence with wins / approved PRs / passing tests
LLM self-eval — a 0..1 "org-knowledge candidacy" score

Each path is checked against its own threshold, and any single decisive path triggers promotion. This deliberately avoids single-mechanism dependence (where one broken signal takes the whole thing down).

The design choices that made "I can build this myself" possible

Shipping in 5 weeks came down to four design choices.

1. Seven extension points

Every extension point is built on the same praxia.extensions.Registry primitive:

Extension point	~LoC	Entry point
Connector (Box / Notion / Slack …)	~50	`praxia.connectors`
Memory backend	~80	`praxia.memory_backends`
File parser	~30	`praxia.parsers`
Output exporter	~30	`praxia.exporters`
OAuth provider	~20	`praxia.oauth_providers`
Skill	~50	`praxia.skills`
Flow	~50	`praxia.flows`

"Extend via a pyproject.toml entry point, never edit core files" keeps the cognitive load low when you write your own.

2. Apache-2.0 with everything included (no paywall)

SSO (Google / Microsoft Entra / Okta / GitHub / Keycloak), RBAC, audit logs, per-user OAuth (13 providers), KMS-backed token encryption (AWS / Azure / GCP / Vault / local) — all in the OSS core.

Most of what commercial agent platforms paywall as an "Enterprise tier" ships here under Apache-2.0. That directly speeds up adoption decisions.

3. 100+ providers via LiteLLM

Provider quirks (Anthropic not supporting response_format, GPT-5.x disallowing temperature, Azure's deployment-name format, …) are absorbed at the LiteLLM layer. No provider-specific API keys leak into Praxia core. Fully offline operation is possible too (Ollama + a local model + backend=json).

4. A Streamlit UI that's easy to throw away

The UI is Streamlit, but the backend also runs headless as praxia serve (FastAPI). When I swap to Next.js or mobile later, I throw away only the UI layer.

What went into v0.1.0, and what didn't

Shipped (deliberately a full set):

5-layer memory + 3-path promotion engine
6 business skills (investment / sales / design / procurement / patent / legal)
6 LTM backends + Composite/Routed parallel fusion
Per-user OAuth for 13 providers
SSO + RBAC + ACL + audit logs
Autonomous agent (LLM-driven tool-use loop)
Document Designer (sandboxed python-pptx / docx → designed file output)
i18n in 8 languages (en / ja / zh-CN / ko / es / fr / de / pt-BR)

Deliberately deferred (v0.2+):

Multi-tenant GUI — OSS targets "single-org self-host"; SaaS-grade tenant isolation belongs in a future Open Core tier
PDF output — LibreOffice-based workflow recommended for now
Native Pinecone / Weaviate / Qdrant backends — can be wrapped via mem0 through Composite/Routed, so low priority

The "what NOT to build" list was as important as the "what to build" list.

The TOP 3 things that actually ate my time

This was a "ship something working in 5 weeks" sprint, and the three things that genuinely ate time were all in the memory layer and promotion engine.

1. Blending three signals that live on completely different scales

The PromotionEngine evaluates L1 → L3 promotion across Frequency / Outcome correlation / LLM self-eval in parallel. This was trickier than I expected:

Frequency is an integer in 0..∞ (reference count for the same fact)
Outcome correlation is a ratio in 0..1 (success rate of tasks where the fact co-occurred)
LLM self-eval is a continuous 0..1 — but non-deterministic, and the score distribution differs per provider (GPT-5 strict, median ~0.4; Claude lenient ~0.7; Gemini bimodal)

My first implementation was a plain weighted average. Because frequency is unbounded, "facts I just happened to touch a lot in L1" floated to the top. I ended up here:

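import math
from dataclasses import dataclass

@dataclass(frozen=True)
class PromoteSignal:
    frequency: float       # raw count
    outcome_corr: float    # 0..1
    self_eval: float       # 0..1, median of N=3 LLM calls

@dataclass(frozen=True)
class PromoteConfig:       # shape inferred from the fields used below
    freq_mean: float       # rolling 30-day mean of frequency
    freq_std: float        # rolling 30-day standard deviation (must be > 0)
    freq_threshold: float = 0.85
    outcome_threshold: float = 0.70
    self_eval_threshold: float = 0.80

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decide_promotion(sig: PromoteSignal, cfg: PromoteConfig) -> bool:
    # Z-score normalize on a rolling 30-day population, then sigmoid
    z_freq = sigmoid((sig.frequency - cfg.freq_mean) / cfg.freq_std)
    # OR logic with per-path threshold
    return (
        z_freq              > cfg.freq_threshold
        or sig.outcome_corr > cfg.outcome_threshold
        or sig.self_eval    > cfg.self_eval_threshold
    )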

Key points:

Z-score normalization tames the runaway frequency signal
OR logic with per-path thresholds means "any single decisive path promotes" — deliberately avoiding single-mechanism dependence
LLM self-eval uses the median of N=3 calls to damp non-determinism (I tried N=5, but the benefit plateaued)
decide_promotion is a pure function, so I can replay historical promotion logs to do parameter sensitivity analysis

It took ~5 redesigns to land on "Z-score → sigmoid → OR with thresholds." Lesson: don't start signal fusion with a weighted average. Per-path decisive thresholds + OR turned out to be the most robust.

2. Reconciling "fan-out search × single-destination write" in the Composite backend

The Composite backend sends search queries in parallel to multiple backends (JSON / Mem0 / TiDB / Letta…) and fuses results with Reciprocal Rank Fusion (RRF). But writes go to a single backend specified by write_to=. This asymmetry created three pitfalls.

(a) Add a backend later, and past data is invisible to it

Running Composite(backends=[A, B], write_to="A") and then adding C means C has none of the past writes. Search fan-out becomes lopsided — 2 backends hit, 1 doesn't. I discovered this during dogfooding as "one backend just has lower search quality."

→ Added a replay_writes(source=A, target=C, since=...) admin API after the fact. Needed for rebalancing.

(b) Graceful degradation when write_to goes down

When write_to is down, stopping writes but continuing reads looks attractive. But that can cause "a record that was hitting in search disappears on the next search" (because the write never re-runs after recovery).

I ended up dropping graceful degradation and raising instead. Half-baked availability stacks a data-truthfulness problem on top of eventual consistency. "When it's down, honestly say it's down" turned out to be the easiest to operate.

(c) Tie-breaking in RRF fusion

When multiple backends return the same record_id at rank 1, the scores tie exactly. Without a tie-breaking rule, ordering depends on backend registration order, and CI tests go flaky.

→ Introduced lexicographic tie-breaking on (rrf_score, timestamp DESC, backend_priority). Now ordering is reproducible even in CI.
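A minimal sketch of RRF with a deterministic tie-break, in the spirit of the (rrf_score, timestamp DESC, backend_priority) ordering above. The function name, the input shapes, and the constant k=60 (a commonly used default, not stated in this post) are assumptions; scores are rounded before comparison so that float summation order cannot break a tie differently between runs.

```python
from collections import defaultdict

def rrf_fuse(
    rankings: dict[str, list[str]],    # backend name -> record ids, best first
    timestamps: dict[str, float],      # record id -> creation time
    backend_priority: dict[str, int],  # backend name -> priority (lower wins)
    k: int = 60,
) -> list[str]:
    """Fuse ranked lists with Reciprocal Rank Fusion and a stable ordering."""
    scores: dict[str, float] = defaultdict(float)
    best_priority: dict[str, int] = {}
    for backend, ranked_ids in rankings.items():
        prio = backend_priority[backend]
        for rank, record_id in enumerate(ranked_ids, start=1):
            scores[record_id] += 1.0 / (k + rank)
            best_priority[record_id] = min(prio, best_priority.get(record_id, prio))
    return sorted(
        scores,
        key=lambda rid: (
            -round(scores[rid], 12),  # higher fused score first
            -timestamps[rid],         # then newer records first
            best_priority[rid],       # then the preferred backend
            rid,                      # final fallback: the id itself
        ),
    )

rankings = {"json": ["r1", "r2"], "mem0": ["r2", "r1"]}
fused = rrf_fuse(rankings, {"r1": 100.0, "r2": 200.0}, {"json": 0, "mem0": 1})
```

Here both records score 1/61 + 1/62, so the newer `r2` wins on timestamp, and it does so regardless of which backend was registered first.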

3. Idempotency of the sleep-time consolidator

Since the nightly batch re-runs L1 → L3 promotion, not promoting the same fact twice is the crux of idempotency. Strict record_id-based dedup wasn't enough:

The same user records "Customer X prioritizes ROI" and "X-san values ROI" as two different phrasings → different record_ids, semantically identical
LLM self-eval is non-deterministic: the same text scores 0.78 / 0.82 / 0.76, so near a threshold you get "didn't promote yesterday, promoted today"

So I added fuzzy dedup:

def is_duplicate(candidate: Record, l3_recent: list[Record]) -> bool:
    # Already-promoted check: any L3 record within recent window whose
    # embedding cosine-sim >= threshold is treated as duplicate
    cand_vec = embed(candidate.text)
    return any(
        cosine_sim(cand_vec, r.embedding) >= 0.92
        for r in l3_recent  # already filtered to last 30 days
    )

The hard part was tuning the two thresholds (similarity + window):

Setting	Problem it causes
0.95 / 7 days (tight)	"Same fact" with different wording ends up duplicated across L3
0.85 / 90 days (loose)	"New but similar facts" (e.g. a similar trend for customer Y) get suppressed
0.92 / 30 days (adopted)	On a hand-labeled set of 50, both false-merge and false-split stayed < 5%

The deciding factor was instrumenting both directions at once. Not just "rate of missed duplicates" but also "rate of treating genuinely distinct facts as duplicates." Track only one side and you inevitably tune lopsided. I later applied this to the PromotionEngine's whole calibration loop.
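Measuring both directions can be sketched as one pass over a hand-labeled set of record pairs. The code below is an assumed shape for such a harness, not the author's: `LabeledPair`, `error_rates`, and the four sample pairs are made up for illustration and are not the 50-pair set mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledPair:
    similarity: float  # cosine similarity between the two records
    age_days: int      # days since the earlier record was promoted
    same_fact: bool    # human label: do both records state the same fact?

def error_rates(
    pairs: list[LabeledPair], sim_threshold: float, window_days: int
) -> tuple[float, float]:
    """Return (false_merge_rate, false_split_rate) for one setting.

    false merge: distinct facts treated as duplicates
    false split: the same fact kept as two separate records
    """
    false_merge = false_split = distinct = same = 0
    for p in pairs:
        merged = p.similarity >= sim_threshold and p.age_days <= window_days
        if p.same_fact:
            same += 1
            false_split += not merged
        else:
            distinct += 1
            false_merge += merged
    return (
        false_merge / distinct if distinct else 0.0,
        false_split / same if same else 0.0,
    )

# Illustrative pairs only, to show how the two rates move in opposite directions.
sample = [
    LabeledPair(0.96, 3, True),
    LabeledPair(0.93, 20, True),
    LabeledPair(0.88, 10, False),
    LabeledPair(0.93, 5, False),
]
loose = error_rates(sample, sim_threshold=0.92, window_days=30)
tight = error_rates(sample, sim_threshold=0.95, window_days=7)
```

On this toy data the looser setting produces a false merge and no false splits, while the tighter one does the opposite, which is the trade-off the table describes. Sweeping both parameters over a real labeled set is what makes the choice defensible.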

v0.1.0 by the numbers

Metric	Value
Codebase	~25,000 LoC (Python + tests + i18n)
Tests	431 passing
Supported LLM providers	100+ via LiteLLM
Bundled connectors	19 (pull + push)
Per-user OAuth providers	13
Docs	30,000+ chars across EN + JA
Dev time	~5 weeks (nights and weekends)

Next 3 months

v0.2: First-class TiDB Vector / pgvector backends (currently via mem0 wrap)
v0.2: localhost loopback OAuth (drop the praxia serve requirement)
v0.3: Multi-tenant org features (the Open Core entry point)
Docs: expanded English tutorials
Community: Discord, more active Discussions

For anyone in the same spot

Three things I felt after sprinting for 5 weeks, for anyone debating whether to start a solo OSS project:

Ship frequency over polish. Shipping v0.1.0 teaches you far more than polishing v0.0.4 forever.
State your differentiator in one line. For Praxia: "personal → org memory auto-promotion." If you can't say it, you can't write the post, can't pitch it, and PRs won't come.
The real competitor for OSS is the commercial SaaS solving the same problem — not LangChain or CrewAI, but the paywall on paid agent platforms. Bundling those features under Apache-2.0 is itself the differentiation.

Closing

⭐ Stars / 🍴 Forks / Issues / PRs all welcome.
github.com/praxia-dev/praxia

If you liked this, the 60-second demo is here: https://youtu.be/o_6NbjJU1AA

Connecting "individual brilliance × organizational continuity" with AI — that's the mission Praxia started with this spring.

Next in this series: the enterprise-platform features I put directly in the OSS core (SSO, RBAC, audit logs, KMS-envelope-encrypted OAuth tokens) and why none of them are behind an Enterprise tier.
