gen99

Posted on Jun 23

I built a local-first AI desktop where chat turns into cron jobs, parallel batches, and editable .pptx

#ai #llm #opensource #productivity

TL;DR

Praxia Desktop is what I wanted when I got tired of:

Pasting "do this every Monday" into ChatGPT and then forgetting to actually do it every Monday
Manually running the same prompt against 50 PDFs because nothing scaled my one-PDF chat workflow
Asking an LLM for "a deck" and getting markdown back when I needed an actual editable PowerPoint to ship to a stakeholder

So I built a desktop app where chat is the only interface, and the agent on the other side:

Schedules itself — say "every Monday at 9 AM summarize the diffs in Documents/" and a POSIX cron job appears
Fans out in parallel — say "for each of these 50 PDFs, extract the action items" and 50 agents run concurrently with live progress
Writes native, editable files — say "draft a Q3 retrospective deck" and a real .pptx lands in your workspace (charts, takeaways, the lot — not screenshots, not Markdown)

Everything runs locally — your files and chat history live in your Praxia folder, not on a server I control. LLM calls go directly from your machine to whichever provider you pick (OpenAI, Anthropic, Azure, Gemini, or your own Ollama / LM Studio instance). I never see your prompts.

Free, open source (Apache 2.0), and available for Windows 10/11 x64.

The rest of this post walks through each of those three things in detail — what it looks like, why it matters, how it's built, and what's interesting about the implementation.

Thing 1: "Schedule this for me, weekly"

Cron is one of the great computing primitives and basically nobody outside of sysadmins uses it directly. The interface is hostile. The mental model ("five fields separated by spaces representing minute / hour / day / month / weekday, all of which can be * or 1-5 or */15 …") is something I always have to re-derive from crontab -e.

I wanted: I say it, it gets scheduled.

In Praxia, this works:

me: every Monday at 9 AM, summarize what changed in Documents/team-notes/
    since the previous Monday, and write the summary to
    workspace/weekly-team-summary.md

The agent on the other side reads that, infers:

This is a recurring task (not a one-shot)
The schedule is 0 9 * * 1 in POSIX cron
The action involves a diff over a watched folder + a write to the workspace
The write needs a user approval gate at execution time, not at schedule time
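
To make the cron expression above concrete, here is a minimal sketch of how 0 9 * * 1 resolves to a next run time. It uses only the standard library. The names expand_field and next_run are mine, chosen for illustration, and are not Praxia's scheduler API; the sketch also assumes plain numeric fields (no month or weekday names).

```python
from datetime import datetime, timedelta

# Allowed value range per field: minute, hour, day of month, month, weekday (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set:
    """Expand one cron field ('*', '5', '1-5', '*/15', '1,3,5') into a set of ints."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            a, b = base.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step_n))
    return values


def next_run(expr: str, after: datetime) -> datetime:
    """Return the first minute strictly after `after` that matches `expr`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    minute, hour, dom, month, dow = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    dom_any, dow_any = fields[2] == "*", fields[4] == "*"
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # brute-force scan, at most one year of minutes
        cron_dow = candidate.isoweekday() % 7  # Python: Monday=1..Sunday=7 -> cron: Sunday=0
        if dom_any or dow_any:
            day_ok = candidate.day in dom and cron_dow in dow
        else:
            # POSIX rule: when both day fields are restricted, either may match.
            day_ok = candidate.day in dom or cron_dow in dow
        if (candidate.minute in minute and candidate.hour in hour
                and candidate.month in month and day_ok):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("no matching time within a year")


# Sunday 2024-06-23 at noon -> the following Monday at 09:00.
print(next_run("0 9 * * 1", datetime(2024, 6, 23, 12, 0)))
```

The "every weekday morning" variant mentioned later would be 0 9 * * 1-5 under the same rules.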

A schedule gets registered. It lives in the Schedules tab.

You can inspect the cron expression, the prompt, the next run time, the last run result. You can pause it, edit the cron, edit the prompt, delete it. The same agent that registered it can also modify it — "actually, change that to every weekday morning."

This isn't "natural language cron parsing as a feature." That's a regex with extra steps. What makes it useful is that the execution context at schedule-fire time is the same one you'd get from a fresh chat: same memory, same connectors, same approval-gate semantics, same provider config.

Why it matters: lots of LLM tools can describe a workflow. Few can execute one on a schedule with all the actual side effects (file writes, API calls) properly gated and observable.

Thing 2: Fan out a chat across 50 files

The first time I tried to use an LLM at non-trivial scale was when I had 50 candidate résumés to triage. ChatGPT was great for one résumé. For 50, my options were:

Open 50 tabs (this is what I actually did the first time, regrettably)
Write a Python script (the right answer, but I was on a deadline)
Use a SaaS that does parallel agent runs (paid, locks me in)

Praxia's answer:

me: for each PDF in Documents/candidates/, extract strengths, weaknesses,
    tech stack, and a recommended team. Put each candidate's analysis in
    workspace/candidates/<name>.md and write a comparison matrix to
    workspace/candidates/MATRIX.md.

The agent:

Lists the PDFs in the folder
Spawns N parallel sub-agents (where N respects the rate limits of whatever LLM provider you've configured — OpenAI's TPM/RPM, Anthropic's per-minute budget, etc.)
Watches them concurrently in the Batches tab
Collects results, writes per-candidate analyses, builds the comparison matrix
Surfaces failures individually so you can retry the 3 that timed out without re-running the 47 that succeeded

The user-facing UX is "I asked the agent to do a thing and watched it happen." The agent-facing UX is a fan_out tool that takes a list of input items and a sub-prompt template.
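
As a rough illustration of what a fan_out-style tool could look like, here is a minimal asyncio sketch. Only the name fan_out and the idea of "a list of input items plus a sub-prompt template" come from the description above; the signature, the ItemResult type, the run_one callable, and the {item} placeholder are assumptions of mine, not Praxia's actual interface.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ItemResult:
    item: str
    output: Optional[str] = None
    error: Optional[str] = None  # set instead of output when the sub-agent failed


async def fan_out(
    items: List[str],
    template: str,
    run_one: Callable[[str], Awaitable[str]],  # hypothetical: runs one sub-agent prompt
    max_concurrency: int = 8,
) -> List[ItemResult]:
    """Run one sub-prompt per item with bounded concurrency, isolating failures."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(item: str) -> ItemResult:
        async with semaphore:  # at most max_concurrency prompts in flight
            try:
                output = await run_one(template.format(item=item))
                return ItemResult(item, output=output)
            except Exception as exc:  # keep one failure from sinking the whole batch
                return ItemResult(item, error=repr(exc))

    # gather preserves input order, so results line up with items.
    return await asyncio.gather(*(worker(item) for item in items))
```

Because each failure is captured per item, retrying only the failed ones is a matter of calling fan_out again with [r.item for r in results if r.error].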

Why it matters: parallel execution is the gap between "useful for one document" and "useful for my actual workload." Once you have it, you find use cases everywhere — translating a backlog of strings, classifying support tickets, extracting fields from a stack of invoices, comparing N proposals against a rubric.

Why it's hard to do well: rate-limit awareness. Praxia configures concurrency based on the active LLM provider's published limits, so you don't have to remember that GPT-4o-mini's TPM ceiling is different from Claude Sonnet's, or that your own Ollama instance is bottlenecked on local GPU.
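
One way to turn a provider's limits into a concurrency number is a back-of-the-envelope calculation: sustainable requests per minute multiplied by average request latency gives the number of requests that can be in flight at once. The sketch below is my own illustration of that arithmetic, not Praxia's implementation, and the numbers in the example are made up rather than any provider's real limits.

```python
import math


def safe_concurrency(
    rpm: float,                 # requests-per-minute limit
    tpm: float,                 # tokens-per-minute limit
    tokens_per_request: float,  # estimated prompt + completion tokens per item
    avg_latency_s: float,       # estimated seconds per request
    hard_cap: int = 32,         # upper bound regardless of limits
) -> int:
    """Estimate how many requests can be in flight without exceeding either limit."""
    # Whichever limit is tighter decides the sustainable request rate.
    requests_per_min = min(rpm, tpm / tokens_per_request)
    # Rate (per second) times latency = average number of requests in flight.
    in_flight = requests_per_min * avg_latency_s / 60.0
    return max(1, min(hard_cap, math.floor(in_flight)))


# Hypothetical limits: the token budget (50 requests/min) is the bottleneck here.
print(safe_concurrency(rpm=500, tpm=200_000, tokens_per_request=4_000, avg_latency_s=12))
```

For a local model the same function still applies if you treat measured throughput as the "limit", although in practice a single GPU often means a concurrency of 1.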

Thing 3: Native, editable `.pptx` from chat

This is the one I'm most proud of and it's the most technically interesting bit.

When LLMs "generate slides," what you typically get is Markdown, or an HTML preview, or a screenshot, or — at best — a Reveal.js page. What you almost never get is a real .pptx file you can open in PowerPoint and edit.

Praxia does the real thing:

me: draft a Q3 retrospective deck from Documents/sales/. Three charts,
    one summary slide, one next-actions slide. Corporate colors are
    navy and white.

What happens under the hood:

Plan: the LLM drafts an outline of 5 slides, each with a title, content type (bullets / chart / text), and source citations from the indexed sales folder.
Code-gen: the LLM writes Python code that uses python-pptx to construct the deck. Not Markdown that gets converted. Actual python-pptx calls: shapes.add_chart(), shapes.add_textbox(), text_frame.paragraphs[0].font.color.rgb = RGBColor(0x1f, 0x2a, 0x5e).
Render: the bundled Python sidecar executes that code. A .pptx file lands on disk.
Vision review: the deck is converted to PNGs (one per slide) and sent to a vision-capable LLM (GPT-4o, Claude 3.5 Sonnet, etc.) with the prompt: "Look at these slides. Are titles legible? Do text boxes fit? Are colors consistent? Are charts misaligned?"
Iterate: if the vision pass flags issues ("slide 3 title overflows", "chart 2 has overlapping labels"), the LLM regenerates the offending parts and the vision review runs again. Usually one or two iterations are enough.
Approve: Praxia surfaces the final deck in an approval dialog. You click Apply. The .pptx lands in your workspace folder.
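
The generate / render / review / iterate part of that pipeline is essentially a bounded feedback loop. Here is a minimal sketch of its control flow. The three callables are placeholders I made up to stand in for the LLM code-gen step, the sidecar render step, and the vision review step; none of them are Praxia's real function names, and the default of three rounds is an assumption.

```python
from typing import Callable, List, Tuple


def build_deck(
    generate_code: Callable[[List[str]], str],  # issues so far -> python-pptx source
    render: Callable[[str], str],               # source -> path of the rendered .pptx
    review: Callable[[str], List[str]],         # .pptx path -> issues found by vision pass
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Regenerate the deck until the review is clean or the round budget runs out."""
    issues: List[str] = []
    deck_path = ""
    for _ in range(max_rounds):
        code = generate_code(issues)  # previous round's issues steer the rewrite
        deck_path = render(code)
        issues = review(deck_path)
        if not issues:
            break  # clean review: hand the deck to the approval step
    # Any remaining issues are returned so the caller can show them to the user.
    return deck_path, issues
```

Capping the number of rounds matters: without it, a reviewer that keeps finding nitpicks would loop forever and burn tokens.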

You can open the result in PowerPoint and edit it normally. Text boxes are real text boxes. Charts are real charts (driven by embedded data, not pasted images). Color schemes are consistent because the vision pass explicitly checks for inconsistency.

Why this is hard: LLMs are bad at spatial reasoning. They cheerfully generate add_textbox(left=Inches(5), top=Inches(3), width=Inches(8), …) on a 10-inch-wide slide and don't notice the box runs off the edge. The vision-review loop catches this: the LLM can't "see" the slide it just generated through code alone, but it can see the rendered PNG. That second pass closes the loop.
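
The off-the-edge case in that example is simple arithmetic (5 + 8 = 13 inches on a 10-inch slide), so it can also be caught by a cheap static check before any rendering. The sketch below is my own illustration and is not described as part of Praxia; it complements a vision pass rather than replacing it, since it knows nothing about text overflow or overlapping labels. The Box type and the 7.5-inch default height (a 4:3 slide) are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """A shape's position and size, all in inches."""
    left: float
    top: float
    width: float
    height: float


def find_overflows(box: Box, slide_width: float = 10.0, slide_height: float = 7.5) -> List[str]:
    """Return human-readable problems for a box that leaves the slide area."""
    problems = []
    if box.left < 0 or box.top < 0:
        problems.append("box starts outside the slide")
    right = box.left + box.width
    if right > slide_width:
        problems.append(f"right edge at {right:g}in exceeds slide width {slide_width:g}in")
    bottom = box.top + box.height
    if bottom > slide_height:
        problems.append(f"bottom edge at {bottom:g}in exceeds slide height {slide_height:g}in")
    return problems


# The example from the text: an 8-inch-wide box starting 5 inches from the left.
print(find_overflows(Box(left=5, top=3, width=8, height=1)))
```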

Why it matters: the difference between "I have a Markdown outline of a deck" and "I have a polished PowerPoint I can send to my CFO" is roughly an hour of fiddly work, a gap that AI tools have historically not closed. This closes it.

The shape of the app

Three main tabs:

Chat — where you talk to the agent
Documents — folders Praxia watches and indexes (RAG-style retrieval, fully local)
Workspace — where Praxia writes files for you, all gated by an approval dialog

Every disk-touching action (file write, file delete, file overwrite) goes through a per-operation approval dialog. The agent never silently writes. If it tries to overwrite an existing file, you see the diff and decide.
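
To show the shape of such a gate, here is a minimal sketch of an approval-gated write using only the standard library. The function name propose_write and the approve callback (which stands in for the dialog) are hypothetical and not Praxia's API; the point is only that the diff is computed and shown before anything touches disk.

```python
import difflib
from pathlib import Path
from typing import Callable


def propose_write(path: Path, new_text: str, approve: Callable[[str], bool]) -> bool:
    """Show a unified diff to `approve`; write the file only if it returns True."""
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    diff = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{path} (current)",
            tofile=f"{path} (proposed)",
        )
    )
    if not approve(diff):  # the user said no: leave the file untouched
        return False
    path.write_text(new_text, encoding="utf-8")
    return True
```

In a desktop app the approve callback would open the dialog and block until the user clicks; in a test it can simply be lambda diff: True or lambda diff: False.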

This is one of the harder parts of building an agentic desktop app, and it's where I disagree most strongly with cloud-first AI tools. Trust comes from being able to say no. A model that occasionally hallucinates a wrong file path is fine if you have a dialog telling you what it's about to do. A model with the same hallucination rate writing silently to disk is a disaster.

Local-first, for real

Your documents and chat history live in ~/Praxia/ (or wherever you point it). I do not operate a backend that holds your data. When you send a chat message, Praxia routes it to whichever LLM provider you configured — OpenAI, Anthropic, Azure, Gemini, Ollama, LM Studio — and that provider sees the request directly from your machine.

If you want strict on-device operation: configure Ollama or LM Studio as your provider. No HTTPS calls leave your machine. The agent loop, retrieval, scheduling, batch fan-out, and .pptx rendering all happen locally.

This is a different stance from many "local AI" desktop apps, which still phone home for telemetry or model registry checks. Praxia does neither. The only outbound traffic is to the LLM provider you selected; if you selected a local one, there's no outbound traffic.

The interesting bits under the hood

Architecture (this is the only place I'll go nerdy):

Tauri 2 shell (Rust + Svelte 4 + WebView2)
        ↓ spawn / localhost HTTP
PyInstaller-frozen Python sidecar
  └─ FastAPI + litellm + chromadb + python-pptx + matplotlib + pypdfium2 + ...

The shell is a Tauri 2 app. The agent backend is a Python FastAPI server bundled as a single praxia-server.exe via PyInstaller, started as a child process at app launch. They talk over localhost HTTP.
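
The handshake between a shell and a localhost sidecar usually looks like "start the server, then poll until it answers". In Praxia the shell side is Rust and the server is FastAPI; the sketch below is a Python-only stand-in that uses the standard library's http.server in place of FastAPI so it runs without dependencies. The /health endpoint, the ephemeral port, and both function names are assumptions for illustration.

```python
import http.client
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":  # hypothetical readiness endpoint
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):  # keep the example quiet
        pass


def start_sidecar() -> ThreadingHTTPServer:
    """Bind to loopback only, on an OS-assigned free port, and serve in the background."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def wait_until_healthy(port: int, timeout_s: float = 5.0) -> bool:
    """Poll the health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=1)
        try:
            conn.request("GET", "/health")
            if conn.getresponse().status == 200:
                return True
        except (OSError, http.client.HTTPException):
            pass  # server not accepting connections yet
        finally:
            conn.close()
        time.sleep(0.1)
    return False
```

Binding to 127.0.0.1 rather than 0.0.0.0 keeps the sidecar unreachable from other machines on the network, which matters for an app that reads local files.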

Why this shape:

The same Python code path runs as pip install praxia for CLI users AND as the desktop sidecar. One codebase, two distribution channels.
WebView2 relies on the system's Edge runtime rather than bundling a full copy of Chromium the way Electron does, which keeps the installer much smaller.
The Rust shell handles the system-integration parts (file dialogs, OAuth callbacks, OS notifications) where Tauri 2's plugin ecosystem already has the right abstractions.

A few specific things I think are worth flagging:

5-layer memory stack. Personal memory (auto-extracted from chats) → Sleep-time consolidation → Shared org memory → Frozen Markdown layer (git-managed) → optional graph layer. Three independent promotion paths (frequency / outcome / self-eval) decide which personal observations get elevated to organizational knowledge. Most agent platforms paywall this. It ships in Praxia's open-source release.

Verifier loop. A CommandedAgent mode wraps the free-running agent loop with pre-retrieval + grounding verification + bounded retry + an explicit abstain path. Calibrated against an in-house multi-hop RAG harness. The difference between this and the un-verified AutonomousAgent mode is roughly 20 points of factual accuracy on private-corpus QA — at the cost of slower responses.
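
The four ingredients named above (pre-retrieval, grounding verification, bounded retry, explicit abstain) compose into a small control loop. The sketch below shows only that control flow under my own assumptions: retrieve, generate, and is_grounded are placeholder callables, the abstain message is invented, and nothing here is taken from the real CommandedAgent code.

```python
from typing import Callable, List

ABSTAIN = "I can't answer that from the available documents."


def answer_with_verification(
    question: str,
    retrieve: Callable[[str], List[str]],            # question -> supporting passages
    generate: Callable[[str, List[str], int], str],  # (question, passages, attempt) -> draft
    is_grounded: Callable[[str, List[str]], bool],   # does the draft follow from the passages?
    max_attempts: int = 3,
) -> str:
    """Return a verified answer, or abstain when none can be grounded."""
    passages = retrieve(question)  # pre-retrieval: fetch evidence before generating
    if not passages:
        return ABSTAIN  # nothing to ground an answer in
    for attempt in range(max_attempts):  # bounded retry
        draft = generate(question, passages, attempt)
        if is_grounded(draft, passages):  # grounding verification
            return draft
    return ABSTAIN  # explicit abstain path instead of an unverified guess
```

The extra generate and verify calls are where the slower responses come from: a verified answer costs at least two model calls instead of one.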

Per-user OAuth across 20+ SaaS connectors. When Alice connects Notion to Praxia, Praxia stores Alice's OAuth token and queries Notion as Alice. Bob's Notion view is different because Bob's OAuth token is different. This sounds obvious, but many AI tools use a single service account, which can leak data across users.

MCP support, both stdio and HTTP/SSE. Any Model Context Protocol server you wrote for Claude Desktop or Cursor works in Praxia unchanged.

If any of those are interesting, the GitHub repo has architecture docs and the actual implementation. It's all Python with type hints, formatted with Ruff, ~780 tests passing.

Bonus: how it got onto the Microsoft Store

Short version: I wrapped the Tauri 2 build in MSIX, declared runFullTrust with a five-bullet justification ("spawn Python sidecar, read user folders, localhost loopback, outbound HTTPS to user-configured LLM API, native document generation"), and submitted to Partner Center. Cert passed on the first try in 4 days. Microsoft re-signs the MSIX with the Store identity, so SmartScreen no longer flags installation.

Try it

Microsoft Store: https://apps.microsoft.com/detail/9P9LSR34HZF3 (Windows 10/11, free)
GitHub (Apache 2.0): https://github.com/praxia-dev/praxia
PyPI (CLI version): pip install praxia
4-minute demo: https://youtu.be/Z3DFa2saHJg
Website: https://praxia.tools/
Discussions (GitHub): https://github.com/praxia-dev/praxia/discussions

If you build something interesting on top of it, please drop a note in Discussions or ping me on X at @praxia_dev. I want to know what people make.
