DEV Community: Hugo Kuznicki

Risk Rules Enforced in Code, Not Vibes: Building a Trading Engine With a Kill Switch

Hugo Kuznicki — Mon, 13 Jul 2026 16:00:08 +0000

Most trading automation fails in the same place: the risk rules live in the trader's head, not in the code. "I'll cut it if it moves against me." "I won't hold more than a few positions." Under pressure, those intentions bend. So I built a trading engine where the risk rules are enforced by the software and there's no polite way around them.

It's called Apex Wallet. Here's the design idea.

Rules are code, not suggestions

Every incoming signal is checked against hard limits before anything executes:

Max positions — the engine refuses to open beyond the cap.
Portfolio-heat cap — total risk exposure is bounded; a trade that would breach it is rejected.
Instant kill switch — one action halts execution entirely.

The point is that these aren't dashboards you're supposed to watch. They're gates in the execution path. A trade that violates a limit doesn't get a warning — it doesn't happen.
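To make the "gate, not dashboard" idea concrete, here is a minimal sketch of what such a pre-trade check could look like. It is not Apex Wallet's actual code: the `Signal` and `RiskGate` names, the default limits, and the definition of heat as the summed per-position risk fraction are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    symbol: str
    risk: float  # assumed: fraction of equity lost if the stop is hit


@dataclass
class RiskGate:
    max_positions: int = 5   # hypothetical cap
    max_heat: float = 0.06   # hypothetical cap on summed open risk
    killed: bool = False
    open_risk: dict = field(default_factory=dict)  # symbol -> risk

    def kill(self) -> None:
        """Halt all further execution."""
        self.killed = True

    def check(self, sig: Signal) -> tuple[bool, str]:
        """Return (allowed, reason) without executing anything."""
        if self.killed:
            return False, "kill switch engaged"
        if len(self.open_risk) >= self.max_positions:
            return False, "max positions reached"
        if sum(self.open_risk.values()) + sig.risk > self.max_heat:
            return False, "portfolio heat cap breached"
        return True, "ok"

    def submit(self, sig: Signal, execute) -> tuple[bool, str]:
        """The only path to execution: the order runs only if every check passes."""
        allowed, reason = self.check(sig)
        if allowed:
            execute(sig)
            self.open_risk[sig.symbol] = sig.risk
        return allowed, reason
```

Because `execute` is only ever called from inside `submit`, a rejected signal never reaches the broker. It is not warned about; it simply does not run.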

Pluggable brokers: paper-to-live is a config change

Execution goes through a broker abstraction with interchangeable backends — yfinance, ccxt, Alpaca. Paper trading and live trading are the same code path with a different adapter, so moving from simulation to real money is a configuration change, not a rewrite. That matters because the risk logic you tested in paper is literally the same logic running live.
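A rough sketch of how a config-selected adapter might be wired, assuming a single `place_order` method. The `Broker`, `PaperBroker`, `LiveBroker`, and `make_broker` names are hypothetical, and the live adapter is deliberately left as a stub rather than guessing at any real broker client's API.

```python
from typing import Protocol


class Broker(Protocol):
    def place_order(self, symbol: str, qty: float) -> str: ...


class PaperBroker:
    """Simulated fills kept in memory; no network calls."""

    def __init__(self) -> None:
        self.fills: list[tuple[str, float]] = []

    def place_order(self, symbol: str, qty: float) -> str:
        self.fills.append((symbol, qty))
        return f"paper-{len(self.fills)}"


class LiveBroker:
    """Stub for a real adapter; the actual client calls would go here."""

    def place_order(self, symbol: str, qty: float) -> str:
        raise NotImplementedError("wire a real broker client in here")


ADAPTERS = {"paper": PaperBroker, "live": LiveBroker}


def make_broker(config: dict) -> Broker:
    # The config key is the only thing that differs between simulation and real money.
    return ADAPTERS[config["broker"]]()
```

Everything upstream of `make_broker` (signals, risk checks, journaling) only sees the `Broker` interface, which is what keeps the tested paper path and the live path identical.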

Auditability by default

Every decision is journaled. When something surprising happens, you can reconstruct exactly what the engine saw and why it acted — which signal came in, which checks passed, what executed. A trading system you can't audit is a trading system you can't trust.
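One simple way to get that kind of audit trail is an append-only file with one JSON object per decision. The sketch below assumes that layout; the field names and the `journal_entry` / `append_journal` helpers are illustrative, not the engine's real schema.

```python
import json
import time


def journal_entry(signal: dict, checks: dict, action: str) -> str:
    """Serialize one decision (what came in, which checks ran, what happened) as a JSON line."""
    record = {
        "ts": time.time(),
        "signal": signal,
        "checks": checks,
        "action": action,
    }
    return json.dumps(record, sort_keys=True)


def append_journal(path: str, line: str) -> None:
    """Append-only write, so earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

Reconstructing an incident then means reading the file top to bottom and filtering by symbol or timestamp.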

There's also a CLI and a local browser dashboard showing live positions and portfolio heat, so the state of the system is always visible.

The architecture lesson (beyond trading)

This pattern generalizes past finance: when a constraint really matters, encode it as a gate in the execution path, not as a guideline in a runbook. Separate the thing that decides (signals) from the thing that enforces (risk checks) from the thing that acts (brokers). Each is testable in isolation, and the enforcement layer can't be skipped because it sits between decision and action.

Takeaway

Automation you can trust comes from making the important rules unbreakable in code, keeping the risky boundary (paper vs. live) a config flag over identical logic, and journaling everything so it's auditable. That's the pattern — the trading part is just the example.

If you want me to go deeper on the portfolio-heat calculation or the broker abstraction, let me know in the comments.

Ranking 80 Tickers in ~1 Second With Only Free Data

Hugo Kuznicki — Sun, 12 Jul 2026 16:00:05 +0000

Finding the handful of names worth looking at on any given day usually means one of two things: paying for a screener subscription, or clicking through dozens of charts by hand. I didn't want either, so I built a local scanner that does it automatically — and it runs on free data with nothing to subscribe to.

The idea

Pull price and volume history for an 80-ticker universe, run each name through the technical signals that actually matter, and roll them into a single ranked list so the interesting setups float to the top.

The signals

Each ticker gets evaluated for:

Momentum and recent trend
Volume spikes vs. its own average
Golden / death crosses (moving-average crossovers)
RSI conditions
Proximity to 52-week high / low
Breakout setups

Instead of leaving you to read six indicators per name, the scanner rolls them up into one composite score per ticker. You get a ranked shortlist, not a spreadsheet you have to interpret.
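For a sense of what a few of these signals involve, here is a dependency-free sketch of three of them. The window lengths and the 2x volume threshold are common defaults picked for illustration, and the RSI uses plain averages of gains and losses rather than Wilder's smoothing, so treat these as assumptions rather than the scanner's real settings.

```python
def sma(values, window):
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window


def golden_cross(closes, fast=50, slow=200):
    """True when the fast average moved above the slow one on the latest bar."""
    if len(closes) < slow + 1:
        return False
    prev = closes[:-1]
    return sma(prev, fast) <= sma(prev, slow) and sma(closes, fast) > sma(closes, slow)


def rsi(closes, period=14):
    """RSI from simple averages of gains and losses over the last `period` bars."""
    window = closes[-(period + 1):]
    deltas = [b - a for a, b in zip(window, window[1:])]
    gains = sum(d for d in deltas if d > 0) / period
    losses = sum(-d for d in deltas if d < 0) / period
    if losses == 0:
        return 100.0
    return 100 - 100 / (1 + gains / losses)


def volume_spike(volumes, window=20, factor=2.0):
    """True when the latest volume exceeds `factor` times the prior average."""
    prior = volumes[-window - 1:-1]
    return volumes[-1] > factor * (sum(prior) / len(prior))
```

A death cross is the same check with the comparison flipped.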

The stack

Python, Flask, and yfinance for free OHLCV data — no paid feed. A signal-processing layer computes the indicators and combines them into the composite ranking. The front end is a sortable dashboard: every column sorts, inline sparklines show recent price action, filters narrow by signal type, and auto-refresh keeps the board current. Click any ticker for a detail chart.

The part I care about: speed

It processes the full 80-ticker universe in about one second. That number matters more than it looks. A scanner you have to wait on is a scanner you stop opening. When the broad universe collapses into a ranked shortlist almost instantly, checking the market becomes a glance instead of a chore.

What makes it reusable

The signal logic is modular. The universe, the indicators, and the weighting are all things you can change — so your edge (whatever thresholds and signals you actually trust) becomes the thing the scanner optimizes for. It replaces a paid screener and a lot of manual chart-flipping with one local dashboard at zero ongoing cost.
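As a sketch of what "changeable weighting" could mean in practice: keep the weights in one dictionary and compute the composite as a weighted sum. The signal names and weight values below are made up for illustration, and each signal is assumed to be pre-scaled to the 0..1 range.

```python
# Hypothetical weights; the real scanner's weighting is not shown in this post.
WEIGHTS = {"momentum": 0.4, "volume_spike": 0.3, "golden_cross": 0.3}


def composite(signals: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of signal values; signals missing for a ticker count as 0."""
    return sum(w * float(signals.get(name, 0.0)) for name, w in weights.items())


def rank(universe: dict, weights: dict = WEIGHTS) -> list:
    """Return (ticker, score) pairs sorted best first."""
    scored = [(ticker, composite(sig, weights)) for ticker, sig in universe.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Swapping in a different `weights` dict re-ranks the same universe without touching the signal code, which is the property that makes the scanner tunable.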

Takeaway

You don't need a paid screener to turn a watchlist into a ranked shortlist. Free data plus a modular signal layer plus a fast dashboard gets you most of the way, and because the scanner runs locally there's no subscription quota in the way (the free data source can still throttle heavy polling, so keep the refresh interval reasonable).

If you'd want a breakdown of how the composite score is weighted, drop a comment — that's the piece people usually want to tune first.

I Built a $0 Local AI Automation Stack: Agentic Coding + Market Data Over MCP

Hugo Kuznicki — Sat, 11 Jul 2026 23:00:45 +0000

Serious AI-assisted development has a habit of turning into a stack of subscriptions: a coding-assistant plan, a market-data API, a backtesting service, model credits. Each one is reasonable on its own, and together they quietly become a monthly bill that also locks you into someone else's rate limits.

I wanted to know how far you can get without any of that. The answer turned out to be: surprisingly far. Here's the stack I ended up with, running at $0 recurring cost and scripted so it rebuilds from scratch.

The goal

A capable, agentic AI workflow — coding and quantitative research — running entirely on free and local infrastructure, and reproducible so it isn't a one-off config living on a single machine.

What I wired together

Agentic coding runs through Aider, configured to talk to local Ollama models by default and fall back to the free tiers on Groq and OpenRouter when a task wants more horsepower. Same agent, same workflow, whether it's fully offline or tapping a free hosted model.
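The routing idea itself is small enough to sketch. This is not how Aider implements model selection; it is a generic illustration of "local first, hosted fallback", with hypothetical backend names and a `heavy` flag standing in for "this task wants more horsepower".

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical backend names and ordering, for illustration only.
LOCAL_FIRST = ["ollama", "groq", "openrouter"]
HOSTED_FIRST = ["groq", "openrouter", "ollama"]


def route(
    prompt: str,
    backends: Dict[str, Callable[[str], str]],
    heavy: bool = False,
) -> Tuple[str, str]:
    """Try backends in preference order; return (name, reply) from the first that works."""
    order: List[str] = HOSTED_FIRST if heavy else LOCAL_FIRST
    errors = []
    for name in order:
        call = backends.get(name)
        if call is None:
            continue  # backend not configured
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. server down or free tier rate-limited
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no backend answered: " + "; ".join(errors))
```

Each entry in `backends` is just a function from prompt to reply, so the same loop works whether the callable wraps a localhost request or a hosted free tier.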

Market data and research run on OpenBB, with MCP servers set up so the AI agent can drive market-data queries directly. This is the part that changed how it feels to work: the agent doesn't just suggest code that would fetch data — it calls a structured tool and pulls the actual data. VectorBT handles fast, vectorized backtesting inside the same environment.

The whole thing is documented and scripted, so it rebuilds rather than rots.

Why MCP is the unlock

The difference between "AI that writes code about your data" and "AI that operates on your data" is a tool interface. MCP (the Model Context Protocol) gives agents a structured way to call real tools — market data, internal APIs, a database — instead of hallucinating what the response might look like. Once the OpenBB layer was exposed over MCP, the agent could answer research questions by actually running the query.
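Under the hood, an MCP tool invocation is a JSON-RPC 2.0 request using the `tools/call` method, with the tool name and its arguments in `params`. The sketch below only builds that message; the `get_price_history` tool name and its arguments are hypothetical and not taken from any real OpenBB server.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry a unique id so replies can be matched


def tool_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
request = tool_call("get_price_history", {"symbol": "AAPL", "period": "1y"})
```

The agent never has to guess the shape of the response: the server runs the tool and returns structured content in the matching JSON-RPC result.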

The tradeoffs (honest version)

Local models aren't frontier models. For heavy reasoning I still route to a free hosted tier, and there are days the free tiers are rate-limited. The win isn't "local is as good as GPT-class" — it's that local-first with free-tier fallback covers the large majority of real work at zero marginal cost, and keeps your code and data on your machine.

Takeaway

If you want agentic tooling without committing to a stack of subscriptions — or you want agents that can operate tools rather than just generate text — local-first routing plus MCP is a genuinely practical pattern in 2026. You get modern capability with the cost and privacy under your control.

I'm writing up each piece of this stack in more detail. If there's a part you want to see first — the Aider/Ollama routing, the OpenBB MCP server, or the VectorBT backtests — say so in the comments.

How I Run My Content Tooling on a Local Model for $0

Hugo Kuznicki — Sun, 28 Jun 2026 04:58:53 +0000

A few months ago I added up what I was spending on AI APIs just to draft social posts. It wasn't a lot — a few dollars here, a few there — but it was a recurring cost for something I do every single day. And every time I wanted to experiment, regenerate, or tweak a prompt, a little meter ticked in the back of my head telling me to stop wasting tokens.

So I moved the whole thing local. No API keys, no per-token billing, nothing leaving my machine. Here's exactly how, including the parts that aren't as clean as the pitch.

Why local at all?

Three reasons, in order of how much they actually mattered to me:

Cost goes to zero. Not "cheaper" — zero. Once the model is on your disk, generating a thousand drafts costs the same as generating one.
Iteration becomes free, which changes your behavior. This is the part nobody tells you. When each generation is metered, you ration attempts. When it's free, you regenerate aggressively — and the output gets better because you stop being precious about it.
Privacy by default. My prompts, drafts, and half-baked ideas never touch a third-party server. For content I haven't published yet, that's a real comfort.

The setup: Ollama in five minutes

Ollama is the easiest way to run an LLM locally. Install it, pull a model, and you've got an HTTP server on localhost that speaks a simple API.

# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an instruct-tuned model
ollama pull llama3.1:8b

# It's now serving on http://localhost:11434

That's the entire infrastructure. No account, no key, no dashboard. The model runs as a local service and you talk to it over HTTP like any other API — except this one is on your machine and free.

The pipeline

My content workflow is deliberately boring: one topic in, a batch of platform-specific posts out. The whole thing is a thin layer around three ideas — a per-platform prompt template, a call to the local model, and a tiny bit of cleanup.

Here's the core call. Ollama exposes a /api/generate endpoint:

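import requests

def generate(prompt, model="llama3.1:8b"):
    # stream=False returns one JSON object instead of a stream of chunks.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,  # the first call after a cold start can be slow
    )
    resp.raise_for_status()  # fail loudly if the server or model is unavailable
    return resp.json()["response"].strip()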

No SDK, no auth header, no OPENAI_API_KEY in your environment. It's just a POST to localhost.

The interesting part is the templating. Each platform gets its own prompt with its own constraints baked in:

TEMPLATES = {
    "twitter": (
        "Write 3 punchy tweet hooks about: {topic}\n"
        "Rules: under 280 chars, no hashtags, no emoji spam, "
        "lead with the most surprising angle."
    ),
    "linkedin": (
        "Write a short LinkedIn post about: {topic}\n"
        "Rules: 1 strong opening line, 3 short paragraphs, "
        "a question at the end. Plain language, no buzzwords."
    ),
    "thread": (
        "Outline a 5-tweet thread about: {topic}\n"
        "Each tweet on its own line, numbered, each able to stand alone."
    ),
}

def run(topic, platforms):
    out = {}
    for p in platforms:
        prompt = TEMPLATES[p].format(topic=topic)
        out[p] = generate(prompt)
    return out

Call run("local LLMs for content", ["twitter", "linkedin", "thread"]) and you get a dict of drafts back, generated entirely on your own hardware, for nothing.

The real product wraps this with a UI, a platform picker, and output cleanup — but the engine is genuinely this small. That's the point. Most of the value isn't in the model; it's in the templates that constrain the model into something usable.
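The cleanup step can be equally small. Here is one possible version, assuming the model returns one option per line with list markers or stray quotes attached; `clean_options` is a hypothetical helper, not the one in Content Studio.

```python
import re


def clean_options(raw: str, limit: int = 280) -> list[str]:
    """Split a reply into options, strip list markers and quotes, drop over-limit lines."""
    options = []
    for line in raw.splitlines():
        # Remove leading "1.", "2)", "-", "*" or bullet markers.
        text = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip().strip('"')
        if text and len(text) <= limit:
            options.append(text)
    return options
```

It does not remove chatty preambles such as "Here are 3 hooks:", so either filter those separately or tell the model in the template to output only the options.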

The thing that actually makes it good: tight prompts

Smaller local models are less forgiving than a frontier API. A vague prompt to a GPT-class hosted model still produces something passable. A vague prompt to an 8B local model produces mush. So the work shifts from "pay for a smarter model" to "write a sharper prompt."

Concretely, what moved quality the most:

Bake the constraints into the template, not the topic. Character limits, tone, structure — put them in the reusable template so every generation inherits them.
Ask for multiple options. "Write 3 hooks" beats "write a hook" — you pick the best and the model explores more of the space.
Keep a Modelfile for a custom system prompt if you find yourself repeating instructions:

FROM llama3.1:8b
SYSTEM """You are a concise copywriter. No clichés, no 'in today's
fast-paced world', no emoji unless asked. Plain, specific language."""

ollama create copywriter -f Modelfile

Now copywriter carries that voice everywhere and your per-call prompts get shorter.

The honest tradeoffs

I'm not going to pretend local is strictly better. It isn't.

Long-form coherence is weaker. For short-form (hooks, captions, threads) local models are great. For a 2,000-word essay that needs to hold an argument, a frontier API still wins. Know which job you're doing.
Cold-start latency is real. The first request after the model unloads is slow. Keep it warm if you generate in bursts (ollama run in the background, or a keepalive ping).
You own the ops. No hosted API means no one else patches, scales, or babysits it. For a personal tool that's fine; for a product serving others it's a real consideration.
Hardware matters. An 8B model is comfortable on a modern laptop. Bigger models want more RAM/VRAM. Match the model to your machine instead of reaching for the biggest one.
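For the cold-start point, one option is a warm-up request. Ollama's generate endpoint loads a model when it receives a request with no prompt, and its `keep_alive` field controls how long the model stays in memory afterwards. The sketch below uses only the standard library; the 30-minute value and the helper names are arbitrary choices for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def warm_request(model: str = "llama3.1:8b", keep_alive: str = "30m") -> urllib.request.Request:
    """Build a prompt-less request that loads the model and sets how long it stays resident."""
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def warm(model: str = "llama3.1:8b", keep_alive: str = "30m", timeout: float = 120.0) -> None:
    """Send the warm-up request; requires a running Ollama server."""
    with urllib.request.urlopen(warm_request(model, keep_alive), timeout=timeout) as resp:
        resp.read()
```

Calling `warm()` at the start of a writing session means the first real generation does not pay the load cost.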

The trade I'm making — slightly less polish in exchange for $0 cost, full privacy, and unlimited iteration — is overwhelmingly worth it for high-frequency, templated work. That's most of what content generation actually is.

Wrapping up

The headline isn't "local models are magic." It's that for the specific job of churning out daily, templated content, the economics and the workflow both flip in local's favor — and the setup is genuinely a five-minute Ollama install plus a few prompt templates.

I packaged my own version of this into a small tool called Content Studio (idea → batch of posts, runs fully local, $0 to run) if you'd rather not wire it up yourself — it's on Gumroad and the open-source pieces live on my GitHub. And if you want the longer build-in-public breakdowns, I write them up in my newsletter.

But honestly — even if you build your own from the snippets above, do it. Watching your API bill hit $0 while your output goes up is a weirdly satisfying way to start a week.