DEV Community: RoTSL

Bypassing the OS to Run LLMs: What I Learned Building a Firmware-Centric Runtime

RoTSL — Sat, 20 Jun 2026 07:42:19 +0000

I spent the last few months asking a question that sounds slightly unhinged: what happens if you strip the operating system out of the LLM inference loop?

Not metaphorically. I mean literally. Remove the Linux page cache, the IOMMU, the CUDA Runtime wrappers, the framework dispatchers. Demote the host OS to an interrupt router. Let the GPU handle its own memory faults, its own scheduling, its own DMA transfers.

The result is NexusRT – an alpha-stage firmware-equivalent runtime that sits below PyTorch, TensorFlow, and JAX, talking directly to the CUDA Driver API and (on Apple Silicon) the Metal API. It is not a framework. It is a runtime that asks, “how much latency can you actually remove when the LLM pipeline owns the hardware?”

This is what I found.

The Stack Is Thicker Than You Think

Most of us do not think about the layers between our Python code and the GPU silicon. We write model.forward(), PyTorch dispatches to CUDA, and somewhere in a dark room in Santa Clara, a tensor core lights up.

But the stack is deep:

┌────────────────────────────────────┐
│. Application (PyTorch / TF / JAX)  │
├────────────────────────────────────┤
│. Framework dispatcher, autograd.   │
├────────────────────────────────────┤
│. CUDA Runtime (cudart)             │
├────────────────────────────────────┤
│. CUDA Driver (libcuda)             │
├────────────────────────────────────┤
│. Host OS (page cache, IOMMU, IRQ)  │
├────────────────────────────────────┤
│. GPU firmware (closed, NVIDIA-only)│
└────────────────────────────────────┘

Each layer adds latency. Each layer makes assumptions – about memory layout, about scheduling fairness, about who gets to touch the DMA engine. The OS, in particular, insists on being the memory authority. It wants to page things in and out. It wants to validate every IOMMU mapping. It wants to schedule interrupts.

For most workloads, this is fine. The overhead is noise. But for LLM inference at scale – where you are chasing microseconds on token generation, where KV-cache residency is the entire game – that noise compounds.

NexusRT collapses the stack to this:

┌────────────────────────────────────┐
│. Application (C ABI / Python)      │
├────────────────────────────────────┤
│. NexusRT micro-kernel.             │
│. • firmware-equivalent boot.       │
│. • GPU-driven virtual memory.      │
│. • warp-specialized task graph.    │
│. • GDS / GRDMA / TMA / ILC.        │
├────────────────────────────────────┤
│. CUDA Driver API / Metal API.      │
├────────────────────────────────────┤
│. GPU firmware (vendor)             │
└────────────────────────────────────┘

The host OS is still there. It just stopped being in charge.

What “Firmware-Equivalent” Actually Means

I want to be clear: I am not modifying NVIDIA firmware. That is impossible without signing keys I do not have and would not use if I did.

NexusRT implements a firmware-equivalent micro-kernel in user-space. It uses only the lowest publicly available CUDA Driver APIs – cuMemAddressReserve, cuMemMap, cuStreamCreateWithPriority, cuTensorMapEncodeTiled on Hopper, doorbell-style sync via cuStreamWaitValue32. The same APIs NVIDIA exposes to anyone who reads the driver documentation.

The trick is how you compose them. Instead of letting the OS manage virtual memory, NexusRT reserves GPU virtual address space and maps it directly. Instead of framework-managed DMA, it uses prioritized async streams with custom GDS-style paths. Page faults are handled by GPU-resident threads reading from a fault buffer in HBM – the DREAM approach, adapted for LLM workloads.

It is not black magic. It is just using the driver API the way the driver API was designed to be used, before frameworks added their comfort blankets on top.

The Hardware Reality

NexusRT is not a toy. It targets real silicon:

Target	Memory	TMA	ILC	Notes
NVIDIA A100 SXM	40/80 GB HBM2e	No	No	Async-copy + warp queues
NVIDIA H100 SXM	80 GB HBM3	Yes	Yes	TMA + Thread Block Clusters
NVIDIA T4 / P100	14–16 GB	No	No	Kaggle smoke validation
Apple M1 Pro	16–32 GB unified	n/a	n/a	Metal / MLX path

A100 and H100 are the research targets. T4 and P100 are what I can actually afford to test on – Kaggle gives me free GPU hours, and I use every one of them. The Apple path is there because I develop on an M1 Pro Mac, and unified memory on Apple Silicon is genuinely interesting for smaller models.

The Kaggle notebooks are real. They run. They build from source, link against the CUDA driver, and exercise the runtime on actual T4 hardware. The logs are in the repo. I am not projecting performance from a spreadsheet.

A Minimal Example

Here is what using NexusRT looks like from Python:

import nexusrt as nrt

# Initialize the firmware-equivalent layer
dev = nrt.firmware.init(profile="auto")

# Allocate HBM-resident tensor with GPU-driven virtual memory
buf = nrt.memory.alloc(shape=(4096, 4096), dtype="bf16", ilc=True)

# Build a pipeline stage contract
stage = nrt.scheduler.stage(
    name="infer.transformer_block_0",
    inputs=[buf],
    outputs=[],
    token_budget=4096,
    sm_footprint_mb=64,
)

nrt.scheduler.submit(stage)
nrt.scheduler.wait_barrier()

The Python layer is intentionally thin. It is a control plane. The C++ core owns the runtime – the memory management, the scheduling, the kernel submission. This is not a Python library that calls into CUDA via PyTorch. This is a C++ runtime that exposes a C ABI, with Python bindings for convenience.

Why This Exists

I am not trying to replace PyTorch. PyTorch is excellent. It has saved thousands of researchers from writing CUDA kernels by hand.

But there is a class of experiments where the question is not “which API do I call?” but “what happens if I remove the API entirely?” What is the actual floor on inference latency when you control scheduling, memory movement, token-cache residency, and GPU work submission directly?

NexusRT is for that class of experiments. It is a research runtime. It is alpha. It will break. The benchmarks on Kaggle T4 are modest because T4 is modest. The A100/H100 numbers are projections based on validated hardware profiles, not measured yet – because I do not have an A100 in my living room.

That is the honest state of it.

Where It Goes From Here

The repo is open. The code is MIT-licensed. The architecture docs are in docs/architecture.md. The research lineage – DREAM, KOKARYOKU, TrainMover – is documented in docs/research.md.

I am currently focused on:

Warp-specialized task graphs — getting the scheduler to reason about SM occupancy the way a firmware scheduler would, not a framework scheduler.
KV-cache pruning — integrating the ICM layered context work into the token optimization path.
TMA/ILC on Hopper — cuTensorMapEncodeTiled is available. Using it correctly is the hard part.
More Kaggle validation — because free GPU time is the only GPU time I have right now.

If you have an A100 or H100 and want to run the smoke tests, I would genuinely love to see the results. If you think this is a terrible idea, I would also love to hear why — the whole point is to test assumptions.

Try It

git clone https://github.com/rotsl/nexusrt.git
cd nexusrt

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip cmake ninja pytest

# CUDA path
cmake -S packaging -B packaging/build -GNinja \
  -DCMAKE_BUILD_TYPE=Release \
  -DNEXUSRT_ENABLE_CUDA=ON \
  -DNEXUSRT_BUILD_TESTS=ON

cmake --build packaging/build -j
nexusrt-bench --stage detect

Or grab a pre-built artifact from the GitHub Actions workflow. The CUDA artifact is disabled by default (it installs the toolkit during the run), but you can enable it if you need it.

Final Thought

I keep coming back to one thing: the gap between what the hardware can do and what the software stack lets you do is enormous. Frameworks are necessary. They make ML accessible. But they also make assumptions — about latency tolerance, about memory semantics, about who owns the DMA engine — that are not universal.

NexusRT is my attempt to see what lives in that gap. It is not a product. It is an experiment. But experiments are how we find out where the floor actually is.

If you are curious about the floor too, take a look.

MIT Licensed. Research attributions to DREAM, KOKARYOKU, and TrainMover documented in docs/research.md

Google I/O 2026 dropped a bomb on Android tooling, and nobody's talking about it (or maybe they are 😅)

RoTSL — Thu, 21 May 2026 12:34:48 +0000

This is a submission for the Google I/O Writing Challenge

Google I/O 2026 wrapped on May 20 at the Shoreline Amphitheater in Mountain View, and I've spent the last two days reading every blog post, watching every session replay almost but not all, and digging through the actual documentation. The headline everyone ran with was "Gemini Spark is your new 24/7 AI agent" and "Antigravity 2.0 built an operating system from scratch." Those are flashy. They make good tweets.

But something much more interesting happened at this I/O, something that actually changes how Android developers will write code starting after the keynote address, and most of the coverage missed it entirely. I want to talk about that. I also want to walk through Antigravity 2.0 in detail, because Google fundamentally rebuilt it, and the technical decisions they made reveal a lot about where developer tooling is heading. Finally, I want to look at the bigger picture of what Google announced and what it means for the people who actually have to ship apps.

There's a lot to cover. I'll try to keep it useful.

The big picture: Agents aren't a demo anymore

Before diving into the specific tools, let's establish what actually launched.

Gemini 3.5 Flash is the new default model powering most of this. It outperforms last year's Gemini 3.1 Pro on coding and agentic benchmarks: 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, 1656 Elo on GDPval-AA, and 84.2% on CharXiv Reasoning for multimodal understanding.That last number matters because it means the same model handles code and visual output, which is how agents can now render Compose previews and understand what they're looking at. It runs about four times faster than other frontier models, and the pricing is $1.50 per million input tokens and $9.00 per million output tokens, with cached input at $0.15 per million.

Gemini Spark is Google's 24/7 personal AI agent. It runs on Google Cloud infrastructure so it doesn't need your device to stay on. It connects to Gmail, Docs, Calendar, and can interact with the web through Chrome and third-party services.You feed it context (wedding plans, a research project, whatever) and it keeps working. It's exclusive to Google AI Pro and Ultra subscribers, rolling out in beta in the U.S. first.

Antigravity 2.0 graduated from an AI coding tool into a full agent orchestration platform. Standalone desktop app. No IDE. CLI. SDK. Multiple agents running in parallel. I'll break this down in detail below.

Android CLI 1.0 reached stable. This is the one I keep coming back to. More on this in a moment.

Android XR smart glasses made with Samsung. Gemini Omni for video generation. Native Android app building directly in Google AI Studio. A new intelligent Search box that generates custom UIs in real time. There's a lot.

I genuinely don't know how to feel about all of it. The technology is real. The demos are real. A personal agent that monitors your inbox and handles calendar gymnastics while you sleep? That's genuinely useful. An AI that can port an iOS app to native Android in hours instead of weeks? That solves a real problem for small teams. But there's something unsettling about agents churning through your email at 3am while nobody's watching, and I don't think that tension goes away just because the demos are impressive.

For developers specifically, the message was unmistakable: Google doesn't care which AI agent you use anymore. They shipped Android CLI 1.0, which gives Claude Code, OpenAI Codex, and Antigravity programmatic access to Android Studio's toolchain from the command line.That's a remarkable concession from a company that spent years building walled gardens. They looked at how developers actually work (switching between Claude Code, Codex, and Gemini depending on the task) and decided to make Android tooling available to all of them instead of forcing everyone into their own stack.

Pichai said the industry is in a period of "hyper progress" but also "where people want to see real value in the products they use on a day-to-day basis." That's the right thing to say. Whether the products deliver is a separate question.

Antigravity 2.0: What actually changed under the hood

Antigravity 2.0 is not an IDE update. Google ripped the agent manager out of the original Antigravity IDE and built it into a standalone desktop application available on macOS, Linux, and Windows.The IDE itself still exists as a separate product, but 2.0 is a completely different thing. It's a desktop app where the primary interaction model is conversation with an agent, not a code editor with AI features bolted on.

The interface is built around conversations and artifacts. You describe what you want. The agent works on it. You review the artifacts it produces (code, documents, screenshots, browser operation recordings) and provide feedback directly on those artifacts to guide toward your desired outcome.

This is a philosophical shift. In the old model, you wrote code and the AI helped. In the new model, the AI does the work and you supervise. Whether that feels like progress or like being demoted to code reviewer depends on your disposition. I'm still figuring out where I land.

Dynamic subagents

The most technically interesting feature in 2.0 is dynamic subagents. The main agent can define and invoke child agents during execution to handle focused subtasks.These subagents are spawned dynamically, run in parallel, and get destroyed when their work is done.

Why this matters: every LLM has a finite context window. If you ask an agent to build a complex system, the intermediate steps (reading files, making edits, running tests) fill up that window fast. The agent starts forgetting what it was doing, or starts making decisions based on stale context. Dynamic subagents solve this by isolating subtasks into separate context windows that don't pollute the main agent's working memory.

Google's launch demo used 93 parallel subagents to build a functional operating system from scratch. The primary agent acted like a CTO: it understood the system architecture, broke the goal into domains, and spawned specialized subagents (one for the kernel, one for memory management, one for the filesystem, one for video drivers, one for keyboard drivers). Those subagents worked independently and returned their results to the main agent for integration.

The numbers: 15,314 model requests, 339 million input tokens (with cache reads, output, and thinking tokens, that goes to over 2.6 billion), 12 hours of wall-clock time, and under $1,000 in API costs using Gemini 3.5 Flash.The OS could run FreeDoom, which is a real open-source Doom implementation.

The OS was not production quality. It had no floating-point math support, no hardware acceleration, no complex multi-threading, no sandboxing, no JIT compilation.But it was built from a single high-level prompt with zero human guidance. That's the part that sticks with me.

Asynchronous task management and scheduled tasks

Tasks in Antigravity 2.0 can run asynchronously. You can fire off a long-running task and continue having other conversations with the agent while it works in the background.The results eventually surface in the main conversation when the task completes.

Scheduled Tasks are cron-style invocations. You define a schedule and the agent triggers automatically at the specified time.The /schedule slash command lets you write things like "every morning at 9am, summarize my inbox" directly in natural language.This moves the agent from a tool you call to something closer to a persistent automation pipeline.

JSON hooks

Hooks let you intercept and control agent behavior at key execution points. You define them in a JSON configuration file. When the agent hits a trigger event, the hook fires and can allow, deny, or ask the user before proceeding.The hooks system supports deny, allow, ask_user, and enforce rules per tool, which gives you fine-grained control over what the agent can and cannot do without reading the underlying code.

Projects, not repositories

In Antigravity 1.0, agent conversations were grouped by workspace, which meant one repository. In 2.0, they're grouped by "project," which can span multiple folders and enforce its own agent settings and permissions.This matters for real-world development where a single product often spans multiple repositories or codebases.

Slash commands

The four new slash commands are worth knowing:

/goal tells the agent to keep running until the specified task is completely finished, no intermediate input from you.This is for fire-and-forget work where you trust the agent enough to not need check-ins.

/grill-me flips the interaction: before starting to implement, the agent asks you clarifying questions to align on the specific details of the plan.I've found this genuinely useful for avoiding the "agent confidently builds the wrong thing" problem.

/schedule creates a one-time timer or recurring schedule for agent invocation.

/browser explicitly tells the agent to use browser primitives. Google found that agents were still not capable enough to determine on their own when to use the browser, so they made it an explicit command for now.

Voice input

The mic icon next to text input boxes now does live transcription instead of collecting raw audio files to pass to the model.It's a small feature, but it changes the interaction dynamic. You can think out loud while the agent works, which feels more natural than typing everything.

The SDK

The Antigravity SDK gives you programmatic access to the same agent harness that powers Antigravity 2.0 and the Antigravity CLI. Your agent inherits a rich built-in toolset, a declarative safety-policy engine, lifecycle hooks for observing and steering every tool call, and stateful multi-turn sessions that persist across interactions.You can define custom agent behaviors and host them on your own infrastructure.

The SDK supports MCP integration for connecting to external Model Context Protocol servers, and the policy system lets you define permission rules at the tool level.

Pricing and availability

Antigravity 2.0 is free for all users. The new Google AI Ultra plan starts at $100/month with a 5x higher usage limit than the Pro plan, and there's a limited-time $100 bonus credit for new and existing subscribers who hit their quota limit.The top-tier Ultra plan was cut from $250 to $200/month (20x the Pro limit).

Gemini Spark is gated behind Pro and Ultra subscriptions. Android CLI 1.0 requires Android Studio Quail 2 Canary 1 or later for the android studio commands.

Android CLI 1.0: The announcement that actually changes things

Everyone's talking about Antigravity 2.0 building an operating system. That's a great demo. But the announcement that actually matters for working Android developers is Android CLI 1.0 reaching stable. Here's why.

For years, if you wanted to build Android apps with an AI coding agent, you had two options. Option one: use Google's tools (Android Studio with Gemini). Option two: struggle. Third-party agents like Claude Code couldn't access Android Studio's static analysis engine, its refactoring tools, its Compose preview renderer, or its dependency management. They were flying blind. They generated code without being able to verify it against the actual Android toolchain.

Android CLI 1.0 fixes that. Through a new android studio command group, any AI agent can now perform semantic symbol resolution, analyze files for warnings, render Jetpack Compose previews, and execute end-to-end UI tests, all from the terminal without opening the IDE.

Google is positioning this explicitly as supporting whatever agent you want to use: Gemini in Android Studio, Antigravity, Claude Code, or Codex.That is a real shift in strategy.

The `android studio` command

The April preview release of Android CLI gave agents a clean binary for scaffolding projects, managing emulators, and running apps. Useful, but nothing that forced a workflow change. The stable 1.0 release adds the 'android studio' command group, which is the part that actually matters.

Here's what agents can now do from the terminal:

android studio find-declaration performs semantic symbol resolution. Your agent can ask "where is HotelDetailScreen defined?" and get the actual answer, not a text-match guess.

android studio find-usages finds every reference to a symbol across the entire project.

android studio render-compose-preview renders a Jetpack Compose preview to an image file from the terminal.This is the one that changes things. Your agent can literally see what your Composable looks like before suggesting changes, instead of pattern-matching on text and hoping the layout looks right.

android studio analyze-file runs Android Studio's full static analysis engine on a file: lint warnings, R8 hints, deprecation alerts.

android studio version-lookup queries live repository data and returns the actual latest stable version of a dependency.This is quietly excellent because agents guessing at Compose or AGP versions and hallucinating dependency coordinates is a common failure mode. version-lookup eliminates that entire class of error.

These commands require Android Studio Quail 2 Canary 1 or later to be running alongside the CLI. If you're on the stable Android Studio channel, the android studio commands do nothing.

Why the token numbers are real

Google's internal experiments showed that Android CLI reduced LLM token usage by over 70% for project setup tasks, with agents completing work up to 3x faster.These numbers are not vague marketing.

The reduction comes from how Android CLI structures its output. Traditional agent workflows dump full text logs, entire file contents, and verbose SDK manager output at the model. Android CLI returns structured JSON and supports a --diff mode that sends only the elements that changed since the last snapshot.

Here's a concrete example: android layout --diff. Instead of re-processing the entire UI hierarchy on every iteration, the agent sees only what changed. On a complex screen with dozens of composables, that's the difference between 8,000 tokens and 400.Multiplied across a development session, the savings compound quickly and translate directly into lower API costs.

Agent Skills

Android CLI ships with a skills system: modular markdown instruction sets that auto-trigger when a prompt matches their metadata. No manual context-stuffing, no copying documentation into system prompts.

The stable release adds five skills not present in the preview: build for display glasses (Jetpack Compose Glimmer for Android XR glasses), implement AppFunctions (expose app workflows to Android's AI agent layer), analyze Perfetto traces (root cause latency, memory, and jank issues), use Perfetto SQL (natural language to Perfetto SQL query translation), and set up testing strategy (unit, UI, and screenshot test scaffolding).

There are 16 skills total available. The full library lives at github.com/android/skills and follows the agentskills.io open standard, which means community-contributed skills work in the same pipeline. Third-party repositories for Compose performance optimizations and Jetpack-specific workflows already exist.

Adding a skill for Claude Code is one command:

android skills add --agent=claude-code --skill=migrate-xml-views-to-jetpack-compose

The skills cover XML-to-Compose migration, AGP 9 upgrade, Navigation 3 setup, edge-to-edge display modernization, and R8 configuration auditing, among others.

Journeys: agent-driven UI testing

Stable 1.0 also ships Journeys support, which lets agents execute natural language user experience tests.Instead of writing Espresso test code, you describe a user flow in plain English and the agent navigates through it, reporting what it finds.

Installation

Android CLI is free. Installation varies by platform:

# macOS
brew install android-cli

# Linux and Windows
# Follow the setup guide at:
# https://developer.android.com/tools/agents/android-cli

If you already have it installed from the preview, update with android update.

Once installed, creating a new project and verifying it works:

android --version
android create project --name MyApp --package com.example.myapp

For Antigravity 2.0 users, the Android bundle (including Android CLI and skills) can be installed during onboarding or later from Settings > Customizations > Build With Google Plugins.

Getting started: what to try first

If you want to cut through the noise and try something that'll actually make a difference this week, here's my recommendation.

Option A: Install Android CLI and pair it with your existing agent.

This is the low-friction path. If you already use Claude Code or Codex, add Android CLI and watch your agent suddenly get smarter about Android development. The difference is immediate. Agents stop hallucinating Compose APIs. They stop guessing at dependency versions. They can actually check their own work.

After installing, try asking your agent to scaffold a new project, render a Compose preview, and check for warnings, all in one flow. That workflow used to require switching between terminal, IDE, and manual inspection. Now it's programmatic.

Option B: Download Antigravity 2.0 and try the /grill-me workflow.

Go to antigravity.google/download and grab the installer for your platform. Start a new project. Don't try to build an operating system. Ask it something concrete: "create a Jetpack Compose screen with a list of items, pull-to-refresh, and a detail view using Navigation 3." Then use /grill-me to force the agent to ask clarifying questions before it starts coding. Review the artifacts it produces. Get a feel for the interaction model.

The voice transcription button (the mic icon) is surprisingly useful. Talking through what you want while watching the agent work feels more natural than typing prompts, and you catch things you'd miss in text.

8Option C: Both.*

Install Android CLI for your agent of choice. Download Antigravity 2.0. Spend an afternoon with both. They address different needs: Android CLI is infrastructure that makes any agent better at Android, while Antigravity 2.0 is an opinionated platform that wants to be your primary interface to agents. You might end up using both.

What to expect

Antigravity 2.0 feels different from traditional coding. You're not writing code line by line with AI autocomplete. You're describing goals and reviewing what comes back. It takes adjustment. I found myself over-specifying things at first (old habits), then gradually learning how much detail the agent actually needs. The /grill-me command helps with that by forcing the agent to ask questions instead of charging ahead with assumptions.

Android CLI is more straightforward. It's a bridge. If you already use Claude Code or Codex for non-Android work, adding Android CLI means those same agents can suddenly speak Android Studio's language. The productivity gains come from fewer context switches and fewer hallucinated APIs. There's no learning curve; the agent just gets better at its job.

What I actually think

Here's what gets me about this I/O.

The demos are spectacular. An operating system built from a single prompt. Agents that never sleep. Video generated from any input. These are genuinely impressive technical achievements, and I'm not going to pretend otherwise.

But spectacular demos have a short half-life. What lasts is whether the tools actually make developers' lives easier three months from now, when the keynote glow has faded and you're debugging a Compose layout at 11pm and wondering why your agent suggested a deprecated API from three versions ago.

Android CLI 1.0 will make developers' lives easier. I'm confident about that one. It's boring infrastructure, but boring infrastructure is what lets the exciting stuff actually ship. The 70% token reduction isn't marketing. The --diff mode on layout inspection isn't marketing. The version-lookup command that eliminates dependency hallucination isn't marketing. These are concrete improvements to the development workflow that compound across hours and days of work.

Antigravity 2.0 is harder to evaluate. The subagent architecture is smart. The 93-subagent OS demo proves the orchestration layer works at scale. The scheduled tasks and JSON hooks are genuinely useful primitives. But the interaction model (describe a goal, wait for artifacts, review, repeat) is still unproven for day-to-day development work. Some developers will love it. Some will find it slower than just writing the code themselves. The truth is probably somewhere boring in the middle.

The agents? I'm still figuring out where I land. The technology is real. The demos are real. But there's a gap between "look what it can do" and "look what it does reliably in production," and that gap is where most developer skepticism lives. Google seems to know this. The Android CLI release, the openness to third-party agents, the emphasis on guardrails (JSON hooks, per-project permissions, /grill-me for alignment) are all signs they understand that adoption depends on trust, not just capability.

What I keep coming back to is the openness. Google spent years building walled gardens around Android development. Android Studio was the center of gravity and everything else orbited it. Android CLI 1.0 is the opposite: it makes Android Studio's capabilities available to whatever agent you prefer, from Claude Code to Codex to your own custom setup. That's not a concession born of weakness. It's a strategic bet that if you make Android the easiest platform to develop for with any AI tool, more developers will build for Android regardless of which agent they use. It's smart, and it's more developer-friendly than anything Google has done with Android tooling in years.

Go install Android CLI and see if your agent of choice gets smarter about Android. That's a bet I'm comfortable making today. Antigravity 2.0 is worth trying too, especially if you're curious about what agent-first development actually feels like in practice.

We'll see how it plays out. The tools are free. The documentation is live. The rest is up to us.

Note: The above cover image generated by AI

The Ulysses Prediction Engine

RoTSL — Wed, 20 May 2026 16:55:19 +0000

The Ulysses Prediction Engine: How I Built a Self‑Optimizing, Noise‑Proof Oracle That Learns Almost Anything

Three theorems stacked inside each other, twelve predictors, one Kalman filter, and a grid‑search optimiser that never stops tuning itself.

Photo by Egor Komarov on Unsplash

The Hardin – Taylor Nearly Perfect Prediction Theorem says something outrageous: for any function – chaotic, discontinuous, or just plain weird – there exists a strategy that will guess its next value correctly at almost every point in time. The catch? The proof requires the Axiom of Choice; it tells you a perfect predictor exists but gives you absolutely no way to build one.

That theorem has haunted me since Joel David Hamkins wrote about it in the Notices of the AMS. It felt like a dare: could you take that ineffable, non‑constructive guarantee and turn it into something real – something that runs in a browser, handles noise, adapts when the world changes, and actually works on messy, real‑world data?

The answer is the Ulysses Prediction Engine (UPE). I’ve been developing it as a private research project, and there’s now a live demo running at upe‑app.vercel.app. This article is the technical story of how it works, why it’s structured the way it is, and what happens when you point it at temperature records, stock volatility, or EEG traces.

The Big Idea: A Theorem Inside a Theorem Inside a Theorem

UPE’s architecture is a Russian doll of guarantees. Each layer is a self‑contained result, and each outer layer uses the inner layer as a lemma in its own proof of convergence.

Layer 1 – The Inner Guarantee: Universal Bayesian Prediction

The innermost layer is a direct computational approximation of Solomonoff induction. The idea, due to Ray Solomonoff, is breathtakingly simple: maintain a Bayesian mixture over all computable models, weighted by their algorithmic complexity (shorter programs get more prior mass). As data arrives, you update the posterior and use the mixture to predict the next observation.

Solomonoff proved that, for any computable data‑generating process, this predictor’s error goes to zero faster than any other method – in the limit. It is, in a precise sense, the optimal predictor.

But the exact mixture is uncomputable (it requires running all possible Turing machines). UPE approximates it by maintaining an ensemble of 12 base predictors:

· Exponential smoothing (short and long memory)

· AR(p) models with p = 1, 2, 3

· Linear and quadratic trend models

· Fourier predictors that capture periodicity

· A random‑walk baseline as a fallback

The ensemble weights are updated online using exponentiated gradient descent – the same algorithm used in the classic “Prediction with Expert Advice” framework. Over time, the weight vector concentrates on whichever base predictors happen to match the true dynamics best. This is a bounded‑compute approximation to the full Solomonoff mixture, and it inherits the same asymptotic optimality guarantees within the class of models it can represent.

Why 12? The number is a deliberate trade‑off. Too few, and you miss important dynamics. Too many, and the regret bound (the penalty for having to learn which experts are good) grows. Twelve is enough to capture trend, seasonality, mean‑reversion, and momentum – the four horsemen of time‑series structure – without incurring excessive variance.

Layer 2 – Error Correction: Bayesian Filtering

The inner predictor assumes it sees the true signal. In reality, every sensor is noisy. UPE wraps the universal predictor’s output inside a scalar Kalman filter that treats the true signal as a hidden state and the observation as a noisy measurement.

The Kalman filter does three things simultaneously:

De-noises the input to the predictor by maintaining a posterior over the hidden state.
Corrects systematic biases – if the ensemble consistently over‑ or under‑predicts, the filter’s innovation term captures that and compensates.
Quantifies uncertainty – the filter’s error covariance gives a principled confidence interval around every prediction.

The measurement noise covariance R is not fixed; it’s estimated online from the variance of recent residuals. This means the filter automatically adjusts its trust in new observations: when residuals are large (e.g., during a regime change), it trusts the dynamics model more; when residuals are small, it tracks the observations tightly.

Theorem connection: As the inner universal predictor converges (which it does in Cesàro mean for any stationary ergodic process), the Kalman filter’s posterior covariance shrinks toward zero. The two layers together guarantee that the filtered predictions converge to the truth even when the raw observations are corrupted by i.i.d. noise with finite variance.

Layer 3 – The Meta‑Optimiser: Never Stop Tuning

Every layer has hyper-parameters:

· The learning rate η for the expert weight updates

· The process noise Q in the Kalman filter (how much the true signal is allowed to drift)

· The FFT window size for the Fourier predictor

Instead of setting these once and hoping for the best, UPE runs an online meta‑optimiser. Every few steps, it performs a grid search over candidate hyper-parameter values, evaluates each candidate on a recent window of residuals, and blends the best configuration into the current one using exponential smoothing (to avoid abrupt jumps that would destabilise the filter).

This is a form of no‑regret online learning. Cesa‑Bianchi and Lugosi proved that such strategies guarantee that, in the long run, you perform as well as if you had chosen the single best hyper-parameter configuration in hindsight. UPE’s meta‑optimiser inherits this property: it asymptotically matches the performance of the best fixed configuration, even as the data‑generating process changes.

The full stack is therefore:

Theorem 1 (Solomonoff) ⇒ Theorem 2 (Kalman convergence) ⇒ Theorem 3 (No‑regret meta‑learning)

Each arrow is a rigorous implication. The whole system carries a logarithmic bound on cumulative squared prediction error almost surely for stationary ergodic processes.

2. Behind the App: What You’re Actually Seeing

The live demo at upe‑app.vercel.app exposes this architecture directly. The interface is deliberately minimal – you select a domain, and the engine immediately begins ingesting a real‑world time series and producing predictions.

The app supports seven prediction domains out of the box:

Domain	Dataset	What makes it hard
Finance	S&P 500 5-min realised volatility	Heavy tails, volatility clustering
Healthcare	Intracranial EEG amplitude	Non-stationarity, pre-ictal morphology changes
Climate	Daily temperature (Berlin)	Strong seasonality + long-term trend
Industrial IoT	Machine vibration amplitude	Impulsive events, sensor noise
Autonomous Systems	IMU angular velocity	High-frequency noise, drift
Communications	Network packet rate	Bursty traffic, diurnal patterns
Fundamental Research	Telescope photometry flux	Low signal-to-noise, transit events

Each domain loads a pre‑processed sample dataset (several thousand points). The engine runs client‑side in the browser – no server round‑trips, no API keys. Every computation (the 12 base predictors, the Kalman filter step, the grid‑search optimiser) is implemented in plain TypeScript and executes in a web worker to keep the UI thread responsive.

The chart shows:

· The noisy observation (grey dots or line)

· The UPE prediction (blue line)

· A confidence band (±2σ from the Kalman filter’s error covariance)

· The ensemble weight distribution (a small bar chart showing which base predictors are currently dominant)

You can watch the weights shift in real time as the data changes character – for example, when temperature transitions from a stable summer plateau into autumn cooling, the trend models gain weight and the seasonal Fourier model adjusts its phase.

3. What Happens When You Point It at Real Data

The paper version of UPE (soon available in preprint form – see the references below) includes rigorous benchmarks against ARIMA, LSTM, fixed‑parameter Kalman filters, and Gaussian processes. Here’s a condensed summary of the results:

Temperature Forecasting (Berlin‑Tegel, 10 years)

Under high noise (Laplace with scale 2), UPE achieves RMSE 1.84 °C vs. 3.42 for ARIMA and 3.18 for LSTM. The Kalman filter is doing the heavy lifting here – it strips the Laplace noise without the over-smoothing that a fixed Kalman gain would produce, because the meta‑optimiser increases Q when the residuals spike.

Financial Volatility (S&P 500, 2020 – 2024)

Volatility prediction is notoriously difficult because of heavy tails and regime changes (COVID, the 2022 bear market). Under Cauchy‑contaminated noise (1% outliers), UPE maintains a correlation of 0.92 with true realised volatility vs. 0.74 for LSTM. The secret: the exponentiated‑gradient weight update is robust to outliers by design – the loss gradient for a Cauchy outlier is bounded, so no single observation can hijack the ensemble.

Epileptic Seizure Prediction (CHB‑MIT Database)

This is the hardest test. Intracranial EEG signals change morphology dramatically during the pre‑ictal period (the minutes before a seizure). Fixed‑parameter models diverge. UPE’s meta‑optimiser detects the increased residual variance and raises Q, allowing the filter to track the rapid amplitude changes. The result: RMSE 7.9 μV under high noise (interference + electrode dropout) vs. 15.8 for LSTM and 15.1 for a fixed Kalman filter.

4. Why This Architecture Is Fundamentally Different

Most production forecasting systems fall into one of two camps:

Classical time‑series models (ARIMA, exponential smoothing, GARCH) – fast and interpretable, but they assume a fixed structure and break under non‑stationarity.
Deep learning (LSTMs, transformers) – flexible and powerful, but data‑hungry, slow to adapt, and famously fragile under distribution shift.

UPE occupies a third category. It is:

· Model‑agnostic: The ensemble of 12 base predictors spans trend, seasonality, mean‑reversion, and momentum. The weight update discovers which combination is appropriate, so you don’t have to specify an ARIMA(p,d,q) order or a network architecture.

· Provably adaptive: The no‑regret guarantee on the meta‑optimiser means the hyperparameters converge to their optimal values without manual tuning, even if the environment changes.

· Noise‑resilient by construction: The Kalman filter layer isn’t a post‑processing hack; it’s an integral part of the architecture with its own convergence theorem.

· Computationally cheap: The entire stack runs in milliseconds per step in a browser. No GPU required. The heaviest operation is the grid search, and that runs asynchronously.

The philosophical shift is this: instead of building a bespoke model for each problem, UPE provides a universal scaffold that configures itself to whatever computable structure the data exhibits. It’s the difference between crafting a key for each lock and building a lockpick that adjusts its shape.

5. Limitations (Let’s Be Honest)

UPE is not magic. The guarantees are asymptotic, and the real world is finite. Specifically:

· Stationarity assumption: The theorems assume the data comes from a stationary ergodic process. Real data is never perfectly stationary. The meta‑optimiser mitigates this by adapting hyperparameters, but a sudden, permanent structural break will cause a transient spike in error.

· Bounded model class: The 12 base predictors are a finite approximation to the truly universal (and uncomputable) Solomonoff mixture. A process that requires, say, a long‑memory fractionally integrated model (ARFIMA) will not be perfectly captured.

· Univariate only: The current implementation predicts scalar time series. Extending to multivariate prediction with cross‑series dependencies is straightforward in principle (replace the scalar Kalman filter with a vector one), but the computational cost of the ensemble grows.

· No causal inference: UPE predicts; it does not explain. The mixture of programs is opaque – you can see which base predictors are weighted heavily, but you cannot extract a human‑readable equation.

These are active areas of development in the private repository.

6. What’s Next

The roadmap has three tracks:

Multivariate extension: Replacing the scalar Kalman filter with an ensemble Kalman filter (EnKF) to handle vector‑valued observations and cross‑channel dependencies.
Hierarchical priors: Adding a second‑order meta‑prior over the ensemble composition itself, so the engine can invent new base predictors when none of the existing 12 fit.
Explainability layer: Building a program‑synthesis module that periodically extracts the most heavily weighted base predictor combination and attempts to refactor it into a compact, human‑readable formula (a symbolic regression layer on top of the ensemble).

7. Try It Yourself

The live demo is at upe‑app.vercel.app. The app is client‑side only – your data never leaves your browser. You can select any of the seven built‑in domains, or (in a future release) upload your own CSV and watch the engine adapt in real time.

The repository is currently private while I work on the roadmap , but if you’re interested in collaborating or learning more about the internals, feel free to reach out.

References

Hamkins, J. D. (2025). The Nearly Perfect Prediction Theorem. Notices of the AMS, 72(3), 308 – 309.
2. Solomonoff, R. J. (1964). A Formal Theory of Inductive Inference. Information and Control, 7(1), 1 – 22.
Hutter, M. (2005). Universal Artificial Intelligence. Springer.
Cesa‑Bianchi, N. & Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
Särkkä, S. (2013). Bayesian Filtering and Smoothing. Cambridge University Press.
Hazan, E. (2016). Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2(3 – 4), 157 – 325.

If you found this interesting, hold that clap button – it helps other curious engineers find this article.

SMGP [Spectral Memory Graph Processor] : Building an AI That Actually Remembers What You Told It

RoTSL — Mon, 18 May 2026 10:58:56 +0000

SMGP: Building an AI That Actually Remembers What You Told It

I spent a few months combining spectral graph theory, hyperdimensional computing, and a custom FPGA to solve the three things about LLMs that drive me crazy.

Image Generated by Grok AI

Large language models are impressive in a lot of ways. But after working with them for a while, three things started to really bother me:

They forget everything between conversations. You can spend an hour giving context, and the next session it’s gone.
The attention mechanism scales quadratically. Double your context, quadruple your compute. That’s why long documents get expensive fast.
They make stuff up. Confidently. And there’s no built-in way to check if what they’re saying is actually true.

I wanted to see if there was a fundamentally different approach. Not patching transformers, not scaling them bigger. Something built from the ground up where memory, attention, and truthfulness aren’t afterthoughts.

That’s what SMGP (Spectral Memory Graph Processor) tries to be. It stores knowledge in a persistent graph encoded with hyperdimensional vectors, does attention in O(N log N) using graph Fourier analysis, and verifies every factual claim by checking if a path exists in the graph. If there’s no path, the claim doesn’t get made.

There’s also a full FPGA accelerator that offloads the heavy compute – but I’ll get to that.

How It’s Put Together

Everything centers on what I’m calling a Spectral Memory Graph. It’s a directed, typed property graph where:

· Each fact, concept, or word is a node. Nodes are addressed by 10,000-dimensional hyperdimensional bipolar vectors (+1 or -1).

· Relationships are typed edges, also encoded with HD vectors.

· The whole graph gets treated as a discrete manifold. All the signal processing happens in the spectral domain of the graph Laplacian.

The rough architecture looks like this:


+=============================================================+
| Application Layer |
| LLM | Knowledge Bases | Reasoning Agents | Chatbots |
+---------------------------+---------------------------------+
                            |
+=============================================================+
| Integration Layer |
| HuggingFace | LangChain | FastAPI | CLI | Python HAL |
+---------------------------+---------------------------------+
                            |
+=============================================================+
| Reasoning Layer |
| ClaimVerifier | NeuroSymbolicPlanner |
| DPO Rewrite Search |
+---------------------------+---------------------------------+
                            |
+=============================================================+
| Memory & Attention Layer |
| MemoryStore | AssociativeMemory | MemoryLifecycle |
| SpectralAttention: O(N log N) Multiscale |
+---------------------------+---------------------------------+
                            |
+=============================================================+
| Core Layer |
| SpectralMemoryGraph | HyperdimensionalMemory |
| SpectralMethods | TopologicalAnalyzer | GraphRewriter |
+=============================================================+
                            |
              +================================+
              | SOFTWARE: Python / CPU |
              | NumPy | SciPy | GUDHI |
              +--------------------------------+
                            |
              +===========================================+
              | HARDWARE: SMGPU FPGA |
              | Spectral | HD | Topo | Rewrite Engines |
              | NoC | HBM | CAM | Memristor Crossbar |
              +===========================================+

The Math (Or: Why This Actually Works)

Hyperdimensional Memory

The memory system comes from hyper-dimensional computing. I first encountered HDC through Pentti Kanerva’s work at Berkeley, and it’s one of those ideas that feels almost too elegant. Every node and edge gets a bipolar vector v ∈ {-1, +1}ᴰ where D=10,000.

The operations are simple:

· Bind (XOR for bipolar vectors): a ⊕ b forms an association, like a key-value pair.

· Unbind : c ⊕ a recovers b if c = a ⊕ b.

· Bundle (majority sum): ⊕ᵢ vᵢ = sign(Σᵢ vᵢ) superposes multiple vectors together.

· Similarity : sim(a,b) = (1/D)aᵀb – just normalized dot product.

What’s wild is that these 10,000-dimensional vectors are incredibly robust. You can search across millions of nodes with dot products, and associative recall drops to O(1) when you implement it in content-addressable memory. The math guarantees that random high-dimensional vectors are nearly orthogonal, so interference between stored patterns is minimal.

Spectral Graph Processing

This is where things get interesting. I take the knowledge graph’s adjacency matrix A and build the normalized graph Laplacian:

L = I – D⁻¹/² A D⁻¹/²

where D is the degree matrix. The eigenvectors uₖ of L give you a basis that captures the graph’s structure across different frequencies. Low-frequency eigenvectors encode broad community structure. High-frequency ones capture local details.

The Graph Fourier Transform of a signal x on the graph is:

x̂(λₖ) = ⟨x, uₖ⟩

A spectral convolution with filter gθ becomes:

gθ ★ x = U gθ(Λ) Uᵀ x

where U holds the eigenvectors and Λ is the diagonal eigenvalue matrix. Full eigendecomposition costs O(N³), so I approximate gθ using a Chebyshev polynomial expansion of order K. That gets you down to O(K|E|) – manageable even for large graphs.

Attention runs in this spectral domain. Input tokens map to nodes in a context graph, and the spectral filter learns which frequencies to attend to. Coarsening the graph into a multiscale hierarchy via graph wavelets gives O(N log N) without losing global context. I was honestly surprised how well this worked in practice – the “lost middle” problem that plagues standard transformers just doesn’t show up.

Topological Persistence for Not Forgetting Things

Continuous learning usually means catastrophic forgetting. I use persistent homology to monitor the graph’s topology as it evolves. You build a filtration of the graph (adding edges in similarity order) and compute the persistence diagram – points (bᵢ, dᵢ) tracking when topological features appear and disappear.

When new knowledge arrives, I impose a stability constraint: the Wasserstein distance between old and new persistence diagrams can’t exceed a threshold ε. If an update would distort the global topology too much, it gets rejected or deferred. Existing knowledge stays intact.

Pruning works the other direction. Low-persistence features tend to be noise or rarely-used facts, so removing them keeps the memory footprint bounded without touching anything important. This is genuine lifelong learning – the graph grows without collapsing.

Category-Theoretic Graph Rewriting for Reasoning

Reasoning uses double-pushout (DPO) graph rewriting. A rule r = (L ← K → R) describes how to transform a graph pattern L into R. When the planner searches for an answer, it matches L in the knowledge graph and applies the rewrite. The sequence of rewrites becomes a proof trace – every step is a graph homomorphism.

If a claim like “Socrates taught Plato” can be derived as a valid rewrite path, it’s verified. If no such path exists, the claim gets rejected. Every output has a constructive proof behind it or it doesn’t get generated. I’m not claiming this eliminates every possible error – garbage in, garbage out still applies – but it does eliminate the class of hallucination where the model just confidently invents things.

The Software Side

The Python package (pip install smgp) splits into layers:

| Module | What It Does |
| ------------------------- | ----------------------------------------------------------------------- |
| `core.graph` | `SpectralMemoryGraph` — the central knowledge graph with HD addressing |
| `core.hyperdim` | `HyperdimensionalMemory` — bind, unbind, bundle, similarity search |
| `core.spectral` | `SpectralMethods` — Laplacian, eigendecomposition, GFT, Chebyshev conv |
| `core.topology` | `TopologicalAnalyzer` — persistent homology, pruning |
| `core.category` | `GraphRewriter` — DPO-based rewriting |
| `memory` | `MemoryStore`, `AssociativeMemory`, `MemoryLifecycle`, pruning policies |
| `attention.spectral_attn` | `SpectralAttention` — O(N log N) multiscale attention |
| `reasoning` | `ClaimVerifier`, `NeuroSymbolicPlanner` with explainable variants |
| `integration` | HuggingFace, LangChain, FastAPI, ONNX/vLLM, vector DBs, federation |
| `utils` | Auto-tuning, multimodal embeddings, CLI, benchmarks |

Using it as an LLM replacement:

from smgp.integration.huggingface import SMGPForCausalLM
model = SMGPForCausalLM(graph_hd_dim=10000)
model.add_knowledge("Socrates", "person")
model.add_knowledge("Socrates", "Plato", "taught")
output = model.generate(prompt="Who did Socrates teach?", verify_claims=True)

SMGPForCausalLM subclasses PreTrainedModel and slots into any HuggingFace pipeline. The standard transformer feed-forward gets replaced with spectral attention over the knowledge graph, and generation calls the verifier to ground outputs. LangChain users get a persistent SMGPMemory and SMGPVerifierTool for agents that can fact-check themselves.

The Hardware Accelerator (SMGPU)

The spectral transforms, HD operations, and topology calculations add up. Running everything in pure Python on a CPU works for small graphs, but it’s not fast. I designed SMGPU as a domain-specific accelerator in synthesisable SystemVerilog, targeting the Xilinx Alveo U280 (8 GB HBM2, 250 MHz).

Microarchitecture

                          +---------------------------+
                          | PCIe / AXI Host |
                          +-------------+-------------+
                                        |
                          +-------------v-------------+
                          | ISA Decoder |
                          | 32-bit Instruction |
                          | Fetch & Dispatch |
                          +------+------+------+-------+
                                 | | |
          +----------------------+ | +----------------------+
          | | |
 +--------v---------+ +--------v----------+ +--------v---------+
 | Spectral Engine | | HD Engine | | Topology Engine |
 | - Laplacian | | - Bundle: Maj | | - Union-Find |
 | - Eigen-decomp | | - Bind/Unbind | | - Barcode Emit |
 | - Chebyshev | | XOR | | - Wasserstein |
 | - GFT / Wavelet | | - Permute: Shift | | - Stability |
 | 16x16 Systolic | | - Similarity | | Check |
 +--------+---------+ | 16 Parallel Banks| +--------+--------+
          | +---------+----------+ |
          | | |
 +--------v---------+ +---------v----------+ +--------v---------+
 | Rewrite Engine | | Associative Cache | | Memristor |
 | - DPO Match |<-------->| CAM, 256 Entries |<-------->| Crossbar Array |
 | - DPO Apply | | - O(1) Recall | | - 1024x1024 |
 | - Proof Trace | | - 1024-dim HD Keys | | - Analog MVM |
 +--------+---------+ +--------------------+ +-----------------+
          |
 +--------v----------+ +-------------+-------------+ +-------------------+
 | Graph DMA |<--->| 2D Mesh NoC Router |<--->| HBM Controller |
 | Scatter-Gather | | 5-port, XY Routing | | 4-ch, 256-bit |
 +-------------------+ +---------------------------+ +---------+---------+
                                                                        |
                                                              +---------v---------+
                                                              | HBM2 External |
                                                              +-------------------+

Compute Engines

| Engine | Architecture | Key Operations | Pipeline |
| -------- | ------------------------------- | -------------------------------------------------------------- | ------------ |
| Spectral | 16×16 systolic array | Laplacian, eigen-decomp, Chebyshev conv, GFT, wavelets | 4 stages |
| HD | 16 parallel banks | Bundle majority, bind/unbind XOR, permute, similarity popcount | 1–20 cycles |
| Topology | Union-Find with parallel prefix | Filtration, persistent homology, barcode, Wasserstein | 6 stages |
| Rewrite | Backtracking search FSM | DPO pattern match, rule apply, proof trace | 10-state FSM |

Memory Subsystem

| Component | Specification |
| ------------------ | --------------------------------------------------------------------- |
| Associative Cache | 256-entry CAM, 1024-dim projected keys, O(N) parallel lookup |
| HBM2 Controller | 4 channels, 256-bit data, burst-16, AXI4-Stream |
| Memristor Crossbar | Behavioral, 1024×1024 conductance cells, 8-bit resolution, analog MVM |

Interconnect

| Component | Specification |
| --------- | --------------------------------------------------------------------- |
| NoC | 2D mesh, 5-port routers, XY deterministic routing, 2 virtual channels |
| DMA | Scatter-gather, 256-entry descriptor queue, CSR-aware graph traversal |

Instruction Set

All instructions are 32-bit:

[31:28] opcode. – NOP, GRAPH_CTOR, SPECTRAL, HD, TOPOLOGY, REWRITE, MEMORY, SYSTEM
[27:24] sub_opcode. – sub-operation
[23:16] flags. – START, DONE_IRQ, CHAINED, STREAMING, BLOCKED, PRECISION_Q
[15:0] operand. – address, immediate, or register select

8 opcode classes with 3 – 7 sub-opcodes each. The Python HAL translates high-level graph operations into these instructions. If the FPGA isn’t available, everything falls back to pure Python automatically.

from smgp.core.graph import SpectralMemoryGraph
from hardware.sw.smgp_hal.executor import HWExecutor
executor = HWExecutor(device="/dev/smgpu0", fallback=True)
graph = SpectralMemoryGraph(hd_dim=10000, seed=42, executor=executor)
graph.add_edge("Socrates", "Plato", "taught")
from smgp.core.spectral import SpectralMethods
spectral = SpectralMethods(graph)
eigenvalues, _ = spectral.compute_eigen() # runs on FPGA at 250 MHz

Benchmarks

Software (Pure Python)

| Operation | Throughput (queries/s) | Latency (μs) | Memory (MB) |
| --------------------------------------------- | ---------------------: | -----------: | ----------: |
| Graph node creation | 12,000 | 83 | 0.8 |
| HD similarity search: 10K-dim, 10K candidates | 2,400 | 416 | 381 |
| Claim verification: 100-node graph | 8,100 | 123 | 1.2 |
| Spectral attention: 1K tokens, 128-dim | 850 | 1,176 | 45 |

AMD Ryzen 9 5950X, 32 GB RAM, single-threaded.

Hardware Acceleration (FPGA @ 250 MHz)

| Operation | Latency | Throughput | Speed-up vs. CPU |
| -------------------------- | ---------: | ------------------: | ---------------: |
| Laplacian: 4K nodes | ~82 μs | 49 M edges/s | 38× |
| Eigen-decomp: K=64 | ~5.2 ms | 12.3 K iterations/s | 15× |
| Chebyshev conv: K=4 | ~0.33 ms | 12.2 M nodes/s | 22× |
| HD bind/unbind: 10K-dim | 1 cycle | 250 M ops/s | 80× |
| HD similarity: 10K-dim | 313 cycles | 0.8 M queries/s | 12× |
| Topology barcode: 4K nodes | ~1.4 ms | 2.9 K graphs/s | 8× |
| Wasserstein distance | ~0.7 ms | 1.4 K comparisons/s | 11× |
| DPO match: pattern 4 nodes | ~32 μs | 31 K patterns/s | 24× |

CPU baseline: Intel Xeon Gold 6242, 32 cores. FPGA: Alveo U280, Vivado 2023.2, 250 MHz.

The HD bind/unbind hitting 80× speed-up makes sense – those are just XOR operations in parallel across 10,000-bit vectors. The topology numbers are lower because Union-Find has inherent serial dependencies that are harder to parallelize. I’m still looking at whether a different algorithm for persistence could close that gap.

FPGA Resource Utilization (Alveo U280, 16×16 systolic)

| Resource | Used | Available | Utilization |
| -------- | -----: | --------: | ----------: |
| LUTs | 95,328 | 1,043,000 | 9.1% |
| FFs | 78,112 | 2,086,000 | 3.7% |
| BRAM | 120 | 1,872 | 6.4% |
| DSPs | 384 | 9,024 | 4.3% |
| URAM | 48 | 960 | 5.0% |

Plenty of room on the chip. The design is parameterized – you can scale the systolic array from 8×8 up to 32×32 depending on what fits. There’s also an OpenLane ASIC flow targeting SkyWater 130 nm if you want to go beyond FPGAs.

Testing

| Layer | Tests | Result | Method |
| ------------------------ | --------------: | ------------ | ----------------------------------------- |
| Python core | 210+ | All passing | pytest, Hypothesis property-based testing |
| Python integrations | 20+ | All passing | pytest with mock services |
| RTL unit tests | 6 engines | 6/6 | Verilator 5.048, Cocotb |
| Golden model comparisons | 2: spectral, HD | Bit-accurate | Python vs. RTL fixed-point |
| CI regression benchmarks | 8 | Tracked | Airspeed Velocity: ASV |

The end-to-end RTL test loads a small knowledge graph, runs spectral attention, verifies a DPO rewrite, and checks the output against the software golden model. That’s been the most useful test for catching integration bugs between engines.

Running It in the Cloud

You don’t need your own FPGA. The Chameleon Cloud testbed (NSF-funded, free for research) has bare-metal Alveo U280 nodes.

Build the bitstream locally (requires Vivado 2023.2):

cd hardware/fpga/build
make synth BOARD=xcu280
make impl BOARD=xcu280
make xclbin. # produces smgp_u280.xclbin

Reserve a Chameleon node, upload the .xclbin, program the FPGA, and you’re running the same Python code with hardware acceleration. Full instructions are in the repo.

What’s Next

The HAL-to-core wiring is designed but needs merging. The PCIe driver skeleton exists but needs the register map filled in to match the RTL. I’m also looking at multi-FPGA scaling – partitioning the knowledge graph across several accelerators – and at formal verification of the ISA.

Power measurement is on the list too. I want to see how SMGPU compares to GPU inference in joules per query, not just latency. That might be the more interesting number.

Where This Came From

I got tired of patching the same problems over and over. Better prompting, RAG, fine-tuning – they all help around the edges but don’t change the fact that transformers have no persistent state and no built-in notion of truth. SMGP is an attempt to build something where memory, attention, and verification aren’t add-ons. They’re the foundation.

If you want to dig in, contribute, or talk about taping out an ASIC, the repo is at github.com/rotsl/smgp.

The thing I keep thinking about: we’ve spent a long time optimizing the transformer architecture within its inherent constraints. What if the constraints themselves are the problem?

This project draws on ideas from spectral graph theory (Chung, 1997), hyperdimensional computing (Kanerva, 1988; Plate, 1995), topological data analysis (Edelsbrunner et al., 2002), and algebraic graph transformation (Ehrig et al., 2006). The full references are in the repository.

What Happens When You Try to Reverse Biology? A Deep Look at the Protein DNA Analysis Simulator

RoTSL — Fri, 24 Apr 2026 12:57:47 +0000

Most biology tools move in one direction. DNA becomes RNA. RNA becomes protein.

Photo by Warren Umoh on Unsplash

That flow is drilled into anyone who has taken genetics or molecular biology. After a while, it becomes background knowledge. You stop questioning it because the pathway feels settled.

Then a project comes along and asks a slightly uncomfortable question:

What if we tried to move backward?

The Protein → DNA Synthesis Simulator explores that idea.

Project links:

This is not laboratory software. It is not a validated bioinformatics pipeline. It is a simulation designed to explore a biological question.

That distinction matters from the beginning.

This project is simulation and hypothetical only and has not been tested or validated. This project is for educational purposes only.

Why reverse biology is harder than it sounds

At first glance, reverse translation seems simple.

If DNA creates proteins, then proteins should point back to DNA.

But biology does not preserve information perfectly.

A protein sequence contains amino acids. DNA contains codons.

The problem is that multiple codons can encode the same amino acid.

For example:

Leucine can be encoded by six codons
Serine can be encoded by six codons
Arginine can be encoded by six codons

This redundancy is called degeneracy of the genetic code.

Once translation happens, part of the original DNA detail disappears.

You keep the protein.

You lose some of the nucleotide history.

That means reverse translation is not reconstruction.

It is estimation.

The simulator starts with a simple but useful idea

The main GitHub project allows users to input a protein sequence and generate a possible DNA equivalent.

The workflow is straightforward:

Enter a protein sequence
Parse amino acids
Match amino acids to codons
Build a theoretical DNA sequence
Display a reconstructed output

The important word here is possible.

The simulator does not claim to discover the original DNA strand.

It generates a biologically plausible candidate.

That may sound like a small distinction, but it changes how the project should be understood.

This is closer to a thought experiment than a prediction engine.

The protein analysis page adds a second layer

The main simulator introduces reverse translation.

Main Page

Link: https://rotsl.github.io/protein2dna_synthesis-simulator/protein_analysis/

Instead of immediately converting protein into DNA, the analysis layer pauses to inspect the sequence itself.

That shift makes the project more interesting.

A protein sequence contains more than letters.

It contains patterns.

The analysis process can reveal:

Amino acid composition
Sequence length
Repeating motifs
Structural tendencies
Hydrophobic or hydrophilic regions
Codon ambiguity possibilities
Conserved residue clusters

This changes the role of the simulator.

You stop treating proteins as outputs and start treating them as encoded information.

Protein sequences carry clues, not complete answers

One misconception in biology education is that proteins behave like exact reflections of DNA.

They do not.

A protein preserves order.

It does not preserve every codon decision.

This becomes obvious during reverse translation.

One protein sequence may correspond to many DNA sequences.

Different organisms may also favor different codon usage patterns.

A bacterial sequence and a mammalian sequence may encode the same amino acids using different codon preferences.

The simulator exposes this uncertainty instead of hiding it.

That is one of its strongest features.

The protein analysis page changes how users think

Many educational biology tools focus on output.

Input something. Receive a result.

The analysis page encourages a different rhythm.

You enter a sequence and ask:

What kind of protein is this?
Are certain amino acids overrepresented?
Are there repeating patterns?
Could this suggest structural behavior?
How ambiguous is reverse translation?

These questions matter because biological interpretation happens before prediction.

Researchers rarely jump directly to answers.

They inspect patterns first.

The analysis page mirrors that mindset.

Analysis

The live simulation version feels more interactive

The GitHub build introduces the idea.

The version feels closer to a working prototype.

Try it out here : https://protein-dna-simulator.vercel.app/

Preview structures

At first, both versions appear similar.

Protein input. DNA output. Translation logic.

But the live build feels faster and more responsive.

It behaves less like a static webpage and more like an active sequence workspace.

You can experiment quickly.

Change one amino acid.

Watch the sequence shift.

Remove residues.

See how output changes.

That immediate feedback matters.

Learning becomes easier when interaction is continuous.

The live version works like a live sequence interpreter

Many biology tools rely on a submission model:

Paste sequence
Configure settings
Submit request
Wait for processing
Read results

The live version presented shortens that cycle.

You enter data and receive near-instant interpretation.

That makes experimentation feel natural.

You stop thinking in terms of “jobs” and start thinking in terms of exploration.

The live simulator combines several biological layers

The interface appears to merge multiple ideas into one workflow.

Engineering Tab

The system includes:

Protein parsing
Amino acid recognition
Codon mapping
DNA sequence estimation
Educational translation logic
Sequence relationship visualization

This matters because reverse translation is not a single operation.

It is a chain of assumptions.

Each amino acid creates branching possibilities.

The simulator turns those possibilities into something visible.

Reverse translation exposes a hidden truth about biology

Most diagrams simplify biology into arrows.

DNA → RNA → Protein.

That model is useful.

It is also incomplete.

Real biology includes ambiguity.

Codon redundancy means several nucleotide sequences can create identical proteins.

That creates uncertainty.

The simulator does not remove uncertainty.

It places uncertainty at the center of the experience.

That makes the project more honest than many educational demos.

The project feels closer to computational biology than traditional teaching software

There is a subtle shift that happens when using the simulator.

You stop memorizing.

You start interpreting.

That makes the project feel closer to lightweight bioinformatics.

Professional sequence-analysis tools often involve:

Pattern recognition
Sequence comparison
Codon usage analysis
Structural inference
Similarity scoring
Translation mapping

The simulator is not competing with research-grade systems.

It borrows concepts from computational biology and simplifies them into something approachable.

Inspiration from published Science research

The project notes inspiration from research published in Science:

https://www.science.org/doi/abs/10.1126/science.aed1656

The simulator does not reproduce the paper’s findings.

The connection is conceptual.

Modern biology increasingly depends on inference.

Researchers often estimate relationships between biological systems rather than directly observing every process.

Protein folding prediction, sequence inference, and molecular modeling all rely on computational interpretation.

The simulator fits within that broader idea.

It asks:

If proteins preserve traces of DNA history, how much can we estimate from those traces?

That question alone makes the project worth exploring.

Why uncertainty is the most valuable part of the simulator

The strongest lesson here is not reverse translation.

It is uncertainty.

Science education sometimes creates the illusion that biology always produces clean answers.

The simulator quietly pushes against that assumption.

You expect one DNA sequence.

You discover many possibilities.

You expect certainty.

You find ambiguity.

That is closer to how real biological reasoning works.

What makes the project useful for education

The simulator works well for:

Students learning transcription and translation
Beginners exploring codon relationships
Developers interested in biological computation
Bioinformatics learners experimenting with sequence logic
People curious about molecular coding systems

The project does not require laboratory experience.

It asks users to think.

That alone gives it educational value.

Where the simulator could grow

There are several additions that could deepen the learning experience.

Organism-specific codon bias

Different organisms prefer different codons.

Adding selectable species would make reverse translation more realistic.

Multiple DNA candidates

Instead of returning one sequence, the simulator could generate ranked alternatives.

Probability scoring

Codon likelihood could help explain why some outputs are more plausible.

Structural hints

The analysis page could flag patterns associated with alpha helices or beta sheets.

Sequence comparison

Comparing proteins side-by-side would help explain mutation and similarity.

These additions would not make the simulator “correct.”

They would make uncertainty easier to understand.

Final thoughts

The Protein → DNA Synthesis Simulator works best when treated as a reasoning tool.

It does not reconstruct biology.

It explores biological possibility.

The GitHub version explains the concept.

The protein analysis page adds interpretation.

The live web version makes the process interactive.

Together, they create a small ecosystem for thinking about biological information in reverse.

The project becomes more interesting once you stop asking:

“Is this the original DNA?”

and start asking:

“What assumptions make this sequence plausible?”

That shift changes the experience.

You stop seeing proteins as endpoints.

You start seeing them as traces.

And sometimes, tracing hidden information is more interesting than certainty.

Project links

Main simulator: https://rotsl.github.io/protein2dna_synthesis-simulator/
Protein analysis page :
Interactive web version: https://protein-dna-simulator.vercel.app/

References

[1] Peiwei Deng, Heewon Lee, Carlos Armijo, Hao Wang, and Albert Gao. Protein-templated synthesis of di-nucleotide repeat DNA by an anti-phage reverse transcriptase. Science, page aed1656, 2026. PDB: 9Z6Y.

[2] Alexander A. Green, Pamela A. Silver, James J. Collins, and Peng Yin. Toehold switches: De-novo-designed regulators of gene expression. Cell, 159:925–939, 2014.

[3] Andrew V. Anzalone, Peyton B. Randolph, Jessie R. Davis, Alexander A. Sousa, LukeW.Koblan, JonathanM.Levy, PeterJ.Chen, ChristopherWilson, GregoryA. Newby, Aditya Raguram, and David R. Liu. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature, 576:149–157, 2019.

[4] Andrew V. Anzalone, Xin D. Gao, Christopher J. Podracky, Andrew T. Nelson, Luke W. Koblan, Aditya Raguram, Jonathan M. Levy, Jeffry A. M. Mercer, and David R. Liu. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nature Biotechnology, 40:731–740, 2022.

[5] Rohan R. DRT3b Engineering Studio: Protein →RNA →DNA simulator. https://github.com/rotsl/protein2dna_synthesis-simulator, 2026. Open-source web simulation platform. Live app: https://protein-dna-simulator.vercel.app/.

[6] Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Oriol Vinyals, Demis Hassabis, and Sameer Velankar. AlphaFold Protein Structure Database: massive structural coverage for biology and medicine. Nucleic Acids Research, 50:D439–D444, 2022.

[7] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis. Highly accurate protein structure prediction with AlphaFold. Nature, 596:583–589, 2021.

[8] Kirsten L. Frieda, James M. Linton, Sahand Hormoz, Junhong Choi, Koo-Lok K. Chow, Zeba S. Singer, Mark W. Budde, Michael B. Elowitz, and Long Cai. Synthetic recording and in situ readout of lineage information in single cells. Nature, 541:107–111, 2017.

[9] Bushra Raj, Daniel E. Wagner, Aaron McKenna, Shristi Pandey, Allon M. Klein, Jay Shendure, Deepak L. Bhatt, and Bhatt Bhatt. GESTALT: a method for tracing lineage and clonal dynamics at single-cell resolution. Nature Biotechnology, 36:442–450, 2018.

[10] Wenyuan Tang, James H. Hu, and David R. Liu. PEAR: a highly multiplexed lineage recorder based on prime editing and in situ barcode readout. Nature Methods, 21:1054–1065, 2024.

[11] Junhong Choi, Wei Chen, Anna Minkina, Florence M. Chardon, Chase C. Suiter, Samuel G. Regalado, Silvia Domcke, Nobuhiko Hamazaki, Choli Lee, Beth Martin, Ryan M. Daza, and Jay Shendure. A temporally resolved, multi-symbolic molecular recorder based on sequential DNA writing. Nature Chemical Biology, 18:1204–1212, 2022.

[12] Nathaniel Roquet, Ava P. Soleimany, Alyssa C. Ferris, Scott Wick, and Timothy K. Lu. Synthetic recombinase-based state machines in living cells. Science, 353:aad8559, 2016.

⚠️ Important disclaimer

This project is simulation and hypothetical only and has not been tested or validated. This project is for educational purposes only.

Hypercontext: a framework for agents that actually know what they're doing

RoTSL — Mon, 20 Apr 2026 16:41:25 +0000

I built Hypercontext because I got tired of agent frameworks that treat context like a static blob you shove into a prompt and hope for the best. Most tools out there assume context is something you pass. I wanted something that treats context as something you can inspect, compress, score, and rewrite while the agent is running.

Hypercontext is still in Alpha phase.

This isn't about adding another layer of abstraction over OpenAI's API. It's about making agents aware of their own reasoning so they can fix it when it breaks.

What it actually does

Hypercontext is a self-referential agent framework for Python and TypeScript. The core idea is simple: agents should be able to read and modify their own system prompts, tool descriptions, memory, and runtime capabilities based on whether they're actually succeeding at the task.

The framework ships with:

A Python SDK with orchestration, agents, scoring, memory, compression, deduplication, convergence detection, archive helpers, and extensions
A TypeScript SDK for Node.js with the same primitives
A modular extension system for adding capabilities without modifying the core runtime
Built-in research tooling extensions for web search, retrieval, and evidence gathering
A CLI for running compression, archive queries, provider discovery, orchestration, and extension workflows
A curses-based terminal UI for browsing and pinning commands without leaving the shell
A browser dashboard for visual inspection
An MCP stdio daemon for Claude Desktop, Claude Code, and Codex integration
An HTTP MCP server for web integrations

Both SDKs are zero-dependency where possible. The Python core is pure Python. The TypeScript SDK has minimal deps. You can run the whole thing against Ollama locally without touching a cloud provider.

The problem with most agent frameworks

I've used agents. They all share the same blind spot: context is treated as immutable input. You construct a prompt, feed it to the model, get output back. If the output is wrong, you tweak the prompt and try again. The agent itself has no idea what worked and what didn't across runs.

Hypercontext changes this by making context a first-class citizen that agents can manipulate. Each generation gets tracked as a node in a lineage tree. You can see which parent led to which result, which branch is going stale, and which context configuration produced the best score. Successful strategies get archived and reused. Failed ones get pruned.

This isn't theoretical. The archive stores scored generations so later runs can compare branches and identify the strongest evolution path. Memory is split between persistent storage (lessons across runs) and episodic storage (context within a single session).

Hypercontext also treats tooling as adaptive context. Extensions can expose capabilities dynamically during execution, allowing an agent to decide when external retrieval, research, or analysis should become part of its reasoning loop.

How the context loop works

Here's the basic flow:

The agent receives a task and its current context window
It generates a response and scores the result against a fitness function
If the score is below threshold, the agent reflects on what went wrong
It rewrites its own system prompt, tool descriptions, memory, or extension configuration based on that reflection
The new context configuration gets tested in the next generation
Successful configurations get archived; failed ones get discarded

This happens automatically in the TaskAgent and MetaAgent classes. You don't need to hand-code the reflection logic unless you want to.

The MetaAgent goes further. It can perform repository-aware tool use, extension orchestration, and self-modification workflows. If you point it at a codebase with --workdir, it can inspect files, suggest modifications, and track whether those modifications improved the code quality.

Extensions system

One of the biggest additions to Hypercontext is the extension architecture.

Extensions allow you to add runtime capabilities without bloating the core framework. Instead of hardcoding every tool into the agent runtime, Hypercontext lets you compose functionality based on the task.

Extensions can provide:

Tool registries
Retrieval pipelines
Research workflows
Context enrichers
External APIs
Scoring hooks
Middleware layers
Memory augmentation
Custom orchestration behaviors

This keeps the framework modular while still allowing deeply integrated workflows.

Extensions are designed to be lightweight and composable. You can enable only what your workflow requires.

Research tools extension

The research_tools extension adds structured information gathering to Hypercontext.

Instead of forcing agents to rely purely on static context or hallucinated recall, the extension provides a research layer that can gather, refine, and reuse evidence during execution.

The extension includes capabilities for:

Query expansion
Iterative research loops
Evidence collection
Citation-aware retrieval
Multi-step search refinement
Context injection from retrieved sources
Search result scoring and filtering
Research memory persistence across generations

This matters because long-running reasoning tasks often fail not from poor prompting but from incomplete information. Research tooling allows the agent to actively reduce uncertainty.

The extension integrates directly into the context evolution loop, meaning research results become part of the lineage and scoring process rather than existing as disconnected tool outputs.

Installation and setup

For Python:

pip install hypercontext

For Node.js:

npm install hypercontext-node-sdk

That's it. No separate MCP package to install. No complex dependency tree.

The Python package includes:

Core SDK
CLI
TUI
Browser dashboard launcher
MCP stdio daemon
HTTP server
Built-in extensions

The npm package is the SDK layer for Node.js projects.

Provider setup

Hypercontext doesn't lock you into a provider. It supports Claude, OpenAI, Ollama, OpenAI-compatible servers, and local transformers models. You set credentials via environment variables or a YAML config file with named presets.

For Claude:

HYPERCONTEXT_PROVIDER=anthropic
HYPERCONTEXT_MODEL=claude-sonnet-4-20250514
ANTHROPIC_API_KEY=your-key-here

For Ollama (fully local):

ollama serve
ollama pull llama3

HYPERCONTEXT_PROVIDER=ollama
HYPERCONTEXT_MODEL=llama3
OLLAMA_BASE_URL=http://localhost:11434

The named preset feature is useful when you want multiple backends in one project. You define them in a YAML file and resolve by name at runtime. The framework expands ${VAR} values from the environment, so secrets stay out of config files.

Using extensions in Python

Extensions are loaded directly into the runtime.

from hypercontext import HyperContext
from hypercontext.extensions import ResearchToolsExtension

hc = HyperContext(output_dir="./hypercontext_output")

hc.use(
    ResearchToolsExtension()
)

summary = hc.run(max_generations=3)
print(summary)

You can compose multiple extensions together depending on the workflow.

hc.use(ResearchToolsExtension())
hc.use(CustomMemoryExtension())
hc.use(CustomScoringExtension())

Extensions participate in orchestration rather than existing as isolated plugins.

Using it in Python

Direct orchestration is straightforward:

from hypercontext import HyperContext

hc = HyperContext(output_dir="./hypercontext_output")
summary = hc.run(max_generations=3)

print(summary)

If you want provider-backed calls without the full orchestration loop:

from hypercontext import LLMClient
from hypercontext.providers import ProviderRegistry

registry = ProviderRegistry.instance()

provider = registry.create(
    "anthropic",
    model="claude-sonnet-4-20250514",
    api_key="your-key-here",
    base_url="https://api.anthropic.com",
)

client = LLMClient(provider=provider)

text, history, metadata = client.complete(
    "Summarize this in one sentence."
)

For agent workflows, you choose between TaskAgent (repeatable tasks) and MetaAgent (repository-aware reasoning and self-modification). Both support context evolution and extension-aware execution.

Using it in TypeScript

The Node SDK follows the same patterns:

import {
  ContextWindow,
  TaskAgent,
  StructuredOutputParser,
  EnhancedToolRegistry,
  LoggingMiddleware,
} from "hypercontext-node-sdk";

const window = new ContextWindow(4096);
window.add("Important context", 1.0, "system");

const agent = new TaskAgent({
  name: "demo",
  maxTokens: 1024,
});

const result = agent.forward({
  query: "hello",
});

const parser = new StructuredOutputParser();

console.log(
  parser.parseFirst('Answer: {"status":"ok"}')
);

const registry = new EnhancedToolRegistry();

registry.use(new LoggingMiddleware());

registry.registerTool(
  {
    name: "echo",
    description: "Echo a payload back",
    parameters: { type: "object" },
  },
  async (args) => args,
);

The TypeScript SDK includes context compression, retrieval, lineage tracking, persistent memory, fitness evaluation, structured output parsing, and extension support.

CLI and terminal UI

The Python package includes a full CLI:

python -m hypercontext version
python -m hypercontext providers
python -m hypercontext run --generations 5 --output-dir ./runs/demo --workdir .
python -m hypercontext compress --input long_text.txt --ratio 0.4
python -m hypercontext archive --list
python -m hypercontext extensions --list

The TUI is a curses dashboard for browsing commands, pinning favorites, and executing them without leaving the terminal.

python -m hypercontext tui --workdir /path/to/project

For desktop assistants, the stdio MCP daemon handles Claude Desktop, Claude Code, and Codex:

python -m hypercontext mcp --workdir /path/to/project

For browser integrations, the HTTP server exposes the same tools over a REST interface:

python -m hypercontext serve --port 8080 --workdir /path/to/project

MCP integration without the hassle

Most MCP implementations require you to install a separate package and configure JSON files.

Hypercontext bundles the stdio daemon and HTTP server directly.

You don't need to install additional MCP dependencies.

The stdio daemon speaks the Model Context Protocol natively. Claude Desktop can discover and invoke Hypercontext tools without manual configuration. The HTTP server provides the same capability for browser and web integrations.

Extensions can also expose MCP-accessible tools, making them available to external clients automatically.

Context compression and deduplication

One of the practical problems with long-running agents is context bloat.

Hypercontext includes a ContextCompressor that reduces text size while preserving semantic meaning.

There's also a validator that checks compression fidelity so you don't accidentally drop important information.

The deduplication layer identifies repeated patterns across generations and collapses them.

This matters when you're running evolutionary loops where similar context configurations get tested repeatedly.

Lineage tracking

Every generation gets a unique ID and tracks its parent.

You can query the lineage tree to answer questions like:

Which generation produced the best score?
Which parent led to this result?
Which branch hasn't improved in the last 10 generations?

This isn't just logging.

The lineage data feeds back into the parent selection strategy for the next generation.

Stagnant branches get deprioritized. High-fitness branches get explored further.

Research extension outputs can also become lineage artifacts, meaning evidence chains are tracked alongside prompt evolution.

Archive and transfer learning

The archive stores proven context configurations ranked by fitness score.

When you start a new task, the framework can query the archive for context patterns that worked well on similar tasks.

This is transfer learning without neural network retraining.

You're transferring context strategies instead of model weights.

The archive is queryable via CLI:

python -m hypercontext archive --query "task:code-review fitness:>0.8"

Archived runs can include extension-derived context, allowing successful research workflows to be reused across future tasks.

What I learned building this

I started this project after reading the Hyperagents paper and getting frustrated that none of the existing frameworks implemented the meta-cognitive ideas in a practical way.

Most research code is a mess of Jupyter notebooks and hardcoded paths.

I wanted something you could actually install and use.

The hardest part wasn't compression or lineage tracking.

It was designing the agent loop so self-modification doesn't spiral into chaos.

If an agent can rewrite its own system prompt, it can also break its own system prompt.

The convergence detection layer stops the loop when scores plateau or when context configurations start cycling.

I also learned that modularity matters more than feature count.

Extensions let Hypercontext grow without turning the framework into an unmaintainable monolith.

Current state and what's next

The framework is functional and I'm using it in my own projects.

The Python package is on PyPI, the TypeScript SDK is on npm, and the docs are on GitHub Pages.

I'm currently working on:

Better convergence heuristics for multi-objective optimization
A web-based lineage visualizer
Additional extension categories
Improved research pipelines
Benchmark suites to compare context strategies across tasks
Better local model workflows
More retrieval-aware orchestration patterns

The repo includes runnable examples for evolution, lineage tracking, self-modifying agents, extensions, provider workflows, and research tooling.

If you want to see what the framework can do without writing code, start with:

examples/python/feature_gallery.py

Try it

# Python
pip install hypercontext
python -m hypercontext version

# TypeScript
npm install hypercontext-node-sdk

Docs
Extensions
PyPI
npm

NoB (Noticeably Better): a compiled language that tries to stay out of your way

RoTSL — Mon, 13 Apr 2026 12:39:52 +0000

Most new languages promise the same things: performance, simplicity, better tooling. NoB is trying to hit those too—but the interesting part is how it actually does it.

At its core, NoB is a compiled language that targets C++20, with a second execution path through a bytecode VM. That split ends up being more practical than it sounds.

Website: https://nob-lang.omni-flows.uk/
Source: private / proprietary
Platforms: macOS, Linux, Windows (via WSL2)

What NoB actually is

NoB compiles .nob code into C++20, then uses clang++ to produce a native binary.

There’s also a VM mode that skips compilation entirely and runs bytecode instead.

That gives you two very different workflows:

Native pipeline → slower startup, fast runtime
VM pipeline → instant startup, slower runtime

In practice, it feels like using two tools under one language.

The two pipelines (and when they matter)

Native (default)

nob file.nob
nob file.nob -o app

This is what you’d use for anything serious.
• Compiles via C++20
• Uses clang++
• Runs as a native binary
• Supports everything (networking, threads, async, etc.)

VM mode

nob --vm file.nob

This skips the compiler completely.

It’s useful for:
• quick scripts
• REPL work
• testing ideas

But it’s not feature-complete. Networking, threading, and some advanced features don’t work here.

The syntax (closer to Python than C++)

The syntax leans readable without being too loose.


set name to "Alice"

function greet(name)
  return "Hello, {name}"
end

print greet("Bob")

A few things stand out:
• set vs let (mutable vs immutable)
• 1-based indexing
• string interpolation built-in
• structured blocks without braces

It’s easy to pick up, especially if you’ve used Python or Lua.

Features that are actually interesting

Tail-call optimization

Recursive functions don’t blow the stack if written in tail form:


function sum_tail(n, acc)
  if n == 0 then return acc end
  return sum_tail(n - 1, acc + n)
end

This gets compiled into a loop automatically.

Pipe operator


words
  |> filter(function(w) return len(w) > 3 end)
  |> map(function(w) return upper(w) end)
  |> sort()

It makes chained transformations easier to read.

Macros (compile-time)


macro swap(a, b)
  set tmp to a
  a = b
  b = tmp
end

These run at compile time, not runtime.

Python backend


nob py file.nob -o file.py --run

This converts NoB into Python so you can use Python libraries like NumPy or OpenCV.

There are limits (no macros, no pipe operator), but it’s useful when you need the ecosystem.

Built-in concurrency (native only)


set t to thread_spawn(function()
  print "running"
end)

thread_join(t)

Includes threads, mutexes, channels, and async support.

Tooling (built-in)

NoB ships with a lot already included:


nob repl
nob check file.nob
nob profile file.nob
nob fmt file.nob
nob gui

Notable pieces:
• GUI REPL (nob gui)
• formatter and profiler included
• package manager (nob pkg)

You don’t need to assemble a separate toolchain.

Performance

From the official benchmarks:

Benchmark	Python	NoB	Speedup
Simple loop	0.498s	0.046s	~10.9×
Prime count	0.090s	0.018s	~4.9×
Fibonacci	0.734s	0.484s	~1.5×

What this means
• Big gains in loops and numeric work
• Smaller gains in recursive workloads
• Much faster than Python for CPU-heavy code

Compilation targets


nob file.nob --profile simd
nob file.nob --profile cuda
nob file.nob --profile wasm

Supports:
• SIMD optimized builds
• CUDA / OpenCL
• WebAssembly
• LLVM IR

Platform support

Platform Support
macOS Native
Linux Native
Windows WSL2

Pricing (indicative)

Tier	Price
Free	£0
Indie	£5–10/month
Pro	£10–20/month

Where it fits

NoB makes sense if you want:
• something faster than Python
• something simpler than C++
• built-in tooling without extra setup

Less ideal if you need:
• a large ecosystem
• long-established tooling

Final thoughts

NoB isn’t trying to reinvent programming. It’s trying to remove friction.

The dual pipeline is the most practical part—you can prototype quickly in VM mode, then switch to native when performance matters.

It’s early, but the core design is solid. Worth keeping an eye on.

I built a safety net for python environments because I was tired of debugging “It works on my machine”

RoTSL — Thu, 09 Apr 2026 09:59:33 +0000

Why every python developer needs a preflight check for their code

I used to have a recurring nightmare. I’d be halfway through a machine learning experiment, three hours into training, when suddenly everything would explode. Not because my code was wrong, but because my environment was quietly broken in a way I couldn’t see coming. Wrong Python version. A package that got installed with pip instead of conda. A wheel that claimed to support my architecture but didn’t. Rosetta 2 running my arm64 Python as x86_64 and tanking my GPU acceleration.

The error messages were always cryptic. The fixes were always tedious. And the worst part? I never knew if my environment was actually healthy until something went wrong.

So I built EnvGuard—a CLI tool that validates your Python environment before you run your code, not after it breaks. Think of it as a preflight checklist for your Python projects. If something’s wrong, it blocks execution and tells you exactly what’s broken and how to fix it. If everything passes, your command runs in a validated environment.

It’s macOS-first (because that’s where I do my work), runs on Linux with partial support, and deliberately doesn’t support Windows (because life’s too short for three-platform maintenance). You can install it from PyPI as envguard-tool — the CLI command is just envguard.

The Problem: Python Environments Are Fragile and Invisible

Python environment management is a solved problem in the same way that herding cats is a solved problem. Technically there are tools. Practically, things go wrong constantly.

Here’s what actually happens in the wild:

The architecture confusion. You install Python on an M1 Mac, but somehow you’re running the x86\_64 version under Rosetta 2. Everything works, but your Metal Performance Shaders (MPS) acceleration is silently disabled. PyTorch falls back to CPU. Your training takes 10x longer. You don’t notice until you check activity monitor and see no GPU usage.

The mixed ownership trap. You create a conda environment, but then you pip install something because the conda version is outdated. Then you conda install something else. Now you have packages owned by two different managers, and pip check is screaming about conflicts, but your code still runs so you ignore it until it doesn’t.

The CUDA delusion. You’re on macOS, but your requirements.txt includes torch==2.0.0+cu118 because you copied it from a Linux server. It installs fine. It even imports. But CUDA doesn’t exist on Apple Silicon, and your code fails three layers deep in a stack trace that mentions nothing about GPU compatibility.

The “it worked yesterday” mystery. Your environment was fine. Then you updated one package. Now something else is broken. You have no idea what changed or when.

These aren’t exotic edge cases. They’re daily experiences for Python developers working on ML, data science, or scientific computing projects. The existing tools — conda, pip, poetry, uv, pyenv — are great at creating environments. They’re terrible at validating them continuously.

The Solution: Preflight Validation Every Single Time

EnvGuard’s core idea is simple: instead of running python train.py and hoping, you run envguard run — python train.py. Before your code executes, EnvGuard runs a nine-step preflight pipeline:

Detect the host — OS version, architecture (native arm64 vs Intel vs Rosetta 2), available package managers, network connectivity
Discover the project — scan for pyproject.toml, requirements.txt, environment.yml, Pipfile, poetry.lock, etc.
Analyze intent — figure out what environment type (venv/conda/pipenv/poetry), Python version, and accelerator targets (CPU/MPS/CUDA) the project needs
Evaluate rules — run 15+ validation rules to catch problems
Fail-fast on critical issues — block execution if anything is unrecoverable
Create a resolution plan — determine exactly how to satisfy the environment requirements
Create or repair the environment — make sure the actual environment matches the requirements
Validate the environment — run pip check or equivalent to verify consistency
Smoke test — try importing key packages in an isolated subprocess to catch runtime failures

If any step fails with a CRITICAL finding, your command never runs. You get a clear error message explaining what went wrong and how to fix it. No cryptic tracebacks. No debugging environment issues at 2am.

What EnvGuard Actually Catches

The rules engine evaluates 15+ specific checks. Here are the ones that have saved me the most pain:

Rule	Severity	What it catches
`CUDA_ON_MACOS`	CRITICAL	Any CUDA dependency on macOS (hardware impossibility)
`ROSETTA_TRANSLATION_DETECTED`	WARNING	x86_64 Python running under Rosetta 2 on Apple Silicon (kills MPS performance)
`ARCHITECTURE_MISMATCH`	ERROR	Python architecture doesn't match project requirements
`MIXED_PIP_CONDA_OWNERSHIP`	WARNING	Packages installed by both pip and conda (dependency hell indicator)
`WHEEL_INCOMPATIBLE`	WARNING	Wheel file doesn't match current platform/architecture
`BROKEN_ENVIRONMENT`	ERROR	Active venv/conda is missing Python binary or critical files
`PYTHON_VERSION_BELOW_MINIMUM`	ERROR	Python version below `requires-python` in pyproject.toml
`MPS_NOT_AVAILABLE`	INFO	Apple Silicon present but MPS not available (usually means PyTorch wasn't built with MPS support)

The CUDA_ON_MACOS rule alone has probably saved me hours of debugging. Here’s what happens: you copy a requirements.txt from a Linux machine that specifies torch==2.1.0+cu118. You install it on your M1 Mac. It seems to work — pip doesn’t complain, the import succeeds. But when you actually try to move tensors to the GPU, you get a cryptic error about CUDA devices not being available. EnvGuard catches this at the dependency resolution stage and blocks execution with a message telling you to use mps or cpu targets instead.

The ROSETTA_TRANSLATION_DETECTED rule is subtler but equally important. If you’re running an x86_64 Python binary on an Apple Silicon Mac (usually because you installed it before Rosetta was properly configured, or you’re using an old pyenv), everything works — but MPS acceleration is silently disabled. Your ML training runs on CPU. Your inference is 10x slower than it should be. EnvGuard detects this via sysctl proc_translated and warns you that you’re leaving performance on the table.

The Technical Architecture: How It Actually Works

EnvGuard is built as a layered Python package with clear separation of concerns. The source is in the GitHub repo under src/envguard/.

CLI Layer (cli.py): Typer-based interface with 25 commands across environment management, dependency resolution, lock files, publishing, and self-updating. Every command supports — json output for CI/CD integration.

Orchestration Layer (preflight.py, doctor.py): The preflight engine runs the nine-step pipeline. The doctor runs standalone diagnostics without execution. Both use the same underlying detection and rules systems.

Domain Layer : The heavy lifting happens here:

detect.py — HostDetector class gathers OS, architecture, Python, shell, network, and permission facts
rules.py — RulesEngine evaluates all 15+ validation rules
repair.py — RepairEngine can automatically fix broken environments (recreate venvs, fix mixed ownership, switch Python versions)
models.py — Pydantic models for HostFacts, ProjectIntent, RuleFinding, ResolutionRecord, etc.
project/ — Discovery (scanning for project files), intent analysis (inferring requirements), resolution (dependency solving), and lifecycle management
resolver/ — Pluggable backends for PyPI (BFS resolution via JSON API), uv, pip, and conda
lock/ — Lock file generation and management (envguard.lock in TOML format with SHA-256 content hashes)
update/ — Self-updating mechanism with SHA-256 verification and rollback support

Platform Layer : macOS-specific code for permissions, Rosetta detection, Xcode CLI tools, and LaunchAgent management. Linux gets partial support here — core pipeline works, but no LaunchAgent, no MPS detection, no Rosetta checks.

All state files (in .envguard/) are written atomically using write-to-temp-then-rename to prevent corruption from interrupted writes. Every subprocess call has explicit timeouts. The security model is documented in detail — checksum verification for updates, no shell=True with string interpolation, path traversal protection for archive extraction.

Real Usage: What My Workflow Looks Like

I work on a lot of ML projects with different requirements. Some need PyTorch with MPS. Some need TensorFlow (which has its own special hell on macOS). Some are pure Python data pipelines. Here’s how I actually use EnvGuard day-to-day.

Starting a new project:

cd ~/projects/new-ml-experiment
envguard init

This creates a .envguard/ directory with state.json, envguard.toml (config), and subdirectories for snapshots, cache, logs, and backups. It scans my project files to figure out what I’m building.

Checking if everything is healthy:

envguard doctor

This runs 10 diagnostic checks: host detection, project discovery, Python environment, package manager health, dependency consistency, accelerator support, permissions, network connectivity, and environment ownership. It outputs a report showing what’s working and what’s not.

Running my actual code:

envguard run - python train.py - epochs 100 - batch-size 32

Before train.py executes, the preflight pipeline runs. If my environment has drifted — say, I updated PyTorch and now there’s a version conflict with torchvision — EnvGuard catches it and blocks execution. I can then run envguard repair to fix it automatically, or envguard lock sync to reinstall from my lock file.

Locking dependencies for reproducibility:

envguard resolve
envguard lock generate

resolve uses the PyPI JSON API to resolve my project dependencies to exact pinned versions. lock generate writes an envguard.lock file with SHA-256 content hashes. I commit this to git. When someone else clones the repo, they run envguard install — from-lock and get exactly the same environment I have.

Self-updating:

envguard update - dry-run # check if there's a new version
envguard update # actually update with SHA-256 verification and automatic rollback snapshot

EnvGuard can update itself. Before applying an update, it creates a rollback snapshot. If something goes wrong, envguard rollback restores the previous version.

The Lock File: Reproducibility Without the Pain

Python dependency management has a reproducibility problem. requirements.txt with loose version constraints means “install something that hopefully works.” requirements.txt with pinned versions means “this worked on my machine at one specific moment, but good luck if you’re on a different architecture or Python version.”

EnvGuard’s lock file (envguard.lock) tries to be smarter. It’s a TOML file that includes:

The exact resolved dependency graph with specific versions
SHA-256 content hashes for verification
Platform and Python version markers (so you can have different resolutions for macOS-arm64 vs Linux-x86_64 if needed)
The source files that contributed to the resolution (pyproject.toml, requirements.txt, etc.)

The lock file is human-readable but machine-generated. You don’t edit it manually. You regenerate it with envguard lock generate or update specific packages with envguard lock update — package <name>.

In CI, you can run envguard lock check to verify that the lock file is up-to-date with your source requirements. It exits with code 13 if stale, which you can use to fail builds that might have inconsistent dependencies.

What It Doesn’t Do (And Why)

EnvGuard has deliberate limitations that are worth understanding:

It doesn’t intercept unmanaged launches. If you run python train.py directly, EnvGuard doesn’t see it. Only commands routed through envguard run get validated. This is by design — EnvGuard is opt-in, not a system-wide interceptor that could break other workflows.

It doesn’t support Windows. The codebase uses POSIX-specific APIs throughout (os.access() for permissions, list-form subprocess arguments, /tmp paths). Adding Windows support would require a parallel implementation of the platform layer, and I don’t use Windows enough to maintain that. WSL2 works if you need it.

It doesn’t make CUDA work on macOS. Apple Silicon physically cannot run NVIDIA CUDA. EnvGuard detects CUDA dependencies and blocks them with a clear error, but it can’t magically add CUDA support where none exists.

It doesn’t auto-activate environments on directory change. If you want cd my-project to automatically activate the right venv, use direnv. EnvGuard’s shell hooks are minimal and opt-in — they just load the integration, not the environments themselves.

These limitations are documented in the repo’s docs/limitations.md and tracked as architectural decisions in docs/adrs/.

Installation and Getting Started

If you’re on macOS 12+ (Monterey) or Linux, installation is straightforward:

pip install envguard-tool

The PyPI package is named envguard-tool because envguard was taken. The CLI command and Python import are both just envguard.

For macOS, there’s also a bootstrap script that installs shell hooks and the LaunchAgent for automatic update checking:

git clone https://github.com/rotsl/envguard.git
cd envguard
bash scripts/bootstrap.sh

Once installed, verify it works:

envguard - version
envguard doctor

Then initialize any Python project:

cd /path/to/your/project
envguard init
envguard run - python your_script.py

Why I Built This (And Who It’s For)

I built EnvGuard for myself, primarily. I’m a researcher working on machine learning for biology — specifically computer vision for fungal pathogen analysis. I work on Apple Silicon Macs. I collaborate with people on Linux servers. I deal with PyTorch, TensorFlow, JAX, and a lot of scientific Python packages with complex native dependencies.

I was tired of debugging environment issues that had nothing to do with my actual research. I wanted a tool that would catch problems before they cost me hours of training time or corrupted experimental results.

EnvGuard is for Python developers who:

Work on macOS (especially Apple Silicon) and are tired of Rosetta/architecture surprises
Need MPS acceleration for PyTorch and want to know when it’s not actually available
Collaborate across different machines and need reproducible environments
Are tired of “works on my machine” and want validation that happens before execution
Prefer CLI tools that integrate into existing workflows rather than replacing them entirely

It’s not for everyone. If you’re a web developer working with simple Python environments, you probably don’t need this. If you’re on Windows, this won’t help you (yet). If you want a full IDE-like environment manager with GUI buttons, look elsewhere.

But if you’re doing scientific Python or ML and you’ve ever lost a day to a broken environment that you didn’t know was broken until it was too late — EnvGuard might save you some pain.

Python environment management has been broken for a long time. We’ve accepted “it works on my machine” as an inevitable part of the development experience. We’ve normalized spending hours debugging issues that have nothing to do with our actual code.

I don’t think it has to be this way. EnvGuard is my attempt to bring some of the safety and validation we expect from production systems (preflight checks, reproducible builds, clear error messages) to the messy world of Python development.

It’s not perfect. It’s alpha software with known limitations. But it’s already saved me hours of debugging, and I hope it can do the same for you.

If you’re tired of environment surprises, give it a try. Run pip install envguard-tool, run envguard doctor on your project, and see what it finds. You might discover that your “working” environment has been quietly broken in ways you never noticed.

Health AI on Notion with Tribe V2

RoTSL — Thu, 02 Apr 2026 15:16:29 +0000

Local-first Notion health tracker with TRIBEv2 brain analysis, AI health insights, symptom logging, goals, medications, appointments, and a browser UI

Notion MCP Challenge*

This was supposed to be a Notion challenge submission.

I built most of it close to the deadline, got something working, and then missed the window. No big failure story. Just underestimated how long the messy parts would take.

After that, keeping it private felt pointless. So I pushed it to GitHub.

Around the same time, I came across Tribe v2. That changed how I looked at this project. Instead of treating it like a failed submission, I started treating it like something that could keep evolving in public.

That is what this is now. Not finished. Still useful.

The actual problem I was trying to solve

I sometimes already track things in Notion:

• Sleep

• Workouts

• Random notes about how I feel

The problem is not tracking. It is what happens after.

Nothing.

No aggregation. No patterns. No feedback loop. Just logs sitting there.

Every week I would think I should look at it properly. I never did.

So this project is basically me outsourcing that thinking step.

System design

The architecture is simple on paper and annoying in practice.

Pipeline

• Fetch data from Notion databases

• Normalize it into a consistent structure

• Send it to an LLM

• Write the output back into Notion

That is it. No fancy orchestration.

The difficulty is everything in between.

Notion is not a real database

At first glance, Notion feels structured. It is not.

Things that break over time:

• Property names change

• Data types shift

• Fields get added or removed

If you build with fixed schemas, your system breaks quietly.

What I did instead

I treated Notion as semi structured data:

• Map fields dynamically instead of hardcoding

• Use fallback parsing when fields do not match

• Normalize everything into an internal schema

Example internal format:

{
. "date": "2026–03–20",
. "sleep_hours": 6.5,
. "workout": "strength",
. "mood": "low"
}

No matter how messy the source is, the model only sees this cleaned version.

Data normalization is the real system

Most of the work went here.

Steps

Extract raw values from Notion API
Convert them into usable types
Handle missing or inconsistent fields
Align everything by time

Examples:

• "6 hrs" becomes 6.0
 • Empty fields get dropped from inference
 • Mixed labels get standardized

If this layer is weak, everything downstream gets worse.

LLM layer

The model is not used as a general assistant.

It has a narrow job:

• Summarize recent data

• Spot simple patterns

• Suggest small adjustments

Input structure

Each run includes:

• Recent data window

• Aggregated values

• Instructions that limit scope

Example:

Sleep: [6, 5.5, 7, 6]
Workout: [yes, no, yes, yes]
Mood: [low, medium, medium, high]

Task:

Identify patterns

Avoid assumptions without enough data

State uncertainty clearly

The main issue: the model guesses

Even with weak data, it tries to sound confident.

That is a problem, especially for anything health related.

What I added

• Minimum data thresholds before running inference

• Prompts that force uncertainty

• Restrictions on long term claims

• Filtering outputs that sound too certain

It still makes mistakes. It just makes fewer confident ones.

Writing results back to Notion

Outputs are stored as:

• Daily summaries

• Weekly insights

• Separate logs for traceability

Each output includes:

• Timestamp

• Data window used

• Generated insight

This makes it easier to debug and iterate.

Why I stayed inside Notion

I considered building a separate app.

That would solve a lot of problems:

• Cleaner schema

• Better validation

• Fewer edge cases

But nobody wants another health app.

Notion already has the data. So I built on top of it instead.

The tradeoff is dealing with inconsistency.

Influence from Tribe v2

This project shifted direction after I came across Tribe v2.

The main idea that stuck:

You do not wait until something feels ready.

You ship it. Then improve it in the open.

That is exactly what this repo reflects. Some parts are solid. Some are clearly not. That is fine.

What is still broken

A few things are still rough:

• Sparse data leads to weak outputs

• The model confuses correlation with causation

• Some insights sound better than they are

• No feedback loop yet to measure usefulness

The system works. It just does not always matter.

What I would change

If I rebuilt/rework this:

• Define a stricter schema earlier

• Separate ingestion and AI layers properly

• Add better logging from day one

• Focus more on actionable insights, not just observations

Where this could go

A few directions that feel real:

• Long term memory instead of short windows

• Feedback loops to track if suggestions help

• Wearable integrations

• Confidence scoring for outputs

Or it might just stay like this. A small layer that makes Notion slightly smarter.

Closing

Missing the deadline changed the trajectory of this project.

If I had submitted it, I probably would have moved on.

Instead, it is now something I can keep improving without pretending it is finished.

Right now, it is useful enough to keep using.

That is enough.

Repo: https://github.com/rotsl/notion-Health-AI

☕ Pot.OF — AI-Powered HTCPCP Coffee Pot

RoTSL — Thu, 02 Apr 2026 09:03:18 +0000

This is a submission for the DEV April Fools Challenge

What I Built

Pot.OF is a playful HTCPCP/1.0 coffee pot simulator inspired by RFC 2324. It includes an interactive terminal, a full 418 I'm a Teapot tea-rejection flow, decaf kernel panic mode, and three optional AI features powered by Google Gemini: an AI Coffee Therapist, an AI Brew Critic, and an AI RFC Generator.

It solves no real problems, but it does let users argue with a coffee pot that has strong opinions.

Demo

Deployed app: pot-of
Video demo: Youtube

Code

Built with Next.js 16, TypeScript, Tailwind CSS 4, shadcn/ui, Framer Motion, Zustand, Prisma, and Google Gemini.

Repo link: Github

How I Built It

Built the app as a Next.js 16 App Router project with a single interactive coffee-pot interface and dedicated API routes for both protocol behavior and AI features.
Implemented 3 Gemini-powered AI endpoints:
- /api/htcpcp/ai-therapist — a sentient coffee pot therapist with a consistent personality, multi-turn chat, and coffee-themed advice
- /api/htcpcp/ai-critic — a dramatic coffee snob that generates absurd tasting notes and scores
- /api/htcpcp/ai-rfc — an RFC-style generator that creates fake HTCPCP protocol extensions with realistic formatting
Added a bring-your-own-key flow in the GUI so users can paste their own Gemini API key locally to unlock AI features without requiring a deployment-wide secret
Built 8 total API routes:
- 5 HTCPCP-inspired core routes for brewing, status, RFC display, teapot mode, and timing
- 3 AI routes for therapist, critic, and RFC generation
Added personality-driven UI behavior including pot moods like idle, brewing, happy, offended, existential, and decaf-panic
Implemented joke protocol interactions including:
- BREW tea -> full-screen 418 I'm a Teapot
- BREW decaf -> fake kernel panic
- RFC, STATUS, WHEN, PROPFIND, and other terminal commands
Used three generated visual assets for the coffee pot mascot, teapot artwork, and coffee cup imagery
Deployed it as a Vercel-friendly app with the AI key supplied by each user in the interface instead of hardcoding a shared secret

Prize Category

Best Google AI Usage

The app uses Google Gemini across three distinct feature types: conversational AI through the therapist, creative generation through the brew critic, and structured document generation through the RFC generator. AI is not a side widget here; it is part of the product’s personality.

Best Ode to Larry Masinter

The project is built around RFC 2324, including the legendary 418 I'm a Teapot, HTCPCP-style commands, and a coffee pot that takes the protocol far too seriously.

From Kidney Stones to Convergence

RoTSL — Sat, 28 Mar 2026 08:16:21 +0000

The strange path from ultrasound physics to rethinking how solvers move through space

I didn’t expect this to start with kidney stones, but that’s honestly where it began.

I was reading about ultrasound lithotripsy, how they break stones using focused waves, and I got stuck on the geometry of it. Ellipses, focal points, energy landing exactly where it needs to.

It is one of those cases where physics feels less like equations and more like choreography.

That idea just sat there for a while.

Then, separately, I was dealing with solver code. Big systems, messy residuals, the usual “why is this not converging” loop. At some point I stopped thinking in terms of matrices. The system started to feel like a place.

Some parts resisted everything, like trying to push something heavy across rough ground. Other parts moved too easily and felt unstable. Residuals stopped feeling abstract and started feeling like forces pushing things out of balance.

That is roughly where PICD came from.

PICD does not try to replace anything. It wraps what already works.

GMRES, CG, Newton – Krylov, BDF. They still do the actual solving. PICD just watches what is happening and keeps some memory: residual history, how the system is partitioned, how different parts relate to each other.

Then it adjusts the setup for the next solve. Preconditioners, damping, small corrections. Carefully.

There is a hard boundary it does not cross. If a step does not reduce the residual, it does not count. The usual acceptance rules still apply.

The “conic” part is just how the system gets split up.

Instead of one big vector, you break it into regions. Each one tracks its own behavior. Its residual pattern, its neighbors, what worked last time.

It sounds heavier than it feels. In practice it just gives the solver a bit of context it did not have before.

The unusual part is treating those regions like they have physical properties.

It sounds heavier than it feels. In practice it just gives the solver a bit of context it did not have before.

Underneath all that is a graph.

Connections between regions depend on how similar their residuals are, how often they activate together, and the actual structure of the problem. From that you get a Laplacian:

L = D -W

It does not replace the solver. It just helps decide what should be grouped together and what should be prioritized.

The solve loop itself is pretty normal:

Pick a solver, partition, build state, adjust preconditioner, run, accept or reject, update.

The results are interesting.

Everything in the current validation set runs. 98 tests, 22 examples.

On direct comparisons, same solver with and without PICD, the PICD version is faster in the published benchmark set and uses less memory there as well.

Linear problems stand out the most. Most cases improve, sometimes by a lot. There is a Helmholtz example that jumps by hundreds of times faster.

Nonlinear and time-dependent cases are less clean. Some improve. Some do not. There is a turbulence example that clearly gets worse, with more rejected steps and slower runtime.

That part I trust more than the wins.

If there is one thing I would keep in mind, it is that PICD is deliberately limited in what it claims.

It works well in same-method comparisons. Beyond that, it depends. It does not assume every physics-inspired term helps, and the controller can reduce or disable them when they start hurting convergence.

I still come back to that original picture of energy being guided instead of forced.

That is really what this is. Instead of brute-forcing convergence, you reshape the space a little so the solver has an easier path.

But it changes how you think about the problem. And for me, that shift was the interesting part.

Read more on my reasearch here and cite it if you find it useful : https://doi.org/10.13140/RG.2.2.10721.06243

Your LLM prompts are probably wasting 90% of tokens. Here’s how I fixed mine.

RoTSL — Sun, 22 Mar 2026 13:04:10 +0000

I keep running into the same problem with LLM apps.

This work is based on my previous article on dev.to https://dev.to/rotsl/contextfusion-the-context-brain-your-llm-apps-are-missing-2gkm

You build a retrieval pipeline, hook it up to an API, and then quietly ship prompts that are full of stuff the model doesn’t need. Extra chunks. Duplicates. Half-relevant context that just bloats everything.

And you pay for all of it.

CFAdv is basically an attempt to stop doing that.

It builds on context-fusion, but adds something that turns out to matter more than I expected: even if you pick the right context, you can still mess it up by putting it in the wrong place.

Most pipelines are still doing this

Let’s be honest about the default pattern:

chunks = retriever.top_k(query, k=5)
prompt = "\n\n".join(chunks)
response = llm(prompt)

That’s it.

No budget. No filtering beyond retrieval. No thought about ordering.

More context is assumed to be better. It often isn’t.

CFAdv splits the problem in two

Instead of one “context step”, it does two separate things:
1. Decide what gets in
2. Decide where it goes

That separation is the whole point.

Step 1: selecting context under a budget

Instead of top-k, CFAdv treats selection like an optimization problem.

Each chunk gets a score based on things like:
• relevance
• trust
• freshness
• diversity
• token cost

Then it tries to pick the best combination under a fixed token budget.

At a high level:

def value(chunk):
    utility = (
        0.25 * chunk.relevance +
        0.20 * chunk.trust +
        0.15 * chunk.freshness +
        0.15 * chunk.structure +
        0.15 * chunk.diversity
    )
    risk = (
        0.40 * chunk.hallucination +
        0.35 * chunk.staleness +
        0.25 * chunk.privacy
    )
    return utility - risk

Then rank by value density:

density = value(chunk) / max(chunk.tokens, 1)

And greedily pack until you hit the budget.

The small trick that makes a big difference

There’s a simple filter before any of that:

floor = max_score * 0.15
selected = [c for c in candidates if c.score >= floor]

Anything below 15% of the best chunk just gets dropped.

That sounds minor, but it changes behavior a lot.
• If your data is clean, everything stays
• If it’s noisy, most of it disappears

So you don’t fill your prompt with mediocre content just because you have space.

Step 2: ordering for attention

This is the part I underestimated.

Even if you pick the right chunks, models don’t treat all positions equally. Stuff at the start tends to get more attention than stuff buried in the middle.

So CFAdv reorders the selected chunks based on similarity to the query.

Basic version:

def cosine(a, b):
    return (a @ b) / (norm(a) * norm(b))

scores = [cosine(embed(query), embed(chunk)) for chunk in chunks]
weights = softmax(scores)

ordered = [chunk for _, chunk in sorted(
    zip(weights, chunks),
    reverse=True
)]

Higher weight goes earlier in the prompt.

No embeddings API required

Instead of calling an external model, it uses a simple hashed bag-of-words vector.

import hashlib
import numpy as np
import re

def embed(text, dim=64):
    vec = np.zeros(dim)
    tokens = re.findall(r"\b\w+\b", text.lower())

    for t in tokens:
        h = int(hashlib.sha256(t.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
        vec[(h >> 16) % dim] += 0.5

    return vec / (np.linalg.norm(vec) + 1e-8)

It’s not fancy. No positional info, no learned weights. But for short chunks it works surprisingly well.

Two levels of ordering

There’s also a second layer.

Instead of treating everything as one list, CFAdv groups context into blocks:
• system
• history
• retrieval
• tools

Then it does:
1. sort chunks inside each block
2. sort the blocks themselves

Sketch:

# intra-block
for block in blocks:
    block.chunks.sort(key=lambda c: similarity(query, c), reverse=True)

# cross-block
block_scores = {
    block: similarity(query, mean_embed(block.chunks))
    for block in blocks
}

ordered_blocks = sorted(blocks, key=lambda b: block_scores[b], reverse=True)

So you end up shaping the whole prompt, not just shuffling pieces.

The full pipeline

CFAdv is an 8-stage pipeline, but it’s easier to think of it like this:

docs = ingest(files)
blocks = normalize(docs)
variants = represent(blocks)

candidates = retrieve(query, variants)
selected = plan(candidates, budget=120)

ordered = attention_fuse(query, selected)
packet = assemble(ordered)

prompt = compile(packet, mode="qa")

Each step is stateless. That makes it easier to test and reason about.

What happens in practice

You can cut most of the prompt without losing the answer, as long as:
• retrieval pulls in some noise
• there is redundancy
• the query only needs a subset of the data

If everything is relevant, the system mostly leaves it alone.

If only one chunk survives selection, ordering doesn’t matter.

Where this actually helps

This kind of pipeline shines when:
• your retrieval step is messy
• you’re concatenating multiple documents
• prompts are long enough for attention effects to matter

If you already have clean, minimal context, you won’t see much change.

The part that stuck with me

This isn’t really about attention or embeddings.

It’s about treating prompt assembly as something worth optimizing.

Right now most systems act like prompts are just containers. You throw things in and hope the model figures it out.

CFAdv flips that.

It asks a simple question: what is the smallest amount of context that still works?

Then it enforces it.

And once you start thinking that way, it’s hard to go back to dumping chunks into a string and calling it a day.

Try it yourself

If you want to see how this works in practice or plug it into your own workflow:

GitHub repo
Contains the full Python library, CLI, benchmarks, and tests. You can run it locally, inspect the pipeline stages, or integrate it into your own RAG setup.
Live demo
Lets you compare raw prompts vs CFAdv-compiled prompts side by side. Useful for quickly seeing how much context gets removed and how ordering changes.

If you’re already using retrieval + concatenation, the repo is the easiest place to start. Swap your prompt assembly step with CFAdv’s planner + fusion stages and see what drops out.

DEV Community: RoTSL

Bypassing the OS to Run LLMs: What I Learned Building a Firmware-Centric Runtime

The Stack Is Thicker Than You Think

What “Firmware-Equivalent” Actually Means

The Hardware Reality

A Minimal Example

Why This Exists

Where It Goes From Here

Final Thought

Google I/O 2026 dropped a bomb on Android tooling, and nobody's talking about it (or maybe they are 😅)

The big picture: Agents aren't a demo anymore

Antigravity 2.0: What actually changed under the hood

Dynamic subagents

Asynchronous task management and scheduled tasks

JSON hooks

Projects, not repositories

Slash commands

Voice input

The SDK

Pricing and availability

Android CLI 1.0: The announcement that actually changes things

The android studio command

Why the token numbers are real

Agent Skills

Journeys: agent-driven UI testing

Installation

Getting started: what to try first

What to expect

What I actually think

The Ulysses Prediction Engine

The Ulysses Prediction Engine: How I Built a Self‑Optimizing, Noise‑Proof Oracle That Learns Almost Anything

The Big Idea: A Theorem Inside a Theorem Inside a Theorem

Layer 1 – The Inner Guarantee: Universal Bayesian Prediction

Layer 2 – Error Correction: Bayesian Filtering

Layer 3 – The Meta‑Optimiser: Never Stop Tuning

2. Behind the App: What You’re Actually Seeing

3. What Happens When You Point It at Real Data

Temperature Forecasting (Berlin‑Tegel, 10 years)

Financial Volatility (S&P 500, 2020 – 2024)

Epileptic Seizure Prediction (CHB‑MIT Database)

4. Why This Architecture Is Fundamentally Different

5. Limitations (Let’s Be Honest)

6. What’s Next

7. Try It Yourself

References

SMGP [Spectral Memory Graph Processor] : Building an AI That Actually Remembers What You Told It

SMGP: Building an AI That Actually Remembers What You Told It

How It’s Put Together

The Math (Or: Why This Actually Works)

Hyperdimensional Memory

Spectral Graph Processing

Topological Persistence for Not Forgetting Things

Category-Theoretic Graph Rewriting for Reasoning

The Software Side

The Hardware Accelerator (SMGPU)

Microarchitecture

Compute Engines

Memory Subsystem

Interconnect

Instruction Set

Benchmarks

Software (Pure Python)

Hardware Acceleration (FPGA @ 250 MHz)

FPGA Resource Utilization (Alveo U280, 16×16 systolic)

Testing

Running It in the Cloud

What’s Next

What Happens When You Try to Reverse Biology? A Deep Look at the Protein DNA Analysis Simulator

Most biology tools move in one direction. DNA becomes RNA. RNA becomes protein.

Why reverse biology is harder than it sounds

The simulator starts with a simple but useful idea

The protein analysis page adds a second layer

Protein sequences carry clues, not complete answers

The protein analysis page changes how users think

The live simulation version feels more interactive

The live version works like a live sequence interpreter

The live simulator combines several biological layers

Reverse translation exposes a hidden truth about biology

The project feels closer to computational biology than traditional teaching software

Inspiration from published Science research

The `android studio` command