DEV Community: Pavel Espitia

Building a Local-Only RAG System with Ollama and TypeScript

Pavel Espitia — Mon, 25 May 2026 14:47:05 +0000

Most RAG tutorials send your private documents to OpenAI. Here's how to keep them on your laptop.

This post walks through a complete Retrieval-Augmented Generation pipeline that runs entirely on your machine. No API keys, no third-party calls, no monthly bill. Two hundred lines of TypeScript and a single binary.

What you'll build

A command-line tool that:

Indexes a folder of .md or .txt files into a local vector store.
Answers questions about those files using a local LLM.
Cites which documents the answer came from.

By the end, you'll be able to point it at your engineering wiki, your personal notes, or your codebase, and ask questions in natural language without anything leaving your machine.

The stack

Ollama — runs the LLM and the embedding model.
@xenova/transformers — fallback embedding library if you don't want a second Ollama model.
sqlite-vec — SQLite extension that adds vector similarity search. Tiny, fast, no separate database server.
TypeScript + Node 22 — gluing it together.

Why SQLite over Chroma or Qdrant? For collections under a million chunks, SQLite with sqlite-vec is usually fast enough, simpler to deploy, and doesn't need a daemon. Your "vector database" is one file.

Setup

ollama pull nomic-embed-text       # the embedding model
ollama pull qwen2.5:7b             # the answer model

pnpm add better-sqlite3 sqlite-vec

Step 1: chunk and embed documents

import fs from "node:fs";
import path from "node:path";

function chunk(text: string, size = 800, overlap = 100): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let buffer = "";
  for (const s of sentences) {
    if ((buffer + " " + s).length > size && buffer) {
      chunks.push(buffer.trim());
      buffer = buffer.slice(-overlap) + " " + s;
    } else {
      buffer = buffer ? buffer + " " + s : s;
    }
  }
  if (buffer) chunks.push(buffer.trim());
  return chunks;
}

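async function embed(text: string): Promise<number[]> {
  const r = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  // Fail loudly if Ollama is not running or the model has not been pulled.
  if (!r.ok) throw new Error(`embedding request failed: ${r.status}`);
  const json = (await r.json()) as { embedding: number[] };
  return json.embedding;
}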

nomic-embed-text returns 768-dimensional vectors. Fast enough that you can re-index a thousand-document corpus in a few minutes.

Step 2: store in SQLite

import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

const db = new Database("rag.db");
sqliteVec.load(db);

db.exec(`
  CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    content TEXT NOT NULL
  );
  CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(
    id INTEGER PRIMARY KEY,
    embedding FLOAT[768]
  );
`);

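// Prepare once and reuse; preparing inside the loop recompiles the SQL for every chunk.
const insertChunk = db.prepare(
  "INSERT INTO chunks (source, content) VALUES (?, ?)"
);
const insertVec = db.prepare(
  "INSERT INTO vec_chunks (id, embedding) VALUES (?, ?)"
);

async function indexFile(filePath: string) {
  const text = fs.readFileSync(filePath, "utf8");
  for (const piece of chunk(text)) {
    // Embed first so a failed embedding does not leave a chunk row without a vector.
    const vec = await embed(piece);
    const result = insertChunk.run(filePath, piece);
    // vec0 wants an integer primary key. As far as I can tell, better-sqlite3
    // binds plain JS numbers as floats, so pass the rowid as a BigInt.
    insertVec.run(BigInt(result.lastInsertRowid), JSON.stringify(vec));
  }
}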

Step 3: search

async function search(query: string, k = 4) {
  const queryVec = await embed(query);
  const rows = db.prepare(`
    SELECT chunks.source, chunks.content, vec_chunks.distance
    FROM vec_chunks
    JOIN chunks ON chunks.id = vec_chunks.id
    WHERE vec_chunks.embedding MATCH ?
    ORDER BY distance
    LIMIT ?
  `).all(JSON.stringify(queryVec), k) as Array<{
    source: string;
    content: string;
    distance: number;
  }>;
  return rows;
}

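MATCH triggers sqlite-vec's nearest-neighbour (KNN) search. Note that vec0 ranks by L2 (Euclidean) distance by default; declare the column as embedding FLOAT[768] distance_metric=cosine if you want cosine distance. Sub-millisecond on small corpora.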
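
To make the distance column less abstract, here is a small sketch of the two metrics in plain TypeScript. It is only an illustration of the math, not sqlite-vec's implementation, and the function names are mine. My understanding is that vec0 reports L2 distance unless the column is declared with distance_metric=cosine.

```typescript
// L2 (Euclidean) distance: straight-line distance between two vectors.
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine distance: 1 - cosine similarity. 0 means "same direction".
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero vector");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scaling a vector changes its L2 distance but not its cosine distance.
const short = [1, 0];
const long = [2, 0];
console.log(l2Distance(short, long)); // 1
console.log(cosineDistance(short, long)); // 0
```

In both cases smaller means closer, which is why the query sorts by distance ascending.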

Step 4: ask the LLM

async function ask(question: string) {
  const matches = await search(question, 4);

  const context = matches
    .map((m, i) => `[${i + 1}] ${m.source}\n${m.content}`)
    .join("\n\n---\n\n");

  const prompt = `Answer the question using only the context provided.
If the answer is not in the context, say so.
Cite sources by their number in square brackets.

CONTEXT:
${context}

QUESTION: ${question}

ANSWER:`;

  const r = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5:7b",
      messages: [{ role: "user", content: prompt }],
      stream: false,
    }),
  });
  const json = await r.json();
  return {
    answer: json.choices[0].message.content,
    sources: matches.map((m) => m.source),
  };
}
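
One refinement worth sketching: ask() returns every retrieved source, whether or not the model used it. Assuming the model follows the prompt and cites as [1], [2], and so on, a helper like the hypothetical citedSources below narrows the list to the documents actually cited.

```typescript
// Map [n] citations in the answer back to the retrieved sources.
// Numbers outside 1..sources.length (hallucinated citations) are ignored.
function citedSources(answer: string, sources: string[]): string[] {
  const seen = new Set<number>();
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (n >= 1 && n <= sources.length) seen.add(n);
  }
  return Array.from(seen)
    .sort((x, y) => x - y)
    .map((n) => sources[n - 1]);
}

const demoSources = ["notes/auth.md", "notes/billing.md", "notes/infra.md"];
console.log(citedSources("We chose JWTs [1], see also [3] and [1].", demoSources));
// [ 'notes/auth.md', 'notes/infra.md' ]
```

If the result is empty, the model either ignored the citation instruction or said the answer is not in the context, and both are worth surfacing to the user.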

Putting it together

// Index a folder (only the .md and .txt files the tool is meant to handle)
const files = fs
  .readdirSync("./notes")
  .filter((f) => /\.(md|txt)$/i.test(f))
  .map((f) => path.join("./notes", f));
for (const f of files) await indexFile(f);

// Ask
const result = await ask("What did we decide about the auth refactor?");
console.log(result.answer);
console.log("Sources:", result.sources);

Total runtime, indexing 500 markdown files: about three minutes on an M2 MacBook. Per-question latency: under two seconds.

Where this matters

If your team's documentation has grown past the point where anyone reads it cover to cover (about a hundred pages), local RAG turns that wiki back into something useful. Same applies to:

Codebases — answer "where is the rate limiter implemented?"
Customer support archives — answer "what's our refund policy?"
Research notes — answer "what did I write about X six months ago?"
Legal documents — answer "what does our MSA say about indemnification?"

Last bullet matters: every legal-tech startup right now is building a cloud version of this. Yours runs on your laptop.

Tuning that actually pays off

Chunk size 800-1200 chars is the sweet spot. Smaller chunks lose context. Larger ones dilute relevance.
Overlap 10-15 percent of chunk size catches sentences split mid-thought.
Re-rank top-k with a cross-encoder if precision matters more than speed. Adds 100ms but often jumps relevance from 70 to 90 percent.
Cache embeddings keyed by content hash so re-indexing is incremental.
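
The caching tip is the easiest one to sketch. This is a minimal in-memory version, assuming an embed function with the same shape as the one in Step 1. The names withCache and contentHash are mine, and a real setup would persist the cache (for example in a SQLite table) rather than hold it in a Map.

```typescript
import { createHash } from "node:crypto";

type Embedder = (text: string) => Promise<number[]>;

// Stable key for a chunk: identical text always maps to the same hash.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Wrap any embedder so each distinct chunk is only embedded once.
function withCache(
  embed: Embedder,
  cache: Map<string, number[]> = new Map()
): Embedder {
  return async (text) => {
    const key = contentHash(text);
    const hit = cache.get(key);
    if (hit) return hit;
    const vec = await embed(text);
    cache.set(key, vec);
    return vec;
  };
}
```

Usage would look like const cachedEmbed = withCache(embed), then calling cachedEmbed(piece) inside indexFile. Unchanged chunks skip the round trip to Ollama on re-index.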

What's next

The previous post in this series covered function calling. Combining function calling with RAG gives you a local agent that can read your documents and take actions: "draft an email to legal summarising what our MSA says about data residency" — read MSA chunks, compose draft, call the email tool.

That's a real assistant. And nothing leaves your machine.

Next post: streaming Ollama responses through Server-Sent Events in Next.js, the production pattern for live UIs.

Streaming Ollama Responses in Next.js: The SSE Pattern That Actually Works

Pavel Espitia — Mon, 18 May 2026 21:52:45 +0000

Most Next.js + Ollama tutorials show a single await fetch and call it a day. The user types a question, waits eight seconds, and a wall of text appears. That's a bad UX.

Real LLM apps stream tokens as they're generated. The user sees a response materialise word by word, just like ChatGPT. This post shows how to build that on Next.js 15 App Router with Ollama as the backend, using Server-Sent Events. Production-ready in under a hundred lines.

Why SSE and not WebSocket

The tradeoffs:

	SSE	WebSocket
One-way (server → client)	✓	also bi-directional
Auto-reconnect built in	✓	implement yourself
Plain HTTP, no upgrade	✓	requires upgrade handshake
Works through proxies	✓	sometimes blocked
Streaming overhead	minimal	small frame overhead

For LLM streaming, you only need server → client. SSE wins on simplicity. WebSocket is overkill until you need bidirectional streaming (voice, real-time collaboration, tool-call dialogues).

The architecture

Browser → /api/chat (Next.js Route Handler) → Ollama (localhost:11434)
                ↑
                emits SSE chunks back to the browser as Ollama produces tokens

Three pieces:

Server route — pipes Ollama's stream into the response.
Client hook — reads the stream and updates state.
UI — renders the materialising text.

Server: the route handler

app/api/chat/route.ts:

export async function POST(request: Request) {
  const { message } = await request.json();

  const ollama = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5:7b",
      messages: [{ role: "user", content: message }],
      stream: true,
    }),
  });

  if (!ollama.ok || !ollama.body) {
    return new Response("upstream error", { status: 502 });
  }

  const stream = new ReadableStream({
    async start(controller) {
      const reader = ollama.body!.getReader();
      const decoder = new TextDecoder();
      const encoder = new TextEncoder();
      let buffer = "";

      try {
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          buffer += decoder.decode(value, { stream: true });

          const lines = buffer.split("\n");
          buffer = lines.pop() ?? "";

          for (const line of lines) {
            if (!line.trim()) continue;
            try {
              const obj = JSON.parse(line);
              if (obj.message?.content) {
                const sseChunk = `data: ${JSON.stringify({
                  delta: obj.message.content,
                })}\n\n`;
                controller.enqueue(encoder.encode(sseChunk));
              }
              if (obj.done) {
                controller.enqueue(
                  encoder.encode(`data: ${JSON.stringify({ done: true })}\n\n`)
                );
              }
            } catch {
              // ignore non-JSON lines
            }
          }
        }
      } finally {
        controller.close();
      }
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
      "X-Accel-Buffering": "no",
    },
  });
}

Two details that matter:

stream: true in the Ollama call. Without it, Ollama returns one big response after the whole generation finishes.
X-Accel-Buffering: no header. This tells nginx not to buffer the response. Other proxies and CDNs have their own buffering settings, so check those separately. With buffering left on, you'll see chunks arrive in a burst at the end.

Client: the hook

import { useState } from "react";

export function useChatStream() {
  const [response, setResponse] = useState("");
  const [loading, setLoading] = useState(false);

  async function send(message: string) {
    setResponse("");
    setLoading(true);

    const r = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }),
    });

    if (!r.body) {
      setLoading(false);
      return;
    }

    const reader = r.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

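    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        // SSE frames are separated by a blank line; keep any partial frame for the next read.
        const frames = buffer.split("\n\n");
        buffer = frames.pop() ?? "";

        for (const frame of frames) {
          if (!frame.startsWith("data: ")) continue;
          const json = JSON.parse(frame.slice(6));
          if (json.delta) {
            setResponse((prev) => prev + json.delta);
          }
        }
      }
    } finally {
      // Always clear the loading flag, even if the stream ends without a
      // done event or a frame fails to parse.
      setLoading(false);
    }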
  }

  return { response, loading, send };
}

That's it for the streaming logic. Calling send("hello") updates response token by token.
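
The fiddly part of the hook is the buffering: a network read can end in the middle of a frame. Pulling that logic into a pure function makes it easy to test without a browser. This is a sketch under the same assumptions as the route above (one data: line per frame, JSON payload with delta or done); parseSse is a name I made up.

```typescript
type ChatEvent = { delta?: string; done?: boolean };

// Split a buffer into complete SSE frames and return whatever is left over.
function parseSse(buffer: string): { events: ChatEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  // The last piece is either empty or an incomplete frame; keep it for later.
  const rest = frames.pop() ?? "";
  const events: ChatEvent[] = [];
  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (!line.startsWith("data: ")) continue;
      try {
        events.push(JSON.parse(line.slice(6)) as ChatEvent);
      } catch {
        // Skip a malformed payload instead of killing the whole stream.
      }
    }
  }
  return { events, rest };
}

// A read that ends mid-frame: one complete event, one partial.
const first = parseSse('data: {"delta":"Hi"}\n\ndata: {"de');
console.log(first.events.length, first.rest); // 1 data: {"de
```

Inside the hook you would call parseSse(buffer) after each read, apply the events, and carry rest over as the new buffer.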

UI: the chat box

"use client";
import { useState } from "react";
import { useChatStream } from "./useChatStream";

export default function Chat() {
  const [input, setInput] = useState("");
  const { response, loading, send } = useChatStream();

  return (
    <div className="max-w-2xl mx-auto p-4 space-y-4">
      <div className="min-h-[200px] p-4 border rounded whitespace-pre-wrap">
        {response || (loading ? "thinking..." : "ask me anything")}
      </div>
      <form
        onSubmit={(e) => {
          e.preventDefault();
          send(input);
          setInput("");
        }}
        className="flex gap-2"
      >
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          className="flex-1 px-3 py-2 border rounded"
          placeholder="Ask Ollama..."
        />
        <button
          type="submit"
          disabled={loading || !input}
          className="px-4 py-2 bg-blue-600 text-white rounded disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </div>
  );
}

Run pnpm dev, hit the page, and watch tokens appear in real time.

Production-grade additions

The skeleton above works locally. To ship it:

Authentication. Add an auth check in the route handler before opening the upstream stream. Otherwise anyone with your URL can burn your local CPU.
Conversation history. The handler above takes a single message. Real chat sends the full history each time. Pass messages: ChatMessage[] and forward to Ollama.
Cancellation. When the user navigates away, abort the upstream fetch. Pass an AbortController.signal and call controller.abort() on disconnect.
Backpressure. If your client is slow, the controller's queue grows. Use controller.desiredSize to detect this and pause reading from Ollama.
Vercel deployment. Edge Runtime works for this pattern but has a 30-second function timeout. For longer generations, use Node Runtime or self-host. Local models running on your dev machine are obviously not callable from Vercel — for production, you'd swap Ollama for a managed inference endpoint.
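
Two of those items, history and cancellation, fit in one small helper. This is a sketch, not the only way to do it: buildUpstreamRequest and the maxMessages cap are hypothetical, and it assumes the route handler's request.signal aborts when the client disconnects, which is worth verifying on your Next.js version.

```typescript
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Build the fetch options for Ollama's /api/chat from the full history.
// Passing the signal through means aborting it also aborts the upstream call.
function buildUpstreamRequest(
  messages: ChatMessage[],
  signal: AbortSignal,
  maxMessages = 20 // hypothetical cap so long chats do not overflow the context
): RequestInit {
  const recent = messages.slice(-maxMessages);
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5:7b",
      messages: recent,
      stream: true,
    }),
    signal,
  };
}
```

In the handler this would replace the inline options: fetch("http://localhost:11434/api/chat", buildUpstreamRequest(messages, request.signal)).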

Why this matters

Once tokens stream, your local LLM stops feeling like a slow API and starts feeling like a real assistant. The perceived latency goes from "did it crash?" to "natural conversation."

Combined with the function calling and RAG patterns from earlier in this series, this is the third piece of a real local AI stack. Streaming chat over local data with local tools, all on your laptop.

That stack didn't exist as a viable production option two years ago. In 2026 it's a hundred and fifty lines of TypeScript.

Foundry vs Hardhat in 2026: Which Solidity Toolchain Wins?

Pavel Espitia — Mon, 11 May 2026 12:55:13 +0000

Two toolchains. Same goal: write, test, and deploy Solidity. Different design philosophies, very different best-fits in 2026.

I've used both in production across smart-contract audits and protocol work over the last two years. Here's an honest comparison so you don't waste a week picking the wrong default for your team.

TL;DR

Foundry — best for security work, audits, protocol engineering, and anyone who values speed and Solidity-native tests. The default for serious DeFi.
Hardhat 3 — best when your contracts are tightly coupled to a TypeScript frontend or backend, when your team already lives in Node.js, or when you depend on plugins that haven't migrated.
Both at once — legitimate, common, not a smell. Many teams write tests in Foundry and deployments in Hardhat.

If you're starting a new protocol from scratch in 2026, default to Foundry. The rest of this post explains when the others are correct.

Installation and first impressions

Foundry

curl -L https://foundry.paradigm.xyz | bash
foundryup
forge init my-project

Three commands. Sub-30 seconds. You get forge (compiler / test runner), cast (CLI for chain interaction), and anvil (local node). All written in Rust. All single binaries. No package.json, no node_modules.

Hardhat 3

mkdir my-project && cd my-project
pnpm init
pnpm add -D hardhat
npx hardhat --init

Hardhat 3 (released late 2025) is a substantial rewrite. The new toolchain runs on a Rust-based execution layer (effectively REVM), bringing the test speed within striking distance of Foundry. It also natively supports Solidity tests, which were exclusive to Foundry until last year.

This matters: most "Foundry vs Hardhat" comparisons online are pre-Hardhat-3 and outdated.

Test speed (the headline metric)

Same Uniswap-V2-style contract, 80 unit tests, on an M2 MacBook:

Toolchain	Cold run	Warm run	Memory
Foundry	1.3 s	0.4 s	220 MB
Hardhat 3 (Solidity tests)	2.1 s	0.7 s	480 MB
Hardhat 2 (TypeScript tests via ethers)	18 s	9 s	1.2 GB

Hardhat 3 closed the gap. In the table above it is about 1.6x slower than Foundry on a cold run and 1.75x on a warm run, versus roughly 14x and 22x for Hardhat 2. If your only reason for picking Foundry was speed, that argument is weaker in 2026.
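
If you want to check the ratios yourself, the arithmetic is just each run time divided by Foundry's. A throwaway sketch using the numbers from the table (the object and function names are mine):

```typescript
// Run times in seconds, copied from the table above.
const seconds = {
  foundry: { cold: 1.3, warm: 0.4 },
  hardhat3: { cold: 2.1, warm: 0.7 },
  hardhat2: { cold: 18, warm: 9 },
};

// How many times slower than Foundry a toolchain is, cold and warm.
function slowdown(tool: { cold: number; warm: number }) {
  return {
    cold: tool.cold / seconds.foundry.cold,
    warm: tool.warm / seconds.foundry.warm,
  };
}

console.log(slowdown(seconds.hardhat3)); // about 1.6x cold, 1.75x warm
console.log(slowdown(seconds.hardhat2)); // about 14x cold, 22.5x warm
```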

Test ergonomics

Foundry — Solidity tests

// test/Counter.t.sol
pragma solidity ^0.8.20;
import "forge-std/Test.sol";
import "../src/Counter.sol";

contract CounterTest is Test {
    Counter c;

    function setUp() public {
        c = new Counter();
    }

    function test_increment() public {
        c.increment();
        assertEq(c.count(), 1);
    }

    function testFuzz_setNumber(uint256 x) public {
        c.setNumber(x);
        assertEq(c.count(), x);
    }
}

Tests live in the same language as the contracts. No type marshaling. No JS context switch. Fuzzing is built in: any test function that takes parameters is fuzzed automatically, and testFuzz_ is the naming convention for those tests. No extra config.

Hardhat 3 — same test, two flavours

// test/Counter.t.sol — Solidity test
import "forge-std/Test.sol";
import "../src/Counter.sol";

contract CounterTest is Test {
    Counter c;
    function setUp() public { c = new Counter(); }
    function test_increment() public { c.increment(); assertEq(c.count(), 1); }
}

Hardhat 3 runs Foundry-compatible tests. Identical syntax. Either toolchain executes them.

// test/Counter.ts — TypeScript test (still supported)
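import { expect } from "chai";
import { network } from "hardhat";

// Hardhat 3 exposes ethers through a network connection rather than a
// top-level import (this assumes the hardhat-ethers plugin is installed).
const { ethers } = await network.connect();

describe("Counter", () => {
  it("increments", async () => {
    const Counter = await ethers.getContractFactory("Counter");
    const c = await Counter.deploy();
    await c.increment();
    expect(await c.count()).to.equal(1);
  });
});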

The TS path is for teams that want to integrate tests with frontend assertions or backend test suites. Slower to run, but the unified language can pay off in a full-stack codebase.

Plugin and ecosystem

	Foundry	Hardhat
Coverage	`forge coverage` (built-in)	`solidity-coverage` plugin
Verification	`forge verify-contract`	`hardhat-verify` plugin
Local node	`anvil` (built-in)	`npx hardhat node` (built-in)
Mainnet forking	`forge test --fork-url <RPC_URL>` (built-in)	forking option in the network config (built-in)
Contract upgrades	Manual proxies	`@openzeppelin/hardhat-upgrades`
Gas reports	`forge test --gas-report`	`hardhat-gas-reporter` plugin
Defender / monitoring	Manual scripting	`defender-cli` integration
Frontend type generation	None native (use `wagmi-cli` separately)	`typechain` (mature plugin)

Hardhat's plugin ecosystem is wider and more mature — particularly for OpenZeppelin upgrades and frontend integration. Foundry trades plugin breadth for built-in primitives that cover 80% of needs without extension.

Where each one wins

Use Foundry if

You're building a DeFi protocol, lending market, AMM, vault, or anything where a missed bug equals a wallet drain.
Your team's primary language is Solidity, not TypeScript.
You need fast invariant testing or stateful fuzzing as part of CI.
You write audits or work in security-adjacent roles. Auditors expect Foundry projects.
You don't want a node_modules graph.

Use Hardhat 3 if

Your contracts are part of a full-stack TypeScript app and your team lives in pnpm workspaces.
You depend on OpenZeppelin's upgrade plugin or other Hardhat-only tooling.
Your team has invested in TypeScript test patterns and the migration cost is high.
You ship to many networks with environment-specific deployment scripts and want the JS flexibility.

Use both (the senior move)

The pattern many serious teams adopt:

Tests in Foundry. Speed, fuzzing, invariant testing, Solidity-native ergonomics.
Deployments in Hardhat. Network-specific configs, OpenZeppelin upgrade plugin, TypeScript scripting.

This is supported by Hardhat 3 out of the box — it can read Foundry's foundry.toml, share build artifacts, and run mixed test suites. The boundaries between the two have softened significantly.

What changed in 2026

If you read a "Foundry vs Hardhat" post from 2024, it's probably out of date in three places:

Hardhat 3's Rust execution layer closed most of the speed gap.
Solidity tests in Hardhat removed the "Foundry is the only Solidity-native option" advantage.
Cross-tool interop — foundry.toml parsed by Hardhat, shared artifacts — turned the "pick one" decision into "pick which side of the workflow each tool handles."

The decision is no longer binary. It's about workflow phase.

My recommendation

Start with Foundry. It's the right default for the security-sensitive work that pays $10K+ a month. If your project grows into a full-stack codebase that needs Hardhat's plugin ecosystem, layer Hardhat 3 on top — they coexist cleanly.

The wrong move is picking Hardhat in 2026 because that's what 2022 tutorials taught, and then six months later trying to retrofit Foundry-level test speed into a deeply Hardhat-coupled codebase.

Pick the tool that matches the work you'll be doing in twelve months. For most protocol engineers in 2026, that's Foundry first.

Next post in the series: how to set up a CI pipeline for a Foundry project that runs invariant tests, gas reports, and coverage — under 50 lines of YAML.

Function Calling with Ollama: Make Your Local LLM Run Real Tools

Pavel Espitia — Fri, 01 May 2026 15:06:05 +0000

Most Ollama tutorials end at chat completion. The interesting stuff starts when the model can call your code.

Function calling is the protocol that lets an LLM say "I want to call getWeather(city: 'Bogotá')" instead of trying to fake the answer from training data. Cloud models like GPT and Claude have had it for years. Ollama supports it natively for compatible models. Almost nobody talks about it.

This post walks through a complete working example. End to end, two hundred lines of TypeScript.

Why function calling matters

Without it, your LLM is a closed system. It only knows what's in its training data. With it, the LLM becomes a planner that calls real APIs, queries databases, runs calculations, hits your code. That's the difference between "interesting demo" and "production-grade agent."

The trick: the LLM doesn't actually call your function. It returns a structured request, and your code decides whether to execute it. Always.

Which Ollama models support it

As of 2026:

qwen2.5:7b and larger — strong support
llama3.1:8b and larger — strong support
mistral-nemo — strong support
qwen2.5-coder:7b — works for technical functions
llama3.2:3b — limited, expect quirks

Pull the model:

ollama pull qwen2.5:7b

The minimum example

Define a tool. The schema follows the JSON Schema format the OpenAI API uses, so any existing tool you have works without translation.

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name" },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "Temperature unit",
          },
        },
        required: ["city"],
      },
    },
  },
];

Call Ollama with the tools attached:

const response = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5:7b",
    messages: [{ role: "user", content: "What's the weather in Bogotá?" }],
    tools,
    tool_choice: "auto",
  }),
});

const json = await response.json();
const message = json.choices[0].message;
console.log(message.tool_calls);

The response looks like this:

{
  "tool_calls": [
    {
      "id": "call_1",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\":\"Bogotá\",\"unit\":\"celsius\"}"
      }
    }
  ]
}

The model didn't fabricate a temperature. It told you exactly which function to call and with what arguments.

Closing the loop

Now your code executes the function and feeds the result back to the model so it can produce a natural-language answer.

async function getWeather(city: string, unit: string) {
  // Call your real weather API here. Returning a stub for the example.
  return { city, temperature: 19, unit, conditions: "partly cloudy" };
}

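// The model may answer in plain text instead; only proceed when it asked for a tool.
const toolCall = message.tool_calls?.[0];
if (!toolCall) throw new Error("model did not request a tool call");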
const args = JSON.parse(toolCall.function.arguments);
const result = await getWeather(args.city, args.unit ?? "celsius");

// Send the result back to the model
const final = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5:7b",
    messages: [
      { role: "user", content: "What's the weather in Bogotá?" },
      message,
      {
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      },
    ],
  }),
});

const finalJson = await final.json();
console.log(finalJson.choices[0].message.content);
// "The weather in Bogotá is currently 19°C and partly cloudy."

Two round trips, total. Local. Free.

Multiple tools, one prompt

Real agents use several tools. Define them all in the array; the model picks which to call.

const tools = [
  { type: "function", function: { name: "get_weather", ... } },
  { type: "function", function: { name: "search_web", ... } },
  { type: "function", function: { name: "send_email", ... } },
  { type: "function", function: { name: "query_database", ... } },
];

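For "Email my team the weather forecast for tomorrow", the model can chain get_weather → send_email. Some models return both tool calls in one response; because the email depends on the weather result, others request get_weather first and send_email on the next turn. Either way, execute every call you receive, return the results, and repeat until the model replies with plain text.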
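
With several tools you need something that routes each call to the right function. Here is one way to sketch it, assuming tool calls shaped like the response shown earlier. runToolCalls and the handler map are hypothetical names, and the handlers themselves are whatever your real code does.

```typescript
type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};
type ToolMessage = { role: "tool"; tool_call_id: string; content: string };
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Execute every requested call and produce one role:"tool" message per call.
// Errors are reported back to the model instead of crashing the loop.
async function runToolCalls(
  calls: ToolCall[],
  handlers: Map<string, ToolHandler>
): Promise<ToolMessage[]> {
  const out: ToolMessage[] = [];
  for (const call of calls) {
    let content: string;
    try {
      const handler = handlers.get(call.function.name);
      if (!handler) throw new Error(`unknown tool: ${call.function.name}`);
      const args = JSON.parse(call.function.arguments);
      content = JSON.stringify(await handler(args));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      content = JSON.stringify({ error: message });
    }
    out.push({ role: "tool", tool_call_id: call.id, content });
  }
  return out;
}
```

Append the returned messages to the conversation and call the model again. Because only names registered in the map can run, the model cannot invoke anything you did not explicitly expose.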

Things that break, and how to handle them

The model invents argument values. Smaller models (3B and below) sometimes hallucinate fields. Defend against this with strict validation. I use Zod:

import { z } from "zod";

const WeatherArgs = z.object({
  city: z.string(),
  unit: z.enum(["celsius", "fahrenheit"]).default("celsius"),
});

const args = WeatherArgs.parse(JSON.parse(toolCall.function.arguments));

If parse throws, return the error to the model and let it retry. Surprisingly, it usually fixes itself on the second pass.
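
The retry step looks roughly like this. To keep the sketch dependency-free it uses a hand-written check as a stand-in for the Zod schema above; parseWeatherArgs and toolReply are hypothetical names. The point is the shape: a validation failure becomes a tool message the model can read, not an exception.

```typescript
type WeatherArgs = { city: string; unit: "celsius" | "fahrenheit" };
type Parsed =
  | { ok: true; args: WeatherArgs }
  | { ok: false; error: string };

// Validate the raw arguments string the model produced.
function parseWeatherArgs(raw: string): Parsed {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, error: "arguments were not valid JSON" };
  }
  if (typeof value !== "object" || value === null) {
    return { ok: false, error: "arguments must be a JSON object" };
  }
  const { city, unit = "celsius" } = value as Record<string, unknown>;
  if (typeof city !== "string" || city.trim() === "") {
    return { ok: false, error: "city must be a non-empty string" };
  }
  if (unit === "celsius" || unit === "fahrenheit") {
    return { ok: true, args: { city, unit } };
  }
  return { ok: false, error: 'unit must be "celsius" or "fahrenheit"' };
}

// Turn either outcome into the role:"tool" message sent back to the model.
function toolReply(
  toolCallId: string,
  parsed: Parsed,
  run: (args: WeatherArgs) => unknown
) {
  const content = parsed.ok ? run(parsed.args) : { error: parsed.error };
  return {
    role: "tool" as const,
    tool_call_id: toolCallId,
    content: JSON.stringify(content),
  };
}
```

Send the reply back with the rest of the conversation. When the content is an error, the model sees what was wrong with its arguments and can issue a corrected call.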

The model ignores the tool when it shouldn't. Some questions get a chat answer instead of a tool call ("What's the weather like?" without a city). Reword your prompt: "If the user asks about weather, always call get_weather. Ask the user for missing parameters before calling." Models follow this consistently.

The model calls a tool when it shouldn't. Less common but it happens. Add to the system prompt: "Only call a function if the user is asking for live data. For general questions, answer from your knowledge."

What this unlocks

You now have a local agent loop. The same pattern scales to:

File system tools — let the LLM read and write files in a sandbox.
Shell tools — execute commands and feed back the output.
Database tools — query and update your app's data.
API tools — wrap any REST endpoint as a function.

The LLM becomes the planner. Your code is the executor. Local LLMs are now legitimately useful for agent workflows, not just chat.

Next post in the series: building a local-only RAG system with Ollama, ChromaDB, and TypeScript. We'll combine retrieval with the function calling pattern from this post.

Ollama vs LM Studio vs Jan: Which Local AI Runner Wins in 2026?

Pavel Espitia — Thu, 30 Apr 2026 20:30:33 +0000

Three projects. Same goal: run LLMs on your laptop. Different design philosophies, very different best-fits.

I've used all three in production over the last six months. Here's an honest comparison so you don't waste a weekend picking the wrong one.

TL;DR

Ollama — best for developers who want a CLI and an HTTP API. The default for engineers.
LM Studio — best for non-developers and researchers who want a polished GUI.
Jan — best if open-source-everything matters and you want a ChatGPT-like UI you fully own.

If you're shipping code that calls a local LLM, pick Ollama. The rest of this post explains why and when the others are correct.

Installation

Ollama

curl -fsSL https://ollama.com/install.sh | sh

A single binary, runs as a background daemon, exposes a REST API on localhost:11434. Done in thirty seconds.

LM Studio

Desktop installer (.dmg / .exe / .AppImage). After install, open the app, click "Discover", search for a model, click download. GUI-first.

Jan

Desktop installer like LM Studio, but the UI is more chat-focused. After install, you import GGUF models manually or download from their model hub.

Model library

	Ollama	LM Studio	Jan
Models in registry	200+	1000+ via HuggingFace	Curated, smaller list
One-line pull	`ollama pull llama3.1`	UI search	UI search
Custom GGUF	`Modelfile` import	Drag-and-drop	File copy
Auto-quant selection	✓	Manual	Manual

Ollama's Modelfile system is the most developer-friendly way to package a model with its parameters and prompt template. It's roughly to local LLMs what Dockerfile is to containers.

LM Studio wins on raw breadth — it can run anything HuggingFace has, with manual quantisation. If you're researching obscure models, that matters.

API surface

Ollama

REST API with an OpenAI-compatible endpoint under /v1:

const r = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  body: JSON.stringify({
    model: "qwen2.5-coder:7b",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  }),
});

Drop-in for any OpenAI SDK by changing the base URL.

LM Studio

Also exposes an OpenAI-compatible server, but you have to start it manually from the GUI ("Local Server" tab → "Start Server"). Easier to forget.

Jan

OpenAI-compatible. Toggleable from the GUI.

If you're integrating into an existing app, all three speak the same dialect at the API layer. The difference is whether the server is a daemon (Ollama) or requires a GUI to be running (LM Studio, Jan).
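
Because the dialect is shared, switching runners can come down to switching a base URL. A small sketch of that idea: the ports are the defaults this post mentions for Ollama and LM Studio, the /v1 prefix on LM Studio is my assumption, chatRequest is a made-up helper, and Jan is left out because I have not verified its default port.

```typescript
// Default local base URLs (adjust if you changed the ports).
const BASE_URLS = {
  ollama: "http://localhost:11434/v1",
  lmstudio: "http://localhost:1234/v1", // assumes LM Studio also serves under /v1
} as const;

type Runner = keyof typeof BASE_URLS;

// Build the same OpenAI-style chat request for whichever runner is in use.
function chatRequest(runner: Runner, model: string, content: string) {
  return {
    url: `${BASE_URLS[runner]}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content }],
      }),
    },
  };
}

const req = chatRequest("ollama", "qwen2.5-coder:7b", "Hello");
console.log(req.url); // http://localhost:11434/v1/chat/completions
```

Then fetch(req.url, req.init) works against either server, provided it is actually running.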

Performance

I tested qwen2.5-coder:7b on a 32 GB MacBook M2 Pro. Same prompt, same temperature, same context window, three runs each, taking the median.

	Ollama	LM Studio	Jan
Tokens/sec	38	35	32
Time to first token	240 ms	280 ms	350 ms
Idle RAM	200 MB	1.4 GB	2.1 GB

Ollama is the lightest at idle because it can unload the model when not in use and reload on demand. LM Studio and Jan keep the model resident as long as the app is open.

For long-running coding sessions on a 16 GB machine, that idle-RAM difference matters. Ollama's lazy loading is the reason I run it on my older MacBook.
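
You can also steer that unloading yourself. Ollama's API accepts a keep_alive field on generate and chat requests. As I understand the docs, 0 unloads the model as soon as the request finishes and a duration string such as "30m" keeps it resident for that long. A sketch of building such a request (keepAliveRequest is a made-up helper name):

```typescript
// Build a request that loads or unloads a model without generating text.
// keep_alive: 0 frees the memory; a duration string keeps the model warm.
function keepAliveRequest(model: string, keepAlive: number | string) {
  return {
    url: "http://localhost:11434/api/generate",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, keep_alive: keepAlive }),
    },
  };
}

const unload = keepAliveRequest("qwen2.5-coder:7b", 0);
console.log(unload.init.body); // {"model":"qwen2.5-coder:7b","keep_alive":0}
```

Calling fetch(unload.url, unload.init) before starting a memory-hungry build is a cheap way to get the RAM back on a 16 GB machine.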

Where each one wins

Use Ollama if

You write code that calls local LLMs (90 percent of developers).
You want a single binary on a server, not a desktop app.
You're integrating with VS Code's Continue extension, LangChain, llama_index, or any OpenAI-compatible SDK.
You care about idle RAM.

Use LM Studio if

You want to chat with local models without writing code.
You need to test exotic models that aren't in Ollama's registry.
You like a polished UI for managing model files.
You're doing model research and need fine-grained control over quantisation.

Use Jan if

"Fully open-source, every dependency" is a hard requirement.
You want a ChatGPT-style chat UI that's yours forever, regardless of what OpenAI does.
You're building for users who want a desktop AI assistant they own, not a developer tool.

What about combining them?

This actually works. I run Ollama as my background daemon for code (Continue extension, local agents, scripts) and use LM Studio when I want to compare three models side by side on the same prompt. They don't conflict because LM Studio defaults to port 1234 and Ollama uses 11434.

You can also point LM Studio's UI at an Ollama backend if you really want — it's OpenAI-compatible. But at that point just use the Ollama CLI; the indirection isn't paying for itself.

My recommendation

Start with Ollama. It's the right default for ninety percent of developer workflows in 2026. If your needs grow into research or non-developer audiences, add LM Studio or Jan alongside. They coexist fine.

The wrong move is picking the GUI-first option because the install is friendlier, and then six months later trying to retrofit it into a programmatic pipeline. Pick the tool that matches what you'll be doing in twelve months.

Next post in this series: function calling with local models. Most Ollama tutorials skip it. We'll go through it end to end.

How to Replace GitHub Copilot with Ollama (Local AI Coding, Free Forever)

Pavel Espitia — Thu, 30 Apr 2026 20:30:32 +0000

Self-Hosted GitHub Copilot Alternative: Code with Ollama for Free

GitHub Copilot is ten dollars a month and your code goes to a third party. Both are fine until they're not. If you write proprietary code, work in a regulated industry, or just don't want to ship every keystroke to Microsoft, there's a free, local alternative that runs entirely on your laptop.

This post walks through the setup. End to end, twenty minutes.

The stack

Ollama — the local model runner.
Continue.dev — a VS Code extension that talks to Ollama.
A code-focused LLM — qwen2.5-coder:7b for speed, qwen2.5-coder:32b for quality.

That's it. No API keys, no monthly bill, no telemetry.

Why this actually works in 2026

Local code models have closed the gap. Qwen2.5-Coder 32B benchmarks within five points of Claude Sonnet on HumanEval. The 7B variant is fast enough to autocomplete in real time on a 16 GB MacBook M2. DeepSeek-Coder-V2 and CodeLlama 70B are also strong choices if you have more RAM.

The key shift: you no longer need a cloud GPU farm to get usable AI assistance.

Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download the installer from ollama.com

Pull the code model:

ollama pull qwen2.5-coder:7b      # 4 GB, fast
ollama pull qwen2.5-coder:32b     # 19 GB, slower but smarter

Verify it's running:

ollama run qwen2.5-coder:7b "write a TypeScript debounce function"

It should respond in one to three seconds.

Install Continue

In VS Code, install the extension continue.continue.

Open Continue's settings (Cmd+L on macOS, then click the gear icon) and edit your ~/.continue/config.json to point at Ollama:

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Tab Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}

Pull the embedding model so codebase indexing works:

ollama pull nomic-embed-text

Reload VS Code. Open any TypeScript file and start typing. You should see grey ghost-text suggestions, just like Copilot.
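If no suggestions appear, the usual cause is a model named in the config that was never pulled. A sketch of that check follows, assuming the response shape of Ollama's GET /api/tags endpoint (a models array whose entries carry a name); `missingModels` and `withDefaultTag` are hypothetical helpers.

```typescript
// Subset of the response from Ollama's GET /api/tags endpoint.
interface TagsResponse {
  models: { name: string }[];
}

// A model pulled without a tag is listed as "<name>:latest".
function withDefaultTag(model: string): string {
  return model.includes(":") ? model : `${model}:latest`;
}

// Return the required models that the local Ollama install does not have yet.
function missingModels(required: string[], tags: TagsResponse): string[] {
  const installed = new Set(tags.models.map((m) => withDefaultTag(m.name)));
  return required.filter((m) => !installed.has(withDefaultTag(m)));
}

// Usage against a running daemon (not executed here):
// const res = await fetch("http://localhost:11434/api/tags");
// const tags = (await res.json()) as TagsResponse;
// console.log(missingModels(["qwen2.5-coder:7b", "nomic-embed-text"], tags));
```

Anything the function returns still needs an ollama pull.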

Test it like a real user

Hit Cmd+L to open the chat panel and ask:

Write a function that debounces a callback. Use TypeScript generics.

You should see a response in two to four seconds. The function should be correct on first attempt.

For inline edits, select code and hit Cmd+I:

Refactor this to use early returns.

The diff appears inline. Accept or reject with one keystroke.

Latency comparison

Action	Cloud Copilot	Local Ollama (7B)	Local Ollama (32B)
Tab autocomplete	200-400 ms	300-600 ms	1.5-3 s
Inline edit	1-2 s	2-4 s	8-15 s
Multi-file refactor	3-5 s	5-10 s	20-40 s

The 7B model is the right default for most flows. Bring out the 32B model when you're doing architecture work or asking for explanations.

What you actually give up

I don't want to oversell this. The 7B local model misses subtle bugs that Copilot's frontier model catches — null-check edge cases, Promise.all vs Promise.allSettled distinctions, that kind of thing. Multi-file context is also weaker. Continue indexes your repo locally, but it's not at the level of Copilot's whole-workspace awareness.

For senior engineers writing performance-sensitive code at the limit of the language, Copilot is still better. For everyone else doing 80 percent of normal day-to-day work, local Ollama is indistinguishable in quality and zero in cost.

What you save

A hundred and twenty dollars a year, plus the value of your code never leaving your machine. If you ship in a regulated industry where data sovereignty matters (health, finance, defense, legal), this is the difference between "no AI assistance allowed" and "AI assistance with a full local audit trail." That difference is worth far more than the cost of a single subscription.

What's next

If this works for you, the next post in this series goes deeper: comparing Ollama against LM Studio and Jan, the two other serious local AI runners. Different tradeoffs, different best-fits. Worth knowing before you commit a tool to your daily flow.

For now, you have a working local Copilot. Happy not-paying.

Solidity vs Vyper: Security Differences Every Auditor Should Know

Pavel Espitia — Thu, 30 Apr 2026 15:06:11 +0000

When I started building spectr-ai, one of the first decisions was which EVM languages to support. Solidity was obvious — it powers over 90% of deployed contracts. But Vyper kept showing up in DeFi protocols I was auditing, and the security differences between the two languages are more significant than most developers realize.

This post breaks down where each language helps (and hurts) your contract's security posture, with concrete code examples.

Solidity's Footgun Collection

Solidity gives you enormous power and enormous rope to hang yourself with. Here are the features that keep auditors employed.

delegatecall

delegatecall executes another contract's code in the context of the calling contract. This means the called contract can modify the caller's storage. It's the backbone of upgradeable proxies — and the source of hundreds of millions in losses.

// Dangerous: anyone can call this and change contract storage
contract Vulnerable {
    address public owner;

    function execute(address target, bytes memory data) public {
        (bool success, ) = target.delegatecall(data);
        require(success);
    }
}

An attacker deploys a malicious contract that sets owner to their address, then calls execute pointing to it. Game over.

tx.origin

tx.origin returns the original external account that initiated the transaction, not the immediate caller. This breaks when contracts call other contracts.

// Vulnerable to phishing attacks
function withdraw() public {
    require(tx.origin == owner, "Not owner");
    payable(msg.sender).transfer(address(this).balance);
}

If the owner interacts with a malicious contract, that contract can call withdraw and the tx.origin check passes because the owner initiated the transaction chain.
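The difference between the two checks is easier to see with the call chain written out. This is an off-chain model in TypeScript, not contract code: `CallContext` and the address strings are made up for illustration.

```typescript
// Minimal model of an EVM call frame: `origin` is the externally owned account
// that signed the transaction, `sender` is the immediate caller of the function.
interface CallContext {
  origin: string;
  sender: string;
}

const OWNER = "0xOwner"; // placeholder address

// Mirrors the vulnerable check: require(tx.origin == owner)
function passesOriginCheck(ctx: CallContext): boolean {
  return ctx.origin === OWNER;
}

// Mirrors the safer check: require(msg.sender == owner)
function passesSenderCheck(ctx: CallContext): boolean {
  return ctx.sender === OWNER;
}

// The owner calls a malicious contract, which in turn calls withdraw().
const phishingCall: CallContext = { origin: OWNER, sender: "0xMalicious" };

console.log(passesOriginCheck(phishingCall)); // true: the attack goes through
console.log(passesSenderCheck(phishingCall)); // false: the attack is rejected
```

In the Solidity example, the same one-word change from tx.origin to msg.sender closes the hole.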

Inline Assembly

Solidity's assembly blocks give you raw EVM access. No type safety, no overflow checks, no guard rails.

function unsafeAdd(uint256 a, uint256 b) public pure returns (uint256) {
    assembly {
        mstore(0x0, add(a, b))  // No overflow check
        return(0x0, 32)
    }
}

selfdestruct

selfdestruct removes a contract from the blockchain and force-sends its ETH balance to any address. This bypasses receive() and fallback() functions, breaking contracts that rely on address(this).balance for logic.

// This invariant can be broken by selfdestruct
function isBalanceCorrect() public view returns (bool) {
    return address(this).balance == totalDeposits;
}

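Note: selfdestruct behavior changed with EIP-6780 (Dencun upgrade). The contract's code and storage are now deleted only if selfdestruct runs in the same transaction that created the contract, but the ETH balance is still force-sent to the target in every case, so the balance invariant above can still be broken.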

Vyper's Safety-by-Design Philosophy

Vyper takes the opposite approach: remove dangerous features entirely. No inheritance, no operator overloading, no inline assembly, no function overloading, and bounded loops only.

Bounded Loops

Vyper requires loop bounds at compile time. You literally cannot write an unbounded loop.

# Vyper: must specify max iterations
@external
def sum_deposits(deposits: DynArray[uint256, 100]) -> uint256:
    total: uint256 = 0
    for deposit: uint256 in deposits:
        total += deposit
    return total

Compare that to Solidity, where an unbounded loop over a growing array is a classic gas griefing vector:

// Solidity: nothing stops you from iterating forever
function sumDeposits() public view returns (uint256) {
    uint256 total = 0;
    for (uint256 i = 0; i < deposits.length; i++) {
        total += deposits[i];  // Gas bomb if array grows large
    }
    return total;
}

No Inheritance

Vyper has no inheritance. This sounds limiting until you realize that inheritance is a major source of audit complexity. Diamond inheritance, storage layout conflicts between parent contracts, and shadowed functions have caused real exploits.

In Vyper, every contract is flat. What you see is what you get.

Default Overflow Protection

Both languages now check arithmetic for overflow by default: Vyper has done so since its first release, Solidity only since 0.8.0. In Solidity, developers can still opt out with unchecked blocks, and they do, often incorrectly, to save gas.

// Solidity: developers can bypass overflow checks
function riskyMath(uint256 a, uint256 b) public pure returns (uint256) {
    unchecked {
        return a - b;  // Wraps on underflow
    }
}

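Vyper has no equivalent block-level escape hatch. Wrapping arithmetic is available only through explicit built-ins such as unsafe_add and unsafe_sub, which are at least easy to search for during an audit.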
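To make "wraps on underflow" concrete, here is an off-chain model of uint256 subtraction using BigInt. It is a sketch of the arithmetic only; `uncheckedSub` and `checkedSub` are illustrative names.

```typescript
const UINT256_MODULUS = 2n ** 256n;

// Models `unchecked { return a - b; }`: the result wraps modulo 2^256.
function uncheckedSub(a: bigint, b: bigint): bigint {
  return (((a - b) % UINT256_MODULUS) + UINT256_MODULUS) % UINT256_MODULUS;
}

// Models the default checked subtraction: an underflow aborts instead of wrapping.
function checkedSub(a: bigint, b: bigint): bigint {
  if (b > a) throw new Error("arithmetic underflow");
  return a - b;
}

// 0 - 1 wraps to the largest uint256 value, 2^256 - 1.
console.log(uncheckedSub(0n, 1n) === UINT256_MODULUS - 1n); // true
```

A balance that silently becomes 2^256 - 1 is why a misplaced unchecked block tends to be rated high severity.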

Vyper Is Not Immune

Vyper's safety-first design reduces the attack surface, but it does not eliminate it.

raw_call

Vyper's raw_call is analogous to Solidity's low-level call. It gives you the same reentrancy and return-data risks.

# Vyper: raw_call is just as dangerous as Solidity's .call()
@external
def forward_call(target: address, data: Bytes[1024]):
    raw_call(target, data)  # No reentrancy guard

The Reentrancy Lock Bug (2023)

In July 2023, a compiler bug in Vyper versions 0.2.15, 0.2.16, and 0.3.0 broke the @nonreentrant decorator. The reentrancy lock was not properly enforced, leading to exploits on several Curve Finance pools and roughly $70M in losses.

This is a crucial lesson: language-level safety features are only as reliable as the compiler that implements them.
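Checking for the affected compiler versions is easy to automate. A minimal sketch, using only the three versions named above; the function name and the accepted version-string formats are assumptions.

```typescript
// Versions named above as shipping the broken @nonreentrant lock.
const AFFECTED_VYPER_VERSIONS = new Set(["0.2.15", "0.2.16", "0.3.0"]);

// Accepts strings such as "0.3.0", "v0.3.0" or "0.3.0+commit.<hash>".
function isAffectedByNonreentrantBug(compilerVersion: string): boolean {
  const match = compilerVersion.trim().match(/^v?(\d+\.\d+\.\d+)/);
  if (!match) throw new Error(`unrecognised version: ${compilerVersion}`);
  return AFFECTED_VYPER_VERSIONS.has(match[1]!);
}
```

An exact-match set is deliberate here: a range check such as "anything below 0.3.1" would wrongly flag 0.2.14 and earlier.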

Storage Collisions in Older Versions

Before Vyper 0.4.0, storage slot assignments could collide when using certain patterns with DynArray and mappings. The compiler has since fixed this, but contracts deployed with older versions remain vulnerable.

Default Visibility

In Vyper, functions without a decorator default to @internal. In Solidity, visibility has had to be declared explicitly since 0.5.0; before that, functions defaulted to public, a common footgun. However, Vyper's @external decorator is still easy to misapply:

# Vyper: accidentally exposing an admin function
@external
def set_fee(new_fee: uint256):
    # Forgot access control — anyone can call this
    self.fee = new_fee

The language does not enforce access control; that is still the developer's job.

The Same Vulnerability in Both Languages

Let's look at a classic reentrancy bug implemented in both languages.

Solidity:

contract VulnerableVault {
    mapping(address => uint256) public balances;

    function withdraw() external {
        uint256 amount = balances[msg.sender];
        (bool success, ) = msg.sender.call{value: amount}("");
        require(success);
        balances[msg.sender] = 0;  // State update AFTER external call
    }
}

Vyper:

balances: public(HashMap[address, uint256])

@external
def withdraw():
    amount: uint256 = self.balances[msg.sender]
    raw_call(msg.sender, b"", value=amount)
    self.balances[msg.sender] = 0  # Same bug: state update after call

Both are vulnerable to reentrancy. The fix is the same in both languages: update state before making external calls (checks-effects-interactions pattern), or use a reentrancy lock.
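The ordering bug and its fix can be replayed off-chain. The sketch below is a TypeScript model, not contract code: the `Vault` class, the callback standing in for the receiver's fallback function, and the deposit amounts are all illustrative.

```typescript
// In-memory model of the vault. `send` stands in for the external call and hands
// control to a callback owned by the receiver, as a fallback function would.
class Vault {
  balances = new Map<string, number>();
  reserves = 0;

  deposit(who: string, amount: number): void {
    this.balances.set(who, (this.balances.get(who) ?? 0) + amount);
    this.reserves += amount;
  }

  private send(amount: number, onReceive: () => void): void {
    if (amount > this.reserves) throw new Error("insufficient reserves");
    this.reserves -= amount;
    onReceive(); // control passes to the receiver here
  }

  // Interaction before effect: the bug shown above.
  withdrawVulnerable(who: string, onReceive: () => void): void {
    const amount = this.balances.get(who) ?? 0;
    this.send(amount, onReceive);
    this.balances.set(who, 0);
  }

  // Checks-effects-interactions: zero the balance before the external call.
  withdrawSafe(who: string, onReceive: () => void): void {
    const amount = this.balances.get(who) ?? 0;
    this.balances.set(who, 0);
    this.send(amount, onReceive);
  }
}

// The attacker deposits 1, then re-enters the withdraw method `reentries` times.
// Returns how much left the vault in total.
function drainedBy(method: "withdrawVulnerable" | "withdrawSafe", reentries: number): number {
  const vault = new Vault();
  vault.deposit("victim", 9);
  vault.deposit("attacker", 1);
  let remaining = reentries;
  const reenter = (): void => {
    if (remaining > 0) {
      remaining -= 1;
      vault[method]("attacker", reenter);
    }
  };
  vault[method]("attacker", reenter);
  return 10 - vault.reserves;
}

console.log(drainedBy("withdrawVulnerable", 3)); // 4: one deposit withdrawn four times
console.log(drainedBy("withdrawSafe", 3)); // 1: only the attacker's own deposit
```

In this model the only difference between the two methods is the position of one line, which is the checks-effects-interactions fix.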

What This Means for Auditors

When auditing Solidity, your checklist is longer. You need to check for delegatecall misuse, selfdestruct edge cases, tx.origin phishing, inline assembly correctness, inheritance conflicts, and unchecked arithmetic.

When auditing Vyper, the attack surface is smaller, but you need to verify the compiler version (especially for the reentrancy lock bug), check raw_call usage, and still look for access control issues and logic errors that no language can prevent.

In spectr-ai, we weight findings differently based on the source language. A delegatecall in Solidity triggers a high-severity check. In Vyper, that pattern does not exist, so the engine focuses on raw_call patterns and compiler-version-specific issues instead.

The Takeaway

Vyper is genuinely safer by default. If your contract does not need Solidity's advanced features (upgradeable proxies, complex inheritance hierarchies, inline assembly optimizations), Vyper reduces the surface area an attacker can probe.

But "safer by default" is not "safe." The Curve exploit proved that compiler bugs can undermine language-level guarantees. No matter the language, the fundamentals still apply: checks-effects-interactions, access control, input validation, and — ideally — a thorough audit by both AI and human reviewers.

The best security comes from using the right tool for the job and understanding the specific risks of whichever language you choose.

How I Structured a TypeScript Monorepo with pnpm Workspaces

Pavel Espitia — Thu, 30 Apr 2026 15:06:10 +0000

When spectr-ai started as a single package, everything lived in one directory: the CLI engine, the web frontend, shared types, and configuration. It worked fine until it did not. The engine needed its own publish cycle. The web app had different build tooling. Shared types were copy-pasted between files. It was time for a monorepo.

This is the practical guide I wish I had. No theory — just the steps, the config files, and the problems I hit along the way.

Why pnpm Workspaces

I chose pnpm over npm workspaces or Turborepo for three reasons: strict dependency isolation (packages cannot access undeclared dependencies), disk efficiency through content-addressable storage, and the workspace:* protocol that makes cross-package references explicit. Turborepo is great for build orchestration, but pnpm workspaces handle dependency management better out of the box.

The Target Structure

Here is what spectr-ai's monorepo looks like after the migration:

spectr-ai/
  pnpm-workspace.yaml
  package.json
  tsconfig.base.json
  packages/
    engine/
      package.json
      tsconfig.json
      src/
    shared/
      package.json
      tsconfig.json
      src/
  apps/
    web/
      package.json
      tsconfig.json
      src/

Three workspace packages: @spectr-ai/engine (the CLI and analysis core), @spectr-ai/shared (types, constants, utilities), and the web app under apps/web.

Step 1: pnpm-workspace.yaml

This file tells pnpm where your packages live. Create it at the repo root:

packages:
  - "packages/*"
  - "apps/*"

That is the entire file. Every directory matching these globs that contains a package.json becomes a workspace package.

Step 2: Root package.json

The root package.json is not a publishable package. It holds shared dev dependencies and workspace-level scripts.

{
  "name": "spectr-ai",
  "private": true,
  "scripts": {
    "build": "pnpm --filter './packages/**' build",
    "dev": "pnpm --filter @spectr-ai/web dev",
    "lint": "pnpm -r lint",
    "test": "pnpm -r test",
    "typecheck": "pnpm -r typecheck"
  },
  "devDependencies": {
    "typescript": "5.7.3"
  }
}

Key details: "private": true prevents accidental publishing of the root. The -r flag runs a script recursively across all workspace packages. The --filter flag targets specific packages.

Step 3: Package-Level package.json

Each package declares its own name, dependencies, and entry points. Here is the engine package:

{
  "name": "@spectr-ai/engine",
  "version": "0.3.0",
  "type": "module",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js"
    },
    "./analyzers": {
      "types": "./dist/analyzers/index.d.ts",
      "import": "./dist/analyzers/index.js"
    }
  },
  "scripts": {
    "build": "tsc --build",
    "lint": "oxlint src/",
    "test": "vitest run",
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
    "@spectr-ai/shared": "workspace:*"
  }
}

The workspace:* protocol is the critical piece. It tells pnpm to resolve @spectr-ai/shared from the local workspace instead of the registry. When you publish, pnpm automatically replaces workspace:* with the actual version number.
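The rewrite that happens at publish time can be sketched as a pure function. This models the substitution pnpm documents for the workspace: protocol; the function name and the 1.2.0 version in the usage note are illustrative, since the post does not state the version of the shared package.

```typescript
// Turn a workspace: range into the range that ends up in the published manifest.
function resolveWorkspaceRange(range: string, localVersion: string): string {
  if (!range.startsWith("workspace:")) return range; // ordinary registry range
  const spec = range.slice("workspace:".length);
  if (spec === "*") return localVersion; // exact version
  if (spec === "^" || spec === "~") return `${spec}${localVersion}`;
  return spec; // an explicit range such as "workspace:^1.2.0" keeps its range
}
```

With a local version of 1.2.0, "workspace:*" becomes "1.2.0" and "workspace:^" becomes "^1.2.0", so consumers installing from the registry never see the workspace: prefix.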

The exports field replaces the old main/types fields and gives you fine-grained control over what consumers can import. Without it, any file in dist/ is importable — which leaks internal implementation details.
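To see how the exports field blocks deep imports, here is a heavily simplified model of the lookup. It handles only exact subpath keys and a single condition; Node's real resolver also supports patterns, nested conditions and fallbacks. `resolveSubpath` and `ExportsMap` are illustrative names.

```typescript
type ExportsMap = Record<string, Record<string, string>>;

// The engine package's exports map from the manifest above.
const engineExports: ExportsMap = {
  ".": { types: "./dist/index.d.ts", import: "./dist/index.js" },
  "./analyzers": {
    types: "./dist/analyzers/index.d.ts",
    import: "./dist/analyzers/index.js",
  },
};

// Map an import specifier to a file inside the package, or fail if not exported.
function resolveSubpath(
  exportsMap: ExportsMap,
  packageName: string,
  specifier: string,
  condition: string,
): string {
  const subpath = specifier === packageName ? "." : `.${specifier.slice(packageName.length)}`;
  const target = exportsMap[subpath]?.[condition];
  if (!target) throw new Error(`subpath '${subpath}' is not exported by ${packageName}`);
  return target;
}
```

Resolving "@spectr-ai/engine/analyzers" succeeds, while a deep import such as "@spectr-ai/engine/dist/internal.js" (a hypothetical path) throws, which is the encapsulation the exports field buys.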

Step 4: TypeScript Configuration

A base tsconfig.base.json at the root defines shared compiler options:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "declaration": true,
    "declarationMap": true,
    "sourceMap": true,
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "exactOptionalPropertyTypes": true,
    "noImplicitOverride": true,
    "verbatimModuleSyntax": true,
    "isolatedModules": true,
    "skipLibCheck": true
  }
}

Each package extends it and adds its own paths:

{
  "extends": "../../tsconfig.base.json",
  "compilerOptions": {
    "rootDir": "src",
    "outDir": "dist",
    "composite": true
  },
  "include": ["src"],
  "references": [
    { "path": "../shared" }
  ]
}

The composite: true and references fields enable TypeScript's project references. This gives you incremental builds: changing a file in shared only rebuilds shared and its dependents, not the entire repo.
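The ordering that project references imply is a topological sort of the dependency graph. A minimal sketch of that idea follows; it is not how tsc is implemented. The engine-to-shared edge comes from the config above, while the web app's dependencies are an assumption.

```typescript
// Package name -> workspace packages it references.
// The entries for @spectr-ai/web are assumed for illustration.
const workspaceReferences: Record<string, string[]> = {
  "@spectr-ai/shared": [],
  "@spectr-ai/engine": ["@spectr-ai/shared"],
  "@spectr-ai/web": ["@spectr-ai/shared", "@spectr-ai/engine"],
};

// Depth-first topological sort: every package appears after its dependencies.
function buildOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();
  const visit = (name: string): void => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") throw new Error(`dependency cycle at ${name}`);
    state.set(name, "visiting");
    for (const dep of deps[name] ?? []) visit(dep);
    state.set(name, "done");
    order.push(name);
  };
  for (const name of Object.keys(deps).sort()) visit(name);
  return order;
}

console.log(buildOrder(workspaceReferences));
// [ '@spectr-ai/shared', '@spectr-ai/engine', '@spectr-ai/web' ]
```

The cycle check matters in practice: project references must form an acyclic graph, so a cycle should surface as an error rather than an arbitrary order.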

Step 5: Cross-Package Imports

With the setup above, importing from a sibling package looks like importing from any npm package:

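// Assumes AuditResult is a type-only export: verbatimModuleSyntax requires the `type` modifier for those.
import { type AuditResult, Severity } from "@spectr-ai/shared";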
import { analyzeContract } from "@spectr-ai/engine/analyzers";

No relative paths crossing package boundaries. No path aliases that break at runtime. The exports field in each package's package.json controls what is importable.

Step 6: CI with --filter

In CI, you do not want to rebuild and test everything when only one package changed. pnpm's --filter flag handles this:

jobs:
  test-engine:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
        with:
          persist-credentials: false
      - uses: pnpm/action-setup@a7487c7e89a18df4991f7f222e4898a00d66ddda  # v4.1.0
      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4.4.0
        with:
          node-version: 22
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm --filter @spectr-ai/engine build
      - run: pnpm --filter @spectr-ai/engine test
      - run: pnpm --filter @spectr-ai/engine typecheck

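On its own, --filter selects only the named package. To include its workspace dependencies, append an ellipsis: pnpm --filter "@spectr-ai/engine..." build runs the build script in shared first, then in engine. In this setup tsc --build also follows the project references, so shared gets compiled either way.

For change-based filtering, --filter "...[origin/main]" selects the packages that changed relative to origin/main, plus everything that depends on them: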

pnpm --filter '...[origin/main]' test
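The selection behind that filter can be sketched as "changed packages plus their transitive dependents". This is a model of the idea, not pnpm's implementation, and the web app's dependencies in the map are an assumption.

```typescript
// Package name -> workspace packages it depends on.
// The entries for @spectr-ai/web are assumed for illustration.
const dependsOn: Record<string, string[]> = {
  "@spectr-ai/shared": [],
  "@spectr-ai/engine": ["@spectr-ai/shared"],
  "@spectr-ai/web": ["@spectr-ai/shared", "@spectr-ai/engine"],
};

// Start from the changed packages and keep adding anything that depends on
// an already-affected package until the set stops growing.
function affectedPackages(changed: string[], deps: Record<string, string[]>): string[] {
  const affected = new Set(changed);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [name, requires] of Object.entries(deps)) {
      if (!affected.has(name) && requires.some((d) => affected.has(d))) {
        affected.add(name);
        grew = true;
      }
    }
  }
  return [...affected].sort();
}
```

A change in shared marks all three packages as affected, while a change in web marks only web, which is where the CI time savings come from.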

Problems I Hit

Problem 1: Phantom dependencies. A package imported zod without declaring it in its own package.json. It worked locally because another package had zod installed. pnpm's strict mode caught this immediately — npm workspaces would not have.

Problem 2: Build order. Running pnpm -r build without project references built packages in alphabetical order. The web app tried to build before shared was compiled. Adding references to each tsconfig.json and using tsc --build fixed the ordering.

Problem 3: Type resolution during development. Before building, TypeScript could not resolve types from sibling packages because the dist/ directories did not exist yet. The fix was adding declarationMap: true to the base config and ensuring the references array was correct. With project references, tsc resolves types from source during development.

Problem 4: IDE performance. VS Code's TypeScript server struggled with three projects. The solution was adding a root tsconfig.json with only references (no include), which tells the language server about the project structure:

{
  "references": [
    { "path": "packages/engine" },
    { "path": "packages/shared" },
    { "path": "apps/web" }
  ],
  "files": []
}
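The phantom-dependency check from Problem 1 is also easy to approximate in a script. The sketch below uses a regular expression over import statements, which is an assumption-laden shortcut: it misses dynamic import() and require(), and a real tool would parse the file. Both function names are hypothetical.

```typescript
// Package names imported by a source file. Relative paths and node: built-ins
// are skipped; "@scope/pkg/sub" is reduced to "@scope/pkg".
function importedPackages(source: string): string[] {
  const names = new Set<string>();
  const pattern = /from\s+["']([^"']+)["']/g;
  for (const match of source.matchAll(pattern)) {
    const specifier = match[1]!;
    if (specifier.startsWith(".") || specifier.startsWith("node:")) continue;
    const parts = specifier.split("/");
    names.add(specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0]!);
  }
  return [...names];
}

// Imports that the package's own package.json does not declare.
function phantomDependencies(source: string, declared: string[]): string[] {
  return importedPackages(source).filter((name) => !declared.includes(name));
}
```

Run against a file that imports zod in a package declaring only @spectr-ai/shared, it returns ["zod"], the same mistake pnpm's strict layout turns into a resolution error.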

Was It Worth It

Yes. Dependency boundaries are enforced by tooling instead of convention. Each package has its own test suite, lint config, and build step. The engine can be published to npm independently. CI runs are faster because only changed packages are tested.

The migration took about a day. Most of that time was moving files and fixing import paths. The configuration itself — once you know what each field does — is maybe 30 minutes of work.

If your project has two or more distinct concerns sharing code, a pnpm monorepo is worth the setup cost. Start with the structure above and adjust as your project grows.

AI Won't Replace Smart Contract Auditors — But Auditors Using AI Will Replace Those Who Don't

Pavel Espitia — Thu, 30 Apr 2026 15:06:08 +0000

Every few months, someone on Twitter declares that AI will make smart contract auditors obsolete. I have been building spectr-ai — an AI-powered smart contract analysis tool — for the past several months, and I can tell you definitively: that take is wrong. But so is the opposite claim that AI is useless for security work.

The truth is more nuanced, more interesting, and has real implications for anyone building or auditing smart contracts.

What AI Actually Does Well

AI excels at pattern recognition at scale. Feed a language model a Solidity contract, and it will reliably catch:

Known vulnerability patterns. Reentrancy, unchecked return values, tx.origin authentication, uninitialized storage pointers, integer overflow in pre-0.8.0 contracts — these are well-documented patterns that appear in training data thousands of times. AI catches them fast and consistently.

Style and hygiene issues. Missing events on state changes, functions that should be view/pure but are not, overly permissive visibility, missing zero-address checks. These are not exploits, but they indicate sloppy code that often harbors deeper issues.

Deviation from known-safe patterns. When a contract implements a DEX but deviates from the Uniswap V2/V3 pattern in a specific function, AI can flag the deviation. It may not know why the deviation is dangerous, but it knows it is unusual.

Speed. An AI can analyze a 1,000-line contract in seconds. A human auditor spends hours or days. For a first pass, that speed difference is transformative.

When I run spectr-ai against contracts with known vulnerabilities, it catches the obvious stuff with high reliability. Reentrancy in a withdraw function? Flagged immediately. Missing access control on an admin function? Caught every time. These are the bread-and-butter findings that make up the majority of audit reports.

What AI Cannot Do

Here is where the hype falls apart.

Novel attack vectors. The most devastating hacks in DeFi history exploited logic that no one had seen before. The Euler Finance donation attack, the Mango Markets oracle manipulation, the Cream Finance flash loan chain — these required creative reasoning about how multiple systems interact under adversarial conditions. AI cannot reason about attacks that do not exist in its training data.

Cross-contract economic modeling. DeFi protocols are composable. A lending protocol interacts with an AMM, which interacts with an oracle, which interacts with a bridge. Understanding how a price manipulation in one protocol cascades through this stack requires modeling economic incentives, game theory, and multi-step attack paths. Current AI models can follow these chains if you lay them out, but they cannot discover them independently.

Business logic validation. A contract might be technically secure — no reentrancy, no overflow, proper access control — but implement the wrong business logic. If a governance contract lets a proposal execute immediately instead of after a timelock, AI might not flag it unless you tell it the intended behavior. AI does not know what the contract is supposed to do; it only knows what the code actually does.

Subtle storage layout issues. Upgradeable proxy contracts have strict requirements about storage slot ordering across implementations. AI can check basic rules, but complex storage layouts with inherited contracts and gap variables require careful manual analysis.

Context about deployment. A contract might be safe in isolation but dangerous in the context of its deployment. Who are the privileged roles? What is the expected call flow? Which external contracts will it interact with? AI does not have this context unless you provide it.

The Economics Are Reshaping the Market

Here is what matters most: the cost structure of security is changing.

A traditional smart contract audit from a top firm costs $50,000 to $200,000 and takes 2-4 weeks. This means only well-funded projects get audited. The long tail of smaller contracts — the ones deployed by indie developers, small DAOs, and experimental protocols — ship unaudited because the economics do not work.

An AI-powered first pass costs essentially nothing. Running a contract through spectr-ai or similar tools takes seconds and catches the most common issues. This does not replace a professional audit, but it does catch the low-hanging fruit that accounts for a large percentage of real exploits.

The market is not shrinking. It is widening. Projects that could never afford an audit now have access to automated analysis. Projects that can afford an audit get a faster, more thorough review because the auditor spends less time on obvious issues and more time on complex logic.

The Hybrid Model

The winning approach — and what spectr-ai is building toward — is a hybrid pipeline:

Layer 1: Automated static analysis. Traditional tools like Slither and Mythril catch known patterns through static analysis and symbolic execution. They are deterministic and fast.

Layer 2: AI-powered analysis. LLMs analyze the code with broader context, catching patterns that static analysis misses and providing natural-language explanations of findings. This is where spectr-ai operates.

Layer 3: Human expert review. Auditors review the AI's findings, investigate flagged areas in depth, and focus their time on business logic, economic modeling, and novel attack surfaces.

Each layer filters noise for the next. By the time a human auditor sits down, the obvious issues are already documented, and they can focus on the work that actually requires human judgment.
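One way to picture the hand-off between layers is a merge step that deduplicates findings and orders them for the human reviewer. This is a hypothetical sketch, not spectr-ai's actual implementation: the `Finding` shape, the rule names and the "static result wins" tie-break are all assumptions.

```typescript
type Severity = "high" | "medium" | "low";

interface Finding {
  rule: string;
  line: number;
  severity: Severity;
  source: "static" | "ai";
}

// Merge layer 1 and layer 2 output: drop duplicates (same rule on the same line,
// keeping the static-analysis result) and sort the rest by severity.
function triage(staticFindings: Finding[], aiFindings: Finding[]): Finding[] {
  const byKey = new Map<string, Finding>();
  for (const finding of [...staticFindings, ...aiFindings]) {
    const key = `${finding.rule}:${finding.line}`;
    if (!byKey.has(key)) byKey.set(key, finding);
  }
  const rank: Record<Severity, number> = { high: 0, medium: 1, low: 2 };
  return [...byKey.values()].sort((a, b) => rank[a.severity] - rank[b.severity]);
}
```

The reviewer in layer 3 then starts from one ordered list instead of two overlapping reports.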

What This Means for Auditors

If you are a smart contract auditor, AI is not your replacement. It is your leverage.

The auditors who will thrive are the ones who use AI to handle the repetitive work and redirect their attention to higher-value analysis. An auditor who reviews AI findings and spends their time on economic attack modeling, cross-protocol interaction analysis, and business logic validation will deliver more value in less time than one who manually checks for reentrancy.

The auditors who will struggle are the ones whose primary skill is recognizing known vulnerability patterns. That skill is being commoditized. If your audit reports mostly contain findings that Slither or an LLM could have caught, the market will adjust your pricing accordingly.

What This Means for Developers

If you are deploying smart contracts, run automated tools before hiring an auditor. This is not optional anymore — it is table stakes. Use Slither for static analysis. Use an AI tool for a broader review. Fix everything they find.

Then, if your contract handles significant value or has complex logic, hire a human auditor. They will be more effective because they are not wasting time on issues you could have caught yourself.

The security gap in Web3 is not going to be closed by AI alone or by humans alone. It is going to be closed by making basic security analysis accessible to every project and reserving expert human attention for the contracts that need it most.

That is what spectr-ai is building toward. Not a replacement for auditors, but a tool that makes the entire ecosystem more secure by meeting developers where they are.

I Analyzed 5 Famous Hacked Contracts with AI — Here's What It Found

Pavel Espitia — Mon, 27 Apr 2026 12:29:11 +0000

I fed the vulnerable code patterns from five of the most devastating DeFi hacks into spectr-ai to see what an AI auditor would catch — and what it would miss. The results were both encouraging and humbling.

For each hack, I reconstructed the vulnerable code pattern (simplified for clarity), ran it through the AI analysis pipeline, and recorded the findings. No cherry-picking. Here is what happened.

1. The DAO — Reentrancy ($60M, June 2016)

What happened: The DAO's splitDAO function sent ETH to users before updating their balance. An attacker called the function recursively through a fallback function, draining funds repeatedly before the balance was set to zero.

The vulnerable pattern:

function withdraw(uint amount) public {
    require(balances[msg.sender] >= amount);

    // ETH sent before state update
    (bool success, ) = msg.sender.call{value: amount}("");
    require(success);

    // State updated after external call
    balances[msg.sender] -= amount;
}

What the AI found: Flagged immediately. High severity. The finding identified the external call before state update, correctly described the reentrancy attack vector, and recommended the checks-effects-interactions pattern. It also suggested adding a reentrancy guard modifier.

Verdict: Caught. This is the canonical example of a known vulnerability pattern. Any tool worth its salt catches this one.

2. Parity Multisig — delegatecall + selfdestruct ($280M frozen, November 2017)

What happened: Parity's multisig wallet used a library contract via delegatecall. The library contract had an initWallet function that was left unprotected after deployment. An attacker called initWallet on the library itself, became its owner, then called kill() which executed selfdestruct. Since every wallet delegated to this library, all of them became nonfunctional — $280M in ETH was permanently frozen.

The vulnerable pattern:

contract WalletLibrary {
    address public owner;
    bool public initialized;

    function initWallet(address _owner) public {
        // This guard only blocks re-initialization. The library itself was never
        // initialized after deployment, so anyone could make the first call.
        require(!initialized);
        owner = _owner;
        initialized = true;
    }

    function kill(address to) public {
        require(msg.sender == owner);
        selfdestruct(payable(to));
    }
}

contract Wallet {
    // `library` is a reserved keyword in Solidity, so the field needs another name
    address public walletLibrary;

    fallback() external payable {
        (bool success, ) = walletLibrary.delegatecall(msg.data);
        require(success);
    }
}

What the AI found: It flagged two issues. First, the selfdestruct usage was flagged as high severity with a note about permanent contract removal. Second, the open delegatecall in the fallback was flagged as a proxy pattern requiring careful access control review. However, it did not connect the dots — it did not identify that the library contract itself could be initialized by anyone because it was deployed as a standalone contract with no constructor protection.

Verdict: Partially caught. The individual dangerous primitives were flagged, but the compound attack — that the library was a standalone contract whose initialization could be hijacked — required understanding the deployment context that the AI did not have.
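The deployment-context problem is easier to see in a toy model. The sketch below is TypeScript, not Solidity, and it only approximates delegatecall as "run library logic against the caller's state". `WalletState`, `own`, `destroyed` and `delegate` are hypothetical names, and only the direct-call case of `selfdestruct` is modeled.

```typescript
// Toy model: delegatecall means "run library code against the caller's state".
interface WalletState { owner: string | null; initialized: boolean }

class WalletLibrary {
  // The library's own state. Nobody initialized it after deployment.
  own: WalletState = { owner: null, initialized: false };
  destroyed = false;

  initWallet(ctx: WalletState, newOwner: string): void {
    if (ctx.initialized) throw new Error("already initialized");
    ctx.owner = newOwner;
    ctx.initialized = true;
  }

  kill(ctx: WalletState, caller: string): void {
    if (ctx.owner !== caller) throw new Error("not owner");
    // Only the direct-call case is modeled: selfdestruct removes the library's code.
    if (ctx === this.own) this.destroyed = true;
  }
}

class Wallet {
  state: WalletState = { owner: null, initialized: false };
  lib: WalletLibrary;

  constructor(lib: WalletLibrary) {
    this.lib = lib;
  }

  // Stand-in for fallback + delegatecall: library logic, wallet state.
  delegate(fn: (lib: WalletLibrary, ctx: WalletState) => void): void {
    if (this.lib.destroyed) throw new Error("library code is gone");
    fn(this.lib, this.state);
  }
}

const lib = new WalletLibrary();
const wallet = new Wallet(lib);
wallet.delegate((l, ctx) => l.initWallet(ctx, "alice")); // normal wallet setup

// The attacker calls the library directly, so ctx is the library's own state.
lib.initWallet(lib.own, "attacker");
lib.kill(lib.own, "attacker");
```

After the last two lines, the wallet still records "alice" as its owner, but every later `wallet.delegate(...)` call throws because the shared code is gone. Nothing in either function looks wrong in isolation, which is roughly why a per-function review flags the primitives and misses the attack.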

3. Ronin Bridge — Compromised Validators ($625M, March 2022)

What happened: The Ronin bridge required 5 of 9 validator signatures to approve withdrawals. The attacker compromised 4 validator private keys belonging to Sky Mavis and one third-party validator (Axie DAO). With 5 signatures, they approved fraudulent withdrawals of 173,600 ETH and 25.5M USDC.

The vulnerable pattern:

function withdrawERC20(
    uint256 id,
    address token,
    uint256 amount,
    address recipient,
    bytes[] calldata signatures
) external {
    require(signatures.length >= threshold, "Not enough sigs");

    bytes32 hash = keccak256(
        abi.encodePacked(id, token, amount, recipient)
    );

    uint256 validSigs = 0;
    for (uint256 i = 0; i < signatures.length; i++) {
        address signer = ECDSA.recover(hash, signatures[i]);
        if (isValidator[signer]) {
            validSigs++;
        }
    }

    require(validSigs >= threshold, "Invalid signatures");
    IERC20(token).transfer(recipient, amount);
}

What the AI found: It flagged a missing duplicate-signer check (the same validator signature could potentially be submitted multiple times depending on the implementation). It also noted that the threshold of 5/9 was relatively low for a bridge holding hundreds of millions. But fundamentally, the code logic was correct — the vulnerability was operational, not in the smart contract.

Verdict: Missed (correctly). This was not a code vulnerability. It was a key management failure. No static analysis or AI review of the contract source code could have caught this. The lesson here is that smart contract security is necessary but not sufficient — operational security matters just as much.
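The duplicate-signer finding is a real code-level issue even though it was not what the attacker used. A minimal sketch of the difference, in TypeScript with plain strings standing in for recovered addresses (`countUniqueValidators` and `naiveCount` are hypothetical helpers, and the validator set is assumed to hold lowercase addresses):

```typescript
// Mirrors the loop in the contract: every valid signature increments the count.
function naiveCount(
  recoveredSigners: string[],
  validators: ReadonlySet<string>
): number {
  return recoveredSigners.filter((s) => validators.has(s.toLowerCase())).length;
}

// Counts each validator at most once, however many signatures it supplied.
function countUniqueValidators(
  recoveredSigners: string[],
  validators: ReadonlySet<string>
): number {
  const seen = new Set<string>();
  for (const signer of recoveredSigners) {
    const key = signer.toLowerCase(); // assumption: the set stores lowercase addresses
    if (validators.has(key)) seen.add(key);
  }
  return seen.size;
}

const validatorSet = new Set(["0xaa", "0xbb", "0xcc"]);
const submitted = ["0xaa", "0xAA", "0xaa", "0xbb", "0xzz"];

const naive = naiveCount(submitted, validatorSet);             // 4
const unique = countUniqueValidators(submitted, validatorSet); // 2
```

With a threshold of 3, the naive count would approve this withdrawal on the strength of two validators. The unique count would not.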

4. Cream Finance — Flash Loan + Oracle Manipulation ($130M, October 2021)

What happened: The attacker used a flash loan to manipulate the price of crYUSD (Cream's yUSD lending token), then used the inflated collateral value to borrow all available assets across Cream's lending markets. The attack exploited how Cream calculated the value of crYUSD as collateral — it relied on the token's exchange rate, which could be manipulated through large deposits.

The vulnerable pattern (simplified):

function getCollateralValue(
    address token,
    uint256 amount
) public view returns (uint256) {
    // Exchange rate can be manipulated via flash loan
    uint256 exchangeRate = ICToken(token).exchangeRateStored();
    uint256 underlyingAmount = amount * exchangeRate / 1e18;
    uint256 price = oracle.getPrice(token);
    return underlyingAmount * price / 1e18;
}

function borrow(
    address collateralToken,
    uint256 collateralAmount,
    address borrowToken,
    uint256 borrowAmount
) external {
    uint256 collateralValue = getCollateralValue(
        collateralToken, collateralAmount
    );
    uint256 borrowValue = borrowAmount
        * oracle.getPrice(borrowToken) / 1e18;
    require(
        collateralValue >= borrowValue * collateralFactor / 1e18
    );
    // ... execute borrow
}

What the AI found: It flagged the use of exchangeRateStored() instead of exchangeRateCurrent() as a potential stale-data issue. It also noted that the collateral valuation was susceptible to price manipulation if the underlying exchange rate could be moved within a single transaction. The flash loan attack vector was mentioned as a possibility.

Verdict: Partially caught. The AI identified the right area of concern — manipulable exchange rates used for collateral valuation — but did not construct the full multi-step attack path involving flash loans, cross-market borrowing, and the specific economic conditions needed for profitability.
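A small worked example shows why the manipulable exchange rate matters. This is a sketch in TypeScript using plain numbers in human units instead of 18-decimal fixed point, and every figure in it is made up for illustration. It is not Cream's actual math or the attacker's actual numbers.

```typescript
// Same shape as getCollateralValue, without the 1e18 scaling.
function collateralValue(
  amount: number,
  exchangeRate: number,
  price: number
): number {
  return amount * exchangeRate * price;
}

// Same shape as the require() in borrow(): collateral must cover the
// borrow value multiplied by the collateral factor.
function canBorrow(
  collateral: number,
  borrowValue: number,
  collateralFactor: number
): boolean {
  return collateral >= borrowValue * collateralFactor;
}

const honest = collateralValue(1000, 1.0, 1);   // 1000
const inflated = collateralValue(1000, 2.0, 1); // 2000, same tokens, manipulated rate

const beforeManipulation = canBorrow(honest, 1200, 1.5);  // false: needs 1800 of collateral
const afterManipulation = canBorrow(inflated, 1200, 1.5); // true: 2000 covers 1800
```

The attacker's token count never changes. Only the exchange rate moves, and because it is read in the same transaction that moved it, the borrow check passes against a value that exists for one block.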

5. Euler Finance — Donation Attack ($197M, March 2023)

What happened: The attacker exploited Euler's donateToReserves function, which allowed users to inflate their debt without a corresponding health check. By donating to reserves, the attacker made their own position liquidatable, then used a liquidation mechanism that was more favorable than it should have been given the manipulated state. The interaction between donateToReserves, the health check bypass, and the liquidation bonus created an extraction path.

The vulnerable pattern (simplified):

function donateToReserves(
    address subAccount,
    uint256 amount
) external {
    // Increases the donor's debt token balance
    // WITHOUT checking if the position remains healthy
    debtBalances[subAccount] += amount;
    reserveBalance += amount;
    // Missing: health check after debt increase
}

What the AI found: It flagged the missing health check after the debt increase. The finding noted that any function that modifies a user's debt-to-collateral ratio should verify the position remains solvent afterward. This was rated high severity.

However, the AI did not identify the full exploit chain — how the donation attack combined with the liquidation discount to create a profitable extraction. It caught the entry point but not the economic reasoning.

Verdict: Partially caught. The root cause (missing health check) was identified. The complete attack economics were not.
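The missing check itself is small. Here is a hedged sketch of what "verify the position remains solvent afterward" could look like, following the simplified model above where a donation increases debt. It is TypeScript with hypothetical names (`LoanPosition`, `isHealthy`, `minCollateralRatio`), not Euler's code.

```typescript
interface LoanPosition {
  collateralValue: number;
  debtValue: number;
}

// A position is healthy when collateral covers debt times the required ratio.
function isHealthy(p: LoanPosition, minCollateralRatio: number): boolean {
  return p.collateralValue >= p.debtValue * minCollateralRatio;
}

// Donation that refuses to leave the donor's position unhealthy.
function donateToReserves(
  p: LoanPosition,
  amount: number,
  minCollateralRatio: number
): LoanPosition {
  const next: LoanPosition = {
    collateralValue: p.collateralValue,
    debtValue: p.debtValue + amount,
  };
  if (!isHealthy(next, minCollateralRatio)) {
    throw new Error("position would become unhealthy");
  }
  return next;
}
```

With collateral of 150, debt of 50 and a ratio of 1.5, a donation of 40 passes (debt 90 needs 135) and a donation of 60 is rejected (debt 110 needs 165). The check closes the entry point, but it says nothing about whether the liquidation discount is priced sensibly, which is the part the AI did not reason about.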

The Scorecard

Hack	Root Cause	AI Caught It?
The DAO	Reentrancy	Yes
Parity Multisig	Unprotected init + selfdestruct	Partial
Ronin Bridge	Key compromise	No (not a code bug)
Cream Finance	Oracle manipulation	Partial
Euler Finance	Missing health check	Partial

Full catches: 1/5. Partial catches: 3/5. Misses: 1/5.

What I Learned

The AI reliably catches known vulnerability patterns — reentrancy, missing access control, dangerous opcodes. That first finding from The DAO analysis would have saved $60M in 2016. That is not nothing.

But the most devastating modern hacks exploit economic logic, cross-protocol interactions, and deployment context. AI flags the ingredients (a manipulable exchange rate, a missing health check) without assembling them into the full recipe.

This confirms the hybrid model. AI as the first pass catches the known patterns quickly and cheaply. Human auditors then focus their expensive time on the economic modeling and novel attack surfaces that AI cannot reason about.

The goal of spectr-ai is not to produce a final audit report. It is to give the human auditor a head start — flagging the obvious issues so they can spend their time on the hard problems. Based on these results, that approach is working, but the gap between "flagging ingredients" and "identifying complete attack chains" remains wide.

That gap is where human expertise lives. And for now, it is not going anywhere.

Building a Chat Interface Over Any API with TypeScript

Pavel Espitia — Mon, 27 Apr 2026 12:29:10 +0000

Most AI chat interfaces do the same thing: the user types a message, the LLM generates text, and it appears on screen. But the interesting pattern is when the LLM does something — calls an API, queries a database, runs a command — and then explains what happened.

I built this pattern into AbiLens, a tool that lets you chat with any EVM smart contract. But the architecture generalizes to any external API. You can use the same approach to build a chat interface over a REST API, a database, a CLI tool, or any service with a programmatic interface.

Here's how it works.

The Architecture

The flow has four steps:

User sends a message
LLM decides what API call to make (or responds directly if no call is needed)
Your code executes the API call
LLM explains the result to the user

The key insight: the LLM never calls the API directly. It outputs a structured JSON object describing the call it wants to make. Your code validates and executes it. This keeps the LLM in a sandbox — it can only do what you allow.

User Message
    ↓
System Prompt (with available functions)
    ↓
LLM Response (JSON function call OR plain text)
    ↓
Your Code (validates, executes the call)
    ↓
API Result
    ↓
LLM Explanation (human-readable summary)

The System Prompt Template

The system prompt is where you define what the LLM can do. You list every available function with its name, description, and parameters.

function buildSystemPrompt(
  functions: FunctionDefinition[]
): string {
  const functionList = functions
    .map((fn) => {
      const params = fn.parameters
        .map((p) => `  - ${p.name}: ${p.type} — ${p.description}`)
        .join("\n");
      return `### ${fn.name}\n${fn.description}\n${params}`;
    })
    .join("\n\n");

  return `You are an assistant that helps users interact with an API.

When the user asks a question that requires data, respond with a JSON function call:
\`\`\`json
{ "function": "functionName", "args": { "key": "value" } }
\`\`\`

When you can answer directly without data, respond in plain text.

Available functions:
${functionList}

Always explain the results in plain language after receiving them.`;
}

For AbiLens, the functions are dynamically generated from the contract's ABI. For a REST API, you'd define them from your OpenAPI spec. For a database, they'd map to common queries.

interface FunctionDefinition {
  name: string;
  description: string;
  parameters: ParameterDefinition[];
}

interface ParameterDefinition {
  name: string;
  type: string;
  description: string;
  required: boolean;
}

Extracting Function Calls from the LLM Response

The LLM's response is either plain text or contains a JSON function call. You need to detect which one it is and extract the structured data.

import { z } from "zod";

const FunctionCallSchema = z.object({
  function: z.string(),
  // Two-argument form (key schema, value schema) works in both Zod 3 and Zod 4
  args: z.record(z.string(), z.unknown()),
});

type FunctionCall = z.infer<typeof FunctionCallSchema>;

function extractFunctionCall(
  response: string
): FunctionCall | null {
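  // Match a fenced json block and capture its body
  const jsonMatch = response.match(/```json\s*([\s\S]*?)```/);
  if (!jsonMatch?.[1]) return null;

  // JSON.parse throws on malformed input, so guard it
  let raw: unknown;
  try {
    raw = JSON.parse(jsonMatch[1].trim());
  } catch {
    return null;
  }

  const parsed = FunctionCallSchema.safeParse(raw);
  if (!parsed.success) return null;

  return parsed.data;
}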

Zod validation here is not optional. LLMs produce malformed JSON, hallucinate function names, and invent parameters. Parse and validate before you execute anything.
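The schema above only checks the shape of the call. Catching hallucinated function names and invented parameters needs a second pass against your function definitions. A minimal sketch of that pass, with no dependencies: `findCallProblems` is a hypothetical helper, and the interfaces are restated here only so the block stands alone.

```typescript
interface ParameterDefinition {
  name: string;
  type: string;
  description: string;
  required: boolean;
}

interface FunctionDefinition {
  name: string;
  description: string;
  parameters: ParameterDefinition[];
}

interface FunctionCall {
  function: string;
  args: Record<string, unknown>;
}

// Returns a list of human-readable problems; an empty list means the call is acceptable.
function findCallProblems(
  call: FunctionCall,
  definitions: FunctionDefinition[]
): string[] {
  const def = definitions.find((d) => d.name === call.function);
  if (!def) return [`unknown function: ${call.function}`];

  const problems: string[] = [];
  const known = new Set(def.parameters.map((p) => p.name));

  for (const p of def.parameters) {
    if (p.required && !(p.name in call.args)) {
      problems.push(`missing argument: ${p.name}`);
    }
  }
  for (const key of Object.keys(call.args)) {
    if (!known.has(key)) problems.push(`unexpected argument: ${key}`);
  }
  return problems;
}

const definitions: FunctionDefinition[] = [
  {
    name: "getUser",
    description: "Fetch a user by ID",
    parameters: [
      { name: "id", type: "string", description: "User ID", required: true },
    ],
  },
];
```

The problem strings are written so they can be fed straight back to the LLM as an error message. This sketch does not check argument types; that still belongs in each handler.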

Executing the Calls

Map function names to actual implementations. Each handler receives validated arguments and returns a result.

type FunctionHandler = (
  args: Record<string, unknown>
) => Promise<unknown>;

class FunctionRouter {
  private handlers = new Map<string, FunctionHandler>();

  register(
    name: string,
    handler: FunctionHandler
  ): void {
    this.handlers.set(name, handler);
  }

  async execute(call: FunctionCall): Promise<unknown> {
    const handler = this.handlers.get(call.function);
    if (!handler) {
      throw new Error(
        `Unknown function: ${call.function}`
      );
    }
    return handler(call.args);
  }
}

For a REST API wrapper, registration looks like this:

const router = new FunctionRouter();

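router.register("getUser", async (args) => {
  const id = z.string().parse(args.id);
  // Encode the ID so a value like "../admin" cannot change the request path,
  // and resolve against baseUrl because Node's fetch rejects relative URLs
  const url = new URL(`/api/users/${encodeURIComponent(id)}`, baseUrl);
  const response = await fetch(url);
  return response.json();
});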

router.register("listOrders", async (args) => {
  const status = z.string().optional().parse(args.status);
  const url = new URL("/api/orders", baseUrl);
  if (status) url.searchParams.set("status", status);
  const response = await fetch(url);
  return response.json();
});

Feeding Results Back

After executing the function, send the result back to the LLM for explanation. The conversation history now includes the user's question, the LLM's function call, and the raw result.

async function handleMessage(
  userMessage: string,
  history: Message[],
  router: FunctionRouter,
  llm: LLMClient
): Promise<string> {
  history.push({ role: "user", content: userMessage });

  const response = await llm.chat(history);
  const functionCall = extractFunctionCall(response);

  if (!functionCall) {
    history.push({ role: "assistant", content: response });
    return response;
  }

  const result = await router.execute(functionCall);
  const resultText = JSON.stringify(result, null, 2);

  history.push({ role: "assistant", content: response });
  history.push({
    role: "user",
    content: `Function result:\n${resultText}\n\nExplain this result to the user.`,
  });

  const explanation = await llm.chat(history);
  history.push({
    role: "assistant",
    content: explanation,
  });
  return explanation;
}

The second LLM call is where the value lives. Raw API responses are JSON blobs. The LLM transforms them into answers: "The user has 3 pending orders totaling $142.50, the most recent one placed yesterday."

Error Handling

Things go wrong. The API returns 500. The LLM hallucinates a function that doesn't exist. The arguments are the wrong type. Handle all of these gracefully by feeding the error back to the LLM.

try {
  const result = await router.execute(functionCall);
  // ... feed result back
} catch (error) {
  const errorMessage =
    error instanceof Error
      ? error.message
      : "Unknown error";

  history.push({
    role: "user",
    content: `The function call failed: ${errorMessage}. Let the user know and suggest alternatives.`,
  });

  return llm.chat(history);
}

This creates a self-correcting loop. The LLM sees the error, explains what went wrong, and often suggests a different approach.

Where This Pattern Works

This same architecture applies beyond smart contracts:

Database explorer: Define functions for common queries (getTableSchema, runQuery, listTables). The LLM translates natural language into SQL and explains the results.
DevOps assistant: Functions for getDeployStatus, listPods, getLogsTail. Chat with your infrastructure.
API documentation: Point it at any REST API and let users explore endpoints conversationally.
CLI wrapper: Functions map to CLI commands. The LLM picks the right flags and explains the output.

The pattern always looks the same: define available functions, let the LLM choose which to call, execute in a sandbox, explain the results.

Practical Tips

Keep the function list short. LLM accuracy tends to degrade once the list grows past roughly 15-20 functions. Group related operations or use a two-step approach where the LLM first picks a category, then a specific function.

Include examples in function descriptions. "Returns the user's order history. Example: { 'userId': '123', 'limit': 10 }" helps the LLM format arguments correctly.

Log every function call. You want a complete audit trail of what the LLM asked for, what you executed, and what came back. This is essential for debugging and for trust.

Rate limit aggressively. The LLM doesn't know about your API quotas. Add rate limiting in your router, not in the LLM prompt.
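One way to put the limit in code rather than in the prompt is a small sliding-window limiter that the router consults before running a handler. This is a sketch under my own assumptions, not AbiLens code: `SlidingWindowLimiter` and `tryAcquire` are hypothetical names, and the clock is injectable so the behavior can be tested without waiting.

```typescript
class SlidingWindowLimiter {
  private calls: number[] = []; // timestamps of recent accepted calls
  private maxCalls: number;
  private windowMs: number;
  private now: () => number;

  constructor(
    maxCalls: number,
    windowMs: number,
    now: () => number = () => Date.now()
  ) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the call if it fits in the window, false otherwise.
  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have aged out of the window
    this.calls = this.calls.filter((ts) => t - ts < this.windowMs);
    if (this.calls.length >= this.maxCalls) return false;
    this.calls.push(t);
    return true;
  }
}
```

Inside `FunctionRouter.execute`, a call to `tryAcquire()` that returns false could throw an error such as "rate limit exceeded". The error-handling path then reports it to the LLM like any other failure.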

The full AbiLens source is on my GitHub if you want to see this pattern applied to smart contract interaction. The core chat loop is under 200 lines — most of the complexity lives in the function definitions, not the orchestration.

RWA Tokenization in 2026: What Developers Need to Know

Pavel Espitia — Mon, 27 Apr 2026 12:29:07 +0000

Real World Asset tokenization crossed $36 billion on-chain in early 2026. That number was under $2 billion two years ago. BlackRock's BUIDL fund alone holds over $5 billion in tokenized US Treasuries. Franklin Templeton's BENJI tokens represent money market fund shares on Stellar and Polygon. Ondo Finance, Centrifuge, and Maple are tokenizing everything from government bonds to trade receivables.

This is not speculative DeFi. These are regulated financial instruments, on-chain, earning real yield. And the infrastructure to build them is still being figured out.

If you're a developer in the blockchain space, RWA tokenization is where the jobs and opportunities are heading. Here's what you need to understand.

Why Tokenize Real Assets?

The traditional financial system runs on T+1 or T+2 settlement (US equities moved to T+1 in May 2024), paper-based custody chains, and business-hours-only operations. Tokenization replaces that with:

Instant settlement: Transfer ownership in a single transaction, 24/7
Fractional ownership: A $100M building becomes 100 million tokens at $1 each
Programmable compliance: Enforce transfer restrictions, KYC, and jurisdiction rules in code
Composability: Tokenized assets plug into DeFi lending, collateral, and yield protocols

The value proposition for institutions is clear: lower costs, faster settlement, broader distribution. For developers, this creates an entirely new infrastructure layer to build.

ERC-3643: The Compliance Token Standard

You can't tokenize a security with a standard ERC-20. Securities have transfer restrictions — you can't sell them to unaccredited investors, in sanctioned jurisdictions, or beyond ownership caps.

ERC-3643 (formerly T-REX) is the standard that solves this. It adds an identity and compliance layer on top of ERC-20 transfers.

The architecture has three components:

Identity Registry: Maps wallet addresses to on-chain identity claims. Before any transfer, the token contract checks that the receiver has valid identity claims. The sender was checked the same way when they received the tokens.

Compliance Module: Encodes transfer rules. Maximum holder count, country restrictions, lock-up periods, ownership caps. These are modular — you compose the rules your asset requires.

Trusted Issuers Registry: Lists which identity providers are trusted. A KYC provider issues a claim to a wallet, the claim is stored on-chain (or referenced via a hash), and the token contract verifies it during transfers.

// Simplified ERC-3643 transfer check
function _beforeTokenTransfer(
    address from,
    address to,
    uint256 amount
) internal override {
    require(
        identityRegistry.isVerified(to),
        "Recipient not verified"
    );
    require(
        compliance.canTransfer(from, to, amount),
        "Transfer not compliant"
    );
}

Every transfer goes through these checks. If the recipient isn't KYC'd, the transfer reverts. If the transfer would violate a compliance rule, it reverts. This is securities regulation enforced at the protocol level.
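The "compose the rules your asset requires" idea can be sketched off-chain as a list of rule functions evaluated in order. The TypeScript below is a toy model, not the ERC-3643 interfaces: `ComplianceRule`, `blockedCountries`, `maxBalance` and `checkTransfer` are hypothetical names, and the country code "XX" is a placeholder.

```typescript
interface TransferRequest {
  from: string;
  to: string;
  amount: number;
}

// A rule returns null when the transfer is fine, or a reason when it is not.
type ComplianceRule = (tx: TransferRequest) => string | null;

function blockedCountries(
  countryOf: Map<string, string>,
  blocked: Set<string>
): ComplianceRule {
  return (tx) => {
    const country = countryOf.get(tx.to);
    return country !== undefined && blocked.has(country)
      ? `country ${country} is restricted`
      : null;
  };
}

function maxBalance(balances: Map<string, number>, cap: number): ComplianceRule {
  return (tx) =>
    (balances.get(tx.to) ?? 0) + tx.amount > cap
      ? "ownership cap exceeded"
      : null;
}

// Identity check first, then every compliance rule; the first failure wins.
function checkTransfer(
  tx: TransferRequest,
  verified: Set<string>,
  rules: ComplianceRule[]
): string | null {
  if (!verified.has(tx.to)) return "Recipient not verified";
  for (const rule of rules) {
    const reason = rule(tx);
    if (reason !== null) return reason;
  }
  return null;
}
```

A null result corresponds to the transfer going through, and a string corresponds to the revert reason. Adding a lock-up period or a holder-count limit is one more function in the `rules` array.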

KYC/AML Hooks and Identity

The identity layer is where most of the complexity lives. On-chain identity for financial compliance requires:

Claim-based identity: Wallets hold verifiable claims (accredited investor, jurisdiction, tax status) issued by trusted third parties
Privacy-preserving verification: You need to prove "this wallet belongs to a KYC'd US accredited investor" without revealing their name or social security number
Revocability: When someone fails an AML check, their claims get revoked, and their tokens become non-transferable until resolved

Most production systems use a hybrid approach: identity verification happens off-chain with a traditional KYC provider, and the result is posted on-chain as a hash or a zero-knowledge proof.

// Identity claim structure
struct Claim {
    uint256 topic;      // e.g., 1 = KYC, 2 = accredited
    address issuer;     // trusted KYC provider
    bytes signature;    // issuer's signature over claim data
    bytes data;         // claim details (often a hash)
    string uri;         // off-chain data reference
}
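To show how claims, trusted issuers and revocation fit together, here is a minimal off-chain sketch in TypeScript. It is an illustration under assumptions, not the ERC-3643 implementation: `IdentityClaim` keeps only the fields needed for the check, the `revoked` flag is my own addition (the struct above has no such field), and signature verification is left out entirely.

```typescript
interface IdentityClaim {
  topic: number;    // e.g., 1 = KYC, 2 = accredited
  issuer: string;   // address of the claim issuer
  revoked: boolean; // assumption: revocation is tracked as a flag on the claim
}

// A claim counts only if it matches the topic, is not revoked,
// and comes from an issuer the token trusts.
function hasValidClaim(
  claims: IdentityClaim[],
  topic: number,
  trustedIssuers: ReadonlySet<string>
): boolean {
  return claims.some(
    (c) => c.topic === topic && !c.revoked && trustedIssuers.has(c.issuer)
  );
}

// A wallet is verified when every required topic has at least one valid claim.
function isVerified(
  claims: IdentityClaim[],
  requiredTopics: number[],
  trustedIssuers: ReadonlySet<string>
): boolean {
  return requiredTopics.every((t) => hasValidClaim(claims, t, trustedIssuers));
}
```

Revoking a single KYC claim flips `isVerified` to false for that wallet, which is what makes its tokens non-transferable until the issue is resolved.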

Privacy with Zero-Knowledge Proofs

Here's the tension: financial regulations require knowing who holds what, but blockchain transactions are public. If BlackRock tokenizes a fund, they don't want the world seeing every investor's position in real-time.

ZK-proofs solve this. A holder can prove "I am KYC'd and accredited" without revealing identity. A compliance check can verify "this transfer doesn't violate any rules" without exposing the rule parameters.

Midnight, a privacy-focused blockchain from the Cardano ecosystem, is built specifically for this use case. It supports confidential smart contracts where the logic executes in a shielded environment — the chain verifies the proof of correct execution without seeing the data.

Other approaches include:

zkKYC protocols: Prove identity claims without revealing underlying data
Confidential ERC-20s: Token balances and transfers are encrypted, with ZK-proofs ensuring correctness
Selective disclosure: Reveal only what's needed for a specific compliance check

This is still early. The tooling is immature and the standards are evolving. But privacy-preserving compliance is where the industry is heading, and developers who understand both ZK and securities regulation will be in high demand.

The Opportunity for Tooling Builders

The RWA stack has massive gaps. Here's where developers can build:

Compliance SDKs: ERC-3643 is a standard, but deploying a compliant token still requires deep knowledge of the spec. A developer-friendly SDK that handles identity registry setup, compliance module configuration, and trusted issuer management would be valuable.

Tokenization platforms: Think "Stripe for tokenizing assets." Upload your legal documents, configure compliance rules, deploy to your target chain. Companies like Securitize and Tokeny are building this, but the space is far from consolidated.

Audit tools for RWA contracts: Standard smart contract auditors don't understand securities compliance. An audit tool that checks not just for reentrancy and overflow but also for compliance gaps — missing transfer restrictions, incorrect identity checks, inadequate access controls on admin functions — is a real product opportunity. This is exactly the kind of domain-specific analysis that AI auditors like spectr-ai could be extended to handle.

Cross-chain bridges for regulated assets: Moving tokenized securities between chains while maintaining compliance state is an unsolved problem. The identity claims on Ethereum don't automatically exist on Polygon.

Reporting and analytics: Regulators want reports. Token issuers need dashboards showing holder distribution by jurisdiction, transfer volumes, compliance violations. The data is all on-chain — someone needs to build the indexing and visualization layer.

What to Learn

If you want to work in RWA tokenization, here's a practical learning path:

Understand ERC-3643 deeply. Read the spec, deploy a test token, try transferring between verified and unverified wallets.
Learn the regulatory basics. You don't need a law degree, but you need to understand what "accredited investor," "Reg D," "Reg S," and "MiFID II" mean. The rules determine the code.
Study ZK fundamentals. Circom, Noir, or Halo2 — pick one and build a simple proof. Understand what can be proven and what the computational costs are.
Build something. A toy tokenization platform that mints compliant tokens with mock KYC. The best way to learn the ERC-3643 flow is to implement it.

The RWA space is hiring and the talent pool is thin. Most blockchain developers understand DeFi but not securities compliance. Most fintech developers understand compliance but not smart contracts. The intersection is where the opportunity lives.