DEV Community: Pavel

I Was Asked to Add a Simple Classifier to a Website. Then I Saw the 250 MB Download.

Pavel — Wed, 03 Jun 2026 09:52:35 +0000

A client asked me for a simple thing.

Not ChatGPT.

Not an agent.

Not a multimodal assistant that can explain invoices, generate React components, and write poetry in three languages.

Just a small classifier embedded into a website.

The job sounded boring in the best possible way:

take some text, classify it, return a result, keep it fast.

So I started looking at the usual solutions.

And then I had one of those moments where you stop reading documentation, lean back, and ask:

Are we seriously doing this?

Because the answer I kept running into looked like this:

download a huge runtime

download a huge model

initialize a big ML stack

then classify one small piece of text

In one setup, the path was getting close to something like 250 MB per user.

For a simple classifier.

On a website.

From a server.

Every time.

No. Sorry. That is insane.

The problem

The web has a strange habit now.

You ask for one small AI feature, and the answer is often:

bring the entire construction company.

But sometimes I do not need a construction company.

I need one person on the construction site.

One task.

One tool.

One result.

This is especially true for simple classification, embeddings, semantic search, routing, filtering, ranking, small local decisions.

Not every AI problem needs an LLM.

Not every website needs a full inference engine.

Not every user should pay a 250 MB download tax because we were too lazy to think smaller.

So I started digging

I wanted something simple:

runs in the browser
does not require a server for inference
small enough to actually ship
works with transformer-style models
can tokenize text
can run BERT-like forward inference
can produce embeddings or classification input
does not bring ONNX Runtime, Candle, ndarray, or half the internet with it

At first I thought:

“Surely someone already made the tiny version.”

There are great tools out there.

Transformers.js is powerful.

ONNX Runtime Web is powerful.

Candle is powerful.

But that was exactly the problem.

They are powerful because they are general.

I did not need general.

I needed narrow.

I needed small.

So I built one.

Introducing wasmicro

wasmicro is my attempt at a tiny transformer inference runtime for the web.

Current WASM bundle size:

~94 KB

Not 94 MB.

94 KB.

The project is here:

https://github.com/Xzdes/wasmicro

Live demo:

https://xzdes.github.io/wasmicro/

It is not perfect.

It is not finished.

It is still being tested.

But it already does the thing I needed: run a small transformer-style pipeline in the browser without shipping a giant runtime.

What it does today

Right now wasmicro supports:

tiny owned tensors
safetensors loading
WordPiece tokenizer
BERT encoder forward pass
mean pooling for embeddings
WASM bindings
SIMD128 matmul path
i8/u8/q4 quantized weight types
converter tool for HuggingFace models

The design rule is simple:

if it does not make the WASM bundle smaller, faster, or useful for a transformer architecture, it probably does not belong.

No training.

No autograd.

No optimizer.

No general tensor framework.

No “module zoo”.

Just forward inference.

Why not just use a big runtime?

Because sometimes the runtime is bigger than the problem.

If I am building a serious AI application, yes, I will use serious AI infrastructure.

If I need WebGPU, many architectures, image models, audio models, generation, pipelines, fallback backends, and broad model support, then I should use the big tools.

But if I need a small classifier embedded into a website?

I do not want the entire AI ecosystem.

I want the smallest useful thing.

The difference is like hiring:

a whole construction company
or one worker with the correct tool

The web keeps giving me the company.

I needed the worker.

The current numbers

The WASM bundle is currently around:

94 KB after wasm-opt -Oz

The hard size ceiling I set for myself is:

250 KB

The default library dependency set is intentionally tiny.

The project avoids pulling in things like:

ndarray
candle
rayon
serde_json
chrono
getrandom

The converter CLI can use heavier dependencies, because it runs on a desktop machine and never ships to the browser.

The browser runtime stays small.

Is it faster?

That depends what “fast” means.

If you mean maximum throughput on GPU against a fully optimized WebGPU runtime, probably not.

That is not the fight.

The fight I care about is:

cold start
first useful result
small runtime
simple embedding/classification tasks
CPU/WASM path
no huge framework download

I want to compete on:

time to load
time to first embedding
runtime size
simple integration

Not on “who supports 200 model architectures”.

That is a different game.

What is missing

A lot.

This is still early.

The project still needs:

better public benchmarks
easier model download story
more optimized attention path
less allocation during forward
better q4 loading for BERT
cleaner zero-config examples
more browser measurements
more real-world classification demos

Also, the live demo currently expects you to provide model files.

That is not ideal.

But it is already enough to prove the point:

You do not always need a massive runtime to do a small AI job.

The real lesson

This started as a simple client task.

“Add a classifier to a website.”

Then I looked at the common path and saw the cost.

And I could not accept that the answer to a small feature was:

ship hundreds of megabytes and hope nobody notices.

Users notice.

Browsers notice.

Mobile connections notice.

Cold start notices.

So I built the smaller thing.

Not because it is perfect.

Because the alternative felt wrong.

Final thought

AI tooling is amazing right now.

But we are also getting lazy.

We reach for the biggest tool because it is convenient.

Sometimes that is correct.

Sometimes it is absurd.

If the job needs a crane, use a crane.

If the job needs one person with a hammer, do not send a construction company.

wasmicro is my attempt at the hammer.

Small.

Narrow.

Still rough.

But already useful.

GitHub:

https://github.com/Xzdes/wasmicro

Demo:

https://xzdes.github.io/wasmicro/

I built a deep learning framework in Rust from scratch — Part 3: the road to crates.io

Pavel — Sat, 25 Apr 2026 07:39:04 +0000

In Part 1 I argued why a graph-based DL framework in pure Rust was a project worth doing.

In Part 2 I wrote the GPU backend on wgpu and figured out how to make TransformerBlocktrain on it. Both posts ended with the same honest admission: the code was fine, but the project wasn't ready for other humans.

This post is about closing that gap. Six phases of work, a v0.2.0 → v0.3.1
bump, and a crate that now looks like something you'd actually reach for.

Here's the plan I committed to at the start:

Phase 1: cleanup and consistency — get to 0 warnings.
Phase 2: API reliability — declarative layer API so users don't
hand-manage HashMap<String, Shape>.
Phase 3: GPU completeness — every CPU op should have a WGSL twin.
Phase 5: ecosystem — RoPE done properly, Slice/Concat primitives, CI.
Phase 6: pre-release polish — fmt, clippy, docs.rs, CNN example.

(Phase 4 — performance — is intentionally deferred to v0.5. I'll explain.)

Phase 1 — 121 warnings to zero

The first thing I did was run cargo build --release --all-targets. The
output scrolled for a full screen:

warning: `rustyasg` (bin "rustyasg") generated 121 warnings (14 duplicates)
warning: `rustyasg` (lib) generated 21 warnings

142 warnings feels like somebody stopped caring somewhere, but what I
actually found was amusing: src/main.rs started with

mod analysis;
mod asg;
mod autograd;
mod gui_viewer;
// ... seven more lines

That's the binary recompiling the entire library as a separate crate. So
every pub struct in nn/ that main.rs didn't touch became "never used" —
and there are a lot of them. Fix: replace those mod declarations with
use rustyasg::*. ~100 false-positive warnings gone in one diff.

The rest was actual cleanup: deprecated rand::thread_rng → rand::rng,
unused imports, a let minus_one = lit_scalar(-1.0) that was never read, and
the rotate_half function in RoPE that turned out to be a stub that added
cos as a bias. I couldn't fix it at this point (needed Slice/Concat ops)
but I added a big doc-comment calling it out so nobody would accidentally ship
it to production:

/// **STUB IMPLEMENTATION.** Full RoPE requires Slice/Concat operations
/// to split head_dim into pairs. Current code just adds `cos` as a bias —
/// this is **not** mathematically correct RoPE and must be fixed before
/// production use.
fn rotate_half(&self, x: &Tensor, _seq_offset: usize) -> Tensor { ... }

At the end of Phase 1 cargo build --release --all-targets was clean and I
had one open wound in RoPE to close later.

Phase 2 — the declarative layer API

This was the change I was most worried about because it was a breaking
change for every v0.2 user. Look at the main.rs I inherited:

let layer1 = Linear::new(&ctx, "layer1");
// ...
let mut initial_shapes = HashMap::new();
initial_shapes.insert("x".to_string(), (vec![4, 2], DType::F32));
initial_shapes.insert("layer1.weights".to_string(), (vec![2, 8], DType::F32));
initial_shapes.insert("layer1.bias".to_string(),    (vec![1, 8], DType::F32));
// ...
if name.contains("w_q.weights") || name.contains("w_k.weights") || ... {
    shape = vec![embed_dim, embed_dim];
} else if name.contains("linear1.weights") {
    shape = vec![embed_dim, ff_hidden_dim];
}

That's string-matching to figure out parameter shapes. When I saw
name.contains("w_q") in the binary I knew what Phase 2 had to be.

The fix: layers should own their shape information. Every nn::*
constructor takes dimensions, and the layer self-registers with
GraphContext:

pub fn new(ctx: &Rc<RefCell<GraphContext>>, name: &str, in_f: usize, out_f: usize) -> Self {
    let weights = Tensor::new_parameter_with_shape(
        ctx,
        &format!("{name}.weights"),
        vec![in_f, out_f],
        Initializer::XavierUniform,
    );
    // bias too...
}

Behind the scenes GraphContext got a new parameter_meta: HashMap<String, ParameterMeta> where ParameterMeta carries shape, dtype, and an
Initializer. Then two helper methods close the loop:

ctx.build_shape_map(&input_shapes)     // → feeds ShapeInference
ctx.init_parameters(&mut runtime_data) // → samples weights

I also wrote nn::init with nine standard initializers — Zeros, Ones,
Constant, Uniform, Normal, Xavier (uniform/normal), Kaiming (uniform/normal) —
because if layers are going to pick initializers on the user's behalf, the
defaults should be good defaults. Xavier-uniform for Linear, Kaiming-uniform
for Conv2d, Ones/Zeros for LayerNorm gamma/beta, Normal(0, 0.02) for
embeddings (GPT-2 conventions).

The user-facing difference is shocking. main.rs used to be 270 lines with
a hand-rolled 50-line shape-dispatch block. After Phase 2 it's 225 lines,
and the shape dispatch is one line:

ShapeInference::run_with_context(&mut graph, &ctx.borrow(), &input_shapes)?;

The XOR example dropped from 275 lines to 190 lines with no loss of clarity.
The string-matching in the binary is gone entirely.

Obviously this broke every single example, every single test. I rewrote them
all. That's what v0.3.0 is: one coherent breaking change, one SemVer bump.

Phase 3 — GPU completeness

At the start of Phase 3, if you ran cargo run --release -- --gpu on the
TransformerBlock demo, you got:

thread 'main' panicked:
  UnimplementedOperation("node type not supported on GPU:
                          LayerNorm { input: 0, gamma: 10, beta: 11, eps: 1e-5 }")

The README had been saying "✅ GPU backend" for months. It was lying by
omission. LayerNorm is not a composite op on GPU — it's a specialized
NodeType — and nobody had written the WGSL shaders.

So I wrote them. Four shaders for LayerNorm alone:

LayerNorm (forward): one worker per row.
LayerNormBackward (∂L/∂x): the formula is dx = inv_std * (dy·γ - mean(dy·γ) - x_norm·mean(dy·γ·x_norm)) — two reduction passes per row before the final output.
LayerNormGradGamma: parallelize over columns, each worker scans all rows.
LayerNormGradBeta: simple column-wise sum.

I added a helper dispatch_rowwise next to the existing dispatch_shader,
because LayerNorm's "one thread = one row, iterate the reduction axis
internally" pattern keeps showing up (and it did — in EmbeddingGrad later).

Then came the avalanche. Each took roughly the same pattern:

Operation	Shaders added
`Conv2dBackwardInput`, `Conv2dBackwardWeight`	2
`MaxPool2d`, `MaxUnpool2d` (backward)	2
`AvgPool2d`, `AvgUnpool2d`, `AdaptiveAvgPool2d`	3
`Embedding`, `EmbeddingGrad`	2
`ConvTranspose2d`	1 (+ bias shader)

Each with a matching CPU-vs-GPU parity test. The trickiest was
MaxUnpool2d: naïvely you'd scatter from grad_output back to the positions
of max values, but WGSL doesn't have atomic f32, so concurrent scatter is a
data race. I worked around it by parallelizing over input positions — for
each (n, c, ih, iw), find which windows cover it, recompute argmax in each
of those windows, and accumulate only if we're the argmax. O(kH²·kW²) per
element, which is fine for typical 2×2 / 3×3 kernels.

After Phase 3:

Epoch  1, Loss: 9.347217
Epoch  2, Loss: 1.233081
...
Epoch 15, Loss: 0.000002
--- TRAINING COMPLETE in 576ms ---

TransformerBlock trains on GPU end-to-end. 42 parity tests (now 46) verify
every GPU op matches the CPU reference to 1e-5.

Phase 5 — closing the RoPE wound, and ecosystem

The stub RoPE from Phase 1 had been eating at me. Fixing it required adding
three new primitives:

NodeType::Slice { input, axis, start, end } — the obvious building block.
NodeType::Concat { inputs, axis } — the dual.
NodeType::SliceBackward { grad_output, axis, start, full_size } — for the gradient of Slice. Concretely: zero-pad grad_output back to the original shape.

Full coverage means: NodeType + shape inference + CPU impl + GPU WGSL +
autograd. For Concat I punted on a pure-GPU implementation (would need a
multi-input kernel with dynamic strides) — the GPU path reads the inputs
back to CPU, concatenates via ndarray, and re-uploads. It's slow, but
correct, and RoPE only concatenates a few tensors of moderate size.

With Slice/Concat in place, rotate_half becomes the textbook split-half:

let x1 = x.slice(3, 0, half_dim);
let x2 = x.slice(3, half_dim, self.head_dim);

let rot1 = &(&x1 * &cos_tensor) - &(&x2 * &sin_tensor);
let rot2 = &(&x1 * &sin_tensor) + &(&x2 * &cos_tensor);

rot1.concat(&[&rot2], 3)

Mathematically correct. End-to-end differentiable. Zero stubs.

Also in Phase 5: GitHub Actions CI, CHANGELOG.md in Keep-a-Changelog
format, and CONTRIBUTING.md documenting every place you have to edit when
adding a new NodeType (six, it turns out).

Phase 6 — the polish that makes or breaks a release

This is the unglamorous phase, but it's the one that separates "published a
crate" from "published a crate that people use."

cargo fmt --all -- --check: 28 files reformatted.

cargo clippy --all-targets -- -D warnings: 33 warnings in lib alone
when I started. Most were mechanical (div_ceil, assign_op), and
cargo clippy --fix --allow-dirty chewed through them. For three of them —
too_many_arguments, type_complexity, should_implement_trait — the
clippy suggestion was worse than the original, so I allow'd them at crate
level with a comment explaining why. The library is now clippy-clean under
-D warnings, and so are the tests, binary, and examples (modulo one
#![allow(clippy::if_same_then_else)] in the MNIST example where the
"identical blocks" are part of an 0–9 pattern-generation lookup).

Strict rustdoc: RUSTDOCFLAGS="-D rustdoc::broken_intra_doc_links" caught
ten broken links, all of the form [N, C_in, H, W] in doc-comments where
I'd written tensor shape notation inside markdown link syntax. Fix: wrap
shape notations in backticks: `[N, C_in, H, W]`.

Cargo.toml for docs.rs:

[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "docsrs"]

[profile.release]
opt-level = 3
lto = "thin"
codegen-units = 1
strip = "debuginfo"

exclude = ["logo.png", "target/*", ".github/*", "*.log"]

The exclude is important — the published crate is 120 KB instead of 1.3 MB.
Users don't need the logo to compile.

A real CNN example. Up to this point every example was a MLP, which made
all the Conv2d work feel theoretical. I wrote examples/cnn_classifier.rs:

Input [N, 1, 8, 8]
  → Conv2d(1→8, 3×3, pad=1) → ReLU → AvgPool2d(2×2)    → [N, 8, 4, 4]
  → Conv2d(8→16, 3×3, pad=1) → ReLU → AdaptiveAvgPool2d → [N, 16, 1, 1]
  → reshape → Linear(16→8) → ReLU → Linear(8→3)         → [N, 3]

Trained with Adam on a 3-class synthetic dataset. Converges to 100% test
accuracy in under a second. First real exercise of Conv2dBackwardInput,
Conv2dBackwardWeight, AvgUnpool2d, AdaptiveAvgPool2d as an actual
training loop.

The last boss: CI on three platforms

After all of that, I pushed to GitHub and got this from Actions:

✅ cargo fmt (Ubuntu)
❌ cargo clippy (-D warnings) (Ubuntu) — exit code 101
✅ cargo doc (Ubuntu)
✅ test (ubuntu-latest)
✅ test (macos-latest)
❌ test (windows-latest) — exit code 1

The clippy one was predictable in retrospect: my local rustc was 1.89.0,
but dtolnay/rust-toolchain@stable on CI was grabbing whatever stable
pointed at on the morning of the build. New rustc, new clippy lints, new
-D warnings failures. Fix: pin the toolchain via env var:

env:
  RUST_TOOLCHAIN: "1.89.0"

And reference @master with the pinned version in every job. Now CI is
deterministic.

The Windows failure was more interesting. Two tests passed on Ubuntu and
macOS but failed on Windows:

#[test]
fn test_save_load_checkpoint() {
    let path = "test_checkpoint_dir";
    save_checkpoint(path, &checkpoint).unwrap();
    // ...
    fs::remove_dir_all(path).ok();
}

cargo test runs tests in parallel by default. Two tests opening the same
relative path in the same process's working directory is a race.
Windows' filesystem is stricter than Linux about concurrent deletion — Linux
will usually let you remove a directory while another handle is open, Windows
tells you to go away. The test passed on Ubuntu because the race was benign;
it failed on Windows because it wasn't.

Fix: temp directory with a unique suffix.

let path = std::env::temp_dir().join(format!(
    "rustyasg_ckpt_{}_{}",
    std::process::id(),
    std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_nanos()
));

Plus --test-threads=1 in CI as defence-in-depth. Now all three platforms
pass.

Going international: two READMEs

Until last week the README was in Russian, because I'm Russian and I never
expected anyone else to read it. That's the kind of default that quietly
keeps a project from getting discovered.

Fix: make the primary README.md English — docs.rs, crates.io, GitHub's
front page. Then mirror it as README.ru.md for Russian readers. Both
reference each other in the first lines.

Same treatment for every file in src/. I delegated the code translation
to a subagent with explicit instructions — preserve formatting, preserve
identifiers, only rewrite strings and comments — and verified with

grep -rP "[\x{0400}-\x{04FF}]" --include="*.rs" .

The only Cyrillic left in the repo is in README.ru.md (intentional) and
a stale target/package/0.2.0/ artifact (not published, not tracked).

docs.rs docs are now professional English. That matters more than I
expected it would.

The numbers

	Before	After
Warnings	142	0
Tests	85 + 26 + 8 = 119	87 + 46 + 8 = 141
GPU ops supported	~25 basic + Conv2d fwd	+11 (LayerNorm, Conv2d bwd, pool, embedding, ConvTranspose2d, Slice/Concat/SliceBackward)
Lines in `main.rs`	270	225
String-matching in binary	yes	zero
Layer constructors	`Linear::new(ctx, name)`	`Linear::new(ctx, name, in, out)`
RoPE	stub (`+ cos as bias`)	correct split-half
CI	none	4 jobs, 3 OSes, strict fmt/clippy/doc
README	Russian only	English + Russian mirror
Published crate size	1.3 MB (with logo)	~120 KB
TransformerBlock loss (15 epochs)	—	0.000002 on GPU
CNN example accuracy	(none existed)	100% on 3-class synthetic

Publishing

I'm tagging this one v0.3.1 and pushing to crates.io. The command list is
underwhelming given how much work led up to it:

# Ensure everything is clean.
cargo fmt --all -- --check
cargo clippy --release --all-targets -- -D warnings
cargo test --release --lib --tests --test grad_check -- --test-threads=1
RUSTDOCFLAGS="-D rustdoc::broken_intra_doc_links" cargo doc --lib --no-deps

# Commit + tag.
git add -A
git commit -m "Release v0.3.1"
git tag -a v0.3.1 -m "v0.3.1"
git push origin master --tags

# Dry-run — builds the crate archive and compiles it in isolation.
# Catches anything that works on your machine but would fail from scratch.
cargo publish --dry-run

# Ship.
cargo publish

Then docs.rs picks it up automatically and rebuilds documentation at
https://docs.rs/rustyasg/0.3.1.

What's next (v0.5 — performance)

I deliberately deferred Phase 4. The reason is boring and correct: you can't
optimise what you haven't measured, and you shouldn't measure what isn't
correct yet. v0.3.1 is correct. v0.5 will be the performance release:

GPU buffer pool. Currently every training step allocates fresh buffers for every intermediate tensor. An arena that reuses allocations should be a big win — but I want to benchmark before and after with criterion so I can quote real numbers instead of vibes.
Kernel fusion. MatMul + Bias + Activation is a three-kernel dance that should be one WGSL shader. Detection + code generation for the fusion pass is a chunk of compiler work.
Mixed precision (f16) with loss scaling.
Inference-only mode. Skip autograd graph construction when the user just wants predictions. The graph-to-graph design makes this almost free — just don't build the gradient graph.
Tiny GPT example. Needs causal masking + proper multi-batch, which needs a working inference path first.

That's v0.5. And then v1.0 adds ONNX export, WebAssembly, a real model zoo,
and a much-improved visualiser.

Reflection

The thing I keep relearning: the distance between "code that works" and
"code someone else can use" is enormous and almost entirely about things
nobody celebrates. Renaming warnings. Writing CONTRIBUTING.md. Choosing a
temp path instead of a hardcoded one. Pinning a toolchain version.
Translating a README.

If you're thinking about open-sourcing a project, my honest advice: do Phase 6 first.
Get the strict CI green, get docs.rs building, get the README
to be the first thing you actually want someone to read. Then write the
code. Your future self will thank you — and so will everybody who finds the
crate on a search.

Code is at https://github.com/Xzdes/RustyAsg. Crate is at
https://crates.io/crates/rustyasg. Issues and PRs welcome.

Part 4 will be the performance push. See you there.

Synapse v1.0 — A Programming Language for AI, Not Humans

Pavel — Fri, 30 Jan 2026 21:18:37 +0000

It’s finally ready.

After six months of work and 16,500 lines of code, I’m introducing Synapse — a programming language designed specifically for LLMs, not for programmers.

Why another language?

Every existing language was built for humans: Python is for readability, Rust is for safety, and JavaScript is for ubiquity. But when Claude or GPT generates code, they often stumble over syntactic sugar, ambiguities, and complex indentation rules.

Synapse is a language that LLMs understand better than you do.

And that’s not a bug — it’s a feature.

Key Advantages

1. S-Expression Syntax — 0% Ambiguity

(fn factorial (n)
  (if (<= n 1) 1
      (* n (factorial (- n 1)))))

No if x: ... elif: ... else:. No debates over tabs vs. spaces. One bracket equals one operation. LLMs generate this perfectly on the first try.

2. ASG instead of AST — Code as a Graph
Traditional compilers build a tree. Synapse builds a graph (Abstract Semantic Graph). This means:

The AI sees all dependencies simultaneously.
Code transformations are seamless.
It’s perfectly optimized for analysis and refactoring.

3. Three Backends — One Codebase

Interpreter: Instant execution.
LLVM: Compilation into native code.
WebAssembly: Run it in the browser. Write once, deploy anywhere.

4. LSP Out of the Box
Autocompletion, hover hints, and error diagnostics. Just plug it into VSCode and it works.

5. Built-in Type Inference
Powered by Hindley-Milner typing. Write code without annotations — the compiler figures it out for you.

What’s Under the Hood?

✅ Full S-expression parser

✅ Interpreter with 60+ operations

✅ Pattern matching

✅ Lazy sequences

✅ Dicts / HashMaps

✅ Pipe operator |>

✅ Try/catch error handling

✅ WASM compilation

✅ Module system

✅ Robust Standard Library

✅ 48/48 core tests passing

Who is this for?

Not for you. Seriously.

Synapse was created so you could tell Claude:

"Write me a web server in Synapse."

And it will. Without errors. On the first attempt. Because the syntax is architected for how an AI "thinks."

What’s Next (v1.1)

Official VSCode Extension Marketplace launch.
LLVM Closures.
Comprehensive tutorial with real-world examples.
Benchmarks vs. Python/Lua/Node.js.

Try It Now

git clone https://github.com/Xzdes/synapse.git
cd synapse
cargo build --release
cargo run --bin synapse

Inside the REPL:

synapse> (print "Hello from the future!")
Hello from the future!

synapse> (|> (range 1 10) 
             (filter even?) 
             (map square) 
             sum)
120

This is just the beginning. There is a lot of work ahead, but it’s already stable enough to touch, try, and break.

The language not for humans. The language for AI. The future.

⭐ GitHub: github.com/Xzdes/synapse

P.S. Yes, I know Lisp-like syntax isn't new. But no one has built it specifically for LLMs with a native WASM backend and built-in LSP from day one.

Synapse 0.2: REPL, S-Expressions & Working Loops

Pavel — Sun, 25 Jan 2026 18:50:45 +0000

Hey everyone!

It's been a while since the last update on Synapse — the AI-first programming language built on Abstract Syntax Graphs. Today I'm excited to share version 0.2.0 with some major new features!

What's New in 0.2.0

1. S-Expression Syntax (Lisp-like)

I chose S-Expressions for maximum LLM-friendliness:

; Factorial using a loop
(let n 5)
(let result 1)
(let i 1)
(while (<= i n)
  (do
    (set result (* result i))
    (set i (+ i 1))))
result  ; => 120

Why S-Expressions?

✅ Unambiguous structure — every expression in parentheses
✅ No whitespace dependency — unlike Python
✅ Easy for AI to parse and generate
✅ Homoiconicity — code is data

2. Interactive REPL 🖥️

$ cargo run --bin synapse

Synapse 0.2.0 - AI-friendly language
Type :help for commands, :quit to exit.

synapse> (+ 1 2)
3
synapse> (let x 10)
()
synapse> (* x x)
100
synapse> :quit

REPL Features:

Command history (persisted)
:help, :quit, :ast, :clear commands
Execute files: synapse examples/demo.syn
One-liners: synapse -e "(* 6 7)"

3. Working Loops & Blocks

I have updated the interpreter to fully support while loops with the (do ...) construct for multiple statements:

(let sum 0)
(let i 1)
(while (<= i 5)
  (do
    (set sum (+ sum i))
    (set i (+ i 1))))
sum  ; 15

Benchmarks

Operation	Time
Simple arithmetic	~1.4 µs
Nested expressions	~1.8 µs
Conditionals	~2.1 µs
JSON serialization	~1.6 µs

Updated Roadmap

[x] S-Expression Parser (logos)
[x] Type Checker (Hindley-Milner)
[x] LLVM Backend
[x] Interpreter (30+ node types)
[x] REPL & CLI ✨ NEW
[x] While loops & blocks ✨ NEW
[ ] Recursive functions
[ ] Standard Library
[ ] Pattern Matching
[ ] VSCode Extension

Try It Now

git clone https://github.com/Xzdes/synapse.git
cd synapse
cargo run --bin synapse

Get Involved

⭐ GitHub
Open issues, PRs welcome!

What features would you like to see next? Drop a comment below!

I Built a Deep Learning Framework in Rust from Scratch - Part 2: GPU Backend and the Lonely Debugging Journey

Pavel — Tue, 20 Jan 2026 15:02:21 +0000

In Part 1, I showed the basic architecture of RustyASG - my attempt to build a deep learning framework in Rust. Since then, a lot has happened. I want to share both the technical progress and the reality of working on such a complex project alone.

The Struggle Was Real

After publishing Part 1, I spent weeks battling errors. The thing is - nobody really helps with projects like this. The topic is too specialized. I asked questions on forums, searched through issues in similar projects... crickets.

Neural networks (the AI assistants) helped somewhat - not to solve specific bugs, but to organize my thoughts and break down the chaos into manageable pieces. When you're staring at 36 compilation errors and your brain is fried, sometimes you just need someone (or something) to help you see the structure.

The hardest part wasn't writing new features - it was fixing subtle type mismatches and reference issues in Rust's strict type system.

What Actually Got Done

GPU Backend with wgpu

The main achievement of this period - a working GPU backend. Not a toy, but real GPU acceleration using wgpu (WebGPU API that works on Vulkan, Metal, DX12).

Implemented operations:

Element-wise: Add, Sub, Mul, Div, Neg, Abs, Exp, Log, Sqrt
Activations: ReLU, Sigmoid, Tanh, GELU, SiLU, LeakyReLU, Softmax
Matrix operations: MatMul (including batched)
Reductions: Sum, Mean, Variance
Convolutions: Conv2d with stride and padding
Shape operations: Transpose, Reshape, Broadcast

Each operation has its own WGSL shader generated at runtime.

Proper Error Handling

In the first version, the GPU code was littered with .unwrap() - the Rust equivalent of "crash and pray." I replaced all of that with proper Result types:

// Before (bad)
let node = graph.nodes.get(&node_id).unwrap();

// After (good)  
let node = graph.get_node(node_id)?;

Sounds simple, but this change touched hundreds of lines and uncovered many hidden issues. The fun part: get_node() returns Result, not Option, so I had to use map_err instead of ok_or_else. Rust teaches you to read type signatures carefully.

149 Tests Pass

The test suite now includes:

77 unit tests for core functionality
38 integration tests for neural network layers
26 GPU tests that compare CPU and GPU results
8 autograd tests for backpropagation

Every GPU operation is tested by running the same computation on CPU and GPU, then comparing results with tolerance for floating-point differences.

RustyASG vs Competitors: Honest Assessment

Let me be straight about where RustyASG stands compared to other options.

Compared to PyTorch/TensorFlow

Don't even compare. These are battle-tested frameworks with thousands of contributors, CUDA optimization, and years of development. RustyASG is an educational/experimental project.

Compared to tch-rs (Rust bindings to LibTorch)

tch-rs wins for production use.

tch-rs: Full PyTorch functionality, CUDA support, maintained by the community
RustyASG: Pure Rust, no external dependencies, but limited operations

Compared to Burn

Burn is more mature.

Burn: Multiple backends, WebGPU support, active development, production focus
RustyASG: Simpler codebase, easier to understand and modify

Compared to Candle (by Hugging Face)

Candle is better for inference.

Candle: Optimized for inference, quantization support, GGUF models
RustyASG: Training-focused with autograd, but slower execution

Honest Pros and Cons

What RustyASG Can Actually Do

✅ Train simple neural networks (MLP, small CNNs)
✅ Run on GPU via WebGPU (cross-platform: Windows, Linux, Mac, even web)
✅ Automatic differentiation for backpropagation
✅ Load/save models in SafeTensors format
✅ PyTorch-style API for datasets and dataloaders
✅ Full transformer building blocks (MultiHeadAttention, positional encodings)

What RustyASG Cannot Do (Yet or Ever)

❌ Run large models efficiently (no memory optimization)
❌ Match CUDA performance (WebGPU has overhead)
❌ Support all PyTorch operations (maybe 30% coverage)
❌ Run in production (not battle-tested)
❌ Distributed training
❌ Mixed precision training

When to Consider RustyASG

Learning how deep learning frameworks work internally
Need pure Rust without external dependencies
Experimenting with custom operations
Want to contribute to a young project

The repository includes examples that actually work:

Linear Regression - Basic gradient descent
XOR Problem - Simple MLP with backpropagation
MNIST - Digit classification (if you supply the dataset)

// Simple training loop
for epoch in 0..epochs {
    for (batch_x, batch_y) in loader.iter() {
        // Forward pass
        let output = model.forward(&batch_x);
        let loss = mse_loss(&output, &batch_y);

        // Backward pass
        loss.backward();

        // Update weights
        optimizer.step();
        optimizer.zero_grad();
    }
}

Code Quality

Published on crates.io: cargo add rustyasg
Documentation with rustdoc
No unsafe code (pure safe Rust)
Clean module structure

Conclusion

RustyASG is not a PyTorch killer. It's not trying to be. It's an honest attempt to understand how deep learning frameworks work by building one from scratch in Rust.

If you want to learn about computational graphs, automatic differentiation, and GPU programming - the source code is open. If you want to train GPT-4 - use PyTorch.

Links:

GitHub: https://github.com/xzdes/RustyASG
crates.io: https://crates.io/crates/rustyasg
Part 1: I Built a Deep Learning Framework in Rust from Scratch - Part 1

I Built a Deep Learning Framework in Rust from Scratch. Here’s How It Works.

Pavel — Tue, 02 Sep 2025 21:37:07 +0000

I've just published RustyASG to crates.io, a deep learning framework I built from the ground up in Rust. The project's goal was to explore a graph-based architecture, moving away from the eager execution model common in frameworks like PyTorch.

This post is a technical breakdown of its architecture, core design, and the most critical bug I had to fix to make it work correctly.

The Core Idea: Define-then-Run

RustyASG does not compute operations immediately. Instead, every operation constructs a node in an Abstract Semantic Graph (ASG). This graph serves as a complete blueprint of the entire computation before any numbers are crunched.

Once the graph is fully defined, it can be:

Analyzed: Statically infer tensor shapes to catch dimension mismatches before execution.
Differentiated: Automatically generate a new graph that calculates the gradients of any node.
Executed: Run the optimized graph on a backend, such as a CPU or GPU.

This approach provides a high degree of control and enables global optimizations like kernel fusion and static memory planning.

Architecture Overview

The framework's components are strictly separated. This flowchart shows how the pieces fit together:

tensor: A lightweight, symbolic handle to a node in the graph. All operations on it modify the graph.
asg: The core data structures that define the graph (Node, NodeType). This layer only describes computation.
autograd: The reverse-mode automatic differentiation engine. It transforms a forward-pass graph into a new graph that computes gradients.
runtime: Contains the Backend trait and its concrete implementations for CPU (ndarray) and GPU (wgpu).

Getting Started: Training a Transformer Block

An example is included to demonstrate a full training loop on a Transformer block.

Clone the repository:

git clone https://github.com/Xzdes/RustyAsg.git
cd RustyAsg

Run the demo (GPU is the default backend):
```
cargo run --example transformer_demo --release
```
To use the CPU, set the use_gpu flag to false in examples/transformer_demo.rs.

The output shows the training loss decreasing, which confirms that the entire stack—from graph construction to backpropagation on the GPU—is functioning correctly.

--- TRAINING LOOP START ---
Epoch: 1 , Loss: 3.081630
Epoch: 2 , Loss: 2.840065
...
Epoch: 15, Loss: 0.982813
--- TRAINING FINISHED IN 1.34s ---

The Hardest Bug: A Flaw in the Gradient Logic

The framework appeared stable until I implemented a gradient checker. The tests compared the analytical gradients produced by the autograd engine against numerical estimates. A key test, which emulated a LayerNorm operation, was consistently failing with an error margin of nearly 60%.

The root cause was a subtle but critical flaw in the backpropagation logic for division, specifically when broadcasting was involved.

When a vector like [10, 20] is divided by a scalar 2, the result is [5, 10]. During backpropagation, the gradient for the scalar 2 receives contributions from both elements of the output. Therefore, the incoming gradients must be summed to produce the final, correct gradient for the original scalar.

My autograd implementation for Add and Subtract handled this, but I had overlooked it for Divide.

// The incorrect autograd logic for the divisor 'b'
// grad_b = ... // Calculation was mathematically correct but missed a step.
// self.accumulate_grad(b_id, grad_b)?;

// The corrected logic
let b_shape = self.source_asg.get_node(b_id)?.shape.as_ref().unwrap().clone();
// ... calculate grad_b ...
if b_shape != grad_shape {
    // If 'b' was broadcast, its gradient must be summed to match its original shape.
    grad_b = self.grad_asg.add_node(None, NodeType::Sum(grad_b));
}
self.accumulate_grad(b_id, grad_b)?;

This conditional sum was the missing piece. Adding it fixed the LayerNorm test and validated the entire autograd engine.

What's Next

The foundation is now stable. The immediate roadmap is focused on expanding core functionality:

Implement gradients for MatrixMultiply and Softmax.
Add more optimizers, starting with AdamW.
Develop a memory-recycling buffer allocator for the wgpu backend to improve performance.
Implement model serialization to save and load trained weights.

Contributions are welcome. Please feel free to open Issues or Pull Requests.

You can find the project at:

GitHub: https://github.com/Xzdes/RustyAsg
Crates.io: https://crates.io/crates/rustyasg

I've Seen the Future of UI Development. It's Insane, Written in Rust, and Rendered by an AI.

Pavel — Thu, 28 Aug 2025 07:42:14 +0000

Hey dev.to community!

We’re all used to thinking about UI frameworks in the same terms: React, Vue, Svelte, Flutter, SwiftUI... We argue about the Virtual DOM, reactivity, and performance. But what if we're all looking in the wrong direction? What if the next big leap isn't about how we render, but who does the rendering?

I decided to test a crazy idea. What if, instead of writing code to draw every pixel of a button, we just described that button with words and let an AI draw it for us?

And after a debugging marathon that nearly broke my brain, I did it. Meet the Shadowin AI-Render Engine, a working prototype of a UI framework where the visuals are generated by Stable Diffusion in real-time.

The Concept: A Skeleton of Rust, A Skin of AI

The entire idea behind Shadowin is built on a simple but powerful separation of concerns:

The Logic is Rust. My code, written from scratch in Rust, is only responsible for the "physics" of the interface. It knows there's a button with ID=0, its dimensions are 200x60, it's located at coordinates (50, 50), and it's currently being hovered over. It knows that a click should increment a counter. But it has absolutely no idea what this button looks like.
The Visuals are AI. When it's time to draw, my Rust code doesn't touch the pixel buffer. Instead, it forms a text prompt and sends it to a locally running Stable Diffusion server:

"a crisp UI button with the text 'Submit', photorealistic, octane render, trending on artstation, dark sci-fi style, neon blue highlights, glowing, hovered state"

The neural network generates an image, and my engine simply "stamps" it onto the screen.

Here's what it looks like in action:

When you hover or click, it's not just the color that changes—the entire generated texture of the button swaps out!

How It Works Under the Hood

It's not magic, it's Rust and a bit of madness.

The Engine: Written from scratch in Rust, using winit for windowing and pixels for direct framebuffer access. No browsers, no Electron.
The AI Communicator: A module that talks to the stable-diffusion-webui API via HTTP requests using reqwest.
Caching: Image generation is slow. That's why the engine caches every generated asset. The first launch "warms up" the cache by generating all necessary states for all widgets. After that, the interface runs smoothly.
Synchronicity: After some painful experiments with async, I settled on a simple, blocking approach. Yes, the application freezes during generation, and that's honest. Make it work first, then make it fast.

Why Is This a Glimpse into the Future?

Imagine the possibilities:

Natural Language Theming: A user could type in the settings, "I want an interface in the style of a Fallout terminal" or "make everything look like a watercolor painting," and the entire UI would instantly transform.
Next-Level Adaptive Design: A "Delete" button could become visually more "alarming" based on the importance of the data being deleted. The interface could change its "mood" depending on the time of day.
Infinite Uniqueness: Your instance of the application will look different from anyone else's.

Yes, this approach has huge downsides. Accessibility is a nightmare. Performance needs serious work. And as you can see from the demo, getting Stable Diffusion v1.5 to render text clearly and consistently is a challenge in itself. I chose not to compromise by rendering text on top of a generated background; I forced the neural network to draw the button entirely, text included, to prove the purity of the concept. Using more modern models specifically trained for text rendering (like DeepFloyd-IF or a fine-tuned SDXL) would solve this.

But that doesn't matter. What matters is that it works. It's proof that we can create interfaces in a completely new way.

I believe the future lies in declarative, context-aware, and personalized interfaces. And AI is the key to unlocking that future.

The entire project is up on GitHub. Check it out, try it, break it. Let's brainstorm together where this crazy idea could lead.

What do you think? Is this a breakthrough technology or just a fun toy? Let me know in the comments

I Supercharged My Browser GPT with Rust and WebAssembly: The Journey to a Dual-Engine AI

Pavel — Tue, 26 Aug 2025 09:02:31 +0000

(This is Part 2 of my journey. If you haven't read it yet, start with Part 1 here!)

Last time, I performed a small miracle: I built, trained, and got a GPT model to generate text using only pure JavaScript. I went through the trenches of debugging gradients, battled NaN values, and ultimately ended up with a working model. It was proof that the fundamental concepts of AI are accessible right in the browser.

But a question lingered in the back of my mind: "What if...?"

What if I needed performance comparable to a native application?
What if I wanted the strict typing and safety that JavaScript can't offer?
What if I wanted to turn this educational project into a true high-performance experimental testbed?

The answer to all these questions was the same: Rust and WebAssembly.

Chapter 1: The Birth of the Second Engine — `RustyGradients`

Before bringing anything into WebAssembly, I needed the engine itself. Simply porting slmnet.js to Rust felt uninspired. Instead, I decided to build a complete, strictly-typed, and modular deep learning framework from scratch, heavily inspired by PyTorch.

And so, RustyGradients was born.

This is not just a port. It's a different beast entirely, built on the principles of Rust:

Safety and Reliability: No more undefined is not a function. The Rust compiler became my best friend, catching errors long before they could ever reach the browser.
Performance: Rust compiles down to blazingly fast machine code. All those endless loops in matrix multiplications and backpropagation now execute at near-native speed.
Robust Architecture: A Module trait (similar to torch.nn.Module) and a clear separation of operations (ops), optimizers (optim), and layers (nn) makes the code clean and extensible.

Building RustyGradients was an adventure in itself, but the goal was always clear: to forge a powerful engine ready to be installed in my "race car."

Chapter 2: The Bridge to the Future — Integration via WebAssembly

WebAssembly (WASM) is the magic that allows code written in languages like Rust, C++, or Go to run right in the browser. It's not a replacement for JavaScript, but its perfect companion: JS handles the UI, while WASM crunches the heavy numbers.

Using the wasm-pack tool, I compiled the entire RustyGradients framework and the GPT model written with it into a single, compact .wasm file.

For JavaScript, the complexity of the Rust code was hidden behind a simple and elegant API:

// JS code in index.html
import init, { WasmGptTrainer } from './pkg/rusty_gradients.js';

// ... initialization ...

// I can create an instance of the entire Rust model in one line!
let trainer = new WasmGptTrainer(config...);

// A single training step is just one function call,
// all the magic happens inside WASM
const loss = trainer.train_step(x_batch, y_batch); 

// Text generation is also a single call
const generated_ids = trainer.generate(prompt_ids, 300, 0.8, 10);

All the math, all the tensors, all the gradients now live and breathe inside the lightning-fast WASM module.

Chapter 3: The Ultimate Testbed — Choice and Persistence

Just replacing the engine wasn't interesting enough. I wanted to create a platform where I could compare these two worlds. And frankly, I was tired of losing my trained model every time I refreshed the page.

This led to the final version of slmnetGPT, which now includes:

1. The Engine Switch:
The UI now features a choice: run the model on pure slmnet.js or on RustyGradients (WASM). This allows for a real-time comparison of the training speed and performance of the two approaches.

2. Long-Term Memory with IndexedDB:
localStorage was fine for a start, but it's too small and slow. I integrated IndexedDB, a full-fledged NoSQL database in the browser. Now you can:

Save the state of a trained model after a long session.
Load it at any time, even after a browser restart.
Delete the saved weights to start fresh.

Crucially, the models for the JS and WASM engines are saved separately. You can train both and switch between their "brains."

3. Localization:
As a final touch, I added a simple language switcher (RU/EN) to make the testbed even more user-friendly.

Conclusion: Two Worlds, One Mission

This project grew from a simple "What if?" into a comprehensive "Yes, and here's how." It has become a living demonstration of a powerful synergy:

JavaScript remains the king of the user interface. Its flexibility and simplicity make building interactive applications a joy.
Rust + WebAssembly is the turbocharger for your web app. They take over the heavy lifting, providing a level of performance that was once only a dream in the browser.

Now, slmnetGPT is more than just a model. It's a laboratory in your browser.

Come in, experiment, and compare:

The JavaScript version: https://github.com/Xzdes/slmnetGPT
The Rust engine: https://github.com/Xzdes/RustyGradients

Run the training on both engines. Save the result. See how the Rust-trained model generates text. Feel the difference.

I Taught My JavaScript AI to Rewrite Its Own Code

Pavel — Sat, 23 Aug 2025 17:45:54 +0000

What if the AI systems we build could take an active role in their own evolution? This is the question that drove my latest experiment.

Many of you are familiar with my from-scratch neural network library, slmnet. After porting it from the browser to Node.js, I set myself a challenge: to stay within a pure JavaScript and Node.js environment. No Python wrappers, no C++ bindings. This constraint was intentional—it forced a deeper understanding and ensured that every part of the system was transparent and malleable.

This foundation allowed me to explore a fascinating, advanced AI concept: creating a system that could learn to optimize itself.

I architected a high-level supervisor, the Meta-Controller, designed to guide an AI agent (powered by slmnet itself) through a cycle of self-improvement. The process is a fully automated loop:

Benchmark: The cycle begins by measuring the current performance of the slmnet library. A standardized script runs a short training session and outputs two key metrics: final loss (accuracy) and execution time (speed). This gives us a hard, numerical baseline.
Analyze & Decide: The Meta-Controller then presents the AI agent with a choice. It shows it the current, stable code of a core file (e.g., Ops.js) and a pre-written "mutation"—an alternative version containing a potential optimization. The AI's task is not to generate complex code from nothing, but to act as an engineer in a code review. It must analyze the difference, form a hypothesis, and make a simple, critical decision: is this change worth testing?
Automated Testing: If the AI agrees, the Meta-Controller temporarily overwrites the live code with the proposed mutation. It then immediately re-runs the exact same benchmark to measure the impact of the change.
Evolve or Revert: This is the core of the feedback loop. The new performance metrics are compared to the baseline.
- If the change was beneficial—faster execution or a lower loss—it's deemed a successful evolution. The new code becomes the new standard.
- If the change was detrimental or made no difference, it’s a failure. The Meta-Controller instantly reverts the code back to its last known stable version.

The first time I ran this loop, the AI—being very new—produced a nonsensical answer. But the system itself performed flawlessly. It correctly identified the invalid response, logged the failure, and maintained the stability of the codebase without human intervention. The architecture for self-optimization was proven to work.

What this experiment demonstrates is that the core concepts of meta-learning and self-improving systems are not exclusively the domain of massive, closed-off models. We can build, test, and understand these feedback loops using accessible tools. By confining the project to pure JavaScript, I built a transparent testbed where the process of an AI reasoning about its own code is not a black box, but a clear, observable engineering cycle.

https://github.com/Xzdes/reasoningHeuristicAI

I Gave My LLM a Promotion: Now It Delegates Its Own Work

Pavel — Fri, 15 Aug 2025 20:41:21 +0000

Large Language Models are powerful, but they're also resource-intensive. Every query, no matter how simple, consumes expensive computational cycles. I realized a huge chunk of my server costs was from the LLM repeatedly answering "hello," "thanks," and "ok."

These queries are a waste. They don't teach the model anything new. They don't require complex reasoning. They are pure resource drain.

My first thought was to filter them out on the client-side. But that creates a manual chore—I'd have to constantly update the list of simple phrases. That approach doesn't scale.

So, I flipped the problem on its head. What if the LLM could solve this problem itself?

The core idea is this: I created a system where the LLM itself decides which queries are too simple for its attention and teaches a client-side helper to handle them in the future.

The Architecture: Self-Delegation

I built a project around this single concept, stripping away everything non-essential from a previous version. It has only two key parts:

The Server-Side "Teacher" (The LLM): The main, powerful model. Its job is to handle complex tasks and—crucially—to identify low-value, repetitive queries.
The Client-Side "Gatekeeper" (The Helper): A tiny, zero-dependency JavaScript agent in the browser. It intercepts all user input and asks the LLM for help only when it encounters something it hasn't been taught to handle.

How the LLM Offloads Its Work

The first time a user sends a simple query like "thx", the Gatekeeper doesn't recognize it and forwards it to the LLM.

The LLM knows "thx" is simple. Instead of just sending back a text answer, it sends back a special JSON payload containing a direct order:

{
  "userResponse": "You're welcome!",
  "learningInstruction": {
    "command": "LEARN_SIMPLE_PHRASE",
    "query": "thx",
    "response": "You're welcome!"
  }
}

This learningInstruction is the key. It’s the LLM telling its Gatekeeper:

"I've answered this for you once. Now learn it. From now on, you handle this query yourself. Do not send it to me again."

The Gatekeeper receives this command, saves the new rule to the browser's localStorage, and delivers the response to the user. The user sees nothing but a fast response.

But in the background, the system just became smarter and more efficient. The next time "thx" is sent, the Gatekeeper handles it instantly, and the server is never bothered.

The LLM is actively, automatically, and invisibly making its own job easier. It's not being trained by a human; it's training its own assistant to filter out the noise.

This project was an exercise in minimalism. I threw out a complex neural network library and other unnecessary features to focus solely on perfecting this self-delegation loop. The result is a lean, powerful system that demonstrates a smarter way to build AI applications.

We don't just need bigger models; we need smarter architectures where models can work together and optimize their own workflows.

Check out the full implementation on GitHub and see the self-delegation in action.

https://github.com/Xzdes/slmnet-Hybrid

I Built an Honest AI Agent to Fight Hallucinations. Here's How It Works.

Pavel — Thu, 14 Aug 2025 21:43:15 +0000

Hello, everyone!

We've all been amazed by Large Language Models. They can generate code, poetry, and coherent essays seemingly out of thin air. But I’ve always been bothered by their dark side: a tendency to "hallucinate." Ask a question about an obscure topic, and a model might confidently invent a non-existent term or "fact."

I decided to tackle this problem head-on and came to a simple conclusion: the process is more important than size, and honesty is more important than omniscience.

Instead of relying on a giant model that knows everything, I created WeblookAI—an autonomous agent that starts by knowing nothing. Its only power lies in a clever algorithm I designed, which allows it to find, analyze, and synthesize information, all while remaining intellectually honest.

This post is the story of how I built it. I believe this project will be especially interesting for learners, as it clearly demonstrates how you can achieve impressive results with minimal resources. And yes, I asked my local neural network what it thinks of the agent—it's a fan ;)

The Problem: A Passive "Know-it-all" vs. My Active "Researcher"

The standard approach to AI is to build a colossal library of knowledge and force a model to memorize it. My approach is inverted. I didn't want to build another librarian; I wanted to build a detective with nothing but a notebook, a magnifying glass, and endless curiosity.

I built WeblookAI on three core principles that make it unique:

1. Intellectual Honesty — My Guiding Principle

This isn't just a feature; it's the foundation. I designed the agent's main advantage to be its ability to minimize hallucinations. I achieved this through a strict, final-step prompt: "Answer the question based only on the provided facts. If a direct answer isn't available, say so."

The results are fantastic. When I asked it about "Wise JSON," it didn't invent a new framework. Instead, it honestly reported that it found separate data on "WISE" and "JSON." It just so happens that Wise JSON is another project of mine, but the internet doesn't know much about it yet—and my agent correctly reflected that reality.

2. My Homegrown Web Agent — The "Magnifying Glass and Notebook"

Instead of paying for search APIs, I built a unique data-gathering engine from scratch using Puppeteer. It doesn't just "search"; it conducts a genuine, almost human-like investigation I programmed it to perform:

Fault Tolerance: If one "fishing spot" (Brave Search) is empty, the agent doesn't give up. It automatically moves to the next one (DuckDuckGo, StartPage).
"Human-like" Emulation: I made the agent present itself to websites as a standard Chrome browser on Windows, allowing it to bypass basic anti-bot defenses.
"Smart" Filtering: I abandoned fragile CSS selectors that break with every site redesign. Instead, I programmed my agent to grab all links on a page and apply its own heuristic algorithm: it discards ads and junk using a domain blacklist, analyzes URLs for "usefulness," and assigns a "relevance score" to each link.

3. 100% Local and Autonomous — My Commitment to Independence

A Local Brain: The entire "thinking" process—from planning the search to synthesizing the final answer—is handled by a language model (like llama3:8b) that I run on my own machine via Ollama.
Privacy: Your queries and the data my agent finds never leave your computer.
Zero Cost: The project I built requires no API keys, no subscriptions, and is completely independent of the whims of cloud services.

How I Made It Work in Practice

I broke down the entire process into three crystal-clear steps:

DECOMPOSE: The local LLM receives the user's question and breaks it down into 3-4 simple search queries, creating a research plan.
INVESTIGATE: My WebAgent on Puppeteer heads into the field—methodically cycling through search engines, filtering links, visiting websites, and extracting clean, core text content.
SYNTHESIZE: The local LLM receives the original question and the collected "evidence" with one ironclad rule I gave it: do not invent.

Who I Built This Project For

First and foremost, for learners. For anyone who wants to understand that AI is not just about giant models but also about elegant, effective algorithms. I've open-sourced it on GitHub so you can download it, run it with a single command, and see firsthand how you can achieve impressive results by making a small model "think" the right way.

⚠️ Important Notice:
This project and its web scraping agent are intended for educational purposes only. Please be respectful of the websites the agent visits. You are solely responsible for your actions and any use of this tool.

You can find the entire project on my GitHub:
https://github.com/Xzdes/weblookai

Thanks for reading! I hope this story inspires you to start your own experiments. Feel free to ask questions in the comments.

I Built a "GPT" in My Browser in One Evening. The Journey from Amnesia to Stable Learning with Pure JS.

Pavel — Tue, 12 Aug 2025 15:45:55 +0000

Hello, community!

https://github.com/Xzdes/slmnetGPT

Sometimes, the best projects are born from a simple "What if...?" question. One evening, I was looking at some of my old code—a tiny neural network library in JS I had written for fun, called slmnet. A thought struck me: what if, instead of just solving toy problems, I could use it to build a real interactive organism that lives and learns right in the browser?

And so, Project "Living Brain" was born.

The Goal: Create a chatbot that:

Runs entirely on the client-side, with no servers.
Learns in real-time from user conversations.
Saves its knowledge in LocalStorage so it doesn't suffer from amnesia after a page refresh.

The Tools:

My self-made library, slmnet (Tensors, Dense/ReLU layers, an SGD optimizer).
Pure JavaScript.
The browser's LocalStorage as its hippocampus.

A Journey of Pain and Discovery: The Three Stages of Failure

I thought it would be easy. I was wrong. The bot went through three evolutionary stages, and each one was a classic problem from the world of AI.

Stage 1: The Echo Bot
The first version was terrible. It would simply memorize the last answer it was taught and repeat it for every single question. Boring.

Stage 2: The Bot with Catastrophic Forgetting
I solved problem #1, only to create a new one. The bot would perfectly learn a new lesson (e.g., "How are you?" -> "I'm great!"), but in the process, it would completely forget everything it knew before ("Hello" -> "Greetings"). This is a classic AI problem where new knowledge completely overwrites the old. I was literally forcing it to cram for one test question, wiping its entire memory clean.

Stage 3: The Bot with an Identity Crisis
I taught it to stop forgetting old lessons. But as soon as I added a new word to its vocabulary, its brain (the network architecture) had to be rebuilt. My code would just create a new, empty brain. And although it retrained on all the old examples, the random initialization of its weights meant its "personality" completely changed. It would start answering "Hello" with "See you later." There was no stability.

The Final Insight: Thinking Like Real AI

The solution came when I stopped thinking like a programmer and started thinking like... a trainer.

Experience Replay: Instead of cramming one lesson, I created a "memory bank" to store all past conversations. Now, during training, the bot runs through its entire history, gently adjusting its weights and reinforcing old knowledge along with the new.
Transfer Learning: When new words appear, I stopped "demolishing the house." Instead, I implemented a "brain transplant": I create a new, larger model and carefully copy all the weights from the old one into it. This way, its personality is preserved, and there's new space for new knowledge.

Let's Be Honest: This Isn't ChatGPT

My project is a "little" language model, not a large one.

It doesn't generate text; it performs classification by selecting the most appropriate response from those it already knows.
It uses a simple "Bag-of-Words" model, not complex transformers.
Its "understanding" is a statistical correlation, not semantic awareness.

But you know what? That doesn't matter. This one-evening project allowed me to experience the same journey AI researchers go through: from the simplest mistakes to implementing fundamental concepts. And it all happens in your browser window.

It was an incredibly fascinating ride. Sometimes, old code is the best playground for new ideas.

DEV Community: Pavel

I Was Asked to Add a Simple Classifier to a Website. Then I Saw the 250 MB Download.

The problem

So I started digging

Introducing wasmicro

What it does today

Why not just use a big runtime?

The current numbers

Is it faster?

What is missing

The real lesson

Final thought

I built a deep learning framework in Rust from scratch — Part 3: the road to crates.io

Phase 1 — 121 warnings to zero

Phase 2 — the declarative layer API

Phase 3 — GPU completeness

Phase 5 — closing the RoPE wound, and ecosystem

Phase 6 — the polish that makes or breaks a release

The last boss: CI on three platforms

Going international: two READMEs

The numbers

Publishing

What's next (v0.5 — performance)

Reflection

Synapse v1.0 — A Programming Language for AI, Not Humans

Why another language?

Key Advantages

What’s Under the Hood?

Who is this for?

What’s Next (v1.1)

Try It Now

Synapse 0.2: REPL, S-Expressions & Working Loops

What's New in 0.2.0

1. S-Expression Syntax (Lisp-like)

2. Interactive REPL 🖥️

3. Working Loops & Blocks

Benchmarks

Updated Roadmap

Try It Now

Get Involved

I Built a Deep Learning Framework in Rust from Scratch - Part 2: GPU Backend and the Lonely Debugging Journey

The Struggle Was Real

What Actually Got Done

GPU Backend with wgpu

Proper Error Handling

149 Tests Pass

RustyASG vs Competitors: Honest Assessment

Compared to PyTorch/TensorFlow

Compared to tch-rs (Rust bindings to LibTorch)

Compared to Burn

Compared to Candle (by Hugging Face)

Honest Pros and Cons

What RustyASG Can Actually Do

What RustyASG Cannot Do (Yet or Ever)

When to Consider RustyASG

Code Quality

Conclusion

I Built a Deep Learning Framework in Rust from Scratch. Here’s How It Works.

The Core Idea: Define-then-Run

Architecture Overview

Getting Started: Training a Transformer Block

The Hardest Bug: A Flaw in the Gradient Logic

What's Next

I've Seen the Future of UI Development. It's Insane, Written in Rust, and Rendered by an AI.

The Concept: A Skeleton of Rust, A Skin of AI

How It Works Under the Hood

Why Is This a Glimpse into the Future?

I Supercharged My Browser GPT with Rust and WebAssembly: The Journey to a Dual-Engine AI

Chapter 1: The Birth of the Second Engine — RustyGradients

Chapter 2: The Bridge to the Future — Integration via WebAssembly

Chapter 3: The Ultimate Testbed — Choice and Persistence

Conclusion: Two Worlds, One Mission

I Taught My JavaScript AI to Rewrite Its Own Code

I Gave My LLM a Promotion: Now It Delegates Its Own Work

The Architecture: Self-Delegation

How the LLM Offloads Its Work

I Built an Honest AI Agent to Fight Hallucinations. Here's How It Works.

The Problem: A Passive "Know-it-all" vs. My Active "Researcher"

How I Made It Work in Practice

Who I Built This Project For

Chapter 1: The Birth of the Second Engine — `RustyGradients`