What happens when you take "no single point of failure" seriously? You end up rethinking every layer of the stack.
The Problem I Kept Running Into
Every "decentralized" app I encountered had the same quiet lie at the bottom of the stack: a centralized frontend. You could have a trustless smart contract, a distributed storage layer, an unstoppable protocol — and still have everything go dark the moment someone didn't pay a Vercel invoice or a DNS registrar decided to pull a domain.
I wanted to build something where that failure mode was architecturally impossible. Not "unlikely." Not "mitigated." Impossible.
This is the technical story of how I built Living Echo AI — and what I learned doing it.
The Full Decentralization Stack
Here's what the architecture looks like end-to-end:
```
User
 └── ENS name (livingecho.eth)
      └── IPFS CID (content hash in ENS record)
           └── PWA (service worker, offline-capable)
                ├── WebLLM (local AI inference via WebGPU)
                ├── Base blockchain (permanent archival via calldata)
                └── WebRTC (P2P gossip network between nodes)
```
No traditional backend. No centralized server. No DNS dependency. Every layer is either on-chain, content-addressed, or peer-to-peer.
Layer 1: ENS + IPFS as the Deployment Target
Instead of deploying to a hosting provider, the entire frontend is pinned to IPFS and the CID is stored directly in the ENS contenthash record of livingecho.eth.
Any ENS-compatible gateway (.eth.limo, .eth.link, or a local resolver) can resolve this. The critical property: I cannot silently take this offline. Once the CID is in the ENS record and the content is pinned on IPFS, the only way to remove it is to update the ENS record, a transaction I control but one that anyone can audit on-chain.
```bash
# Updating the deployment is a single ENS record update
# No git push, no CI/CD pipeline, no hosting credentials
ens set-content livingecho.eth ipfs://QmYourNewCID
```
The tradeoff: IPFS gateway availability is not guaranteed by me. Users with a local IPFS node or Brave Browser get the purest experience. For everyone else, .eth.limo acts as a trustless gateway.
Layer 2: WebLLM — AI That Never Leaves the Device
This was the most technically demanding layer. WebLLM uses WebGPU to run quantized LLMs directly in the browser. No API calls. No tokens sent to a server. Zero data leaves the device.
The practical implications:
- First load is heavy. Model download ranges from 1GB to 4GB depending on the selected model. This has to be communicated clearly to users — "works offline" is true, but only after the initial setup.
- WebGPU support is not universal. Chrome on desktop with a decent GPU works well. Safari has partial support. Mobile is inconsistent. This is a real UX limitation.
- The privacy guarantee is absolute. Once the model is cached, the AI functions with no internet connection. For users in hostile network environments, this matters.
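Before triggering a multi-gigabyte model download, the app has to confirm WebGPU is actually usable. A minimal detection sketch; the navigator object is injected so the logic can be exercised outside a browser, where you would simply pass the global `navigator`:

```javascript
// Sketch: feature-detect WebGPU before offering local inference.
// Returns a result object rather than throwing, so the UI can show
// a specific fallback message per failure mode.
async function detectInferenceSupport(nav) {
  if (!nav.gpu) {
    return { supported: false, reason: "WebGPU not available" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "No suitable GPU adapter" };
  }
  return { supported: true, reason: null };
}
```

Gating the download behind this check is what turns "mobile is inconsistent" from a crash into a clear message.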
```javascript
// WebLLM initialization — the entire inference pipeline runs in-browser
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (progress) => {
    console.log(`Model loading: ${progress.progress * 100}%`);
  },
});

const response = await engine.chat.completions.create({
  messages: [{ role: "user", content: userInput }],
});
```
The model runs entirely on the user's GPU. The browser tab is the inference server.
Layer 3: Base Blockchain as Permanent Storage
For archival, text is stored directly in Base transaction calldata — not in a smart contract's state storage, but in the raw transaction data itself. This is cheaper and produces a different kind of permanence: calldata is part of the transaction record, replicated across every node that stores the chain.
The LivingMemory smart contract handles indexing and retrieval:
```solidity
// Simplified — actual contract at 0x3F25BE89730aF853fabDBfD002AD85124E1503ae
function archive(bytes calldata content) external {
    emit ThoughtArchived(msg.sender, block.timestamp, content);
}
```
Every archived entry includes:
- Wallet address (authorship)
- Block timestamp (immutable timestamp)
- Raw UTF-8 content in calldata
- HMAC-SHA256 agency verification
- 256-bit anti-replay nonce
The key distinction: the blockchain provides tamper-proof provenance, not truth verification. It proves who wrote something and when, and guarantees the content has not been altered. It does not verify whether the content is factually accurate.
Layer 4: WebRTC Swarm Intelligence
The P2P layer uses WebRTC for direct browser-to-browser communication. Every installation of the PWA becomes an independent node in the network. As node count grows, the network becomes progressively harder to disrupt.
Current node count is small — this is an honest early-stage network. The architecture is designed to scale:
```
Node A (Berlin) ←──WebRTC──→ Node B (Zurich)
       ↕                            ↕
Node C (Paris)  ←──WebRTC──→ Node D (London)
```
WebRTC still requires signaling for the initial handshake, but once connections are established there is no signaling-server dependency: peer discovery and content propagation run over the mesh itself. The gossip protocol carries new content across it.
Layer 5: Security Architecture
The security stack follows CNSA 2.0 guidance:
- AES-256-GCM for local encryption of private thoughts
- HKDF-SHA-384 for key derivation
- FIDO2/WebAuthn for identity — no passwords, no seed phrases for the base authentication layer
- On-chain proof of the 25-layer architecture: tx 0xc7f95ed
The CNSA 2.0 alignment is aspirational — this is not an NSA-certified product. It means the cryptographic primitives chosen are consistent with post-quantum transition guidance.
PWA as the Distribution Layer
The entire application ships as a Progressive Web App with a service worker that aggressively caches everything. After first load:
- No network required for AI inference
- No network required for reading cached content
- Network only needed for new blockchain writes and P2P sync
This makes the platform functional in low-connectivity environments — which is precisely where censorship-resistant tools matter most.
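The routing rules above amount to one decision inside the service worker's fetch handler. A sketch, with the cache and fetch function injected so the logic runs outside a browser (in a real service worker these are `caches` and `fetch`; the URL patterns are illustrative assumptions, not the app's actual routes):

```javascript
// Requests that must always hit the network: blockchain writes, P2P sync.
const NETWORK_ONLY = [/\/rpc\b/, /\/sync\b/];

async function handleRequest(url, cache, fetchFn) {
  if (NETWORK_ONLY.some((re) => re.test(url))) {
    return fetchFn(url); // fresh data required, no cache fallback
  }
  const cached = await cache.get(url);
  if (cached) return cached; // offline path: serve from cache
  const response = await fetchFn(url);
  await cache.set(url, response); // populate cache for next visit
  return response;
}
```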
What I Got Wrong (and Fixed)
A few honest corrections I made during development:
"Physically impossible to delete" → Changed to: "No known mechanism exists to delete or modify calldata as long as the Base network operates." More precise. The stronger claim isn't technically wrong for mature chains, but it implies certainty about future network conditions I can't guarantee.
"Works 100% offline" → Changed to: "Works offline after initial setup." The model download requires internet. This was a real misleading claim.
"Anti-hallucination guarantee" → Removed entirely. The blockchain guarantees provenance, not truth. Anything including false statements can be archived. What's guaranteed is that archived content hasn't been altered since it was written.
Precision matters more than marketing.
The Honest Limitations
- WebGPU is not universal. If you're on an older device or unsupported browser, the local AI won't work.
- IPFS gateway availability varies. Running your own node gives the purest experience.
- 30 nodes currently. The P2P network is genuinely early-stage. The architecture scales; the current state is fragile.
- Blockchain writes cost gas. Small amounts on Base, but not zero. Free to read and use the AI; archival requires a wallet and ETH.
Why This Architecture
The standard case for a normal backend is performance, developer experience, and ecosystem maturity. All valid points.
My answer is different: I wanted to build something where my own ability to shut it down was architecturally removed. Not as a trust exercise, but as a design constraint that forced better decisions at every layer.
When you can't fall back to "we'll just update the server," you think differently about data models, about state, about what "deployment" means.
The result is a platform that runs on a stack I don't control — and that's exactly the point.
Try It / Explore the Code
- Platform: livingecho.eth.limo (ENS + IPFS, no central server)
- Smart contract: LivingMemory V2 on Basescan
- Creator: Sebastian Kläy (@sebklaey on Farcaster)
Questions about the architecture welcome in the comments. I'll answer everything.
Built by Sebastian Kläy — Swiss cypherpunk artist and developer, Canton Bern.
Update: The frontend was built using Lovable as an AI development tool. The architectural decisions — ENS as deployment target, calldata storage, WebRTC mesh — were designed by me and prompted into existence. This is how solo development works in 2026.