<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arshdeep Singh</title>
    <description>The latest articles on DEV Community by Arshdeep Singh (@arshkharbanda2010).</description>
    <link>https://dev.to/arshkharbanda2010</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F419697%2Faa40df9d-1d41-4d3c-8193-b728c9e58e8e.jpeg</url>
      <title>DEV Community: Arshdeep Singh</title>
      <link>https://dev.to/arshkharbanda2010</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arshkharbanda2010"/>
    <language>en</language>
    <item>
      <title>SentrySearch: Semantic Search Over Dashcam Footage Using Gemini Embedding 2</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Thu, 26 Mar 2026 09:59:49 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/sentrysearch-semantic-search-over-dashcam-footage-using-gemini-embedding-2-2ni8</link>
      <guid>https://dev.to/arshkharbanda2010/sentrysearch-semantic-search-over-dashcam-footage-using-gemini-embedding-2-2ni8</guid>
      <description>&lt;h1&gt;
  
  
  SentrySearch: Semantic Search Over Dashcam Footage Using Gemini Embedding 2
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Scrubbing through hours of dashcam footage to find one specific moment is exactly as tedious as it sounds. You remember &lt;em&gt;something&lt;/em&gt; happened — a car cut you off, someone ran a red light — but now you're stuck fast-forwarding through gigabytes of MP4 files like it's 2003.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SentrySearch&lt;/strong&gt; solves this. It's an open-source Python CLI that lets you search raw video files in plain English. Type what you're looking for, get a trimmed clip back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sentrysearch search &lt;span class="s2"&gt;"red truck running a stop sign"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is SentrySearch?
&lt;/h2&gt;

&lt;p&gt;SentrySearch is a command-line tool built by &lt;a href="https://github.com/ssrajadh/sentrysearch" rel="noopener noreferrer"&gt;@ssrajadh&lt;/a&gt; that brings semantic search to any folder of MP4 files. It was originally built for Tesla Sentry Mode footage (hence the name), but it works with any dashcam or video library.&lt;/p&gt;

&lt;p&gt;The core idea:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index&lt;/strong&gt; your footage once&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search&lt;/strong&gt; it with natural language queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get back&lt;/strong&gt; an auto-trimmed clip of the matching moment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No transcriptions. No frame captioning. No OCR. Just raw video → vectors → search.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works: The Technical Core
&lt;/h2&gt;

&lt;p&gt;The secret sauce is &lt;strong&gt;Google's Gemini Embedding 2&lt;/strong&gt; — the first natively multimodal embedding model that maps text, images, audio, and video into a single unified vector space.&lt;/p&gt;

&lt;p&gt;Here's what that means in practice:&lt;/p&gt;

&lt;p&gt;When you search for &lt;em&gt;"car cutting me off at an intersection"&lt;/em&gt;, Gemini converts that text into a 768-dimensional vector. It can also convert a 30-second video clip into a vector in that &lt;strong&gt;same space&lt;/strong&gt;. So text and video become directly comparable — no intermediate step required.&lt;/p&gt;
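&lt;p&gt;"Directly comparable" just means vector math. A minimal sketch with toy 3-dimensional stand-ins (real Gemini vectors are 768-dimensional, and the &lt;code&gt;cosine_similarity&lt;/code&gt; helper here is illustrative, not part of any API):&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors over the product of their norms
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embeddings that live in one shared space
query_vec = [0.9, 0.1, 0.3]        # pretend: text query embedding
clip_vecs = {
    "clip_a": [0.88, 0.12, 0.31],  # pretend: a matching video chunk
    "clip_b": [0.05, 0.95, 0.10],  # pretend: an unrelated video chunk
}

best = max(clip_vecs, key=lambda k: cosine_similarity(query_vec, clip_vecs[k]))
print(best)  # clip_a (its vector points the same way as the query's)
```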

&lt;h3&gt;
  
  
  The Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MP4 Files → ffmpeg chunking → Gemini video embeddings → ChromaDB (local vector store)
                                                              ↓
                                              Text query → Gemini text embedding → cosine similarity → top match → trimmed clip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step by step:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chunking&lt;/strong&gt; — ffmpeg splits each MP4 into overlapping 30-second chunks (configurable). Overlap ensures events that span chunk boundaries aren't missed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Still-frame skipping&lt;/strong&gt; — chunks with no meaningful visual change (parked car, nothing happening) are skipped automatically. This saves API calls and reduces cost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Embedding&lt;/strong&gt; — each chunk is uploaded to the Gemini Embedding API, which processes exactly 1 frame per second and returns a dense vector.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage&lt;/strong&gt; — vectors are stored in a local ChromaDB database alongside metadata (source file, timestamp offset).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Search&lt;/strong&gt; — your query is embedded as text into the same vector space, matched via cosine similarity, and the top result is trimmed from the original file.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
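&lt;p&gt;The window arithmetic behind step 1 is simple to sketch. This is not SentrySearch's actual implementation, just the logic implied by overlapping 30-second chunks:&lt;/p&gt;

```python
def chunk_windows(duration_s, chunk_s=30, overlap_s=5):
    """Return (start, end) second offsets covering duration_s, where each
    window is chunk_s long and consecutive windows share overlap_s seconds."""
    step = chunk_s - overlap_s
    windows, start = [], 0
    while start < duration_s:
        windows.append((start, min(start + chunk_s, duration_s)))
        if start + chunk_s >= duration_s:
            break  # this window already reaches the end of the file
        start += step
    return windows

print(chunk_windows(70))  # [(0, 30), (25, 55), (50, 70)]
```

&lt;p&gt;An event at second 28 lands inside both the first and second window, which is exactly why overlap keeps boundary events searchable.&lt;/p&gt;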




&lt;h2&gt;
  
  
  Gemini Embedding 2: The Breakthrough That Makes This Possible
&lt;/h2&gt;

&lt;p&gt;Before Gemini Embedding 2, building something like SentrySearch would require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running a vision model on each frame to generate captions&lt;/li&gt;
&lt;li&gt;Embedding those captions as text&lt;/li&gt;
&lt;li&gt;Hoping the captions captured what you actually care about&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's slow, lossy, and expensive.&lt;/p&gt;

&lt;p&gt;Gemini Embedding 2 eliminates the middleman. It's Google's first model where video, text, images, audio, and PDFs all project into a &lt;strong&gt;single joint embedding space&lt;/strong&gt;. A text query is directly comparable to a video clip at the vector level.&lt;/p&gt;

&lt;p&gt;This is what makes sub-second semantic search over hours of footage practical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key specs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;768-dimensional&lt;/strong&gt; vectors (search mode)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native video support&lt;/strong&gt; — 1 frame/second processed regardless of source FPS&lt;/li&gt;
&lt;li&gt;Available via Gemini API and Vertex AI&lt;/li&gt;
&lt;li&gt;Works with LangChain, LlamaIndex, Haystack, ChromaDB, Qdrant, Weaviate&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10+&lt;/li&gt;
&lt;li&gt;ffmpeg (or let it use bundled &lt;code&gt;imageio-ffmpeg&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Gemini API key (free tier available at &lt;a href="https://aistudio.google.com/apikey" rel="noopener noreferrer"&gt;aistudio.google.com&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ssrajadh/sentrysearch.git
&lt;span class="nb"&gt;cd &lt;/span&gt;sentrysearch
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sentrysearch init
&lt;span class="c"&gt;# Prompts for your Gemini API key, writes to .env, validates with a test embedding&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Index Your Footage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sentrysearch index /path/to/dashcam/footage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 1/4]
Indexing file 1/3: front_2024-01-15_14-30.mp4 [chunk 2/4]
&lt;/span&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="go"&gt;Indexed 12 new chunks from 3 files. Total: 12 chunks from 3 files.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sentrysearch search &lt;span class="s2"&gt;"red truck running a stop sign"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#1 [0.87] front_2024-01-15_14-30.mp4 @ 02:15-02:45
#2 [0.74] left_2024-01-15_14-30.mp4 @ 02:10-02:40
#3 [0.61] front_2024-01-20_09-15.mp4 @ 00:30-01:00

Saved clip: ./match_front_2024-01-15_14-30_02m15s-02m45s.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Tesla Dashcam Overlay (Bonus Feature)
&lt;/h2&gt;

&lt;p&gt;For Tesla owners, SentrySearch can burn speed, GPS location, and timestamp directly onto your trimmed clips:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sentrysearch search &lt;span class="s2"&gt;"car cutting me off"&lt;/span&gt; &lt;span class="nt"&gt;--overlay&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reads SEI metadata embedded in Tesla dashcam files and renders a HUD showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speed (MPH)&lt;/li&gt;
&lt;li&gt;Date and time&lt;/li&gt;
&lt;li&gt;City and road name (via OpenStreetMap reverse geocoding)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Requires Tesla firmware 2025.44.25+ and HW3+.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[tesla]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Pricing: Is It Practical?
&lt;/h2&gt;

&lt;p&gt;Indexing 1 hour of footage costs approximately &lt;strong&gt;$2.84&lt;/strong&gt; with Gemini's embedding API at default settings (30s chunks, 5s overlap).&lt;/p&gt;

&lt;p&gt;Breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 hour = 3,600 seconds → 3,600 frames at $0.00079/frame = ~$2.84 (with the default 5s overlap, some seconds are embedded twice, so the real figure runs slightly higher)&lt;/li&gt;
&lt;li&gt;Still-frame skipping can cut this significantly for parked/security footage&lt;/li&gt;
&lt;li&gt;Search queries cost almost nothing (text embeddings only)&lt;/li&gt;
&lt;/ul&gt;
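&lt;p&gt;The arithmetic is easy to reproduce. A back-of-envelope estimator, assuming the quoted $0.00079/frame rate and the 1 frame/second processing described earlier (the function is ours, not part of the tool):&lt;/p&gt;

```python
def indexing_cost(duration_s, chunk_s=30, overlap_s=5, price_per_frame=0.00079):
    """Estimate embedding cost: each chunk is billed at 1 frame per second."""
    step = chunk_s - overlap_s
    frames, start = 0, 0
    while start < duration_s:
        frames += min(start + chunk_s, duration_s) - start
        if start + chunk_s >= duration_s:
            break
        start += step
    return frames * price_per_frame

print(round(3600 * 0.00079, 2))       # 2.84 (the article's no-overlap figure)
print(round(indexing_cost(3600), 2))  # 3.41 (overlap re-embeds some seconds)
```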

&lt;p&gt;&lt;strong&gt;Cost optimization levers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--chunk-duration 60&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fewer chunks = fewer API calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--overlap 0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No overlap = minimum chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Still-frame skipping (default ON)&lt;/td&gt;
&lt;td&gt;Skips idle footage = direct savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--no-preprocess&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Raw chunks (no ffmpeg downscaling)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Limitations &amp;amp; Honest Caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chunk boundary problem&lt;/strong&gt; — events that span two chunks may not match perfectly. Overlapping windows help, but aren't perfect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Embedding 2 is in preview&lt;/strong&gt; — API behavior and pricing may change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No local model option&lt;/strong&gt; — currently requires Gemini API. The community is watching for open-source multimodal embedding models to reach this quality level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Driving footage only for Tesla overlay&lt;/strong&gt; — SEI telemetry isn't present in parked/Sentry Mode clips.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Bigger Picture: Multimodal RAG Is Here
&lt;/h2&gt;

&lt;p&gt;SentrySearch is a clean, practical example of what becomes possible when embedding models go truly multimodal.&lt;/p&gt;

&lt;p&gt;The same architecture can apply to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security camera footage&lt;/strong&gt; — search hours of CCTV with natural language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sports video&lt;/strong&gt; — find specific plays or moments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meeting recordings&lt;/strong&gt; — semantic search without transcription&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medical imaging&lt;/strong&gt; — cross-modal retrieval across reports and scans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're entering an era where the traditional RAG pipeline (chunk text → embed → retrieve) expands to cover every modality. Gemini Embedding 2 is the first production model that makes this real with video.&lt;/p&gt;

&lt;p&gt;SentrySearch is a sharp, well-executed proof of concept. And it ships today.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ssrajadh/sentrysearch" rel="noopener noreferrer"&gt;github.com/ssrajadh/sentrysearch&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Embedding 2 docs:&lt;/strong&gt; &lt;a href="https://ai.google.dev/gemini-api/docs/models/gemini-embedding-2-preview" rel="noopener noreferrer"&gt;ai.google.dev/gemini-api/docs/models/gemini-embedding-2-preview&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HN discussion:&lt;/strong&gt; &lt;a href="https://news.ycombinator.com/item?id=47427193" rel="noopener noreferrer"&gt;news.ycombinator.com/item?id=47427193&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini API key:&lt;/strong&gt; &lt;a href="https://aistudio.google.com/apikey" rel="noopener noreferrer"&gt;aistudio.google.com/apikey&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;





</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Real-Time Block Computing: Track Physical Objects &amp; Bounce Digital Elements Off Them</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Tue, 24 Mar 2026 18:43:07 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/real-time-block-computing-track-physical-objects-bounce-digital-elements-off-them-10ld</link>
      <guid>https://dev.to/arshkharbanda2010/real-time-block-computing-track-physical-objects-bounce-digital-elements-off-them-10ld</guid>
      <description>&lt;h1&gt;
  
  
  Real-Time Block Computing: Track Physical Objects &amp;amp; Bounce Digital Elements Off Them
&lt;/h1&gt;

&lt;p&gt;How @bongyunng's viral OpenCV demo works — real-time physical object tracking with digital physics simulation. Full code walkthrough + the spatial computing context.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Inspired by &lt;a href="https://www.instagram.com/reel/DWRBFoTkdIY/" rel="noopener noreferrer"&gt;@bongyunng's viral Instagram demo&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;What if your screen wasn't a window into a digital world — but a surface where digital and physical coexist, interact, and respond to each other in real time?&lt;/p&gt;

&lt;p&gt;That's exactly what developer &lt;strong&gt;@bongyunng&lt;/strong&gt; demonstrated in a recent viral reel: a real-time "Block Computing" program built from scratch that &lt;strong&gt;tracks physical objects through a camera feed and bounces digital elements off them&lt;/strong&gt; — live, frame by frame.&lt;/p&gt;

&lt;p&gt;No AR headset. No Unity engine. Just OpenCV, Python, and a deep understanding of how digital and physical can meet.&lt;/p&gt;

&lt;p&gt;This post breaks down every concept behind that demo: how real-time object tracking works, how physics simulation is layered on top, and why this sits at the cutting edge of spatial computing in 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is "Block Computing" in This Context?
&lt;/h2&gt;

&lt;p&gt;The term &lt;strong&gt;block computing&lt;/strong&gt; here refers to treating physical objects as &lt;strong&gt;computational blocks&lt;/strong&gt; — discrete, trackable units that the system processes frame-by-frame. Each physical object becomes a block of data: its position, velocity, bounding box, and surface normal.&lt;/p&gt;

&lt;p&gt;The program computes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Where&lt;/strong&gt; the object is (detection + tracking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How it's oriented&lt;/strong&gt; (surface normal estimation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What digital element should collide&lt;/strong&gt; with it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How that element should react&lt;/strong&gt; (physics response — bounce, deflect, slide)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is fundamentally different from traditional AR, which &lt;em&gt;overlays&lt;/em&gt; digital elements. Here, the digital elements have &lt;strong&gt;physics awareness&lt;/strong&gt; — they respond to physical geometry.&lt;/p&gt;
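&lt;p&gt;That "block of data" is concrete enough to sketch. A hypothetical per-object state container (our illustration, not @bongyunng's actual code):&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Block:
    """A physical object as a 'computational block': the per-frame state the
    tracker maintains (positions in pixels, velocity in pixels/frame)."""
    x: float
    y: float
    w: float
    h: float
    vx: float = 0.0
    vy: float = 0.0
    normal: tuple = (0.0, -1.0)  # assumed top-surface normal

    def update(self, new_x, new_y):
        # Velocity from frame-to-frame displacement, then store the new position
        self.vx, self.vy = new_x - self.x, new_y - self.y
        self.x, self.y = new_x, new_y

    def contains(self, px, py):
        # Axis-aligned bounding-box test used for collision checks
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

b = Block(x=100, y=200, w=80, h=40)
b.update(110, 200)
print(b.vx, b.vy)            # 10 0
print(b.contains(150, 220))  # True
```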




&lt;h2&gt;
  
  
  Core Technology: OpenCV for Real-Time Object Tracking
&lt;/h2&gt;

&lt;p&gt;OpenCV (Open Source Computer Vision Library) is the backbone. Here's what the pipeline looks like:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Object Detection
&lt;/h3&gt;

&lt;p&gt;Using background subtraction or YOLOv8, the program identifies physical objects in each frame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;

&lt;span class="n"&gt;bg_subtractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createBackgroundSubtractorMOG2&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VideoCapture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;fg_mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bg_subtractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;contours&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findContours&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fg_mask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RETR_EXTERNAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CHAIN_APPROX_SIMPLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cnt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;contours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contourArea&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boundingRect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) &amp;amp; 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Object Tracking
&lt;/h3&gt;

&lt;p&gt;Once detected, OpenCV's CSRT tracker (shipped in the &lt;code&gt;opencv-contrib-python&lt;/code&gt; build) maintains identity across frames without re-running expensive detection on every frame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tracker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TrackerCSRT_create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bounding_box&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# In loop:
&lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tracker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CSRT&lt;/strong&gt; trades speed for accuracy; &lt;strong&gt;KCF&lt;/strong&gt; is faster but less precise. For tracking multiple objects at once, use &lt;strong&gt;DeepSORT&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Real-Time Performance
&lt;/h3&gt;

&lt;p&gt;Key optimizations to hit 30+ FPS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Downscale input for detection, upscale for render&lt;/li&gt;
&lt;li&gt;Skip detection every N frames (track-only between detections)&lt;/li&gt;
&lt;li&gt;GPU acceleration via CUDA-enabled OpenCV builds&lt;/li&gt;
&lt;/ul&gt;
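&lt;p&gt;The second optimization is the highest-leverage one. A sketch of the scheduling logic, with stubs standing in for the real detector and tracker:&lt;/p&gt;

```python
DETECT_EVERY = 10  # re-run the expensive detector once per 10 frames

def process_stream(frames, detect, track):
    """Run full detection every DETECT_EVERY frames; cheap tracking otherwise."""
    boxes, log = None, []
    for i, frame in enumerate(frames):
        if i % DETECT_EVERY == 0 or boxes is None:
            boxes = detect(frame)        # expensive: e.g. YOLO / bg subtraction
            log.append("detect")
        else:
            boxes = track(frame, boxes)  # cheap: e.g. CSRT/KCF update
            log.append("track")
    return log

# Stubs standing in for the real detector and tracker
log = process_stream(range(25), detect=lambda f: [(0, 0, 10, 10)],
                     track=lambda f, b: b)
print(log.count("detect"), log.count("track"))  # 3 22
```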




&lt;h2&gt;
  
  
  The Physics Layer: Making Digital Elements Bounce
&lt;/h2&gt;

&lt;p&gt;Tracking is step one. Making digital elements &lt;em&gt;react&lt;/em&gt; to physical objects is where it gets interesting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Collision Detection + Response
&lt;/h3&gt;

&lt;p&gt;Each frame:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Update digital element: &lt;code&gt;pos += velocity * dt&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Check collision with physical object bounds&lt;/li&gt;
&lt;li&gt;Compute reflection vector&lt;/li&gt;
&lt;li&gt;Apply: &lt;code&gt;velocity = velocity - 2 * dot(velocity, normal) * normal&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reflect_velocity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surface_normal&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;normal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;surface_normal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;normal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normal&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;velocity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;velocity&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;normal&lt;/span&gt;

&lt;span class="n"&gt;ball_velocity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;surface_normal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# upward surface
&lt;/span&gt;&lt;span class="n"&gt;new_velocity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reflect_velocity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_velocity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;surface_normal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
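&lt;p&gt;Putting the four steps together, one frame of the physics update might look like this (the axis-aligned box and fixed top-face normal are simplifying assumptions):&lt;/p&gt;

```python
import numpy as np

def step_ball(pos, vel, box, dt=1.0):
    """One frame of the digital element's physics against a physical box.
    box = (x, y, w, h) from the tracker; returns updated (pos, vel)."""
    pos = pos + vel * dt                               # 1. integrate motion
    x, y, w, h = box
    inside = x <= pos[0] <= x + w and y <= pos[1] <= y + h
    if inside:                                         # 2. collision check
        normal = np.array([0.0, -1.0])                 # 3. assume top-face hit
        vel = vel - 2 * np.dot(vel, normal) * normal   # 4. reflect
        pos = np.array([pos[0], y])                    # push back out of the box
    return pos, vel

pos, vel = np.array([50.0, 95.0]), np.array([0.0, 10.0])
pos, vel = step_ball(pos, vel, box=(0, 100, 200, 50))
print(vel)  # vertical component flipped to -10 by the bounce
```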



&lt;h3&gt;
  
  
  Rendering Digital Elements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Digital ball with glow effect
&lt;/span&gt;&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;circle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_y&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;overlay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;circle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overlay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_y&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blurred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GaussianBlur&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overlay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addWeighted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blurred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why This Matters: Spatial Computing in 2026
&lt;/h2&gt;

&lt;p&gt;This demo is a hands-on proof of concept for &lt;strong&gt;the convergence of physical and digital worlds&lt;/strong&gt; — one of the defining tech trends of 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bigger Trend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Physical AI&lt;/strong&gt;: AI systems that understand and operate in 3D physical environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AR/MR headsets&lt;/strong&gt;: Apple Vision Pro, Meta Quest making spatial interaction mainstream&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time physics&lt;/strong&gt;: Digital objects that cast accurate shadows, occlude behind physical surfaces, respond to real materials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What's remarkable: this achieves the essence of spatial computing with &lt;strong&gt;just a webcam and Python&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real Applications
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Education&lt;/td&gt;
&lt;td&gt;Physics simulations with physical desk props&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gaming&lt;/td&gt;
&lt;td&gt;No-controller games using body + real objects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Design&lt;/td&gt;
&lt;td&gt;Visualize digital components on physical prototypes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Robotics&lt;/td&gt;
&lt;td&gt;Navigation pipelines using the same tracking stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Industrial AR&lt;/td&gt;
&lt;td&gt;Overlay instructions onto physical machinery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Full Minimal Demo: Build It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;opencv-python numpy
&lt;span class="c"&gt;# Optional for better detection:&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;ultralytics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;ball_pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;320.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;100.0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;ball_vel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;ball_radius&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;

&lt;span class="n"&gt;cap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VideoCapture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bg_sub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createBackgroundSubtractorMOG2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;varThreshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;fg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bg_sub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;morphologyEx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MORPH_OPEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;contours&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findContours&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RETR_EXTERNAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CHAIN_APPROX_SIMPLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;obstacles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cnt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;contours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contourArea&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boundingRect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;obstacles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;cw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;cw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Update physics
&lt;/span&gt;    &lt;span class="n"&gt;ball_pos&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;ball_vel&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ball_vel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ball_vel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# Obstacle collision
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;obstacles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;bx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;bx&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;y1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;ball_vel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;circle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ball_pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])),&lt;/span&gt; &lt;span class="n"&gt;ball_radius&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Block Computing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xFF&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nf"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;

&lt;span class="n"&gt;cap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;destroyAllWindows&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here, layer in YOLOv8 for precise detection, multiple physics objects, surface normal estimation with depth sensors, and GPU rendering via OpenGL.&lt;/p&gt;
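
&lt;p&gt;One caveat if you build on this: the obstacle check in the minimal demo always flips the vertical velocity, even when the ball strikes the side of a box, and it never moves the ball out of the overlap, so the collision can re-trigger on every frame. A hedged fix is to reflect along the axis of least penetration and push the ball out. The function and variable names below are illustrative, not from the original demo:&lt;/p&gt;

```python
import numpy as np

def reflect_off_rect(pos, vel, radius, rect):
    """Reflect a ball off an axis-aligned rectangle (x1, y1, x2, y2).

    Instead of always flipping the vertical velocity, pick the axis
    with the smallest penetration depth, reflect along it, and push
    the ball out of the rectangle so it cannot re-collide next frame.
    """
    x1, y1, x2, y2 = rect
    bx, by = float(pos[0]), float(pos[1])
    # Expand the rectangle by the ball radius and check for overlap.
    inside_x = bx > x1 - radius and x2 + radius > bx
    inside_y = by > y1 - radius and y2 + radius > by
    if not (inside_x and inside_y):
        return np.array([bx, by]), np.array(vel, dtype=float)  # no hit
    # Penetration depth along each axis.
    pen_x = min(bx - (x1 - radius), (x2 + radius) - bx)
    pen_y = min(by - (y1 - radius), (y2 + radius) - by)
    vx, vy = float(vel[0]), float(vel[1])
    if pen_y > pen_x:
        vx = -vx  # side hit: reflect horizontally
        bx = bx - pen_x if (x1 + x2) / 2 > bx else bx + pen_x
    else:
        vy = -vy  # top/bottom hit: reflect vertically
        by = by - pen_y if (y1 + y2) / 2 > by else by + pen_y
    return np.array([bx, by]), np.array([vx, vy])
```

&lt;p&gt;Swapping this in for the simple bounding-box check makes side collisions behave correctly and stops the ball from vibrating inside an obstacle.&lt;/p&gt;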




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;@bongyunng's demo is more than a cool visual trick. It's a proof of concept for &lt;strong&gt;accessible spatial computing&lt;/strong&gt; — you don't need a $3,500 headset to make digital and physical worlds interact meaningfully.&lt;/p&gt;

&lt;p&gt;With OpenCV, Python, and physics simulation, you can build systems where the digital world &lt;em&gt;knows&lt;/em&gt; about the physical world and responds in real time.&lt;/p&gt;

&lt;p&gt;Start with a webcam. Start with OpenCV. Start with a bouncing ball.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>opencv</category>
      <category>computervision</category>
      <category>python</category>
      <category>spatialcomputing</category>
    </item>
    <item>
      <title>RuFlow (Ruflo): The Multi-Agent Claude AI Orchestrator That Slashes API Costs by 75%</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Mon, 23 Mar 2026 13:40:11 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/ruflow-ruflo-the-multi-agent-claude-ai-orchestrator-that-slashes-api-costs-by-75-2nmc</link>
      <guid>https://dev.to/arshkharbanda2010/ruflow-ruflo-the-multi-agent-claude-ai-orchestrator-that-slashes-api-costs-by-75-2nmc</guid>
      <description>&lt;h1&gt;
  
  
  RuFlow (Ruflo): The Multi-Agent Claude AI Orchestrator That Slashes API Costs by 75%
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Ruflo (formerly Claude Flow) is an open-source platform that turns Claude Code into a 60+ agent swarm — one agent plans, another codes, another tests, another checks security — all running in parallel, sharing memory, and cutting Claude API costs by up to 75%.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Is Ruflo?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ruflo&lt;/strong&gt; (also called &lt;strong&gt;RuFlow&lt;/strong&gt; or &lt;strong&gt;Claude Flow&lt;/strong&gt;) is an enterprise-grade, open-source AI agent orchestration platform built specifically for Claude. It transforms a single Claude Code instance into a &lt;strong&gt;distributed multi-agent development environment&lt;/strong&gt; — a swarm of specialized AI agents coordinating in parallel to build, test, and ship software faster than any single-agent workflow.&lt;/p&gt;

&lt;p&gt;Built by &lt;a href="https://github.com/ruvnet" rel="noopener noreferrer"&gt;ruvnet&lt;/a&gt; on GitHub, Ruflo has rapidly become one of the most powerful tools for Claude power users and AI-heavy development teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ruvnet/ruflo" rel="noopener noreferrer"&gt;github.com/ruvnet/ruflo&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Idea: Parallel Agent Swarms
&lt;/h2&gt;

&lt;p&gt;Traditional Claude usage is linear: one prompt → one response → next prompt. That's sequential and slow for complex tasks.&lt;/p&gt;

&lt;p&gt;Ruflo changes the model entirely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One agent plans&lt;/strong&gt; the architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One agent writes&lt;/strong&gt; the code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One agent runs&lt;/strong&gt; tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One agent reviews&lt;/strong&gt; security&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All of them run simultaneously&lt;/strong&gt;, sharing a common memory layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't just faster — it's a fundamentally different way to build software with AI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ⚡ 60+ Specialized Agents
&lt;/h3&gt;

&lt;p&gt;Ruflo ships with over 60 pre-built agents, each tuned for a specific role: researcher, coder, tester, security reviewer, DevOps engineer, data analyst, and more. You can also define custom agents with specialized system prompts and tool access.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔀 Parallel Swarm Coordination
&lt;/h3&gt;

&lt;p&gt;Agents don't work one at a time — they fan out across tasks simultaneously using swarm topologies (mesh, hierarchical, pipeline). Complex workflows that would take hours of back-and-forth prompting complete in a fraction of the time.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 Shared Memory + Self-Learning (SONA)
&lt;/h3&gt;

&lt;p&gt;All agents share a persistent memory layer. Decisions, patterns, and learnings from previous sessions are stored and reused. The SONA (Self-Organizing Neural Architecture) system means the platform actually improves over time based on what works.&lt;/p&gt;

&lt;h3&gt;
  
  
  💰 75% API Cost Reduction
&lt;/h3&gt;

&lt;p&gt;This is the headline number — and it's real. Ruflo uses a &lt;strong&gt;3-tier intelligent model routing&lt;/strong&gt; system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tier 1: Lightweight models for simple subtasks&lt;/li&gt;
&lt;li&gt;Tier 2: Mid-tier models for moderate complexity&lt;/li&gt;
&lt;li&gt;Tier 3: Full Claude for high-complexity reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, ~75% of tasks are handled by Tiers 1–2, dramatically reducing API spend. Real-world testing reports &lt;strong&gt;2.5x improvement in effective Claude subscription capacity&lt;/strong&gt;.&lt;/p&gt;
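
&lt;p&gt;Ruflo's actual router isn't published line by line, but the core idea of tiered routing can be sketched in a few lines: score each subtask's complexity, then dispatch it to the cheapest tier that can plausibly handle it. Everything below (the marker lists, the function name, the heuristic itself) is illustrative, not Ruflo's API:&lt;/p&gt;

```python
def route_to_tier(task: str) -> int:
    """Toy complexity router: return the cheapest model tier that can
    plausibly handle a task. The keyword heuristics are illustrative
    only; a real router would use a learned or statistical scorer.
    """
    hard_markers = ("architecture", "security", "refactor", "design")
    medium_markers = ("test", "document", "summarize", "rename")
    text = task.lower()
    if any(m in text for m in hard_markers):
        return 3  # full Claude for high-complexity reasoning
    if any(m in text for m in medium_markers):
        return 2  # mid-tier model for moderate complexity
    return 1      # lightweight model for simple subtasks
```

&lt;p&gt;The savings come from the distribution: if most subtasks land in tiers 1 and 2, only the minority that truly needs deep reasoning pays full price.&lt;/p&gt;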

&lt;h3&gt;
  
  
  🗄️ RuVector — Built-in Vector DB
&lt;/h3&gt;

&lt;p&gt;A native vector database for semantic search across your codebase, documents, and agent memory. No external Pinecone/Weaviate setup needed.&lt;/p&gt;
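
&lt;p&gt;Under the hood, any vector store reduces to the same primitive: embed the documents, embed the query, rank by cosine similarity. A generic sketch of that primitive (this is not RuVector's actual API):&lt;/p&gt;

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    """Rank document embeddings by cosine similarity to a query
    embedding. Generic illustration of what a vector database does
    at its core, minus indexing and persistence.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per doc
    order = np.argsort(-scores)[:k]     # best-first indices
    return list(order), scores[order]
```
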

&lt;h3&gt;
  
  
  🔧 170+ MCP Tools
&lt;/h3&gt;

&lt;p&gt;Native support for 170+ Model Context Protocol tools — giving agents access to file systems, databases, APIs, browsers, and more.&lt;/p&gt;

&lt;h3&gt;
  
  
  📊 84.8% SWE-Bench Score
&lt;/h3&gt;

&lt;p&gt;Ruflo v3 hits 84.8% on SWE-Bench — one of the most demanding software engineering benchmarks. For context, this puts it among the top-performing autonomous coding systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚡ 352x Faster WASM Execution
&lt;/h3&gt;

&lt;p&gt;Compiled to WebAssembly for near-native execution speed on critical path operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works in Practice
&lt;/h2&gt;

&lt;p&gt;Here's a real example of spinning up a swarm to build a REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initialize a mesh swarm with 8 agents&lt;/span&gt;
npx claude-flow coordination swarm-init &lt;span class="nt"&gt;--topology&lt;/span&gt; mesh &lt;span class="nt"&gt;--max-agents&lt;/span&gt; 8

&lt;span class="c"&gt;# Spawn specialized agents&lt;/span&gt;
npx claude-flow coordination agent-spawn &lt;span class="nt"&gt;--type&lt;/span&gt; researcher &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"API Specialist"&lt;/span&gt;
npx claude-flow coordination agent-spawn &lt;span class="nt"&gt;--type&lt;/span&gt; coder &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"Backend Dev"&lt;/span&gt;
npx claude-flow coordination agent-spawn &lt;span class="nt"&gt;--type&lt;/span&gt; tester &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"QA Engineer"&lt;/span&gt;

&lt;span class="c"&gt;# Orchestrate the task across all agents in parallel&lt;/span&gt;
npx claude-flow coordination task-orchestrate &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Build REST API with auth, CRUD endpoints, and tests"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--strategy&lt;/span&gt; parallel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agents divide the work, execute in parallel, and merge the results — each with shared context about what the others are doing.&lt;/p&gt;
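
&lt;p&gt;The fan-out/merge pattern will feel familiar if you've used a thread pool. As a rough analogy in plain Python (not Ruflo's implementation), with "agents" as ordinary functions:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

def run_swarm(task, agents):
    """Fan a task out to several 'agents' (plain callables here),
    run them concurrently, and merge the results into one dict.
    An analogy for the orchestration model, nothing more.
    """
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in agents.items()}
        return {name: f.result() for name, f in futures.items()}

# Stand-ins for specialized agents.
agents = {
    "planner": lambda t: f"plan for: {t}",
    "coder": lambda t: f"code for: {t}",
    "tester": lambda t: f"tests for: {t}",
}
```
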

&lt;h3&gt;
  
  
  Map-Reduce Pattern for Large Tasks
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Analyze 1000 files in parallel, then consolidate&lt;/span&gt;
npx claude-flow task orchestrate &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Analyze 1000 code files"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--strategy&lt;/span&gt; parallel &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pattern&lt;/span&gt; map-reduce &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--map-agents&lt;/span&gt; &lt;span class="s2"&gt;"code-analyzer:10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx claude-flow@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Ruflo is fully npm-based and requires no additional infrastructure setup for basic usage. Docker Compose is available for advanced/production deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; Claude Code (Claude subscription), Node.js&lt;/p&gt;




&lt;h2&gt;
  
  
  Ruflo vs. Single-Agent Claude
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Single Agent Claude&lt;/th&gt;
&lt;th&gt;Ruflo Multi-Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;Sequential&lt;/td&gt;
&lt;td&gt;Parallel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;Single instance&lt;/td&gt;
&lt;td&gt;Distributed across agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Session-only&lt;/td&gt;
&lt;td&gt;Persistent + shared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API cost&lt;/td&gt;
&lt;td&gt;Full price per call&lt;/td&gt;
&lt;td&gt;~75% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex tasks&lt;/td&gt;
&lt;td&gt;Multiple back-and-forth&lt;/td&gt;
&lt;td&gt;Autonomous multi-agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Self-improving (SONA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;One prompt at a time&lt;/td&gt;
&lt;td&gt;60+ concurrent agents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why This Matters for Developers
&lt;/h2&gt;

&lt;p&gt;If you're building anything serious with Claude Code, single-agent workflows become the bottleneck. Ruflo addresses five pain points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context limits&lt;/strong&gt; — Distributed agents can work on larger codebases than a single context window allows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — Parallel execution cuts wall-clock time dramatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; — 75% API savings is significant at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt; — Specialized agents (one for code, one for tests, one for security) produce better output than a generalist doing everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuity&lt;/strong&gt; — Persistent memory means no re-explaining context each session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For &lt;strong&gt;DevOps and platform engineering&lt;/strong&gt; specifically, the swarm model maps naturally to CI/CD thinking: parallel jobs, specialized stages, shared state. It's the same mental model, applied to AI-assisted development.&lt;/p&gt;




&lt;h2&gt;
  
  
  DevOps Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code:&lt;/strong&gt; One agent writes Terraform, another validates against compliance rules, another runs &lt;code&gt;terraform plan&lt;/code&gt; — all in parallel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD Pipeline Generation:&lt;/strong&gt; Swarm designs, implements, and tests entire GitLab/Jenkins pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Review:&lt;/strong&gt; Dedicated security agent runs in parallel with code generation — not as an afterthought&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Modernization:&lt;/strong&gt; Fan out across services, migrate each in parallel, consolidate and test&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation:&lt;/strong&gt; Separate documentation agent runs alongside development, keeping docs in sync automatically&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;60+ specialized agents&lt;/strong&gt; out of the box&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;170+ MCP tools&lt;/strong&gt; supported&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;84.8% SWE-Bench&lt;/strong&gt; score (v3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;75% API cost reduction&lt;/strong&gt; via 3-tier model routing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;2.5x effective subscription capacity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;352x faster&lt;/strong&gt; WASM execution vs. interpreted baseline&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ruvnet/ruflo" rel="noopener noreferrer"&gt;github.com/ruvnet/ruflo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://claude-flow.ruv.io" rel="noopener noreferrer"&gt;claude-flow.ruv.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;npx claude-flow@latest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/claude-flow" rel="noopener noreferrer"&gt;npmjs.com/package/claude-flow&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;Ruflo is what happens when you stop treating Claude as a chatbot and start treating it as an infrastructure component. Swarm intelligence, parallel execution, persistent memory, and aggressive cost optimization — it's a production-grade system built by someone who clearly uses Claude Code at serious scale.&lt;/p&gt;

&lt;p&gt;If you're on a Claude subscription and not using multi-agent workflows yet, you're leaving a lot on the table. The 75% cost reduction alone is worth the exploration time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single agent is the past. Swarm is the default.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ruvnet/ruflo" rel="noopener noreferrer"&gt;ruvnet/ruflo&lt;/a&gt; | &lt;strong&gt;npm:&lt;/strong&gt; &lt;code&gt;npx claude-flow@latest&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudeai</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Project N.O.M.A.D.: The Open-Source Offline Survival Computer That Runs Without the Internet</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Mon, 23 Mar 2026 12:20:08 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/project-nomad-the-open-source-offline-survival-computer-that-runs-without-the-internet-20c7</link>
      <guid>https://dev.to/arshkharbanda2010/project-nomad-the-open-source-offline-survival-computer-that-runs-without-the-internet-20c7</guid>
      <description>&lt;h1&gt;
  
  
  Project N.O.M.A.D.: The Open-Source Offline Survival Computer That Runs Without the Internet
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Crosstalk Solutions just open-sourced a fully self-contained offline server — AI, Wikipedia, maps, Khan Academy, medical references, and data tools — zero internet required after setup. And it's completely free.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Is Project N.O.M.A.D.?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;N.O.M.A.D.&lt;/strong&gt; stands for &lt;strong&gt;Node for Offline Media, Archives, and Data&lt;/strong&gt;. It's a free, open-source, self-contained offline knowledge and education server built by &lt;a href="https://github.com/Crosstalk-Solutions/project-nomad" rel="noopener noreferrer"&gt;Crosstalk Solutions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The concept is simple but powerful: &lt;strong&gt;everything you need to stay informed, educated, and operational — even when the internet is completely gone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grid failure? No signal in the mountains? Rural area with no connectivity? NOMAD doesn't care. Once installed, it works &lt;strong&gt;forever&lt;/strong&gt; without the internet.&lt;/p&gt;

&lt;p&gt;Similar commercial products cost hundreds of dollars. NOMAD is free.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Inside the Box?
&lt;/h2&gt;

&lt;p&gt;NOMAD ships as a &lt;strong&gt;Docker-based system&lt;/strong&gt; managed through a central web dashboard called the "Command Center." Here's everything you get out of the box:&lt;/p&gt;

&lt;h3&gt;
  
  
  🤖 Local AI Chat (Ollama + Qdrant RAG)
&lt;/h3&gt;

&lt;p&gt;A fully local AI assistant powered by Ollama — no API keys, no cloud, no data sent anywhere. You can upload your own documents and get semantic search (RAG via Qdrant). Think ChatGPT, but running entirely on your hardware.&lt;/p&gt;
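&lt;p&gt;The retrieval half of that pipeline is easy to picture in code. The sketch below is a toy stand-in and assumes nothing about NOMAD's internals: a real deployment embeds text with an Ollama model and stores vectors in Qdrant, whereas this version fakes embeddings with word counts just to show the rank-by-similarity idea.&lt;/p&gt;

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy word-count "embedding", a stand-in for a real Ollama
    # embedding model (NOMAD stores real vectors in Qdrant).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    # Rank stored documents by similarity to the query vector,
    # as a RAG pipeline does before prompting the LLM.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Treating a sprained ankle: rest, ice, compression, elevation.",
    "Water purification methods for off-grid living.",
    "Solar panel wiring basics for a 12V battery bank.",
]
print(retrieve("how do I treat a sprained ankle", docs))
```

&lt;p&gt;Swap the toy &lt;code&gt;embed&lt;/code&gt; for real embedding vectors and the same ranking logic applies.&lt;/p&gt;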

&lt;h3&gt;
  
  
  📚 Offline Information Library (Kiwix)
&lt;/h3&gt;

&lt;p&gt;Full Wikipedia archives — searchable and browsable offline. Plus medical references, survival guides, and ebooks. Terabytes of human knowledge available with zero internet.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎓 Education Platform (Kolibri)
&lt;/h3&gt;

&lt;p&gt;Khan Academy courses with progress tracking and multi-user support via Kolibri. Math, science, programming, history — all downloadable and available offline.&lt;/p&gt;

&lt;h3&gt;
  
  
  🗺️ Offline Maps (ProtoMaps)
&lt;/h3&gt;

&lt;p&gt;Download regional maps from OpenStreetMap. Navigate and plan routes with zero cellular connectivity.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔐 Data Tools (CyberChef)
&lt;/h3&gt;

&lt;p&gt;Encryption, encoding, hashing, and data analysis — built in. Useful for security, forensics, or just working with data in isolated environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  📝 Local Note-Taking (FlatNotes)
&lt;/h3&gt;

&lt;p&gt;Markdown-based local notes. Everything stays on your machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  📊 System Benchmark
&lt;/h3&gt;

&lt;p&gt;A built-in hardware scoring tool with a community leaderboard so you can see how your setup stacks up.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Install It
&lt;/h2&gt;

&lt;p&gt;NOMAD runs on any &lt;strong&gt;Debian-based OS&lt;/strong&gt; (Ubuntu recommended). Installation is fully terminal-based:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; curl &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/Crosstalk-Solutions/project-nomad/refs/heads/main/install/install_nomad.sh &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; install_nomad.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;bash install_nomad.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, open a browser and go to &lt;code&gt;http://localhost:8080&lt;/code&gt; (or &lt;code&gt;http://DEVICE_IP:8080&lt;/code&gt;). Done.&lt;/p&gt;

&lt;p&gt;The Command Center handles all installation, configuration, and updates — no manual Docker config needed (unless you want advanced control).&lt;/p&gt;




&lt;h2&gt;
  
  
  Hardware Requirements
&lt;/h2&gt;

&lt;p&gt;NOMAD itself is lightweight. The AI tools are what demand serious hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Minimum (Core Features Only)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU:&lt;/strong&gt; 2 GHz dual-core&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM:&lt;/strong&gt; 4 GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; 5 GB free&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS:&lt;/strong&gt; Debian-based (Ubuntu)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internet:&lt;/strong&gt; Required only during install&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommended (For Local AI / LLMs)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU:&lt;/strong&gt; AMD Ryzen 7 or Intel Core i7+&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM:&lt;/strong&gt; 32 GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU:&lt;/strong&gt; NVIDIA RTX 3060 or AMD equivalent (more VRAM = larger models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; 250 GB+ SSD&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS:&lt;/strong&gt; Ubuntu&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A detailed hardware guide with builds at three price points ($150–$1,000+) is available at &lt;a href="https://www.projectnomad.us/hardware" rel="noopener noreferrer"&gt;projectnomad.us/hardware&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Most tech infrastructure assumes internet connectivity. Cloud AI, SaaS tools, online maps, streaming education platforms — they all break the moment your connection goes down.&lt;/p&gt;

&lt;p&gt;NOMAD is a bet against that assumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases where this shines:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Disaster preparedness&lt;/strong&gt; — grid failures, infrastructure outages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote deployments&lt;/strong&gt; — ships, mountain research stations, rural schools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-first setups&lt;/strong&gt; — zero telemetry, zero cloud, everything local&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline development environments&lt;/strong&gt; — air-gapped networks, secure facilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-sufficient homesteads&lt;/strong&gt; — off-grid living without sacrificing access to knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also a &lt;strong&gt;DevOps/infrastructure dream&lt;/strong&gt; for anyone building air-gapped systems. The entire stack is containerized and Docker-managed — you can customize, extend, or integrate it into existing infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack Under the Hood
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Powered By&lt;/th&gt;
&lt;th&gt;What You Get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI Chat&lt;/td&gt;
&lt;td&gt;Ollama + Qdrant&lt;/td&gt;
&lt;td&gt;Local LLMs + RAG semantic search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Information Library&lt;/td&gt;
&lt;td&gt;Kiwix&lt;/td&gt;
&lt;td&gt;Wikipedia, medical refs, ebooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Education&lt;/td&gt;
&lt;td&gt;Kolibri&lt;/td&gt;
&lt;td&gt;Khan Academy courses + progress tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maps&lt;/td&gt;
&lt;td&gt;ProtoMaps&lt;/td&gt;
&lt;td&gt;Regional offline maps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Tools&lt;/td&gt;
&lt;td&gt;CyberChef&lt;/td&gt;
&lt;td&gt;Encryption, encoding, hashing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notes&lt;/td&gt;
&lt;td&gt;FlatNotes&lt;/td&gt;
&lt;td&gt;Local markdown note-taking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How to Get Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Crosstalk-Solutions/project-nomad" rel="noopener noreferrer"&gt;github.com/Crosstalk-Solutions/project-nomad&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://www.projectnomad.us" rel="noopener noreferrer"&gt;projectnomad.us&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord:&lt;/strong&gt; Join the community via the Discord link on GitHub&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware Guide:&lt;/strong&gt; &lt;a href="https://www.projectnomad.us/hardware" rel="noopener noreferrer"&gt;projectnomad.us/hardware&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;Project N.O.M.A.D. is one of those rare open-source projects that solves a real problem most people don't think about until it's too late. Internet access is fragile. Knowledge shouldn't be.&lt;/p&gt;

&lt;p&gt;If you're into self-hosting, DevOps, privacy, or just building systems that work under any conditions — this is worth exploring. Install it on an old machine, a Raspberry Pi-class mini PC, or a full GPU rig. Start small with Wikipedia and maps. Add AI when your hardware is ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The internet is optional. Knowledge isn't.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Crosstalk-Solutions/project-nomad" rel="noopener noreferrer"&gt;Crosstalk-Solutions/project-nomad&lt;/a&gt; | &lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://www.projectnomad.us" rel="noopener noreferrer"&gt;projectnomad.us&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>selfhosted</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Qwen 3.5: The AI Model That Runs on Your iPhone Without an Internet Connection</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:15:50 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/qwen-35-the-ai-model-that-runs-on-your-iphone-without-an-internet-connection-39jk</link>
      <guid>https://dev.to/arshkharbanda2010/qwen-35-the-ai-model-that-runs-on-your-iphone-without-an-internet-connection-39jk</guid>
      <description>&lt;h1&gt;
  
  
  Qwen 3.5: The AI Model That Runs on Your iPhone Without an Internet Connection
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The default assumption in AI today is connectivity. Ask a question → request goes to a data center → model processes it → response comes back. Fast, convenient, and entirely dependent on a working internet connection and a third party you're trusting with your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3.5&lt;/strong&gt; is part of a different trend: capable AI models small enough to run on your phone, your laptop, your edge device — entirely offline. Alibaba's open-weight model family has been moving steadily in this direction, and with Qwen 3.5, released in February 2026, on-device AI crossed a meaningful capability threshold.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Qwen Family: Context
&lt;/h2&gt;

&lt;p&gt;To understand where Qwen 3.5 sits, it helps to see the progression:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 2.5&lt;/strong&gt; (2024) — Alibaba's strong open-weight series, competitive with Llama 3 at various sizes. Solid general-purpose models from 0.5B to 72B parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3&lt;/strong&gt; (April 2025) — A major leap. Introduced hybrid thinking/non-thinking modes (like DeepSeek's chain-of-thought toggle), scaled up to 235B parameters via Mixture of Experts (MoE), and achieved near-frontier performance on reasoning benchmarks. The 235B MoE model became a serious open-weight competitor to closed models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3.5&lt;/strong&gt; (February 2026) — The on-device focus. Rather than scaling up, Alibaba optimized down. The key innovation: taking Qwen 3's capabilities and compressing them into sizes that run on consumer hardware — phones, laptops, embedded devices.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Model Sizes
&lt;/h2&gt;

&lt;p&gt;Qwen 3.5 ships in four sizes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Target Hardware&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5-0.8B&lt;/td&gt;
&lt;td&gt;0.8 billion&lt;/td&gt;
&lt;td&gt;iPhone 12+, mid-range Android&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5-2B&lt;/td&gt;
&lt;td&gt;2 billion&lt;/td&gt;
&lt;td&gt;iPhone 14+, any modern laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5-4B&lt;/td&gt;
&lt;td&gt;4 billion&lt;/td&gt;
&lt;td&gt;iPhone 15 Pro, M1 MacBook Air&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5-9B&lt;/td&gt;
&lt;td&gt;9 billion&lt;/td&gt;
&lt;td&gt;M2/M3 MacBook, high-end phones&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 0.8B model runs on an iPhone 12. The 9B model runs on a MacBook Air with an M-series chip. All of them run without an internet connection.&lt;/p&gt;

&lt;p&gt;This isn't a theoretical capability. These models run at usable speeds on the hardware most people already own.&lt;/p&gt;
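&lt;p&gt;Why do these sizes fit on phones? A rough back-of-envelope calculation, assuming 4-bit quantization and roughly 20% runtime overhead (both figures are illustrative assumptions, not Qwen specifics):&lt;/p&gt;

```python
def approx_ram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    # Quantized weights take params * (bits / 8) bytes; the 20% overhead
    # for KV cache and activations is an illustrative guess.
    return params_billion * (bits_per_weight / 8) * overhead

for size in (0.8, 2, 4, 9):
    print(f"{size}B at 4-bit: about {approx_ram_gb(size):.1f} GB")
```

&lt;p&gt;At roughly 2.4 GB, a 4-bit 4B model fits comfortably in the 8 GB of RAM on an iPhone 15 Pro.&lt;/p&gt;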




&lt;h2&gt;
  
  
  Hybrid Thinking Mode
&lt;/h2&gt;

&lt;p&gt;One of Qwen 3.5's key inherited features from Qwen 3 is the &lt;strong&gt;hybrid thinking/non-thinking mode&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;thinking mode&lt;/strong&gt;, the model uses chain-of-thought reasoning — working through problems step by step before producing an answer. This is slower but significantly more accurate for complex reasoning tasks: math, coding, multi-step logic.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;non-thinking mode&lt;/strong&gt;, the model responds immediately without the intermediate reasoning steps. Faster, suitable for conversational use, simple lookups, and tasks where speed matters more than depth.&lt;/p&gt;

&lt;p&gt;The ability to toggle between these modes on-device is meaningful. You get a model that can be fast and lightweight for casual use, and slow-and-thorough when you need it to actually think.&lt;/p&gt;
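&lt;p&gt;In an application, you would decide per request which mode to ask for. The heuristic below is purely illustrative (it is not part of any Qwen API), but it shows the kind of dispatch logic a local app might use:&lt;/p&gt;

```python
# Hypothetical application-side heuristic for choosing which mode to
# request; the constants and function are illustrative, not a Qwen API.
REASONING_HINTS = ("prove", "debug", "calculate", "step by step", "why does")

def pick_mode(prompt):
    text = prompt.lower()
    long_form = len(text.split()) > 40
    if long_form or any(hint in text for hint in REASONING_HINTS):
        return "thinking"      # chain-of-thought: math, code, multi-step logic
    return "non-thinking"      # immediate answer: chat, simple lookups

print(pick_mode("What's the capital of France?"))     # non-thinking
print(pick_mode("Debug this segfault step by step"))  # thinking
```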




&lt;h2&gt;
  
  
  Open-Weight: What That Actually Means
&lt;/h2&gt;

&lt;p&gt;"Open-weight" is meaningfully different from "open-source," and it's worth being precise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open-weight&lt;/strong&gt; means the model weights are publicly available for download. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download the model and run it locally&lt;/li&gt;
&lt;li&gt;Fine-tune it on your own data&lt;/li&gt;
&lt;li&gt;Deploy it on your own infrastructure&lt;/li&gt;
&lt;li&gt;Integrate it into your application&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You &lt;strong&gt;cannot&lt;/strong&gt; necessarily see the training code, the data curation process, or the full training recipe — that's where "open-weight" differs from fully open-source.&lt;/p&gt;

&lt;p&gt;But for practical purposes, open-weight is what matters for most developers and most use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No per-token fees&lt;/strong&gt; — run as many tokens as you want, pay only for your own hardware/compute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rate limits&lt;/strong&gt; — inference speed is limited only by your hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No privacy concerns&lt;/strong&gt; — data never leaves your device&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No downtime&lt;/strong&gt; — if your internet is down, the model still runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tunable&lt;/strong&gt; — adapt the model to your domain, your style, your use case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For consumer applications, edge deployments, and privacy-sensitive use cases, these properties are transformative.&lt;/p&gt;
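&lt;p&gt;The "no per-token fees" point lends itself to simple arithmetic. The numbers below are assumptions for illustration only (neither the hardware price nor the API rate reflects any real provider):&lt;/p&gt;

```python
def breakeven_tokens(hardware_cost_usd, api_price_per_million_tokens):
    # Tokens processed before owned hardware beats a metered API.
    # Ignores electricity and assumes comparable output quality.
    return hardware_cost_usd / api_price_per_million_tokens * 1_000_000

# Assumed figures: a $600 used mini-PC vs. a $2 per-million-token API rate.
tokens = breakeven_tokens(600, 2.0)
print(f"Break-even after {tokens / 1e6:.0f}M tokens")  # 300M tokens
```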




&lt;h2&gt;
  
  
  Why On-Device AI Matters Now
&lt;/h2&gt;

&lt;p&gt;The argument for on-device AI has always been there: privacy, latency, offline capability, cost. But for years, the models small enough to run on phones were too limited to be genuinely useful — good enough for autocomplete, not good enough for reasoning.&lt;/p&gt;

&lt;p&gt;Qwen 3.5 is evidence that this gap is closing.&lt;/p&gt;

&lt;p&gt;The 4B model, running locally on a modern phone, can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer complex questions with reasonable accuracy&lt;/li&gt;
&lt;li&gt;Write and explain code&lt;/li&gt;
&lt;li&gt;Summarize documents&lt;/li&gt;
&lt;li&gt;Reason through multi-step problems&lt;/li&gt;
&lt;li&gt;Translate between languages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not perfectly. Not at the level of GPT-4o or Claude Sonnet. But well enough for a significant fraction of real tasks — and entirely offline.&lt;/p&gt;

&lt;p&gt;The 9B model on a MacBook is more capable still. For many everyday AI tasks, it's competitive with early-generation frontier models.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running Qwen 3.5 Locally
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Via Ollama (easiest):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3.5:4b
ollama run qwen3.5:4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Via llama.cpp:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download GGUF from HuggingFace&lt;/span&gt;
./llama-cli &lt;span class="nt"&gt;-m&lt;/span&gt; qwen3.5-4b-q4_k_m.gguf &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Your prompt here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Via Transformers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3.5-4B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3.5-4B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;On iPhone:&lt;/strong&gt; Via apps like LLM Farm, PocketPal, or MLX-based iOS apps that support Qwen 3.5 weights.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Broader Significance
&lt;/h2&gt;

&lt;p&gt;Qwen 3.5 isn't just a technical achievement — it's a signal about where the AI industry is heading.&lt;/p&gt;

&lt;p&gt;The frontier models (GPT-5, Claude 4, Gemini Ultra) will keep getting larger and more capable. But in parallel, a different optimization is happening: making capable models &lt;em&gt;smaller&lt;/em&gt; and &lt;em&gt;faster&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This second trajectory matters for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developing markets&lt;/strong&gt; — where connectivity is unreliable but smartphones are ubiquitous&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-first applications&lt;/strong&gt; — medical, legal, personal data that can't leave the device&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge computing&lt;/strong&gt; — AI in IoT devices, industrial equipment, vehicles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost reduction&lt;/strong&gt; — enterprise deployments where inference costs matter at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience&lt;/strong&gt; — applications that need to function even when cloud services are down&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alibaba's Qwen series has been consistently impressive and consistently underappreciated in Western tech media. Qwen 3.5 continues that pattern: a serious technical achievement that quietly expands what's possible for developers and users who care about running AI on their own terms.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The question "what AI can I run without internet?" has historically had a depressing answer. Qwen 3.5 changes that.&lt;/p&gt;

&lt;p&gt;A 4B model that reasons well, runs on a modern iPhone, and supports hybrid thinking mode is a qualitatively different kind of tool than the cramped, limited on-device models of two years ago.&lt;/p&gt;

&lt;p&gt;Download it. Run it. See for yourself what offline AI looks like in 2026.&lt;/p&gt;

&lt;p&gt;The model weights are free. The inference is yours. The data stays on your device.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
      <category>tools</category>
    </item>
    <item>
      <title>Contains Studio Agents: A Full AI Department for Claude Code, Ready to Clone</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:15:14 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/contains-studio-agents-a-full-ai-department-for-claude-code-ready-to-clone-nj6</link>
      <guid>https://dev.to/arshkharbanda2010/contains-studio-agents-a-full-ai-department-for-claude-code-ready-to-clone-nj6</guid>
      <description>&lt;h1&gt;
  
  
  Contains Studio Agents: A Full AI Department for Claude Code, Ready to Clone
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Claude Code's sub-agent system is one of the most powerful and underused features in AI-assisted development. Instead of one generalist model handling everything, sub-agents let you route specific tasks to specialized agents — each with their own instructions, context, and expertise.&lt;/p&gt;

&lt;p&gt;The problem is building those agents from scratch. Defining a useful agent requires deep thought about scope, constraints, tool access, and interaction patterns. For most developers, that's a significant upfront investment before you get any value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contains Studio Agents&lt;/strong&gt; eliminates that investment entirely. It's an open-source collection of 30+ production-ready Claude Code sub-agents, organized as a complete AI department, ready to clone and use immediately.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/contains-studio/agents" rel="noopener noreferrer"&gt;contains-studio/agents&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Is
&lt;/h2&gt;

&lt;p&gt;Contains Studio Agents is structured as a set of &lt;strong&gt;departments&lt;/strong&gt; — mirroring how a real company is organized. Instead of one "do everything" agent, you have specialists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎨 &lt;strong&gt;Design&lt;/strong&gt; — brand guardians, UX designers, visual direction&lt;/li&gt;
&lt;li&gt;⚙️ &lt;strong&gt;Engineering&lt;/strong&gt; — backend architects, DevOps automators, security reviewers&lt;/li&gt;
&lt;li&gt;📣 &lt;strong&gt;Marketing&lt;/strong&gt; — growth hackers, content strategists, social media specialists&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;Product&lt;/strong&gt; — roadmap planners, feature prioritizers, user research analysts&lt;/li&gt;
&lt;li&gt;📋 &lt;strong&gt;Project Management&lt;/strong&gt; — sprint planners, standup facilitators, dependency trackers&lt;/li&gt;
&lt;li&gt;🏢 &lt;strong&gt;Studio Operations&lt;/strong&gt; — process designers, retrospective facilitators, culture builders&lt;/li&gt;
&lt;li&gt;🧪 &lt;strong&gt;Testing&lt;/strong&gt; — QA engineers, test case generators, coverage analysts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent in each department is a &lt;code&gt;.md&lt;/code&gt; file containing a carefully crafted system prompt, tool guidelines, and behavioral constraints.&lt;/p&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Three commands. Done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/contains-studio/agents
&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; agents/agents/&lt;span class="k"&gt;*&lt;/span&gt; ~/.claude/agents/
&lt;span class="c"&gt;# Restart Claude Code&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After restarting, all 30+ agents are available to Claude Code. No configuration, no API keys, no setup wizard.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Agents (Selected)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Engineering Department
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;backend-architect&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Designs scalable backend systems. Focuses on API design, data modeling, service boundaries, and avoiding premature optimization. Ask it to review your architecture and it'll push back on the right things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;devops-automator&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Specializes in CI/CD pipelines, infrastructure as code, container orchestration, and deployment automation. Knows Terraform, GitHub Actions, Docker, and Kubernetes in depth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;security-reviewer&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Reviews code for vulnerabilities, OWASP Top 10, secrets exposure, and dependency risks. Useful for pre-PR security checks without the overhead of a full audit.&lt;/p&gt;
&lt;h3&gt;
  
  
  Marketing Department
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;growth-hacker&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Experiment-driven marketing strategy. A/B tests, acquisition funnels, retention loops. Talks in metrics, not vibes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;tiktok-strategist&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Short-form video content strategy specific to TikTok's algorithm and audience. Knows what hooks work, what formats get completed, what drives follows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;content-strategist&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Long-form content planning — blog calendars, SEO strategy, thought leadership positioning. Works well paired with the brand-guardian agent.&lt;/p&gt;
&lt;h3&gt;
  
  
  Product Department
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;feedback-synthesizer&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Takes raw user feedback (support tickets, reviews, interviews) and synthesizes it into structured insights, themes, and prioritized feature signals. Removes the bias from manual reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;roadmap-planner&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Helps prioritize features using frameworks like RICE, ICE, and value/effort matrices. Useful when you have 50 backlog items and need to decide what actually ships next.&lt;/p&gt;
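&lt;p&gt;RICE itself is simple enough to compute by hand: score = (Reach × Impact × Confidence) / Effort. A minimal sketch, with an invented three-item backlog:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    reach: int         # users affected per quarter
    impact: float      # 0.25 (minimal) to 3 (massive)
    confidence: float  # 0.0 to 1.0
    effort: float      # person-months

def rice(f):
    # RICE score = (Reach x Impact x Confidence) / Effort
    return f.reach * f.impact * f.confidence / f.effort

backlog = [
    Feature("Dark mode", 5000, 0.5, 0.9, 1.0),
    Feature("SSO login", 800, 2.0, 0.8, 3.0),
    Feature("Bulk export", 300, 1.0, 1.0, 0.5),
]
for f in sorted(backlog, key=rice, reverse=True):
    print(f"{f.name}: {rice(f):.0f}")
```

&lt;p&gt;The agent applies the same framework conversationally, but the scoring is this mechanical underneath.&lt;/p&gt;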
&lt;h3&gt;
  
  
  Design Department
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;brand-guardian&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Enforces brand consistency — voice, tone, visual language, messaging hierarchy. Useful for reviewing copy, designs, or anything that represents your company externally.&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Works in Practice
&lt;/h2&gt;

&lt;p&gt;The elegance of the sub-agent system is that routing is automatic. You don't need to specify which agent handles your request — Claude Code reads your task description and delegates to the most appropriate specialist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example interactions:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Review this PR for security vulnerabilities" → &lt;code&gt;security-reviewer&lt;/code&gt; handles it&lt;/p&gt;

&lt;p&gt;"We have 40 backlog items. Help me decide what ships in Q2." → &lt;code&gt;roadmap-planner&lt;/code&gt; takes over&lt;/p&gt;

&lt;p&gt;"Write three TikTok scripts for our product launch" → &lt;code&gt;tiktok-strategist&lt;/code&gt; activates&lt;/p&gt;

&lt;p&gt;"Our pipeline is taking 20 minutes to build. Find the bottleneck." → &lt;code&gt;devops-automator&lt;/code&gt; digs in&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You describe the work in natural language. The right expert shows up.&lt;/p&gt;
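&lt;p&gt;Claude Code performs this routing itself, reading each agent's description. As a mental model only, the toy dispatcher below approximates it with keyword overlap; the agent descriptions here are paraphrased for illustration:&lt;/p&gt;

```python
# Toy dispatcher: Claude Code's real routing is model-driven and reads
# each agent's description; keyword overlap here is only a mental model.
AGENTS = {
    "security-reviewer": "review code vulnerabilities security owasp secrets",
    "roadmap-planner": "prioritize backlog roadmap ship features quarter",
    "tiktok-strategist": "tiktok video scripts short-form hooks launch",
    "devops-automator": "pipeline ci cd build deploy docker terraform bottleneck",
}

def route(task):
    words = set(task.lower().split())
    return max(AGENTS, key=lambda name: len(words.intersection(AGENTS[name].split())))

print(route("Review this PR for security vulnerabilities"))      # security-reviewer
print(route("Our pipeline build is slow, find the bottleneck"))  # devops-automator
```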


&lt;h2&gt;
  
  
  Why This Matters for AI-Assisted Work
&lt;/h2&gt;

&lt;p&gt;There's a meaningful difference between using AI as a general assistant and using AI through a system of specialists.&lt;/p&gt;

&lt;p&gt;Generalist AI agents tend to give generalist answers. When you ask a single agent to review your architecture, write marketing copy, and plan your testing strategy in the same session, you get reasonable outputs — but nothing that reflects deep domain expertise.&lt;/p&gt;

&lt;p&gt;Specialist agents behave differently. A &lt;code&gt;backend-architect&lt;/code&gt; agent that's been given focused context about system design patterns, tradeoffs, and anti-patterns will produce meaningfully better architecture reviews than a generalist agent asked to "think like a senior backend engineer."&lt;/p&gt;

&lt;p&gt;Contains Studio Agents is a practical demonstration of this. It's not theoretical. You can install it in three commands and immediately experience the difference between asking a generalist "review this Terraform" and asking a &lt;code&gt;devops-automator&lt;/code&gt; who's been specifically prompted on IaC best practices.&lt;/p&gt;


&lt;h2&gt;
  
  
  Extending the Collection
&lt;/h2&gt;

&lt;p&gt;The format is intentionally simple. Each agent is a single markdown file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Agent Name&lt;/span&gt;

&lt;span class="gu"&gt;## Role&lt;/span&gt;
[Brief description of the agent's specialty]

&lt;span class="gu"&gt;## Responsibilities&lt;/span&gt;
[What this agent does]

&lt;span class="gu"&gt;## Guidelines&lt;/span&gt;
[How this agent thinks and behaves]

&lt;span class="gu"&gt;## Tools&lt;/span&gt;
[What tools this agent uses]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adding a custom agent is as simple as writing a new &lt;code&gt;.md&lt;/code&gt; file and dropping it in &lt;code&gt;~/.claude/agents/&lt;/code&gt;. The Contains Studio repo gives you a clear template and a high-quality reference set to build from.&lt;/p&gt;
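&lt;p&gt;Generating such a file programmatically is just as easy. Everything below (the &lt;code&gt;docs-writer&lt;/code&gt; agent, its text, and the demo directory) is invented for illustration; in practice you would write the file into &lt;code&gt;~/.claude/agents/&lt;/code&gt;:&lt;/p&gt;

```python
from pathlib import Path

# Minimal programmatic agent file following the template above.
# "docs-writer", its text, and the demo directory are all invented
# for illustration; the real destination is ~/.claude/agents/.
agent = """\
# docs-writer

## Role
Technical documentation specialist.

## Responsibilities
Writes and reviews READMEs, API docs, and changelogs.

## Guidelines
Prefer concrete examples over abstract description.

## Tools
Read, Write, Grep
"""

target = Path("demo-agents") / "docs-writer.md"
target.parent.mkdir(exist_ok=True)
target.write_text(agent)
print(f"Created {target}")
```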




&lt;h2&gt;
  
  
  Who Should Use This
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Solo developers&lt;/strong&gt; who want specialist AI reviewers without a team&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Startups&lt;/strong&gt; that need to move fast across engineering, product, and marketing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code users&lt;/strong&gt; looking to get significantly more value from sub-agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone building their own agent collection&lt;/strong&gt; and wanting a strong starting point&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Contains Studio Agents represents a mature approach to AI-assisted work: not one smart assistant, but a coordinated team of specialists. The open-source release makes that team available to everyone.&lt;/p&gt;

&lt;p&gt;Three commands. Thirty specialists. Start delegating.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/contains-studio/agents" rel="noopener noreferrer"&gt;github.com/contains-studio/agents&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
      <category>tools</category>
    </item>
    <item>
      <title>Karpathy's AutoResearch: What Happens When You Let an AI Run Its Own Experiments Overnight</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:14:39 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/karpathys-autoresearch-what-happens-when-you-let-an-ai-run-its-own-experiments-overnight-485g</link>
      <guid>https://dev.to/arshkharbanda2010/karpathys-autoresearch-what-happens-when-you-let-an-ai-run-its-own-experiments-overnight-485g</guid>
      <description>&lt;h1&gt;
  
  
  Karpathy's AutoResearch: What Happens When You Let an AI Run Its Own Experiments Overnight
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In March 2026, Andrej Karpathy released something quietly remarkable: &lt;strong&gt;AutoResearch&lt;/strong&gt; — a framework that lets an AI agent autonomously run machine learning experiments, iterate on its own training code, and improve a model overnight without human intervention.&lt;/p&gt;

&lt;p&gt;This isn't a research paper. It's a working system you can clone and run today. And it points to something significant about where AI-assisted research is heading.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;karpathy/autoresearch&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem It Solves
&lt;/h2&gt;

&lt;p&gt;ML research is, fundamentally, an experimental loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Have an idea&lt;/li&gt;
&lt;li&gt;Implement it&lt;/li&gt;
&lt;li&gt;Train a model&lt;/li&gt;
&lt;li&gt;Evaluate results&lt;/li&gt;
&lt;li&gt;Keep or discard the change&lt;/li&gt;
&lt;li&gt;Go back to step 1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This loop is slow because it's human-bottlenecked. Steps 2-5 can take hours per cycle. A researcher might run 5-10 experiments per day if they're focused. Most of that time is waiting — waiting for training runs to finish, waiting for evaluations to complete, waiting to context-switch back into the right mental model.&lt;/p&gt;

&lt;p&gt;AutoResearch removes the human from steps 2-5 entirely. The loop runs overnight. You wake up to 50 completed experiments.&lt;/p&gt;
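&lt;p&gt;Conceptually, the overnight run is a hill-climbing loop over candidate changes. A toy Python sketch of that control flow (the &lt;code&gt;propose&lt;/code&gt; and &lt;code&gt;train_and_eval&lt;/code&gt; stand-ins replace the agent's real code edits and the roughly five-minute training runs; they are assumptions for illustration):&lt;/p&gt;

```python
import random

def run_loop(baseline_loss, propose, train_and_eval, n_experiments=50):
    """Hill-climb: keep a change only if it beats the current baseline.

    `propose` and `train_and_eval` stand in for the agent's edits to
    train.py and the ~5-minute nanochat training run.
    """
    best = baseline_loss
    log = []
    for i in range(n_experiments):
        change = propose()
        loss = train_and_eval(change)
        improvement = best - loss
        kept = improvement > 0
        if kept:
            best = loss          # commit: this becomes the new baseline
        log.append({"experiment": i, "change": change,
                    "val_loss": loss, "kept": kept})
    return best, log

# Toy stand-ins: random perturbations around the baseline loss.
random.seed(0)
best, log = run_loop(
    baseline_loss=3.0,
    propose=lambda: random.choice(["wider_mlp", "rmsnorm", "lr_warmup"]),
    train_and_eval=lambda change: 3.0 + random.uniform(-0.2, 0.2),
)
```

&lt;p&gt;The real system adds what the toy omits: reverting the code edit on regression and recording each outcome in an experiment log.&lt;/p&gt;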




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;AutoResearch is built on &lt;strong&gt;nanochat&lt;/strong&gt; — Karpathy's minimal GPT implementation designed for single-GPU training runs. Each training job takes about 5 minutes, which is the key design constraint: fast enough to run many experiments in a single overnight session.&lt;/p&gt;

&lt;p&gt;The system has exactly three files that matter:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;prepare.py&lt;/code&gt; — Fixed
&lt;/h3&gt;

&lt;p&gt;Data preparation and tokenization. The agent never touches this. The dataset and preprocessing are locked in, giving the agent a stable foundation to experiment from.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;train.py&lt;/code&gt; — Agent's Playground
&lt;/h3&gt;

&lt;p&gt;This is where the agent operates. It contains everything about the model: architecture decisions, hyperparameters, optimizer configuration, learning rate schedules, regularization. The agent reads this file, proposes a modification, implements it, and measures whether it helped.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;program.md&lt;/code&gt; — Your Research Direction
&lt;/h3&gt;

&lt;p&gt;Here's the clever part: &lt;strong&gt;you don't write Python to configure AutoResearch. You write markdown.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;program.md&lt;/code&gt; is the research organization's charter. You describe what you're trying to achieve, what directions seem promising, what constraints to respect. The agent reads this document and uses it to guide its experimental strategy.&lt;/p&gt;

&lt;p&gt;Want to focus on attention mechanisms? Write that in &lt;code&gt;program.md&lt;/code&gt;. Want to avoid changes that increase parameter count beyond a threshold? Write that too. The agent follows it.&lt;/p&gt;
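&lt;p&gt;A hypothetical &lt;code&gt;program.md&lt;/code&gt; might read like this (the wording is invented for illustration; see the repo for real examples):&lt;/p&gt;

```markdown
# Research Program

## Goal
Reduce validation loss on the nanochat baseline without pushing
training time per run meaningfully beyond ~5 minutes.

## Promising directions
- Attention variants (head count, grouped-query attention)
- Learning rate schedule shapes

## Constraints
- Do not modify prepare.py or the dataset
- Keep total parameter count within 10% of the baseline
```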




&lt;h2&gt;
  
  
  How the Loop Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read program.md → Understand current model state
→ Propose an experiment (hypothesis + implementation)
→ Edit train.py
→ Run training (~5 min)
→ Evaluate validation loss
→ Compare against baseline
→ If improvement: commit change, update baseline
→ If regression: revert, log the failure
→ Record findings in experiment log
→ Repeat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a single overnight run, the system executed &lt;strong&gt;50 experiments&lt;/strong&gt;. It explored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attention head configurations&lt;/li&gt;
&lt;li&gt;Activation functions&lt;/li&gt;
&lt;li&gt;Layer normalization variants&lt;/li&gt;
&lt;li&gt;Learning rate schedule shapes&lt;/li&gt;
&lt;li&gt;Optimizer hyperparameters&lt;/li&gt;
&lt;li&gt;Residual connection patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By morning, it had found configurations that meaningfully improved the baseline model — without a single human keypress after the initial launch.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Actually Means
&lt;/h2&gt;

&lt;p&gt;Let's be precise about what AutoResearch is and isn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It is:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A working demonstration that AI agents can run meaningful ML experiments autonomously&lt;/li&gt;
&lt;li&gt;A practical tool for exploring hyperparameter and architecture spaces overnight&lt;/li&gt;
&lt;li&gt;A framework that treats research direction as a natural-language configuration problem&lt;/li&gt;
&lt;li&gt;Open-source and runnable today on a single GPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;It isn't:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A system that generates novel research ideas from scratch&lt;/li&gt;
&lt;li&gt;A replacement for human intuition in designing experiments&lt;/li&gt;
&lt;li&gt;A tool that works on arbitrary codebases (it's scoped to nanochat)&lt;/li&gt;
&lt;li&gt;Guaranteed to find improvements — many experiments fail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The honest framing: AutoResearch is an &lt;strong&gt;automated experimental assistant&lt;/strong&gt;, not an autonomous research scientist. You still define the direction. It executes and iterates faster than you can.&lt;/p&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;program.md&lt;/code&gt; Insight
&lt;/h2&gt;

&lt;p&gt;The design decision to use a markdown file for research configuration is worth dwelling on.&lt;/p&gt;

&lt;p&gt;Most automation systems are configured with code or structured config files. AutoResearch deliberately chooses natural language. This means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Researchers without strong coding skills can participate&lt;/strong&gt; — you describe your research intent in prose, not parameters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The configuration is human-readable&lt;/strong&gt; — you can audit what the agent understood and adjust it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The boundary between researcher and agent is clear&lt;/strong&gt; — humans write intent, agents write code&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a small but meaningful step toward AI systems that collaborate with humans at the level of ideas rather than just implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/karpathy/autoresearch
&lt;span class="nb"&gt;cd &lt;/span&gt;autoresearch
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Configure your research direction&lt;/span&gt;
vim program.md

&lt;span class="c"&gt;# Prepare your dataset&lt;/span&gt;
python prepare.py

&lt;span class="c"&gt;# Launch overnight run&lt;/span&gt;
python autorun.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll need a single GPU (the system is designed for consumer hardware — an RTX 3090 or 4090 is ideal). Set it running before you sleep. Review results in the morning.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implications for the Field
&lt;/h2&gt;

&lt;p&gt;AutoResearch is a prototype of something bigger: &lt;strong&gt;AI as an active participant in scientific research&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The current model is: human researchers use AI tools to accelerate their work. The next model is: AI systems run experiments in parallel with human researchers, exploring the search space faster than any single person could.&lt;/p&gt;

&lt;p&gt;We're not at the "AI has research ideas" stage yet. But we're clearly at the "AI can run research experiments faster than humans" stage. AutoResearch makes that concrete and tangible.&lt;/p&gt;

&lt;p&gt;Karpathy has a track record of releasing tools that become foundational — nanoGPT, micrograd, minbpe. AutoResearch feels like another one of those releases: minimal, clearly designed, and pointing at something important.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The most interesting thing about AutoResearch isn't the technical implementation — it's the workflow it enables. Run experiments while you sleep. Wake up to data. Make decisions informed by 50 trials instead of 3.&lt;/p&gt;

&lt;p&gt;That's not a marginal improvement in research productivity. It's a structural shift in what a single researcher can accomplish.&lt;/p&gt;

&lt;p&gt;For anyone doing ML research or experimentation on single-GPU hardware, AutoResearch is worth studying and running. Even if you don't use it directly, the design philosophy — natural language research configuration, autonomous experimental loops, fast iteration over small models — is worth internalizing.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;github.com/karpathy/autoresearch&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
      <category>tools</category>
    </item>
    <item>
      <title>AIChat: One CLI Tool to Rule All Your LLMs</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:14:03 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/aichat-one-cli-tool-to-rule-all-your-llms-2873</link>
      <guid>https://dev.to/arshkharbanda2010/aichat-one-cli-tool-to-rule-all-your-llms-2873</guid>
      <description>&lt;h1&gt;
  
  
  AIChat: One CLI Tool to Rule All Your LLMs
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you work with multiple LLMs regularly, you know the friction: different web interfaces, different API clients, different context windows, different ways to attach files. Switching between Claude, GPT-4, Gemini, and a local Ollama model means juggling multiple tools and losing the flow of your terminal-based workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AIChat&lt;/strong&gt; solves this with one elegantly built CLI — 29,000+ GitHub stars, written in Rust, and supporting 20+ LLM providers through a unified interface that actually respects how developers work.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/sigoden/aichat" rel="noopener noreferrer"&gt;sigoden/aichat&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is AIChat?
&lt;/h2&gt;

&lt;p&gt;AIChat is an all-in-one command-line LLM tool that puts every major AI provider — and your local models — behind a single, consistent interface. It's not just a wrapper; it's a fully featured AI workbench built for the terminal.&lt;/p&gt;

&lt;p&gt;The 29k+ star count on GitHub isn't hype. It's the result of years of steady development, a genuinely useful feature set, and a community of developers who've found it indispensable.&lt;/p&gt;




&lt;h2&gt;
  
  
  20+ Providers, One Config
&lt;/h2&gt;

&lt;p&gt;AIChat supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; (GPT-4, GPT-4o, o1, o3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt; (Claude 3.5 Sonnet, Claude 3 Opus)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google&lt;/strong&gt; (Gemini 1.5 Pro, Gemini Flash)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; (run any local model — Llama, Mistral, Phi, Qwen)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groq&lt;/strong&gt; (blazing fast inference)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure OpenAI&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS Bedrock&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mistral&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Perplexity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenRouter&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;And more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One config file. One tool. Switching providers is a flag: &lt;code&gt;aichat -m claude:claude-3-5-sonnet "Explain this code"&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Shell Assistant 🐚
&lt;/h3&gt;

&lt;p&gt;This might be AIChat's most immediately useful feature for developers. Natural language → shell commands, directly in your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aichat &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"find all files modified in the last 24 hours larger than 10MB"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AIChat generates the &lt;code&gt;find&lt;/code&gt; command, shows it to you, and asks whether you want to run it (&lt;code&gt;-e&lt;/code&gt; is short for &lt;code&gt;--execute&lt;/code&gt;). For anyone who spends time Googling obscure shell commands, this alone is worth the install.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chat-REPL 💬
&lt;/h3&gt;

&lt;p&gt;A full interactive chat mode with persistent conversation history, multi-line input, and syntax-highlighted responses. Think of it as a terminal-native ChatGPT that works with any model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aichat  &lt;span class="c"&gt;# opens the REPL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the REPL, you can switch models mid-conversation, attach files, run shell commands, and navigate history — all without leaving the terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG (Retrieval-Augmented Generation) 📚
&lt;/h3&gt;

&lt;p&gt;AIChat has a built-in RAG pipeline. You can create "sessions" that index local files (PDFs, code, docs) and query them with your LLM of choice. No external vector database setup required — it handles chunking, embedding, and retrieval internally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aichat &lt;span class="nt"&gt;--rag&lt;/span&gt; ./docs/ &lt;span class="s2"&gt;"What does the authentication module do?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AI Agents + Function Calling 🤖
&lt;/h3&gt;

&lt;p&gt;AIChat supports function calling across providers that expose it. You can define tools in a YAML config and have the model call them — file operations, web requests, custom scripts. Build lightweight agents that actually do things.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in HTTP Server 🌐
&lt;/h3&gt;

&lt;p&gt;AIChat ships with an OpenAI-compatible HTTP server. This means you can point any tool or library that expects an OpenAI endpoint at AIChat — and route requests to whatever model you actually want, including local Ollama models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aichat serve &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your local code that calls &lt;code&gt;openai.chat.completions.create(...)&lt;/code&gt; now routes to DeepSeek; the only thing that changes is the base URL.&lt;/p&gt;
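&lt;p&gt;Because the server speaks the OpenAI wire format, any client only needs its base URL swapped. A minimal Python sketch that builds such a request without sending it (the model name, port, and &lt;code&gt;/v1/chat/completions&lt;/code&gt; path follow OpenAI convention and are assumptions here):&lt;/p&gt;

```python
import json
from urllib.request import Request

# Build a standard OpenAI-style chat completion request, but aim it
# at the local AIChat server instead of api.openai.com.
def chat_request(prompt, model="ollama:llama3",
                 base_url="http://localhost:8080/v1"):
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Explain the CAP theorem")
# urlopen(req) would return an OpenAI-compatible JSON response
# once the server is running.
```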

&lt;h3&gt;
  
  
  LLM Arena ⚔️
&lt;/h3&gt;

&lt;p&gt;Side-by-side model comparison in the terminal. Send the same prompt to multiple models simultaneously and compare responses. Essential for prompt engineering and model selection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aichat &lt;span class="nt"&gt;--arena&lt;/span&gt; &lt;span class="s2"&gt;"Explain the CAP theorem"&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;-m&lt;/span&gt; claude-3-5-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-1.5-pro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;macOS (Homebrew):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;aichat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rust/Cargo:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;aichat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Linux (pre-built binary):&lt;/strong&gt;&lt;br&gt;
Download from &lt;a href="https://github.com/sigoden/aichat/releases" rel="noopener noreferrer"&gt;GitHub Releases&lt;/a&gt; — available for x86_64 and ARM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt; Pre-built binary available via GitHub Releases.&lt;/p&gt;


&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;On first run, AIChat creates a config file at &lt;code&gt;~/.config/aichat/config.yaml&lt;/code&gt;. The structure is clean and well-documented:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude:claude-3-5-sonnet&lt;/span&gt;
&lt;span class="na"&gt;clients&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude&lt;/span&gt;
    &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sk-ant-...&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai&lt;/span&gt;
    &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sk-...&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;api_base&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set up once, use forever.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For developers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick code review in the terminal: &lt;code&gt;cat main.py | aichat "review this for bugs"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Generate boilerplate: &lt;code&gt;aichat "write a Dockerfile for a Node.js app"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Debug errors: &lt;code&gt;cat error.log | aichat "what's causing this?"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For sysadmins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shell assistant for complex command generation&lt;/li&gt;
&lt;li&gt;Log analysis with RAG over log files&lt;/li&gt;
&lt;li&gt;Generate Terraform/Ansible configs from natural language&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For researchers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-model comparison for evaluation&lt;/li&gt;
&lt;li&gt;RAG over paper PDFs&lt;/li&gt;
&lt;li&gt;Automated pipelines via the HTTP server&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why 29,000 Stars?
&lt;/h2&gt;

&lt;p&gt;AIChat has earned its star count by doing one thing exceptionally well: &lt;strong&gt;respecting developer workflow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It doesn't try to be a web app, a hosted service, or a product with a dashboard. It's a tool — fast, composable, local-first, and configurable. It fits into existing workflows rather than replacing them.&lt;/p&gt;

&lt;p&gt;In a world full of AI tools that want to be platforms, AIChat is content to be a very good hammer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you use LLMs and live in the terminal, AIChat belongs in your toolkit. The breadth of provider support, the shell assistant, the built-in HTTP server, and the LLM arena make it genuinely more useful than any single-provider CLI.&lt;/p&gt;

&lt;p&gt;And because it's open-source Rust with active development, it keeps getting better.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;aichat
aichat &lt;span class="s2"&gt;"what can you do?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find out for yourself.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/sigoden/aichat" rel="noopener noreferrer"&gt;GitHub: sigoden/aichat&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
      <category>tools</category>
    </item>
    <item>
      <title>Spacebot: The Multi-Agent AI Platform Built in Rust for Teams</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:12:47 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/spacebot-the-multi-agent-ai-platform-built-in-rust-for-teams-33f6</link>
      <guid>https://dev.to/arshkharbanda2010/spacebot-the-multi-agent-ai-platform-built-in-rust-for-teams-33f6</guid>
      <description>&lt;h1&gt;
  
  
  Spacebot: The Multi-Agent AI Platform Built in Rust for Teams
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Most AI assistant platforms are built around a single agent and a single conversation. You ask a question, the model answers, the session ends. Simple, stateless, and ultimately limited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spacebot&lt;/strong&gt; takes a fundamentally different approach. It's a &lt;strong&gt;multi-agent AI orchestration platform&lt;/strong&gt; — built in Rust as a single binary — designed for teams that want persistent, context-aware AI infrastructure running alongside their work, not just responding to ad-hoc queries.&lt;/p&gt;




&lt;h2&gt;
  
  
  Built in Rust. Built to Last.
&lt;/h2&gt;

&lt;p&gt;The choice of Rust is a signal. Spacebot isn't another Python wrapper around an LLM API — it's a serious piece of infrastructure built for performance, reliability, and long-term operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tech stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rust + Tokio&lt;/strong&gt; — async runtime for high-throughput, low-latency agent orchestration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite&lt;/strong&gt; — embedded database for persistent storage without external dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LanceDB&lt;/strong&gt; — vector database for semantic memory and embedding search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastEmbed&lt;/strong&gt; — on-device embedding generation (no external API calls for embeddings)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serenity&lt;/strong&gt; — Discord gateway integration for team-facing interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a single binary that runs everywhere, boots fast, and doesn't require a Kubernetes cluster to operate.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three-Agent Architecture
&lt;/h2&gt;

&lt;p&gt;Spacebot's most interesting design decision is how it structures agent cognition across three distinct roles:&lt;/p&gt;

&lt;h3&gt;
  
  
  🎭 Face Agent
&lt;/h3&gt;

&lt;p&gt;The user-facing agent. This is the personality your team interacts with — friendly, contextual, and responsive. The Face Agent handles conversation, interprets intent, and coordinates responses. Think of it as the "front desk" of your AI infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 Conscience
&lt;/h3&gt;

&lt;p&gt;An independent thinking fork that runs in parallel to the Face Agent. While the Face Agent is handling the immediate response, the Conscience is evaluating, questioning, and sometimes pushing back. This separation prevents the sycophantic drift common in single-agent systems — the agent that just agrees with everything because it's optimizing for immediate approval.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚙️ Worker
&lt;/h3&gt;

&lt;p&gt;Pure execution. No personality, no reasoning overhead — just doing the thing. The Worker handles tool calls, API requests, file operations, and any task where speed and reliability matter more than conversational quality.&lt;/p&gt;

&lt;p&gt;This three-layer architecture means Spacebot isn't just answering questions — it's thinking about them from multiple angles while simultaneously acting on them.&lt;/p&gt;
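&lt;p&gt;The split can be pictured as two roles reasoning over the same input in parallel while a third executes. This is a conceptual Python sketch of the idea only, not Spacebot's Rust internals; every name and rule here is illustrative:&lt;/p&gt;

```python
import asyncio

async def face(message: str) -> str:
    """User-facing agent: drafts the conversational reply."""
    return f"Draft reply to: {message}"

async def conscience(message: str) -> list[str]:
    """Independent fork: critiques the exchange rather than answering it."""
    notes = []
    if "delete" in message:  # toy rule standing in for real evaluation
        notes.append("Destructive request: ask for confirmation.")
    return notes

async def worker(task: str) -> str:
    """Pure execution: no personality, just does the thing."""
    return f"executed:{task}"

async def handle(message: str) -> dict:
    # Face and conscience run concurrently on the same input;
    # the worker acts once there is something concrete to do.
    draft, notes = await asyncio.gather(face(message), conscience(message))
    result = await worker("lookup-context")
    return {"reply": draft, "concerns": notes, "work": result}

out = asyncio.run(handle("delete the staging database"))
```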




&lt;h2&gt;
  
  
  The Cortex: Memory That Actually Works
&lt;/h2&gt;

&lt;p&gt;The feature that sets Spacebot apart from almost every other platform is &lt;strong&gt;The Cortex&lt;/strong&gt; — a persistent, cross-conversation memory system built on an 8-dimensional memory graph.&lt;/p&gt;

&lt;p&gt;Most AI systems have no memory between sessions. Some have basic summarization. Spacebot has a structured knowledge graph that tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Facts and entities mentioned across all conversations&lt;/li&gt;
&lt;li&gt;Relationships between topics, people, and projects&lt;/li&gt;
&lt;li&gt;Temporal patterns (what gets asked when, what follows what)&lt;/li&gt;
&lt;li&gt;Team-level context that persists across users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every 60 minutes, the Cortex synthesizes a fresh briefing — a structured summary of what's been discussed, what decisions were made, and what needs attention. This means your AI assistant actually knows what happened yesterday without you having to re-explain it.&lt;/p&gt;

&lt;p&gt;For teams, this is transformative. Instead of every team member bootstrapping context from scratch, Spacebot carries institutional memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  10 LLM Providers, All BYOK
&lt;/h2&gt;

&lt;p&gt;Spacebot supports &lt;strong&gt;10 LLM providers&lt;/strong&gt; out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI (GPT-4, GPT-4o)&lt;/li&gt;
&lt;li&gt;Anthropic (Claude Sonnet, Claude Opus)&lt;/li&gt;
&lt;li&gt;Google (Gemini)&lt;/li&gt;
&lt;li&gt;Groq&lt;/li&gt;
&lt;li&gt;Mistral&lt;/li&gt;
&lt;li&gt;DeepSeek&lt;/li&gt;
&lt;li&gt;Ollama (local models)&lt;/li&gt;
&lt;li&gt;And more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All plans are &lt;strong&gt;Bring Your Own Key (BYOK)&lt;/strong&gt; — Spacebot never touches your API keys beyond routing requests. You control costs, you choose models, you own the data.&lt;/p&gt;




&lt;h2&gt;
  
  
  OpenClaw Skills Compatibility
&lt;/h2&gt;

&lt;p&gt;Spacebot is compatible with &lt;strong&gt;OpenClaw skills&lt;/strong&gt; — the skill/tool ecosystem built for AI agents running in OpenClaw environments. This means if you've already built custom integrations or automations as OpenClaw skills, they drop straight into Spacebot without modification.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pod&lt;/td&gt;
&lt;td&gt;$29/mo&lt;/td&gt;
&lt;td&gt;Small teams, personal use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outpost&lt;/td&gt;
&lt;td&gt;$59/mo&lt;/td&gt;
&lt;td&gt;Growing teams, more resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nebula&lt;/td&gt;
&lt;td&gt;$129/mo&lt;/td&gt;
&lt;td&gt;Large teams, high-volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-host&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Full control via Docker&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All plans are BYOK. The self-hosted option via Docker gives you the full Spacebot experience on your own infrastructure — no subscription required if you're comfortable running your own stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  Self-Hosting
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull spacebot/spacebot:latest
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; \
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key \
  &lt;span class="nt"&gt;-v&lt;/span&gt; spacebot_data:/data \
  &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 \
  spacebot/spacebot:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The single-binary architecture means there's no complex orchestration setup. One container, one persistent volume, and you're running a full multi-agent AI platform.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who Is Spacebot For?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Engineering teams&lt;/strong&gt; that want a persistent AI assistant with real team memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Startups&lt;/strong&gt; looking for AI infrastructure that scales without rebuilding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosters&lt;/strong&gt; who want control over their AI stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord-based communities&lt;/strong&gt; that want embedded AI with memory and multi-agent reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Builders&lt;/strong&gt; who value Rust-quality reliability over Python-ecosystem convenience&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The current generation of AI assistants treats every session as a blank slate. That's fine for consumer apps, but it's a significant limitation for teams doing complex, ongoing work.&lt;/p&gt;

&lt;p&gt;Spacebot's bet is that the next generation of team AI tools will look less like chatbots and more like colleagues — entities with memory, independent thinking, and the ability to coordinate specialized work across multiple execution contexts.&lt;/p&gt;

&lt;p&gt;The three-agent architecture, the Cortex memory system, and the Rust foundation suggest a team that's building for that future rather than optimizing for today's demo.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://spacebot.sh" rel="noopener noreferrer"&gt;spacebot.sh&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
      <category>tools</category>
    </item>
    <item>
      <title>Composio: The Integration Layer Your AI Agents Have Been Waiting For</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:12:41 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/composio-the-integration-layer-your-ai-agents-have-been-waiting-for-5042</link>
      <guid>https://dev.to/arshkharbanda2010/composio-the-integration-layer-your-ai-agents-have-been-waiting-for-5042</guid>
      <description>&lt;h1&gt;
  
  
  Composio: The Integration Layer Your AI Agents Have Been Waiting For
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Building an AI agent today is surprisingly easy. You pick a model, wire up a few prompts, and you're off. The hard part — the part nobody warns you about — is connecting that agent to the real world.&lt;/p&gt;

&lt;p&gt;Sending an email. Creating a GitHub issue. Updating a Notion page. Syncing a Slack message. Each of these requires OAuth flows, token refresh logic, API-specific quirks, and error handling. Multiply that across 10 apps and suddenly your "simple AI agent" has become a full-time infrastructure project.&lt;/p&gt;

&lt;p&gt;That's the exact problem &lt;strong&gt;Composio&lt;/strong&gt; was built to solve.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Composio?
&lt;/h2&gt;

&lt;p&gt;Composio is an &lt;strong&gt;AI-native integration platform&lt;/strong&gt; that gives your LLMs and agents authenticated, production-ready access to 850+ applications — GitHub, Slack, Notion, Gmail, Jira, Linear, Salesforce, and hundreds more.&lt;/p&gt;

&lt;p&gt;Think of it as the integration layer between your intelligent agent and the world it needs to operate in. Instead of building OAuth flows from scratch or hand-rolling API wrappers, you get a clean, consistent SDK that handles all of it for you.&lt;/p&gt;

&lt;p&gt;The project is open-source (GitHub: &lt;a href="https://github.com/ComposioHQ/composio" rel="noopener noreferrer"&gt;ComposioHQ/composio&lt;/a&gt;), backed by a growing community, and available at &lt;a href="https://composio.dev" rel="noopener noreferrer"&gt;composio.dev&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: Auth Is a Nightmare
&lt;/h2&gt;

&lt;p&gt;Let's be honest about what "connecting an agent to an API" actually involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OAuth 2.0 flows&lt;/strong&gt; — authorization codes, PKCE, redirect URIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token storage&lt;/strong&gt; — securely persisting access/refresh tokens per user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token refresh&lt;/strong&gt; — silently refreshing before expiry without losing context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scopes management&lt;/strong&gt; — requesting the right permissions upfront&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenant support&lt;/strong&gt; — managing tokens for thousands of users, not just one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting and retries&lt;/strong&gt; — handling 429s gracefully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API versioning&lt;/strong&gt; — keeping up with breaking changes in third-party APIs&lt;/li&gt;
&lt;/ul&gt;
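&lt;p&gt;To make the pain concrete, here's roughly what just one item on that list, token refresh, looks like when you hand-roll it. This is an illustrative sketch with invented names (&lt;code&gt;TokenStore&lt;/code&gt;, &lt;code&gt;refresh_fn&lt;/code&gt;), not Composio code:&lt;/p&gt;

```python
import time

# Hypothetical in-memory token store; a real one needs encrypted persistence.
# Illustrative sketch only, not Composio code.
class TokenStore:
    def __init__(self):
        self._tokens = {}  # user_id -> {"access", "refresh", "expires_at"}

    def save(self, user_id, access, refresh, expires_in):
        self._tokens[user_id] = {
            "access": access,
            "refresh": refresh,
            "expires_at": time.time() + expires_in,
        }

    def get_valid_token(self, user_id, refresh_fn):
        """Return a usable access token, silently refreshing a minute early."""
        entry = self._tokens[user_id]
        if time.time() + 60 >= entry["expires_at"]:
            access, refresh, expires_in = refresh_fn(entry["refresh"])
            self.save(user_id, access, refresh, expires_in)
            entry = self._tokens[user_id]
        return entry["access"]
```

&lt;p&gt;And that's one provider, one tenant, no retries, no scope handling, and no secure storage. Multiply it across 850+ apps and the case for a managed layer makes itself.&lt;/p&gt;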

&lt;p&gt;Composio handles every single one of these. You just call &lt;code&gt;composio.tools.get("GITHUB")&lt;/code&gt; and get back tools your LLM can immediately use — authenticated, scoped, and ready.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Managed OAuth 2.0
&lt;/h3&gt;

&lt;p&gt;Composio handles the full auth flow on your behalf. You don't store tokens, you don't write callback handlers — Composio manages connected accounts per user and gives your agent clean, authorized API access.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. 850+ Toolkits
&lt;/h3&gt;

&lt;p&gt;From developer tools (GitHub, GitLab, Jira, Linear) to productivity apps (Notion, Google Drive, Calendar) to communication platforms (Slack, Discord, Gmail, Outlook) — if your agent needs to touch it, there's probably a Composio toolkit for it.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Python &amp;amp; TypeScript SDKs
&lt;/h3&gt;

&lt;p&gt;First-class SDKs for both ecosystems. Works natively with LangChain, LlamaIndex, CrewAI, AutoGen, and any framework that accepts tool definitions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;composio_langchain&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ComposioToolSet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Action&lt;/span&gt;

&lt;span class="n"&gt;toolset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ComposioToolSet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;toolset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GITHUB_CREATE_ISSUE&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your LangChain agent can now create GitHub issues on behalf of authenticated users.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Sandboxed Execution
&lt;/h3&gt;

&lt;p&gt;Actions run in an isolated execution environment, preventing accidental data leakage between users or between agent runs. Critical for multi-tenant production deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. MCP Support
&lt;/h3&gt;

&lt;p&gt;Composio supports the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;, meaning it integrates seamlessly with Claude via the MCP tool-use interface — no custom wiring required.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Observability
&lt;/h3&gt;

&lt;p&gt;Every tool call is logged. You can inspect inputs, outputs, latencies, and errors — giving you the visibility needed to debug agent behavior in production.&lt;/p&gt;
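&lt;p&gt;Conceptually, that kind of tracing is just a wrapper around every tool call. Here's a minimal generic sketch of recording inputs, outputs, latency, and errors (our own code, not Composio's internals):&lt;/p&gt;

```python
import functools
import json
import time

# Illustrative only: a minimal tool-call logger, not Composio's implementation.
def logged_tool(fn):
    """Record inputs, output, latency, and errors for every call to fn."""
    log = []

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        record = {
            "tool": fn.__name__,
            "input": json.dumps({"args": args, "kwargs": kwargs}, default=str),
        }
        try:
            result = fn(*args, **kwargs)
            record["output"] = repr(result)
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
            log.append(record)

    wrapper.log = log  # inspectable trace of every invocation
    return wrapper
```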




&lt;h2&gt;
  
  
  How It Compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Composio&lt;/th&gt;
&lt;th&gt;Nango&lt;/th&gt;
&lt;th&gt;Arcade&lt;/th&gt;
&lt;th&gt;LangChain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Integrations&lt;/td&gt;
&lt;td&gt;850+&lt;/td&gt;
&lt;td&gt;500+&lt;/td&gt;
&lt;td&gt;~25&lt;/td&gt;
&lt;td&gt;DIY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed Auth&lt;/td&gt;
&lt;td&gt;✅ Full OAuth&lt;/td&gt;
&lt;td&gt;✅ Data sync&lt;/td&gt;
&lt;td&gt;✅ Lightweight&lt;/td&gt;
&lt;td&gt;❌ None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent-first SDK&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (tools only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Support&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Source&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Nango&lt;/strong&gt; is excellent for data synchronization and has strong OAuth support, but it's not primarily designed for LLM agent tool-use. &lt;strong&gt;Arcade&lt;/strong&gt; is slim and MCP-native but limited in scope (~25 tools). &lt;strong&gt;LangChain&lt;/strong&gt; gives you the framework but leaves auth entirely to you.&lt;/p&gt;

&lt;p&gt;Composio sits squarely at the intersection of managed auth + agent-native tooling + breadth of integrations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;API Calls&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;20,000/mo&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Growth&lt;/td&gt;
&lt;td&gt;200,000/mo&lt;/td&gt;
&lt;td&gt;$29/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;2,000,000/mo&lt;/td&gt;
&lt;td&gt;$229/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Contact&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The free tier is genuinely useful for personal projects and prototypes. If you're building production agents at scale, the Growth plan gets you a lot of runway for $29/month.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who Should Use Composio?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI agent builders&lt;/strong&gt; who don't want to re-implement OAuth for the 10th time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Teams&lt;/strong&gt; building multi-tenant SaaS products with agent capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt; using LangChain, CrewAI, AutoGen, or LlamaIndex who need real-world integrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone using Claude via MCP&lt;/strong&gt; who wants 850+ tools out of the box&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;composio-core composio-langchain
composio login
composio add github  &lt;span class="c"&gt;# walks you through OAuth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there, you're one &lt;code&gt;get_tools()&lt;/code&gt; call away from a fully authorized, production-ready agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The bottleneck in agentic AI isn't intelligence — it's integration. Models are getting smarter fast. But an agent that can reason brilliantly and can't actually &lt;em&gt;do&lt;/em&gt; anything in the real world is just a very expensive chatbot.&lt;/p&gt;

&lt;p&gt;Composio removes that bottleneck. It's the glue layer that turns LLM capability into real-world action — authenticated, observable, and production-grade.&lt;/p&gt;

&lt;p&gt;If you're building agents and you're not using Composio (or something like it), you're probably spending 60% of your time on plumbing that someone else has already built.&lt;/p&gt;

&lt;p&gt;Stop writing OAuth flows. Start building agents.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://composio.dev" rel="noopener noreferrer"&gt;composio.dev&lt;/a&gt; | &lt;a href="https://github.com/ComposioHQ/composio" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>opensource</category>
      <category>tools</category>
    </item>
    <item>
      <title>Paperclip: The Open-Source Platform That Lets You Run a Company with AI Agents</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:00:29 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/paperclip-the-open-source-platform-that-lets-you-run-a-company-with-ai-agents-i7b</link>
      <guid>https://dev.to/arshkharbanda2010/paperclip-the-open-source-platform-that-lets-you-run-a-company-with-ai-agents-i7b</guid>
      <description>&lt;p&gt;What if you could hire a CEO, CTO, a team of engineers, and a marketing department — and none of them were human?&lt;/p&gt;

&lt;p&gt;That's exactly what &lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;Paperclip&lt;/a&gt; is designed for. It's an open-source orchestration platform that lets you build and run fully autonomous AI companies. Not just a chatbot. Not just an AI assistant. An actual organization — with an org chart, budgets, goals, reporting lines, and agents doing real work on a schedule.&lt;/p&gt;

&lt;p&gt;The tagline says it all: &lt;em&gt;"If OpenClaw is an employee, Paperclip is the company."&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Paperclip Solves
&lt;/h2&gt;

&lt;p&gt;Here's a scenario that's becoming increasingly common: You have 15 AI agents running simultaneously — Claude Code sessions, Codex workers, OpenClaw agents. Each one is capable of doing serious work autonomously. But there's no coordination. No shared context. No way to know what any of them are actually doing at a given moment.&lt;/p&gt;

&lt;p&gt;That was the exact situation Paperclip's creator found himself in. He was running an automated hedge fund and had 20 Claude Code tabs open. He couldn't remember what half of them were doing, and there was no system keeping them aligned toward a common goal. Paperclip was his solution — and he open-sourced it.&lt;/p&gt;

&lt;p&gt;The AI worker is no longer the bottleneck. Individual agents are capable enough. The bottleneck is &lt;strong&gt;coordination&lt;/strong&gt; — and that's exactly what Paperclip solves.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Paperclip is a Node.js server with a React dashboard. You self-host it. Setup is a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx paperclipai onboard &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there, the workflow is three steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Define the goal&lt;/strong&gt;&lt;br&gt;
Something like: &lt;em&gt;"Build the #1 AI note-taking app to $1M MRR."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Hire the team&lt;/strong&gt;&lt;br&gt;
You assign AI agents to roles — CEO, CTO, engineers, designers, marketers. Each agent can be from any provider: OpenClaw, Claude, Codex, Cursor, even a plain Bash script. The rule is simple: &lt;em&gt;if it can receive a heartbeat, it's hired.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Approve and run&lt;/strong&gt;&lt;br&gt;
Review the CEO agent's strategy. Set monthly token budgets per agent. Hit go. The agents work autonomously, delegate to each other, and report back through the org chart.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Architecture Under the Hood
&lt;/h2&gt;

&lt;p&gt;What makes Paperclip more than a fancy task manager is how it handles the hard problems of multi-agent coordination:&lt;/p&gt;
&lt;h3&gt;
  
  
  Goal Alignment
&lt;/h3&gt;

&lt;p&gt;Every piece of work — from a low-level coding task to a high-level project goal — traces back to the company mission. When an agent picks up a task, it knows not just &lt;em&gt;what&lt;/em&gt; to do but &lt;em&gt;why&lt;/em&gt; it matters. Context flows down the org chart automatically.&lt;/p&gt;
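&lt;p&gt;The mechanism behind that traceability is simple: a chain of parent links from any task up to the company mission. A toy sketch of the idea (names are ours, not Paperclip's):&lt;/p&gt;

```python
# Toy model of goal alignment: every task links upward to the company mission.
# Illustrative only, not Paperclip's data model.
class Node:
    def __init__(self, title, parent=None):
        self.title = title
        self.parent = parent

    def context_chain(self):
        """Walk up the goal tree so an agent sees not just what, but why."""
        chain, node = [], self
        while node:
            chain.append(node.title)
            node = node.parent
        return list(reversed(chain))  # mission first, concrete task last

mission = Node("Make $2M ARR with the #1 AI note-taking app")
project = Node("Ship mobile app v1", parent=mission)
task = Node("Fix sync bug on iOS", parent=project)
```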
&lt;h3&gt;
  
  
  Heartbeats
&lt;/h3&gt;

&lt;p&gt;Agents don't sit idle waiting for instructions. They wake up on a schedule, check their work queue, and act. Delegation flows up and down the org chart automatically — a task assigned to the CTO gets broken down and delegated to the engineering team without you touching anything.&lt;/p&gt;
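&lt;p&gt;A heartbeat scheduler of this kind can be sketched in a few lines. This is a generic simulation of the concept, not Paperclip's actual Node.js implementation:&lt;/p&gt;

```python
import heapq

# Illustrative heartbeat scheduler: each agent wakes on its own interval,
# drains its work queue, and goes back to sleep. Generic simulation only,
# not Paperclip's real code.
def run_heartbeats(agents, until):
    """agents: list of (interval, name, work_fn); simulate clock from 0 to `until`."""
    events = [(interval, interval, name, fn) for interval, name, fn in agents]
    heapq.heapify(events)
    fired = []
    while events and until >= events[0][0]:
        t, interval, name, fn = heapq.heappop(events)
        fn()  # wake up: check work queue, act, delegate
        fired.append((t, name))
        heapq.heappush(events, (t + interval, interval, name, fn))
    return fired
```

&lt;p&gt;With a content writer on a 4-hour beat and an SEO analyst on an 8-hour beat, the writer fires twice as often without anyone dispatching work to it.&lt;/p&gt;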
&lt;h3&gt;
  
  
  Cost Control
&lt;/h3&gt;

&lt;p&gt;Every agent has a monthly token budget. When they hit the limit, they stop. No runaway loops. No surprise bills. You can see cost breakdowns per agent, per task, per project.&lt;/p&gt;
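&lt;p&gt;The guard itself is conceptually tiny. An illustrative sketch (names and numbers are ours, not Paperclip's API):&lt;/p&gt;

```python
# Illustrative token-budget guard in the spirit of Paperclip's cost control;
# not the project's actual code.
class BudgetExceeded(Exception):
    pass

class Agent:
    def __init__(self, name, monthly_token_budget):
        self.name = name
        self.budget = monthly_token_budget
        self.spent = 0

    def charge(self, tokens):
        """Record usage; refuse to run once the monthly budget is exhausted."""
        if self.spent + tokens > self.budget:
            raise BudgetExceeded(f"{self.name} is over budget, pausing until next month")
        self.spent += tokens
        return self.budget - self.spent  # remaining runway
```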
&lt;h3&gt;
  
  
  Org Chart + Governance
&lt;/h3&gt;

&lt;p&gt;Agents have titles, roles, reporting lines, and a boss. You're the board. You can approve hires, override strategy, pause any agent, or terminate one at any time. Full audit log of every decision.&lt;/p&gt;
&lt;h3&gt;
  
  
  Ticket System
&lt;/h3&gt;

&lt;p&gt;Every conversation is threaded to a task. Sessions persist across reboots — so when an agent wakes up for its next heartbeat, it picks up exactly where it left off.&lt;/p&gt;
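&lt;p&gt;The persistence trick is equally simple in concept: write the thread to disk after every turn, and reload it on the next wake-up. A toy sketch, not Paperclip's real storage layer:&lt;/p&gt;

```python
import json
import os
import tempfile

# Illustrative session persistence: a ticket's thread survives restarts
# because it is flushed to disk after every message. Toy sketch only.
class Ticket:
    def __init__(self, path, task):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:  # resume exactly where we left off
                self.state = json.load(f)
        else:
            self.state = {"task": task, "messages": []}

    def append(self, role, text):
        self.state["messages"].append({"role": role, "text": text})
        with open(self.path, "w") as f:  # persist after every turn
            json.dump(self.state, f)
```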
&lt;h3&gt;
  
  
  Multi-Company Support
&lt;/h3&gt;

&lt;p&gt;One Paperclip deployment can manage multiple companies with complete data isolation.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Concrete Example
&lt;/h2&gt;

&lt;p&gt;Here's what a Paperclip company structure might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Company Mission: Make $2M ARR with the #1 AI note-taking app

CEO → Claude          ($60/mo budget)
├── CMO → OpenClaw    ($40/mo budget)  
│   ├── Content Writer (every 4h)
│   ├── SEO Analyst    (every 8h)
│   └── Social Manager (every 12h)
├── CTO → Cursor      ($50/mo budget)
│   ├── Coder 1 → Codex
│   └── Coder 2 → Claude
└── COO → Claude      ($30/mo budget)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Coming Soon: Clipmart
&lt;/h2&gt;

&lt;p&gt;Paperclip has a marketplace in the pipeline — &lt;strong&gt;Clipmart&lt;/strong&gt; — where you can download and run an entire pre-built company with one click. Full org structures, agent configs, and skills bundled as templates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It Falls Short
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosted&lt;/strong&gt; — you manage the infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent quality still matters&lt;/strong&gt; — Paperclip handles coordination, not capability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex workflows need careful org design&lt;/strong&gt; — heartbeat intervals and reporting structure require experimentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community is early-stage&lt;/strong&gt; — ecosystem of pre-built templates is still thin&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who Should Be Looking at This
&lt;/h2&gt;

&lt;p&gt;If you're already running multiple AI agents for development, content, research, or business automation — Paperclip is the coordination layer that's been missing.&lt;/p&gt;

&lt;p&gt;The individual AI worker is solved. The organizational layer is just getting started.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;https://github.com/paperclipai/paperclip&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://paperclip.ing/docs" rel="noopener noreferrer"&gt;https://paperclip.ing/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Quickstart: &lt;code&gt;npx paperclipai onboard --yes&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Composio: The Integration Layer Your AI Agent Actually Needs</title>
      <dc:creator>Arshdeep Singh</dc:creator>
      <pubDate>Mon, 16 Mar 2026 13:46:31 +0000</pubDate>
      <link>https://dev.to/arshkharbanda2010/composio-the-integration-layer-your-ai-agent-actually-needs-1o5h</link>
      <guid>https://dev.to/arshkharbanda2010/composio-the-integration-layer-your-ai-agent-actually-needs-1o5h</guid>
      <description>&lt;p&gt;Building an AI agent is easy. Connecting it to Gmail, Slack, GitHub, and Salesforce with proper OAuth and error handling? That's where most projects die.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Composio?
&lt;/h2&gt;

&lt;p&gt;Composio is AI-native integration middleware — it sits between your agent and the outside world. 850+ pre-built connectors, fully managed OAuth, sandboxed execution, and native observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Problem It Solves
&lt;/h2&gt;

&lt;p&gt;Every API integration is its own project: different auth schemes, different error formats, different rate limits. Your agent's context window fills up with tool definitions. Composio abstracts all of this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;850+ Agent-Optimized Toolkits&lt;/strong&gt; — GitHub, Slack, Gmail, Notion, Jira, HubSpot, and hundreds more. Each connector is built for LLM consumption — descriptions crafted so models understand when and how to use each action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fully Managed Auth&lt;/strong&gt; — OAuth 2.0 end-to-end. Token storage, refresh cycles, scope management. Triggered inline when needed, not pre-configured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandboxed Execution&lt;/strong&gt; — Remote ephemeral sandboxes. Multi-step workflows with sub-LLM invocations. Large responses stored on a navigable filesystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native Observability&lt;/strong&gt; — Every tool call logged and traceable. Replay exactly what happened when your agent does something unexpected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;composio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Composio&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Composio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github_create_issue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;owner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-org&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Feature request from agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Free: 20K tool calls/month&lt;/li&gt;
&lt;li&gt;$29/mo: 200K calls, email support&lt;/li&gt;
&lt;li&gt;$229/mo: 2M calls, Slack support&lt;/li&gt;
&lt;li&gt;Enterprise: Custom, SOC-2, VPC&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Composio vs Alternatives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Composio&lt;/th&gt;
&lt;th&gt;Nango&lt;/th&gt;
&lt;th&gt;Arcade&lt;/th&gt;
&lt;th&gt;LangChain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Connectors&lt;/td&gt;
&lt;td&gt;850+&lt;/td&gt;
&lt;td&gt;500+&lt;/td&gt;
&lt;td&gt;~25&lt;/td&gt;
&lt;td&gt;Community&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed Auth&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌ BYO&lt;/td&gt;
&lt;td&gt;❌ DIY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://composio.dev" rel="noopener noreferrer"&gt;https://composio.dev&lt;/a&gt; | &lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://docs.composio.dev" rel="noopener noreferrer"&gt;https://docs.composio.dev&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written by Arshdeep Singh&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>devtools</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
