DEV Community: Josh Adler

Everyone's Launching Wrappers. Nobody's Going Deep.

Josh Adler — Thu, 04 Jun 2026 22:45:17 +0000

You know that feeling when you open someone's "AI-powered" product, view source, and realize the entire intelligence layer is a single API call with a system prompt? I get that feeling about four times a week now.

I'm building memory for AI agents. Not the kind where you shove conversation logs into a vector database and run similarity search, which is what every tutorial teaches and every wrapper product ships. The kind where you actually measure whether retrieval works, find out it doesn't, and spend three months fixing it instead of launching.

Here's what "going deep" looks like day to day, because I think people romanticize it and the reality is mostly spreadsheets and DoorDash at 2am.

The boring part nobody shows you

The first thing I did was run a benchmark against my own retrieval pipeline. Ground-truth questions, known correct answers. The results were bad. Not "needs tuning" bad. The system was confidently retrieving wrong memories and missing obvious temporal references, confusing things said last week with things said six months ago, mixing up which person said what in multi-party conversations.

I categorized 357 failures by hand. Two weeks of reading each failed retrieval and classifying why. The finding: 92% of failures were retrieval failures, not reasoning failures. The data was in the database. The search couldn't find it.

I confirmed with an oracle test. Bypassed retrieval, gave the model the full conversation as context. Accuracy jumped to 93.8%. The information was always there. The search layer was broken. The entire field was focused on improving the reasoning layer while the retrieval layer underneath was silently failing, and nobody had checked because the failures are invisible. The system returns results. They just happen to be the wrong results.
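The hand classification above can be roughed out in code for a first pass. This is only a sketch, not the method I used: it assumes each benchmark question records which stored memories contain the answer (`gold_memory_ids`), and every name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    question_id: str
    gold_memory_ids: set[str]   # memories known to contain the answer (assumed field)
    retrieved_ids: list[str]    # what the search layer actually returned
    answer_correct: bool        # graded against the known correct answer


def classify(outcome: Outcome) -> str:
    """Label one result as ok, a retrieval failure, or a reasoning failure."""
    if outcome.answer_correct:
        return "ok"
    # None of the gold evidence reached the model: the search layer is at fault.
    if not outcome.gold_memory_ids & set(outcome.retrieved_ids):
        return "retrieval_failure"
    # The evidence was in context and the answer was still wrong.
    return "reasoning_failure"


def failure_breakdown(outcomes: list[Outcome]) -> dict[str, float]:
    """Share of each failure type, computed over failed questions only."""
    failures = [label for label in map(classify, outcomes) if label != "ok"]
    if not failures:
        return {}
    return {kind: failures.count(kind) / len(failures) for kind in sorted(set(failures))}
```

A pass like this only narrows things down. Reading the failures by hand is still what tells you why the search missed.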

So then I needed to understand how much the embedding model and reranker choice mattered. I built a test rig: 7 embedding models crossed with 8 rerankers, 56 combinations, each evaluated against 1,540 ground-truth questions. About 26,000 total evaluations.

Nobody had published this comparison before. The reason is simple: it's tedious work with no shortcut. You configure, run, wait, record, repeat. For weeks.
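For a sense of the shape of that rig, here is a minimal sketch of a grid run. The `evaluate` callback is a stand-in (an assumption, not my actual harness) for "run this question through this embedder and reranker and grade the answer".

```python
from itertools import product
from typing import Callable, Hashable, Sequence


def run_grid(
    embedders: Sequence[str],
    rerankers: Sequence[str],
    questions: Sequence[Hashable],
    evaluate: Callable[[str, str, Hashable], bool],
) -> dict[tuple[str, str], float]:
    """Accuracy for every (embedder, reranker) pair over the same question set."""
    if not questions:
        raise ValueError("need at least one ground-truth question")
    results = {}
    for emb, rr in product(embedders, rerankers):
        correct = sum(1 for q in questions if evaluate(emb, rr, q))
        results[(emb, rr)] = correct / len(questions)
    return results


def spread(results: dict[tuple[str, str], float]) -> float:
    """Gap between the best and worst combination."""
    return max(results.values()) - min(results.values())
```

The loop is trivial. The cost is that every cell means a full re-embed and re-run, which is where the weeks go.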

What the data showed

The spread across all 56 combinations was 3.2 percentage points (89.9% to 93.1%). Most products never test a single combination. They use whatever the tutorial picked.

The finding that broke my brain: a $0.40-per-million-token model with 100 retrieved memories beat a $15-per-million-token model with 15 retrieved memories. The cheap model with better retrieval recovered 82% of errors. The expensive model with worse retrieval recovered 54%. Retrieval quality dominated model quality completely. Optimizing your search pipeline was worth more than a model upgrade costing roughly 37 times as much.

I also found a silent bug in my own code during this process. A script was loading MiniLM instead of the GTE ModernBERT reranker I'd configured. No error, no warning. Just quietly wrong. If I hadn't been running ground-truth benchmarks I never would have caught it. This exact type of misconfiguration is sitting in production systems that have never been tested against known correct answers.
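A cheap guard against this class of bug is to compare what the config asked for with what actually got loaded, and fail loudly on a mismatch. A minimal sketch, assuming your loader exposes some name or identifier for the loaded model (the function and the example names are hypothetical):

```python
def assert_model_matches(configured_name: str, loaded_name: str) -> None:
    """Raise if the model that was loaded is not the one the config asked for."""
    if configured_name.strip().lower() != loaded_name.strip().lower():
        raise RuntimeError(
            f"configured model {configured_name!r} but loaded {loaded_name!r}"
        )
```

Call it right after the model loads, before any evaluation runs, so a silent fallback becomes a crash instead of a quietly wrong benchmark.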

What "going deep" actually means in practice

It means choosing SQLite over Pinecone and everyone thinking you're not serious. But the constraint forced a hybrid search pipeline (sparse FTS5 plus dense vector search, reciprocal rank fusion, cross-encoder reranking) that runs on a Raspberry Pi for $12/month. The whole system scores within 3 points of setups requiring $150 to $400/month in GPU infrastructure. One file, no cluster, no excuses. If retrieval breaks, the architecture broke, and you fix the actual problem instead of blaming infrastructure.
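Reciprocal rank fusion is the piece that merges the sparse and dense result lists, and it's small enough to show. This is a generic sketch of the standard technique, not the TrueMemory code. The constant k=60 is the commonly used default, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sparse_hits = ["a", "b", "c"]   # e.g. from a full-text query
dense_hits = ["b", "c", "d"]    # e.g. from vector search
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

RRF only looks at ranks, never raw scores, so you don't have to normalize BM25 scores against cosine similarities. The fused list is what you would hand to the cross-encoder reranker.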

It means reading neuroscience papers about how the hippocampus filters incoming memories and building a three-signal encoding gate (novelty, salience, prediction error) instead of just storing everything and hoping retrieval sorts it out. Your brain doesn't record everything, it runs a filter first, and that's not a limitation, it's the mechanism that makes retrieval work. Less noise going in means better results coming out. The benchmarks supported this approach.

It means writing a research paper and getting it on arXiv instead of shipping the next feature. The paper (arXiv:2605.04897) has methodology, controlled benchmarks, and reproducible results. If the claims were going to hold up, the data had to be public.

That's the foundation TrueMemory is built on. Research first, product second.

Why this matters for builders

Anthropic is shipping native memory for Claude. OpenAI is building memory into ChatGPT. Google's Gemini remembers conversations. Every platform is adding memory as a checkbox feature.

When the platform ships a native version of your wrapper, you die. Not because their version is better but because it's already installed and free. Meeting summarizers learned this last year when Zoom, Meet, and Teams all shipped native summarization within months of each other. The platform doesn't need a good version, just a good enough version with better distribution than you'll ever have. The bar for survival is high.

If you're building an AI product, here's my suggestion: run a benchmark. A real one, with ground-truth answers. Measure whether your retrieval actually returns correct results or just plausible-looking ones. You might not like what you find, but at least you'll know what you're shipping.

Everyone's launching wrappers. The ones that survive will be the ones that went deep enough to own something real.

Josh Adler is a researcher at TrueMemory, a Sauron company. Research: arXiv:2605.04897. More at joshadler.com.

Your Brain Doesn't Have a Paste Button

Josh Adler — Tue, 02 Jun 2026 08:08:32 +0000

I maintained a file called skippy-context-may.md for months. Four hundred lines of project state, architectural decisions, tool versions, things that broke, things I fixed. Every new AI session started the same way: open the file, select all, paste, then fill in whatever happened since my last edit. It was fifteen minutes of overhead every single day and I told myself it was just part of the workflow.

Then I automated it away and realized how much time I'd been wasting. Six hours a month, minimum, on a ritual that felt productive but was really just me being my AI's secretary.

How automatic ingestion works in practice

The system runs four hooks during normal Claude Code usage. One fires at session start, one on prompt submission, one on stop, one on compaction. The stop hook is the one that does the heavy lifting. After a session ends, it grabs the full conversation transcript and runs an extraction pipeline against it.
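Roughly, the hook side can be a single script that reads the event payload and dispatches on the event type. This is a sketch under assumptions: the event names and the `hook_event_name` and `transcript_path` fields follow my reading of the Claude Code hooks documentation and should be checked against it. The action labels are hypothetical placeholders, not what TrueMemory actually runs.

```python
def handle_event(payload: dict) -> str:
    """Map one hook payload to the action the memory system should take."""
    event = payload.get("hook_event_name", "")
    if event == "SessionStart":
        return "load_memories"                 # hypothetical action label
    if event == "UserPromptSubmit":
        return "retrieve_for_prompt"           # hypothetical action label
    if event == "Stop":
        # The heavy one: hand the transcript to the extraction pipeline.
        return f"extract_from:{payload.get('transcript_path', '')}"
    if event == "PreCompact":
        return "snapshot_before_compaction"    # hypothetical action label
    return "ignore"
```

In a real hook you would parse the payload with `json.load(sys.stdin)` and pass it in. Keeping the dispatch as a pure function makes it easy to test without a live session.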

The extraction pulls seven categories of information out of the raw text:

Personal facts (stable things about the user)
Preferences (choices you've expressed)
Decisions (conclusions you reached during a session)
Corrections (moments you changed a prior answer)
Temporal events (dates, deadlines, things tied to specific days)
Technical context (configs, architecture choices, tool versions)
Project state (what's working, what's broken, what's next)

Each category matters because it serves a different retrieval pattern later. When the system needs to know what tools you use, it pulls preferences and technical context. When you're about to make an architecture decision, it surfaces prior decisions. When you corrected yourself three weeks ago, the correction overrides the old answer instead of competing with it.
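As a data structure, that might look something like the sketch below. The seven categories come from the list above. The routing table, the `Memory` fields, and the newest-wins rule are my simplifications for illustration, not the real schema.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    PERSONAL_FACT = "personal_fact"
    PREFERENCE = "preference"
    DECISION = "decision"
    CORRECTION = "correction"
    TEMPORAL_EVENT = "temporal_event"
    TECHNICAL_CONTEXT = "technical_context"
    PROJECT_STATE = "project_state"


# Which categories a given kind of question pulls from (hypothetical routing table).
ROUTES = {
    "tooling": {Category.PREFERENCE, Category.TECHNICAL_CONTEXT},
    "architecture_decision": {Category.DECISION},
}


@dataclass
class Memory:
    key: str            # what the memory is about, e.g. "pkg_manager"
    value: str
    category: Category
    day: int            # larger means more recent


def resolve(memories: list[Memory]) -> dict[str, str]:
    """One value per key: the newest entry wins, so a correction replaces the old answer."""
    current: dict[str, Memory] = {}
    for memory in sorted(memories, key=lambda m: m.day):
        current[memory.key] = memory
    return {key: memory.value for key, memory in current.items()}
```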

The encoding gate: what gets stored and what gets dropped

Raw extraction produces too many candidates. Fifteen sessions a day across three projects means hundreds of potential memories, most of which are noise. "Fix the indentation on line 47" is not worth storing. "The indentation convention for this project is tabs" probably is.

The encoding gate handles the filtering. Three signals score each candidate memory:

Novelty - how different is this from what we already know?
Salience - how important is this information?
Prediction error - does this contradict something we already stored?

The scores combine into a threshold decision. Clear the gate, you get stored. Don't clear it, you get dropped. Not archived, dropped. Because storing everything makes retrieval worse, not better. Every stored memory competes with every other memory during search.

The prediction error signal does something counterintuitive with contradictions. When a new memory contradicts an existing one, the prediction error spikes and actually lowers the storage threshold. Contradictions get stored more easily because they mean the user changed their mind. I said I preferred npm in March. By May I was using Bun and never explicitly said "I switched." The system caught the behavioral shift and encoded the new preference without me having to announce it.
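Here is a toy version of the gate to make the mechanics concrete. The weights, the base threshold, and the way the signals combine are all assumptions for illustration. The only parts taken from the description above are the three signals and the fact that prediction error lowers the bar.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    novelty: float           # 0..1, how different from what is already stored
    salience: float          # 0..1, how important the information is
    prediction_error: float  # 0..1, how strongly it contradicts a stored memory


def should_store(
    s: Signals,
    base_threshold: float = 0.6,          # hypothetical value
    contradiction_discount: float = 0.3,  # hypothetical value
) -> bool:
    """Store when the combined score clears a threshold that contradictions lower."""
    score = (s.novelty + s.salience) / 2
    threshold = base_threshold - contradiction_discount * s.prediction_error
    return score >= threshold
```

With these made-up numbers, a middling candidate like `Signals(0.4, 0.5, 0.0)` gets dropped, while the same candidate with a prediction error of 0.9 clears the gate. That is the npm-to-Bun case: not especially novel or salient on its own, but it contradicts what was stored.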

What this looks like after a month

By day three I had over two hundred extracted memories from about a dozen sessions. Zero saved manually. The skippy-context-may.md file was already stale and I kept opening it out of habit before realizing the system already knew everything in there plus a hundred things I never wrote down.

The real proof came around week three. I was debugging a Pi node that kept dropping its NAS connection every two hours. Couldn't figure it out. Before I'd even finished describing the problem, the system surfaced a memory from eleven days earlier: during an unrelated router configuration session, I'd mentioned I changed the DHCP lease time from 24 hours to 2 hours. One throwaway sentence. The system stored it as technical context, and eleven days later it turned out to be the root cause.

By week four the system was flagging contradictions in my own decisions. I'd said I wanted SQLite for everything early on, then started quietly exploring Postgres for one specific use case. When I was making a related architecture decision, the system surfaced both positions and asked if I wanted to update the earlier one. A context file can't do that. It doesn't know the difference between you changing your mind and you forgetting what you decided.

Cross-project connections are the other thing you can't replicate manually. A debugging insight from one project showed up as relevant context in a different project weeks later because the underlying pattern was the same. That only works because TrueMemory stores memories without scoping them to a single project.

Why most products skip ingestion

Honestly, it's because search is easier. Vector embeddings, reranking, RAG pipelines, these are well-understood problems with a dozen open-source implementations. You can get 86% on LoCoMo with off-the-shelf tools.

Ingestion is a judgment problem. You're deciding what's worth keeping before you know what future query will need it. That's architecturally harder than matching queries to documents. It requires real-time evaluation of novelty, salience, and contradiction state, plus scale handling as the memory store grows.

Most companies skip the hard part and build better search on top of whatever the user manually saves. It works, but it's not memory, it's a notebook with good search. The full architecture, including the encoding gate design and benchmark results, is in the arXiv paper.

Your brain doesn't have a paste button. It doesn't need one. And after a month of running a system that actually handles ingestion automatically, I can't go back to the old way.

The Best Technology Disappears

Josh Adler — Sun, 31 May 2026 05:56:08 +0000

Your keyboard app is the most important app on your phone and you have never once thought about it. You don't know what version it's on. You've never read its changelog. You couldn't name a single feature it shipped in the last year. It just works, and that's the whole point, and also that's the highest bar any piece of technology can clear.

The technologies that actually changed how people live all share one trait: they disappeared. Not failed, not faded. They got so good at solving their problem that users stopped noticing them entirely.

The progression every technology follows

Every category that truly won followed the same four-stage path.

Visible. You notice it. You learn it. You fight with it. Early GPS was like this. Staring at the screen, second-guessing routes, squinting at maps.

Useful. You start relying on it but you're still aware. You trust the GPS but you glance at the route before driving.

Habitual. You stop questioning it. You follow the blue line without thinking. You click the first Google result without scanning alternatives.

Invisible. You stop experiencing the technology entirely. You experience the outcome. Not GPS but the turn. Not Google but the answer. Not autocorrect but a correct text message.

That last step is where the value is. Google didn't become a $1.7 trillion company because it had a clean UI. It got there because the first result was usually right, which meant you never thought about search. You typed, got your answer, moved on. The entire valuation traces back to the experience of not thinking about Google while using Google.

Why most products never get there

Most products don't even try to disappear. They actively resist it. Every notification is the system saying "remember me." Every loading screen, onboarding tooltip, rating prompt, and "what's new" modal is the product waving its hand at you when the ideal outcome would be you forgetting it exists.

Every one of those moments is a design failure. Not a marketing opportunity. A failure. Because in that instant the user became conscious they're using a tool instead of just doing the thing they wanted to do.

Disappearing requires solving every edge case. Not most, all. One bad autocorrect pulls you out. One wrong reroute makes you aware of satellites. One buffering spinner breaks the spell. The tech has to be right every time or close enough that the misses feel like flukes. That is an absurdly high bar, which is why the products that clear it tend to be worth hundreds of billions of dollars.

As a builder, this changes how you think

The key insight for anyone shipping product: invisibility is not polish. It's not a thing you add after engineering is done. It's the design philosophy from the first commit.

You have to architect for disappearance. You have to solve the problem so completely that there's nothing left for the user to think about. TikTok's recommendation engine is one of the most sophisticated ML systems in production anywhere, transformer models, reinforcement learning, multi-armed bandits, and they never show you any of it. Because showing it would break the experience. The magic requires the magician to vanish.

If you're building developer tools, this applies directly. The best CLI is the one nobody notices running. The best CI pipeline is the one developers forget exists because it just catches things. The best monitoring is the alert you never see because the system healed itself. The best linter is the one that fixed the problem before you knew there was a problem.

Think about git for a second. You probably use it fifty times a day and never think about directed acyclic graphs or Merkle trees or content-addressable storage. It disappeared. The technology is invisible and all you experience is: my code is saved, my changes are tracked, I can go back if something breaks. That's the invisible stack doing its job.

AI hasn't disappeared yet, and that's the gap

Every AI product right now is stuck firmly in the "visible" stage. Chatbots. Prompts. Copy-paste workflows. Context windows you manage manually. System prompts you write and rewrite. Every interaction announces: you are using AI.

Some are approaching useful. A few are getting habitual. None have vanished. And the reason is telling. Most AI tools are built around the chat interface, which is the technology making itself visible by design. The prompt box is a loading screen. The conversation thread is a changelog you didn't ask for. Every "how can I help you" is the system reminding you it exists.

This is what drives the work at TrueMemory. The question isn't "how do we build a better memory tool" but "how do we build a memory system the user forgets about." The architecture follows biological memory patterns: an encoding gate that filters before storage, automatic novelty and salience scoring, natural decay that keeps things manageable. The full system is detailed in the research paper but honestly the thesis is one sentence: if you notice the memory system, it failed.
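"Natural decay" can be as simple as discounting a memory's weight by its age. A minimal sketch, assuming exponential decay with a half-life. The form of the curve and the 30-day default are my illustration here, not numbers from the paper.

```python
def decayed_score(salience: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Weight of a memory after age_days, halving every half_life_days."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return salience * 0.5 ** (age_days / half_life_days)
```

The point of doing it this way is that nothing has to be pruned by hand. Old, unreinforced memories just sink in the ranking until they stop surfacing.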

Every feature gets one test. Does this make the system more visible or less? If the user has to remember to save something, the design isn't done. If they have to open an app to store context that should have been captured automatically, the design isn't done. Every manual step is the technology announcing itself.

The trajectory is the same one GPS and search and autocorrect followed. The first personal AI system that crosses from habitual to invisible wins everything.

That's the only step that matters.

The Real Moat Isn't Software

Josh Adler — Sat, 30 May 2026 02:00:39 +0000

Last month I ripped five 64MP cameras out of a wall-mounted sensor network and replaced them with 12MP ones. Downgrade on paper. Best decision I made all year.

The Problem Nobody Is Solving

Your AI knows what you type. That's it. Every piece of context, every preference, every behavioral pattern your AI has about you came through a text box. You manually told it, during a conversation you chose to have, about a topic you remembered to bring up.

Meanwhile the stuff that actually defines your behavior is invisible to you. You don't notice that you pace when you're anxious. You don't track how long you actually sit at your desk versus how long you think you do. You tell your AI you work out four times a week when you go twice.

The models are smart enough. The input layer is broken.

What I Built: Paradox

Five nodes. Each one is a Raspberry Pi Zero 2W ($15), an ArduCam IMX708 12MP camera with 120-degree FOV, and a WM8960 audio HAT for microphone capture. About $100 per node, $500 total.

Each node runs a custom Python daemon that handles:

Motion detection on a low-res 320x240 stream
Audio detection via the WM8960
Triggered recording at 1280x720 @ 15fps when motion or audio fires
MJPEG and H.264 streaming to a NAS
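The trigger logic in that daemon boils down to a small state machine. This is a simplified sketch rather than the daemon itself: the cooldown value and the names are assumptions, and the actual camera and encoder calls are left out.

```python
from dataclasses import dataclass


@dataclass
class Recorder:
    cooldown_s: float = 10.0              # keep recording this long after the last trigger
    recording: bool = False
    last_trigger: float = float("-inf")

    def update(self, now: float, motion: bool, audio: bool) -> str:
        """Feed one detector reading; returns 'start', 'stop', or 'none'."""
        if motion or audio:
            self.last_trigger = now
            if not self.recording:
                self.recording = True
                return "start"
            return "none"
        if self.recording and now - self.last_trigger >= self.cooldown_s:
            self.recording = False
            return "stop"
        return "none"
```

The cooldown matters more than it looks. Without it, a person standing still for two seconds chops one event into a dozen tiny clips.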

Inference runs on an RTX 5090 on my local network. The whole thing fits on a desk... well, kinda, minus the cameras on the walls.

The Camera Saga

I started with OwlSight 64MP sensors using the ov64a40 driver. On paper, incredible. In practice, a nightmare.

The Pi Zero 2W would thermal throttle within twenty minutes. I'm talking 80°C+ temps on a board that draws 4W under camera load. The dtoverlay configuration needed a specific link-frequency parameter (link-frequency=360000000) that I spent entire nights debugging. One node would initialize fine, an identical SD card image on the next node would fail. The answer was always something dumb: a loose ribbon cable, a kernel version mismatch, a PSU that couldn't sustain the current draw.

I eventually switched everything to the IMX708 with a simple dtoverlay=imx708 config. Less flashy specs, dramatically more stable. The boring choice was the right choice.

If you're building hardware: optimize for "does it actually work at 3am when nobody's watching," not for the spec sheet.

What the Data Showed

Within the first week, the system captured patterns I never would have typed into a chat window. Movement patterns through my apartment, actual sleep schedule versus what I'd report, real desk time versus perceived desk time. One hour of physical observation generates more behavioral data than a year of chat transcripts.

That's not an exaggeration. That's the gap.

The Three-Layer Stack

Here's the framework that I keep coming back to:

Layer 1: Observation. Getting data from the physical world into a format AI can process. Cameras, microphones, sensors, wearables. This is what Paradox does.

Layer 2: Memory. Taking raw observational data plus conversational data and encoding it intelligently. Deciding what matters, letting stale information decay, surfacing the right context at the right time. This is what I built TrueMemory to solve. The architecture is in my arXiv paper, and it's based on how biological memory actually works: encoding gates, salience scoring, temporal decay.

Layer 3: Reasoning. The LLM. Claude, GPT, whatever comes next.

Right now, billions of dollars are flowing into Layer 3. Anthropic, OpenAI, Google, all building better reasoning engines. And they're getting incredible. But Layer 3 is reasoning on top of almost nothing because Layers 1 and 2 barely exist.

It's like building the most powerful engine in the world and putting it in a car with no windows.

What Developers Should Take Away

Software wrappers get replicated in a weekend. A better RAG pipeline, a smarter reranking algorithm, a novel encoding gate, those are all real innovations but they're also all just code. Somebody reads your paper, understands the approach, ships their own version.

Hardware can't be replicated like that. The physical deployment, sensor calibration, months of debugging driver conflicts and thermal issues and network topology, that's a different kind of moat entirely.

If you're looking for an interesting project:

Start with a single Pi Zero 2W and an IMX708. Total cost under $50. Get motion detection working with picamera2 and a basic frame-differencing algorithm.
Ship the data somewhere useful. A NAS, a cloud bucket, even a local SSD. The storage pipeline matters more than the capture quality.
Build the memory layer. Don't just store raw footage. Extract behavioral patterns, encode them, make them searchable. This is the hard part and the interesting part.
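For the first step, basic frame differencing fits in a few lines. This sketch works on flat 8-bit grayscale frames passed as bytes, and the thresholds are guesses you would tune per room. Getting frames out of picamera2 is not shown, and on a real Pi you would probably do the arithmetic with NumPy instead of a Python loop.

```python
def motion_fraction(prev: bytes, cur: bytes, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose brightness changed by more than pixel_delta."""
    if len(prev) != len(cur) or not cur:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, cur) if abs(a - b) > pixel_delta)
    return changed / len(cur)


def motion_detected(
    prev: bytes,
    cur: bytes,
    pixel_delta: int = 25,       # per-pixel noise floor (assumed value)
    min_fraction: float = 0.02,  # share of the frame that must change (assumed value)
) -> bool:
    """True when enough of the frame changed between two consecutive captures."""
    return motion_fraction(prev, cur, pixel_delta) >= min_fraction
```

Running this on a low-res stream keeps the per-frame cost small enough for a Pi Zero, and the high-res recording only starts once it fires.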

The observation layer is the missing piece in AI. Everyone is building smarter reasoning on top of the same garbage input. Nobody is fixing the input.

Honest Limitations

The Pi Zero 2W draws about 1.5W idle but spikes to nearly 4W under camera load. Battery operation is not realistic. These need to be plugged in.

Five cameras at 15fps generates a lot of data. Even with motion-triggered recording, my NAS fills up faster than I'd like. I spent a week building a cleanup pipeline just to keep storage from overflowing.
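The core of a cleanup pass like that is deciding what to delete, which is easy to keep separate from the actual deleting. A sketch of an oldest-first policy under a storage budget follows. The policy and the names are illustrative, not the pipeline I run.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    path: str
    size_bytes: int
    mtime: float   # modification time, larger means newer


def clips_to_delete(clips: list[Clip], budget_bytes: int) -> list[str]:
    """Oldest-first list of clip paths to remove until the total fits the budget."""
    total = sum(clip.size_bytes for clip in clips)
    doomed: list[str] = []
    for clip in sorted(clips, key=lambda c: c.mtime):
        if total <= budget_bytes:
            break
        doomed.append(clip.path)
        total -= clip.size_bytes
    return doomed
```

Returning paths instead of deleting inside the function means you can log or dry-run the plan first, which is worth it when the thing being deleted is months of footage.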

And there's the social cost. My girlfriend didn't talk to me for two days after I installed the cameras. We worked it out, there are zones now, rooms where the cameras don't run. But social acceptability is a constraint as hard as any engineering limitation. You can't debug your way out of it.

The Point

Nobody is going to win the AI race by building a better chat interface. The chat interface is a temporary artifact of the fact that we haven't figured out how to get AI into the room with you.

I don't have this figured out. I have five cameras generating data I'm still learning to process, a NAS that fills up too fast, and a lot of 2am debugging sessions behind me. But I know the moat isn't who builds the cleverest wrapper. It's who gets AI into the physical world first.

That's a hardware problem. And it's a lot harder than fine-tuning a prompt template.

Everyone's Building Jarvis. Nobody's Even Close.

Josh Adler — Tue, 26 May 2026 02:07:11 +0000

A Swiss Army knife is a terrible knife.

It's a terrible screwdriver, a terrible bottle opener, and a terrible saw. The only thing a Swiss Army knife is genuinely good at is being small enough to carry around, and "small enough to carry" is not a strong engineering thesis.

But that's exactly what everyone in AI is building right now: the everything assistant. The one that reads your email, manages your calendar, writes your code, books your flights, and files your taxes. The pitch sounds incredible but the product is a mediocre version of six different tools duct-taped together.

I know because I built one.

I Built the Swiss Army Knife

I spent about three months on a product called Skippy. It leveraged OCR screen capture, email ingestion, calendar sync, and pattern recognition to become an AI assistant that understood your whole digital life. It attracted a scary amount of interest from investors and Reddit before I'd even launched, people were just asking for beta access non-stop.

And I shelved it. Not because it failed, but because the more I used it the more I realized that a tool trying to do everything just ends up doing nothing well enough to actually rely on. I was playing with automations, making reservations, ordering food on DoorDash, and every single integration felt like a worse version of the thing it was replacing. You sacrifice everything for the benefit of having everything, and the benefit isn't actually that great.

If you don't believe me, go look at Killed by Google sometime. Google Inbox. Google Allo. Google Hangouts. Google Wave. Google Stadia. Products backed by billions of dollars and thousands of engineers. Dead. If Google with functionally infinite resources can't sustain multi-feature products, what makes a solo dev stitching 15 libraries together in Bali think they're going to pull it off?

The tools that actually win do one thing exceptionally well, so well you stop noticing they exist. That's the product thesis nobody in the personal AI space seems to accept, and honestly I think it's because the Jarvis fantasy is just too seductive to let go of.

Local Models Are Not the Answer

And while we're on the topic of things people don't want to hear, quick reality check for the r/LocalLLaMA crowd: your MacBook is not a datacenter.

Open-source models are genuinely impressive for what they are, I'm not going to pretend otherwise. But the gap between a 70B open model and frontier production models from Anthropic or OpenAI is not a crack, it's a chasm. There's a reason the GPU shortage exists and there's a reason inference at scale costs what it costs.

I actually went through a phase where I thought there had to be models you could run locally that would be comparable. So I built a home lab, RTX 5090, RTX 6000 PRO, 256GB DDR5, 128TB NAS, 42U rack, the whole setup. I use local models for experimentation and fine-tuning constantly but what I don't do is pretend they're competitive with frontier intelligence at tasks where output quality actually matters. That's not pessimism, that's just what the benchmarks say. If it were actually possible to match frontier quality on consumer hardware, companies like Anthropic wouldn't exist and NVIDIA wouldn't have the market cap it does.

Vibe Coders vs. Orchestrators

This one's going to piss some people off, but it needs saying.

Most developers using AI to write code are getting worse at their jobs, not better. And that's coming from someone who uses AI to write code every single day.

What good AI-assisted development looks like is basically pair programming, which has been around since the beginning of time. You direct, you review, you push back when the model suggests something dumb. You understand every line that ships and the AI just accelerates your judgment rather than replacing it.

What's actually happening is people type "build me a todo app with auth" into Cursor, tab-accept whatever comes out, run npm run dev, take a screenshot, and post it to Reddit as something they "built."

That's not engineering. That's pulling on a slot machine lever and hoping for 7s.

These vibe coders, and I use the term loosely, can't debug their own code because it was never their code. They don't understand the architecture, they can't explain the state management pattern, and when production breaks at 2 AM they're completely lost because they vibed with the output and never actually directed it. Go on Reddit for fifteen minutes and you'll see people pushing AI slop for days. They didn't even change the default Claude color scheme, you click buttons and they don't work, and they can't fix it because they don't even understand what's causing the bug.

AI is meant to speed up your pace of development. Not replace the need to understand what you built.

The orchestrators are the ones who will still have jobs in five years. They use AI more aggressively than the vibe coders actually, but they understand every line. They refactor. They question the model's choices. They treat AI as a power tool, not a replacement for skill. Prompt engineering is a skill of its own, and leveraging other skills to prompt engineer more effectively is a skill of its own too. People underestimate this hard.

Why Everything Breaks Without Memory

And here's the thing that ties all of this together, the part that nobody's really talking about.

There's a book series called Expeditionary Force where an alien Elder AI named Skippy can literally manipulate wormholes. Omniscient-level intelligence. But it has a fatal design flaw: it only answers exactly what you ask.

Ask "is there danger ahead?" and Skippy says no. Because you didn't ask about danger to the left, or danger arriving in thirty seconds. The answer was technically correct and also catastrophically incomplete.

Sound familiar?

Ask Claude a question and you'll get a brilliant answer, scoped precisely to what you asked. But it won't mention the related problem from last week, it won't connect the dots to the bug you introduced three months ago, it won't anticipate what you actually need versus what you literally typed. And that's not because the model is bad at reasoning, it's because it has no memory. No continuity. No accumulated understanding of you or your work. Every session starts completely from zero. Real reasoning isn't just answering your question, it's answering the pieces surrounding it too, and AI can't do that without knowing what you've been working on, what's gone wrong before, what you actually care about.

TrueMemory

That's what I built TrueMemory to fix, by enabling persistent memory that survives across sessions. It has an encoding gate inspired by how biological memory works, one that evaluates novelty, salience, and prediction error before deciding what to store. It's not a vector dump or a conversation log, it's a system that watches your workflow and decides what matters the same way a brain does.

The architecture and benchmarks are in my arXiv paper.

The bottleneck in AI right now isn't intelligence. It's that your model forgets you exist every time you close the tab. Everyone's building Jarvis and nobody's even close, because they keep building the mouth and the hands and nobody's building the brain.

Josh Adler builds persistent memory systems for AI. Research: arXiv:2605.04897. More at joshadler.com.

There Are Cameras in Every Room of My House. I Put Them There.

Josh Adler — Sat, 23 May 2026 01:29:45 +0000

My girlfriend asked why there's a red light blinking in the bedroom at 3 AM. I told her it's for the AI. She didn't talk to me for two days.

I know how that sounds. But I'm trying to solve a problem that nobody else seems to want to touch: giving AI access to the physical world.

Every AI product right now knows you through text or voice. What you type into a prompt. What you paste into a context window. Maybe your calendar, your emails, your screen. But your actual life? The one that happens in physical space? Your AI knows nothing about it.

Last year I built a product that used OCR to grab my screen, pulled in emails, tried to understand patterns. Investors loved it. Reddit loved it. And it was still fundamentally blind. It could see my screen but it couldn't see me. It knew what I typed but not what I did.

That gap bothered me for months. Then I did something about it.

The hardware

I built a network of cameras and microphones in my house and wired them into a pipeline:

5x Raspberry Pi Zero 2W ($15 each)
5x ArduCam IMX708 12MP 120° wide-angle cameras
5x WM8960 audio HATs for ambient sound capture
1x Ugreen NAS for storage
Custom Python daemon: motion detection, triggered recording, sleep when idle

Total hardware cost: under $500. I spent more on the camera modules I threw away than the ones that worked.

I spent weeks debugging device tree overlays. Swapped camera modules three times before finding ones that actually performed. Burned through two Pi Zeros that couldn't handle the thermal load. This wasn't a weekend project someone vibed together. This was real infrastructure.

The cameras have been recording for months. Writing to SD cards. Capturing fragments of my daily life. Motion clips. Audio snippets. And I won't be analyzing it manually. Claude will.

Why physical-world data matters more than prompts

Nobody tells their AI "I've been pacing around my office for 20 minutes." Nobody types "I skipped lunch again today." Nobody prompts "I've been staring at the same file for an hour without making a single edit."

But a camera sees all of that. And that context is worth more than a thousand carefully worded prompts.

Think about the people who actually know you. Not your boss. Your boss knows nothing about you other than your output. The people who really know you. They know your tells. They know you fidget when you're nervous, that you pace the room when you're stuck. That stuff isn't in any context window. But it's the difference between software that assists you and something that actually understands you.

The stack nobody's building

The whole industry is trying to make AI feel more human by tweaking the output. "Don't say awesome." "Match the user's tone." But the problem isn't the output. It's the input. They're training on polished, sanitized datasets and then wondering why it still feels like AI.

Making AI more human isn't about adjusting personality settings or temperature. It goes deeper than tone. Who you are. What you value. How you think. Everyone has different values and a generalized AI is never going to capture that.

Here's what I think the real stack looks like for AI that actually knows you:

Observation layer - cameras, mics, sensors, the physical world
Memory layer - persistent, cross-session, not just a context window
Reasoning layer - the model, which is already good enough

Everyone is pouring billions into layer 3. Almost nobody is building layers 1 and 2. The models are smart enough. That's not the bottleneck anymore. The bottleneck is that your AI has never seen you. It's never been in the room. It's a hyper-intelligent entity trapped behind a text box.

I built TrueMemory to solve layer 2 — persistent memory that follows you across AI sessions. My research on cognitive memory architectures is published on arXiv. Now I'm working on layer 1.

I'm not asking for permission. I'm just showing you what's coming.

Josh Adler is a researcher and builder. More at joshadler.com.