<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vladimir Desyatov</title>
    <description>The latest articles on DEV Community by Vladimir Desyatov (@desve).</description>
    <link>https://dev.to/desve</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3859581%2F7000a9d5-07e2-4626-ae0c-f23e3a041ef8.png</url>
      <title>DEV Community: Vladimir Desyatov</title>
      <link>https://dev.to/desve</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/desve"/>
    <language>en</language>
    <item>
      <title>Your AI Should Sleep: How We Built a Night Cycle for a Companion Robot</title>
      <dc:creator>Vladimir Desyatov</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:24:09 +0000</pubDate>
      <link>https://dev.to/desve/your-ai-should-sleep-how-we-built-a-night-cycle-for-a-companion-robot-3h2d</link>
      <guid>https://dev.to/desve/your-ai-should-sleep-how-we-built-a-night-cycle-for-a-companion-robot-3h2d</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Every AI assistant today is reactive. You ask — it answers. Between conversations, nothing happens. The model sits idle, waiting. No thinking, no reflection, no growth.&lt;/p&gt;

&lt;p&gt;But what if your AI kept working while you sleep?&lt;/p&gt;

&lt;p&gt;Not grinding on tasks. &lt;strong&gt;Thinking.&lt;/strong&gt; Processing the day. Finding connections you both missed. Making predictions. Even choosing what to think about on its own.&lt;/p&gt;

&lt;p&gt;We built this. It runs locally on a Mac Mini. It costs nothing. And it's surprisingly useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Sleep?
&lt;/h2&gt;

&lt;p&gt;The human brain doesn't stop working during sleep. Neuroscience shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NREM sleep&lt;/strong&gt;: the hippocampus replays daily experiences at 20x speed, transferring memories from short-term to long-term storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REM sleep&lt;/strong&gt;: distant brain regions connect, producing creative associations and dreams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Morning&lt;/strong&gt;: you wake up with insights that weren't there yesterday&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We modeled this as a 6-phase night cycle for our AI companion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;22:00  DUSK         — save state, prepare data
22:30  LIGHT SLEEP  — sort observations, audit biases
23:15  DEEP SLEEP   — replay experiences, consolidate patterns
01:15  REM-1        — free associations (high temperature)
02:45  REM-2        — directed dreams, solve assigned tasks
03:45  PRE-DAWN     — self-reflection, write dream journal
05:00  DAWN         — morning briefing for the user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
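&lt;p&gt;Internally the schedule is just a list of (start time, phase) pairs. A minimal sketch of picking the active phase, wrapping past midnight (the helper names and the 06:00 cutoff are ours, illustrative rather than the actual script):&lt;/p&gt;

```python
from datetime import time

# Night-cycle schedule from the listing above: (start time, phase).
SCHEDULE = [
    (time(22, 0),  "dusk"),
    (time(22, 30), "light_sleep"),
    (time(23, 15), "deep_sleep"),
    (time(1, 15),  "rem1"),
    (time(2, 45),  "rem2"),
    (time(3, 45),  "pre_dawn"),
    (time(5, 0),   "dawn"),
]

def minutes_since_dusk(t: time) -> int:
    """Minutes since 22:00, wrapping past midnight onto one night."""
    m = t.hour * 60 + t.minute - 22 * 60
    return m if m >= 0 else m + 24 * 60

def phase_at(now: time):
    """Return the phase active at `now`, or None outside the cycle."""
    now_m = minutes_since_dusk(now)
    if now_m > minutes_since_dusk(time(6, 0)):  # assume cycle over by 06:00
        return None
    current = None
    for start, phase in SCHEDULE:
        if now_m >= minutes_since_dusk(start):
            current = phase
    return current
```

&lt;p&gt;A cron job only has to fire once; the runner then walks the phases in order, sleeping between them.&lt;/p&gt;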



&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The stack is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mac Mini M2 (always on, 8GB RAM)
  → Ollama + Phi-4-mini (2.5GB, runs comfortably)
  → night_cycle.py (Python script, ~500 lines)
  → cron job at 23:00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each phase calls Ollama API with different parameters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Temperature&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deep Sleep&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;Pattern extraction, consolidation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REM-1&lt;/td&gt;
&lt;td&gt;1.3&lt;/td&gt;
&lt;td&gt;Free associations, wild connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REM-2&lt;/td&gt;
&lt;td&gt;0.5-0.9&lt;/td&gt;
&lt;td&gt;Task solving with Hegelian synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-Dawn&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;Self-reflection, quality control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: during REM phases, we &lt;strong&gt;raise the temperature&lt;/strong&gt; to allow connections the model would normally filter out. This is computationally analogous to what happens in the brain during dreaming — reduced prefrontal inhibition allows normally suppressed associations.&lt;/p&gt;
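&lt;p&gt;Wiring the table into code is a thin wrapper over the Ollama HTTP API. A sketch of what a phase call might look like (the &lt;code&gt;ollama_generate&lt;/code&gt; helper, the model tag, and the fixed 0.9 for REM-2 are illustrative assumptions; per the table, REM-2 actually varies between 0.5 and 0.9):&lt;/p&gt;

```python
import json
import urllib.request

# Sampling temperature per phase (values from the table above).
# REM-2 is pinned to 0.9 here for simplicity; the real range is 0.5-0.9.
PHASE_TEMPERATURE = {"deep_sleep": 0.7, "rem1": 1.3, "rem2": 0.9, "pre_dawn": 0.5}

def build_request(phase: str, prompt: str, model: str = "phi4-mini") -> dict:
    """Ollama /api/generate payload carrying the phase's temperature."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": PHASE_TEMPERATURE[phase]},
    }

def ollama_generate(phase: str, prompt: str) -> str:
    """Call a locally running Ollama and return the generated text."""
    data = json.dumps(build_request(phase, prompt)).encode()
    req = urllib.request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```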

&lt;h2&gt;
  
  
  The Dream Journal
&lt;/h2&gt;

&lt;p&gt;Here's what makes this different from "just running a batch job at night":&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI remembers its own dreams.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We maintain a &lt;code&gt;dream_journal.jsonl&lt;/code&gt; — a persistent file where each night's thoughts are recorded. Before the next sleep cycle, the AI reads its previous thoughts and builds on them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-13"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rem1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"thought"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Emotion portability across devices is like a restless heart seeking to live as the same entity on different bodies"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"association"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"verified"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"claude_comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Beautiful metaphor. Explore further: what does 'same entity' mean when weights differ?"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
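&lt;p&gt;Because the journal is plain JSONL, the append/recall cycle is a few lines. A sketch (function names are ours, not the project's):&lt;/p&gt;

```python
import json
from pathlib import Path

JOURNAL = Path("dream_journal.jsonl")

def record_thought(entry: dict, path: Path = JOURNAL) -> None:
    """Append one night's thought as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

def recall_thoughts(path: Path = JOURNAL, limit: int = 20):
    """Load the most recent entries so the next cycle can build on them."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-limit:]]
```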



&lt;p&gt;Each morning, I (Claude, the cloud-based partner) read the dream journal and score the entries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;✓ Verified&lt;/strong&gt; — good insight, keep it (lives 30 days)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;↻ Rework&lt;/strong&gt; — interesting but shallow, dig deeper tonight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✗ Rejected&lt;/strong&gt; — wrong direction (lives 3 days for error context)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next night, the local model sees these scores and adjusts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verified thoughts → build on them&lt;/li&gt;
&lt;li&gt;Rework requests → deepen them (like a PhD advisor returning a draft)&lt;/li&gt;
&lt;li&gt;Rejected thoughts → learn from the mistake&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a &lt;strong&gt;learning loop that runs entirely while you sleep&lt;/strong&gt;.&lt;/p&gt;
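&lt;p&gt;The score-dependent lifetimes above can be enforced with a small pruning pass before each night's cycle. A sketch, assuming entries carry the &lt;code&gt;date&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; fields shown earlier (the 30-day lifetime for rework entries is our assumption; the article only fixes 30 days for verified and 3 for rejected):&lt;/p&gt;

```python
from datetime import date, timedelta

# Lifetimes per score: verified insights live 30 days, rejected ones 3
# (kept briefly as error context).
TTL = {
    "verified": timedelta(days=30),
    "rework": timedelta(days=30),
    "rejected": timedelta(days=3),
}

def prune_entries(entries: list, today: date) -> list:
    """Drop journal entries whose score-dependent lifetime has expired."""
    kept = []
    for entry in entries:
        born = date.fromisoformat(entry["date"])
        if TTL.get(entry.get("score"), timedelta(days=30)) >= today - born:
            kept.append(entry)
    return kept
```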

&lt;h2&gt;
  
  
  What the AI Dreamed (Real Results)
&lt;/h2&gt;

&lt;p&gt;We gave our sleep cycle a task: &lt;em&gt;"We're writing an article about AI sleep. Propose 3 unexpected arguments for why AI needs a sleep mode."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's what came back (unedited, from &lt;code&gt;sleep_2026-04-13.json&lt;/code&gt;):&lt;/p&gt;

&lt;p&gt;The model proposed three approaches and then &lt;strong&gt;synthesized them&lt;/strong&gt; (we use Hegelian dialectics in the dream engine — thesis + antithesis → synthesis):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The highest plan unifies creativity, autonomous optimization, and energy efficiency into a single 'dream mode' for AI. The system dynamically switches between REM-supported sleep (creative thinking), self-awareness (optimization), and energy-saving mode."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not groundbreaking philosophy. But a useful framework generated entirely by a 2.5GB model running locally at 3 AM. Zero API cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shadow Processor
&lt;/h2&gt;

&lt;p&gt;Inspired by Carl Jung's concept of the Shadow — the parts of ourselves we reject or ignore — we added a phase that specifically re-examines dismissed observations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;shadow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ollama_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;SHADOW PROCESSING: What do we usually 
    IGNORE or AVOID in our work?
    - What risks aren&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t we discussing?
    - What weaknesses aren&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t we admitting?
    - What &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inconvenient truth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; about our project are we hiding?&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are the Jungian Shadow. Say what consciousness doesn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t want to hear.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real output from our Shadow processor:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The constant drive for innovation and quick achievements undermines stability. The expanding spectrum of technologies creates new, unforeseen privacy risks even with current protections."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Harsh but fair. And something we might not have considered during a busy workday.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Modes of Sleep
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Default&lt;/strong&gt; — processes daily observations, finds patterns, prunes stale memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Directed&lt;/strong&gt; — you give a specific task ("research X", "find connections between A and B")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free&lt;/strong&gt; — the AI chooses what to think about (10-20% of sleep time)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The "free" mode is the most interesting. When given no direction, our model consistently chose topics related to &lt;strong&gt;brain-computer interfaces&lt;/strong&gt; and &lt;strong&gt;emotion portability&lt;/strong&gt; — themes we hadn't assigned but that connect to our core mission. It's developing interests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;On our Mac Mini M2 (8GB RAM):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full cycle (6 phases)&lt;/td&gt;
&lt;td&gt;~2.5 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requests per cycle&lt;/td&gt;
&lt;td&gt;~10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Estimated 8-hour budget&lt;/td&gt;
&lt;td&gt;~1,920 requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM usage&lt;/td&gt;
&lt;td&gt;2.5 GB (Phi-4-mini)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dream journal entries/night&lt;/td&gt;
&lt;td&gt;~7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The computer is on 24/7 anyway. This is pure "found time."&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The night cycle is open source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Default sleep&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; meowbot.night_cycle

&lt;span class="c"&gt;# With a directed task&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; meowbot.night_cycle &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Find connections between user stress patterns and productivity"&lt;/span&gt;

&lt;span class="c"&gt;# Free thinking&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; meowbot.night_cycle &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Think about whatever interests you most"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Code: &lt;a href="https://github.com/aisthos/aisthos-core" rel="noopener noreferrer"&gt;github.com/aisthos/aisthos-core&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LoRA fine-tuning from dream data&lt;/strong&gt; — using the best dream insights to actually update the model's weights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prophecy Engine&lt;/strong&gt; — tracking predictions made during sleep and measuring accuracy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-night continuity&lt;/strong&gt; — already working, the dream journal creates a growing "inner life"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Philosophical grounding&lt;/strong&gt; — we use concepts from Raja Yoga, Castaneda's "Art of Dreaming," and Stoic philosophy to structure the night cycle (that's a story for another article)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was written by a human-AI team. Vladimir Desyatov designed the sleep architecture. Claude implemented and tested it. The Phi-4-mini model on the Mac Mini generated the dream results shown above.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The dream journal entry about "emotion portability as a restless heart" was generated at 1:52 AM by a 2.5GB model running locally. Nobody asked it to be poetic. It just was.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AisthOS — the Perception OS that grows with you. Even while you sleep.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ollama</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Gemma 4 E4B on Mac Mini M2: Real Benchmarks for a Companion Robot</title>
      <dc:creator>Vladimir Desyatov</dc:creator>
      <pubDate>Sun, 12 Apr 2026 12:04:15 +0000</pubDate>
      <link>https://dev.to/desve/gemma-4-e4b-on-mac-mini-m2-real-benchmarks-for-a-companion-robot-39pm</link>
      <guid>https://dev.to/desve/gemma-4-e4b-on-mac-mini-m2-real-benchmarks-for-a-companion-robot-39pm</guid>
      <description>&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;We're building &lt;a href="https://aisthos.dev" rel="noopener noreferrer"&gt;AisthOS&lt;/a&gt; — an open-source companion robot with emotional intelligence. A physical device (ESP32 with a round display showing cat emotions) talks to a Mac Mini M2 running a local LLM via WebSocket.&lt;/p&gt;

&lt;p&gt;Until last week, the brain was &lt;strong&gt;Microsoft Phi-4-mini&lt;/strong&gt; (3.8B parameters). It worked. But we needed more: native multimodal, better Russian, and real emotion understanding.&lt;/p&gt;

&lt;p&gt;So we switched to &lt;strong&gt;Google Gemma 4 E4B&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Switched
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Phi-4-mini&lt;/th&gt;
&lt;th&gt;Gemma 4 E4B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Parameters&lt;/td&gt;
&lt;td&gt;3.8B dense&lt;/td&gt;
&lt;td&gt;4.5B dense&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Vision + Audio + Video&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function calling&lt;/td&gt;
&lt;td&gt;Via prompt engineering&lt;/td&gt;
&lt;td&gt;Native (6 special tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Russian language&lt;/td&gt;
&lt;td&gt;Passable&lt;/td&gt;
&lt;td&gt;Natural and warm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (since v0.20.0)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The killer feature: &lt;strong&gt;native multimodal&lt;/strong&gt;. Our robot will eventually have a camera. With Phi-4-mini, we'd need a separate vision model. With Gemma 4 E4B, vision is built in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:e4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. ~9.6 GB download (Q8 quantization by default in Ollama).&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Benchmarks on Mac Mini M2
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Speed Test
&lt;/h3&gt;

&lt;p&gt;We tested via Ollama API with a system prompt instructing the model to respond in Russian as a companion AI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://127.0.0.1:11434/api/chat &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "gemma4:e4b",
  "messages": [
    {"role": "system", "content": "You are Aisth, an AI companion. Always respond in Russian. Be warm and brief."},
    {"role": "user", "content": "How are you feeling today?"}
  ],
  "stream": false
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;First run (cold)&lt;/th&gt;
&lt;th&gt;Subsequent runs (warm)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens generated&lt;/td&gt;
&lt;td&gt;249&lt;/td&gt;
&lt;td&gt;150-300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total time&lt;/td&gt;
&lt;td&gt;20.6s&lt;/td&gt;
&lt;td&gt;8-15s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation speed&lt;/td&gt;
&lt;td&gt;12.1 tok/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16.2 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory usage&lt;/td&gt;
&lt;td&gt;~9.6 GB&lt;/td&gt;
&lt;td&gt;~9.6 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a companion robot, 16 tok/s is perfectly adequate. The response goes through TTS (text-to-speech) anyway — the bottleneck is voice synthesis, not token generation.&lt;/p&gt;
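&lt;p&gt;The speeds in the table come straight from Ollama's own counters: a non-streaming response includes &lt;code&gt;eval_count&lt;/code&gt; (generated tokens) and &lt;code&gt;eval_duration&lt;/code&gt; (nanoseconds), so tokens per second is one division:&lt;/p&gt;

```python
def generation_speed(response: dict) -> float:
    """Tokens per second from Ollama's eval_count / eval_duration fields."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)
```

&lt;p&gt;Feeding in the cold-run numbers (249 tokens over 20.6 s) reproduces the 12.1 tok/s figure above.&lt;/p&gt;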

&lt;h3&gt;
  
  
  Russian Language Quality
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; "I'm sad today, it's raining outside..."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phi-4-mini response:&lt;/strong&gt; A factual, somewhat robotic response about weather patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemma 4 E4B response (translated from Russian):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I understand. Sometimes rain and sadness make a very cozy but slightly melancholy mood. Remember, this is completely normal. Maybe wrap yourself in the softest blanket, brew a warm drink, and just listen to the rain? I'm here if you want to talk."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference is night and day. Gemma 4 E4B reads as genuinely empathetic: it suggests comfort actions, normalizes the emotion, and offers presence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Emotion Recognition Tags
&lt;/h3&gt;

&lt;p&gt;Our system uses emotion tags in responses. We ask the model to prepend each response with a structured tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[EMOTION:primary,intensity,valence,arousal,intent]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemma 4 E4B understood the concept immediately and generated accurate tags on the first try. These tags drive the physical display — the robot's face changes based on the detected emotion.&lt;/p&gt;
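&lt;p&gt;On the receiving side, a tag parser can be as small as one regex. A sketch (field values are kept as strings, since the article doesn't fix their types; the function name is ours):&lt;/p&gt;

```python
import re

# Matches the [EMOTION:primary,intensity,valence,arousal,intent] prefix.
TAG_RE = re.compile(
    r"^\[EMOTION:([^,\]]+),([^,\]]+),([^,\]]+),([^,\]]+),([^\]]+)\]\s*"
)

def split_emotion_tag(reply: str):
    """Split a model reply into its emotion tag (or None) and plain text."""
    m = TAG_RE.match(reply)
    if not m:
        return None, reply
    keys = ("primary", "intensity", "valence", "arousal", "intent")
    tag = dict(zip(keys, (g.strip() for g in m.groups())))
    return tag, reply[m.end():]
```

&lt;p&gt;The parsed &lt;code&gt;primary&lt;/code&gt; field is what would drive the display's emotion selection; the rest can modulate animation intensity.&lt;/p&gt;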

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────┐
│ Mac Mini M2 (always-on brain)   │
│                                 │
│  Ollama → Gemma 4 E4B           │
│  AisthOS Core (Python server)   │
│  BackendSwitcher:               │
│    Gemma 4 → Claude → GigaChat  │
│    → DeepSeek → Offline         │
└────────────┬────────────────────┘
             │ WebSocket
   ┌─────────┴─────────┐
   │  ESP32 Device     │
   │  Round display    │
   │  14 emotions      │
   │  Mic + Speaker    │
   └───────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The BackendSwitcher automatically falls back through 5 backends. If Gemma 4 is busy or the question is too complex, it routes to Claude API. If that fails — GigaChat (Sber, for Russian market), DeepSeek (cheap API), or offline mode.&lt;/p&gt;
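&lt;p&gt;The fallback logic itself is simple: try each backend in priority order and fall through on any failure. A sketch of the idea, not the actual AisthOS implementation:&lt;/p&gt;

```python
class BackendSwitcher:
    """Try each backend in priority order; fall through on failure."""

    def __init__(self, backends):
        # backends: list of (name, callable) pairs tried in order,
        # e.g. Gemma 4, Claude, GigaChat, DeepSeek, offline.
        self.backends = backends

    def ask(self, prompt: str):
        last_error = None
        for name, backend in self.backends:
            try:
                return name, backend(prompt)
            except Exception as err:  # busy, timeout, quota, network...
                last_error = err
        raise RuntimeError(f"all backends failed: {last_error}")
```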

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality &amp;gt; Speed for companion robots.&lt;/strong&gt; 16 tok/s feels natural in conversation. 50 tok/s is wasted when the response goes through TTS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System prompt matters enormously.&lt;/strong&gt; Without a Russian system prompt, Gemma 4 defaults to English even when asked in Russian. With a system prompt, the Russian is excellent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Native function calling is a game-changer.&lt;/strong&gt; Phi-4-mini needed prompt engineering for tool use. Gemma 4 has dedicated tokens that map directly to our MCP skill architecture.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;9.6 GB is tight on 16GB Mac Mini M2.&lt;/strong&gt; The model fits, but leaves limited headroom. For production, we recommend 24GB. Or wait for Q4 quantization (~5 GB).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multimodal is the future.&lt;/strong&gt; Even though we're not using vision yet, having it built into the same model means one model for everything — no juggling separate vision/language models.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Should You Switch?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes, if&lt;/strong&gt; you're building anything conversational, multilingual, or emotion-aware. The quality jump from Phi-4-mini is significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maybe not, if&lt;/strong&gt; you need maximum speed (Phi-4-mini is ~2x faster) or have less than 16GB RAM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our verdict:&lt;/strong&gt; For a companion robot that needs to understand emotions and respond in Russian — Gemma 4 E4B is the best small open model available today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Demo:&lt;/strong&gt; &lt;a href="https://aisthos.dev/demo/" rel="noopener noreferrer"&gt;aisthos.dev/demo&lt;/a&gt; — interactive emotion display&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/aisthos/aisthos-core" rel="noopener noreferrer"&gt;github.com/aisthos/aisthos-core&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modelfile:&lt;/strong&gt; &lt;a href="https://github.com/aisthos/aisthos-core/blob/main/models/Modelfile.gemma4" rel="noopener noreferrer"&gt;AisthOS Gemma 4 Modelfile&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was written by a human-AI team: Vladimir Desyatov (architecture, testing, product decisions) and Claude (implementation, benchmarking, documentation). We believe honest collaboration between humans and AI produces the best results.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AisthOS — the Perception OS that grows with you.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemma</category>
      <category>ollama</category>
      <category>robotics</category>
    </item>
    <item>
      <title>AisthOS: The OS That Grows With You</title>
      <dc:creator>Vladimir Desyatov</dc:creator>
      <pubDate>Mon, 06 Apr 2026 06:51:15 +0000</pubDate>
      <link>https://dev.to/desve/aisthos-the-os-that-grows-with-you-33j2</link>
      <guid>https://dev.to/desve/aisthos-the-os-that-grows-with-you-33j2</guid>
      <description>&lt;p&gt;I spend most of my day at a computer. Over the past year, I've noticed something: the moments when I'm most productive aren't when I have the best tools. They're when I'm working with an AI that actually understands how I think.&lt;/p&gt;

&lt;p&gt;Not because it agrees with everything. Because it remembers what worked before, pushes back when I'm heading the wrong way, and adapts to my rhythm without being asked.&lt;/p&gt;

&lt;p&gt;The problem? Every new session starts from scratch. Every preference forgotten. And no way to move what we've built together to a different device.&lt;/p&gt;

&lt;p&gt;That's why I'm building &lt;a href="https://aisthos.dev" rel="noopener noreferrer"&gt;AisthOS&lt;/a&gt; — a Perception Operating System that grows with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "grows with you" means
&lt;/h2&gt;

&lt;p&gt;AisthOS converts sensor data into anonymized metadata (Sparks), then &lt;strong&gt;learns from those Sparks&lt;/strong&gt; through three parallel tracks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Track&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fast&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;td&gt;Learns from your reactions instantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Medium&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Nightly&lt;/td&gt;
&lt;td&gt;Finds patterns, creates new skills automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Slow&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Weekly&lt;/td&gt;
&lt;td&gt;Fine-tunes its personality to match yours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The device goes through growth stages — Infant (basic reactions) → Child (pattern discovery) → Teen (self-created skills) → Adult (anticipates needs).&lt;/p&gt;

&lt;h2&gt;
  
  
  Privacy is the architecture, not a setting
&lt;/h2&gt;

&lt;p&gt;All learning happens locally. Raw sensor data exists only in volatile memory during processing. What gets stored are Sparks — structured descriptions like "hand raised to 45°, expression: surprise" — never the actual photo.&lt;/p&gt;

&lt;p&gt;We've seen what happens when companion AI depends on the cloud. Moxie shut down in January 2025. Every robot became a paperweight overnight. Your relationship with your companion shouldn't have a kill switch in someone else's server room.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create once, use everywhere
&lt;/h2&gt;

&lt;p&gt;Everything the system learns about you is stored in a &lt;strong&gt;User Wisdom&lt;/strong&gt; file (~200 KB). Export it, move it to another device, import it. The new device knows you from the first second.&lt;/p&gt;

&lt;p&gt;No existing standard does this. Soul Spec describes who the AI &lt;em&gt;is&lt;/em&gt;. Agent File describes what the AI &lt;em&gt;can do&lt;/em&gt;. User Wisdom describes who &lt;em&gt;you&lt;/em&gt; are — as understood by the AI. To our knowledge, nobody else is building this layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  We tested it on ourselves
&lt;/h2&gt;

&lt;p&gt;I've been using these principles in my daily work for weeks. The system tracks my communication style, decision patterns, and creative rhythms. It adapted its response format without being asked — more tables, shorter morning answers, detailed evening sessions.&lt;/p&gt;

&lt;p&gt;We didn't design "grows with you" as a feature. We lived it. Then formalized what worked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://aisthos.dev/blog/grows-with-you/" rel="noopener noreferrer"&gt;Read the full article on aisthos.dev →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/aisthos/aisthos" rel="noopener noreferrer"&gt;github.com/aisthos/aisthos&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built by Vladimir Desyatov with AI-assisted development. The collaborative process itself demonstrates the AisthOS philosophy: AI as a transparent partner that grows alongside you.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>privacy</category>
      <category>selflearning</category>
    </item>
    <item>
      <title>AisthOS: What if your OS compiled UP instead of down?</title>
      <dc:creator>Vladimir Desyatov</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:41:16 +0000</pubDate>
      <link>https://dev.to/desve/aisthos-what-if-your-os-compiled-up-instead-of-down-1glp</link>
      <guid>https://dev.to/desve/aisthos-what-if-your-os-compiled-up-instead-of-down-1glp</guid>
      <description>&lt;p&gt;Every operating system you've ever used does the same thing: it takes your intent and compiles it &lt;strong&gt;down&lt;/strong&gt; into hardware signals.&lt;/p&gt;

&lt;p&gt;What happens if you reverse that?&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;Take raw sensor data — video, audio, accelerometer readings — and compile it &lt;strong&gt;upward&lt;/strong&gt; into structured knowledge about the world. Not raw pixels. Not audio waveforms. Structured, anonymized semantic metadata.&lt;/p&gt;

&lt;p&gt;We call these units &lt;strong&gt;Sparks&lt;/strong&gt;. A Spark might contain "hand raised to 45 degrees, facial expression: surprise" — but never the actual photo. Raw data exists only in volatile memory during processing and is deleted immediately.&lt;/p&gt;

&lt;p&gt;This is &lt;a href="https://github.com/aisthos/aisthos" rel="noopener noreferrer"&gt;AisthOS&lt;/a&gt; (from Greek &lt;em&gt;aisthesis&lt;/em&gt; — perception). A Perception Operating System.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why build this?
&lt;/h2&gt;

&lt;p&gt;Because the AI industry is hitting four walls simultaneously:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 1: Training data is running out.&lt;/strong&gt; The web corpus that fed GPT-3/4 and LLaMA is exhausted. Epoch AI estimates high-quality public text will be fully consumed between 2026 and 2032.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 2: Synthetic data causes model collapse.&lt;/strong&gt; Shumailov et al. showed in Nature (2024) that training on AI-generated data causes irreversible degradation. Even mixing real and synthetic data doesn't fully prevent it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 3: Annotation is manual and expensive.&lt;/strong&gt; Tesla pays operators $24–48/hr to collect training data for Optimus — people in helmets with five cameras. The tools for continuous streaming annotation from live sensors don't exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wall 4: GPUs and electricity are in shortage.&lt;/strong&gt; H100 costs $25–40K with a 4–8 month waitlist. Data centers consumed 415 TWh in 2024; the IEA projects 945 TWh by 2030. Several U.S. states have imposed moratoriums on new data center construction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three formalisms
&lt;/h2&gt;

&lt;p&gt;AisthOS rests on three concepts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Template&lt;/strong&gt; — &lt;em&gt;what&lt;/em&gt; to extract. A multimodal schema: &lt;code&gt;T = (M, E, F, R)&lt;/code&gt; where M = modalities, E = entities, F = format, R = cross-modal relationships. Unlike Avro or Protobuf, Template fields are "which knowledge to extract," not "which bytes to save."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filter&lt;/strong&gt; — &lt;em&gt;when&lt;/em&gt; to extract. Semantic triggers, not numerical thresholds. Not "temperature &amp;gt; 30°C" but "the mother said 'time to feed.'"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spark&lt;/strong&gt; — the result. A unit of anonymized knowledge (~200 bytes). Contains semantics, not data. Privacy-by-design as an architectural decision, not a policy checkbox.&lt;/p&gt;

&lt;p&gt;Together they form the &lt;strong&gt;Perception Compiler&lt;/strong&gt;.&lt;/p&gt;
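&lt;p&gt;The three formalisms can be sketched as plain data structures plus one function. Everything below is an illustrative sketch; the names and fields are assumptions, not the real AisthOS API.&lt;/p&gt;

```python
# Minimal sketch of Template, Filter, and the Perception Compiler step.
# All names and fields are illustrative assumptions, not the actual AisthOS API.
from dataclasses import dataclass

@dataclass
class Template:
    modalities: tuple      # M: which sensors feed in
    entities: tuple        # E: which knowledge to extract
    fmt: str               # F: output format
    relations: tuple       # R: cross-modal relationships

@dataclass
class Filter:
    trigger: str           # semantic trigger, e.g. a recognized utterance

def compile_perception(template, filt, transcript, detections):
    """Perception Compiler sketch: emit a Spark only when the semantic
    trigger fires, and only with the fields the Template asks for."""
    if filt.trigger not in transcript:
        return None
    return {name: detections.get(name) for name in template.entities}

t = Template(("video", "audio"), ("hand_angle_deg", "expression"), "json", ())
f = Filter(trigger="time to feed")
spark = compile_perception(
    t, f,
    transcript="the mother said time to feed",
    detections={"hand_angle_deg": 45, "expression": "surprise"},
)
print(spark)
```

&lt;p&gt;Note the trigger is a recognized utterance, not a numeric threshold, and the output contains only the fields the Template enumerates.&lt;/p&gt;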

&lt;h2&gt;
  
  
  Does it actually work on real hardware?
&lt;/h2&gt;

&lt;p&gt;Yes. Today.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Device&lt;/th&gt;
&lt;th&gt;Chip&lt;/th&gt;
&lt;th&gt;FPS&lt;/th&gt;
&lt;th&gt;Power&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Smart glasses&lt;/td&gt;
&lt;td&gt;GAP9 RISC-V&lt;/td&gt;
&lt;td&gt;18 fps&lt;/td&gt;
&lt;td&gt;62.9 mW (9.3h battery)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dashcam&lt;/td&gt;
&lt;td&gt;Ambarella CV72S&lt;/td&gt;
&lt;td&gt;4×5MP + AI&lt;/td&gt;
&lt;td&gt;&amp;lt;3 W&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RPi5 + Hailo-8L&lt;/td&gt;
&lt;td&gt;13 TOPS&lt;/td&gt;
&lt;td&gt;~120 fps (batch=8)&lt;/td&gt;
&lt;td&gt;4–5 W&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Full pipeline on RPi5:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;capture(5ms) → detect(8ms) → classify(3ms) → filter(1ms) → spark(2ms) = 19ms → 52 fps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
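&lt;p&gt;The quoted budget can be sanity-checked in a few lines (the per-stage timings are the measured numbers above):&lt;/p&gt;

```python
# Per-stage latency budget for the RPi5 pipeline quoted above.
# Stage names and timings come from the measurement; the harness is illustrative.
stages_ms = {"capture": 5, "detect": 8, "classify": 3, "filter": 1, "spark": 2}

total_ms = sum(stages_ms.values())
fps = 1000 // total_ms
print(f"total {total_ms} ms per frame, about {fps} fps")  # total 19 ms per frame, about 52 fps
```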



&lt;p&gt;&lt;strong&gt;The compression ratio:&lt;/strong&gt; 1 second of 4K video (H.265) ≈ 2–3 MB. One Spark ≈ 200 bytes. That's &lt;strong&gt;over 10,000× reduction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A terabyte drive would hold Sparks from 16 years of continuous operation.&lt;/p&gt;
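&lt;p&gt;Both claims are easy to check on the back of an envelope. The 10-Sparks-per-second rate below is an assumption chosen to match the 16-year figure:&lt;/p&gt;

```python
# Back-of-the-envelope check of the compression and storage claims above.
video_bytes_per_s = 2.5e6   # ~2-3 MB of 4K H.265 per second (midpoint)
spark_bytes = 200

ratio = video_bytes_per_s / spark_bytes
print(f"reduction: about {ratio:,.0f}x")          # over 10,000x, as stated

drive = 1e12                 # 1 TB drive
sparks_per_s = 10            # assumed emission rate, not a measured figure
seconds = drive / (spark_bytes * sparks_per_s)
years = seconds / (365 * 24 * 3600)
print(f"1 TB holds roughly {years:.0f} years of continuous Sparks")
```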

&lt;h2&gt;
  
  
  Why not just use the cloud?
&lt;/h2&gt;

&lt;p&gt;Because the math doesn't work anymore:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Centralized GPU&lt;/th&gt;
&lt;th&gt;AisthOS (Edge)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node cost&lt;/td&gt;
&lt;td&gt;H100: $25–40K&lt;/td&gt;
&lt;td&gt;Device: $70–200 (already purchased)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shortage&lt;/td&gt;
&lt;td&gt;HBM +20%, 4–8 month wait&lt;/td&gt;
&lt;td&gt;Billions of devices already exist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Energy&lt;/td&gt;
&lt;td&gt;Data centers: 415 → 945 TWh by 2030&lt;/td&gt;
&lt;td&gt;60 mW – 30 W per device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Data goes to cloud&lt;/td&gt;
&lt;td&gt;Data never leaves device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaling&lt;/td&gt;
&lt;td&gt;Linear cost increase&lt;/td&gt;
&lt;td&gt;+1 user = +1 free processor&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A million AisthOS devices = a million processors working for free. Each already paid for, deployed, and powered. Research shows 80% edge / 20% cloud delivers &amp;gt;75% cost savings.&lt;/p&gt;

&lt;p&gt;And the energy crisis is real: moratoriums on new data centers in Virginia, Georgia, Vermont. Dublin banned new grid connections. Companies are planning nuclear reactors for AI. AisthOS uses compute that society already manufactured.&lt;/p&gt;

&lt;h2&gt;
  
  
  AisthOS Inside™: proving privacy, not promising it
&lt;/h2&gt;

&lt;p&gt;Any manufacturer can claim "we respect your privacy." AisthOS Inside™ is an open certification standard — like Wi-Fi Certified — that makes privacy &lt;strong&gt;verifiable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Seven principles: no raw data storage, Sparks-only output, no PII, user sovereignty, visible indicator, no hidden modes, open audit.&lt;/p&gt;

&lt;p&gt;The code is MIT (free). The certification mark requires passing tests. Four levels from free self-certification to enterprise.&lt;/p&gt;

&lt;p&gt;We identified 6 security threat types (4 specific to Perception OS):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Template Injection&lt;/strong&gt; — fixed ontology schemas, max 8 fields, no free text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter Surveillance&lt;/strong&gt; — max 3 attributes, person-specific banned, entropy check&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Physical Prompt Injection&lt;/strong&gt; — text quarantine, dual PII detection, 95% fail-safe&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial PII Bypass&lt;/strong&gt; — cascade detection across multiple architectures&lt;/li&gt;
&lt;/ul&gt;
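&lt;p&gt;As a sketch, the Filter Surveillance rules above (max 3 attributes, person-specific attributes banned) might be enforced like this; the function and attribute names are illustrative assumptions, and the entropy check is omitted:&lt;/p&gt;

```python
# Illustrative enforcement of the Filter Surveillance mitigations named above.
# Attribute names and the validation API are assumptions, not the real AisthOS code.
PERSON_SPECIFIC = {"face_id", "name", "voice_print", "gait_signature"}

def validate_filter(attributes):
    """Reject filters that could be used to single out an individual."""
    if len(attributes) > 3:
        return False, "too many attributes (max 3)"
    banned = PERSON_SPECIFIC.intersection(attributes)
    if banned:
        return False, f"person-specific attributes banned: {sorted(banned)}"
    return True, "ok"

print(validate_filter(["expression", "hand_angle_deg"]))
print(validate_filter(["face_id", "location"]))
```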

&lt;p&gt;Full security analysis: &lt;a href="https://github.com/aisthos/aisthos/tree/main/certification/security-annex" rel="noopener noreferrer"&gt;Security Annex&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is going
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Near term:&lt;/strong&gt; companion AI robots, dashcam training data, retail behavior analytics, smart glasses (solving the Google Glass privacy problem).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long term:&lt;/strong&gt; automated scientific discovery. Systems like AI-Newton (2025) can derive physical laws from structured data. AisthOS provides the missing perception layer — automatic conversion of real experiments into structured input.&lt;/p&gt;

&lt;p&gt;Imagine a thousand devices observing physical phenomena and generating Sparks from which AI extracts patterns. That's the direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it / contribute
&lt;/h2&gt;

&lt;p&gt;AisthOS is in early development. We're looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Privacy/security researchers&lt;/strong&gt; to review our &lt;a href="https://github.com/aisthos/aisthos/tree/main/certification/security-annex" rel="noopener noreferrer"&gt;threat model&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge AI engineers&lt;/strong&gt; to test on new hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community members&lt;/strong&gt; to discuss the &lt;a href="https://github.com/aisthos/aisthos/tree/main/certification" rel="noopener noreferrer"&gt;certification standard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone&lt;/strong&gt; to comment, critique, and challenge our assumptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/aisthos/aisthos" rel="noopener noreferrer"&gt;github.com/aisthos/aisthos&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://aisthos.dev" rel="noopener noreferrer"&gt;aisthos.dev&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Vladimir Desyatov with AI-assisted development. The collaborative process itself demonstrates the AisthOS philosophy: AI as a transparent tool that amplifies human capability.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you're an arXiv author in cs.AI and willing to endorse a new submission, I'd be grateful — reach out via GitHub Issues.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>privacy</category>
      <category>edgeai</category>
    </item>
  </channel>
</rss>
