I built this project and wrote this post as an entry for the #GeminiLiveAgentChallenge.
The Problem: Knowledge is Stagnant
Information is everywhere, but true knowledge is rare. Most digital learning happens through flat lists, flashcards, or disconnected notes. I wanted to change that.
I built Strand OS—a cognitive operating system that turns your scattered notes into a living, explorable neural network.
The Solution: A Spatial Cognitive Agent
Strand OS is a full-stack system that bridges the gap between spatial memory and intelligent reasoning.
Key Features:
3D Knowledge Graph: Built with React Three Fiber, visualizing connections as a navigable star map.
Multimodal Reasoning: Uses Gemini 1.5 Pro to "see" the 3D topology and suggest tactical learning paths.
Industrial RAG Pipeline: Distills raw uploads into structured knowledge fragments, indexed in ChromaDB and archived in Google Cloud Storage.
Proactive Agent: A console-based co-pilot that manages your SRS (Spaced Repetition System) progress.
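To make the "distill before index" idea in the RAG feature above concrete, here is a minimal, hedged sketch. Every name in it (`KnowledgeFragment`, `FragmentIndex`, `ingest`, `distill_fn`) is a hypothetical illustration, not the actual Strand OS API: the GCS archive step and the Gemini distillation call are replaced by an injectable `distill_fn` so the all-or-nothing shape of the pipeline is visible.

```python
"""Sketch of an upload -> distill -> index flow (names are hypothetical)."""
from dataclasses import dataclass, field


@dataclass
class KnowledgeFragment:
    source_id: str  # which raw upload this fragment came from
    text: str       # the distilled, high-signal knowledge point


@dataclass
class FragmentIndex:
    """Stand-in for a vector-store collection; stores fragments in memory."""
    fragments: list = field(default_factory=list)

    def add(self, fragment: KnowledgeFragment) -> None:
        self.fragments.append(fragment)


def ingest(source_id: str, raw_text: str, distill_fn, index: FragmentIndex) -> int:
    """Distill a raw upload into fragments, then index them.

    `distill_fn(raw_text) -> list[str]` stands in for the Gemini call.
    If distillation raises, nothing reaches the index -- "atomic" in the
    all-or-nothing sense: raw text never hits the vector store directly.
    """
    points = distill_fn(raw_text)  # may raise -> nothing gets indexed
    for point in points:
        index.add(KnowledgeFragment(source_id=source_id, text=point))
    return len(points)
```

In production the distiller would be a Gemini prompt that extracts concise knowledge points, and the index would be a ChromaDB collection; the design point is only that retrieval quality is decided at ingest time, before embedding.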
Architecture Overview
(Insert the dark industrial-style architecture diagram here)
The system is designed for reliability and production-grade deployment:
Frontend: React + Vite + Zustand + R3F
Backend: FastAPI + SQLModel + ChromaDB
LLM Core: Gemini 1.5 Pro (via official google-genai SDK)
Infrastructure: Deployed on Google Cloud Run + Firebase Hosting + GCS
Technical Decisions & Engineering Challenges
- Identity Proof: Why the Official SDK Matters. To comply with the challenge rules, I migrated the backend to the official Google GenAI SDK. This keeps the implementation fully auditable and aligned with Google's latest standards for building AI agents.
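A minimal sketch of what that SDK call looks like. The `topology_prompt` helper is my own illustration (the post does not show Strand OS's real prompt format); the `genai.Client` / `generate_content` calls are the official `google-genai` SDK surface, guarded here so nothing touches the network without an API key.

```python
"""Hedged sketch: letting Gemini "see" a slice of the graph topology."""
import os


def topology_prompt(center: str, neighbors: list) -> str:
    """Serialize the visible slice of the 3D graph into a text prompt.

    Hypothetical format -- the idea is that the model reasons over the
    topology and suggests a tactical learning path through it.
    """
    lines = [f"Center concept: {center}", "Connected concepts:"]
    lines += [f"- {n}" for n in neighbors]
    lines.append("Suggest a tactical learning path through these concepts.")
    return "\n".join(lines)


if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    # Network call; only runs when an API key is configured.
    from google import genai  # official SDK: pip install google-genai

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-1.5-pro",
        contents=topology_prompt("Backpropagation", ["Chain rule", "Gradients"]),
    )
    print(response.text)
```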
- Resilient Execution. An agent that feels "alive" requires low latency, so I implemented a boot-time parallel preloading sequence: fetching graph context (center node + neighbors), loading user profiles, and initializing the vector-store handles. This hides cold-start latency and ensures the interface is ready to interact the moment the 3D world loads.
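The preloading sequence above can be sketched with `asyncio.gather`. The three loaders here are stand-ins (the real ones would hit the graph store, the user store, and ChromaDB); the point is that running them concurrently makes boot time approach the slowest loader rather than the sum of all three.

```python
"""Sketch of boot-time parallel preloading (loader bodies are stand-ins)."""
import asyncio


async def fetch_graph_context(node_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for a real async DB/API call
    return {"center": node_id, "neighbors": ["a", "b"]}


async def load_user_profile(user_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for a real user-store lookup
    return {"user": user_id}


async def init_vector_store() -> str:
    await asyncio.sleep(0)  # placeholder for opening vector-store handles
    return "vector-store-handle"


async def preload(user_id: str, center_node: str):
    # Kick off all three loads concurrently and wait for all of them,
    # so the UI is ready as soon as the slowest dependency resolves.
    return await asyncio.gather(
        fetch_graph_context(center_node),
        load_user_profile(user_id),
        init_vector_store(),
    )
```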
- RAG Pipeline: Upload -> Distill -> Index. To avoid "garbage in, garbage out," I implemented an atomic distillation layer. Raw uploads are archived in GCS, then processed by Gemini to extract concise, high-signal knowledge points before they ever hit the vector store. This significantly improves retrieval accuracy during the chat/link phase.
Lessons Learned
The biggest challenge was maintaining layout stability in the 3D graph while keeping the AI's state machine in sync with the UI. Moving from a monolithic approach to a decoupled service layer allowed me to scale the agent's capabilities without breaking the spatial interface.
Try It Out!
I've made Strand OS fully deployable on Google Cloud. You can explore the live system or dive into the code:
GitHub Repository: https://github.com/xingyeee-L/Strand-OS
I'm excited to hear your thoughts! How do you visualize your "second brain"? Let's discuss in the comments.