Building My Smart 2nd Brain: Part 1 - Agentic AI with RAG

Overview

In my previous articles, I developed two LangGraph applications utilizing distinct personas for generating blog articles and images, with web search capabilities as tools. To complete this learning journey with LangGraph, I will now build a new application incorporating the following features:

  • Text embedding using a vector database for Retrieval-Augmented Generation (RAG).
  • Long-term memory using SQLite to store message history.
  • Human-in-the-Loop (HITL) functionality for user verification of generated results.

This new project, my “Smart 2nd Brain”, is a personal knowledge management system designed to make my life more organized and spark creativity. RAG, long-term memory and HITL aren’t groundbreaking, but combining them felt like building a custom toolbox for my brain. The vector DB made retrieving my scattered notes a breeze, SQLite gave the system a memory that rivals my own, and HITL kept me in control, letting me tweak outputs to feel just right.

This “Smart 2nd Brain” was a rewarding step in my LangGraph journey, blending AI smarts with personal flair. It’s more than a tool — it’s like an extension of my mind, helping me organize thoughts, rediscover ideas, and fuel curiosity.

The Knowledge State Object

Let's take a look at the master state object:

from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Field

class KnowledgeState(BaseModel):
    messages: List[Dict[str, str]] = Field(
        default_factory=list,
        description="Conversation-style messages exchanged between user, AI, and system."
    )
    user_input: Optional[str] = Field(
        None, description="Raw input from user (query or new document)."
    )
    query_type: Optional[str] = Field(
        None, description="Either 'ingest' for new document or 'query' for retrieval."
    )  
    raw_document: Optional[str] = Field(
        None, description="Full raw text of the ingested document."
    )
    chunks: Optional[List[str]] = Field(
        None, description="Text chunks after splitting for embedding."
    )
    embeddings: Optional[List[List[float]]] = Field(
        None, description="Vector embeddings for the chunks."
    )
    source: Optional[str] = Field(
        None, description="Source of the document (e.g., filename, URL, transcript ID)."
    )
    categories: Optional[List[str]] = Field(
        None, description="One or more categories/tags for the document."
    )
    metadata: Optional[Dict[str, Any]] = Field(
        default_factory=dict,
        description="Additional metadata (author, date, page, etc.)."
    )
    retrieved_docs: Optional[List[Dict[str, Any]]] = Field(
        None, description="Relevant documents retrieved from vector DB."
    )
    retrieved_chunks: Optional[List[str]] = Field(
        None, description="Raw chunks retrieved from the vector DB."
    )
    generated_answer: Optional[str] = Field(
        None, description="AI-generated draft answer or summary."
    )
    final_answer: Optional[str] = Field(
        None, description="Final, human-approved answer or summary."
    )
    human_feedback: Optional[str] = Field(
        None, description="Feedback on AI output: 'approved' | 'rejected' | 'edited'."
    )
    edits: Optional[str] = Field(
        None, description="Manual corrections provided by the human."
    )
    edited_answer: Optional[str] = Field(
        None, description="The edited version of the answer provided by human feedback."
    )
    knowledge_type: Optional[str] = Field(
        None, description="Type of knowledge: 'conversational' | 'reusable' | 'verified'."
    )
    conversation_history: Optional[List[Dict[str, str]]] = Field(
        None, description="Running log of user/AI/system messages for context."
    )

    user_preferences: Optional[Dict[str, Any]] = Field(
        None, description="User customization, e.g. summary style, tone, etc."
    )
    status: Optional[str] = Field(
        None, description="Pipeline status: 'pending', 'processing', 'done', 'error'."
    )
    logs: Optional[List[str]] = Field(
        default_factory=list,
        description="Debug logs collected during pipeline execution."
    )

The KnowledgeState class holds all the details that are passed along over the course of graph execution:

  • messages: Conversation-style messages for chat-like interactions
  • user_input: Raw user input (query or document content)
  • query_type: Workflow type identifier ('ingest' or 'query')
  • raw_document: Full document text for ingestion workflows
  • chunks: Text chunks after document splitting
  • embeddings: Vector representations of text chunks
  • source: Document source identifier
  • categories: Document classification tags
  • metadata: Additional structured metadata
  • retrieved_docs: Retrieved documents from vector database
  • retrieved_chunks: Raw text chunks from retrieval
  • generated_answer: AI-generated response draft
  • final_answer: Human-approved final response
  • human_feedback: Human feedback on AI outputs
  • edits: Manual corrections and edits
  • edited_answer: Human-edited version of the answer
  • knowledge_type: Knowledge classification ('conversational', 'reusable', 'verified')
  • conversation_history: Complete conversation context
  • user_preferences: User customization settings
  • status: Current workflow status
  • logs: Debug and execution logs

Please note that in this KnowledgeState object some properties are cumulative (e.g. conversation_history) while others reflect only the latest values (e.g. generated_answer, final_answer); I will go into the details in later sections. In any case, this state object underpins the Smart 2nd Brain system, ensuring all required information can be passed throughout the graph.
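
To make this concrete, here is a minimal sketch of how a run could be seeded for the ingestion path; all values are purely illustrative and not part of the application code:

# Illustrative only: seed the state for an ingestion run (all values are hypothetical).
initial_state = KnowledgeState(
    user_input="my_notes.pdf",
    query_type="ingest",
    raw_document="...full text extracted from the PDF...",
    source="my_notes.pdf",
    categories=["productivity", "reading-notes"],
)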

A Tale of Two Paths

The application graph supports two different paths: "ingest" and "query".

The Smart 2nd Brain supports ingesting PDF documents into a vector database as text embeddings; this is the "ingest" path. Once the documents are stored in the vector database, they become available to enrich the LLM when answering user questions. We call this the "query" path.

In the MasterGraphBuilder class, this graph is created by the build method:

def build(self) -> CompiledStateGraph:
        """
        Build and compile the complete LangGraph workflow.

        This method constructs the workflow graph by defining nodes,
        setting up conditional routing, and establishing the flow
        between different processing stages.

        Returns:
            CompiledStateGraph: Compiled and checkpointed workflow graph

        Graph Structure:
            Entry Point: router
            Conditional Routing: Based on query_type
            Ingestion Branch: router -> chunk -> embed -> store -> END
            Query Branch: router -> retriever -> answer -> review -> validated_store -> END

        Features:
            - Conditional edges for workflow routing
            - Checkpointing for state persistence
            - Error handling and logging throughout
            - Human-in-the-loop integration
        """
        # Create the base state graph
        graph = StateGraph(KnowledgeState)

        # =============================================================================
        # NODE DEFINITIONS
        # =============================================================================

        # Add all workflow nodes
        graph.add_node("router", self.input_router)           # Entry point and routing
        graph.add_node("chunk", self.chunk_doc_node)          # Document chunking
        graph.add_node("embed", self.embed_node)              # Embedding generation
        graph.add_node("store", self.store_node)              # Document storage
        graph.add_node("retriever", self.retriever_node)      # Document retrieval
        graph.add_node("answer", self.answer_gen_node)        # Answer generation
        graph.add_node("review", self.human_review_node)      # Human review
        graph.add_node("validated_store", self.validated_store_node)  # Validated storage

        # Set the entry point
        graph.set_entry_point("router")

        # =============================================================================
        # CONDITIONAL ROUTING
        # =============================================================================

        # Define routing logic based on query_type
        def route_condition(state):
            if hasattr(state, 'query_type'):
                if state.query_type == "ingest":
                    return "ingest"
                elif state.query_type == "query":
                    return "query"
                else:
                    return "__end__"
            return "query"

        # Add conditional edges from router
        graph.add_conditional_edges("router", route_condition, {
            "ingest": "chunk",      # Route to document ingestion
            "query": "retriever",   # Route to knowledge query
            "__end__": END,         # End workflow for invalid types
        })

        # =============================================================================
        # INGESTION BRANCH
        # =============================================================================

        # Document processing pipeline
        graph.add_edge("chunk", "embed")      # Chunk -> Embed
        graph.add_edge("embed", "store")     # Embed -> Store
        graph.add_edge("store", END)         # Store -> End

        # =============================================================================
        # QUERY BRANCH
        # =============================================================================

        # Knowledge query pipeline
        graph.add_edge("retriever", "answer")           # Retrieve -> Generate
        graph.add_edge("answer", "review")             # Generate -> Review
        graph.add_edge("review", "validated_store")    # Review -> Store
        graph.add_edge("validated_store", END)         # Store -> End

        # Compile the graph with checkpointing
        return graph.compile(checkpointer=self.checkpointer)
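
As a quick usage sketch (assuming a MasterGraphBuilder instance named builder has already been constructed with an embedding model and a checkpointer, which I will cover later), the compiled graph can be invoked for either path. The thread_id is required by the checkpointer; all values below are hypothetical.

# Hypothetical usage sketch; builder construction is covered in a later article.
graph = builder.build()

# Ingestion path: router -> chunk -> embed -> store
ingest_result = graph.invoke(
    {
        "query_type": "ingest",
        "raw_document": document_text,  # placeholder: raw text extracted from a PDF
        "source": "notes.pdf",
    },
    config={"configurable": {"thread_id": "ingest-001"}},  # required by the checkpointer
)

# Query path: router -> retriever -> answer -> review -> validated_store
query_result = graph.invoke(
    {"query_type": "query", "user_input": "What did I note about RAG?"},
    config={"configurable": {"thread_id": "chat-001"}},
)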

Feeding the Brain Chunk by Chunk

Let's check the ingest branch first. If the graph is invoked for ingestion, the first node encountered is the chunk node.

def chunk_doc_node(self, state: KnowledgeState):
        if state.raw_document:
            # Use RecursiveCharacterTextSplitter for intelligent text segmentation
            splitter = RecursiveCharacterTextSplitter(
                chunk_size=500,     # Maximum characters per chunk
                chunk_overlap=50    # Overlap between consecutive chunks for context
            )
            state.chunks = splitter.split_text(state.raw_document)
            state.logs = (state.logs or []) + [
                f"Chunked document into {len(state.chunks)} chunks"
            ]
        else:
            state.logs = (state.logs or []) + ["No raw_document found for chunking"]

        return state


To ingest lengthy text, it must first be divided into smaller chunks for efficient processing. The chunk_doc_node function takes the raw_document and splits it into manageable segments for downstream tasks.

LangChain's RecursiveCharacterTextSplitter facilitates this by breaking text into semantically coherent chunks, ideal for NLP applications like embedding generation or RAG. It employs a hierarchical approach, using customizable separators (e.g., paragraphs, sentences, words) to prioritize natural boundaries and preserve context, as the sketch below shows.
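
For illustration, here is the default separator hierarchy spelled out explicitly (a standalone sketch assuming the langchain-text-splitters package; it is not part of the application code):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Default hierarchy made explicit: paragraphs -> lines -> words -> characters
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", " ", ""],
)
chunks = splitter.split_text(long_text)  # long_text: any lengthy string (placeholder)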

Here, we set chunk_size=500 and chunk_overlap=50, striking a balance between coherence and minimal redundancy. Once the chunks are written back to the state, the next step is generating embeddings.

def embed_node(self, state: KnowledgeState):
        # Use embedding model if available, otherwise fallback to placeholder
        if state.chunks:
            if self.embedding_model:
                try:
                    logger.info(f"🔤 Generating embeddings for {len(state.chunks)} chunks")
                    state.embeddings = self.embedding_model.embed_documents(state.chunks)
                    logger.info(f"✅ Generated {len(state.embeddings)} embeddings")
                except Exception as e:
                    logger.error(f"❌ Embedding generation failed: {e}")
                    # Fallback to placeholder embeddings for continued processing
                    state.embeddings = [[0.1, 0.2]] * len(state.chunks)
                    state.logs = (state.logs or []) + [f"Embedding error: {str(e)}"]
            else:
                logger.warning("⚠️ No embedding model provided, using placeholder embeddings")
                state.embeddings = [[0.1, 0.2]] * len(state.chunks)
        return state


This node transforms text chunks into high-dimensional vector embeddings that encode semantic meaning, enabling effective similarity searches. The embedding model handles this conversion, and LangChain’s AzureOpenAIEmbeddings class seamlessly integrates the model for robust embedding generation.

# Initialize Azure embedding model for document vectorization
embedding_model = AzureOpenAIEmbeddings(
                azure_deployment="text-embedding-3-small",  # Your embedding deployment name
                openai_api_version="2024-12-01-preview",    # Azure OpenAI API version
                azure_endpoint=azure_endpoint,              # Azure service endpoint
                openai_api_key=api_key                      # API key for authentication
            )
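
As a quick sanity check (a hypothetical snippet, not part of the pipeline), the model exposes embed_documents for batches of chunks and embed_query for a single query string:

# Hypothetical sanity check using the embedding_model defined above.
vectors = embedding_model.embed_documents(["hello world", "my smart 2nd brain"])
print(len(vectors), len(vectors[0]))  # number of chunks, embedding dimensionality
query_vector = embedding_model.embed_query("what is a second brain?")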

At the conclusion of the pipeline, the store_node function saves the embedded chunks into a vector database, utilizing Chroma, a widely adopted solution for vector storage.

The metadata, which includes multiple categories, is designed to enhance the querying of these embedded chunks during the query path, enabling more precise and context-aware retrieval.

def store_node(self, state: KnowledgeState):

        if state.embeddings and state.chunks:
            try:
                # Initialize vectorstore if not provided
                if not self.vectorstore:
                    self.vectorstore = Chroma(
                        collection_name="knowledge_base",
                        embedding_function=self.embedding_model,
                        persist_directory=self.chromadb_dir
                    )

                # Prepare metadata for each chunk             
                metadatas = [
                    {
                        "source": state.source or "unknown",
                        "categories": ", ".join(state.categories) if state.categories else "general",
                        "chunk_id": i
                    }
                    for i in range(len(state.chunks))
                ]

                # Store chunks with metadata in ChromaDB
                self.vectorstore.add_texts(
                    texts=state.chunks,
                    metadatas=metadatas
                )

                # Data is automatically persisted when using persist_directory

                state.status = "stored"
                state.logs = (state.logs or []) + [
                    f"Stored {len(state.chunks)} chunks in ChromaDB with categories {state.categories or ['general']}"
                ]
            except Exception as e:
                state.status = "error"
                state.logs = (state.logs or []) + [f"Storing failed: {e}"]
        else:
            state.logs = (state.logs or []) + ["No embeddings/chunks to store"]

        return state

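Looking ahead to the query path, here is a hypothetical preview (not the actual retriever_node) of how that metadata can narrow a similarity search against the same collection, using the same Chroma class as store_node; embedding_model and chromadb_dir stand in for the corresponding attributes above:

# Hypothetical preview: metadata-filtered retrieval from the same collection.
vectorstore = Chroma(
    collection_name="knowledge_base",
    embedding_function=embedding_model,
    persist_directory=chromadb_dir,
)
results = vectorstore.similarity_search(
    "how do I organize my notes?",
    k=4,
    filter={"source": "notes.pdf"},  # exact match against the stored metadata
)

Since categories is stored as a single comma-joined string, filtering on source is the simpler exact-match example here.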

More Exciting Features Await

Rather than overwhelming you with every detail now, I’ll explore the query path, long-term memory, and Human-in-the-Loop capabilities in upcoming articles. I’ll also showcase how to integrate a FastAPI suite with a sleek user interface to seamlessly connect all components. Stay tuned for what’s next!

(Source code will be provided when concluding this series)
