Sopaco
Litho Project Architecture Analysis: How the Four-Stage Pipeline Enables Automated Documentation Generation

Imagine inheriting a massive legacy system with hundreds of thousands of lines of code but only a few pages of outdated documentation. Have you ever lost sleep over that feeling of helplessness? Litho was born to solve this "documentation dilemma" in modern software development.
Open-source repository: https://github.com/sopaco/deepwiki-rs

Introduction: The Transformation Journey from Code Jungle to Knowledge Map

In today's fast-paced software development world, we often fall into a strange cycle: code evolves continuously, features keep increasing, but documentation always lags behind. New team members need weeks to understand the system architecture, technical debt accumulates silently, and team communication costs remain high.

For this code jungle, Litho (also known as deepwiki-rs) steps in as a tireless "explorer" and "cartographer." It delves into every corner of the code, understands the design intentions behind it, and transforms what it finds into clear technical documentation.

Let's follow Litho's footsteps and begin this wonderful journey from raw code to structured knowledge.

Chapter 1: Meeting Litho - The Intelligent Cartographer of the Code World

1.1 Project Background

The story begins with a real development scenario. A large e-commerce platform's technical team faced a typical problem: their microservices architecture had grown to include 50+ services, each with complex business logic. New engineers needed at least a month to fully understand the system, and architects had to manually draw architecture diagrams for each review.

The team tried various documentation tools, but either the maintenance cost was too high or they couldn't keep up with code changes. That changed when they discovered Litho, an AI-driven documentation generation engine built in Rust.

1.2 Litho's Core Mission

Litho's mission can be summarized in one sentence: "Let every line of code tell its own story." It's not just a documentation generation tool, but an intelligent knowledge extraction and expression system.

Imagine pointing Litho to a code repository, and it will:

  • Like an experienced archaeologist, excavate design patterns and architectural decisions from the code
  • Like a meticulous librarian, organize and categorize the functionality of each module
  • Like a professional technical writer, describe the system's operation principles in clear language

Chapter 2: Revealing the Four-Stage Pipeline - Litho's "Magic Factory"

2.1 Overall Architecture: Transformation from Chaos to Order

Litho's core is a carefully designed four-stage processing pipeline. This pipeline is like a modern "document processing factory," where each workshop has specialized equipment and workers.


2.2 Stage 1: Preprocessing - "Geological Exploration" of the Code World

Scenario Recreation: Imagine Litho's first encounter with a large Rust project.

The preprocessing stage is like sending an exploration team into an unknown code continent. The exploration team's tasks are:

  1. Draw Topographic Maps (Structure Extraction)

    • Use StructureExtractor to scan the entire project
    • Identify all files and directories, calculate importance scores
    • Like explorers drawing maps, marking important landmarks
  2. Collect Samples (Code Analysis)

    • Perform deep analysis on core code files
    • Extract function responsibilities, interface definitions, dependency relationships
    • Like geologists collecting rock samples for analysis
  3. Establish Connections (Relationship Analysis)

    • Analyze call relationships and dependency chains between modules
    • Build complete dependency graphs
    • Like establishing trade routes between different tribes

Technical Implementation Details:

// The exploration journey in the preprocessing stage
pub struct PreProcessAgent {
    structure_extractor: StructureExtractor,    // Topographic mapping expert
    code_analyze_agent: CodeAnalyzeAgent,       // Sample analysis expert
    relationships_analyze_agent: RelationshipsAnalyzeAgent, // Relationship network expert
}

impl PreProcessAgent {
    pub async fn explore_code_continent(&self, context: &mut GeneratorContext) -> Result<()> {
        info!("🚀 Starting code continent exploration mission...");

        // Step 1: Draw complete topographic maps
        let terrain_map = self.structure_extractor.map_terrain(context).await?;
        info!("🗺️ Topographic mapping completed, discovered {} important landmarks", terrain_map.landmarks.len());

        // Step 2: Deep sampling of key areas
        let core_samples = self.code_analyze_agent.collect_samples(context).await?;
        info!("🔬 Collected {} core code samples", core_samples.len());

        // Step 3: Establish connection networks between regions
        let connection_network = self.relationships_analyze_agent.build_network(context).await?;
        info!("🌐 Established relationship network with {} connections", connection_network.connections.len());

        // Store exploration results in knowledge base
        context.store_to_memory("EXPLORATION", "terrain_map", terrain_map)?;
        context.store_to_memory("EXPLORATION", "core_samples", core_samples)?;
        context.store_to_memory("EXPLORATION", "connection_network", connection_network)?;

        info!("✅ Code continent exploration mission successfully completed!");
        Ok(())
    }
}
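The "importance scores" mentioned in the structure-extraction step are not detailed in this article. As a rough illustration of the idea, a heuristic might weight files by their path alone. The `score_file` function below is a hypothetical sketch of such a heuristic, not Litho's actual algorithm.

```rust
/// Hypothetical heuristic: estimate a source file's architectural
/// importance from its path. Not Litho's real scoring algorithm.
fn score_file(path: &str) -> f64 {
    let mut score: f64 = 1.0;
    // Entry points and module roots tend to matter most.
    if path.ends_with("main.rs") || path.ends_with("lib.rs") || path.ends_with("mod.rs") {
        score += 2.0;
    }
    // Deeply nested files are usually implementation details.
    let depth = path.matches('/').count();
    score -= 0.2 * depth as f64;
    // Tests and build output rank lowest.
    if path.starts_with("tests/") || path.contains("/tests/") || path.contains("/target/") {
        score = 0.0;
    }
    score.max(0.0)
}

fn main() {
    let files = ["src/main.rs", "src/utils/strings.rs", "tests/integration.rs"];
    let mut ranked: Vec<(&str, f64)> = files.iter().map(|f| (*f, score_file(f))).collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    for (f, s) in &ranked {
        println!("{f}: {s:.1}");
    }
}
```

A real extractor would combine many more signals (fan-in from the dependency graph, file size, commit frequency), but the ranking shape is the same: a scalar score per file that decides which "landmarks" get deep analysis.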

2.3 Stage 2: Research - AI Agents' "Roundtable Conference"

Story Continues: The exploration team returns with rich data, and an exciting "expert symposium" follows.

The research stage is like convening experts from various fields to conduct in-depth analysis of exploration results. Each expert has their own specialization:

  • System Context Expert (Macro Perspective): Analyze the project's positioning in the enterprise environment
  • Domain Module Detective (Business Perspective): Identify core business domains and functional modules
  • Architecture Analyst (Technical Perspective): Evaluate technology selection and architecture patterns
  • Workflow Reconstructor (Process Perspective): Restore business processes and execution paths

The ReAct pattern drives each of these experts: an agent reasons about what to examine next, acts by invoking a tool, observes the result, and repeats until it can report a conclusion.

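The reason-act-observe cycle behind these research agents can be sketched as follows. Every type and function here is an illustrative stand-in (the `reason` step would really be an LLM call, `act` a tool invocation), not Litho's actual API.

```rust
/// Illustrative ReAct loop: the agent alternates between reasoning
/// (choosing the next action) and acting (gathering an observation)
/// until it decides it has enough evidence to answer.
enum Action {
    ReadFile(String),
    Finish(String),
}

// Stand-in for an LLM call: picks the next action from observations so far.
fn reason(observations: &[String]) -> Action {
    if observations.is_empty() {
        Action::ReadFile("src/main.rs".to_string())
    } else {
        Action::Finish(format!("Conclusion based on {} observation(s)", observations.len()))
    }
}

// Stand-in for a tool call: executes the action and returns what was seen.
fn act(action: &Action) -> String {
    match action {
        Action::ReadFile(path) => format!("contents of {path}"),
        Action::Finish(answer) => answer.clone(),
    }
}

fn react_loop(max_steps: usize) -> String {
    let mut observations = Vec::new();
    for _ in 0..max_steps {
        let action = reason(&observations);
        if let Action::Finish(answer) = &action {
            return answer.clone();
        }
        observations.push(act(&action));
    }
    "max steps reached".to_string()
}

fn main() {
    println!("{}", react_loop(5));
}
```

The `max_steps` bound matters in practice: it keeps an agent from looping forever when the model never converges on a conclusion.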

2.4 Stage 3: Orchestration - "Editing and Publishing" of Technical Documentation

Plot Development: The experts' research results are now quite rich; an experienced editor is needed to integrate these scattered insights into coherent documentation.

The orchestration stage is like establishing a "technical publishing house," where various editors collaborate:

  • Overview Editor: Write project introduction and technology stack description
  • Architecture Draftsman: Draw C4 model architecture diagrams
  • Process Describer: Describe business processes in detail
  • Technical Insight Analyst: Deeply analyze key technical implementations

Editorial Department Collaboration Process:

// Daily work of the technical publishing house
pub struct DocumentationComposer {
    overview_editor: OverviewEditor,           // Chief editor, responsible for project overview
    architecture_editor: ArchitectureEditor,  // Art editor, responsible for architecture diagrams
    workflow_editor: WorkflowEditor,          // Process editor, describing business processes
    technical_insight_editor: TechnicalInsightEditor, // Technical editor, in-depth analysis
}

impl DocumentationComposer {
    pub async fn publish_technical_manual(&self, context: &mut GeneratorContext) -> Result<DocTree> {
        info!("📚 Starting technical manual editing and publishing work...");

        let mut doc_tree = DocTree::new();

        // Chapter 1: Project Overview (Chief Editor responsible)
        let overview = self.overview_editor.compile_introduction(context).await?;
        doc_tree.add_chapter("1、Project Overview.md", overview);
        info!("📖 Chapter 1: Project Overview writing completed");

        // Chapter 2: Architecture Design (Art Editor responsible)
        let architecture = self.architecture_editor.design_blueprints(context).await?;
        doc_tree.add_chapter("2、Architecture Overview.md", architecture);
        info!("🏗️ Chapter 2: Architecture design diagrams completed");

        // Chapter 3: Business Processes (Process Editor responsible)
        let workflows = self.workflow_editor.document_processes(context).await?;
        doc_tree.add_chapter("3、Workflow.md", workflows);
        info!("🔧 Chapter 3: Business process description completed");

        // Chapter 4: Technical Depth (Technical Editor responsible)
        let insights = self.technical_insight_editor.analyze_techniques(context).await?;
        doc_tree.add_chapter("4、Technical Deep Dive.md", insights);
        info!("🔍 Chapter 4: Technical depth analysis completed");

        info!("🎉 Technical manual publishing work fully completed!");
        Ok(doc_tree)
    }
}

2.5 Stage 4: Output - "Delivery and Use" of Knowledge Results

Story Climax: After relentless efforts through the previous three stages, the final knowledge results are about to be presented to users.

The output stage is like completing the final printing and binding work:

  1. Format Standardization: Ensure all documents comply with Markdown specifications
  2. Diagram Optimization: Fix format issues in Mermaid diagrams
  3. Quality Check: Generate completeness reports and performance statistics
  4. Result Delivery: Save the complete document set to the specified directory
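The "diagram optimization" step above is only described in outline. One concrete repair it implies is closing unterminated code fences so Mermaid diagrams render. The helper below is a hypothetical sketch of that kind of fix, not Litho's actual implementation.

```rust
/// Hypothetical repair pass: if a markdown document opens more code
/// fences than it closes, append a closing fence so Mermaid blocks render.
fn close_unterminated_fences(doc: &str) -> String {
    let fence_count = doc
        .lines()
        .filter(|line| line.trim_start().starts_with("```"))
        .count();
    if fence_count % 2 == 1 {
        let mut fixed = doc.to_string();
        if !fixed.ends_with('\n') {
            fixed.push('\n');
        }
        fixed.push_str("```\n");
        fixed
    } else {
        doc.to_string()
    }
}

fn main() {
    // An LLM-generated chapter that forgot to close its Mermaid fence.
    let broken = "# Architecture\n```mermaid\ngraph TD\n  A --> B\n";
    println!("{}", close_unterminated_fences(broken));
}
```

Real post-processing would also validate the Mermaid syntax itself, but even a cheap structural pass like this catches the most common LLM output defect: a dangling fence that breaks everything after it.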

Chapter 3: Litho's Technical Magic - The Art of Making AI Understand Code

3.1 Memory Context: The "Central Library" of Knowledge

One of Litho's most clever designs is its memory context system. Imagine a huge central library where research results from all stages are stored, and each expert can consult previous work at any time.

// Management system of the central library
pub struct GeneratorContext {
    llm_client: LLMClient,           // Intelligent Q&A system
    config: Config,                  // Library rules and regulations
    cache_manager: Arc<RwLock<CacheManager>>, // Borrowing record system
    memory: Arc<RwLock<Memory>>,     // Book collection management system
}

impl GeneratorContext {
    // Store new knowledge
    pub fn contribute_knowledge<T: Serialize>(&self, department: &str, topic: &str, knowledge: T) -> Result<()> {
        let serialized = serde_json::to_string(&knowledge)?;
        self.memory.write().unwrap().store(department, topic, serialized)
    }

    // Consult existing knowledge
    pub fn acquire_knowledge<T: DeserializeOwned>(&self, department: &str, topic: &str) -> Option<T> {
        self.memory.read().unwrap().get(department, topic)
            .and_then(|s| serde_json::from_str(&s).ok())
    }
}

3.2 Asynchronous Processing: Efficient "Parallel Workshops"

Litho's asynchronous architecture based on Tokio allows multiple experts to work simultaneously, greatly improving efficiency. Like in a large research institution, different laboratories can conduct experiments in parallel.
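In Litho itself these "laboratories" are async tasks on the Tokio runtime; the sketch below illustrates the same fork-join shape with standard-library threads and hypothetical analysis jobs, so it stays dependency-free.

```rust
use std::thread;

// Stand-ins for independent analysis passes; in Litho these would be
// async agents awaited concurrently on Tokio, not OS threads.
fn analyze_structure() -> usize { 187 }   // e.g. core files identified
fn analyze_interfaces() -> usize { 324 }  // e.g. function interfaces found

fn main() {
    // The two "laboratories" run concurrently...
    let structure = thread::spawn(analyze_structure);
    let interfaces = thread::spawn(analyze_interfaces);

    // ...and their results are joined at the end of the stage.
    let files = structure.join().unwrap();
    let apis = interfaces.join().unwrap();
    println!("files: {files}, interfaces: {apis}");
}
```

The payoff is the same in either model: since the passes share no mutable state while running, the wall-clock time of a stage approaches that of its slowest analysis rather than the sum of all of them.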

3.3 Plugin Design: Extensible "Expert Team"

Litho's plugin architecture means new experts can be invited to join the team at any time. Whether supporting new programming languages or integrating new large language models, it can be easily achieved through plugins.
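Concretely, a plugin system like the one described is often built around a trait plus a registry. The sketch below is an assumption about that shape (the `LanguageAnalyzer` trait and `AnalyzerRegistry` are invented names), not Litho's real extension API.

```rust
use std::collections::HashMap;

/// Hypothetical plugin interface: each language analyzer declares which
/// file extension it handles and how it summarizes a source file.
trait LanguageAnalyzer {
    fn extension(&self) -> &'static str;
    fn summarize(&self, source: &str) -> String;
}

struct RustAnalyzer;
impl LanguageAnalyzer for RustAnalyzer {
    fn extension(&self) -> &'static str { "rs" }
    fn summarize(&self, source: &str) -> String {
        format!("Rust file with {} line(s)", source.lines().count())
    }
}

/// Registry mapping file extensions to their analyzer plugin.
struct AnalyzerRegistry {
    plugins: HashMap<&'static str, Box<dyn LanguageAnalyzer>>,
}

impl AnalyzerRegistry {
    fn new() -> Self {
        Self { plugins: HashMap::new() }
    }
    fn register(&mut self, plugin: Box<dyn LanguageAnalyzer>) {
        self.plugins.insert(plugin.extension(), plugin);
    }
    fn summarize(&self, ext: &str, source: &str) -> Option<String> {
        self.plugins.get(ext).map(|p| p.summarize(source))
    }
}

fn main() {
    let mut registry = AnalyzerRegistry::new();
    registry.register(Box::new(RustAnalyzer));
    println!("{:?}", registry.summarize("rs", "fn main() {}\n"));
}
```

Adding support for a new language then means writing one new `impl` and one `register` call; the rest of the pipeline never changes, which is exactly the "invite a new expert to the team" property the article describes.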

Chapter 4: Practical Exercise - How Litho Parses a Real Project

4.1 Case Background: Microservices E-commerce Platform

Let's follow Litho's footsteps to see how it parses a real microservices e-commerce platform.

Project Overview:

  • Technology Stack: Rust backend + React frontend + PostgreSQL database
  • Scale: 15 microservices, totaling about 200,000 lines of code
  • Complexity: Involves core businesses like order processing, payment, inventory, user management

4.2 Litho's Parsing Process

Stage 1 Results:

  • Identified 187 core code files
  • Analyzed 324 main function interfaces
  • Built complete dependency relationship graphs

Stage 2 Insights:

  • System Context: B2C e-commerce platform, serving small and medium businesses
  • Domain Modules: Identified 6 core business domains
  • Architecture Patterns: Microservices architecture, event-driven design

Stage 3 Output:

  • Generated complete C4 model documentation
  • Drew system context diagrams and container diagrams
  • Detailed description of order processing business processes

Stage 4 Delivery:

  • Produced over 50 pages of technical documentation
  • Included 15 Mermaid architecture diagrams
  • Provided complete technology stack description and design decision records

4.3 User Feedback: From Confusion to Clear Technical Journey

The project team's technical lead commented: "Litho is like having a never-tiring technical documentation expert on staff. New members now need only days to understand the entire system, rather than the weeks it took before."

Chapter 5: Litho's Technical Philosophy - Perfect Combination of Intelligence and Engineering

5.1 Design Philosophy: Not Replacing Humans, but Enhancing Humans

Litho's design philosophy is clear: it's not meant to replace human architects or developers, but to become their capable assistant. By automating tedious documentation work, it allows human experts to focus on more creative tasks.

5.2 The Profound Meaning of Technology Selection: Why Choose Rust?

The choice of Rust language reflects Litho's ultimate pursuit of performance and reliability:

  • Memory Safety: Ensures stability during long-term operation
  • Zero-Cost Abstraction: High-performance code analysis capability
  • Powerful Concurrency Model: Supports parallel processing of large-scale projects

5.3 Future-Oriented Architecture: Evolvable Technical Foundation

Litho's architecture design considers future expansion needs:

  • Support for new AI models and algorithms
  • Adaptability to different documentation standards
  • Ability to handle larger-scale projects

Conclusion: Opening a New Era of Intelligent Documentation Generation

Litho's emergence marks a new era in documentation generation technology. It's not just an upgrade of tools, but a transformation of development paradigms - from "manual documentation writing" to "intelligent knowledge generation."

As a senior architect said: "Litho makes documentation no longer a burden of development, but a natural product of development."

In the upcoming series articles, we will continue to deeply explore more technical details of Litho:

  • The mysteries of multi-agent collaborative architecture
  • The unique advantages of Rust language in high-performance AI applications
  • The design philosophy of plugin architecture
  • The engineering wisdom of intelligent caching mechanisms

The technical journey has just begun; let's look forward to more surprises from Litho!


This article is the first in the Litho technical analysis series. Litho is open source, and technical enthusiasts are welcome to contribute and join the discussion. Project address: https://github.com/sopaco/deepwiki-rs

Next Preview: In-depth Analysis of Litho's Multi-Agent Collaborative Architecture and ReAct Reasoning Mechanism - Revealing how AI agents collaborate like human experts to analyze code.
