DEV Community

Altug Tatlisu

Why I Chose Rust Over Python for My AI Research Agent (3-Week Build)

The Problem: Reading Research Papers is Soul-Crushing

Picture this: You're deep in blockchain research. You need to understand consensus mechanisms. So you:

  1. Search arXiv for "blockchain consensus"
  2. Get 847 results
  3. Download 50 PDFs
  4. Realize you need to read all of them
  5. Cry

Then it hit me: What if an AI could do this entire workflow autonomously?

Not just "summarize papers" like ChatGPT. I mean:

  • Search academic databases
  • Download papers automatically
  • Parse PDFs and extract knowledge
  • Build a searchable knowledge base
  • Generate hypotheses for novel protocols
  • Run simulations to test ideas
  • Write research papers with proper citations

All by itself. While I sleep.

That's ConsensusMind.


The Controversial Decision: Pure Rust (No Python)

Everyone builds AI tools in Python. Everyone.

I chose Rust.

"Are you insane?"

That's what I thought too. But here's the thing:

Python is slow:

# Python: PDF parsing (extract_text here is e.g. pdfminer.high_level's)
import time

start = time.time()
text = extract_text("paper.pdf")
print(f"Took {time.time() - start:.1f}s")
# Output: Took 0.5s

Rust is fast:

// Rust: Same PDF
let start = Instant::now();
let text = parser.extract_text(&pdf_path)?;
println!("Took {:?}", start.elapsed());
// Output: Took 100ms

5x faster. And that's just parsing ONE paper.

When you're processing thousands of papers, this compounds.

"But Python has better AI libraries!"

True. But I don't need them.

  • LLM? Self-hosted vLLM via REST API (language-agnostic)
  • Vector search? SQLite with vec0 extension (pure Rust)
  • PDF parsing? pdf-extract crate (pure Rust)
  • Everything else? Rust ecosystem has it

The real kicker: Rust gives me a single binary deployment.

# Python deployment
pip install -r requirements.txt
# Result: 50 packages, version conflicts, pray it works

# Rust deployment  
./consensusmind
# Result: One file. Just works.

Week 1: Foundation (Avoiding Analysis Paralysis)

Day 1 was rough. I had two choices:

  1. Spend weeks designing the perfect architecture
  2. Ship something that works, iterate fast

I chose option 2.

The Minimum Viable Foundation

// Day 1: Just make it compile
pub struct ConsensusMind {
    config: Config,
    logger: Logger,
    llm_client: LlmClient,
}

impl ConsensusMind {
    pub fn new() -> Result<Self> {
        // Load config, setup logging, that's it
        Ok(Self { /* ... */ })
    }
}

Quality rule from day 1: Zero compiler warnings.

Not "we'll fix it later." Not "technical debt is fine for MVP."

Zero. Warnings.

This decision saved me later.
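The gate is easy to make non-optional. A minimal sketch: deny warnings at the crate root so they fail the build instead of accumulating (`answer` is just a placeholder function; in CI you can get the same effect with `cargo clippy -- -D warnings`):

```rust
#![deny(warnings)]
// Crate-level lint gate: with this attribute, every warning (unused
// variables, dead code, unreachable code, ...) is a hard compile error.

fn answer() -> i32 {
    21 * 2
}

fn main() {
    // Uncommenting the next line would now fail the build:
    // let unused = 0; // error: unused variable `unused`
    println!("warning-free build: {}", answer());
}
```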


Week 2: The arXiv Integration (When APIs Fight Back)

The Challenge

Build a client that:

  • Searches arXiv for papers
  • Downloads PDFs
  • Doesn't get rate-limited
  • Doesn't crash
  • Actually works

The Reality

// First attempt - DON'T DO THIS
pub async fn search(&self, query: &str) -> Result<Vec<Paper>> {
    let response = self.client.get(&url).send().await?;
    // ... parse XML ...
    // Works! Ship it!
}

What happened: Got rate-limited after 10 requests. arXiv blocked me.

The Fix

// What actually works
pub async fn search(&self, query: &str) -> Result<Vec<Paper>> {
    let response = self.client.get(&url).send().await?;
    let papers = self.parse_response(&response)?;

    // The magic: respect rate limits
    sleep(Duration::from_secs(3)).await;

    Ok(papers)
}

Lesson learned: Read the API docs. All of them. Twice.
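The fix above hardcodes a sleep inside `search`. The same idea factors into a tiny reusable limiter that enforces a minimum gap between calls. A std-only, blocking sketch (the real client would use tokio's async sleep; `RateLimiter` is a name I'm making up here):

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Guarantees at least `min_interval` between consecutive calls.
struct RateLimiter {
    min_interval: Duration,
    last_call: Option<Instant>,
}

impl RateLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_call: None }
    }

    /// Block until enough time has passed since the previous call,
    /// then record the new call time.
    fn wait(&mut self) {
        if let Some(last) = self.last_call {
            let elapsed = last.elapsed();
            if elapsed < self.min_interval {
                sleep(self.min_interval - elapsed);
            }
        }
        self.last_call = Some(Instant::now());
    }
}

fn main() {
    let mut limiter = RateLimiter::new(Duration::from_millis(50));
    let start = Instant::now();
    for _ in 0..3 {
        limiter.wait();
        // API request would go here
    }
    // First call is free; the next two wait 50ms each.
    println!("elapsed >= 100ms: {}", start.elapsed() >= Duration::from_millis(100));
}
```

One limiter shared per API host keeps every call path honest, not just the one you remembered to add a sleep to.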

The Payoff

First successful test:

$ cargo test test_arxiv_search -- --ignored --nocapture

Downloaded: "Byzantine Fault Tolerance in Practice"
Downloaded: "Consensus in the Age of Blockchains"  
Downloaded: "Practical Byzantine Fault Tolerance Revisited"

✓ 3 papers in 12 seconds

That moment when it works: Chef's kiss.


Week 2.5: PDF Parsing Hell

The Problem

Some PDFs are... weird.

  • Scanned images (no text)
  • Encrypted
  • Malformed
  • In Comic Sans (okay, not really, but felt like it)

My First Naive Attempt

let text = extract_text(&pdf_path)?;
// Assumes it just works

Spoiler: It didn't.

What Actually Worked

pub fn extract_text(&self, pdf_path: &Path) -> Result<String> {
    // Check file exists (obvious but important)
    if !pdf_path.exists() {
        return Err(ParserError::FileNotFound);
    }

    // Try extraction
    let text = pdf_extract::extract_text(pdf_path)
        .map_err(|e| ParserError::ExtractionFailed(e.to_string()))?;

    // Sanity check
    if text.trim().is_empty() {
        warn!("Empty PDF or scanned document: {}", pdf_path.display());
        return Err(ParserError::EmptyDocument);
    }

    // More sanity
    let word_count = text.split_whitespace().count();
    if word_count < 100 {
        warn!("Suspiciously short: {} words", word_count);
    }

    Ok(text)
}

Real test result:

  • Paper: 20 pages, dense academic writing
  • Extracted: 12,973 words, 83,063 characters
  • Time: 100ms
  • Accuracy: Near-perfect

Victory.


Week 3: Vector Search (Making Papers Searchable)

The Vision

"Show me papers about Byzantine fault tolerance that mention network partitions"

Not keyword search. Semantic search.

The Implementation

// Store papers as vectors
pub struct VectorStore {
    db: Connection,
    embeddings: HashMap<String, Vec<f32>>,
}

impl VectorStore {
    pub fn search(&self, query: &str, top_k: usize) -> Result<Vec<Paper>> {
        // Convert query to vector
        let query_vec = self.embed(query)?;

        // Find similar vectors using cosine similarity
        let results = self.db.query(
            "SELECT * FROM papers 
             ORDER BY vec_distance_cosine(embedding, ?) 
             LIMIT ?",
            params![query_vec, top_k],
        )?;

        Ok(results)
    }
}
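Under the hood, `vec_distance_cosine` ranks papers by cosine distance. What that actually measures is easy to see in plain Rust; a self-contained sketch (not the vec0 implementation):

```rust
/// Cosine similarity between two embedding vectors.
/// Returns a value in [-1, 1]; 1.0 means same direction, 0.0 means orthogonal.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have equal dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // guard against zero vectors
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0, 0.0, 0.0];
    let b = [1.0, 0.0, 0.0];
    let c = [0.0, 1.0, 0.0];
    println!("same direction: {:.1}", cosine_similarity(&a, &b));
    println!("orthogonal: {:.1}", cosine_similarity(&a, &c));
}
```

Because the embedding model maps "network partition" and "partition-tolerant" to nearby directions, papers match on meaning even when they share no keywords with the query.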

The "Holy Shit It Works" Moment

// Query: "consensus under network partition"
let results = store.search("consensus under network partition", 5)?;

// Results (paraphrased):
// 1. "Partition-tolerant consensus protocols"
// 2. "Byzantine agreement with network delays"  
// 3. "Consensus in asynchronous systems"
// 4. "Network partition recovery in distributed systems"
// 5. "Fault tolerance under partial connectivity"

Most of those results never use the phrase "network partition" in the title. They match on meaning, not keywords.

Semantic search works. Mind = blown.


The Brutal Truth: What Almost Killed The Project

Problem 1: Scope Creep

Week 2, 3am: "What if it also analyzed tweets and Reddit posts and..."

Solution: Slapped myself. Stuck to the plan.

Problem 2: Perfect Code Paralysis

Week 2, day 4: Spent 6 hours debating enum vs struct for error types.

Solution: Picked one. Shipped it. Moved on.
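For the record, the `ParserError::FileNotFound` and `EmptyDocument` values in the parser code earlier suggest the enum won. A std-only sketch of that shape (Display messages are illustrative; a crate like thiserror would shorten this considerably):

```rust
use std::fmt;

// Each failure mode is a variant; only the ones that need context
// (like the underlying extraction error) carry data.
#[derive(Debug)]
enum ParserError {
    FileNotFound,
    ExtractionFailed(String),
    EmptyDocument,
}

impl fmt::Display for ParserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParserError::FileNotFound => write!(f, "file not found"),
            ParserError::ExtractionFailed(e) => write!(f, "extraction failed: {}", e),
            ParserError::EmptyDocument => write!(f, "empty or scanned document"),
        }
    }
}

impl std::error::Error for ParserError {}

fn main() {
    let err = ParserError::ExtractionFailed("bad xref table".into());
    println!("{}", err);
}
```

The enum wins here because callers can match exhaustively on failure modes, which a struct with an error-kind string can't guarantee.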

Problem 3: "Should I Use Python?"

Week 2, day 6: Seriously considered rewriting in Python because "that's what everyone uses."

Solution: Looked at my Rust code. Zero warnings. Fast as hell. Single binary.

Stayed with Rust.


The Numbers (Because People Love Numbers)

Development Timeline

  • Week 1: Foundation + arXiv integration
  • Week 2: PDF parsing + metadata tracking
  • Week 3: Vector search + agent core
  • Total: 3 weeks, ~150 hours

Code Quality

$ cargo clippy
# Result: 0 warnings

$ cargo test  
# Result: 15 tests, 15 passed, 0 failed

$ cargo build --release
# Result: Binary size: 20MB (single file!)

Performance Benchmarks

Task            Rust     Python   Speedup
PDF parsing     100ms    500ms    5x faster
Vector search   10ms     100ms    10x faster
Full pipeline   2s       10s+     5x faster
Memory usage    50MB     500MB    10x less

Business Metrics (Projected)

  • Development cost: $0 (solo project)
  • Hosting cost: ~$280/month (RunPod GPU)
  • Year 1 revenue target: $140,000
  • Break-even: 2-6 paying users

What I Learned

1. Rust Is Ready for AI

Myth: "You need Python for AI."

Reality: You need good libraries and APIs. Language doesn't matter.

Rust has:

  • Excellent HTTP clients (reqwest)
  • PDF processing (pdf-extract)
  • Vector databases (SQLite + vec0)
  • Async runtime (tokio)
  • Everything you need

2. Quality Compounds

Day 1 decision: Zero warnings allowed.

Week 3 result: Zero refactoring needed. Code just worked.

The math:

  • Fix warnings daily: 10 min/day Γ— 21 days = 210 minutes
  • Fix warnings at the end: 3-5 days of hell

Front-load the pain. Thank yourself later.

3. Ship Fast, But Ship Quality

Fast ≠ Sloppy

Fast = Efficient

I shipped in 3 weeks by:

  • Making quick decisions (not perfect ones)
  • Writing tests immediately (not "later")
  • Maintaining quality gates (zero warnings)
  • Iterating rapidly (ship, measure, improve)

4. Self-Hosted LLMs Work

RunPod + vLLM:

  • $0.39/hour for A40 GPU
  • DeepSeek-R1 quality responses
  • Full control over prompts
  • No OpenAI API bills

Total LLM cost during development: $47

vs. OpenAI API for same workload: $300+

5. Open Source Builds Credibility

Released on GitHub from day 1:

  • Forces clean code (people will see it)
  • Builds portfolio
  • Attracts contributors
  • Creates trust with users

Side benefit: Looks great on LinkedIn.


The Tech Stack (For the Curious)

Core Technologies:

  • Language: Rust 2021
  • Async Runtime: Tokio
  • HTTP Client: Reqwest + Rustls (HTTPS only)
  • Database: SQLite
  • Vector Database: vec0 extension
  • PDF Processing: pdf-extract
  • XML Parsing: quick-xml
  • LLM: Self-hosted vLLM (RunPod)
  • Deployment: Single binary
  • CI/CD: GitHub Actions
  • Hosting: RunPod (GPU) + GitHub Pages (landing)

Notable absences: Python, Docker (for core app), Kubernetes, microservices

Why? They weren't needed. Simplicity wins.


What's Next?

Short Term (This Month)

  • [x] GitHub release v1.0.0
  • [ ] Landing page launch
  • [ ] Waitlist setup
  • [ ] First 100 signups

Q1 2026

  • [ ] Authentication (GitHub, Google, Email)
  • [ ] SaaS infrastructure
  • [ ] Beta launch
  • [ ] First paying customers

Q2 2026

  • [ ] Public launch
  • [ ] Enterprise tier
  • [ ] Academic paper publication
  • [ ] $10k MRR (Monthly Recurring Revenue)

The Dream

Platform for autonomous research. Not just blockchain. Any technical domain.

Imagine:

  • AI researching quantum computing papers
  • AI analyzing medical research
  • AI exploring ML architectures
  • All autonomous. All documented. All open source.

Try It Yourself

# Clone the repository
git clone https://github.com/ChronoCoders/consensusmind.git
cd consensusmind

# Build (requires Rust)
cargo build --release

# Run
./target/release/consensusmind

# Or just explore the code

Fair warning: You'll see that AI tools don't need Python. This might change your perspective on everything.


The Real Lesson

You don't need:

  • Python for AI
  • Months to build production software
  • A team to ship something real
  • Venture capital to start
  • Permission to build the future

You DO need:

  • A clear vision
  • Quality standards
  • Execution speed
  • Willingness to learn
  • Guts to ship

I built ConsensusMind in 3 weeks, solo, with zero budget.

What's your excuse?


Links and Resources

Built something cool with Rust? Drop a comment. Let's talk.


About the Author

Altug Tatlisu builds autonomous research tools at Distributed Systems Labs. He believes the future is written in Rust, not Python. Follow him on GitHub to watch the journey unfold.

P.S. If you're still reading, you're probably going to build something awesome. When you do, tag me. I want to see it.
