DEV Community

Altug Tatlisu

Why I Chose Rust Over Python for My AI Research Agent (3-Week Build)

The Problem: Reading Research Papers is Soul-Crushing

Picture this: You're deep in blockchain research. You need to understand consensus mechanisms. So you:

  1. Search arXiv for "blockchain consensus"
  2. Get 847 results
  3. Download 50 PDFs
  4. Realize you need to read all of them
  5. Cry

Then it hit me: What if an AI could do this entire workflow autonomously?

Not just "summarize papers" like ChatGPT. I mean:

  • Search academic databases
  • Download papers automatically
  • Parse PDFs and extract knowledge
  • Build a searchable knowledge base
  • Generate hypotheses for novel protocols
  • Run simulations to test ideas
  • Write research papers with proper citations

All by itself. While I sleep.

That's ConsensusMind.


The Controversial Decision: Pure Rust (No Python)

Everyone builds AI tools in Python. Everyone.

I chose Rust.

"Are you insane?"

That's what I thought too. But here's the thing:

Python is slow:

# Python: PDF parsing (extract_text here is e.g. pdfminer.high_level's)
import time

start = time.time()
text = extract_text("paper.pdf")
print(f"Took {time.time() - start:.1f}s")
# Output: Took 0.5s

Rust is fast:

// Rust: Same PDF
let start = Instant::now();
let text = parser.extract_text(&pdf_path)?;
println!("Took {:?}", start.elapsed());
// Output: Took 100ms

5x faster. And that's just parsing ONE paper.

When you're processing thousands of papers, this compounds.

"But Python has better AI libraries!"

True. But I don't need them.

  • LLM? Self-hosted vLLM via REST API (language-agnostic)
  • Vector search? SQLite with vec0 extension (pure Rust)
  • PDF parsing? pdf-extract crate (pure Rust)
  • Everything else? Rust ecosystem has it

The real kicker: Rust gives me a single binary deployment.

# Python deployment
pip install -r requirements.txt
# Result: 50 packages, version conflicts, pray it works

# Rust deployment  
./consensusmind
# Result: One file. Just works.

Week 1: Foundation (Avoiding Analysis Paralysis)

Day 1 was rough. I had two choices:

  1. Spend weeks designing the perfect architecture
  2. Ship something that works, iterate fast

I chose option 2.

The Minimum Viable Foundation

// Day 1: Just make it compile
pub struct ConsensusMind {
    config: Config,
    logger: Logger,
    llm_client: LlmClient,
}

impl ConsensusMind {
    pub fn new() -> Result<Self> {
        // Load config, setup logging, that's it
        Ok(Self { /* ... */ })
    }
}

Quality rule from day 1: Zero compiler warnings.

Not "we'll fix it later." Not "technical debt is fine for MVP."

Zero. Warnings.

This decision saved me later.
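The gate is easy to make non-optional. A minimal sketch: deny warnings at the crate root so they fail the build instead of accumulating (`answer` is just a placeholder function; in CI you can get the same effect with `cargo clippy -- -D warnings`):

```rust
#![deny(warnings)]
// Crate-level lint gate: with this attribute, every warning (unused
// variables, dead code, unreachable code, ...) is a hard compile error.

fn answer() -> i32 {
    21 * 2
}

fn main() {
    // Uncommenting the next line would now fail the build:
    // let unused = 0; // error: unused variable `unused`
    println!("warning-free build: {}", answer());
}
```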


Week 2: The arXiv Integration (When APIs Fight Back)

The Challenge

Build a client that:

  • Searches arXiv for papers
  • Downloads PDFs
  • Doesn't get rate-limited
  • Doesn't crash
  • Actually works

The Reality

// First attempt - DON'T DO THIS
pub async fn search(&self, query: &str) -> Result<Vec<Paper>> {
    let response = self.client.get(&url).send().await?;
    // ... parse XML ...
    // Works! Ship it!
}

What happened: Got rate-limited after 10 requests. arXiv blocked me.

The Fix

// What actually works
pub async fn search(&self, query: &str) -> Result<Vec<Paper>> {
    let response = self.client.get(&url).send().await?;
    let papers = self.parse_response(&response)?;

    // The magic: respect rate limits
    sleep(Duration::from_secs(3)).await;

    Ok(papers)
}

Lesson learned: Read the API docs. All of them. Twice.
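The fix above hardcodes a sleep inside `search`. The same idea factors into a tiny reusable limiter that enforces a minimum gap between calls. A std-only, blocking sketch (the real client would use tokio's async sleep; `RateLimiter` is a name I'm making up here):

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Guarantees at least `min_interval` between consecutive calls.
struct RateLimiter {
    min_interval: Duration,
    last_call: Option<Instant>,
}

impl RateLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_call: None }
    }

    /// Block until enough time has passed since the previous call,
    /// then record the new call time.
    fn wait(&mut self) {
        if let Some(last) = self.last_call {
            let elapsed = last.elapsed();
            if elapsed < self.min_interval {
                sleep(self.min_interval - elapsed);
            }
        }
        self.last_call = Some(Instant::now());
    }
}

fn main() {
    let mut limiter = RateLimiter::new(Duration::from_millis(50));
    let start = Instant::now();
    for _ in 0..3 {
        limiter.wait();
        // API request would go here
    }
    // First call is free; the next two wait 50ms each.
    println!("elapsed >= 100ms: {}", start.elapsed() >= Duration::from_millis(100));
}
```

One limiter shared per API host keeps every call path honest, not just the one you remembered to add a sleep to.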

The Payoff

First successful test:

$ cargo test test_arxiv_search -- --ignored --nocapture

Downloaded: "Byzantine Fault Tolerance in Practice"
Downloaded: "Consensus in the Age of Blockchains"  
Downloaded: "Practical Byzantine Fault Tolerance Revisited"

✓ 3 papers in 12 seconds

That moment when it works: Chef's kiss.


Week 2.5: PDF Parsing Hell

The Problem

Some PDFs are... weird.

  • Scanned images (no text)
  • Encrypted
  • Malformed
  • In Comic Sans (okay, not really, but felt like it)

My First Naive Attempt

let text = extract_text(&pdf_path)?;
// Assumes it just works

Spoiler: It didn't.

What Actually Worked

pub fn extract_text(&self, pdf_path: &Path) -> Result<String> {
    // Check file exists (obvious but important)
    if !pdf_path.exists() {
        return Err(ParserError::FileNotFound);
    }

    // Try extraction
    let text = pdf_extract::extract_text(pdf_path)
        .map_err(|e| ParserError::ExtractionFailed(e.to_string()))?;

    // Sanity check
    if text.trim().is_empty() {
        warn!("Empty PDF or scanned document: {}", pdf_path.display());
        return Err(ParserError::EmptyDocument);
    }

    // More sanity
    let word_count = text.split_whitespace().count();
    if word_count < 100 {
        warn!("Suspiciously short: {} words", word_count);
    }

    Ok(text)
}

Real test result:

  • Paper: 20 pages, dense academic writing
  • Extracted: 12,973 words, 83,063 characters
  • Time: 100ms
  • Accuracy: Near-perfect

Victory.


Week 3: Vector Search (Making Papers Searchable)

The Vision

"Show me papers about Byzantine fault tolerance that mention network partitions"

Not keyword search. Semantic search.

The Implementation

// Store papers as vectors
pub struct VectorStore {
    db: Connection,
    embeddings: HashMap<String, Vec<f32>>,
}

impl VectorStore {
    pub fn search(&self, query: &str, top_k: usize) -> Result<Vec<Paper>> {
        // Convert query to vector
        let query_vec = self.embed(query)?;

        // Find similar vectors using cosine similarity
        let results = self.db.query(
            "SELECT * FROM papers 
             ORDER BY vec_distance_cosine(embedding, ?) 
             LIMIT ?",
            params![query_vec, top_k],
        )?;

        Ok(results)
    }
}
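Under the hood, `vec_distance_cosine` ranks papers by cosine distance. What that actually measures is easy to see in plain Rust; a self-contained sketch (not the vec0 implementation):

```rust
/// Cosine similarity between two embedding vectors.
/// Returns a value in [-1, 1]; 1.0 means same direction, 0.0 means orthogonal.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have equal dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // guard against zero vectors
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0, 0.0, 0.0];
    let b = [1.0, 0.0, 0.0];
    let c = [0.0, 1.0, 0.0];
    println!("same direction: {:.1}", cosine_similarity(&a, &b));
    println!("orthogonal: {:.1}", cosine_similarity(&a, &c));
}
```

Because the embedding model maps "network partition" and "partition-tolerant" to nearby directions, papers match on meaning even when they share no keywords with the query.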

The "Holy Shit It Works" Moment

// Query: "consensus under network partition"
let results = store.search("consensus under network partition", 5)?;

// Results (paraphrased):
// 1. "Partition-tolerant consensus protocols"
// 2. "Byzantine agreement with network delays"  
// 3. "Consensus in asynchronous systems"
// 4. "Network partition recovery in distributed systems"
// 5. "Fault tolerance under partial connectivity"

Most of those results never use the phrase "network partition" in the title. They match on meaning, not keywords.

Semantic search works. Mind = blown.


The Brutal Truth: What Almost Killed The Project

Problem 1: Scope Creep

Week 2, 3am: "What if it also analyzed tweets and Reddit posts and..."

Solution: Slapped myself. Stuck to the plan.

Problem 2: Perfect Code Paralysis

Week 2, day 4: Spent 6 hours debating enum vs struct for error types.

Solution: Picked one. Shipped it. Moved on.
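For the record, the `ParserError::FileNotFound` and `EmptyDocument` values in the parser code earlier suggest the enum won. A std-only sketch of that shape (Display messages are illustrative; a crate like thiserror would shorten this considerably):

```rust
use std::fmt;

// Each failure mode is a variant; only the ones that need context
// (like the underlying extraction error) carry data.
#[derive(Debug)]
enum ParserError {
    FileNotFound,
    ExtractionFailed(String),
    EmptyDocument,
}

impl fmt::Display for ParserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParserError::FileNotFound => write!(f, "file not found"),
            ParserError::ExtractionFailed(e) => write!(f, "extraction failed: {}", e),
            ParserError::EmptyDocument => write!(f, "empty or scanned document"),
        }
    }
}

impl std::error::Error for ParserError {}

fn main() {
    let err = ParserError::ExtractionFailed("bad xref table".into());
    println!("{}", err);
}
```

The enum wins here because callers can match exhaustively on failure modes, which a struct with an error-kind string can't guarantee.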

Problem 3: "Should I Use Python?"

Week 2, day 6: Seriously considered rewriting in Python because "that's what everyone uses."

Solution: Looked at my Rust code. Zero warnings. Fast as hell. Single binary.

Stayed with Rust.


The Numbers (Because People Love Numbers)

Development Timeline

  • Week 1: Foundation + arXiv integration
  • Week 2: PDF parsing + metadata tracking
  • Week 3: Vector search + agent core
  • Total: 3 weeks, ~150 hours

Code Quality

$ cargo clippy
# Result: 0 warnings

$ cargo test  
# Result: 15 tests, 15 passed, 0 failed

$ cargo build --release
# Result: Binary size: 20MB (single file!)

Performance Benchmarks

Task            Rust     Python   Speedup
PDF parsing     100ms    500ms    5x faster
Vector search   10ms     100ms    10x faster
Full pipeline   2s       10s+     5x faster
Memory usage    50MB     500MB    10x less

Business Metrics (Projected)

  • Development cost: $0 (solo project)
  • Hosting cost: ~$280/month (RunPod GPU)
  • Year 1 revenue target: $140,000
  • Break-even: 2-6 paying users

What I Learned

1. Rust Is Ready for AI

Myth: "You need Python for AI."

Reality: You need good libraries and APIs. Language doesn't matter.

Rust has:

  • Excellent HTTP clients (reqwest)
  • PDF processing (pdf-extract)
  • Vector databases (SQLite + vec0)
  • Async runtime (tokio)
  • Everything you need

2. Quality Compounds

Day 1 decision: Zero warnings allowed.

Week 3 result: Zero refactoring needed. Code just worked.

The math:

  • Fix warnings daily: 10 min/day Γ— 21 days = 210 minutes
  • Fix warnings at the end: 3-5 days of hell

Front-load the pain. Thank yourself later.

3. Ship Fast, But Ship Quality

Fast ≠ Sloppy

Fast = Efficient

I shipped in 3 weeks by:

  • Making quick decisions (not perfect ones)
  • Writing tests immediately (not "later")
  • Maintaining quality gates (zero warnings)
  • Iterating rapidly (ship, measure, improve)

4. Self-Hosted LLMs Work

RunPod + vLLM:

  • $0.39/hour for A40 GPU
  • DeepSeek-R1 quality responses
  • Full control over prompts
  • No OpenAI API bills

Total LLM cost during development: $47

vs. OpenAI API for same workload: $300+

5. Open Source Builds Credibility

Released on GitHub from day 1:

  • Forces clean code (people will see it)
  • Builds portfolio
  • Attracts contributors
  • Creates trust with users

Side benefit: Looks great on LinkedIn.


The Tech Stack (For the Curious)

Core Technologies:

  • Language: Rust 2021
  • Async Runtime: Tokio
  • HTTP Client: Reqwest + Rustls (HTTPS only)
  • Database: SQLite
  • Vector Database: vec0 extension
  • PDF Processing: pdf-extract
  • XML Parsing: quick-xml
  • LLM: Self-hosted vLLM (RunPod)
  • Deployment: Single binary
  • CI/CD: GitHub Actions
  • Hosting: RunPod (GPU) + GitHub Pages (landing)

Notable absences: Python, Docker (for core app), Kubernetes, microservices

Why? They weren't needed. Simplicity wins.


What's Next?

Short Term (This Month)

  • [x] GitHub release v1.0.0
  • [ ] Landing page launch
  • [ ] Waitlist setup
  • [ ] First 100 signups

Q1 2026

  • [ ] Authentication (GitHub, Google, Email)
  • [ ] SaaS infrastructure
  • [ ] Beta launch
  • [ ] First paying customers

Q2 2026

  • [ ] Public launch
  • [ ] Enterprise tier
  • [ ] Academic paper publication
  • [ ] $10k MRR (Monthly Recurring Revenue)

The Dream

Platform for autonomous research. Not just blockchain. Any technical domain.

Imagine:

  • AI researching quantum computing papers
  • AI analyzing medical research
  • AI exploring ML architectures
  • All autonomous. All documented. All open source.

Try It Yourself

# Clone the repository
git clone https://github.com/ChronoCoders/consensusmind.git
cd consensusmind

# Build (requires Rust)
cargo build --release

# Run
./target/release/consensusmind

# Or just explore the code

Fair warning: You'll see that AI tools don't need Python. This might change your perspective on everything.


The Real Lesson

You don't need:

  • Python for AI
  • Months to build production software
  • A team to ship something real
  • Venture capital to start
  • Permission to build the future

You DO need:

  • A clear vision
  • Quality standards
  • Execution speed
  • Willingness to learn
  • Guts to ship

I built ConsensusMind in 3 weeks, solo, with zero budget.

What's your excuse?


Links and Resources

Built something cool with Rust? Drop a comment. Let's talk.


About the Author

Altug Tatlisu builds autonomous research tools at Distributed Systems Labs. He believes the future is written in Rust, not Python. Follow him on GitHub to watch the journey unfold.

P.S. If you're still reading, you're probably going to build something awesome. When you do, tag me. I want to see it.
