The Problem
As a developer working with AI, I constantly found myself drowning in arXiv papers. The existing search was purely keyword-based, which meant I'd miss relevant papers that used different terminology. Even worse, after finding papers, I’d lose hours parsing dense academic prose just to understand core ideas.
I needed a system that could:
- Understand the semantic meaning behind my search queries
- Let me "talk" to research papers directly
- Actually understand context, not just keywords
The Solution: A Semantic Search + Chatbot System
I built the system on MindsDB's knowledge base capabilities. It lets researchers search papers semantically and then hold natural-language conversations with a single paper or a group of papers.
The Tech Stack
- MindsDB: Handles semantic indexing and the chat interface using LLMs.
- ChromaDB & PGVector: Store embeddings for fast similarity search.
- FastAPI: Powers a RESTful backend that connects the frontend to MindsDB.
- JavaScript & CSS: Power the frontend.
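To make "similarity search over embeddings" concrete, here is a minimal brute-force sketch in plain Python. It is an illustration only: the three-dimensional vectors and paper IDs are made up, and real vector stores such as ChromaDB or PGVector typically rely on indexes rather than scanning every vector like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, paper_vecs, k=3):
    """Return the k (paper_id, score) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), paper_id)
        for paper_id, vec in paper_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(paper_id, score) for score, paper_id in scored[:k]]


# Toy "embeddings" (hypothetical); real ones have hundreds of dimensions.
papers = {
    "paper-a": [0.9, 0.1, 0.0],
    "paper-b": [0.1, 0.9, 0.0],
    "paper-c": [0.8, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], papers, k=2)
```

Because the ranking depends on vector direction rather than shared words, two papers that describe the same idea with different terminology can still end up close together.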
High-Level Architecture
Paperspace Features
- Search for papers using natural language
- Summarize a paper using AI
- Generate new research directions based on a paper
- Chat with a paper (by arXiv ID, or by clicking one of the semantic search results)
Lessons Learned
What Worked Well:
- MindsDB: Calling AI services through SQL queries saved a lot of development time.
- Semantic Search: I found relevant papers I would never have discovered with keyword search
- Metadata Enrichment: Including authors and categories in search dramatically improved relevance
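One way to picture metadata enrichment is to fold authors and categories into the text that gets embedded, so they influence similarity alongside the abstract. The sketch below assumes a hypothetical paper dictionary with `title`, `authors`, `categories`, and `abstract` keys; the actual schema used in this project may differ.

```python
def build_index_text(paper):
    """Combine metadata and abstract into one string to embed.

    The field names here are assumptions for illustration only.
    """
    parts = [
        f"Title: {paper['title']}",
        f"Authors: {', '.join(paper['authors'])}",
        f"Categories: {', '.join(paper['categories'])}",
        f"Abstract: {paper['abstract']}",
    ]
    return "\n".join(parts)


# Hypothetical record, not a real paper.
example_paper = {
    "title": "An Example Paper",
    "authors": ["A. Author", "B. Author"],
    "categories": ["cs.CL", "cs.LG"],
    "abstract": "A short abstract used only for illustration.",
}
index_text = build_index_text(example_paper)
```

With the metadata in the indexed text, a query that mentions an author name or a category has something to match against.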
What Was Harder Than Expected:
- Escaping Characters: Academic papers are full of mathematical notation that breaks text processing
- Context Window Management: Long papers required careful chunking strategies
- Rate Limiting: Both the arXiv API and OpenAI had strict limits that required careful orchestration. I also burned through a fair amount of OpenAI credits while experimenting with long research papers.
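For the chunking and rate-limiting points, here is a rough sketch of two small helpers: a word-based chunker with overlap, and a throttle that enforces a minimum gap between calls. Both are illustrative assumptions rather than the code used in this project; the chunk sizes and the interval are placeholders you would tune to your model's context window and each API's published limits.

```python
import time


def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word chunks that share `overlap` words with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


class Throttle:
    """Block so that consecutive calls are at least `min_interval_s` apart."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._last_call = None

    def wait(self):
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval_s - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


# Usage: chunk a long text, pausing before each (hypothetical) API call.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
throttle = Throttle(min_interval_s=0.05)
for chunk in chunks:
    throttle.wait()
    # send `chunk` to the embedding or chat API here
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of embedding some words twice.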
Want to try it out or contribute?