<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Keshav Ashiya</title>
    <description>The latest articles on DEV Community by Keshav Ashiya (@keshavashiya).</description>
    <link>https://dev.to/keshavashiya</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F373914%2Ff021d1e7-5724-4533-8f1d-bc2da74bea1f.png</url>
      <title>DEV Community: Keshav Ashiya</title>
      <link>https://dev.to/keshavashiya</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/keshavashiya"/>
    <language>en</language>
    <item>
      <title>CodeSage: When grep Just Isn't Enough Anymore</title>
      <dc:creator>Keshav Ashiya</dc:creator>
      <pubDate>Thu, 05 Feb 2026 18:30:48 +0000</pubDate>
      <link>https://dev.to/keshavashiya/codesage-when-grep-just-isnt-enough-anymore-1h2d</link>
      <guid>https://dev.to/keshavashiya/codesage-when-grep-just-isnt-enough-anymore-1h2d</guid>
      <description>&lt;p&gt;CodeSage is a local-first code intelligence CLI. You index your codebase once, and then you can search it using natural language. No cloud. No API keys. Everything runs on your machine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codesage init
codesage index
codesage chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But semantic search is just the beginning.&lt;/p&gt;




&lt;h2&gt;The Three Pillars of Code Understanding&lt;/h2&gt;

&lt;p&gt;Most code search tools do one thing: match text. CodeSage takes a fundamentally different approach by combining three complementary techniques:&lt;/p&gt;

&lt;h3&gt;1. Vector Search: Find Code by Meaning&lt;/h3&gt;

&lt;p&gt;"Authentication middleware" and "JWT token validator" mean similar things, even though they share no words. Vector embeddings capture this semantic relationship, letting you find code by &lt;em&gt;what it does&lt;/em&gt;, not just what it's called.&lt;/p&gt;

&lt;h3&gt;2. Graph Traversal: Understand Relationships&lt;/h3&gt;

&lt;p&gt;Code doesn't exist in isolation. Functions call other functions. Classes inherit from base classes. Modules import dependencies. CodeSage builds a knowledge graph of these relationships using KuzuDB, so you can ask questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What functions call this method?"&lt;/li&gt;
&lt;li&gt;"What would break if I changed this class?"&lt;/li&gt;
&lt;li&gt;"How does data flow from the API endpoint to the database?"&lt;/li&gt;
&lt;/ul&gt;
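
&lt;p&gt;The idea behind "what calls this method?" is a reverse traversal of the call graph. CodeSage delegates this to KuzuDB; the sketch below runs the same traversal over a toy in-memory graph to show the shape of the answer:&lt;/p&gt;

```python
from collections import deque

# Toy call graph: edges point from caller to callee. CodeSage stores
# these relationships in KuzuDB; this sketch only illustrates the idea.
calls = {
    "api_handler": ["validate_token", "load_user"],
    "cli_main": ["load_user"],
    "load_user": ["db_query"],
}

def callers_of(target):
    """All functions that reach `target`, directly or transitively."""
    reverse = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for caller in reverse.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

# Changing db_query would affect everything upstream of it:
assert callers_of("db_query") == {"load_user", "api_handler", "cli_main"}
```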

&lt;h3&gt;3. Developer Memory: Learn Your Patterns&lt;/h3&gt;

&lt;p&gt;Here's where it gets interesting. CodeSage doesn't just index &lt;em&gt;what&lt;/em&gt; your code does—it learns &lt;em&gt;how&lt;/em&gt; you write it.&lt;/p&gt;

&lt;p&gt;It tracks your naming conventions, your preferred patterns, your common approaches to recurring problems. This memory persists globally across all your projects, so insights from one codebase help with others.&lt;/p&gt;

&lt;p&gt;The result? Suggestions that feel idiomatic to &lt;em&gt;your&lt;/em&gt; style, not generic best practices from Stack Overflow.&lt;/p&gt;




&lt;h2&gt;Interactive Chat with Specialized Modes&lt;/h2&gt;

&lt;p&gt;Not all work is the same, and neither is how you interact with your code.&lt;/p&gt;

&lt;p&gt;CodeSage offers three chat modes, each optimized for a different workflow:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;What It's For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Brainstorm&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exploring ideas, asking open-ended questions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Implement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Focused task execution, generating code and plans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Review&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code review, security analysis, quality checks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Switch anytime with &lt;code&gt;/mode implement&lt;/code&gt; or let CodeSage detect your intent from context.&lt;/p&gt;

&lt;p&gt;The chat commands go deep:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/search &amp;lt;query&amp;gt;    Semantic code search
/deep &amp;lt;query&amp;gt;      Multi-agent analysis (runs parallel search strategies)
/plan &amp;lt;task&amp;gt;       Implementation plan using your existing patterns
/review [file]     Code review with security awareness
/impact &amp;lt;element&amp;gt;  Blast radius analysis—what breaks if this changes?
/similar &amp;lt;code&amp;gt;    Find similar patterns across your codebase
/patterns          Show what CodeSage has learned about your style
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;Smart Query Expansion&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Intent Detection&lt;/strong&gt;: It understands if you're explaining, debugging, implementing, or reviewing—and adjusts its response accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synonym Expansion&lt;/strong&gt;: Search for "auth" and it automatically includes authentication, authorization, JWT, OAuth, and related terms.&lt;/p&gt;
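
&lt;p&gt;A minimal sketch of how synonym expansion can work; the table and logic here are illustrative, not CodeSage's actual implementation:&lt;/p&gt;

```python
# Hypothetical synonym table keyed by common abbreviations.
SYNONYMS = {
    "auth": ["authentication", "authorization", "jwt", "oauth"],
    "db": ["database", "sql", "query"],
}

def expand_query(query):
    """Expand each query term with known synonyms before searching."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms

# A search for "auth" also matches code mentioning JWT or OAuth.
assert "jwt" in expand_query("auth middleware")
```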

&lt;p&gt;&lt;strong&gt;Context Memory&lt;/strong&gt;: It remembers what you discussed earlier in the session. "What about the other handler?" just works.&lt;/p&gt;




&lt;h2&gt;Works with Your AI IDE&lt;/h2&gt;

&lt;p&gt;CodeSage isn't just a standalone tool. It integrates with Claude Desktop, Cursor, and Windsurf via the Model Context Protocol (MCP).&lt;/p&gt;

&lt;p&gt;To start the MCP server manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codesage mcp serve &lt;span class="nt"&gt;--global&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or add this to your MCP client configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"codesage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codesage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--global"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your AI assistant has access to intelligent code search across all your indexed projects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MCP Tool&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Semantic search with graph context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_file_context&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;File content with its dependencies and relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_task_context&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Implementation guidance grounded in your patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;review_code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code review using learned conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;analyze_security&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Security scan with project-specific context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your AI assistant stops giving generic answers and starts giving &lt;em&gt;grounded&lt;/em&gt; answers based on your actual codebase.&lt;/p&gt;




&lt;h2&gt;The Technical Stack&lt;/h2&gt;

&lt;p&gt;Everything runs locally. No exceptions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; handles embeddings and LLM responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LanceDB&lt;/strong&gt; provides vector storage with fast similarity search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KuzuDB&lt;/strong&gt; powers the code relationship graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite&lt;/strong&gt; stores metadata and developer memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need Ollama running with a few models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5-coder:7b
ollama pull qwen3-embedding
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No cloud accounts, no API keys, no data leaving your machine.&lt;/p&gt;




&lt;h2&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;Installation takes one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;pycodesage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then index your first project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
codesage init
codesage index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start searching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codesage chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Language Support&lt;/h3&gt;

&lt;p&gt;Python works out of the box. For JavaScript, TypeScript, Go, and Rust:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx inject pycodesage &lt;span class="s2"&gt;"pycodesage[multi-language]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;We're actively developing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More language parsers&lt;/strong&gt; — expanding beyond the current set&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-project learning&lt;/strong&gt; — enhanced pattern sharing between codebases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation generation&lt;/strong&gt; — auto-generate docs that match your style&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Try It&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;pycodesage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/keshavashiya/codesage" rel="noopener noreferrer"&gt;github.com/keshavashiya/codesage&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contributions, feedback, and feature requests are all welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CodeSage: Stop searching. Start asking.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>langchain</category>
      <category>cli</category>
      <category>ai</category>
    </item>
    <item>
      <title>Docify v2: Moving Beyond Standard RAG to Multi-Document Agents</title>
      <dc:creator>Keshav Ashiya</dc:creator>
      <pubDate>Thu, 01 Jan 2026 17:08:29 +0000</pubDate>
      <link>https://dev.to/keshavashiya/docify-v2-moving-beyond-standard-rag-to-multi-document-agents-37ph</link>
      <guid>https://dev.to/keshavashiya/docify-v2-moving-beyond-standard-rag-to-multi-document-agents-37ph</guid>
      <description>&lt;p&gt;When I first launched Docify, the goal was to build a reliable way to chat with your documents locally. It worked well for a handful of PDFs, but as the library grew, the limitations of a "one-size-fits-all" search became clear. A single giant index works for simple questions, but it struggles when you want to compare two research papers or find deep insights hidden across hundreds of files.&lt;/p&gt;




&lt;h3&gt;What’s New: The "Agentic" Shift&lt;/h3&gt;

&lt;p&gt;In the first version, Docify would look through every single chunk of text in your workspace at once. This was slow and often noisy. &lt;br&gt;
In v2, every document you upload is essentially treated as its own "mini-expert" or &lt;strong&gt;Document Agent&lt;/strong&gt;. When you ask a question, the system doesn't just dive into the data—it actually &lt;strong&gt;plans&lt;/strong&gt; its approach.&lt;/p&gt;

&lt;h4&gt;Smarter Query Planning&lt;/h4&gt;

&lt;p&gt;Instead of just searching, Docify now "thinks" first. If you ask for a comparison between two documents, it recognizes that intent and orchestrates a search across those specific entities. If you ask a general question, it sweeps the workspace to find the most relevant "experts" to consult.&lt;/p&gt;

&lt;h4&gt;Parallel Document Retrieval&lt;/h4&gt;

&lt;p&gt;Once the system knows which documents are relevant, it searches them in parallel. By treating documents as independent agents, we can fetch information from multiple sources simultaneously. This makes the system feel much snappier, even as your workspace grows.&lt;/p&gt;
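
&lt;p&gt;In Python this fan-out maps naturally onto &lt;code&gt;asyncio.gather&lt;/code&gt;. A sketch, with &lt;code&gt;search_document&lt;/code&gt; as a stand-in for a real per-document retriever:&lt;/p&gt;

```python
import asyncio

# Each document agent answers the query independently, so the
# retrievals can run concurrently instead of scanning one giant
# index serially.
async def search_document(doc_name, query):
    await asyncio.sleep(0.01)  # simulate I/O-bound retrieval
    return (doc_name, f"top chunks for {query!r}")

async def parallel_retrieve(doc_names, query):
    tasks = [search_document(name, query) for name in doc_names]
    return await asyncio.gather(*tasks)

results = asyncio.run(
    parallel_retrieve(["paper_a.pdf", "paper_b.pdf"], "method comparison")
)
assert len(results) == 2
```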




&lt;h3&gt;Better Retrieval, Better Answers&lt;/h3&gt;

&lt;p&gt;Finding the right information is only half the battle; ensuring the AI uses it correctly is the other half.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Search (SQL-Native)&lt;/strong&gt;: We’ve combined the best of both worlds—semantic "meaning-based" search and traditional keyword search. This is now handled directly inside the database, making it incredibly fast and much better at finding exact names or technical terms that semantic search sometimes misses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict Grounding&lt;/strong&gt;: One of the biggest issues with AI is "hallucination." In v2, we’ve implemented stricter rules for how the AI cites its sources. If the information isn’t in your documents, the system will tell you, rather than making up a plausible-sounding answer.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;Hardware-Aware Performance&lt;/h3&gt;

&lt;p&gt;Since Docify is a local-first application, everyone’s computer is different. v2 includes &lt;strong&gt;Hardware Detection&lt;/strong&gt; that tunes the system to your specific machine.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you have a &lt;strong&gt;GPU&lt;/strong&gt; or Apple Silicon (Metal), it automatically enables more powerful models and larger context windows for deeper reading.&lt;/li&gt;
&lt;li&gt;If you’re on a &lt;strong&gt;standard CPU&lt;/strong&gt;, it intelligently scales down to more efficient models and optimized thread counts so your computer doesn't lock up while searching.&lt;/li&gt;
&lt;/ul&gt;
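
&lt;p&gt;A sketch of what such hardware-aware selection can look like; the model names, thresholds, and profile fields here are placeholders, not Docify's actual values:&lt;/p&gt;

```python
# Hypothetical profile picker: bigger models and context windows when
# acceleration is available, conservative settings on plain CPUs.
def pick_profile(has_gpu, vram_gb, cpu_threads):
    if has_gpu and vram_gb >= 8:
        return {"model": "larger-model", "context_window": 8192}
    if has_gpu:
        return {"model": "quantized-model", "context_window": 4096}
    # Leave a couple of threads free so the machine stays responsive.
    threads = max(1, cpu_threads - 2)
    return {"model": "small-model", "context_window": 2048, "threads": threads}

assert pick_profile(True, 16, 8)["context_window"] == 8192
assert pick_profile(False, 0, 8)["threads"] == 6
```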




&lt;h3&gt;Looking Ahead&lt;/h3&gt;

&lt;p&gt;The move to a multi-document agent architecture isn't just a performance boost—it changes how you interact with your knowledge. Instead of searching a database, you're essentially orchestrating a team of experts that live in your documents.&lt;br&gt;
I'm continuing to refine the system to make it even more intuitive. If you’re interested in building local-first RAG or want to see the code behind the agents, check out the repository. &lt;a href="https://github.com/keshavashiya/docify" rel="noopener noreferrer"&gt;Docify GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>architecture</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Docify: Building a Production RAG System for Knowledge Management</title>
      <dc:creator>Keshav Ashiya</dc:creator>
      <pubDate>Sun, 14 Dec 2025 17:33:26 +0000</pubDate>
      <link>https://dev.to/keshavashiya/docify-building-a-production-rag-system-for-knowledge-management-8b9</link>
      <guid>https://dev.to/keshavashiya/docify-building-a-production-rag-system-for-knowledge-management-8b9</guid>
      <description>&lt;p&gt;Knowledge workers drown in information. We collect documents at scale—research papers, PDFs, articles, code—but can't retrieve or synthesize what we've gathered. Most solutions force a choice: keep data local and lose AI, or move to cloud and lose privacy. Docify dissolves this false binary through 11 specialized services orchestrated into a complete RAG pipeline.&lt;/p&gt;

&lt;h2&gt;Architecture: 11 Services, One Pipeline&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Input Layer: Parsing &amp;amp; Chunking -&lt;/strong&gt;&lt;br&gt;
Resource Ingestion handles heterogeneous formats (PDF, DOCX, XLSX, Markdown, TXT). Deduplication Service computes SHA-256 hashes on raw content—preventing re-processing when the same research paper arrives from three sources. Chunking Service uses tiktoken for accurate token counting (512 tokens, 50-token overlap) while respecting paragraph boundaries and preserving section hierarchies.&lt;/p&gt;
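
&lt;p&gt;The chunking scheme is a sliding token window. The sketch below splits on whitespace instead of calling tiktoken so it stays self-contained, but the window arithmetic (512-token chunks, 50-token overlap) is the same:&lt;/p&gt;

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Slide a `size`-token window forward by (size - overlap) each step."""
    chunks, step = [], size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
assert len(chunks[0]) == 512
# Consecutive chunks share the 50-token overlap:
assert chunks[0][-50:] == chunks[1][:50]
```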

&lt;p&gt;&lt;strong&gt;Embedding Layer: Async Vector Generation -&lt;/strong&gt;&lt;br&gt;
Generating embeddings inline would block API responses. Docify uses Celery + Redis to decouple the two: uploads return immediately, and workers process embeddings asynchronously. The Embeddings Service uses all-minilm:22m (384-dim, 22MB)—aggressively lightweight compared to 768-dim models, but sentence-transformers research shows minimal quality loss. Storage in PostgreSQL pgvector with HNSW indexing enables &amp;lt;200ms vector search across 10K documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search Layer: Hybrid Retrieval -&lt;/strong&gt;&lt;br&gt;
Semantic search alone fails on exact phrases; keyword search alone fails on synonyms. Hybrid Search combines pgvector cosine distance with BM25 ranking via reciprocal rank fusion—a technique that elegantly merges different ranking philosophies. A chunk ranked #2 by vectors and #5 by keywords scores higher than one ranked #1 by vectors and #100 by keywords.&lt;/p&gt;
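
&lt;p&gt;Reciprocal rank fusion is short enough to show in full: each ranker contributes 1/(k + rank) per result, and the summed scores determine the fused order. The constant k=60 comes from the original RRF paper; Docify's exact parameters may differ:&lt;/p&gt;

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists by reciprocal rank: 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_ranking = ["chunk_a", "chunk_b", "chunk_c"]
keyword_ranking = ["chunk_b", "chunk_c", "chunk_a"]
fused = rrf([vector_ranking, keyword_ranking])
# chunk_b, ranked well by BOTH lists, beats chunk_a (1st and last):
assert fused[0] == "chunk_b"
```

&lt;p&gt;Because RRF only looks at ranks, it needs no score normalization between the semantic and BM25 rankers, which is what makes the fusion robust to their different scoring scales.&lt;/p&gt;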

&lt;p&gt;&lt;strong&gt;Ranking Layer: Multi-Factor Scoring -&lt;/strong&gt;&lt;br&gt;
Re-Ranking Service refines results using five factors: base relevance (40%), citation frequency (15%), recency (15%), specificity (15%), and source quality (15%). This produces the 5-10 final chunks sent to the LLM. Notably, it flags conflicting sources: if multiple documents contradict each other, the service signals this upstream.&lt;/p&gt;
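
&lt;p&gt;The five-factor combination above reduces to a dot product of factor scores and weights. A sketch, assuming each factor is pre-normalized to [0, 1]:&lt;/p&gt;

```python
# The weights described above: relevance dominates, the rest split evenly.
WEIGHTS = {
    "relevance": 0.40,
    "citations": 0.15,
    "recency": 0.15,
    "specificity": 0.15,
    "source_quality": 0.15,
}

def rerank_score(factors):
    """Weighted sum of factor scores, each pre-normalized to [0, 1]."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

chunk = {"relevance": 0.9, "citations": 0.5, "recency": 0.2,
         "specificity": 0.7, "source_quality": 1.0}
assert round(rerank_score(chunk), 2) == 0.72
```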

&lt;p&gt;&lt;strong&gt;Context Layer: Token Budget Management -&lt;/strong&gt;&lt;br&gt;
LLMs have finite context windows. Context Assembly respects token budgets: 2000-token default split 60% primary sources (top-ranked chunks), 30% supporting context, 10% metadata. Truncates intelligently at sentence boundaries (never mid-sentence gibberish). Most questions need 2-3 high-quality sources, not 100.&lt;/p&gt;
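
&lt;p&gt;A sketch of the budget split and the sentence-boundary truncation, with token counts approximated by whitespace words so the example stays self-contained:&lt;/p&gt;

```python
def budget_split(total=2000):
    """The 60/30/10 split described above."""
    return {
        "primary": round(total * 0.60),
        "supporting": round(total * 0.30),
        "metadata": round(total * 0.10),
    }

def truncate_at_sentence(text, max_tokens):
    """Drop trailing sentences until the text fits the budget,
    never cutting mid-sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > max_tokens:
            break
        kept.append(sentence)
        used += n
    return ". ".join(kept) + ("." if kept else "")

assert budget_split() == {"primary": 1200, "supporting": 600, "metadata": 200}
text = "First sentence here. Second one follows. Third is dropped."
assert truncate_at_sentence(text, 7) == "First sentence here. Second one follows."
```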

&lt;p&gt;&lt;strong&gt;Prompt Layer: Anti-Hallucination Engineering -&lt;/strong&gt;&lt;br&gt;
Prompts with strict rules: "ONLY use provided context. ALWAYS cite sources [Source 1]. If unknown, say not available. When sources conflict, present both sides." Source markers in context enable citation verification—making it tractable to validate claims post-generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM Service: Provider Flexibility -&lt;/strong&gt;&lt;br&gt;
Provider-agnostic architecture. Ollama local Mistral 7B (4-bit quantized) is default, with OpenAI/Anthropic support. Hardware auto-detection adjusts: GPU available? Accelerate. CPU-only? Extend timeouts. Low VRAM? Switch models. Streaming enabled by default for responsive UX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification Layer: Citation Grounding -&lt;/strong&gt;&lt;br&gt;
LLMs fabricate sources. Citation Verification runs post-generation: extracts &lt;code&gt;[Source N]&lt;/code&gt; references, searches for cited claims in source chunks, flags mismatches. Catches egregious errors like citing sources containing no relevant information. Not foolproof, but reduces hallucination significantly.&lt;/p&gt;
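
&lt;p&gt;A simplified version of the check: extract the &lt;code&gt;[Source N]&lt;/code&gt; markers, then flag citations whose claim shares no content words with the cited chunk. Docify's real verification is more involved, but the shape is the same:&lt;/p&gt;

```python
import re

def verify_citations(answer, sources):
    """Flag [Source N] citations whose sentence shares no content
    words with the cited source chunk."""
    flagged = []
    for sentence in answer.split(". "):
        for num in re.findall(r"\[Source (\d+)\]", sentence):
            chunk = sources.get(int(num), "")
            claim_words = set(re.findall(r"[a-z]{4,}", sentence.lower()))
            chunk_words = set(re.findall(r"[a-z]{4,}", chunk.lower()))
            if not claim_words.intersection(chunk_words):
                flagged.append(num)
    return flagged

sources = {1: "HNSW indexing enables fast vector search."}
# Grounded citation passes; a citation to a nonexistent source is flagged.
assert verify_citations("HNSW enables fast search [Source 1].", sources) == []
assert verify_citations("Cats are mammals [Source 2].", sources) == ["2"]
```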

&lt;p&gt;&lt;strong&gt;Orchestration: Message Generation Pipeline -&lt;/strong&gt;&lt;br&gt;
Message Generation Service coordinates all services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Query → Query Expansion (3-5 variants) → Hybrid Search (20-30 candidates)
→ Re-Ranking (5-10 selected) → Context Assembly (token budgeting)
→ Prompt Engineering → LLM Call → Citation Verification → Response with metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returns structured data: message content, source UUIDs, citations, verification results, pipeline latencies.&lt;/p&gt;

&lt;h2&gt;Database Design&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Chunks&lt;/strong&gt; table optimized for vector retrieval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;resource_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;Vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;chunk_metadata&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_chunks_embedding&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HNSW indexing enables approximate nearest-neighbor search in logarithmic time. For semantic search with millions of vectors, this speedup is essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt; table tracks documents with &lt;code&gt;content_hash VARCHAR(64) UNIQUE&lt;/code&gt; (SHA-256) and &lt;code&gt;is_duplicate_of&lt;/code&gt; foreign key for deduplication.&lt;br&gt;
&lt;strong&gt;Conversations &amp;amp; Messages&lt;/strong&gt; maintain chat history with source tracking, citations as JSONB, model metadata.&lt;br&gt;
&lt;strong&gt;Workspaces&lt;/strong&gt; enable personal/team/hybrid collaboration with data isolation via &lt;code&gt;workspace_id&lt;/code&gt; in all queries.&lt;/p&gt;

&lt;h2&gt;API &amp;amp; Infrastructure&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;REST Endpoints&lt;/strong&gt; (full documentation at &lt;code&gt;/docs&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /api/resources/upload&lt;/code&gt; - Upload documents&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/resources/{id}/embedding-status&lt;/code&gt; - Poll async embedding progress&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /api/conversations/{id}/messages&lt;/code&gt; - Triggers RAG pipeline&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/conversations/{id}/export&lt;/code&gt; - Export as JSON/Markdown&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Docker Stack&lt;/strong&gt; (7 services, &lt;code&gt;docker-compose up&lt;/code&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;PostgreSQL (pgvector pre-loaded)&lt;/li&gt;
&lt;li&gt;Redis (cache + Celery broker)&lt;/li&gt;
&lt;li&gt;Ollama (local LLM)&lt;/li&gt;
&lt;li&gt;FastAPI backend&lt;/li&gt;
&lt;li&gt;Celery worker (async embeddings)&lt;/li&gt;
&lt;li&gt;Celery Beat (optional scheduled tasks)&lt;/li&gt;
&lt;li&gt;Vite frontend&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Health checks ensure each dependency is ready before dependent services start. Models persist in volumes (~2GB total).&lt;/p&gt;

&lt;h2&gt;Frontend&lt;/h2&gt;

&lt;p&gt;React 18 + TypeScript. React Query manages server state (caching, invalidation). Zustand for UI state. API client wrappers shield UI from streaming/polling complexity. Tailwind CSS for styling.&lt;/p&gt;

&lt;h2&gt;Performance &amp;amp; Design Patterns&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key Patterns&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Async-First&lt;/strong&gt;: Embeddings/LLM happen async via Celery; API returns immediately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Dedup&lt;/strong&gt;: SHA-256 hashing prevents re-processing identical documents regardless of source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Search&lt;/strong&gt;: Reciprocal rank fusion merges semantic + BM25 for robustness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token-Aware Assembly&lt;/strong&gt;: Respects context windows, prioritizes by relevance, truncates intelligently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Factor Ranking&lt;/strong&gt;: Combines recency, specificity, source quality, usage history into unified ranking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citation Verification&lt;/strong&gt;: Validates LLM claims against source chunks post-generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware Adaptation&lt;/strong&gt;: Auto-detects GPU/CPU/VRAM, adjusts timeouts and models accordingly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/keshavashiya/docify" rel="noopener noreferrer"&gt;Docify&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Keshav Ashiya</dc:creator>
      <pubDate>Sun, 13 Jul 2025 18:58:35 +0000</pubDate>
      <link>https://dev.to/keshavashiya/-3ima</link>
      <guid>https://dev.to/keshavashiya/-3ima</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/keshavashiya/automate-curate-share-building-an-open-source-reading-list-4akm" class="crayons-story__hidden-navigation-link"&gt;Automate, Curate, Share: Building an Open Source Reading List&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/keshavashiya" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F373914%2Ff021d1e7-5724-4533-8f1d-bc2da74bea1f.png" alt="keshavashiya profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/keshavashiya" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Keshav Ashiya
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Keshav Ashiya
                
              
              &lt;div id="story-author-preview-content-2684413" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/keshavashiya" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F373914%2Ff021d1e7-5724-4533-8f1d-bc2da74bea1f.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Keshav Ashiya&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/keshavashiya/automate-curate-share-building-an-open-source-reading-list-4akm" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jul 13 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/keshavashiya/automate-curate-share-building-an-open-source-reading-list-4akm" id="article-link-2684413"&gt;
          Automate, Curate, Share: Building an Open Source Reading List
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag crayons-tag--filled  " href="/t/discuss"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;discuss&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/opensource"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;opensource&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/productivity"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;productivity&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/learning"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;learning&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/keshavashiya/automate-curate-share-building-an-open-source-reading-list-4akm" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/keshavashiya/automate-curate-share-building-an-open-source-reading-list-4akm#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            3 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>opensource</category>
      <category>productivity</category>
      <category>discuss</category>
      <category>learning</category>
    </item>
    <item>
      <title>Automate, Curate, Share: Building an Open Source Reading List</title>
      <dc:creator>Keshav Ashiya</dc:creator>
      <pubDate>Sun, 13 Jul 2025 18:55:48 +0000</pubDate>
      <link>https://dev.to/keshavashiya/automate-curate-share-building-an-open-source-reading-list-4akm</link>
      <guid>https://dev.to/keshavashiya/automate-curate-share-building-an-open-source-reading-list-4akm</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In the age of information overload, we’re all voracious readers, collectors of bookmarks, and lifelong learners. But if you’re anything like me, you’ve probably faced this: you save a fantastic article on dev.to, a must-read on daily.dev, and a handful of gems elsewhere—only to lose track of them when you need them most. The result? A scattered digital trail and a sense of missed opportunity.&lt;/p&gt;

&lt;p&gt;That’s where the idea for my open source Reading List project was born: a single, reliable place to &lt;strong&gt;automate, curate, and share&lt;/strong&gt; everything I’m reading—across platforms, in real time, and with the world.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Spark: Solving a Real Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The inspiration was simple but powerful:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;I wanted a way to find my bookmarks from different platforms, all in one place, whenever I needed them.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
But I also wanted more. What if this reading list could be public—a living portfolio of my learning journey, a way to show the world what I'm reading, and maybe even inspire others?&lt;/p&gt;

&lt;p&gt;This project isn't just about personal productivity. It's about &lt;strong&gt;storytelling through reading&lt;/strong&gt;: making your learning journey visible, discoverable, and shareable.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The "Now Page" Philosophy&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This project draws inspiration from Derek Sivers' concept of the "now page"—a simple, public declaration of what you're currently focused on. Instead of a static "about me," a "now" page answers: &lt;strong&gt;What are you working on right now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My reading list applies this to learning: &lt;strong&gt;What are you reading right now?&lt;/strong&gt; It’s a living, breathing snapshot of your current intellectual journey—not what you read last year, but what you’re actively engaging with today. The beauty of this approach is its authenticity. It’s not curated for perfection; it’s real, current, and honest about where your attention is actually going.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;From Idea to Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The core challenge was integrating multiple data sources—dev.to, daily.dev, and potentially more—each with its own API and format. I wanted a solution that would fetch and update my reading list automatically, without manual intervention, and make it public for anyone to see.&lt;/p&gt;

&lt;p&gt;The answer was to use GitHub Actions as the orchestrator. On a schedule, it fetches data from all sources, normalizes it, and prepares it for publishing. The data is stored as simple JSON, which is then bundled with the site and deployed to GitHub Pages. This means my reading list is always up to date, always available, and always at the same universal path: &lt;code&gt;{username}.github.io/readinglist&lt;/code&gt;.&lt;/p&gt;
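&lt;p&gt;To make the fetch-and-normalize step concrete, here is a minimal sketch. The dev.to endpoint is the real public Forem API; the output schema, function names, and username are illustrative assumptions, not the project's actual code:&lt;/p&gt;

```python
# Hypothetical sketch of the fetch-and-normalize step.
# The dev.to endpoint is real; the shared schema below is an assumption.
import json
import urllib.request

DEVTO_API = "https://dev.to/api/articles?username=keshavashiya"

def normalize(article, source):
    """Map a source-specific article dict onto one shared schema."""
    return {
        "title": article.get("title"),
        "url": article.get("url"),
        "published_at": article.get("published_at"),
        "tags": article.get("tag_list", []),
        "source": source,
    }

def fetch_devto():
    """Fetch this user's dev.to articles and normalize each one."""
    with urllib.request.urlopen(DEVTO_API) as resp:
        return [normalize(a, "dev.to") for a in json.load(resp)]
```

&lt;p&gt;Each additional source (daily.dev, and so on) only needs its own small fetcher that funnels into the same &lt;code&gt;normalize&lt;/code&gt; shape; the combined list is then dumped to JSON for the site to consume.&lt;/p&gt;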

&lt;p&gt;By keeping everything automated and using GitHub’s infrastructure, the project is both reliable and easy for anyone to fork and adapt. No server maintenance, no manual updates—just a living record of what I’m reading, always fresh.&lt;/p&gt;
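&lt;p&gt;A GitHub Actions workflow along these lines might look like the following sketch—the cron cadence, script name, and deploy action are assumptions for illustration, not the repository's actual workflow:&lt;/p&gt;

```yaml
name: update-reading-list
on:
  schedule:
    - cron: "0 6 * * *"   # once a day; the real cadence may differ
  workflow_dispatch:        # allow manual runs too

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch and normalize sources
        run: python scripts/fetch.py   # hypothetical script name
      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./site
```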




&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;p&gt;Here’s what’s next for the project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;More Data Sources&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integrate with additional platforms like Pocket, Medium, and Twitter bookmarks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Access to Browser Bookmarks&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allow users to import or sync bookmarks directly from their browser, making the reading list even more comprehensive.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enhanced Filtering&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add advanced filtering and search, so users can quickly find articles by topic, source, or reading time.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion: A Universal Reading Path
&lt;/h2&gt;

&lt;p&gt;What started as a solution to my own bookmark chaos has become something bigger—a platform for making learning journeys visible and shareable. The beauty of this project is its universal accessibility. Just like the "now page" philosophy, your reading list will be available at a universal path: &lt;code&gt;{username}.github.io/readinglist&lt;/code&gt;. This consistent URL structure makes it easy for others to discover and follow your learning journey, creating a network of shared knowledge and inspiration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check out the live demo:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://keshavashiya.github.io/readinglist/" rel="noopener noreferrer"&gt;https://keshavashiya.github.io/readinglist/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explore the source code:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/keshavashiya/readinglist" rel="noopener noreferrer"&gt;https://github.com/keshavashiya/readinglist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this resonates with you, fork it, customize it, and share your own reading journey. Let’s make learning—and sharing—more visible, more connected, and more meaningful.&lt;/p&gt;




</description>
      <category>opensource</category>
      <category>productivity</category>
      <category>discuss</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
