As a backend developer, I've worked with Redis, PostgreSQL, MongoDB, and countless other databases. But I always felt like there was something missing – a deeper understanding of how these systems actually work under the hood. So I decided to embark on a journey to build my own distributed key-value database from scratch.
Meet LimeDB – a distributed key-value store I'm currently building with Java 21 and Spring Boot. My goal is to create a truly custom database system that starts with PostgreSQL as a foundation but evolves into something much more ambitious, all wrapped in a horizontally scalable coordinator-shard architecture.
GitHub: namanvashistha/limedb
🤔 Why Build Another Database?
You might be thinking: "Why reinvent the wheel? Redis and PostgreSQL already exist!" And you're absolutely right. But here's the thing – as backend developers, we often treat databases as black boxes. We know how to use them, but not how they work.
Building LimeDB is already teaching me more about distributed systems, consistency, partitioning, and database internals than years of just using existing solutions. It's like the difference between driving a car and understanding how the engine works.
🎯 The Learning Goals
When I started this project, I had several learning objectives:
- Understand Distributed System Patterns - How do you route requests across multiple nodes?
- Grasp Database Internals - What happens when you store and retrieve data?
- Learn About Horizontal Scaling - How do systems like Redis Cluster actually work?
- Master Modern Java - Put Java 21 features and Spring Boot to real use
- Build Something Production-Adjacent - Not just a toy, but something that could theoretically scale
🏗️ Architecture Decisions
The Coordinator-Shard Pattern
Instead of a peer-to-peer system (like Cassandra) or a single-node system (like a standalone Redis instance), I chose a coordinator-shard architecture:
Client → Coordinator → Shard 1, 2, 3...
Why this pattern?
- Simplicity: Clients only need to know about one endpoint
- Routing Logic: Centralized decision-making about where data lives
- Operational Ease: Easy to monitor and debug
- Familiar: Similar to how many real systems work (think MongoDB's mongos router)
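Here is a minimal sketch of what the coordinator's job boils down to: pick a shard for a key and forward the operation. The names `ShardClient`, `InMemoryShard`, and `Coordinator` are hypothetical and are not taken from the LimeDB codebase, and the shards are in-process maps standing in for remote services.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: the operations a coordinator needs from any shard.
interface ShardClient {
    void set(String key, String value);
    String get(String key);      // returns null when the key is absent
    void delete(String key);
}

// Stand-in for a remote shard; a real one would sit behind a network call.
class InMemoryShard implements ShardClient {
    private final Map<String, String> data = new HashMap<>();

    public void set(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
    public void delete(String key) { data.remove(key); }
    int size() { return data.size(); }
}

// The single endpoint clients talk to; it owns the routing decision.
class Coordinator {
    private final List<ShardClient> shards;

    Coordinator(List<ShardClient> shards) {
        this.shards = List.copyOf(shards);
    }

    // floorMod keeps the index non-negative for every possible hash code.
    ShardClient shardFor(String key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }

    void set(String key, String value) { shardFor(key).set(key, value); }
    String get(String key) { return shardFor(key).get(key); }
    void delete(String key) { shardFor(key).delete(key); }
}
```

Because every operation goes through `shardFor`, a read always lands on the shard that received the write, as long as the shard list does not change.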
Hash-Based Routing (For Now)
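// Simple but effective. floorMod is used instead of Math.abs because
// Math.abs(Integer.MIN_VALUE) is still negative and would give a negative index.
int shardIndex = Math.floorMod(key.hashCode(), numberOfShards);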
This is deliberately simple. I know consistent hashing is "better" for rebalancing, but I wanted to start with something I could fully understand and implement correctly. You can see this decision in the ShardRegistryService:
public String getShardByKey(String key) {
    // floorMod always returns an index in [0, shards.size())
    int index = Math.floorMod(key.hashCode(), shards.size());
    return shards.get(index);
}
Perfect? No. Educational? Absolutely.
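To make the rebalancing cost concrete, here is a small sketch that counts how many keys change shards when the cluster grows from 3 to 4 shards under modulo routing. `ModuloRouting` and its methods are illustrative names, not LimeDB code. With uniformly distributed hashes, a key keeps its shard only when its hash leaves the same remainder for 3 and for 4, which holds for 3 out of every 12 residues, so roughly three quarters of the keys would be expected to move.

```java
class ModuloRouting {
    // Same routing rule as above, with floorMod so the index is never negative.
    static int shardIndex(String key, int numberOfShards) {
        return Math.floorMod(key.hashCode(), numberOfShards);
    }

    // Counts keys key_0 .. key_(keyCount-1) whose shard differs between two cluster sizes.
    static int movedKeys(int keyCount, int oldShards, int newShards) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key_" + i;
            if (shardIndex(key, oldShards) != shardIndex(key, newShards)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int total = 10_000;
        int moved = movedKeys(total, 3, 4);
        System.out.printf("%d of %d keys moved (%.1f%%)%n", moved, total, 100.0 * moved / total);
    }
}
```

Every moved key is data that would have to be copied between shards, which is the cost a hash ring is meant to reduce.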
PostgreSQL as a Starting Point
Each shard currently uses its own PostgreSQL database (limedb_shard_1, limedb_shard_2, etc.). But here's the key - PostgreSQL is just my Phase 1 storage engine, not the final destination.
Why start with PostgreSQL?
- Quick Validation: Get the distributed architecture working first
- ACID Guarantees: Data survives restarts while I focus on routing logic
- Familiar Tooling: Easy to inspect and debug during development
- Stepping Stone: Proven foundation before building custom storage
The plan is to eventually replace PostgreSQL with custom storage engines optimized for key-value workloads. Think LSM trees, custom file formats, and memory-mapped storage - but PostgreSQL lets me focus on the distributed systems challenges first.
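One way to keep the Phase 1 engine swappable is to hide it behind a narrow interface. The sketch below is an assumption about how that could look, not how LimeDB is structured: `StorageEngine` and `InMemoryEngine` are hypothetical names, and a PostgreSQL-backed or LSM-backed class would implement the same three methods.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical contract each shard's storage layer would satisfy.
interface StorageEngine {
    void put(String key, String value);
    Optional<String> get(String key);
    boolean delete(String key);   // true if a value was actually removed
}

// Simplest possible implementation, useful for tests; nothing is persisted.
class InMemoryEngine implements StorageEngine {
    private final ConcurrentMap<String, String> data = new ConcurrentHashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public boolean delete(String key) {
        return data.remove(key) != null;
    }
}
```

With this shape, the routing code depends only on `StorageEngine`, so replacing the engine later would not touch the coordinator.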
💡 What I'm Learning Building This
1. Distributed Systems Are Hard
Even with this simple architecture, I'm already running into fascinating problems:
- What happens when a shard goes down?
- How do you handle network timeouts?
- What about data consistency across shards?
These aren't academic questions anymore – they're real problems I need to solve as I build this system.
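As a starting point for the timeout question, here is a hedged sketch of a coordinator-side wrapper that bounds how long it waits for a shard and reports what happened instead of hanging. `ShardCall`, `Status`, and `Result` are hypothetical names, and the `Supplier` stands in for whatever network call reaches the shard.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class ShardCall {
    enum Status { OK, TIMED_OUT, FAILED }

    // value is only meaningful when status is OK.
    record Result(Status status, String value) {}

    static Result callWithTimeout(Supplier<String> shardCall, Duration timeout) {
        // Runs the call on the common pool so the caller can stop waiting.
        CompletableFuture<String> future = CompletableFuture.supplyAsync(shardCall);
        try {
            String value = future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return new Result(Status.OK, value);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; it does not interrupt the running task.
            future.cancel(true);
            return new Result(Status.TIMED_OUT, null);
        } catch (ExecutionException e) {
            // The shard call itself threw (connection refused, bad response, ...).
            return new Result(Status.FAILED, null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(Status.FAILED, null);
        }
    }
}
```

A timeout is ambiguous for writes: the shard may have applied the SET even though the coordinator gave up waiting, which is one reason the consistency question above is hard.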
2. The Power of Good Abstractions
The Spring Boot framework is letting me focus on the distributed systems logic rather than HTTP parsing and dependency injection. My controllers are staying clean:
@GetMapping("/get/{key}")
public ResponseEntity<String> get(@PathVariable String key) {
    String value = routingService.get(key);
    return value != null ? ResponseEntity.ok(value) : ResponseEntity.notFound().build();
}
3. Testing Distributed Systems is Different
You can't just unit test individual methods. You need to:
- Start multiple services
- Test network failures
- Verify data consistency
- Check routing logic
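import requests

def set_values():
    # Write 1,000 keys through the coordinator and stop on the first HTTP error
    for i in range(1_000):
        payload = {"key": f"key_{i}", "value": f"value_{i}"}
        response = requests.post("http://localhost:8080/api/v1/set", json=payload, timeout=5)
        response.raise_for_status()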
4. Configuration Management is Crucial
With multiple nodes, configuration becomes complex. Each shard needs to know:
- Which database to connect to
- What port to run on
- Its shard ID
./gradlew bootRun --args='--node.type=shard --server.port=7001 --shard.id=1'
🚀 Current Progress
LimeDB currently supports:
- ✅ GET/SET/DELETE operations (Redis-like API)
- ✅ Hash-based routing across 3 shards
- ✅ PostgreSQL persistence per shard
- ✅ REST API with proper error handling
- ✅ Health monitoring endpoints
Performance? It's not going to beat Redis. But it's already handling operations smoothly and teaching me why Redis is so fast.
🎯 What's Next?
The roadmap is ambitious and includes features I'm excited to tackle:
Phase 2: Better Distribution
- Consistent Hashing: Replace modulo with a proper hash ring
- Health Checks: Automatic failover when shards go down
- Replication: Primary-replica setup for high availability
- Metrics: Monitoring and observability
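For the consistent hashing item, a hash ring can be sketched with a sorted map: each shard is placed on the ring at several points (virtual nodes), and a key belongs to the first shard point at or after the key's own hash, wrapping around at the end. This is a minimal illustration under my own assumptions, not LimeDB's planned design; `HashRing`, the shard names, and the choice of MD5 for spreading hashes are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    // Ring position -> shard name, kept sorted so lookups can walk clockwise.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    void removeShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(shard + "#" + i));
        }
    }

    String shardFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no shards registered");
        }
        // First ring point at or after the key's hash; wrap to the start if none.
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of an MD5 digest as a long; used for spread, not for security.
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The property that matters for rebalancing: when a shard is added, a key either stays where it was or moves to the new shard. It never moves between two existing shards, unlike with modulo routing.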
Phase 3: Custom Storage Engine
- LSM Trees: Replace PostgreSQL with custom key-value storage
- Memory-Mapped Files: Direct file system control
- Custom Serialization: Optimized data formats
- WAL Implementation: Write-ahead logging from scratch
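To give a feel for the WAL item, here is a small sketch of write-ahead logging: every mutation is appended to a file and forced to disk before it would be applied, and the in-memory state is rebuilt by replaying the file on startup. The record format (one op byte followed by `writeUTF` strings) and the `WriteAheadLog` class are my own assumptions for illustration. A real WAL would also need checksums, log truncation, and batched syncs.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class WriteAheadLog implements Closeable {
    private static final byte OP_SET = 1;
    private static final byte OP_DELETE = 2;

    private final FileOutputStream file;
    private final DataOutputStream out;

    WriteAheadLog(Path path) throws IOException {
        this.file = new FileOutputStream(path.toFile(), true); // true = append mode
        this.out = new DataOutputStream(new BufferedOutputStream(file));
    }

    // writeUTF limits each string to 65535 encoded bytes; fine for a sketch.
    void logSet(String key, String value) throws IOException {
        out.writeByte(OP_SET);
        out.writeUTF(key);
        out.writeUTF(value);
        sync();
    }

    void logDelete(String key) throws IOException {
        out.writeByte(OP_DELETE);
        out.writeUTF(key);
        sync();
    }

    // Flush Java buffers, then ask the OS to push the bytes to the device.
    private void sync() throws IOException {
        out.flush();
        file.getFD().sync();
    }

    // Rebuilds the key-value state by applying every logged record in order.
    static Map<String, String> replay(Path path) throws IOException {
        Map<String, String> state = new HashMap<>();
        if (!Files.exists(path)) {
            return state;
        }
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            int op;
            while ((op = in.read()) != -1) {
                if (op == OP_SET) {
                    String key = in.readUTF();
                    String value = in.readUTF();
                    state.put(key, value);
                } else if (op == OP_DELETE) {
                    state.remove(in.readUTF());
                } else {
                    throw new IOException("unknown op code " + op);
                }
            }
        } catch (EOFException e) {
            // Truncated final record (e.g. crash mid-write): keep what was replayed.
        }
        return state;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

Syncing on every write is the simplest durable choice and also the slowest; grouping several records per sync is the usual next step.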
Phase 4: Advanced Features
- Custom Binary Protocol: Move beyond HTTP/REST
- Compression: Custom compression algorithms
- Cache Layers: Multi-level caching strategies
- Transaction Support: ACID across multiple shards
Each phase represents deeper database internals knowledge - PostgreSQL is just the beginning!
💭 Why You Should Build One Too
Building your own database isn't about competing with PostgreSQL or Redis. It's about:
- Deep Learning: Understanding systems from the ground up
- Interview Prep: Nothing impresses like saying "I built a distributed database"
- Problem-Solving Skills: Real distributed systems problems
- Technology Mastery: Push your programming language skills
- Portfolio Project: Something unique that stands out
🛠️ Getting Started
If this inspired you to build your own database, here's my advice:
- Start Simple: Don't try to build Redis on day one
- Pick Your Language: Use something you're comfortable with
- Choose One Feature: GET/SET is enough to start
- Add Gradually: Persistence, then distribution, then optimizations
- Document Everything: Future you will thank you
🔗 Follow the Journey
Want to see the code as I build it? It's all open source:
- GitHub: namanvashistha/limedb
- Tech Stack: Java 21, Spring Boot, PostgreSQL
- Current Status: Basic coordinator-shard architecture working
The README has setup instructions, and I'm trying to make the code as readable as possible for learning purposes. Feel free to star the repo and follow along as I tackle more distributed systems challenges!
🎉 Final Thoughts
Building LimeDB is turning out to be one of the most educational projects I've undertaken as a backend developer. It's not going to be the fastest database, or the most feature-complete, but it's mine. I understand every line of code, every architectural decision, and every trade-off I'm making along the way.
In a world of microservices and cloud abstractions, there's something deeply satisfying about building a system from first principles. I'm already looking at Redis, PostgreSQL, and MongoDB differently after just starting this journey.
So grab your favorite programming language, pick a simple data structure, and start building. The distributed systems knowledge you'll gain is worth its weight in gold.
What do you think? Have you ever built your own database or distributed system? What did you learn? Drop a comment below!
Top comments (5)
While I applaud you for this project (I was thinking about doing something similar in Go just for fun), you released this project as something people should use only saying this:
So you built a database engine to learn about databases, and now we should use it because of that? This makes absolutely no sense. What problem does your LimeDB solve that other key-value storages don't? I doubt it'll teach me about how database engines work unless I look at the code, and I can do that with any other database engine. They're actually open source.
I would much rather use an established, well maintained and backed, database than your pet project. No offense here, I think what you did is a brilliant project as a learning experience, but you should not have created a logo and started promoting it. You got way too serious with this for absolutely the wrong reasons. Unless your tool solves a particular problem that might be helpful for others, don't promote it. You'll let people down if you end up giving up on the project and they're already bought into it.
Maybe I'm just ringing alarm bells for no reason as people should be able to make this determination themselves, but I'm just saying as I see it.
Hey, I get what you’re saying - but I think there’s a bit of a misunderstanding here.
LimeDB isn’t being “promoted” as a production-ready system. It’s an open-source learning project, and I’ve been very clear about that. The fact that it has a logo, proper documentation, and structure doesn’t suddenly make it a product I’m trying to sell - it just means I want it to look good and serious. Some people learn by reading papers and code; I learn best by building and implementing end-to-end.
And honestly, having something open source with a logo doesn’t mean people are dumb enough to adopt it blindly. Developers are smart - they know how to evaluate what’s experimental and what’s production-ready. Sharing something well-organized just helps others explore and maybe even learn from it, not trick them into using it.
Open-source isn’t only about releasing polished, production-grade tools. It’s also about sharing your journey, and for me, this is my way of learning by doing - not just talking about distributed systems, but actually implementing them.
I appreciate your feedback, truly - but I think it’s unfair to equate enthusiasm and effort with misplaced seriousness. LimeDB was built to learn, share, and inspire, and if it sparks curiosity in even a few developers, I’d say it’s already done its job.
That said, you’re absolutely right that established databases are the way to go for any real use case. LimeDB’s goal is educational, not competitive - more like a playground for curious developers who want to understand the internals by running and experimenting with code that’s simple and open.
Thanks again for taking the time to share your thoughts - I genuinely appreciate you engaging with it. 🙏
This is such a great initiative! I love that you’re building LimeDB not just as a project, but as a way to truly understand distributed systems from the inside out. The coordinator–shard pattern choice is clean and practical — especially starting with PostgreSQL before going custom. Respect 👏
Thanks a lot! Appreciate you taking the time to check it out 🙌