DEV Community

Discussion on: I built a database in France because the Cloud Act makes EU data sovereignty impossible

vuleolabs

"Very strong and timely post. The legal conflict between the Cloud Act / FISA 702 and GDPR/EU Data Act is real and often underestimated by developers.
I especially appreciate that you didn't just complain about the problem — you actually built something (VelesDB) as an architectural solution: local-first, embedded, no foreign provider in the chain. That’s the kind of pragmatic sovereignty approach Europe needs more of.
A few honest thoughts:

Hosting in “EU region” on AWS/Azure really is mostly a latency choice, not sovereignty. The jurisdiction follows the provider, not the data center.
For sensitive use cases (health, legal, financial, or high-risk AI under the EU AI Act), depending on US-controlled infrastructure does create a real compliance gap that contracts alone can’t fully close.

Quick questions for you:

How does VelesDB handle production-scale workloads today (e.g. concurrent queries, persistence strategy)?
Are you planning on-premise / self-hosted server mode soon for teams that need more than embedded use?

This is the kind of deep-tech project Europe should celebrate. Respect for building it from France.
Will check out the repo. Good luck with VelesDB!"

Julien L WiScale • Edited

Thanks a lot @vuleolabs for your comment!

To answer your two questions:

Persistence and concurrency (open-core, available today):
VelesDB uses a Write-Ahead Log (WAL) for crash recovery, so writes are durable and atomic. The vector index is HNSW (Hierarchical Navigable Small World) for O(log n) approximate nearest-neighbor search, with 5 distance metrics (cosine, Euclidean, dot product, Hamming, Jaccard). The engine also ships with built-in BM25 full-text indexing and hybrid search combining vectors and keywords via Reciprocal Rank Fusion, so no separate search service is needed.
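Since Reciprocal Rank Fusion is named above but not shown, here is a minimal, generic sketch of how RRF merges a vector ranking and a BM25 ranking. This is not VelesDB's actual implementation; the `k = 60` constant and the document ids are illustrative assumptions:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. No score normalization between systems is needed.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Best fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]  # e.g. ANN (HNSW) results, best first
bm25_hits = ["doc3", "doc9", "doc1"]    # e.g. keyword (BM25) results
print(rrf_fuse([vector_hits, bm25_hits]))  # → ['doc3', 'doc1', 'doc9', 'doc7']
```

Because each list contributes only reciprocal ranks, documents ranked highly by either retriever bubble up without having to reconcile cosine scores with BM25 scores.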

On benchmarks:
Sub-millisecond search at 10K vectors (384D), ~1.5 ms at 50K, and ~19,000 vectors/sec bulk insert. Full methodology and reproducible scripts are here: github.com/cyberlife-coder/velesdb-benchmarks
I also keep working on performance, so these numbers may change over time.

The concurrency model is single-process with file-level locking (one writer, concurrent reads). This is a deliberate design choice for the embedded/edge use case. For RAG pipelines, agent memory, or regulated applications running inside a single process, the throughput is more than sufficient.
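The one-writer/concurrent-readers model described above can be illustrated with POSIX advisory file locks. This is a generic, Unix-only sketch of the pattern (shared locks for readers, an exclusive lock for the writer), not VelesDB's actual locking code; the file name is hypothetical:

```python
import fcntl
import os
import tempfile

# Hypothetical data file standing in for an embedded database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
open(path, "w").close()

reader1 = open(path, "rb")
reader2 = open(path, "rb")
fcntl.flock(reader1, fcntl.LOCK_SH)  # shared lock: readers coexist
fcntl.flock(reader2, fcntl.LOCK_SH)  # a second reader also succeeds

writer = open(path, "ab")
try:
    # Exclusive, non-blocking: fails while any shared lock is held.
    fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
    writer_locked = True
except BlockingIOError:
    writer_locked = False  # the single writer must wait for readers

fcntl.flock(reader1, fcntl.LOCK_UN)  # readers release...
fcntl.flock(reader2, fcntl.LOCK_UN)
fcntl.flock(writer, fcntl.LOCK_EX)   # ...and the writer acquires exclusively
writer_acquired = True
```

The trade-off is exactly the one stated above: no cross-process write coordination to manage, at the cost of a single writer, which is fine for an embedded engine living inside one process.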

On-premise server mode (enterprise edition, planned):
The open-core edition is and will remain embedded, local-first, source-available under Elastic License 2.0.
The enterprise edition will extend this with features designed for team and production deployments: RBAC (role-based access control), SSO, encryption at rest, audit logging, database snapshots, and GPU acceleration for large-scale vector workloads. (I'm also considering adding GPU acceleration to the open-core edition.)
On-prem server mode with multi-process access is part of that roadmap. Same engine, same sovereignty guarantees, but with the operational features that security teams and DPOs expect.

My approach is: ship a solid engine in the open-core first (WAL, HNSW, hybrid search, graph traversal, agent memory SDK, GPU, AVX512 + AVX2+FMA + ARM NEON acceleration all in ~3MB), then build the enterprise layer on top. Not the other way around.

Thanks for checking out the repo. If you test the open-core version, I would love to hear what breaks first: that is always the most useful feedback. Also note that if you use VelesDB in a public project, commercial or not, I can share a link to it on velesDB.com.