In January 2025, the US administration fired three members of the Privacy and Civil Liberties Oversight Board (PCLOB), the independent agency responsible for overseeing US mass surveillance. The European Commission had cited this board 31 times in its adequacy decision for EU-US data transfers. Without a quorum, PCLOB can no longer function.
That same month, an executive order launched a review of all Biden-era national security decisions, including Executive Order 14086, the legal foundation of the EU-US Data Privacy Framework.
If you are a European developer storing user data on AWS, Azure, or GCP, these are not abstract policy events. They affect the legal basis of your data processing.
This is the story of why I built VelesDB, and why I think the answer to the sovereignty problem is architectural, not legal.
Who is this for?
If you recognize yourself in one of these profiles, this article is for you:
- Regulated SaaS builders (healthtech, legaltech, fintech): you process sensitive EU data and your DPO keeps asking where the vectors live.
- AI/ML engineers at European companies: you are building RAG pipelines or agent memory and your current stack sends embeddings to three US services.
- CTOs and architects evaluating sovereign infrastructure: you need a data layer that satisfies GDPR, the EU Data Act, and upcoming AI Act requirements without stitching together five vendors.
- Edge and on-premise developers: you deploy AI on devices, in hospitals, on classified networks, or in locations with no reliable internet.
If none of these apply and you just want to understand why EU data sovereignty is harder than "pick an EU region," keep reading anyway.
Three US laws, one problem
TL;DR: The PATRIOT Act, Cloud Act, and FISA 702 give US authorities three independent legal paths to your data on any US provider, anywhere in the world.
Most developers have heard of the Cloud Act. Fewer realize it is just one piece of a layered legal framework that gives US authorities broad access to data held by US companies, anywhere in the world.
The PATRIOT Act (2001) started this. Section 215 allows the FBI to request "any tangible things" - including electronic business records - for terrorism or intelligence investigations. It was reauthorized through the USA Freedom Act with essentially the same powers. If your cloud provider is a US entity, your data falls under its scope.
The Cloud Act (2018) went further. It requires US companies to hand over data to US authorities when served with a warrant, regardless of where that data is physically stored. If your data sits in AWS Frankfurt or Azure Amsterdam, it does not matter. The obligation follows the provider, not the location. Unlike the PATRIOT Act, the Cloud Act explicitly addresses the extraterritorial question: US jurisdiction follows the company, not the server.
FISA Section 702 is the broadest of the three. Reauthorized in April 2024 with an expanded scope, it now covers any entity "with access to equipment that is or can be used to transmit or store wire or electronic communications." That language is wide enough to include virtually any cloud service, data center operator, or SaaS platform under US jurisdiction. FISA 702 runs through April 2026, and renewal pressure is already building.
Together, these three laws create a layered system where US authorities have multiple legal paths to reach data stored by US providers, regardless of geography.
The collision with European law
TL;DR: GDPR says foreign warrants need a treaty. US surveillance laws bypass treaties. No compliant path exists for US providers in the EU.
On the other side of the Atlantic, the legal framework says the opposite.
GDPR Article 48 states that court orders from third countries are only enforceable in the EU if based on an international agreement such as a Mutual Legal Assistance Treaty (MLAT). The Cloud Act and FISA 702 both bypass MLATs entirely.
The European Data Protection Board has been clear: service providers subject to EU law cannot legally base data transfers to the US solely on Cloud Act requests.
This creates a situation where US cloud providers operating in Europe face two contradictory legal obligations. Comply with a US warrant, and you violate GDPR. Refuse the warrant, and you face US legal penalties. There is no compliant path.
For years, the industry relied on legal patches: Safe Harbor (invalidated by Schrems I in 2015), Privacy Shield (invalidated by Schrems II in 2020), and now the Data Privacy Framework (survived its first legal challenge in September 2025, but an appeal is pending before the CJEU since October 2025). Each patch addresses symptoms. None resolves the structural conflict between US surveillance law and EU privacy law.
Europe is building its defenses. The EU Data Act, applicable since September 12, 2025, now requires cloud providers operating in the EU to implement measures that prevent unlawful third-country government access to data. Providers must challenge conflicting foreign orders and publish the jurisdiction of their infrastructure. The EU AI Act, fully applicable from August 2026, adds data governance and documentation requirements for AI systems, including mandatory registration of high-risk systems in an EU database. These regulations make data sovereignty a compliance requirement, not just a preference.
Why "EU region" does not solve the problem
TL;DR: Cloud Act and FISA 702 follow provider control, not server location. AWS eu-west-1 is a latency decision, not a sovereignty decision.
This is the part that surprises most developers. You might think: "I host on AWS eu-west-1, my data is in Ireland, problem solved."
It is not. The Cloud Act and FISA 702 follow provider control, not data location. If the provider is a US company, US authorities can compel access to any data that company controls, wherever it is stored. Hosting in an EU region is a latency decision, not a sovereignty decision.
Standard Contractual Clauses and Data Processing Agreements help with compliance documentation, but they do not change the fundamental power dynamic: a US court order can reach data stored on US-controlled infrastructure, anywhere in the world.
What this means for AI developers
TL;DR: A typical 2026 AI stack sends embeddings, vectors, graphs, and metadata to four US-controlled services. Every one of them is reachable by a FISA warrant.
Now apply this to the AI stack. A typical AI application in 2026 might look like this:
- Embeddings -> OpenAI API (US company, subject to FISA 702)
- Vector store -> Pinecone (US) or Weaviate Cloud (US)
- Graph store -> Neo4j Aura (US)
- Metadata -> PostgreSQL on AWS eu-west-1 (US provider, Cloud Act applies)
Your AI agent's memory, your users' embeddings, your knowledge graph of business relationships: all of it lives on infrastructure where a foreign government has multiple legal paths to compel access.
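You can make that exposure concrete with a toy audit script. Everything below is hypothetical illustration - the service map and the `us_reachable` helper are made up for this article, not output from any real tool:

```python
# Toy jurisdiction audit: map each stack component to the jurisdiction
# that controls its provider, then flag the ones US legal process can reach.
# The service map is a hypothetical example matching the stack above.

STACK = {
    "embeddings": {"service": "OpenAI API", "provider_jurisdiction": "US"},
    "vector_store": {"service": "Pinecone", "provider_jurisdiction": "US"},
    "graph_store": {"service": "Neo4j Aura", "provider_jurisdiction": "US"},
    "metadata": {"service": "PostgreSQL on AWS eu-west-1", "provider_jurisdiction": "US"},
}

def us_reachable(stack: dict) -> list[str]:
    """Return components whose provider is under US jurisdiction,
    regardless of where the servers physically sit."""
    return [name for name, info in stack.items()
            if info["provider_jurisdiction"] == "US"]

exposed = us_reachable(STACK)
print(f"{len(exposed)}/{len(STACK)} components reachable via US legal process:")
for name in exposed:
    print(f" - {name}: {STACK[name]['service']}")
```

The point of the exercise: the audit keys off the provider's jurisdiction, never the data center's location, which is exactly how the Cloud Act and FISA 702 work.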
For a European SaaS handling health data, legal documents, or financial records, this is not a theoretical risk. With the EU AI Act requiring data governance documentation and the EU Data Act mandating resistance to unlawful foreign access, it is becoming a compliance gap that no amount of contractual paperwork can fully close.
How the alternatives compare
Before building VelesDB, I evaluated the obvious options. Here is what I found:
| | VelesDB | Postgres + pgvector | Qdrant (local) | Neo4j (on-prem) | SQLite + extensions |
|---|---|---|---|---|---|
| EU sovereignty | Full (no US dependency) | Full (self-hosted) | Full (self-hosted) | Full (self-hosted) | Full (self-hosted) |
| Vector search | Built-in | Via pgvector extension | Native | Via plugin | Via sqlite-vss |
| Graph traversal | Built-in (BFS/DFS + MATCH) | Recursive CTEs (limited) | No | Native (Cypher) | No |
| Agent memory | Built-in SDK | Manual implementation | Manual implementation | Manual implementation | Manual implementation |
| Single query language | VelesQL (SQL + NEAR + MATCH) | SQL only | REST API only | Cypher only | SQL only |
| Install complexity | `pip install velesdb` | Server + extension setup | Docker or binary | JVM + server setup | pip + compile extensions |
| Binary size | ~3 MB | ~200 MB+ | ~50 MB | ~500 MB+ | ~5 MB + extensions |
| License | Elastic License 2.0 | PostgreSQL License | Apache 2.0 | GPL / commercial | Public domain |
Every alternative can achieve sovereignty if self-hosted. The difference is what you get in a single engine versus how many pieces you need to assemble. If you only need vectors, Qdrant is excellent. If you only need graphs, Neo4j is the reference. If you need vectors, graphs, structured queries, and agent memory in one place without running four services, that is the gap VelesDB fills.
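A word on the "Recursive CTEs (limited)" entry: plain SQL can walk an edge table, but you manage depth caps and deduplication yourself. A minimal sketch using Python's built-in sqlite3 module (the table and data are illustrative):

```python
import sqlite3

# Graph traversal in plain SQL via a recursive CTE. You handle depth
# limits and duplicate visits manually - that is what "limited" means
# compared to a native graph engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, label TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (1, 2, "REPORTS_TO"),
    (2, 3, "MANAGES"),
    (1, 3, "WORKS_IN"),
])

rows = conn.execute("""
    WITH RECURSIVE reachable(node, depth) AS (
        SELECT 1, 0                       -- start at node 1
        UNION                             -- UNION (not UNION ALL) deduplicates
        SELECT e.dst, r.depth + 1
        FROM edges e JOIN reachable r ON e.src = r.node
        WHERE r.depth < 2                 -- manual depth cap
    )
    SELECT node, MIN(depth) FROM reachable GROUP BY node ORDER BY node
""").fetchall()
print(rows)  # -> [(1, 0), (2, 1), (3, 1)]
```

It works, but every traversal policy (direction, cycle handling, max depth) is hand-rolled SQL rather than a first-class operation.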
Why I built VelesDB
I am based in France. I spent years working with cloud databases, and I kept running into the same problem: every time I needed vectors, graphs, and structured data in a single application, the answer was "stitch together three cloud services from two jurisdictions."
I started building VelesDB because of four convictions:
1. I wanted to stop depending on infrastructure I do not control. Not because US providers build bad technology. They often build the best. But depending on a single country's legal framework for your data infrastructure is a form of technical debt that no amount of engineering can repay if the legal ground shifts. And with FISA 702 up for renewal in 2026 and a potential Schrems III decision looming, the ground is shifting.
2. I wanted my data under my hand, on my own infrastructure. Not behind an API, not on someone else's server, not subject to terms of service I cannot negotiate. Whether it is a local file on my laptop or a database on my company's own server, I want to encrypt it, back it up, audit it, and delete it on my terms. No foreign warrant can reach it because there is no foreign provider in the chain.
3. I wanted one database for the full AI data layer. Vectors for semantic search. Graphs for entity relationships. Structured metadata for queries. Agent memory for conversational intelligence. All in a single engine with a single query language, instead of four services with four APIs and four bills.
4. I wanted to show that deep tech can come from Europe too. France has world-class engineers, strong data protection laws, and a growing AI ecosystem. There is no reason the tools we build our AI systems on should all come from the same two zip codes in California.
The architectural solution: local-first by design
TL;DR: Remove the US provider from the chain entirely. No Cloud Act, no FISA 702, no compliance gap.
VelesDB is a database engine written in Rust. Today it runs embedded inside your process, stores everything in a local directory, and never makes a network call. No account, no API key, no data processor.
```shell
pip install velesdb
```

```python
import velesdb

# Everything lives in this directory, on your machine
db = velesdb.Database("./my_sovereign_data")

# Vector collection for document embeddings
docs = db.get_or_create_collection("documents", dimension=384)

# Graph store for entity relationships
graph = docs.get_graph_store()

# Agent memory for conversation intelligence
memory = db.agent_memory(384)
```
Three capabilities in four lines of Python. The entire database is in ./my_sovereign_data, on your machine or your own server. You can encrypt it with your OS tools. You can delete it with rm -rf. No API call required. No Cloud Act applies because there is no cloud. No FISA 702 applies because there is no US provider in the chain.
Working example: a GDPR-compliant document store
Imagine you are building an AI assistant that handles EU regulatory documents. With the EU AI Act requiring documentation of data governance, you need to know exactly where your data is and how it flows:
```python
import zlib

import numpy as np
import velesdb

db = velesdb.Database("./sovereign_demo")
docs = db.get_or_create_collection("documents", dimension=384)

def mock_embedding(text: str) -> list[float]:
    """Deterministic mock embedding for reproducibility.

    Seeds from crc32 rather than hash(), which is salted per process
    and would produce different vectors on every run."""
    rng = np.random.RandomState(zlib.crc32(text.encode("utf-8")))
    vec = rng.randn(384).astype(np.float32)
    vec = vec / np.linalg.norm(vec)
    return vec.tolist()

# Index regulatory documents (VelesDB uses integer IDs)
documents = [
    {"id": 1, "text": "The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay.", "label": "GDPR Art.17"},
    {"id": 2, "text": "The data subject shall have the right to receive the personal data concerning him or her in a structured, commonly used and machine-readable format.", "label": "GDPR Art.20"},
    {"id": 3, "text": "All customer embeddings must be stored within EU jurisdiction and deleted within 30 days of account closure.", "label": "Internal Policy"},
]

for doc in documents:
    embedding = mock_embedding(doc["text"])
    docs.upsert(doc["id"], vector=embedding, payload={"text": doc["text"], "label": doc["label"]})

print(f"Indexed {docs.count()} documents")

# Search by semantic similarity
query_vec = mock_embedding("How do I delete user data?")
results = docs.search(vector=query_vec, top_k=2)

for r in results:
    print(f" [id={r['id']}] {r['payload']['label']} - score={r['score']:.4f}")
    print(f"   {r['payload']['text'][:80]}...")
```
At no point did any data leave your machine. If a user exercises their right to erasure under GDPR Article 17, you have exactly one place to look. If an auditor asks you to document your data flows for EU AI Act compliance, the answer fits in one sentence: "It is in a local directory on our server."
Knowledge graph without a cloud dependency
Your AI agent needs to understand relationships, not just text similarity. VelesDB's GraphStore lets you model entity graphs in the same database:
```python
# Continuing from above
graph = docs.get_graph_store()

# Model organizational relationships (edge IDs are integers)
graph.add_edge(100, source=1, target=2, label="REPORTS_TO",
               properties={"since": "2024-01"})
graph.add_edge(101, source=2, target=3, label="MANAGES",
               properties={"department": "engineering"})
graph.add_edge(102, source=1, target=3, label="WORKS_IN",
               properties={"role": "senior_developer"})

# Traverse the graph from node 1
path = graph.traverse_bfs(source=1, max_depth=2)
print(f"Found {len(path)} connections from node 1")
for step in path:
    print(f" {step.source} --[{step.label}]--> {step.target} (depth={step.depth})")
```
Same database directory. Same sovereignty guarantees. No Neo4j Aura, no connection string pointing to a US-controlled service.
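Conceptually, a bounded traversal like this is plain breadth-first search over an edge list. A dependency-free sketch of the idea (not VelesDB internals, just what a BFS with a depth cap computes):

```python
from collections import deque

# Edge list mirroring the example above: (source, target, label)
edges = [
    (1, 2, "REPORTS_TO"),
    (2, 3, "MANAGES"),
    (1, 3, "WORKS_IN"),
]

def bfs(edges, source, max_depth):
    """Breadth-first traversal returning (source, target, label, depth) steps."""
    adjacency = {}
    for src, dst, label in edges:
        adjacency.setdefault(src, []).append((dst, label))
    steps, visited = [], {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth cap: do not expand beyond max_depth
        for dst, label in adjacency.get(node, []):
            steps.append((node, dst, label, depth + 1))
            if dst not in visited:
                visited.add(dst)
                queue.append((dst, depth + 1))
    return steps

for src, dst, label, depth in bfs(edges, source=1, max_depth=2):
    print(f"{src} --[{label}]--> {dst} (depth={depth})")
```

The value of having this built into the database is that the traversal runs next to the vector index, so "find similar documents, then walk their relationships" is one round trip instead of two systems.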
Agent memory that stays under your roof
When your AI agent builds memory about users, that data is the most sensitive in your stack. Preferences, conversation history, learned procedures: this is exactly the kind of intelligence data that FISA 702's expanded scope was designed to reach.
```python
import time

memory = db.agent_memory(384)

# Semantic memory: facts the agent knows
memory.semantic.store(
    1001,
    "User prefers dark mode and French language",
    mock_embedding("User prefers dark mode and French language")
)

# Episodic memory: events that happened
memory.episodic.record(
    2001,
    "User asked about GDPR compliance for their SaaS product",
    timestamp=int(time.time())
)

# Procedural memory: skills the agent has learned
memory.procedural.learn(
    3001,
    "Handle GDPR deletion request",
    steps=["Identify all user data stores", "Execute deletion in each store",
           "Generate compliance certificate", "Notify user within 72 hours"],
    confidence=0.9
)

# Query semantic memory
results = memory.semantic.query(
    mock_embedding("What does the user prefer?"),
    top_k=3
)
print(f"Semantic results: {len(results)}")

# Recall recent episodes
recent = memory.episodic.recent(limit=5)
print(f"Recent episodes: {len(recent)}")

# List learned procedures
procedures = memory.procedural.list_all()
print(f"Learned procedures: {len(procedures)}")
```
When a user exercises their right to be forgotten, you delete one directory. Not four services with four different deletion APIs, four different retention policies, and four different legal jurisdictions.
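Because the whole data layer is one directory, the erasure workflow can be an ordinary filesystem operation. A standard-library sketch - the per-user directory layout (`./users/<id>`) is an assumption for illustration, not a VelesDB convention:

```python
import shutil
from pathlib import Path

def erase_user_datastore(path: str) -> bool:
    """Delete a per-user database directory for a GDPR Art. 17 request.

    One directory holds vectors, graph, and agent memory, so one
    deletion covers the full data layer. Returns True if data existed.
    The ./users/<id> layout is a hypothetical example.
    """
    target = Path(path)
    if not target.is_dir():
        return False  # nothing stored for this user
    shutil.rmtree(target)
    return True

deleted = erase_user_datastore("./users/42")
print("Erased" if deleted else "Nothing to erase")
```

In production you would log the deletion for your compliance records, but the core operation really is this small: no deletion API, no retention-policy negotiation with a vendor.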
Querying with VelesQL
VelesDB includes VelesQL, a query language that extends SQL with semantic search (NEAR) and graph traversal (MATCH). You can query your collections with familiar SQL syntax:
```python
# Query all indexed documents using VelesQL
results = db.execute_query("SELECT id, payload FROM documents LIMIT 10")
for r in results:
    print(f" [id={r['id']}] {r['payload']['label']}")
```
VelesQL also supports NEAR for vector proximity search and MATCH for graph pattern matching, letting you combine structured queries, semantic search, and graph traversal in a single language. One query language. One database. One jurisdiction.
The bigger picture
The Cloud Act is not going away. FISA 702 is up for renewal. The PATRIOT Act's surveillance provisions remain active. On the EU side, the Data Privacy Framework might survive Schrems III, or it might not. The EU Data Act and AI Act are raising the bar for data governance, not lowering it.
I am not suggesting that every European company should stop using AWS tomorrow. That would be unrealistic and, for many use cases, unnecessary. But for the data layer that gives your AI agents their intelligence - the layer that stores user embeddings, conversation history, and business knowledge - there should be an option that does not require trusting a foreign legal framework that has been invalidated twice in ten years.
VelesDB is that option for developers who want it. A single engine written in Rust, source-available under Elastic License 2.0, that you can read, audit, and deploy on your own infrastructure. Today it runs as an embedded database, local-first by design. Tomorrow, on-premise deployments will extend the same sovereignty guarantees to team and enterprise workloads. The principle stays the same: your data, your servers, your jurisdiction.
It is not designed for multi-node clusters at web scale. It is designed for the use case where sovereignty matters more than horizontal scaling: edge devices, local AI agents, on-premise enterprise systems, regulated applications, and any system where the answer to "where is my data?" should be "right here, on infrastructure I control."
Diversity in infrastructure matters for the same reason biodiversity matters in nature: monocultures are fragile. Europe needs its own tools, not out of protectionism, but because resilience requires alternatives.
Try it in 10 minutes
You do not need to read a whitepaper or schedule a demo. Copy this into a terminal:
```shell
pip install velesdb
python3 -c "
import velesdb, numpy as np
db = velesdb.Database('./quickstart')
docs = db.get_or_create_collection('demo', dimension=384)
docs.upsert(1, vector=np.random.randn(384).tolist(), payload={'text': 'hello sovereign world'})
print('Stored:', docs.count(), 'document(s)')
print('Data location: ./quickstart (your machine, your jurisdiction)')
"
```
A few lines of Python. No Docker, no API key, no account. Your data is in ./quickstart, on your machine.
Want to go further? The quickstart guide on GitHub walks you through vectors, graphs, agent memory, and VelesQL in a single script.
Getting started
- VelesDB on GitHub - source-available under Elastic License 2.0
- Documentation and examples
A star on the repo helps other developers find the project. We are looking for partners and contributors building sovereign AI infrastructure in Europe - check velesdb.com for details.
Where does your AI agent store its memory today? If a FISA warrant reached your cloud provider tomorrow, would you even be notified? Drop a comment below.
Top comments (2)
"Very strong and timely post. The legal conflict between the Cloud Act / FISA 702 and GDPR/EU Data Act is real and often underestimated by developers.
I especially appreciate that you didn't just complain about the problem — you actually built something (VelesDB) as an architectural solution: local-first, embedded, no foreign provider in the chain. That’s the kind of pragmatic sovereignty approach Europe needs more of.
A few honest thoughts:
Hosting in “EU region” on AWS/Azure really is mostly a latency choice, not sovereignty. The jurisdiction follows the provider, not the data center.
For sensitive use cases (health, legal, financial, or high-risk AI under the EU AI Act), depending on US-controlled infrastructure does create a real compliance gap that contracts alone can’t fully close.
Quick questions for you:
How does VelesDB handle production-scale workloads today (e.g. concurrent queries, persistence strategy)?
Are you planning on-premise / self-hosted server mode soon for teams that need more than embedded use?
This is the kind of deep-tech project Europe should celebrate. Respect for building it from France.
Will check out the repo. Good luck with VelesDB!"
Thanks a lot @vuleolabs for your comment!
To answer your two questions:
Persistence and concurrency (open-core, available today):
VelesDB uses a Write-Ahead Log (WAL) for crash recovery, so writes are durable and atomic. The vector index is HNSW (Hierarchical Navigable Small World) for O(log n) approximate nearest neighbor search, with 5 distance metrics (cosine, euclidean, dot product, Hamming, Jaccard). The engine also ships with built-in BM25 full-text indexing and hybrid search combining vectors and keywords via Reciprocal Rank Fusion, so no separate search service is needed.
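(For anyone curious what Reciprocal Rank Fusion actually does, here is a generic sketch of the algorithm - not VelesDB's internal code. The constant k=60 comes from the original RRF paper by Cormack et al., 2009:)

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several ranked ID lists: score(d) = sum over lists of 1/(k + rank(d)).

    Documents ranked well by multiple retrievers accumulate the highest
    scores; k dampens the influence of top ranks in any single list.
    """
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Doc 3 appears high in both lists, so fusion ranks it first.
vector_hits = [1, 3, 5]  # IDs from vector search, best first
bm25_hits = [3, 2, 4]    # IDs from BM25 keyword search, best first
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # doc 3 comes out on top
```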
On benchmarks:
Sub-millisecond search at 10K vectors (384D), ~1.5 ms at 50K, and ~19,000 vectors/sec bulk insert. Full methodology and reproducible scripts are here: github.com/cyberlife-coder/velesdb-benchmarks
I keep working on performance, so these numbers may improve over time.
The concurrency model is single-process with file-level locking (one writer, concurrent reads). This is a deliberate design choice for the embedded/edge use case. For RAG pipelines, agent memory, or regulated applications running inside a single process, the throughput is more than sufficient.
On-premise server mode (enterprise edition, planned):
The open-core edition is and will remain embedded, local-first, source-available under Elastic License 2.0.
The enterprise edition will extend this with features designed for team and production deployments: RBAC (role-based access control), SSO, encryption at rest, audit logging, database snapshots, and GPU acceleration for large-scale vector workloads. (I am also considering bringing GPU acceleration to the open-core.)
On-prem server mode with multi-process access is part of that roadmap. Same engine, same sovereignty guarantees, but with the operational features that security teams and DPOs expect.
My approach is: ship a solid engine in the open-core first (WAL, HNSW, hybrid search, graph traversal, agent memory SDK, GPU, AVX-512 + AVX2+FMA + ARM NEON acceleration, all in ~3 MB), then build the enterprise layer on top. Not the other way around.
Thanks for checking out the repo. If you test the open-core version, I would love to hear what breaks first - that is always the most useful feedback. Also note that if you use VelesDB in a public project (commercial or not), I can share a link to it on velesdb.com.