In January 2025, the US administration fired three members of the Privacy and Civil Liberties Oversight Board (PCLOB), the independent agency responsible for overseeing US mass surveillance. The European Commission had cited this board 31 times in its adequacy decision for EU-US data transfers. Without a quorum, PCLOB can no longer function.
That same month, an executive order launched a review of all Biden-era national security decisions, including Executive Order 14086, the legal foundation of the EU-US Data Privacy Framework.
If you are a European developer storing user data on AWS, Azure, or GCP, these are not abstract policy events. They affect the legal basis of your data processing.
This is the story of why I built VelesDB, and why I think the answer to the sovereignty problem is architectural, not legal.
Who is this for?
If you recognize yourself in one of these profiles, this article is for you:
- Regulated SaaS builders (healthtech, legaltech, fintech): you process sensitive EU data and your DPO keeps asking where the vectors live.
- AI/ML engineers at European companies: you are building RAG pipelines or agent memory and your current stack sends embeddings to three US services.
- CTOs and architects evaluating sovereign infrastructure: you need a data layer that satisfies GDPR, the EU Data Act, and upcoming AI Act requirements without stitching together five vendors.
- Edge and on-premise developers: you deploy AI on devices, in hospitals, on classified networks, or in locations with no reliable internet.
If none of these apply and you just want to understand why EU data sovereignty is harder than "pick an EU region," keep reading anyway.
Three US laws, one problem
TL;DR: The PATRIOT Act, Cloud Act, and FISA 702 give US authorities three independent legal paths to your data on any US provider, anywhere in the world.
Most developers have heard of the Cloud Act. Fewer realize it is just one piece of a layered legal framework that gives US authorities broad access to data held by US companies, anywhere in the world.
The PATRIOT Act (2001) started this. Section 215 allows the FBI to request "any tangible things" - including electronic business records - for terrorism or intelligence investigations. It was reauthorized through the USA Freedom Act with essentially the same powers. If your cloud provider is a US entity, your data falls under its scope.
The Cloud Act (2018) went further. It requires US companies to hand over data to US authorities when served with a warrant, regardless of where that data is physically stored. If your data sits in AWS Frankfurt or Azure Amsterdam, it does not matter. The obligation follows the provider, not the location. Unlike the PATRIOT Act, the Cloud Act explicitly addresses the extraterritorial question: US jurisdiction follows the company, not the server.
FISA Section 702 is the broadest of the three. Reauthorized in April 2024 with an expanded scope, it now covers any entity "with access to equipment that is or can be used to transmit or store wire or electronic communications." That language is wide enough to include virtually any cloud service, data center operator, or SaaS platform under US jurisdiction. FISA 702 runs through April 2026, and renewal pressure is already building.
Together, these three laws create a layered system where US authorities have multiple legal paths to reach data stored by US providers, regardless of geography.
The collision with European law
TL;DR: GDPR says foreign warrants need a treaty. US surveillance laws bypass treaties. No compliant path exists for US providers in the EU.
On the other side of the Atlantic, the legal framework says the opposite.
GDPR Article 48 states that court orders from third countries are only enforceable in the EU if based on an international agreement such as a Mutual Legal Assistance Treaty (MLAT). The Cloud Act and FISA 702 both bypass MLATs entirely.
The European Data Protection Board has been clear: service providers subject to EU law cannot legally base data transfers to the US solely on Cloud Act requests.
This creates a situation where US cloud providers operating in Europe face two contradictory legal obligations. Comply with a US warrant, and you violate GDPR. Refuse the warrant, and you face US legal penalties. There is no compliant path.
For years, the industry relied on legal patches: Safe Harbor (invalidated by Schrems I in 2015), Privacy Shield (invalidated by Schrems II in 2020), and now the Data Privacy Framework (survived its first legal challenge in September 2025, but an appeal is pending before the CJEU since October 2025). Each patch addresses symptoms. None resolves the structural conflict between US surveillance law and EU privacy law.
Europe is building its defenses. The EU Data Act, applicable since September 12, 2025, now requires cloud providers operating in the EU to implement measures that prevent unlawful third-country government access to data. Providers must challenge conflicting foreign orders and publish the jurisdiction of their infrastructure. The EU AI Act, fully applicable from August 2026, adds data governance and documentation requirements for AI systems, including mandatory registration of high-risk systems in an EU database. These regulations make data sovereignty a compliance requirement, not just a preference.
Why "EU region" does not solve the problem
TL;DR: Cloud Act and FISA 702 follow provider control, not server location. AWS eu-west-1 is a latency decision, not a sovereignty decision.
This is the part that surprises most developers. You might think: "I host on AWS eu-west-1, my data is in Ireland, problem solved."
It is not. The Cloud Act and FISA 702 follow provider control, not data location. If the provider is a US company, US authorities can compel access to any data that company controls, wherever it is stored. Hosting in an EU region is a latency decision, not a sovereignty decision.
Standard Contractual Clauses and Data Processing Agreements help with compliance documentation, but they do not change the fundamental power dynamic: a US court order can reach data stored on US-controlled infrastructure, anywhere in the world.
What this means for AI developers
TL;DR: A typical 2026 AI stack sends embeddings, vectors, graphs, and metadata to four US-controlled services. Every one of them is reachable by a FISA warrant.
Now apply this to the AI stack. A typical AI application in 2026 might look like this:
- Embeddings -> OpenAI API (US company, subject to FISA 702)
- Vector store -> Pinecone (US) or Weaviate Cloud (US)
- Graph store -> Neo4j Aura (US)
- Metadata -> PostgreSQL on AWS eu-west-1 (US provider, Cloud Act applies)
Your AI agent's memory, your users' embeddings, your knowledge graph of business relationships: all of it lives on infrastructure where a foreign government has multiple legal paths to compel access.
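You can make that exposure concrete with a toy audit script. Everything below is hypothetical illustration - the service map and the `us_reachable` helper are made up for this article, not output from any real tool:

```python
# Toy jurisdiction audit: map each stack component to the jurisdiction
# that controls its provider, then flag the ones US legal process can reach.
# The service map is a hypothetical example matching the stack above.

STACK = {
    "embeddings": {"service": "OpenAI API", "provider_jurisdiction": "US"},
    "vector_store": {"service": "Pinecone", "provider_jurisdiction": "US"},
    "graph_store": {"service": "Neo4j Aura", "provider_jurisdiction": "US"},
    "metadata": {"service": "PostgreSQL on AWS eu-west-1", "provider_jurisdiction": "US"},
}

def us_reachable(stack: dict) -> list[str]:
    """Return components whose provider is under US jurisdiction,
    regardless of where the servers physically sit."""
    return [name for name, info in stack.items()
            if info["provider_jurisdiction"] == "US"]

exposed = us_reachable(STACK)
print(f"{len(exposed)}/{len(STACK)} components reachable via US legal process:")
for name in exposed:
    print(f" - {name}: {STACK[name]['service']}")
```

The point of the exercise: the audit keys off the provider's jurisdiction, never the data center's location, which is exactly how the Cloud Act and FISA 702 work.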
For a European SaaS handling health data, legal documents, or financial records, this is not a theoretical risk. With the EU AI Act requiring data governance documentation and the EU Data Act mandating resistance to unlawful foreign access, it is becoming a compliance gap that no amount of contractual paperwork can fully close.
How the alternatives compare
Before building VelesDB, I evaluated the obvious options. Here is what I found:
| | VelesDB | Postgres + pgvector | Qdrant (local) | Neo4j (on-prem) | SQLite + extensions |
|---|---|---|---|---|---|
| EU sovereignty | Full (no US dependency) | Full (self-hosted) | Full (self-hosted) | Full (self-hosted) | Full (self-hosted) |
| Vector search | Built-in | Via pgvector extension | Native | Via plugin | Via sqlite-vss |
| Graph traversal | Built-in (BFS/DFS + MATCH) | Recursive CTEs (limited) | No | Native (Cypher) | No |
| Agent memory | Built-in SDK | Manual implementation | Manual implementation | Manual implementation | Manual implementation |
| Single query language | VelesQL (SQL + NEAR + MATCH) | SQL only | REST API only | Cypher only | SQL only |
| Install complexity | `pip install velesdb` | Server + extension setup | Docker or binary | JVM + server setup | pip + compile extensions |
| Binary size | ~3 MB | ~200 MB+ | ~50 MB | ~500 MB+ | ~5 MB + extensions |
| License | Elastic License 2.0 | PostgreSQL License | Apache 2.0 | GPL / commercial | Public domain |
Every alternative can achieve sovereignty if self-hosted. The difference is what you get in a single engine versus how many pieces you need to assemble. If you only need vectors, Qdrant is excellent. If you only need graphs, Neo4j is the reference. If you need vectors, graphs, structured queries, and agent memory in one place without running four services, that is the gap VelesDB fills.
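A word on the "Recursive CTEs (limited)" entry: plain SQL can walk an edge table, but you manage depth caps and deduplication yourself. A minimal sketch using Python's built-in sqlite3 module (the table and data are illustrative):

```python
import sqlite3

# Graph traversal in plain SQL via a recursive CTE. You handle depth
# limits and duplicate visits manually - that is what "limited" means
# compared to a native graph engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, label TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (1, 2, "REPORTS_TO"),
    (2, 3, "MANAGES"),
    (1, 3, "WORKS_IN"),
])

rows = conn.execute("""
    WITH RECURSIVE reachable(node, depth) AS (
        SELECT 1, 0                       -- start at node 1
        UNION                             -- UNION (not UNION ALL) deduplicates
        SELECT e.dst, r.depth + 1
        FROM edges e JOIN reachable r ON e.src = r.node
        WHERE r.depth < 2                 -- manual depth cap
    )
    SELECT node, MIN(depth) FROM reachable GROUP BY node ORDER BY node
""").fetchall()
print(rows)  # -> [(1, 0), (2, 1), (3, 1)]
```

It works, but every traversal policy (direction, cycle handling, max depth) is hand-rolled SQL rather than a first-class operation.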
Why I built VelesDB
I am based in France. I spent years working with cloud databases, and I kept running into the same problem: every time I needed vectors, graphs, and structured data in a single application, the answer was "stitch together three cloud services from two jurisdictions."
I started building VelesDB because of four convictions:
1. I wanted to stop depending on infrastructure I do not control. Not because US providers build bad technology. They often build the best. But depending on a single country's legal framework for your data infrastructure is a form of technical debt that no amount of engineering can repay if the legal ground shifts. And with FISA 702 up for renewal in 2026 and a potential Schrems III decision looming, the ground is shifting.
2. I wanted my data under my hand, on my own infrastructure. Not behind an API, not on someone else's server, not subject to terms of service I cannot negotiate. Whether it is a local file on my laptop or a database on my company's own server, I want to encrypt it, back it up, audit it, and delete it on my terms. No foreign warrant can reach it because there is no foreign provider in the chain.
3. I wanted one database for the full AI data layer. Vectors for semantic search. Graphs for entity relationships. Structured metadata for queries. Agent memory for conversational intelligence. All in a single engine with a single query language, instead of four services with four APIs and four bills.
4. I wanted to show that deep tech can come from Europe too. France has world-class engineers, strong data protection laws, and a growing AI ecosystem. There is no reason the tools we build our AI systems on should all come from the same two zip codes in California.
The architectural solution: local-first by design
TL;DR: Remove the US provider from the chain entirely. No Cloud Act, no FISA 702, no compliance gap.
VelesDB is a database engine written in Rust. Today it runs embedded inside your process, stores everything in a local directory, and never makes a network call. No account, no API key, no data processor.
```shell
pip install velesdb
```

```python
import velesdb

# Everything lives in this directory, on your machine
db = velesdb.Database("./my_sovereign_data")

# Vector collection for document embeddings
docs = db.get_or_create_collection("documents", dimension=384)

# Graph store for entity relationships
graph = docs.get_graph_store()

# Agent memory for conversation intelligence
memory = db.agent_memory(384)
```
Three capabilities in four lines of Python. The entire database is in ./my_sovereign_data, on your machine or your own server. You can encrypt it with your OS tools. You can delete it with rm -rf. No API call required. No Cloud Act applies because there is no cloud. No FISA 702 applies because there is no US provider in the chain.
Working example: a GDPR-compliant document store
Imagine you are building an AI assistant that handles EU regulatory documents. With the EU AI Act requiring documentation of data governance, you need to know exactly where your data is and how it flows:
```python
import zlib

import numpy as np
import velesdb

db = velesdb.Database("./sovereign_demo")
docs = db.get_or_create_collection("documents", dimension=384)

def mock_embedding(text: str) -> list[float]:
    """Deterministic mock embedding for reproducibility.

    Seeds from crc32 rather than hash(), which is salted per process
    and would produce different vectors on every run."""
    rng = np.random.RandomState(zlib.crc32(text.encode("utf-8")))
    vec = rng.randn(384).astype(np.float32)
    vec = vec / np.linalg.norm(vec)
    return vec.tolist()

# Index regulatory documents (VelesDB uses integer IDs)
documents = [
    {"id": 1, "text": "The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay.", "label": "GDPR Art.17"},
    {"id": 2, "text": "The data subject shall have the right to receive the personal data concerning him or her in a structured, commonly used and machine-readable format.", "label": "GDPR Art.20"},
    {"id": 3, "text": "All customer embeddings must be stored within EU jurisdiction and deleted within 30 days of account closure.", "label": "Internal Policy"},
]

for doc in documents:
    embedding = mock_embedding(doc["text"])
    docs.upsert(doc["id"], vector=embedding, payload={"text": doc["text"], "label": doc["label"]})

print(f"Indexed {docs.count()} documents")

# Search by semantic similarity
query_vec = mock_embedding("How do I delete user data?")
results = docs.search(vector=query_vec, top_k=2)

for r in results:
    print(f" [id={r['id']}] {r['payload']['label']} - score={r['score']:.4f}")
    print(f"   {r['payload']['text'][:80]}...")
```
At no point did any data leave your machine. If a user exercises their right to erasure under GDPR Article 17, you have exactly one place to look. If an auditor asks you to document your data flows for EU AI Act compliance, the answer fits in one sentence: "It is in a local directory on our server."
Knowledge graph without a cloud dependency
Your AI agent needs to understand relationships, not just text similarity. VelesDB's GraphStore lets you model entity graphs in the same database:
```python
# Continuing from above
graph = docs.get_graph_store()

# Model organizational relationships (edge IDs are integers)
graph.add_edge(100, source=1, target=2, label="REPORTS_TO",
               properties={"since": "2024-01"})
graph.add_edge(101, source=2, target=3, label="MANAGES",
               properties={"department": "engineering"})
graph.add_edge(102, source=1, target=3, label="WORKS_IN",
               properties={"role": "senior_developer"})

# Traverse the graph from node 1
path = graph.traverse_bfs(source=1, max_depth=2)
print(f"Found {len(path)} connections from node 1")
for step in path:
    print(f" {step.source} --[{step.label}]--> {step.target} (depth={step.depth})")
```
Same database directory. Same sovereignty guarantees. No Neo4j Aura, no connection string pointing to a US-controlled service.
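Conceptually, a bounded traversal like this is plain breadth-first search over an edge list. A dependency-free sketch of the idea (not VelesDB internals, just what a BFS with a depth cap computes):

```python
from collections import deque

# Edge list mirroring the example above: (source, target, label)
edges = [
    (1, 2, "REPORTS_TO"),
    (2, 3, "MANAGES"),
    (1, 3, "WORKS_IN"),
]

def bfs(edges, source, max_depth):
    """Breadth-first traversal returning (source, target, label, depth) steps."""
    adjacency = {}
    for src, dst, label in edges:
        adjacency.setdefault(src, []).append((dst, label))
    steps, visited = [], {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth cap: do not expand beyond max_depth
        for dst, label in adjacency.get(node, []):
            steps.append((node, dst, label, depth + 1))
            if dst not in visited:
                visited.add(dst)
                queue.append((dst, depth + 1))
    return steps

for src, dst, label, depth in bfs(edges, source=1, max_depth=2):
    print(f"{src} --[{label}]--> {dst} (depth={depth})")
```

The value of having this built into the database is that the traversal runs next to the vector index, so "find similar documents, then walk their relationships" is one round trip instead of two systems.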
Agent memory that stays under your roof
When your AI agent builds memory about users, that data is the most sensitive in your stack. Preferences, conversation history, learned procedures: this is exactly the kind of intelligence data that FISA 702's expanded scope was designed to reach.
```python
import time

memory = db.agent_memory(384)

# Semantic memory: facts the agent knows
memory.semantic.store(
    1001,
    "User prefers dark mode and French language",
    mock_embedding("User prefers dark mode and French language")
)

# Episodic memory: events that happened
memory.episodic.record(
    2001,
    "User asked about GDPR compliance for their SaaS product",
    timestamp=int(time.time())
)

# Procedural memory: skills the agent has learned
memory.procedural.learn(
    3001,
    "Handle GDPR deletion request",
    steps=["Identify all user data stores", "Execute deletion in each store",
           "Generate compliance certificate", "Notify user within 72 hours"],
    confidence=0.9
)

# Query semantic memory
results = memory.semantic.query(
    mock_embedding("What does the user prefer?"),
    top_k=3
)
print(f"Semantic results: {len(results)}")

# Recall recent episodes
recent = memory.episodic.recent(limit=5)
print(f"Recent episodes: {len(recent)}")

# List learned procedures
procedures = memory.procedural.list_all()
print(f"Learned procedures: {len(procedures)}")
```
When a user exercises their right to be forgotten, you delete one directory. Not four services with four different deletion APIs, four different retention policies, and four different legal jurisdictions.
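Because the whole data layer is one directory, the erasure workflow can be an ordinary filesystem operation. A standard-library sketch - the per-user directory layout (`./users/<id>`) is an assumption for illustration, not a VelesDB convention:

```python
import shutil
from pathlib import Path

def erase_user_datastore(path: str) -> bool:
    """Delete a per-user database directory for a GDPR Art. 17 request.

    One directory holds vectors, graph, and agent memory, so one
    deletion covers the full data layer. Returns True if data existed.
    The ./users/<id> layout is a hypothetical example.
    """
    target = Path(path)
    if not target.is_dir():
        return False  # nothing stored for this user
    shutil.rmtree(target)
    return True

deleted = erase_user_datastore("./users/42")
print("Erased" if deleted else "Nothing to erase")
```

In production you would log the deletion for your compliance records, but the core operation really is this small: no deletion API, no retention-policy negotiation with a vendor.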
Querying with VelesQL
VelesDB includes VelesQL, a query language that extends SQL with semantic search (NEAR) and graph traversal (MATCH). You can query your collections with familiar SQL syntax:
```python
# Query all indexed documents using VelesQL
results = db.execute_query("SELECT id, payload FROM documents LIMIT 10")
for r in results:
    print(f" [id={r['id']}] {r['payload']['label']}")
```
VelesQL also supports NEAR for vector proximity search and MATCH for graph pattern matching, letting you combine structured queries, semantic search, and graph traversal in a single language. One query language. One database. One jurisdiction.
The bigger picture
The Cloud Act is not going away. FISA 702 is up for renewal. The PATRIOT Act's surveillance provisions remain active. On the EU side, the Data Privacy Framework might survive Schrems III, or it might not. The EU Data Act and AI Act are raising the bar for data governance, not lowering it.
I am not suggesting that every European company should stop using AWS tomorrow. That would be unrealistic and, for many use cases, unnecessary. But for the data layer that gives your AI agents their intelligence - the layer that stores user embeddings, conversation history, and business knowledge - there should be an option that does not require trusting a foreign legal framework that has been invalidated twice in ten years.
VelesDB is that option for developers who want it. A single engine written in Rust, source-available under Elastic License 2.0, that you can read, audit, and deploy on your own infrastructure. Today it runs as an embedded database, local-first by design. Tomorrow, on-premise deployments will extend the same sovereignty guarantees to team and enterprise workloads. The principle stays the same: your data, your servers, your jurisdiction.
It is not designed for multi-node clusters at web scale. It is designed for the use case where sovereignty matters more than horizontal scaling: edge devices, local AI agents, on-premise enterprise systems, regulated applications, and any system where the answer to "where is my data?" should be "right here, on infrastructure I control."
Diversity in infrastructure matters for the same reason biodiversity matters in nature: monocultures are fragile. Europe needs its own tools, not out of protectionism, but because resilience requires alternatives.
Try it in 10 minutes
You do not need to read a whitepaper or schedule a demo. Copy this into a terminal:
```shell
pip install velesdb
python3 -c "
import velesdb, numpy as np
db = velesdb.Database('./quickstart')
docs = db.get_or_create_collection('demo', dimension=384)
docs.upsert(1, vector=np.random.randn(384).tolist(), payload={'text': 'hello sovereign world'})
print('Stored:', docs.count(), 'document(s)')
print('Data location: ./quickstart (your machine, your jurisdiction)')
"
```
A few lines of Python. No Docker, no API key, no account. Your data is in ./quickstart, on your machine.
Want to go further? The quickstart guide on GitHub walks you through vectors, graphs, agent memory, and VelesQL in a single script.
Getting started
- VelesDB on GitHub - source-available under Elastic License 2.0
- Documentation and examples
A star on the repo helps other developers find the project. We are looking for partners and contributors building sovereign AI infrastructure in Europe - check velesdb.com for details.
Where does your AI agent store its memory today? If a FISA warrant reached your cloud provider tomorrow, would you even be notified? Drop a comment below.
Top comments (2)
"Very strong and timely post. The legal conflict between the Cloud Act / FISA 702 and GDPR/EU Data Act is real and often underestimated by developers.
I especially appreciate that you didn't just complain about the problem — you actually built something (VelesDB) as an architectural solution: local-first, embedded, no foreign provider in the chain. That’s the kind of pragmatic sovereignty approach Europe needs more of.
A few honest thoughts:
Hosting in “EU region” on AWS/Azure really is mostly a latency choice, not sovereignty. The jurisdiction follows the provider, not the data center.
For sensitive use cases (health, legal, financial, or high-risk AI under the EU AI Act), depending on US-controlled infrastructure does create a real compliance gap that contracts alone can’t fully close.
Quick questions for you:
How does VelesDB handle production-scale workloads today (e.g. concurrent queries, persistence strategy)?
Are you planning on-premise / self-hosted server mode soon for teams that need more than embedded use?
This is the kind of deep-tech project Europe should celebrate. Respect for building it from France.
Will check out the repo. Good luck with VelesDB!"
Thanks a lot @vuleolabs for your comment!
To answer your two questions:
Persistence and concurrency (open-core, available today):
VelesDB uses a Write-Ahead Log (WAL) for crash recovery, so writes are durable and atomic. The vector index is HNSW (Hierarchical Navigable Small World) for O(log n) approximate nearest neighbor search, with 5 distance metrics (cosine, euclidean, dot product, Hamming, Jaccard). The engine also ships with built-in BM25 full-text indexing and hybrid search combining vectors and keywords via Reciprocal Rank Fusion, so no separate search service is needed.
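(For anyone curious what Reciprocal Rank Fusion actually does, here is a generic sketch of the algorithm - not VelesDB's internal code. The constant k=60 comes from the original RRF paper by Cormack et al., 2009:)

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several ranked ID lists: score(d) = sum over lists of 1/(k + rank(d)).

    Documents ranked well by multiple retrievers accumulate the highest
    scores; k dampens the influence of top ranks in any single list.
    """
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Doc 3 appears high in both lists, so fusion ranks it first.
vector_hits = [1, 3, 5]  # IDs from vector search, best first
bm25_hits = [3, 2, 4]    # IDs from BM25 keyword search, best first
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # doc 3 comes out on top
```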
On benchmarks:
Sub-millisecond search at 10K vectors (384D), ~1.5 ms at 50K, and ~19,000 vectors/sec bulk insert. Full methodology and reproducible scripts are here: github.com/cyberlife-coder/velesdb-benchmarks
I keep working on performance, so these numbers may improve over time.
The concurrency model is single-process with file-level locking (one writer, concurrent reads). This is a deliberate design choice for the embedded/edge use case. For RAG pipelines, agent memory, or regulated applications running inside a single process, the throughput is more than sufficient.
On-premise server mode (enterprise edition, planned):
The open-core edition is and will remain embedded, local-first, source-available under Elastic License 2.0.
The enterprise edition will extend this with features designed for team and production deployments: RBAC (role-based access control), SSO, encryption at rest, audit logging, database snapshots, and GPU acceleration for large-scale vector workloads. (I am also considering bringing GPU acceleration to the open-core.)
On-prem server mode with multi-process access is part of that roadmap. Same engine, same sovereignty guarantees, but with the operational features that security teams and DPOs expect.
My approach is: ship a solid engine in the open-core first (WAL, HNSW, hybrid search, graph traversal, agent memory SDK, GPU, AVX-512 + AVX2+FMA + ARM NEON acceleration, all in ~3 MB), then build the enterprise layer on top. Not the other way around.
Thanks for checking out the repo. If you test the open-core version, I would love to hear what breaks first - that is always the most useful feedback. Also note that if you use VelesDB in a public project (commercial or not), I can share a link to it on velesdb.com.