Many self-hosted file-storage applications treat security only superficially: they provide a REST API but do not address data protection. Let’s build a file service for local use that is designed around encryption. Everything (the database, directory names, file names and contents, and metadata) will be stored exclusively in encrypted form. A REST API with authentication and role-based access control will provide access to the data. In essence, the application will be a REST wrapper around an encryption layer with some additional capabilities. Any client application can interact with the service through the public API.
Choosing the stack
The application will be deployed with a single command as a separate Docker container and will be ready to use out of the box. Under the hood, we use:
FastAPI + Pydantic — a fast asynchronous Python framework. Responsible for the REST API and routes, validation of incoming data, and the overall operation of the application.
SQLAlchemy + SQLite — storage for all application entities: users, collections, documents, metadata.
Redis — used as a cache to speed up access to object data (and to reduce the load on the database).
LRU cache — an in-memory least-recently-used cache for file contents (it reduces the number of file read operations and the resulting disk load).
gocryptfs — the heart of the application. It will provide full encryption of the directory with files. All data, including the database and files, will be stored exclusively inside the gocryptfs cipher directory (i.e., in the normal state everything is encrypted). To allow the application to access the original data, a secret key must be available.
Docker — for packaging the application and delivering it to any environment.
Application schema
We will use Pydantic for validating incoming data; it is included in FastAPI by default. Additionally, we will add our own registration and authentication.
We will build the application core following the microkernel pattern: each route handler triggers a corresponding hook, and connected add-ons can subscribe to that hook to add extra logic.
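The hook mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `HookRegistry`, the hook name `after_document_upload`, and the `audit_upload` add-on are all hypothetical names chosen for the example.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[Any]]


class HookRegistry:
    """Maps hook names to the add-on handlers subscribed to them."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator an add-on uses to subscribe to a hook."""
        def decorator(func: Handler) -> Handler:
            self._handlers[name].append(func)
            return func
        return decorator

    async def call(self, name: str, *args: Any, **kwargs: Any) -> None:
        """Run every handler registered for the hook, in registration order."""
        for handler in self._handlers.get(name, []):
            await handler(*args, **kwargs)


hooks = HookRegistry()
audit_log: list[str] = []


# Hypothetical add-on: records every upload without touching the core.
@hooks.on("after_document_upload")
async def audit_upload(filename: str) -> None:
    audit_log.append(f"uploaded: {filename}")


async def upload_document(filename: str) -> str:
    """Stand-in for a route handler: core logic first, then the hook."""
    # ... core upload logic would go here ...
    await hooks.call("after_document_upload", filename)
    return filename


asyncio.run(upload_document("report.pdf"))
```

The core only knows the hook name; which add-ons react to it, if any, is decided at registration time.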
To work with data, we will use a common repository through which any read/write operations will be performed. Internally, it will use entity managers and a cache manager. That is, from anywhere in the application, data access will be performed only through the repository, which hides the database and cache mechanics.
Additionally, we will add an encryption manager and a file manager for working with raw data.
At the same time, for the application to function properly, the secret key must remain continuously available.
External Docker Container
┌────────────────────┐ ┌───────────────────────────────────────┐
│ Public API │───────│ Routing │
└────────────────────┘ └───────────────────────────────────────┘
│
┌───────────────────────────────────────┐
│ Business Logic │
└───────────────────────────────────────┘
│
┌────────────────────────┬──────────────┐
│ Core │ Addons │
└────────────────────────┴──────────────┘
Volumes │
┌────────────────────┐ ┌───────────────────────────────────────┐
│ Secret Key │- - - -│ Data Access │
└────────────────────┘ └───────────────────────────────────────┘
│ │
┌────────────────────┐ ┌───────────────────────┐ ┌─────────────┐
│ Encrypted Data │- - - -│ Protected Storage │ │ Cache │
└────────────────────┘ └───────────────────────┘ └─────────────┘
The container will expose the following volumes:
hidden-secrets — a volume with the application’s secret key (gocryptfs passphrase).
hidden-data — a volume with the application’s data (exclusively in encrypted form). It can be used for data migration, backups, or emergency recovery.
hidden-logs — a volume with logs (mainly used for development; in normal mode it can be completely disabled).
Why gocryptfs
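gocryptfs is a fast user-space encrypted filesystem built on FUSE. Unlike LUKS, which encrypts whole block devices, it encrypts individual directories. Unlike eCryptfs, which also works per directory but lives in the kernel, it runs entirely in user space and can typically be mounted by an unprivileged user wherever FUSE is available.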
Our application uses gocryptfs as an “internal disk”: each file, when written, ends up in the encrypted directory (cipher), file and directory names are encrypted, and reading is only possible after mounting with a key. Note that gocryptfs does not hide the shape of the directory tree or approximate file sizes.
Even if the host is compromised, an attacker who does not also obtain the secret key will only get a set of ciphertexts.
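To make the mount lifecycle concrete, here is a minimal sketch of how an application might drive gocryptfs from Python. It relies only on the documented command forms `gocryptfs -init CIPHERDIR`, `gocryptfs CIPHERDIR MOUNTPOINT`, the `-passfile` option, and `fusermount -u`. The helper names and the example paths are assumptions made for illustration; the project may organize this differently.

```python
import subprocess
from pathlib import Path


def init_command(cipher_dir: Path, passfile: Path) -> list[str]:
    """Command that initializes a new encrypted directory (run once)."""
    return ["gocryptfs", "-init", "-passfile", str(passfile), str(cipher_dir)]


def mount_command(cipher_dir: Path, mountpoint: Path, passfile: Path) -> list[str]:
    """Command that mounts the decrypted view of cipher_dir at mountpoint."""
    return ["gocryptfs", "-passfile", str(passfile), str(cipher_dir), str(mountpoint)]


def unmount_command(mountpoint: Path) -> list[str]:
    """Command that removes the decrypted view again."""
    return ["fusermount", "-u", str(mountpoint)]


def run(command: list[str]) -> None:
    """Execute a command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)


# Hypothetical usage (paths are examples only, not the project's layout):
# run(mount_command(Path("/hidden-data"), Path("/mnt/plain"), Path("/hidden-secrets/secret.key")))
# run(unmount_command(Path("/mnt/plain")))
```

Building the argument list separately from running it keeps the commands easy to log and to test without a real mount.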
Application layers
Properly separating the application into layers will greatly help with long-term maintenance and save you from endless technical debt:
┌───────────────────────┐
│ Business logic layer │
└───────────────────────┘
│
┌───────────────────────┐
│ Data access layer │
└───────────────────────┘
│
┌───────────────────────┐
│ Model layer │
└───────────────────────┘
│
┌───────────────────────┐
│ Storage layer │
└───────────────────────┘
Business logic layer. Since the client will be a separate application, the first layer on the backend side is the API. This is the coordination level between the client and internal services. Here we will have routers, validation, authorization, business rules, and application logic. At this level, we will also implement hooks for key operations, which will later allow us to extend functionality without touching the core code.
Data access layer. At this level we will implement a universal repository that provides a single interface for working with all entities. It encapsulates interaction with the database and the cache. The repository will be completely isolated from the business logic; its task is to provide a set of operations for working with entities of any class. The same repository will be reused for all entities.
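A rough sketch of such a repository is shown below. To stay self-contained it uses a small in-memory class as a stand-in for both the SQLAlchemy-backed entity manager and the Redis-backed cache manager; `InMemoryStore`, `Repository`, `Document`, and the method names are assumptions for illustration, not the project's API.

```python
import asyncio
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class InMemoryStore(Generic[T]):
    """Stand-in for an entity manager (database) or a cache manager."""

    def __init__(self) -> None:
        self.items: dict[int, T] = {}
        self.reads = 0  # counts lookups, to show when the cache is used

    async def get(self, key: int) -> Optional[T]:
        self.reads += 1
        return self.items.get(key)

    async def set(self, key: int, value: T) -> None:
        self.items[key] = value

    async def delete(self, key: int) -> None:
        self.items.pop(key, None)


class Repository(Generic[T]):
    """Single entry point for entity access; hides database and cache."""

    def __init__(self, database: InMemoryStore[T], cache: InMemoryStore[T]) -> None:
        self.database = database
        self.cache = cache

    async def select(self, entity_id: int) -> Optional[T]:
        cached = await self.cache.get(entity_id)
        if cached is not None:
            return cached
        entity = await self.database.get(entity_id)
        if entity is not None:
            await self.cache.set(entity_id, entity)  # warm the cache
        return entity

    async def insert(self, entity_id: int, entity: T) -> None:
        await self.database.set(entity_id, entity)
        await self.cache.set(entity_id, entity)

    async def delete(self, entity_id: int) -> None:
        await self.database.delete(entity_id)
        await self.cache.delete(entity_id)


@dataclass
class Document:
    id: int
    filename: str


async def demo() -> int:
    database: InMemoryStore[Document] = InMemoryStore()
    cache: InMemoryStore[Document] = InMemoryStore()
    database.items[1] = Document(id=1, filename="report.pdf")
    repository = Repository(database, cache)
    await repository.select(1)  # cache miss, reads the database
    await repository.select(1)  # cache hit, database untouched
    return database.reads


database_reads = asyncio.run(demo())
```

Because the same class works for any entity type, business logic never needs to know whether a value came from the cache or from the database.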
At the same level, there will be a file manager responsible for asynchronous interaction with the filesystem. It will provide an interface for file operations: reading, writing, deletion. It operates independently of the business logic and fully abstracts interaction with the physical storage.
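One way such a file manager could look, combined with an LRU cache for file contents, is sketched below. It assumes Python 3.9+ (for `asyncio.to_thread`) and moves blocking file I/O to a worker thread; the class and method names are hypothetical, and the real project may use a different async file library or cache policy.

```python
import asyncio
from collections import OrderedDict
from pathlib import Path
from typing import Optional


class LRUCache:
    """Keeps at most max_items entries, evicting the least recently used."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self._items: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the oldest entry

    def discard(self, key: str) -> None:
        self._items.pop(key, None)


class FileManager:
    """Async file operations with a read cache in front of the disk."""

    def __init__(self, cache: LRUCache) -> None:
        self.cache = cache

    async def read(self, path: Path) -> bytes:
        key = str(path)
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        data = await asyncio.to_thread(path.read_bytes)
        self.cache.put(key, data)
        return data

    async def write(self, path: Path, data: bytes) -> None:
        await asyncio.to_thread(path.write_bytes, data)
        self.cache.discard(str(path))  # invalidate the stale entry

    async def delete(self, path: Path) -> None:
        await asyncio.to_thread(path.unlink)
        self.cache.discard(str(path))
```

Writes and deletes invalidate the cached entry rather than updating it, which keeps the cache logic simple at the cost of one extra disk read after each change.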
Model layer. Here, using SQLAlchemy, we define the database models. Each model will describe a specific entity and its mapping as a table, including relationships with other tables. The models will not contain business logic and will be intended solely for describing the data schema. They will serve as the basis for working with the repository, providing typing and integrity at the storage level.
Storage layer. At this level we will have the actual storage mechanisms: SQLite for structured data storage; the filesystem for file storage.
Main features
Full encryption — everything, including the database, directory names, file names, and metadata, is stored encrypted and is not directly accessible. Access to the original data is only possible through the application’s REST API.
Secret key (gocryptfs passphrase) — generated randomly at the first application start and written to a dedicated volume. An internal watchdog checks the presence and correctness of the secret key every few seconds and, upon any mismatch, immediately unmounts the gocryptfs mount, cutting off access to the decrypted view. In addition, before executing any request, the application also checks the key, and in the event of any mismatch, execution of all operations stops immediately and the cache is cleared. The key can be hot-removed without restarting the application.
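The watchdog idea can be sketched as a small asyncio task. This is an illustration under assumptions: the article does not say how "correctness" of the key is verified, so the sketch compares a SHA-256 fingerprint of the key file against a value remembered at startup. The function names, the `lockdown` callback (which would unmount gocryptfs and clear the cache), and the polling interval are all hypothetical.

```python
import asyncio
import hashlib
import hmac
from pathlib import Path
from typing import Awaitable, Callable, Optional


def key_fingerprint(key_path: Path) -> Optional[str]:
    """SHA-256 of the key file, or None if it is missing or unreadable."""
    try:
        return hashlib.sha256(key_path.read_bytes()).hexdigest()
    except OSError:
        return None


def key_is_valid(key_path: Path, expected: str) -> bool:
    """True only if the key file exists and matches the expected fingerprint."""
    actual = key_fingerprint(key_path)
    return actual is not None and hmac.compare_digest(actual, expected)


async def watchdog(
    key_path: Path,
    expected: str,
    lockdown: Callable[[], Awaitable[None]],
    interval: float = 2.0,  # assumed value for "every few seconds"
) -> None:
    """Poll the key; on the first mismatch run lockdown() and stop."""
    while key_is_valid(key_path, expected):
        await asyncio.sleep(interval)
    await lockdown()
```

The same `key_is_valid` check could also run before each request, which matches the per-request verification described above.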
REST API — all file operations (upload, move, rename, delete) occur over HTTP.
RBAC and MFA — a role-based access model and two-factor authentication (via TOTP) are added for working with the application. Four permission levels are provided: reader, author, editor, and admin.
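A role check of this kind might be sketched as follows. The sketch assumes the four roles form an ordered hierarchy (reader < author < editor < admin), which the article implies by calling them levels but does not state outright. The action names and the action-to-role mapping are invented for the example.

```python
from enum import IntEnum


class Role(IntEnum):
    """Assumed ordering: each level includes the rights of the ones below."""
    READER = 1
    AUTHOR = 2
    EDITOR = 3
    ADMIN = 4


# Hypothetical mapping from an action to the minimum role it requires.
REQUIRED_ROLE = {
    "document:read": Role.READER,
    "document:upload": Role.AUTHOR,
    "document:rename": Role.EDITOR,
    "user:manage": Role.ADMIN,
}


class PermissionDenied(Exception):
    """Raised when the user's role is below the required level."""


def require(user_role: Role, action: str) -> None:
    """Raise PermissionDenied unless user_role is sufficient for action."""
    required = REQUIRED_ROLE[action]
    if user_role < required:
        raise PermissionDenied(
            f"{action} requires at least the {required.name.lower()} role"
        )
```

In a FastAPI application a check like this would typically live in a dependency that runs before the route handler, so that individual routes only declare the level they need.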
Modularity — the system is implemented according to the microkernel principle; behavior can be extended via add-ons and hooks without changing the main application code.
Threat model
The application is designed with the scenario of full host compromise in mind (including theft of the device itself). If an attacker gains access to the filesystem or the database, they still won’t be able to read the contents — without the secret key, decryption is impossible (it is assumed that the secret key is stored outside the application).
What’s next
Next, we’ll look at the code of individual parts of the application and discuss the chosen solutions.
Project repository — github.com/artabramov/hidden