Over the course of the past month, I’ve built the first version of FelloWage—a website that allows users to share their salary information and view information shared by others.
Of course, I wanted to keep my users’ information private and anonymous: when somebody looks at a wage entry, they shouldn’t be able to tell whom it belongs to.
On the other hand, to keep the quality of the shared data high, users should be able to update their salary entries as their pay changes over time. Thus the wage records need to be connected to the respective user accounts somehow.
This poses an issue: if I implement it in the most obvious way (a database foreign key relation), then the website operator (me) effectively has access to the connection between a user account and a salary record. Legal authorities arriving with a court order would be able to see it too, and in the unfortunate case of a successful cyber attack, the attackers would get their hands on this data as well.
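To make the problem concrete, here is a minimal sketch of the “obvious” design using Python’s built-in sqlite3 module. The tables, columns, and sample row are hypothetical and are not FelloWage’s actual schema. With a plain foreign key, a single JOIN is enough for anyone with database access to put a name next to every salary.

```python
import sqlite3

# Hypothetical schema for the "obvious" design: each wage entry points
# straight at the owning account through a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE wage_entries (
        id          INTEGER PRIMARY KEY,
        job_title   TEXT NOT NULL,
        yearly_wage INTEGER NOT NULL,
        user_id     INTEGER NOT NULL REFERENCES users(id)
    );
    """
)
conn.execute("INSERT INTO users (id, email) VALUES (1, 'alice@example.com')")
conn.execute(
    "INSERT INTO wage_entries (job_title, yearly_wage, user_id) "
    "VALUES ('Backend Developer', 85000, 1)"
)

# Anyone who can run SQL (operator, court order, attacker) can link
# every salary back to an identity with one JOIN.
rows = conn.execute(
    """
    SELECT users.email, wage_entries.job_title, wage_entries.yearly_wage
    FROM wage_entries JOIN users ON users.id = wage_entries.user_id
    """
).fetchall()
for email, title, wage in rows:
    print(email, title, wage)
```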
This is not good enough! How can we do better?
We are looking for a solution where:
- the wage entry is readable by all users of the system,
- the connection between wage entry and the user account is readable ONLY by the user account owning that entry,
- user accounts are readable to the system (at least partially) for the purposes of the login system.
In my scenario, since sign-up verification is a manual process, I also needed the system to be able to write this connection without the user being present.
If we sum it up: only the user should be able to read a connection record, and the system should be able to write this record (but not read it).
Asymmetric encryption
This sounds to me like asymmetric encryption, where the system knows the user’s public key, and the user knows their own private key. The system uses the public key to encrypt the information when it needs to write it, and the user can read that information using their private key.
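Here is a minimal sketch of that idea in Python. It assumes the third-party `cryptography` package and RSA-OAEP with SHA-256; the post doesn’t say which algorithm FelloWage uses. The entry identifier is a made-up placeholder. The system produces the connection record using only the public key, and only the holder of the private key can read it back.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# RSA-OAEP with SHA-256: an illustrative choice, not necessarily FelloWage's.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Generated once per user account.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# System side: writing the connection record needs only the public key,
# e.g. when an admin approves a sign-up while the user is offline.
entry_id = b"wage-entry-42"  # hypothetical wage entry identifier
connection_record = public_key.encrypt(entry_id, OAEP)

# User side: only the private key can read the record back.
recovered = private_key.decrypt(connection_record, OAEP)
print(recovered)
```

OAEP padding is randomized, so encrypting the same identifier twice yields different ciphertexts. That matters here. Someone who knows a user’s public key cannot simply re-encrypt each candidate entry ID and compare the result against the stored record.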
Of course, the next challenge is UX. We can’t make users handle private keys every time they want to log in.
That’d be too clunky.
Passphrase-encrypted private key
Then it struck me: what if I do the same thing SSH keys do when you set them up with a passphrase?
Now the system will store both types of keys:
- the public key, and
- the private key encrypted using the user’s password.
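Here is one way to sketch the passphrase-protected key in Python, again assuming the `cryptography` package. `create_user_keys` and `unlock_private_key` are hypothetical helper names. `BestAvailableEncryption` lets the library pick the password-based encryption scheme, and the library documents that this choice may change over time. Conceptually, this is what `ssh-keygen` does when you set a passphrase.

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa


def create_user_keys(password: str) -> tuple[bytes, bytes]:
    """Return (public_pem, encrypted_private_pem) to store in the users table."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_pem = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    # Serialize the private key encrypted under the user's password,
    # similar in spirit to an SSH key with a passphrase.
    encrypted_private_pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.BestAvailableEncryption(password.encode()),
    )
    return public_pem, encrypted_private_pem


def unlock_private_key(encrypted_private_pem: bytes, password: str):
    """Decrypt the stored private key; raises ValueError on a wrong password."""
    return serialization.load_pem_private_key(
        encrypted_private_pem, password=password.encode()
    )


public_pem, encrypted_private_pem = create_user_keys("correct horse battery staple")
print(encrypted_private_pem.splitlines()[0])
```

Only `public_pem` and `encrypted_private_pem` would be persisted. The plaintext private key exists only in memory, and only while the user’s password is available.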
Now, even if I were to drop into the raw SQL in my database console, I wouldn’t be able to tell who owns which wage entries anymore! And the user can still see and manage their own entry.
As a bonus, I’ve found this method of securing user data quite convenient, so I’ve used it for other information as well, wherever I’m sure that only the user will ever need access to it.
Of course, we still need to make sure that users create strong passwords that aren’t vulnerable to dictionary attacks and haven’t appeared in any breach. I’ve used Dropbox’s zxcvbn library and the “Have I Been Pwned” API for that.
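For the breach check, the Have I Been Pwned Pwned Passwords range API uses k-anonymity. The server hashes the password with SHA-1 and sends only the first five hex characters of the hash. It then looks for the remaining 35 characters in the returned `SUFFIX:COUNT` lines. Below is a stdlib-only Python sketch of that check. zxcvbn is not shown, and the helper names are hypothetical.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password: str) -> tuple[str, str]:
    """Uppercase SHA-1 hex digest split into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password: str, range_body: str) -> int:
    """Find our suffix in a range response ("SUFFIX:COUNT" per line); 0 if absent."""
    _, suffix = split_sha1(password)
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate == suffix:
            return int(count)
    return 0


def fetch_range(prefix: str) -> str:
    """Only the 5-char prefix ever leaves the server (k-anonymity)."""
    with urllib.request.urlopen(RANGE_URL + prefix) as resp:
        return resp.read().decode("utf-8")
```

`fetch_range` needs network access. A caller would combine the helpers as `prefix, _ = split_sha1(pw)` followed by `breach_count(pw, fetch_range(prefix))`, rejecting the password when the count is above zero.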
Thank you for reading!
I’m glad you’ve gotten to the end of this post! If you are interested in more behind-the-scenes posts like this, you should subscribe to our newsletter.
The next post that is in the making is a deeper dive into the implementation details and challenges of the solution from this article. Don’t miss it, grab our newsletter here! 🚀
Thank you for your support!
Top comments (11)
This is great. Seems a lot of people, including me, are thinking about the same problem.
Maybe it's my lack of understanding, but I'd like to ask: where does decryption happen? Since you keep the private key passphrase-encrypted, I would assume it happens on the server side. Does that mean that when the user needs to access the data, you take the password from the request (assuming you don't keep the user's password in plain text), decrypt the data, and send it back? What I'm trying to understand is: can the system owner alter the system to gain access to the user's data?
What I was thinking about is keeping the passphrase-encrypted private key within the client app (and emailing it as a backup), and giving the user the ability and instructions to transfer the private key between clients/browsers. So, essentially, the server keeps only encrypted data and the client handles decryption. This still doesn't guarantee 100% privacy, though: as a service owner, I can alter the client to send whatever data I want to the server.
Anyway thanks for the post, it gave me a different perspective to think about.
Yes, it does happen on the server, and then the result of decryption (private data) is stored in the encrypted session that is available only when the client communicates with the backend.
Of course, as the owner of the system, I can always modify it to eventually get access to anything (when the user finally logs in).
By doing that, though, I would be violating my own terms of service and privacy policy, and I would be liable for it. I don’t want that kind of liability on my shoulders.
Client-side encryption is awesome, though! This is especially true if you are dealing with power users or a niche where folks will push through the initial UX struggles of using an authentication key like this. In fact, certain banking systems only let you log in with a separate private key file stored on a special crypto device that you have to connect to your computer.
And for banks or other crucial systems, and especially in B2B and for enterprises, that might work.
For B2C software, unfortunately, that’s going to be a huge barrier to using your product. This is especially true if the product, like FelloWage, needs lots of user sign-ups and shared data before it becomes valuable to other users.
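The server-side flow described at the start of this reply could be sketched roughly like this, assuming the `cryptography` package and RSA-OAEP. `on_login`, the `user_row` fields, and the plain `session` dict are hypothetical stand-ins. The real session is encrypted, and that part is not shown. The password is used once to unlock the private key, the connection record is decrypted, and only the result lives in the session.

```python
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def on_login(user_row: dict, password: str, session: dict) -> None:
    """Unlock the private key with the submitted password and cache the
    decrypted link in the session; nothing decrypted is written to the DB."""
    private_key = serialization.load_pem_private_key(
        user_row["encrypted_private_pem"], password=password.encode()
    )
    entry_id = private_key.decrypt(user_row["encrypted_entry_link"], OAEP)
    session["wage_entry_id"] = entry_id.decode()


# Hypothetical stored row, as sign-up plus admin approval might leave it.
password = "correct horse battery staple"
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
user_row = {
    "encrypted_private_pem": key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.PKCS8,
        serialization.BestAvailableEncryption(password.encode()),
    ),
    # Written by the system using only the public key.
    "encrypted_entry_link": key.public_key().encrypt(b"wage-entry-42", OAEP),
}
del key  # the server keeps only the encrypted forms

session = {}
on_login(user_row, password, session)
print(session)
```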
How do you manage (if you do) the 'lost password' process? Is the user allowed to change their password without providing the old one first? Wouldn't that break the 'link' between the encrypted record and the actual record?
I store the same encrypted data in 2 forms:
When users create their password, I create two asymmetric key pairs: a password-based one and a recovery-file-based one (the recovery file essentially contains a super-strong generated password).
As soon as this file is generated, it is emailed to the user. It isn’t stored anywhere else.
Now, when the user wants to use the “Forgot Password” feature, they’ll have to upload the recovery file.
The system then decrypts the data using the recovery-code-based private key, re-encrypts it using the user’s new password, and issues a new recovery code for the user.
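Here is one reading of this recovery scheme, sketched in Python with the `cryptography` package. It uses two independent key pairs: one whose private key is protected by the password, and one protected by a generated recovery code. The same data is encrypted under each. The function names and record layout are hypothetical. `secrets.token_urlsafe(32)` is just one way to generate a strong recovery secret.

```python
import secrets

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def new_wrapped_keypair(secret: str) -> dict:
    """Generate a key pair; keep the public key and the secret-encrypted private key."""
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    return {
        "public": key.public_key(),
        "wrapped_private": key.private_bytes(
            serialization.Encoding.PEM,
            serialization.PrivateFormat.PKCS8,
            serialization.BestAvailableEncryption(secret.encode()),
        ),
    }


def protect(data: bytes, password: str) -> tuple[dict, str]:
    """Store `data` twice, once per key pair. Returns (record, recovery code).
    (Public keys would also be stored so the system can write later; omitted.)"""
    recovery_code = secrets.token_urlsafe(32)  # emailed to the user, never stored
    by_password = new_wrapped_keypair(password)
    by_recovery = new_wrapped_keypair(recovery_code)
    record = {
        "password_key": by_password["wrapped_private"],
        "password_copy": by_password["public"].encrypt(data, OAEP),
        "recovery_key": by_recovery["wrapped_private"],
        "recovery_copy": by_recovery["public"].encrypt(data, OAEP),
    }
    return record, recovery_code


def recover(record: dict, recovery_code: str, new_password: str) -> tuple[dict, str]:
    """Forgot-password flow: decrypt via the recovery copy, then re-protect."""
    key = serialization.load_pem_private_key(
        record["recovery_key"], password=recovery_code.encode()
    )
    data = key.decrypt(record["recovery_copy"], OAEP)
    return protect(data, new_password)  # also issues a fresh recovery code


record, code = protect(b"wage-entry-42", "old password")
record, new_code = recover(record, code, "new password")
```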
This is actually a great concept. I was wondering how to protect my users' data. <3
Be careful with this technique, though, as it is a double-edged sword. This type of encryption is very CPU-intensive, which is part of what makes it so secure.
In my stress-testing setup (3-5 most expensive Heroku dynos), I’ve seen registration and login take up to 30 seconds when the system is loaded with 300-500 simultaneous users trying to signup or login. And that’s with a quite fast statically compiled language.
(unfortunately or fortunately, we didn’t get close to these numbers during the launch on PH 🙈)
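The post doesn’t say which key-derivation function protects the private keys. Assuming scrypt (exposed as `hashlib.scrypt` when Python is built against a recent OpenSSL), this sketch shows how the cost parameter `n` drives the work done on every login. scrypt needs about `128 * r * n` bytes of memory, so `n=2**14, r=8` takes roughly 16 MiB per call. Multiply that by hundreds of concurrent logins, add RSA key generation on sign-up, and slowdowns like the ones described above are unsurprising.

```python
import hashlib
import os
import time


def derive_key(password: str, salt: bytes, n: int) -> bytes:
    """scrypt with cost parameter n (a power of two); r=8, p=1 are common defaults."""
    return hashlib.scrypt(
        password.encode(), salt=salt, n=n, r=8, p=1, maxmem=2**26, dklen=32
    )


salt = os.urandom(16)
for n in (2**12, 2**14):
    start = time.perf_counter()
    derive_key("correct horse battery staple", salt, n)
    elapsed = time.perf_counter() - start
    # Time (and memory) grow roughly linearly with n.
    print(f"n=2**{n.bit_length() - 1}: {elapsed * 1000:.1f} ms")
```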
Okay.. :D good to know :D
An interesting read. Loved the concepts
I actually have already visited the site and I wondered how you were securing everything. This clears things up!
Nice!
I wonder how this works with data laws.
If a user requests data deletion, is it okay to just remove the user data and then keep the now unrelatable data, or would we need to delete that too?
Users can delete their wage entry in the application UI, if they so choose, while they are logged in. Remember, the entries are still connected to the accounts; it’s just that only the user can see and manage that connection.
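Here is a minimal sketch of that deletion path with sqlite3. It assumes the entry’s identifier was already decrypted into the session at login. The schema and `delete_my_entry` are hypothetical. Note that `wage_entries` has no `user_id` column, so the only link is the encrypted one, which is usable only during the user’s session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id                   INTEGER PRIMARY KEY,
        encrypted_entry_link BLOB  -- readable only with the user's private key
    );
    CREATE TABLE wage_entries (
        id          TEXT PRIMARY KEY,
        yearly_wage INTEGER NOT NULL  -- deliberately no user_id column
    );
    INSERT INTO users VALUES (1, x'00');  -- placeholder ciphertext
    INSERT INTO wage_entries VALUES ('wage-entry-42', 85000);
    """
)


def delete_my_entry(conn: sqlite3.Connection, user_id: int, session: dict) -> None:
    """Delete the entry whose id was decrypted into the session at login."""
    entry_id = session.pop("wage_entry_id")
    with conn:  # both statements commit together or not at all
        conn.execute("DELETE FROM wage_entries WHERE id = ?", (entry_id,))
        conn.execute(
            "UPDATE users SET encrypted_entry_link = NULL WHERE id = ?", (user_id,)
        )


session = {"wage_entry_id": "wage-entry-42"}  # set during login
delete_my_entry(conn, 1, session)
print(conn.execute("SELECT COUNT(*) FROM wage_entries").fetchone())
```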