Every year tens of thousands of respondents trust the State of JavaScript and State of CSS surveys with their data, some of it quite personal and sensitive, and I'm fully conscious of the responsibility this represents.
So ever since starting to run the surveys, I've hoped that I would never have to write the dreaded "data leak" post. But sadly today is the day I need to address this issue.
TL;DR
An encryption key that makes it possible to decrypt publicly-available encrypted email addresses and link them to survey responses was mistakenly committed to a public GitHub repo.
Key Points
- This is a human error, not a malicious attack.
- The leak is now closed.
- You are affected if you answered the State of JS or State of CSS surveys in 2020 or earlier (the 2021 JS and CSS surveys are not affected).
- So far there is no evidence that the mistake was actually exploited, but I'll keep monitoring the situation.
- Passwords were not affected as they use a completely separate hashing mechanism.
What Happened
This situation resulted from three separate mistakes:
- I decided two years ago to add email hashes (or so I thought) to the publicly available survey response datasets (for surveys up until 2020; the 2021 datasets had not been published yet), in order to use them as IDs and make it possible to track how a given respondent's answers evolved over time.
- An open-source contributor wrote the function that generates those "hashes" and used a two-way encryption function. Somehow, over time, I came to assume that it was a one-way hashing function instead.
- About a month ago, another open-source contributor committed private credentials (which included the encryption function's key) to a public repo while working on a separate project. Although the contributor noticed the issue and scrubbed the history right away, the faulty commit apparently stayed accessible on its own as a "ghost commit" outside of any branch.
Both because of the holidays and because I didn't realize the consequences of the leak right away, the encryption key stayed accessible, at least in theory, for about a month.
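To make the difference between the two kinds of function concrete, here is a minimal sketch using Node's built-in crypto module. It is not the project's actual function: the key, the email address, the helper names, and the choice of AES-256-GCM and SHA-256 are all assumptions made for illustration. It only shows that encrypted data can be turned back into the original by anyone holding the key, while a hash has no matching "undo" operation.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Hypothetical key and email, for illustration only.
const key = randomBytes(32); // 256-bit key, as required by AES-256-GCM
const email = "respondent@example.com";

type Sealed = { iv: Buffer; data: Buffer; tag: Buffer };

// Two-way: anyone who holds the key can recover the plaintext.
function encrypt(plaintext: string, secretKey: Buffer): Sealed {
  const iv = randomBytes(12); // fresh nonce for every message
  const cipher = createCipheriv("aes-256-gcm", secretKey, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function decrypt(sealed: Sealed, secretKey: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", secretKey, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]).toString("utf8");
}

// One-way: there is no key and no function that maps the digest back to the email.
function hashEmail(address: string): string {
  return createHash("sha256").update(address, "utf8").digest("hex");
}

const sealed = encrypt(email, key);
console.log(decrypt(sealed, key)); // prints the original email
console.log(hashEmail(email)); // prints a 64-character hex digest
```

This is why a leaked key matters so much for the encrypted values: once the key is known, `decrypt` works for every entry in the dataset.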
What This Means For You
The risks to survey respondents are two-fold:
- Someone could use the dataset to generate an email list used for spamming purposes.
- Someone could link personal data (salary, etc.) to the email address you used.
Was the Leak Exploited?
The "good" news is that the repo the key was committed to is very low traffic and had no forks, watchers, or stars, making it less likely that ill-intentioned people randomly stumbled on the encryption key.
Moreover, even with the key in hand an attacker would've had to then figure out where the key was being used (which happens in a separate repo); what it was being used for; and where the relevant encrypted emails were made available; none of which is obvious unless one is already familiar with the project.
So while I don't have any way to tell with certainty if anybody actually went through the process of decrypting the encrypted emails and correlating responses with them, I personally think the probability of this happening is fairly low. But I apologize for not being able to give you more certainty.
Steps Taken
I've taken the following steps:
- Stopped using the leaked encryption key.
- Made the repo private so that the encryption key is no longer accessible.
- Took down the public datasets containing the encrypted emails until I can re-upload versions without them.
Note: if you happen to have a copy of the datasets or are hosting a mirror, please get in touch or delete your copies if you can!
In the future, I will also focus on making it possible to complete the survey without having to provide an email, which is something that survey respondents have often asked for.
Ironically enough, the leak happened in the process of migrating the survey app to a newer, more robust codebase in order to make it easier to change the way accounts work.
Going Forward
The surveys are an open-source project, created in the open by a mostly-volunteer group of contributors from around the world. And while this can sometimes make it tougher to properly coordinate and avoid situations like this one, I also think being community-driven is one of the project's major strengths.
So while it's totally understandable if a leak like this one makes you question sharing any data with us in the future, I hope you'll be able to give the project another chance.
And if you're not fully comfortable sharing personal information just yet, here's a reminder that you can always skip any question in any survey. Another option that might put you more at ease is to use an email alias that can't easily be tied back to you.
I deeply apologize again, and if you have any questions about this whole thing, just leave a comment here and I'll do my best to answer.
Note: I am very grateful to Troy Hunt for pointing me to this great article about the proper way to handle such matters. I recommend it if you ever end up in the same situation!
Oldest comments (34)
Twitter thread if you have any additional questions: twitter.com/ericbureltech/status/1...
Thank you for being so transparent and honest about this! Everyone makes mistakes and I appreciate the effort that goes into these surveys every year.
Thanks for your kind words!
Honest mistake, commendable recovery. (Who's ever gonna misuse that data anyway? Let's hope only people who still use too many float:left's and too many !important's get spammed with beginner CSS tutorials! Sorry, stupid joke.)
This is a good reminder not to store encryption keys in a repo. Ideally use something like HashiCorp Vault, but at least don't store them in files within the repo.
Hosting platforms like Netlify, Azure, etc. let you provide secrets via their UI, and those secrets can then be accessed from code through the process environment (process.env in Node).
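As a rough sketch of that approach (the helper and the variable name `ENCRYPTION_KEY` are hypothetical, not taken from the survey codebase), the code reads the secret from the environment at startup and fails loudly if it is missing, so the value never needs to live in a committed file:

```typescript
// Reads a required secret from the environment and throws if it is absent or empty.
// The second parameter defaults to process.env and exists only to make testing easy.
function requireSecret(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required secret: ${name}`);
  }
  return value;
}

// Example with an explicit environment object; in an app you would call
// requireSecret("ENCRYPTION_KEY") and let the hosting platform supply the value.
const demoKey = requireSecret("ENCRYPTION_KEY", { ENCRYPTION_KEY: "not-a-real-key" });
console.log(demoKey.length > 0); // prints true
```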
I'm not a huge fan of this solution either (it can lead to a lot of insecure copy/pasting into Slack or Dropbox when you need to share the secrets, multiplying the number of places the secret exists), but it's true it would have avoided the problem in this specific case.
It always comes back to that human error of the Post-it on the monitor with the password. Lol
Thanks for being so transparent about this! I reckon most companies don't even bother disclosing anything until they know for certain data was actually decrypted by someone. Hell, I've seen companies actively downplay the severity of a situation even when they know for sure passwords have been leaked.
Well, without people’s trust the surveys can’t really work. So I’ve always tried to do everything in the open from the start. Thanks for the kind words!
Thanks for the transparency and clear communication. I would imagine it's a tough and nerve-wracking experience to post this article, so thank you also for your courage to show the (IMHO) right way to handle this.
A+++ would answer survey again.
Please ensure you consult experts on security and privacy before choosing a new approach, and also seek community feedback once you come up with a new plan.
For example, it's not enough to simply use an ordinary one-way hash of email addresses, because nothing stops an adversary from simply applying the same function to some publicly known email addresses and looking for matches in your dataset. I suspect this is probably what the original developer had in mind when they chose an encryption function instead.
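The matching attack described in this comment can be sketched in a few lines. Everything here is hypothetical (the addresses, the helper names, and the use of SHA-256); it only illustrates why a plain hash of a guessable value is not anonymous, and how a keyed hash (HMAC) changes that as long as the key stays secret.

```typescript
import { createHash, createHmac } from "node:crypto";

// Plain, unkeyed hash: anyone can compute it for any email they already know.
const sha256 = (value: string): string =>
  createHash("sha256").update(value, "utf8").digest("hex");

// Returns the candidate emails whose plain hash appears in a published dataset.
function matchKnownEmails(publishedHashes: Set<string>, candidates: string[]): string[] {
  return candidates.filter((candidate) => publishedHashes.has(sha256(candidate)));
}

// Keyed hash: without the secret key, an outsider cannot recompute the IDs.
function keyedId(address: string, secretKey: string): string {
  return createHmac("sha256", secretKey).update(address, "utf8").digest("hex");
}

// Hypothetical published dataset containing one respondent's plain hash.
const published = new Set([sha256("alice@example.com")]);
console.log(matchKnownEmails(published, ["alice@example.com", "bob@example.com"]));
// prints [ 'alice@example.com' ]
```

Note that a keyed hash only moves the problem: if the key leaks, the same matching attack works again, which is one more reason not to publish such identifiers at all.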
Yes, we will not publish hashes at all going forward. We do need to store one-way email hashes privately for login purposes, but they won't be part of any public dataset.
Human errors happen all the time, unfortunately. On the other hand, transparency is a rare value, thank you very much for being worthy of trust because of your honesty!
+1000
Everything is fine if the lesson was learned :) Sometimes such things can happen to the best of us.
Transparent disclosures are always appreciated. You may also want to look at tools like gitleaks to prevent secrets from being committed.
Yes we are setting it up: github.com/VulcanJS/vulcan-next/is...
It's not that easy to set up though, and I still need to test whether it would actually have caught this particular leak, or more probable ones (e.g. leaks in dotenv files), which may explain why this kind of tool is not more widely used.
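For readers wondering what such a check looks like in principle, here is a very rough sketch of the general idea behind secret scanners: pattern matching on text before it is committed. It is not gitleaks' rule set or API. The pattern and function name are made up for illustration, and a simple name-based rule like this one will produce both false positives and misses.

```typescript
// Hypothetical rule: a dotenv-style assignment whose variable name contains
// KEY, SECRET, TOKEN or PASSWORD and whose value is not empty.
const SECRET_LINE =
  /^\s*(?:export\s+)?([A-Z0-9_]*(?:KEY|SECRET|TOKEN|PASSWORD)[A-Z0-9_]*)\s*=\s*(\S+)/i;

type Finding = { line: number; name: string };

// Returns the 1-based line numbers and variable names of suspicious assignments.
function findSuspectLines(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split(/\r?\n/).forEach((content, index) => {
    const match = SECRET_LINE.exec(content);
    if (match) {
      findings.push({ line: index + 1, name: match[1] });
    }
  });
  return findings;
}

const sampleDotenv = "PORT=3000\nENCRYPTION_KEY=abc123\n# a comment\nAPI_TOKEN=";
console.log(findSuspectLines(sampleDotenv));
// prints [ { line: 2, name: 'ENCRYPTION_KEY' } ]
```

A check like this could run in a pre-commit hook, but it only inspects new changes; a secret that is already in the history still has to be rotated.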