Every year tens of thousands of respondents trust the State of JavaScript and State of CSS surveys with their data, some of it quite personal and s...
Thanks for being so transparent about this! I reckon most companies don’t even bother disclosing anything until they know for certain data was actually decrypted by someone. Hell, I’ve seen companies actively downplay the severity of a situation even when they know for sure passwords have been leaked.
Well, without people’s trust the surveys can’t really work. So I’ve always tried to do everything in the open from the start. Thanks for the kind words!
Thanks for the transparency and clear communication. I would imagine it's a tough and nerve-wracking experience to post this article, so thank you also for your courage to show the (IMHO) right way to handle this.
A+++ would answer survey again.
Honest mistake, commendable recovery. (Who’s ever gonna misuse that data anyway. Let's hope only people who still use too many float:left's and too many !important's get spammed with beginner CSS tutorials! Sorry stupid joke.)
An e-mail address in combination with your development preferences could be used to target customized phishing attacks against devs. We can be an attractive target, given IT is one of the best paying industries out there. That being said, we're also one of the most aware, and thanks to Sacha's quick and honest reaction, we now know that things like that can happen.
"given IT is one of the best paying industries out there."
Lol, not where I work at... 😅🥲😣
Human errors happen all the time, unfortunately. On the other hand, transparency is a rare value, thank you very much for being worthy of trust because of your honesty!
+1000
Thank you for being so transparent and honest about this! Everyone makes mistakes and I appreciate the effort that goes into these surveys every year.
Thanks for your kind words!
Please ensure you consult experts on security and privacy before choosing a new approach, and also seek community feedback once you come up with a new plan.
For example, it’s not enough to simply use an ordinary one way hash of email addresses, because nothing stops an adversary simply applying the same function to some publicly known email addresses and looking for matches in your dataset. I suspect this is probably what the original developer had in mind when they chose an encryption function instead.
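To make the concern concrete, here is a minimal sketch of that matching attack. It assumes the published identifiers are plain, unsalted SHA-256 hashes of lowercased addresses, which is an illustration and not necessarily how the surveys stored anything. The function names, the example addresses, and the key are all hypothetical.

```typescript
import { createHash, createHmac } from "node:crypto";

// Plain, unsalted SHA-256 of a normalized email address (assumed scheme).
function plainHash(email: string): string {
  return createHash("sha256").update(email.trim().toLowerCase()).digest("hex");
}

// Keyed hash (HMAC): without the secret key, an outsider cannot recompute it.
function keyedHash(email: string, secretKey: string): string {
  return createHmac("sha256", secretKey).update(email.trim().toLowerCase()).digest("hex");
}

// The attack: hash a list of known addresses and look for matches in the dataset.
function findMatches(publishedHashes: string[], knownEmails: string[]): string[] {
  const published = new Set(publishedHashes);
  return knownEmails.filter((email) => published.has(plainHash(email)));
}

const respondents = ["alice@example.com", "bob@example.com"];
const guesses = ["carol@example.com", "alice@example.com"];

// Publishing plain hashes lets anyone confirm that alice@example.com took part.
const plainDataset = respondents.map(plainHash);
console.log(findMatches(plainDataset, guesses));

// With a keyed hash, the same guesses match nothing unless the key leaks too.
const keyedDataset = respondents.map((email) => keyedHash(email, "hypothetical-secret-key"));
console.log(findMatches(keyedDataset, guesses));
```

A keyed hash only moves the problem to protecting the key, so it is not a reason to publish the identifiers either.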
Yes, we will not publish hashes at all going forward. We do need to store one-way email hashes privately for login purposes, but they won’t be part of any public dataset.
This is a good reminder to not store encryption keys in a repo. Ideally use something like Hashicorp Vault, but at least don't store them in files within the repo.
Hosting platforms like Netlify, Azure, etc. let you provide secrets via their UI, which your code can then read from the process environment (process.env in Node).
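As a rough sketch of that pattern: the helper below reads a secret from the environment and fails fast if it is missing, so the key never has to live in a file inside the repo. `requireSecret` and the variable name `ENCRYPTION_KEY` are hypothetical, chosen only for illustration.

```typescript
// Read a secret from the environment and fail fast if it is not set.
// `env` defaults to process.env; passing it explicitly keeps the helper testable.
function requireSecret(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required secret: ${name}`);
  }
  return value;
}

// In real code you would call requireSecret("ENCRYPTION_KEY") once at startup.
// Here a stand-in environment object is used so the example runs anywhere.
const exampleEnv = { ENCRYPTION_KEY: "example-value" };
const encryptionKey = requireSecret("ENCRYPTION_KEY", exampleEnv);
console.log(encryptionKey.length > 0);
```

Failing at startup is a deliberate choice: a missing secret surfaces immediately instead of silently falling back to a default value.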
I'm not a huge fan of this solution either (it can lead to a lot of insecure copy/pasting into Slack or Dropbox when you need to share the secrets, multiplying the number of places the secret exists) but it's true it would have avoided the problem in this specific case.
It always comes back to that human error of the Post-it on the monitor with the password. Lol
What was the original motivation to "track how a given respondent's answers were evolving over time"?
Let’s imagine that in 2020 Famous React Developer Foo shares the survey and brings in their audience; and then in 2021 Famous Vue Developer Bar shares the survey and in turn brings in their audience. In theory you could have shifts in survey answers just because different people are answering the survey. By adding these ids my idea was that you could isolate a constant cohort of respondents if you wanted to remove the influence of audience shifts.
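For illustration, isolating such a constant cohort amounts to intersecting the respondent ids seen in each year. This is only a sketch under assumed names: the `SurveyResponse` shape, the `respondentId` field, and the sample rows are hypothetical and do not reflect the actual survey schema.

```typescript
// Hypothetical shape of one survey answer row.
type SurveyResponse = { respondentId: string; year: number; answer: string };

// Return the ids of respondents who answered in every one of the given years.
function constantCohort(responses: SurveyResponse[], years: number[]): Set<string> {
  const idsByYear = years.map(
    (year) => new Set(responses.filter((r) => r.year === year).map((r) => r.respondentId)),
  );
  const [first, ...rest] = idsByYear;
  if (first === undefined) return new Set<string>();
  return new Set(Array.from(first).filter((id) => rest.every((ids) => ids.has(id))));
}

const responses: SurveyResponse[] = [
  { respondentId: "a", year: 2020, answer: "React" },
  { respondentId: "b", year: 2020, answer: "React" },
  { respondentId: "b", year: 2021, answer: "Vue" },
  { respondentId: "c", year: 2021, answer: "Vue" },
];

// Only respondent "b" answered in both years, so only their answers are kept.
const cohort = constantCohort(responses, [2020, 2021]);
const cohortAnswers = responses.filter((r) => cohort.has(r.respondentId));
console.log(Array.from(cohort), cohortAnswers.length);
```

Restricting the analysis to `cohortAnswers` removes the effect of a new audience arriving in 2021, at the cost of needing a stable identifier per respondent.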
Thanks for the quick reply. Two concerns:
People generally have the expectation that such surveys are anonymous and that results are only gathered in aggregate. It is also safer. It looks like you have realized the value of these propositions.
So: Wouldn't you still be able to achieve your intention by survey respondents revealing their previous framework exposure? Like checking "I have mostly experience with..." React / Vue / Angular, etc. Then you could see the influence of audience shifts.
I don't think that achieves quite the same thing. I think the simplest solution is to have two datasets, one without any kind of identifiers for the general public and one with (secure) identifiers which we would only make available to data researchers who want to specifically do a cohort analysis if they get in touch with us.
What would be the limitation? Unless you actually want to model the relationships between the influencers and their audiences, I don't see how you actually need to track personally identifiable information to track trends in demographics...
If we want to track how cohorts evolve over time then we should just track that in a secure manner; or not track it at all if we can't do it right. It just seems like a simpler approach than finding some other more "fuzzy" metric to use as a proxy.
The question is whether it's really necessary to track 'cohorts' per se, given the security risk and the worse UX it entails, if you can get a decent enough statistic from other, more aggregate means.
I appreciate the disclosure and the transparency, and I sympathize with the incident. However, I don't see key steps in this post that would make me trust the survey going forward. To be blunt, the fact that you mistook a 2-way encryption for a hash makes me think that you do not have the security expertise to be responsible for this data.
The "Steps Taken" section still talks about mitigating the encryption mechanism. Is that the same 2-way encryption that caused the issue? Why isn't the first step to remove the encryption mechanism and replace it with a 1-way hash? If you still need to continue using keys, is there a better option for key management than simply making the repo private? The "Going Forward" section doesn't mention security improvements at all.
Before I'd trust the surveys again, I'd like to see you talk about third-party security audits, and how you're going to verify security-related contributions going forward.
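To illustrate the difference between the two mechanisms, here is a minimal sketch using Node's built-in crypto module. AES-256-GCM and SHA-256 are assumptions picked for the example; the post does not say which algorithms the survey code used, and the function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

type EncryptedEmail = { iv: Buffer; authTag: Buffer; ciphertext: Buffer };

// Two-way: anyone holding the key can turn the stored value back into the email.
function encryptEmail(email: string, key: Buffer): EncryptedEmail {
  const iv = randomBytes(12); // fresh nonce per message
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(email, "utf8"), cipher.final()]);
  return { iv, authTag: cipher.getAuthTag(), ciphertext };
}

function decryptEmail(data: EncryptedEmail, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, data.iv);
  decipher.setAuthTag(data.authTag);
  return Buffer.concat([decipher.update(data.ciphertext), decipher.final()]).toString("utf8");
}

// One-way: there is no key and no function that maps the digest back to the input.
function hashEmail(email: string): string {
  return createHash("sha256").update(email).digest("hex");
}

const key = randomBytes(32);
const stored = encryptEmail("respondent@example.com", key);

// A leaked key is enough to recover the original address from the stored value.
console.log(decryptEmail(stored, key) === "respondent@example.com");

// The digest is always 64 hex characters and carries no decryption path.
console.log(hashEmail("respondent@example.com").length);
```

This is why a leaked key turns encrypted identifiers into plain email addresses, whereas a one-way hash can only be attacked by guessing inputs.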
Thank you for the write up.
I don't think I received an email about this but a friend who also took the survey said he got an email about it. Were all participants emailed about the data breach?
Can you explain what a "ghost commit" is?
Yes, all participants were emailed. Maybe you unsubscribed from the mailing list in the past?
And as I understand it, that "ghost commit" was a commit that was not part of a branch or linked from anywhere on GitHub, but was still independently accessible if you had the direct URL.
You know you can use BFG, right?
docs.github.com/en/authentication/...
I've done it too, but luckily on a low-exposure system. I seem to recall finding a way to strip the orphan commit, but I probably had to recreate the GH repo. I also seem to recall that GH added some checks for secrets, but I guess they're not foolproof.
We all make mistakes, so it's best to try to mitigate, even at the expense of DX. E.g. tighten up access permissions so no rm -rf /, don't use eval(), or otherwise make it hard to parse expressions that may contain unsanitised user input (e.g. JSX dangerouslySetInnerHTML).
Thanks for the healthy handling of this issue.
I know others have said it, but I really appreciate the transparency on this. You could have easily decided the risk was small and kept it to yourself. More people need to have your progress mindset :)
Transparent disclosures are always appreciated. You may also want to look at tools like gitleaks to prevent secrets from being committed.
Yes we are setting it up: github.com/VulcanJS/vulcan-next/is...
It's not so obvious to set up though, and I still need to test whether it would actually have caught this particular leak, or more probable ones (e.g. leaks in dotenv files). That may explain why this tool is not more common.
Everything is fine if the lesson was learned)) Sometimes such things can happen to the best of us.
Twitter thread if you have any additional questions: twitter.com/ericbureltech/status/1...