Ziad for GitGuardian

Posted on • Originally published at blog.gitguardian.com

How does Bokeh, the Python Interactive Visualization Library, Secure its Open-Source Repositories?

Open-source is everywhere. It is one of the driving forces of software innovation, from the academic to the enterprise world: 75% of codebases audited by Synopsys in the 2021 OSSRA report rely on open-source components. Its prevalence in commercial software is reaching unprecedented levels, to the extent that the European Commission identified it as a public good in a recent study assessing its impact on the region's economy.

But the interstitial nature of open-source in modern software also makes it a subject of security and compliance concerns, because it can expose the organizations that use it to a host of unknown risks and vulnerabilities. Most of the discussions we hear today about security in this space focus on identifying and remediating vulnerabilities, all seen from the "consumer" perspective.

This time, we decided to go to the other side of the fence. We had the pleasure of exchanging a few words with Bryan Van de Ven, co-creator and core maintainer of the Bokeh project, a Python library for data visualization. Bryan gave us an insider look at how open-source maintainers such as himself shield their projects against malicious actors trying to exploit security gaps. The attackers' goal is straightforward: introduce vulnerabilities downstream and, in turn, attack the software supply chains that depend on the same open-source packages and libraries.

Bokeh, the interactive visualization library for the modern browser

Bokeh (pronounced /ˈboʊkeɪ/ BOH-kay) is an interactive visualization library for modern web browsers, written in Python. It provides elegant and concise construction of plots while maintaining high-performance interactivity over large datasets. Bokeh can help anyone who would like to quickly and easily make interactive plots, dashboards, and data applications.

Standalone examples of data plots made with the Bokeh library

Before starting his endeavor with Bokeh in 2012, Bryan was no stranger to open-source libraries. He authored the conda package manager and worked full-time at Anaconda on its distribution, simplifying package management and deployment for more than 25 million users worldwide. Inspired by his previous contribution to Chaco (a Python data visualization library) and the rise of JavaScript-heavy frontend frameworks in the early 2010s, Bryan teamed up with Peter Wang to offer an alternative for Python developers working on interactive data applications for the modern browser. The rest is history.

Why is security important for open-source projects like Bokeh?

With 37,000+ public GitHub repositories declaring its use and 2.5 million monthly downloads, Bokeh has made a name for itself. Bryan believes this has a lot to do with the project's early days, when patience, responsiveness, and receptiveness to contributions from a community in its embryonic stage played a determining role in later successes.

Now comes the difficult part: ensuring Bokeh is open-source code on which individual developers and enterprise teams alike can safely build. Together, we discussed some of the threats keeping open-source maintainers up at night:

Typosquatting

In this attack, bad actors push malicious packages with names similar to the original one to a trusted registry, hoping users will install them by mistake, for example by mistyping the package name. Packages hosted on the npm and PyPI registries have been notable targets, reminding us that developers too can fall prey to a different breed of phishing.
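
To make the idea concrete, here is a minimal sketch of how a project could flag requested package names that look suspiciously close to names it actually trusts. It is an illustration only: the `TRUSTED_PACKAGES` allow-list, the `likely_typosquats` helper, and the 0.8 similarity threshold are all hypothetical choices, not something Bokeh or any registry is known to use.

```python
from difflib import SequenceMatcher

# Hypothetical allow-list of packages the project actually intends to install.
TRUSTED_PACKAGES = {"bokeh", "numpy", "pandas", "requests"}


def likely_typosquats(requested, trusted=TRUSTED_PACKAGES, threshold=0.8):
    """Return sorted (requested_name, trusted_name) pairs that look alike but differ."""
    suspects = []
    for name in requested:
        normalized = name.lower().replace("_", "-")
        if normalized in trusted:
            continue  # exact match with a trusted name: nothing to flag
        for good in trusted:
            # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones
            if SequenceMatcher(None, normalized, good).ratio() >= threshold:
                suspects.append((name, good))
    return sorted(suspects)


if __name__ == "__main__":
    for typo, real in likely_typosquats(["bokeh", "bookeh", "reqeusts", "flask"]):
        print(f"'{typo}' looks like a misspelling of '{real}'")
```

A string-similarity check like this only catches near-miss spellings; it says nothing about whether a correctly spelled package is trustworthy.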

With more than 2.5 million monthly downloads shared between conda and pip and ~150 million requests for BokehJS resources at cdn.bokeh.org every year, the bokeh library looks ripe for the picking, at least from an attacker's perspective.

Once the malicious packages are installed and executed at runtime, attackers can:

  • Siphon environment variables for further lateral movement (crossenv attack)
  • Hijack cloud computing resources for crypto-mining (PyPI attacks in 2021)

Unfortunately, there is not much package maintainers can do here. Users are advised to scan their project dependencies and verify their integrity with tools such as WhiteSource, Sonatype, Snyk or Vdoo.
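
One basic form of integrity verification is comparing a downloaded artifact against a pinned checksum; pip's hash-checking mode (`--require-hashes`) performs a comparable check at install time. The sketch below shows the principle with Python's standard library. The helper names `sha256_of` and `matches_pinned_hash` are hypothetical, and this is not a description of how the tools named above work internally.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, expected_hex):
    """Return True only if the artifact's digest equals the pinned value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

The pinned value has to come from a source the attacker does not control, such as a lock file reviewed and committed alongside the project.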

Compromised build & deployment pipelines

Like other developers, core maintainers may not be application security experts. They are no more immune than anyone else to introducing bugs in their code or inadvertently committing secrets to their project's git repository. Insecurely handled credentials can end up publicly exposed on GitHub, with consequences for projects like Bokeh ranging from cloud jacking (e.g., abuse of AWS cloud computing resources) to compromised publishing rights on PyPI or Anaconda.

With a team of 5 to 6 core maintainers and ~500 contributors over the project's lifetime, things can get messy for Bokeh if credentials are spilled. Bryan assures us only a handful of contributors are cleared to handle the project's secrets, and the team uses GitHub's Encrypted Secrets to store and retrieve credentials in their GitHub Actions CI pipelines.
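
In GitHub Actions, an encrypted secret only reaches a job when the workflow explicitly maps it, typically into an environment variable. A script in the pipeline can then read it from the environment instead of from a file in the repository. The sketch below shows that pattern under stated assumptions: `read_secret`, `redact`, and the variable name `HYPOTHETICAL_UPLOAD_TOKEN` are invented for illustration and do not come from Bokeh's actual workflows.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required credential is not present in the environment."""


def read_secret(name, environ=os.environ):
    """Fetch a credential from the environment, failing fast if it is absent or empty."""
    value = environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


def redact(value, visible=4):
    """Return a log-safe form of a secret, showing only its last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    # HYPOTHETICAL_UPLOAD_TOKEN is a made-up name; set it before running this file directly.
    token = read_secret("HYPOTHETICAL_UPLOAD_TOKEN")
    print("using token", redact(token))
```

Failing fast on a missing secret keeps a misconfigured job from silently falling back to a hardcoded or empty credential.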

In addition to this measure, Bokeh is hardening its CI pipelines with GitGuardian, catching any leaked secrets as soon as the code containing them reaches the build stage. More on this later.

Threats to Asset Integrity

Open-source software is not just a public code repository hosted on GitHub. It is also a collection of assets, guiding both starters and experienced members of the community through their learning journey. To name just a few from Bokeh:

  • Website
  • CloudFront CDN for distributing the BokehJS runtime at cdn.bokeh.org
  • Public documentation
  • Blog (hosted on Medium)
  • Social media accounts (YouTube, Twitter…)
  • Community and support (Discourse)
  • Other tools (Zapier for automation, Pingdom for forms and surveys)

Unlike open-source code, such services will never run in users' production environments. But they deserve just as much protection, since they could offer openings to attackers with a strong taste for social engineering tactics.

For everything other than the documentation, Bokeh is using the free tier 1Password offers to open-source projects to securely store and manage all their credentials.

How does GitGuardian shield Bokeh's public repositories?

Bryan tells us his first encounter with spilled credentials was in a coffee shop. Coincidence? While testing new automation functions and uploading documentation to the CloudFront CDN, he accidentally pushed an AWS secret key and its token.

Incident view example in the GitGuardian dashboard (fictitious data)

Luckily, GitGuardian had his back. Our pro-bono service caught this secret in the public bokeh repository and sent him a timely alert with the incident's details.

Bryan was able to remediate in time, revoking and rotating his secret, and finally rewriting the git history to remove all evidence of the leak. His fast response, in less than an hour, allowed him to nullify the damage and close the window for any exploits.

Since this incident, his team has had GitGuardian's CLI, gg-shield, running on every Pull Request in their GitHub Actions workflows. The secret scanning checks and their results are also displayed in the VCS, giving all contributors full and equal visibility without ever leaving their developer environments.
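
To give a feel for what a single secret detector does, here is a deliberately naive sketch that looks for strings shaped like AWS access key IDs. It is not how GitGuardian or gg-shield works; real scanners combine many detectors with contextual and validity checks. The function name `find_aws_key_ids` is hypothetical, and the sample key is the placeholder value AWS uses in its own documentation.

```python
import re

# AWS access key IDs for long-term credentials commonly start with "AKIA"
# followed by 16 uppercase letters or digits. This single pattern is an
# illustration only and will miss other credential formats.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text):
    """Return (line_number, match) pairs for strings shaped like AWS access key IDs."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            findings.append((line_number, match.group(1)))
    return findings


if __name__ == "__main__":
    sample = "region = us-east-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    for line_number, key_id in find_aws_key_ids(sample):
        print(f"line {line_number}: possible AWS access key ID {key_id}")
```

A pattern match alone cannot tell a live credential from a documentation placeholder, which is one reason dedicated scanning services go further than regular expressions.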

What's next for Bokeh and GitGuardian?

The Bokeh team is actively adding core contributors and one of the recent joiners is a git hooks power-user, which works out perfectly since GitGuardian offers secret scanning at the pre-commit level. The Bokeh team can now shift further left and prevent secrets from reaching their VCS and CI pipelines.

Bryan was also excited about our latest product updates, namely the Presence and Validity checkers.

These features indicate whether leaked secrets are still present in the git history and whether they are still valid, meaning they can still be exploited by malicious actors.

Interesting! This will help teams calibrate the urgency of their response to incidents and focus their efforts where they are most needed.

What about you? Are you actively maintaining an open-source project or looking to make your source code public?

Here are a few resources on securing open-source code that can get you started:
