I’ll try to make a habit of writing summaries for our contributing.today meetups. For background, read this post.
On April 13, Suzanne Daniels, Developer Relations Lead at Spotify, hosted our session about open source databases. You've probably heard the phrase that every company is a software company these days, but really every company is a data company. We all collect and produce so much data that we don't know what to do with it or how to make sense of it all.
I mean, us commoners don't. It's a good thing that our guests do! Rain Leander is a Python developer and AppDev Technical Evangelist at Cockroach Labs, the company behind CockroachDB, the distributed SQL database designed for speed, scale and survival - hence the cockroach, those little buggers survive everything. Rain loves gaming companies’ use case for CockroachDB, and does a semi-regular show “Playing w/ Roaches” in which they play games that run on - you guessed it - CockroachDB.
Gregory Stark is a Senior Open Source PostgreSQL Developer at Aiven. Postgres has a reputation for being stable and reliable. “It does all the things SQL databases are traditionally good at - live backups, replication, storage.” Originally an academic project, Postgres’ open source stewards were interested in its extensibility, the ability to merge it into any project and with any codebase. “SQL databases have been a great fit for business needs, PostgreSQL is different in that it’s developer-oriented, but today serves both needs.”
James Blackwood-Sewell is a Senior Developer Advocate at Timescale, which “takes PostgreSQL and gives it superpowers to deal with time series data”. Timescale is an extension for, not a fork of, Postgres.
Laura Ham is a Machine Learning Product Researcher at SeMI Technologies, which builds the open source vector database Weaviate. For an introduction to vector databases and Weaviate, definitely read this blog. In short, vector databases shine when you’re working with unstructured data: they store data by its meaning, so you can retrieve it by semantic similarity rather than by the exact string matching you’d use in a SQL database. Think: semantic search, automatic data classification, (reverse) image search, … Kinda like what Google does for their search engine, but open source.
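To make the idea of searching by meaning a bit more concrete, here is a minimal sketch in plain Python. It is not Weaviate's API and not how any particular vector database is implemented. The three-dimensional vectors are made up by hand for illustration (a real system would get much longer vectors from a machine learning model), and the names `DOCUMENTS`, `cosine_similarity` and `search` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written toy "embeddings" (an assumption for this sketch).
# Texts about similar topics get vectors pointing in similar directions.
DOCUMENTS = {
    "How to adopt a kitten": [0.9, 0.1, 0.0],
    "Caring for your puppy": [0.8, 0.2, 0.1],
    "Tuning SQL query performance": [0.0, 0.1, 0.9],
}


def search(query_vector, documents, k=2):
    """Return the k document texts whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Pretend this is the embedding of the query "pet care tips".
query = [0.85, 0.15, 0.05]
results = search(query, DOCUMENTS)
print(results)
```

Note that the query shares no words with the two pet documents it matches. The match comes purely from the vectors being close together, which is the difference from comparing strings.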
Gregory attributed the popularity of open source databases, and Postgres in particular, to the growing popularity of the internet. People wanted to use a database that didn’t force them onto a particular (commercial) development platform. Postgres was built by peers, people from the same generation, who you could talk to on the mailing list to influence the roadmap.
It’s still early days for vector databases, but with AI and machine learning gaining interest, Laura hopes Weaviate will follow a path to popularity and adoption similar to that of Postgres. “Being able to see (and modify, distribute, …) the source is what makes open source databases so popular.”
James comes from an architect role where he needed to convince people to move to open source from proprietary solutions. He thinks the DevOps movement got developers to use open source databases because it allowed them to move quicker.
Rain got us started on the licensing topic when they said that CockroachDB isn’t technically open source. Instead it uses the Business Source License (BSL, originally written for the MariaDB project), and each release only switches to the Apache 2.0 license three years after it ships. CockroachDB was created under an open source license, but when big companies started offering it as a managed service, Cockroach Labs changed the license and took back their business.
Rain mentioned how they got criticized by former colleagues at Red Hat for joining a company that is not open source. But they can understand why someone would go the “restrictive route”. Rain will always be upfront with folks looking to contribute to CockroachDB. “Your contribution won’t be public until 3 years after a major release. You’ll be credited of course, but it’s not the same as contributing to Postgres.”
(Amazon) Aurora is a fully managed relational database engine that's compatible with MySQL and PostgreSQL. Greg says: “It’s a great project, and they talk very openly about their internals and how PostgreSQL could benefit from some of the things they’ve implemented, but you can’t fork Aurora.” A downside of proprietary software is that you never know what a business’ long term strategy is for the project, whereas you can be sure that 4-6 years from now PostgreSQL will still be here and still evolving.
“Companies like Red Hat, and also Aiven, rather than creating a new product, are the go-to experts for existing projects, and will continue to be for a very long time.”
Timescale develops in the open, but their Timescale License (TSL) prohibits as-a-service offerings. Forks are allowed, but the license would still stand. James wasn’t sure whether using Timescale at a company for internal consumption is allowed.
Several projects adopted more restrictive licenses over the last few years, specifically targeting cloud vendors.
Weaviate is licensed under BSD 3-Clause, which is a very permissive license. PostgreSQL never considered going more copyleft style: the community is what kept the project going, and there was never a commercial drive.
In reply to Suzanne’s question whether DBaaS (Database-as-a-Service) is a threat to open source, Laura answered that DBaaS is at the same time a model that lets these projects continue to thrive. James noted that DBaaS, like offering consulting or support, is a way for projects to finance their development.
Almost all projects accept help and contributions from service providers, but their feature requests can’t outweigh what the community is looking for in the project.
If you’d like to get involved in the Timescale community, James suggests you check out timescale.com/community. Timescale contributes to Postgres too, and encourages their community to give back as well. Recently they made sure a bunch of people can contribute upstream full time; before that, casual contributors kept being pulled back in to work on Timescale.
Gregory, at Aiven, is part of the Open Source Program Office, where he spends all his time working on upstream Postgres. To join the conversation, Greg suggests checking out the mailing lists, but recommends using filters. He recalls his first contribution to PostgreSQL well, and how it was a great experience - which can make all the difference in terms of gaining long term contributors to a project.