PostgreSQL at AWS re:Invent

#postgres #conferences

This month's PGSQL Phriday #014 is about PostgreSQL events, and its publication date coincides with the end of one of the biggest IT conferences and, therefore, one of the largest gatherings of the PostgreSQL community: AWS re:Invent, where I met many PostgreSQL users and contributors.

PostgreSQL is a highly popular database that appears not only at events organized by the PostgreSQL community of contributors but also at any event that brings together PostgreSQL users. These events include gatherings of CTOs, system architects, developers, data analysts, DevOps engineers, IT managers, and others. Those who consider the database as just one infrastructure component among many may not attend PostgreSQL-only conferences, which makes the presence of the PostgreSQL community at general events crucial.

Numerous AWS users process their data with PostgreSQL. Some install PostgreSQL themselves on an EC2 instance or in a container, while others use a managed service. In the latter case, the security boundary between the cloud provider, the managed service provider, and the database users often requires a PostgreSQL fork with added security features, rather than granting full superuser access to the host. Such a fork may also include modifications, minor or extensive, that optimize the database for a cloud environment.

As an AWS Hero, I attended re:Invent in person for the second time. For me, the primary goal of attending a conference is to meet people. As I don't often travel to the US, I also met people who were not attending the conference but happened to be around Las Vegas, such as PostgreSQL community contributors and YugabyteDB users. I attended some keynotes and sessions, but their main value lay in the interactions with other attendees: I want to know why they are passionate about the topics being discussed and what questions they have. As a Developer Advocate, this helps me broaden my perspective, learn new things, and better understand how databases are used.

Unfortunately, I missed OPN302 | AWS open source strategy and contributions for PostgreSQL, which took place at a different hotel from the one where I spent most of my time. AWS has contributed to a few open-source extensions: https://github.com/search?q=org%3Aaws+postgres&type=repositories

Generative AI is everywhere, and pgvector has made storing embeddings in the database a popular topic. DAT407 | Best practices for querying vector data for gen AI apps in PostgreSQL was also at another hotel. However, I was able to attend DAT413-R | Using LangChain to build gen AI apps with Amazon Aurora and pgvector, an excellent live coding session.

Similarity search in SQL databases has become popular, but it also brings some unusual behavior that users must be aware of, such as query results that differ depending on whether or not an index is used.
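The reason is that vector indexes, such as pgvector's ivfflat and HNSW, perform approximate nearest-neighbor search. The following Python sketch is a toy model of an IVF-style (inverted file) index, not pgvector's actual code: each row is assigned to the list of its nearest centroid, and an index scan visits only the `probes` lists closest to the query, in the spirit of pgvector's `ivfflat.probes` setting. The data, centroids, and function names are hypothetical, chosen so that the true nearest row sits just across a list boundary.

```python
import math

# Hypothetical toy data: 2-D "embeddings" and two precomputed centroids.
CENTROIDS = [(0.0, 0.0), (10.0, 0.0)]
ROWS = {"a": (3.0, 0.0), "b": (5.1, 0.0), "c": (9.0, 0.0)}


def dist(u, v):
    # Euclidean (L2) distance, the metric of pgvector's <-> operator.
    return math.dist(u, v)


def nearest_centroid(vec):
    return min(range(len(CENTROIDS)), key=lambda i: dist(vec, CENTROIDS[i]))


# Build the inverted lists: each row goes to the list of its nearest centroid.
LISTS = {i: [] for i in range(len(CENTROIDS))}
for name, vec in ROWS.items():
    LISTS[nearest_centroid(vec)].append(name)


def exact_knn(query, k=1):
    # Sequential scan: compare the query with every row.
    return sorted(ROWS, key=lambda n: dist(query, ROWS[n]))[:k]


def ivf_knn(query, k=1, probes=1):
    # Index scan: only visit the `probes` lists whose centroids are closest.
    order = sorted(range(len(CENTROIDS)), key=lambda i: dist(query, CENTROIDS[i]))
    candidates = [n for i in order[:probes] for n in LISTS[i]]
    return sorted(candidates, key=lambda n: dist(query, ROWS[n]))[:k]


query = (4.9, 0.0)
print(exact_knn(query))          # ['b']
print(ivf_knn(query, probes=1))  # ['a']: 'b' lives in the list that was not probed
print(ivf_knn(query, probes=2))  # ['b']
```

Probing every list turns the index scan back into an exact search, at the cost of reading more rows. This trade-off between recall and speed is what makes results depend on the execution plan.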

Conversions from other databases, especially Oracle, to PostgreSQL are always a popular topic. DAT415 | Convert a Java app and database to PostgreSQL and fix common issues has plenty of examples of rule #1 for migration between databases: instead of mapping features one by one, it's important to understand the business needs and find effective ways to implement them in PostgreSQL.

Finally, DAT344-NEW | [NEW LAUNCH] Achieving scale with Amazon Aurora Limitless Database was a great illustration of PostgreSQL's popularity. Although Amazon Aurora is not PostgreSQL, it uses a fork that users and developers often perceive as PostgreSQL because it offers the same syntax, protocol, and runtime behavior as genuine PostgreSQL. To avoid disclosing the new feature before the keynote, the session was added to the agenda only a few hours before it started, and yet the large room was packed. Until now, Aurora, including Aurora Serverless, could scale out only reads with read replicas; Aurora Limitless goes further with sharding. The strong interest in this session shows that many organizations dealing with terabytes or petabytes of data want to use PostgreSQL and need to scale it out. PostgreSQL is becoming increasingly popular in the enterprise world.

Aurora Limitless scales out both reads and writes for PostgreSQL. Because coordinators push queries down to shards through a Foreign Data Wrapper, Citus comes to mind. However, Aurora Limitless is not built on Citus and offers more capabilities. Citus is an extension that integrates well with PostgreSQL, but as an extension it can only change the behavior that PostgreSQL's predefined hooks allow it to extend. Aurora Limitless also offers a simpler syntax than Citus, which requires creating a regular, non-distributed table and then calling a function (create_distributed_table) to do what could not be added to the CREATE TABLE command. Rather than the eventual consistency of Citus global reads (see Citus is not ACID), Aurora Limitless implements read consistency by using timestamps to build the read snapshot.
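To illustrate why a common read timestamp matters, here is a minimal Python sketch of timestamp-based snapshot reads across two shards. It shows only the general idea and is not how Aurora Limitless is implemented; the `Shard` class, the account data, and the timestamps are hypothetical. A coordinator picks a snapshot timestamp when the query starts, and each shard returns the newest version committed at or before it. Without such a snapshot, a query that reads each shard's latest committed value can see a distributed transfer on one shard but not on the other.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    # key -> list of (commit_ts, value), appended in commit-timestamp order
    versions: dict = field(default_factory=dict)

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read_latest(self, key):
        # No snapshot: whatever is committed on this shard right now.
        return self.versions[key][-1][1]

    def read_as_of(self, key, snapshot_ts):
        # Snapshot read: newest version committed at or before snapshot_ts.
        visible = [v for ts, v in self.versions[key] if ts <= snapshot_ts]
        return visible[-1] if visible else None


def setup():
    s1, s2 = Shard(), Shard()
    s1.write("A", 500, commit_ts=10)  # account A lives on shard 1
    s2.write("B", 500, commit_ts=10)  # account B lives on shard 2
    return s1, s2


def transfer(s1, s2, amount, commit_ts):
    # The distributed transaction gets the same commit timestamp on both shards.
    s1.write("A", s1.read_latest("A") - amount, commit_ts)
    s2.write("B", s2.read_latest("B") + amount, commit_ts)


def total_balance(use_snapshot):
    s1, s2 = setup()
    snapshot_ts = 15  # picked by the coordinator when the query starts

    def read(shard, key):
        if use_snapshot:
            return shard.read_as_of(key, snapshot_ts)
        return shard.read_latest(key)

    a = read(s1, "A")
    # A concurrent distributed transfer commits between the two shard reads.
    transfer(s1, s2, amount=100, commit_ts=20)
    b = read(s2, "B")
    return a + b


print(total_balance(use_snapshot=False))  # 1100: B after the transfer, A before it
print(total_balance(use_snapshot=True))   # 1000: both shards read as of timestamp 15
```

Real systems must also handle clock uncertainty between nodes so that a timestamp means the same thing on every shard; this sketch ignores that problem.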

Aurora Limitless is not Citus, but it is not Distributed SQL either, as it lacks many global-level SQL features, such as global unique indexes, which are indispensable in OLTP where tables usually have more than one key, or the serializable isolation level, the gold standard of SQL isolation levels. It also doesn't provide full resilience: failover, upgrades, and re-sharding involve downtime.
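Why global unique indexes matter can be shown with a small Python sketch, assuming a users table hash-partitioned on `user_id` with an additional unique key on `email`. When uniqueness is enforced only by a local index on each shard, two rows with the same email but different `user_id` values can land on different shards and both pass the check. The `UserShard` and `ShardedUsers` classes and the optional global index are hypothetical; in a real Distributed SQL database, the global index would itself be distributed and updated in the same transaction as the row.

```python
NUM_SHARDS = 2


class UserShard:
    def __init__(self):
        self.rows = {}             # user_id -> email
        self.local_emails = set()  # per-shard (local) unique index on email

    def insert(self, user_id, email):
        if email in self.local_emails:
            raise ValueError(f"duplicate email {email!r} on this shard")
        self.rows[user_id] = email
        self.local_emails.add(email)


class ShardedUsers:
    """Users table hash-partitioned on user_id (the primary key)."""

    def __init__(self, global_email_index=False):
        self.shards = [UserShard() for _ in range(NUM_SHARDS)]
        # Hypothetical global unique index: email -> user_id across all shards.
        self.global_email_index = {} if global_email_index else None

    def insert(self, user_id, email):
        if self.global_email_index is not None and email in self.global_email_index:
            raise ValueError(f"duplicate email {email!r} (global index)")
        # The row goes to the shard chosen from its primary key only.
        self.shards[user_id % NUM_SHARDS].insert(user_id, email)
        if self.global_email_index is not None:
            self.global_email_index[email] = user_id

    def count_email(self, email):
        return sum(email in s.local_emails for s in self.shards)


local_only = ShardedUsers()
local_only.insert(1, "ann@example.com")  # user_id 1 -> shard 1
local_only.insert(2, "ann@example.com")  # user_id 2 -> shard 0: local check passes
print(local_only.count_email("ann@example.com"))  # 2 rows with the same "unique" email

with_global = ShardedUsers(global_email_index=True)
with_global.insert(1, "ann@example.com")
try:
    with_global.insert(2, "ann@example.com")
except ValueError as e:
    print(e)  # duplicate email 'ann@example.com' (global index)
```

The local check is cheap because it never leaves the shard, which is exactly why it cannot see the duplicate; enforcing the second key requires a structure that spans shards.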

Even though Amazon Aurora is a closed-source, proprietary database that doesn't contribute its improvements back to upstream PostgreSQL, it still provides valuable insights for the future development of PostgreSQL. The features that AWS implements always come from customer feedback. The PostgreSQL community must meet those users and understand their requirements so that open-source PostgreSQL can address them. I hope to see more PostgreSQL forks at PostgreSQL events and more PostgreSQL contributors at other developer, DevOps, and cloud events.

I've mentioned several sessions by name, and I'll add links to the recordings when they are published. The goal of this post is to encourage people to attend cross-technology conferences, and PostgreSQL is popular at all of them.

Thanks to a two-hour delay from British Airways, I was able to attend the Las Vegas PostgreSQL Users Group meetup, which was the cherry on top: one of the best PostgreSQL meetups I have attended, with no talks but many discussions about PostgreSQL and its adoption by developers. This PGSQL Phriday #014 is about PostgreSQL events, and having met so many PostgreSQL users and contributors at the AWS conference and at this meetup, I felt like I had attended a large PostgreSQL conference.