DEV Community

Postgres Is Underrated—It Handles More than You Think

Jason Skowronski on October 09, 2019

Thinking about scaling beyond your Postgres cluster and adding another data store like Redis or Elasticsearch? Before adopting a more complex infra...

thojest • Oct 9 '19

Hi, this is a really nice article. The funny thing is, it comes at the right time. We use Postgres and are currently considering adding Redis behind our Rust backend with websockets to enable push notifications for turtle (turtle.community), but we are not quite sure if this is the right way.

Does anyone have some inspiration or experience on that topic?

zchtodd • Oct 9 '19

I'm not sure if it fits your use case perfectly, but take a look at LISTEN/NOTIFY -- it's basically a push notification mechanism built into PostgreSQL, and since PG 9.0 a notification can carry a payload string.

thojest • Oct 9 '19

thx for the info, will have a look!

Utkarsh Kumar Raut • May 10 '21

Really nice project, Turtle.

Brian Johnson • Oct 11 '19

I think another really important PG extension to add is TimescaleDB -- the ability to add efficient time series data collection/query to Postgres is awesome.

Mark Pieszak • Oct 9 '19

I'm loving your articles, Jason, great job again!
It's hard to find in-depth and super high-quality articles like these in this day and age :)

Jason Skowronski • Oct 10 '19

This is an amazing compliment! Mind if I use this quote on my site?

Mark Pieszak • Oct 10 '19

Of course! My pleasure :)

Anytime, keep up the great work!

Rafiullah Hamedy • Oct 15 '19

Great write-up, Jason. I wasn't aware of PL/Python until now. I'm definitely a big fan of the JSON data types. I have never used hstore, but thank you for sharing the use case.

There is also Postgres-XL, which seems to offer a lot and is only 5 years old, compared to the three-decade-old PostgreSQL.

Rob Conery • Oct 16 '19

Hey Jason, great post! Quick thing on the full text example you have: if you use plainto_tsquery it will split the words for you and add the AND. If you use phraseto_tsquery it will apply a positional operator (<->) instead of the &, which is great if you're looking for a name or place. The new websearch_to_tsquery (in PG 11) is a great general purpose query builder as well.
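
To make the difference between & and <-> concrete, here is a toy Python model of the two matching behaviours. It is only a sketch: the function names are made up for illustration, and it ignores the stemming, stop-word handling, and ranking that Postgres text search actually performs.

```python
def matches_all_words(document: str, query: str) -> bool:
    """AND semantics (like &): every query word appears somewhere in the document."""
    doc_words = document.lower().split()
    return all(word in doc_words for word in query.lower().split())


def matches_phrase(document: str, query: str) -> bool:
    """Phrase semantics (like <->): query words appear adjacent and in order."""
    doc_words = document.lower().split()
    query_words = query.lower().split()
    n = len(query_words)
    return any(doc_words[i:i + n] == query_words
               for i in range(len(doc_words) - n + 1))


scrambled = "york has a new mayor"
exact = "flights to new york today"

# Both words are present in `scrambled`, but not next to each other.
print(matches_all_words(scrambled, "new york"))
print(matches_phrase(scrambled, "new york"))
print(matches_phrase(exact, "new york"))
```

The AND version accepts the scrambled sentence while the phrase version does not, which is why the phrase form suits names and places.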

David Howell • Oct 16 '19 • Edited

This is a great article!

You mention “Use multi-column indexes sparingly”, which is generally good advice; however, I would qualify that by saying avoid redundant indexes. It’s important to know that multicolumn (btree) indexes have a specific ordering to the columns. If you don’t filter or join on a leading prefix of those columns, in that order, the query generally can’t use the index efficiently.

For example, take an index on columns (a, b). If you only filter on column b, this index can’t be used for an efficient lookup. It can be used if you filter by just a, or by both a and b. Following on from that, if you frequently filter by both a and b, this is a good index to have. If you have both a single-column index on a and a multicolumn index on (a, b), then in that specific case the single-column index is the redundant one.
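
The column-ordering point can be illustrated with a toy model in Python. This is a sketch only, not how Postgres stores or plans anything: it treats an index on (a, b) as a sorted list of tuples, and the function names are invented for the example. Filters on a, or on a and b, can binary-search the sorted list; a filter on b alone has to look at every entry.

```python
from bisect import bisect_left, bisect_right

# Toy "index on (a, b)": entries sorted by a first, then by b.
index_ab = sorted((a, b) for a in range(1, 4) for b in range(1, 4))


def lookup_by_a(index, a):
    """Leading column only: binary search for the contiguous run where a matches."""
    lo = bisect_left(index, (a,))      # (a,) sorts before every (a, b)
    hi = bisect_left(index, (a + 1,))  # first entry belonging to the next a value
    return index[lo:hi]


def lookup_by_a_and_b(index, a, b):
    """Both columns: binary search for the exact key."""
    lo = bisect_left(index, (a, b))
    hi = bisect_right(index, (a, b))
    return index[lo:hi]


def lookup_by_b(index, b):
    """Trailing column only: matches are scattered, so every entry is examined."""
    return [entry for entry in index if entry[1] == b]


print(lookup_by_a(index_ab, 2))
print(lookup_by_a_and_b(index_ab, 2, 3))
print(lookup_by_b(index_ab, 2))
```

The same model shows why a separate index on a alone adds nothing here: the (a, b) list is already sorted by a.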

Nguyen Kim Son • Oct 10 '19

Great article Jason! It's really important to limit the number of technologies used in a project and not follow trends blindly! This is quite a similar analogy to the microservice vs monolith debate.

Scott Watermasysk • Oct 9 '19

100% especially on search. Search in Postgres is very underrated.

Corey Cleary • Oct 9 '19

I wasn't aware of tsquery, thanks for pointing that out! I'm working on a project right now where full-on Elasticsearch is probably overkill, but I needed something more robust than just doing LIKE.

Joe Zack • Oct 10 '19

Thanks for the post, I learned a lot of new things! I'd also like to mention that OLAP style queries can be rough with lots of data.

David Howell • Oct 16 '19

I am also thinking about this. RDBMSs are generally good at most types of workloads, but mixed workloads like OLTP and OLAP on the same system will interfere with each other.

How do people do embedded reporting AND transactions in modern web apps?

NoSQL solutions like ElasticSearch are mentioned but they seem more appropriate for search. Data warehouse solutions like Snowflake, BigQuery, Redshift are good for internal analytics and reporting but they just don’t have the concurrency to support direct queries from customer facing apps.

How else to do this without complex data pipelines or complex infrastructure involving Kafka?

Joe Zack • Oct 16 '19

I do it with a complex data pipeline and infrastructure involving Kafka!

I've got my eye on Apache Druid, though I haven't spent any real time (sorry, pun totally intended) with it.

David Howell • Oct 17 '19

I’ve looked at Druid, and I'm also considering MemSQL, ClickHouse, and others.

Charles Reace • Oct 18 '19

Don't forget Materialized Views: they can be an awesome way to optimize searches on things that you might be tempted to dump into some NoSQL text-based search tool.

Aurel • Oct 21 '19

Thanks for this post,
I wrote about using Postgres to set up a distributed database. Here is the link for those who are interested: dev.to/sh1ftsh/setting-up-distribu...

Ido Shamun • Oct 27 '19

Wow! An awesome review of Postgres's important features.
I can't stress enough how devs sometimes just follow the hype without thinking. When someone suggests a NoSQL data store, I always ask her/him to convince me why SQL doesn't work here.

David Howell • Oct 16 '19

Is “forking” count some special operation or just a nice word in place of swearing?

Regarding (exact) count, pretty much every system has trouble doing this quickly, and in most cases you really don’t need an exact count. This is especially true for medium to large data.

Table/index statistics that are kept up to date will give a good approximation.

HyperLogLog was one option mentioned, which will give good-enough approximations; another approach is log-normal histograms. I don’t think this challenge is unique to PG.
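
For anyone curious how HyperLogLog gets away with so little memory, here is a minimal Python sketch of the idea. It is an illustration under assumptions (SHA-1 truncated to 64 bits as the hash, 4096 registers by default, class and method names of my own choosing) and not the code of any Postgres extension.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter using m = 2**p small registers."""

    def __init__(self, p: int = 12):
        # The alpha constant below is the standard one for m >= 128, so keep p >= 7.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item) -> None:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        width = 64 - self.p
        idx = h >> width                           # first p bits pick a register
        rest = h & ((1 << width) - 1)              # remaining bits
        rank = width - rest.bit_length() + 1       # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # duplicates do not change the registers
print(round(hll.count()))
```

With 4096 registers the typical relative error is around 1.6% (roughly 1.04 / sqrt(m)), while memory stays fixed no matter how many values are added.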

Stephanie Bergamo • Nov 16 '19

Hey, many thanks for this article. I've been working with Postgres for a while now, but I still learned a lot of great stuff here!

I just wanted to add that Postgres also offers HyperLogLog as a data type, just by adding an extension (see this great article for more details.)

Alejandro • Nov 22 '19 • Edited

This article was definitive, it was the last straw. I have been thinking about switching my Java application from MySQL to Postgres, and now I'm in the middle of the process using pgloader and everything has worked as expected.

Yet I have an unresolved question: does it make sense to have a second-level cache in Hibernate when PostgreSQL already has caching? Thanks!

Jason Skowronski • Nov 22 '19

Thanks, I'm glad you enjoyed it! The Hibernate second-level cache lives on your application server, whereas the PostgreSQL cache lives on the database server. This matters because it's faster to retrieve or update data already stored on your application server. It reduces network and database load by removing duplicate queries and batching writes. This is great for applications with many reads and infrequent writes, or cases where eventual consistency on writes is acceptable.
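
As a rough illustration of the application-side idea (a Python sketch, not Hibernate's actual implementation; FakeDatabase and ReadThroughCache are hypothetical names), repeated reads of the same key reach the database only once:

```python
class FakeDatabase:
    """Stand-in for a remote database; counts how many queries reach it."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def fetch(self, key):
        self.queries += 1  # each call models one network round trip
        return self.rows[key]

    def store(self, key, value):
        self.rows[key] = value


class ReadThroughCache:
    """Application-side cache: serve repeated reads from local memory."""

    def __init__(self, db):
        self.db = db
        self.entries = {}

    def get(self, key):
        if key not in self.entries:
            self.entries[key] = self.db.fetch(key)  # miss: go to the database
        return self.entries[key]

    def put(self, key, value):
        self.db.store(key, value)
        self.entries.pop(key, None)  # invalidate so the next read is fresh


db = FakeDatabase({1: "alice"})
cache = ReadThroughCache(db)
for _ in range(3):
    cache.get(1)
print(db.queries)  # number of reads that actually reached the database
```

A write made by another application server would not invalidate this local copy, which is where the eventual-consistency caveat comes from.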