What I learned from only using Select in PostgreSQL

jen chan on April 09, 2018

I was tasked to gather detailed metrics for a project at work and advised to only use select (as a newbie to Postgres and also querying on prod... [Read Full]
 

Couple tips :)

  • Views are just stored queries that get run when you invoke the view. The space and power consumption is negligible unless you create a materialized view which actually stores its results (and therefore has to be manually refreshed).
  • GROUP BY is only necessary if you're actually doing aggregate calculations. If you're just pulling individual records without counting, summing, or otherwise deriving a value from multiple rows, you don't even need to specify it. If you just have duplicate rows you want to filter out, you can SELECT DISTINCT instead.

More generally: relational databases are the Swiss Army knife of data storage. The real question is which use cases they aren't suited for -- and that's where you'll find some other model being used. RDBMSs are overkill for key:value storage, which is why Redis exists; they can't handle planet-scale flat data, which is why you have HBase, Cassandra, and so on; and they're traditionally terrible with hierarchical data, which is how we got MongoDB (although Postgres' JSON support is extremely powerful & lets you blend relational and document models quite effectively these days).

I've used RDBMSs in general for web and desktop applications across my career, and Postgres in particular mostly for web. I maintain a moderately popular data mapper for Postgres and Node, so I get some framework-level development in for fun too.

 

re: Materialized view. Maybe this is what my manager meant.
I could see if you had a huuuggggee database maybe it's not worth the time searching through everything to create a view.
They had said it wasn't worth creating a view unless I were to use it all the time or it would slow everything down.

I suppose I was compelled to use group by as I'm a visual person and seeing all similarities in a table is more satisfying and easy to comprehend than individual records.

Thank you for your comprehensive response by the way. I've not heard v much about Hbase or Cassandra.

 

Think of views as a way to cut down on copy+paste. There's no point in creating them for one-off queries, but if you have a few:

SELECT c.name, o.id, p.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_products op ON op.order_id = o.id
JOIN products p ON p.id = op.product_id;
SELECT p.id, p.name, COUNT(DISTINCT o.country) AS countries_shipped_to
FROM products p
JOIN order_products op ON op.product_id = p.id
JOIN orders o ON o.id = op.order_id
GROUP BY p.id, p.name
ORDER BY countries_shipped_to DESC;

Both of these queries use the same orders-order_products-products information. They do it in a different order since the first is starting from customers and the second from products itself, but it's the same data. Instead of joining those three tables, you could create a view with the query:

SELECT o.*, p.*
FROM orders o
JOIN order_products op ON op.order_id = o.id
JOIN products p ON p.id = op.product_id;

Then you can use the view in your queries and suddenly they're a good bit simpler -- the complexity of the order-to-product relationship is encapsulated in the view, so your queries don't have to manage it themselves.

 

The "relational" in a relational database is just talking about the relations inside a table, i.e. the fields which all belong to a table form a relationship. What you're talking about with regards to other tables are foreign keys.

I think that's something which is quite confusing to people coming to them for the first time!

The words used to describe Mongo always confuse me. I have to figure out what a Document is and what it would be like in a different system and then remind myself again that they're not really the same.

 

YES. You say that more accurately than I did. I would like to update my article with this definition. I was prone to mixing up foreign key names with primary keys. In my head the former exists as the label in a table; the latter is the name of something unique in a table, locally... and this confusion happens when the names are different. (i.e bread_id as a foreign key of a oven table vs bread.id in the original table... otherwise just id local to the bread table.)

Starting from mongo with as a scatterbrain: my logic was just shove something in something. And I had only used it on small projects so I think the grievance comes in having to type long queries to find something instead of syntactic SELECT ___ FROM ___ commands, which make much more sense now that I've used it.

 

Regarding joins, you will primarily use 2: left outer join & inner join. A distant third is outer join. (BTW, the only difference between left outer & right outer are more in the head of the query's author than in any physical sense. IMHO, anyone who writes from left-to-right will probably only ever use left outer join.)

And advice with aliases: Keep them short yet obvious. Too many people use single letters for aliases, which are OK, but can quickly lose meaning--especially between queries. Aiming for 3-5 letter aliases are still reasonable to type yet are still descriptive enough for you when you come back to a query after a long time or hand over the query to someone else. And most importantly, be consistent with your aliases! So for your employee table, don't use e one time & emp on another query. (Yes, it's allowed but it creates confusion later.)

code of conduct - report abuse