- Index nearly everything, index early. Not indexing results in poor performances. Indexing too late can create downtime or inconsistent data. Do not consider removing/avoiding index to preserve disk space, buy more space instead. Remember that index may have an impact on write performance though.
PostgreSQL is your best choice, it has a skilled community, including StackOverflow heroes. There is no MySQL, MariaDB, or SQLite user group where I live.
ORM are great for simple CRUD operation. Don't use it to query specific stats or any complex query, use raw queries. BTW, for TS/JS devs, Prisma 3 is cool, Sequelize is obsolete.
SQL syntax is case insensitive, use lower case everywhere (and underscores), do not name your tables with camelCase. ALSO, YOU DON'T HAVE TO YELL.
select id from users
is the same asSELECT id FROM users
. For string comparison, case sensitivity depends on collation.Never miss a foreign key. Implicit links and relations between tables should be avoided.
Follow a pattern when naming tables and columns. Example:
users
,posts
,comments
(plural).post_user
,comment_post
,comment_user
for junction tables (singular + alphabetical order).id
for primary key,id_user
,id_comment
, andid_post
(singular + id first) for foreign key. Reading a database schema is hard for newcomers. Be super consistent when naming tables.1 model ≠ 1 table. Unfortunately, when designing a database you have to think at a lower level than "models" and "objects". You will have to deal with technical tables.
Use the right data type. Do not create "custom" data types defined by an implicit contract.
JSONB is great but don't overuse it. Also, learn to query it it's complex yet powerful. When you iterate a lot on designing a schema, you can use JSONB to avoid modifying the schema over and over.
PostgreSQL does not scale automatically. Expect migrations and downtime.
CTE is more readable than subqueries, you can read it from the top to the bottom. Also, you can understand the way it was designed.
Keep tables schema simple, avoid patterns such as EAV that makes tables hard to understand and read; try to keep most tables as simple as vertical CSV sheets, with one line for each record.
Don't be afraid of writing long and verbose queries to address a specific need, it's often more efficient.
The order of the words in a query is not the order in which they are evaluated. Know the order of execution: from, where, group by, having, order by, select, limit.
120 tables are way too much, even for big products. You can build a robust application with 25 tables.
Thank you for reading 🙏
Top comments (31)
Don't go nuts with this. Too many indexes can slow down writes.
Also:
Don't forget you can cast certain types to others with a double colon (
::
). Ex:Understand that sometimes anti-join patterns using EXISTS or NOT EXISTS with a subquery can sometime execute faster that a LEFT/RIGHT JOIN where col is null.
-- 40-year DB developer
Wow! Thank you so much for your answer! Your comment is actually a gold mine and contains original content and really useful thoughts! It should be an actual article by itself.
I agree with all points. The last point in particular saved my dev life more than once: it could be considered a detail but actually, it is a game-changer! (and it's not well-known)
Let me know if you write an article with your contents, I will be glad to share it (or review it before publication if you want)! It could be a "SQL tips for
beginnersintermediate" :DThank you!
Being almost a newbie in SQL, I don’t understand the subtility here… can you tell a bit more what you mean ?
Thank you for your feedback! The most obvious case is junction tables (n..m relationship), i.e:
users
have ngroups
,groups
contain musers
, then you need agroup_user
table (a.k.a a junction table). This kind of table represents a relation between two models, not a model itself.Aah. This threw me off thinking you mean one model ought not to be constrained in 1 table. What you really meant is every table isn't necessarily a representation of a model.
In regards to technical tables when using models with an ORM, do you mean helper tables? Like if there could be a relationship between 2 entities of the same model (i.e. a forum or chat app, 2 users having a friend relationship). Or perhaps other tables that contain additional details related to a particular model?
If not, could you elaborate on that a bit, or suggest a source I should look into myself?
A good read. I'm currently working on a project where I am creating models and have implemented one of the examples I mentioned above (a friends helper table with a primary key made up of the foreign keys from each user, and a date created field to represent friends in a chat app).
I am constantly amazed by how much I have learned, in contrast with how much I'll never be able to learn. There's just so many facets to development and related disciplines.
Thank you for your feedback! 🙏
That's it, I mean both!
The most obvious case is pivot tables (n..m relationship), i.e:
users
have ngroups
,groups
contain musers
, then you need agroup_user
table (a.k.a a pivot table). This kind of table represents a relation between two models, not a model itself.There can also be 1..1 tables (a subset of properties of a model split in another table). Or shared properties between two models that does not need to be represented themselves (maybe
addresses
forusers
andcompanies
, when address is not considered a model in application, just a property).Thanks for sharing! That's dizzy and that's what I love in dev! I still consider I'm a beginner (let's say intermediate-beginner) in SQL and I'm always amazed by thing I discover via other peoples' work.
Thank you for the post. I need to go and have another look at ctes.
Thank you for your feedback. Yes, CTE is a game changer in writing queries. I sometimes have 4 or 5 parts broken down before the actual query, and it's still easy to read. Side note, I worked with a junior colleague who easily wrote super complex query in a readable way thanks to this syntax.
I feel like the example is documentation is good enough to highlight the benefits: postgresql.org/docs/current/querie...
Also, this course helped me: masterywithsql.com/ (still it's not free)
Thanks for mentioning it. Before i had the false assumption, that CTES are only usefull if i wanted to do a recursive query.
This could be just a personal quirk but I'd really prefer SQL keywords and functions to be uppercase, while table and column names as lowercase as that help me to easily distinguish the boilerplate from actual data
I always used spacing for that. I find that it's easier (for me) to keep case changes to a minimum.
I prefer writing statements in this fashion:
This way I always have my actions on the left and my columns, values, expressions on the right.
But that's just me, possibly. I find it neat and clean and easier to focus on instead of case changes confusing my attention.
Great article and I agree with most of your points.
SQL syntax is case insensitive
This depends on Collation. Some are case sensitive also can also affect where nulls appear in an index
+1 for choose a naming convention and stick to it. I prefer singular table names and table.id for primary key and table.othertable_id for foreign key but I am database centric. If you use an ORM then puralusied table names (to match the model) and table.table_id makes auto-linking easier
15 tables for an application? Depends on the app. Make sure you follow 3NF and use as many tables as necessary to model the business problems. Yes avoid EAV tables but also avoid sparsely populated columns. Use a core table and extension tables (1:0..1 mapping) for additional data that is not required in many cases. E.g. main table is product. Product_wine has same key and strength, terroire vineyard, releaseDate, dryness, ABV. Product_book has same key and author, pageCount, publisher
Thank you for your detailed and useful answer!!
Most of my points are opinions, still I should have been less assertive and mention this. I will edit my article thanks to your comment for things that are not clear or should be corrected. About the "case insensitive syntax" I wanted to talk about
select
,Select
,SELECT
and tables names, not text comparison. As far as I know, collation has only impact for strings not the syntax of the query itself, but maybe I missed something.yes key words are case insensitive. As for shouty-case I am OCD about code layout and for me readablity is improved by keywords and built in functions being UPPERCASE along with proper indentation of fields in the SELECT and tables in the WHERE...JOIN clauses.
That's a valid point, thank you for sharing! I was not aware of this issue (people sometimes ignore issues when they are not concerned), this should be considered indeed! Out of curiosity, how do you deal with language that does not encourage shouting? Is syntax highlighting helping in some way?
Again you pointed out an opinion: I feel like it's harder to READ TEXT IN UPPERCASE, but it may be only a matter of taste and personal comfort. I was ignoring real issues.
I should dig into why in the first place uppercase was used for SQL. There should be a reason!
Again thanks you for your feedback!
SQL is a 'very' old language (early '70s if I'm not mistaken). Terminals at the time only used capital letters. The manuals reflected that and it became a historical norm. It was never a choice nor mandatory to have them capitalized.
Thank you so much for the sql tips specifically on the PostgreSQL ones. I recently did some research and wrote an article to compare PostgreSQL and MySQL syntax. Sharing here to further improve knowledge.
tipseason.com/postgres-vs-mysql-sy...
Hi @tipseason! Thank you for your feedback and the link to the article! (bookmarked ✨)
Thanks Raphael.
Let me correct some of these,
Thank you for your feedback! (for first point, I guess case is also a matter of opinions: dev.to/redhap/comment/203hn)
Yes it is. Everyone have own style of formatting and preferences :)
Agree with all points with a small note on "Don't be afraid of writing long and verbose queries to address a specific need, it's often more efficient.": in my experience, having small one-thing-per-query SELECTs in the code leads to much more robust performance at scale as caching is more explicit and leads to less stress on DB overall. My 2 cents.
Thank you for your feedback!
We may have a different use case! Maybe my use case is more for things like specific stats (let say when a kind of a big introspection in database is required to only display a chart by month for the last 6 months). I agree with you for other cases: sometimes it's better to write a first query, then pass the resulting ids to a second query.
I have to say that I don't agree with many of your points.
Indexing is a subtle operation that needs understanding of what and why one does it. The space and write performance costs are important enough to invalidate your assumption on indexing everything.
I would rather invite database designers to separate the tables that require reading from those that are heavily written in and construct some mechanism for synchronizing them. Indexing for reading is very different from indexing for writing and table structure is very different for reading and writing as well.
Also... PostgreSQL being the best. What trolling :) I actually loved PostgreSQL 20 years ago, but then I've worked on SQL Server almost exclusively. It's amazing how much stuff they have added every year to it. I expect most other engines have had a similar spectacular growth. Again, here I would say that the engine should fit your requirements.
About SQL being case insensitive in keywords or the styling... I find that irrelevant. There are tools that can change the SQL code style automatically. Basically you can write whatever you want and then shape it to fit... your company's code style. I know people who come into a company where everyone writes in one way and they stubbornly write in another way because "this is the way". They are just annoying, not right.
I do agree with ORMs being a shortcut to simple things and when you want something done right or specific you need to "get your hands dirty".
I personally would go even so far as to say ORMs are harmful. Because even if one follows coding guidelines and abstract the DAL correctly, maintaining code that handles data in multiple ways is difficult and messy. If ORMs are good only for simple things, that means that SQL queries for the same effect would be simple. Therefore there is no need to use one.
Thanks for sharing your viewpoint and inviting a discussion on them.
Thank you for your detailled feedback! I'm glad you don't agree with most of my points since it helps me reviewing my thoughts, so thank you!
As a disclaimer, this article has (at least) two big flaws:
Here is a try to answer all your points!
You are right! This is the most posted comment I received both here and on reddit, so I guess I'm wrong. I wrote this because beginners (in my experience) often forget to index things, so it was something like "hey, don't forget to index obvious things you query all the day".
Yes, sorry about that. I should have write: "as a beginner who don't know what SQL DBMS to choose, PostgreSQL could be a good candidate".
I have no strong opinion on this. When I use Datagrip, and when columns and table names have upper case, it does not auto-correct, so it's annoying. Your point is valid: a new dev that would come and says "let's change all our habits because mine are better" would be ... uh... annoying I guess!
That's an interesting point!
Again, thank you for your feedback, I hope readers will see it too. BTW, I recommand the feedback by @redhap : dev.to/redhap/comment/203hk which is like yours: a deeper questioning about SQL querying than mine!
Thanks for writing this post. I agree with almost all of the statements, except 3rd one. Could you explain in more detail, why using an ORM are not recommended in big projects? In my opinion, it helps to save code clean and readable, even helps to reuse redundant code compared to raw SQL queries.
It's not for "big projects", it's for complex queries. It's hard to write complex query with query builder, and an optimized query is really sometimes 1K more rapid than 4 queries made to the ORM via its objects or a weird query built by the ORM via its query builder. You should still use your ORM on big projects, but for simple CRUD operations:
Users.findById(123)
is fine!Great article, agree on many things!
Regarding indexing I would be at bit wary though when to add them. Most DB's has queryplans and such to determine when to use an index, guess it is based on rowcount and cardinality and other things.
Also composite indexes are worth mentioning,
i.e an index of (columnA, columnB) might be used for
WHERE columnA = value AND columnB = value
and also:
WHERE columnA = value
Thank you for your feedback.
I guess you are right about index (you are not alone, many comments are similar, here and on reddit). Thank you for the precision about composite indexes!