Database naming conventions, despite SQL having been around for decades, are ad hoc. When moving from project to project, you often have to learn their particular quirks about database design and naming. This is because most developers don't understand databases. In fact, with my years of hiring developers, only a handful of times have I met developers who can properly normalize a database, even amongst those with strong SQL skills.
The database, for many developers, is an afterthought. It's just some place to save your data until the important (*cough*) work is done.
But this article isn't about database normalization. If you want to learn more, here's a short talk I gave which explains the basics.
Instead, when you have a working database, the question we want to know is "what standards can we apply which make it easier to use that database?" If these standards were to be widely adopted, databases would be easier to use because you wouldn't have to learn and remember a new set of standards every time you work with a new database.
Let's get this out of the way quickly. I routinely see database examples online where we see table names like
customer_orders. Which should you use? You probably want to use whatever standard is already in place, but if you're creating a new database, I recommend
using_undercores for accessibility. The words "under value" have a different meaning from the word "undervalue", but the former, with underscores, is always
under_value, while the latter is
undervalue. With CamelCase, it's
UnderValue which, since SQL is case-insensitive, are identical. Thus, underscores help reduce ambiguity.
Underscores also improve accessibility. If you have vision problems and are constantly playing around with fonts and sizes to distinguish words, the underscores are much easier to read. And CamelCase is anecdotally harder to read for people for whom English isn't their first language.
That being said, this is a personal preference and not a strong recommendation. Many people love their CamelCase and this quickly becomes a "tabs versus spaces" debate (e.g., a waste of time).
Plural or Singular Tables?
There's long been a huge debate amongst experts in database theory about whether or not database tables should be singular (
customer) or plural (
customers). Without going into the theory, let me cut the Gordian Knot with a healthy dose of pragmatism: plural table names are less likely to conflict with reserved keywords.
Do you have users? SQL has a
user reserved word. Do you want a table of constraints?
constraint is a reserved word. Is
audit a reserved word but you want an
audit table? By simply using the plural form of nouns, most reserved words won't cause you grief when you're writing SQL. I've even tripped up PostgreSQL, which has an excellent SQL parser, when I've had a
user table and later, an ORM automatically aliased that to
user (yes, the same name) and PostgreSQL got confused.
Just use plural names and you're far less likely to have a collision.
Don't name the primary key column
This is a sin I've been guilty of for years. When working with a client in Paris, I had a DBA complain when I named my id columns
id. I, a mere developer, thought our DBA—who was a very good DBA—was just being pedantic. After all, the
customers.id column is unambiguous, but
customers.customer_id is repeating information.
And later I had to debug the following from the Tau Station MMORPG:
SELECT thread.* FROM email thread JOIN email selected ON selected.id = thread.id JOIN character recipient ON recipient.id = thread.recipient_id JOIN station_area sa ON sa.id = recipient.id JOIN station st ON st.id = sa.id JOIN star origin ON origin.id = thread.id JOIN star destination ON destination.id = st.id LEFT JOIN route ON ( route.from_id = origin.id AND route.to_id = destination.id ) WHERE selected.id = ? AND ( thread.sender_id = ? OR ( thread.recipient_id = ? AND ( origin.id = destination.id OR ( route.distance IS NOT NULL AND now() >= thread.datesent + ( route.distance * interval '30 seconds' ) )))) ORDER BY datesent ASC, thread.parent_id ASC
(As an aside, yes, I really am that careful about my SQL formatting)
Do you see the problem? If the SQL had used full id names, such as
station_id, the bugs would have stood out like a sore thumb while I was typing out this SQL, not later when I was trying to figure out both what I did wrong and why I don't drink as much as I should.
And by request of a few people who couldn't see the errors, here's the SQL after it's corrected. It's very clear that
station_area_id are probably not valid comparisons. If SQL had a decent type system, this SQL would not even have compiled.
SELECT thread.* FROM email thread JOIN email selected ON selected.email_id = thread.email_id JOIN character recipient ON recipient.character_id = thread.recipient_id -- station_area_id = character_id is probably wrong JOIN station_area sa ON sa.station_area_id = recipient.character_id -- station_id = station_area_id is probably wrong JOIN station st ON st.station_id = sa.station_area_id -- star_id = email_id is probably wrong JOIN star origin ON origin.star_id = thread.email_id JOIN star destination ON destination.star_id = st.star_id LEFT JOIN route ON ( route.from_id = origin.star_id AND route.to_id = destination.star_id ) WHERE selected.email_id = ? AND ( thread.sender_id = ? OR ( thread.recipient_id = ? AND ( origin.star_id = destination.star_id OR ( route.distance IS NOT NULL AND now() >= thread.datesent + ( route.distance * interval '30 seconds' ) )))) ORDER BY datesent ASC, thread.parent_id ASC
Do yourself a favor and use full names for IDs. You can thank me later. Your DBAs will thank you now.
As much as possible, name columns very descriptively. For example, a
temperature column doesn't make sense for this:
SELECT name, 'too cold' FROM areas WHERE temperature < 32;
I live in France and for anyone here, 32 would be "too hot". Instead, name that column
SELECT name, 'too cold' FROM areas WHERE fahrenheit < 32;
Now it's unambiguous.
Also, when you have foreign key constraints, you should name the columns on each side of the constraint identically, if possible. For example, consider this perfectly reasonable, sane, SQL.
SELECT * FROM some_table s JOIN some_other_table o ON o.owner = s.person_id;
That looks fine. There's really nothing wrong with it. But when you consult the table definition, you discover that
some_other_table.owner has a foreign key constraint against
companies.company_id. That SQL is, in fact, wrong. Had you used identical names:
SELECT * FROM some_table s JOIN some_other_table o ON o.company_id = s.person_id;
Again, we've avoided ambiguity. It's immediately clear that we have a bug and you can see it on a single line of code and don't have to go consult the table definition.
However, it should be noted that this isn't always possible. If you have a table with a source warehouse and a destination warehouse, you might want a
source_id and a
destination_id to compare with your
warehouse_id. Naming them
destination_warehouse_id will make this easier to follow.
It should also be noted that in the example above,
owner is more descriptive of the intent than
company_id. If you feel this is likely to cause confusion, you can name the column
owning_company_id. That can still embed the meaning of the column in the name while giving you a strong hint as to its intent.
Naming standards are important because it helps keep your code consistent and predictable. Anything violating that predictability tends to be suspect. However, well-chosen naming standards can make it easier to understand your code and, if done well, can often make it possible to see a error in a single line of code.
- Table names should be plural
- Spell out id fields (
- Don't use ambiguous column names
- When possible, name foreign key columns the same as the columns they refer to
While not perfect, the above database naming conventions will make your database world a better place.
Top comments (19)
Srry to be a pain, but the part with the primary key nit being id but tablename_id ... Well in school and a lot of books the teory sais its ok id,
Maybe what you presented was a local situation, but i don't see the ambiguity when you write the query, cuz you use the alias.id ... And tablename_id normally is a foreign key in other tables.
But as i sayed i think its a matter of situations.
I for one am with team just id as a primary key and tablebame_id as a foreign key when used in other tables.
When I first published this article a long time ago, people complained that they needed to see the second version of the
idquery because they didn't see what was wrong with the first version. After all,
origin.id = thread.idlooks fine.
In the second version, because we use explicit names for the ids, you can immediately see that
origin.star_id = thread.email_idis wrong. That's the point of this: if your naming conventions let you immediately see that a mistake is made, that's a huge win.
I've been a SQL developer for 30 years. And this is SO wrong.... Table.id is the PK (or at least the unique constraint) on this table. Table.other_id is the Foreign key to the related table. Origin.id = email.id immediately stands out as being incorrect because at least one side should reference the FK field unless one of them is a straight 1:1 extension table.
Table names should be singular as each row represents a single occurrence of the modelled entity, for the same reason your class names are singular. If you are going to conflict with reserved words, pick better table names: AppUser, BillingConstraint, ShippingTable.
I personally don't like underscored table names it makes code scanning harder because fields do tend to be underscored as are system configurations.
Right at the top you said SQL databases are case insensitive. This depends on the collation settings for the instance, database and table. Case sensitive collations are a nightmare!
While I've laid out my reasoning and you should feel free to disagree. I point out my id naming reasons, you don't like them, that's fine.
As for table names being plural, as mentioned, many experts disagree strongly on this. Joe Celko, world-famous database expert and one of the authors of the SQL-89 and SQL-92 standards, argues in SQL for Smarties, that table names should be plural because they represent collections, not individual entities.
As for case-insensitivity, section 5.1 of the SQL-92 standard states (where the angle-bracketed words refer to a previously defined grammar):
SQL the language (not the data it manipulates) has been case-insensitive for decades.
I'll grant that there is no one perfect answer but I felt it was important to put across a different set of arguments. The most important thing is to be consistent with whatever naming convention you decide to use. The ANSI standard SQL-92 - The clue is in the title. That standard will be 30 years old next year, the world has moved on. Each vendor has proprietary extensions to the standard, my preferred flavour is MSSQL which allows you to define the collation rules for your database which includes whether the database and the data it contains is case sensitive, whether it uses the standard ASCIII character set or allows for 'foreign' characters such as Cyrillic, turkish or accented characters, whether Upper and lower case characters should be treated differently and whether NULL appears at the start or end of the index. This may not be standard, but it is 'real world'.
Please don't quote 'World Renowned' Joe Celko: I have had enough run-ins with him on SQLLServerCentral over the years to know that he is an absolutist and simply cannot accept that data is impure and that applications and their databases are not always (usually) developed by people with Doctorates in data management and who have the full picture of what the end solution will need to look like in 10 years time when the business needs are not well defined and you don't have an infinite amount of time. Joe has some good points and his books are comprehensive but on a personality basis he is way of the mark with understanding how to help people sove the problems they find themselves in. His attitude is - well you shouldn't have gotten there in the first place...
I know I am late to the party, but I am going to jump in here. I understand that it is common practice to name the PK, ID. But I have always disagreed with this. Reason being I don't like to spend time researching stuff that should be plainly obvious by reading the code. Here is an example:
WHERE tblUser.ID = tblCar.ID
Obviously, there is a problem here, we are looking at two PK fields, which shouldn't happen except in special cases. But which one is wrong? Unless I coded this, I can't 100% know which side is wrong. The policy may be that the PK is always on the left side, but who is to say that Joe Blow actually did that. We already know he messed up the criteria, so maybe when he messed up he did it all the way wrong and put the actual key he wanted to use on the wrong side. The bottom line is I can't actually know that without doing research on the table designs. This is a complete waste of time. Given that we know the intent, what is supposed to be accomplished here, helps. It may not give us the full picture. and for giggles, lets ay the intent is to see which mechanic is assigned to the car. Now we have a little light, but it is still a mess, Perhaps this is supposed to be the PK tblCar.ID to tblUser.CarID. After all, one mechanic can be assigned to fix many cars in one day. Or maybe, it is supposed to be Pk tblUser.ID to tblCar.UserID, after all, many mechanics can be assigned to fix a car if it is tough job. See my point? These situations are rare, but as programmers shouldn't we prepare of all issues as opposed to ommitting the rare cases?
WHERE tblUser.UserID = tblCar.ID
This is clear, this states plainly that the userID field is the PK field. If our policy is to always name Key fields, Pk and Fk respectively, we know the issue is in the design of the tblCar as there is missing a name for the key field. Perhaps this field is a PK, or maybe someone didn't name the FK. The result though, is we know EXACTLY where to find the issue from just reading the code. In this case, we open the design of tblCar and name this field appropriately. If it turns out this field is the wrong field for our criteria, then good for us, we just fixed two issues by finding one.
OK. Lets unpack your comment as the arguments are common and as always - Make a decision and stick to it.
Firstly - don't name for tables tblUser and tblCar. table is the default object in the database. Yes User is probably a reserved word so find a better table name (like systemUser or employee). Also when you need to 'replace' the table with a view, for example if the fields get split into two separeate tables, to prevent changing ALL your code you will now need to create the view (or a synonym) called tblUser which is doubly confusing because it now isn't a table.....
A.ID = B.ID is patently wrong in the vast majority of cases. It would only really be valid if you have aliased a field as part of a view or CTE or table B is an extension table to A and is a 1:1 relationship so it is an instant code-smell and that makes it easy for a seasoned developer to spot (so it is a good thinig...) Yes you have to interrogate the code to find the intention but you are doing that anyway.
In you example you suggested that the User table in this context is the mechanic. If this was the case then I would be aliasing the tblUser and (tbl)Mechanic (I wouldn't use the tbl - but each to their own) and then presumably the tblCar should have a field MechanicID or MechanicUserID. Under your proposal the JOIN would look like
tblMechanic.UserID = Car.MechanicID
Under my proposal it would be
tblMechanic.ID = Car.MechanicID Which to me is much clearer and confirms that we are joining a PK to an FK. Your way requires more intimate knowlege of the database to be sure that UserID is the PK on the aliased table.
This becomes especially important if the tblUser is going to be used multiple times in the query (Mechanic, Salesman, Valet, Delivery driver, Service receptionist etc).
That is a many to many relationship and therefore should be modelled appropriately in the database with a bridging table (e.g. ServiceTeam)
ID INT IDENTITY(1,1), /*yes I know....*/
Car_ID INT NOT NULL, /*FK to Car*/
Mechanic_ID NOT NULL /*FK to User in the context of the role of Mechanic*/
(alternately as the same person may take on many roles...)
Role_ID /*FK to job role to identify Mechanic, Valet, Driver, Receptionist*/
User_ID /**FK to User table./
Feel free to respond as these discussions help junior developer to avoid pitfalls later on in the project.
Your example is already wrong, wrong wrong. origin.id = thread.id you should immediately know there is something wrong as it both points to primary keys in different tables, so you do know something is wrong already.
And with origin.star_id = thread.email_id, yeah you immediately see something is wrong, but it's wrong due to 2 things (at least if we're going from the notion at least one of the two is the primary key), so it should be origin.star_id = thread.star_id or origin.email_id = thread.email_id or origin.origin_id = thread.origin_id or origin.id = thread.origin_id (and so on).
Sorry about this but, from this i see that the problem us more of ambiguous aliases than column names.
But as i saud is a matter of opinion and project.
I really like this article, I've used pretty much the same conventions, mostly because this is what it was taught to me, and would say some things are common sense, that common sense that sometimes we miss when working on a project(s). (I'm guilty of the
idusage instead of the descriptive one but changed it on a later project, and it definitely helps)
At some point, I was feeling that naming columns shouldn't be wordy but realized it's better to be descriptive and simple if possible than trying to oversimplify things unnecessarily.
In programming, we often suggest that long variable names be used for variables that have a wider scope. It's important that those be descriptive. It's even more important, in my opinion, for databases, because many people will be staring at those names over and over and over again. They tend to persist even longer than that library I wrote last week. Since databases are often the foundation of apps, I try hard to put more effort into making them as understandable as possible.
My god, it seems that every single point you set as THE naming convention is exactly opposite as what I think is a good naming convention.
*Table names should be plural
Nop, use singular as every record is a single object, not multiple, and the more likely to be the same as a keyword is bullocks ofcourse, as it would also assert to fieldnames
*Spell out id fields (item_id instead of id)
This is certainly a topic that many people disagree on, many people think every table should have an Id field and only the reference from a foreign table should have the Id. Especially if you want to make things more generic. And ofcourse there are also a lot of people who think you should have the fullname in both tables.
Personally at the moment I opt for the Id variant as it makes a lot much easier.
*Don't use ambiguous column names
Ok, I give you this, but to me this is more due to logic then anything else.
"While not perfect, the above database naming conventions will make your database world a better place"
Well, that's all in the eye of the beholder, as many people have different idea's about what makes it a better place.
"Use underscore_names instead of CamelCase Nope, Use CamelCase as it reads more fluent and conforms better to the actual code it's being used."
I guess if the only language you use is C# then this makes sense. Otherwise why are you tying naming to the programming language in use? It should be totally agnostic. This is just bad practice in the first place.
It's got nothing to do with the language I'm using, I'm using even multiple languages. But it's got all to do with what I think reads better and looks more clean, and even less typing. everything lowercase and using underscores is like writing ancient english to me, it's just ugly.
And calling lowercase/underscore best practice, is all in the eye of the beholder. Most people I work with use the CamelCasing with the database designs now.
In reality, in these cases there is no real best practice, it's just what you prefer.
"conforms better to the actual code being used" directly implies the language you're using because it's not true in all cases (that it conforms) by a long shot.
I've never in my life seen that kind of casing in database design, like ever. Not sure who you're working with or where but yeah, I think this ain't the hill to die on.
Not sure where the ideas some of the commenters have regarding singular naming and camel case are coming from, but I strongly STRONGLY disagree. Plural table names imply that you're storing a collection of objects. Singular implies the opposite. Can you imagine if in code we called a collection "user" instead of "users"?
As for PascalCase or camelCase naming...this is just ugly.
The argument that it better represents what folks are coding for falls apart easy. Java uses camelCase, as does OO PHP. C, Python, and Ruby use snake_case. C# uses PascalCase. Go uses both camelCase and PascalCase depending on what you're doing. All those considered, when it comes to database columns, underscores make the most sense when we really need to focus on ultimate readability of the name in a data context. (Don't apply this to the programming language being used!!!). Heck most ORMs will convert this automatically in the first place. If anything, using the _ convention, as it's so common, is only going to help coding efforts. Take Laravel's Eloquent ORM for example It already knows that if you have underscores that it can turn those in to camelCase properties when you access the generated object. It's just such a bad argument that you tend to see from crotchety old DBAs that were working since before best practices were even an idea in tech.
I'm 100% with this article. The only disagreement I have is on the naming of the id column. This isn't one I've come across before, but here is why I disagree: It actually increases ambiguity rather than decreases it. If I see a tablename_id pattern, I'm going to assume a 1:1 relationship, not a PK. You could have a self referential column which, while I can't stand the idea, happens in legacy databases or in newer databases created by novice engineers. I get what you're saying, but I think this is where aliases really shine.
Full disclosure: I'm not a DBA, just an engineer that has to work with them. I can tell you right now which I'd prefer, as would all the engineers I have ever worked with.
FYI, first link is dead
Thank you, Tib. Fixed!
very helpful! 👍🏻