DEV Community

Basti Ortiz

A Grammar-Based Naming Convention

I recently read an article from @rachelsoderberg about what it means to write good variable names. In her article, she discusses the many strategies and considerations involved in writing descriptive variable names.

It was definitely a great read, but once I had finished, I suddenly realized how truly difficult it is to accomplish this seemingly mundane task in programming. As programmers, we frequently struggle to name our variables, not because it is inherently difficult per se, but because we have to ensure that the names we choose are clear, descriptive, and maintainable enough for the next person reading our code (which may or may not be ourselves).

To save myself some time and effort, I use a simple naming convention for all my projects. Today, I wish to share it with the community so that we can all spend less time thinking of variable names.

NOTE: The code examples I use in the article are written in JavaScript, but they apply to any programming language since they're just naming conventions after all.

Basic Rules

All variables, functions, parameters, and identifiers are written in camelCase unless you're a Rustacean. Constants are written in SCREAMING_CASE. It is important to make this distinction so that we can tell which variables are immutable and read-only by nature and design.

In programming languages where immutable variables are strongly encouraged (or even enforced), we have to make the distinction between immutable variables and true constants.

Any static value that does not depend on runtime variabilities (such as user input and other dynamic values) can be classified as a true constant. For example, the value of PI is considered to be a true constant, therefore it has to be written in SCREAMING_CASE. Otherwise, camelCase is used to denote mutable and immutable variables that store temporaries, aliases, calculations, and the output of a runtime variability.



// Immutable Variables
const userInput = document.getElementsByTagName('input')[0].value;
const hasDevto = /dev\.to/.test(userInput);

// True Constants
const WEBSITE_NAME = 'dev.to';
const TAU = 2 * Math.PI;



It should be noted, though, that context matters. The criteria for the distinction between immutable variables and true constants can change depending on the situation. For example, one may use SCREAMING_CASE for userInput if they were to treat it as a static value throughout the entire program (even if it may vary per runtime on different devices). At the end of the day, it is up to us as programmers to discern which variables we wish to communicate as immutable variables or true constants.
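
As a rough sketch of that distinction, the snippet below separates a true constant from values derived at runtime. The names `SUPPORTED_LOCALES` and `resolveLocale` are hypothetical and only serve as illustration. Note that `const` alone only prevents reassignment, so `Object.freeze` is used here (as one possible choice) to keep the constant's contents read-only as well.

```javascript
// True constant: known before the program runs, never derived from input.
// Object.freeze makes the array (shallowly) read-only, since `const`
// by itself only prevents the binding from being reassigned.
const SUPPORTED_LOCALES = Object.freeze([ 'en', 'fil', 'hu' ]);

function resolveLocale(requestedLocale) {
  // Immutable variables: assigned once, but dependent on runtime input.
  const normalizedLocale = requestedLocale.trim().toLowerCase();
  const isSupported = SUPPORTED_LOCALES.includes(normalizedLocale);

  // Fall back to the first supported locale (a hypothetical default).
  return isSupported ? normalizedLocale : SUPPORTED_LOCALES[0];
}
```

Here, the casing alone tells the reader that `SUPPORTED_LOCALES` is fixed by design, while `normalizedLocale` and `isSupported` merely happen not to be reassigned.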

Semantic Data Types

Data types communicate what methods and operations can be performed on some variable. It is thus in our best interest to name our variables with a type system in mind, especially for weakly typed languages. Doing so will help us imply what data type a variable may have and its respective methods, properties, and operations. In turn, this leads to more readable code.

Numbers, Strings, and Objects

In most cases, numbers, strings, and individual objects are named with the most appropriate singular noun.



const usernameInputField = document.getElementById('username-field');
const username = usernameInputField.value;
const hypotenuse = Math.sqrt(a**2 + b**2);
const profileData = {
  name: 'Presto',
  type: 'Dog'
};



Booleans

The names for booleans are usually in the form of a yes-or-no question, as if we are personally asking the boolean variable itself about its state.



// Yes-or-no questions
const isDog = true;
const hasJavaScriptEnabled = false;
const canSupportSafari = false;
const isAdmin = false;
const hasPremium = true;

// Functions or methods that return booleans
// are also named in a similar fashion
function isOdd(num) { return Boolean(num % 2); }



Arrays and Collections

Arrays and other collection-like data structures (such as Map and Set) are named with the most appropriate plural noun in camelCase. If the plural and singular forms of the noun seem too similar, we can substitute an appropriate collective noun for the plural form. That way, the corresponding singular form of these nouns can be used as variable names during iteration.



// We use plural or collective nouns for arrays.
const dogs = [ 'Presto', 'Lucky', 'Sparkles' ];

// We can use the singular form of the
// variable name of the array
// in callback functions.
dogs.forEach(dog => console.log(dog));

// We can also use it in `for...of` loops.
for (const dog of dogs)
  console.log(dog);

// Here, we can use collective nouns
// for better readability.
const herdOfCows = [ 'Bessie', 'Bertha', 'Boris' ];
herdOfCows.forEach(cow => console.log(cow));
for (const cow of herdOfCows)
  console.log(cow);


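The same idea can be sketched for Set and Map, which the examples above do not cover. All names and values below are hypothetical, and the `dogsByOwner` pattern (plural noun + "By" + key) is just one assumed way to name a Map, not a rule from this convention.

```javascript
// A Set gets a plural noun; each element gets the singular form.
const breeds = new Set([ 'Beagle', 'Poodle', 'Beagle' ]);
const breedLabels = [];
for (const breed of breeds)
  breedLabels.push(breed.toUpperCase());

// A Map gets a plural noun as well; destructuring each entry
// lets both the key and the value take singular names.
const dogsByOwner = new Map([
  [ 'Alice', 'Presto' ],
  [ 'Bob', 'Lucky' ],
]);
const introductions = [];
for (const [ owner, dog ] of dogsByOwner)
  introductions.push(`${owner} owns ${dog}`);
```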

Functions

Functions are written with the intent to associate them with actions. This is why they are usually named as a combination of two parts: a transitive verb and a direct object. In other words, the names for functions are usually in the form of verb + noun. This communicates to us that the name is a command, or rather a function, that we can call whenever we want.



function getSum(a, b) { return a + b; }
function findBanana(str) { return str.indexOf('banana'); }
function getAverage(numbers) {
  // Start from 0 so that an empty array does not throw.
  const total = numbers.reduce((prev, curr) => prev + curr, 0);
  return total / numbers.length;
}



PowerShell, the Windows equivalent of Bash on Linux, is a great example of a language that builds this naming convention into its functions (or cmdlets, as they are called in the language). Its built-in cmdlets follow a Verb-Noun pattern, and the official guidelines recommend the same for user-defined ones.

The script below calculates the total memory allocated for all currently running Chrome processes. The syntax is not the friendliest, but PowerShell's commitment to the verb + noun convention for its cmdlets is evident. The example below only makes use of the Get-Process, Where-Object, and Measure-Object cmdlets, but rest assured, the naming convention is followed by the other cmdlets provided by PowerShell. This site lists them all out for reference.



# Get all processes currently running
$processes = Get-Process;

# Filter to retrieve all Chrome processes
$chromeProcesses = $processes | Where-Object { $_.ProcessName -eq 'chrome' }

# Sum up all of the memory collectively
# allocated for the Chrome processes
$memoryUsage = $chromeProcesses | Measure-Object WorkingSet64 -Sum;

# Log the result to the console
"{0:F2} MB used by Chrome processes." -f ($memoryUsage.Sum / 1mb);



Classes

Classes are named with an appropriate proper noun in PascalCase. This communicates to us that the variable is not just like any other variable in our program that follows the camelCase naming convention; rather, it is a special variable that stores a user-defined type with special properties and methods.



class User { }
class Admin extends User { }
class Moderator extends Admin { }
class Player extends User { }


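To see why the casing matters, here is a small sketch of a class next to an instance of it. The constructor body and the `'presto'` value are assumptions added for illustration only.

```javascript
class User {
  constructor(username) {
    this.username = username;
  }
}
class Admin extends User { }

// PascalCase names the type; camelCase names a value of that type.
const admin = new Admin('presto');
const isUser = admin instanceof User;
```

At a glance, `Admin` reads as the kind of thing, while `admin` reads as one particular thing of that kind.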

Class Fields and Methods

Class fields are named according to the immutability and data type conventions discussed earlier.

On the other hand, class methods are named in a similar fashion to functions. They still use the verb + noun convention, but in some cases, they can get away with omitting the direct object (noun) part of their names. The performer of the transitive verb (action) is thus implied to be the object instance of the class that owns said method.



// Class
class Player {
  constructor(name) {
    // String
    this.username = name;

    // Number
    this.level = 100;

    // Boolean
    this.isAdmin = false;

    // Array
    this.weapons = [
      'bow',
      'sword',
      'spear'
    ];
  }

  // Class Method (with noun)
  initiateBattle() { }

  // Class Method (without noun)
  attack() { }
}


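As a minimal sketch of how such method names read at the call site, consider the trimmed-down `Player` below. The method bodies, the `gainLevel` method, and the instance names are hypothetical, since the class above leaves its methods empty.

```javascript
class Player {
  constructor(name) {
    this.username = name;
    this.level = 1;
  }

  // Verb only: the instance is the implied performer of the action,
  // and the argument plays the role of the direct object.
  attack(target) {
    return `${this.username} attacks ${target.username}`;
  }

  // Verb + noun: the noun names what the action affects.
  gainLevel() {
    this.level += 1;
    return this.level;
  }
}

const hero = new Player('Presto');
const rival = new Player('Lucky');
const battleLog = hero.attack(rival);
```

The call `hero.attack(rival)` reads almost like a sentence: subject, verb, then object.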

To wrap it all up...



const TRUE_CONSTANT = Math.PI;
const stringName = '';
const numberName = 0;
const isBooleanName = true;
const objName = { };
const arrayNames = [ ].map(name => name);
function getFunctionName() { }
class ClassName { }



The code snippet above succinctly summarizes my entire naming convention. It is quite apparent that the grammar rules and semantics of the English language have greatly influenced this convention. Embracing and somehow relating them to programming have made the act of naming variables and implying their data types more intuitive than ever.

If we wanted to, we could simply prefix each of our variables with an abbreviation of its data type (similar to how one would use an adjective to describe a noun), but in doing so, the variable names would become undesirably verbose, as illustrated by the example below. We'd be better off using TypeScript for explicit type annotations.



// This is... eww. ❌
const NUM_TAU = 2 * Math.PI;
const str_Username = 'Some Dood';
const num_Hypotenuse = Math.sqrt(num_A**2 + num_B**2);
const boo_AdminStatus = false;
const obj_ProfileData = { };
const arr_Articles = [ ];
function fun_GetUser() { }
class Cls_Class { }


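For projects that stay in plain JavaScript, JSDoc comments are one possible way to state types without encoding them in the names. This is only a sketch of that alternative, not something the convention itself prescribes. The `getHypotenuse` function is hypothetical, and whether an editor or type checker actually reads these annotations depends on tooling that is assumed here.

```javascript
/** @type {number} */
const TAU = 2 * Math.PI;

/** @type {string} */
const username = 'Some Dood';

/** @type {boolean} */
const isAdmin = false;

/** @type {string[]} */
const articles = [ ];

/**
 * @param {number} a - Length of the first leg.
 * @param {number} b - Length of the second leg.
 * @returns {number} Length of the hypotenuse.
 */
function getHypotenuse(a, b) {
  return Math.hypot(a, b);
}
```

The names stay short and grammatical, while the type information lives in the comments.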

All in all, naming variables is one of the most frustrating aspects of programming, next to debugging. Following a grammar-based naming convention will certainly make it less unpleasant. Thanks to the linguistic origins of this naming convention, we will be able to write more intuitive and readable code by default, which is always a good thing. Of course, we still have to ensure that the design and architecture of our software are not inherently "bad", but at least we can rest assured that our code is indeed intuitive and readable for whoever may work on it next.

A programming language is called a language for a reason...

Top comments (28)

jerome4026

Thanks for this nice article, I definitely agree with the principles exposed.

For the section Arrays and Collections, I like the idea of using plural, because very often we want to loop over a list or a vector and we want to differentiate a single value from the vector.

However, this does not work for some names, such as some acronyms (bers, which is not clear at all (BER stands for bit error rate)), or Greek letters (psis, which may not be that clear).
Do you have any suggestions for these other cases?

I thought about using a suffix, such as "Values" (I like it because it's quite generic: berValues, psiValues), or "Vec" (I work in MATLAB, so we talk more of vectors than lists: berVec, psiVec), or "Array" (berArray, psiArray), or "List" (berList, psiList).

I prefer values because I find that it fits with the English language (I write berValues like I would say "the BER values" in a speech, while I would not say "the BER list"), and because I find that the others can lead to confusion since sometimes you could have a list with the suffix Vec/Array/List, and sometimes not, so it looks a little bit inconsistent.

I am open to comments and suggestions. Thanks.

Basti Ortiz

I really like the -values suffix! I certainly wouldn't mind seeing it in place of the awkward -s suffix for some words. Thanks for the suggestion!

Árpád Magosányi • Edited

Great article!
As a non-native speaker I am struggling with defining naming conventions based on a grammar-based approach. We have a rather elaborate implementation pattern, with around 15 different types of units (like DTO, Entity, Service, Test, Test Base, Test Data, etc.). In my native language (Hungarian) there is a clear distinction between not just roles of a word in a sentence, but types of word: in most cases if you look at a word out of context you can tell its type based on the agglutinations it wears. (This is because the order of words in our language is loose: we do not order them based on their role in the sentence (that is apparent from their type), but based on what is to be emphasized and what is new information.)
For example I know that I want names like "RegistrationService", but I have problems from that point on. In Hungarian I would use words for "Registration" which start from a verb as a stem: "regisztrál" = "register", denoting an activity. After that I would add a modifier making it a noun, and my understanding is that English stops somewhere here: "regisztrálás" = "registration", which refers to the process, and "regisztráció" = "registration", which refers to the outcome of the activity. But in Hungarian I would also add a modifier to make it an adjective, which as I understand is implicit or different or ambiguous in English: "regisztrációs" ~= "registrational"; something which has to do with the process of registration, or "regisztrálási" ~= "registrational"(?); something which has to do with the outcome of the activity of registration.
Maybe my first problem here is to understand whether "registration" here is even a noun or an adjective: for me it looks like a noun, but its place in the sentence (as "the thing giving the quality of being concerned with the activity of registration to the (noun) service") begs for an adjective in my limited (and influenced by my native language) understanding of these things.
The other part is that I would like to emphasize the "have to do with the activity" over the "have to do with the outcome of the activity" here. Which might be just too much to ask for a naming convention in English?

How would you phrase the naming convention for a Service in a grammar based approach?

(Now you might understand why Hungarians are so weird, and how this horrible idea of Hungarian notation came up: it is in our language :) )

Stepan Stulov • Edited

It's a noun that has an adjective function. It's called "Noun Adjunct". And modern English, especially American (which means IT English), favors noun adjuncts over real adjectives, even when those adjectives are available, more and more often.

en.wikipedia.org/wiki/Noun_adjunct
english.stackexchange.com/question...

I believe the problem here is that Hungarian simply has much more resolution/precision than English. Russian guy speaking, where we have 30+ forms of every word.

Árpád Magosányi

Thank you Noun Adjunct then. (It was Hungarian, a Finno-Ugric language. Romanian is an Indo-European one from the Latin family.)

Basti Ortiz • Edited

Before anything else, I would love to thank you for your brief explanation of the Hungarian language. I actually learned a lot from it. I may not memorize the words, but I now understand the subtle semantics behind them.

As for your question, I don't quite follow what you mean by "phrase the naming convention"?

Árpád Magosányi

I am the one who should write down (phrase) what our naming conventions should be. And I am struggling with such simple questions, whether "registration" counts as a noun or adjective in this case.

Basti Ortiz

Ah, I see. In my view, I'd say it's more appropriate to regard them as nouns, especially if you refer to the "process" or "object" as a whole. The "outcome" should be communicated as a consequence of functions (verbs). For example, the "outcome" of a process must be communicated through function names such as getRegistrationStatus or registerUser. That way, we can be explicit on whether we're referring to the "process" or the "outcome".

Árpád Magosányi

The only problem is that in our implementation pattern the function name does not convey any information, as there is only one function per controller, according to the Single Responsibility Principle. It is fixed to be "call".

Basti Ortiz

Ahh, that's going to be a tough nut to crack then. I suppose my only advice for now is to remain consistent with it. It is ideal to follow the current convention whether it prescribes the "process" or "outcome".

Honestly, it's quite a lackluster piece of advice, but it's really the best one I have right now at the top of my head.

Rachel Soderberg

Wow, you caught a completely different side of naming conventions that I hadn't even considered when I wrote my post (also thanks for the shout-out, I'll add a link leading to yours as well!)

This is a great article and I'm glad you took the time to lay all of these conventions out there. Many who earned a formal degree will inherently follow these rules, but a number of people are self-taught and may have never realized there was a language behind the language.

Also, I am pro-screaming case for constants. It makes it absolutely clear that I shouldn't be doing any changing of them.

Basti Ortiz

Thank you! You didn't have to link it, though. That's too nice. 🙂

Rachel Soderberg

I know, but they work as perfect complements to one another - Figured it would benefit everyone to be able to find yours if they want to learn more on the topic 🙂

Robert Bleattler

I just want to point out that in the PowerShell callout, each line ends with a semicolon. That is incorrect. There is virtually nowhere in PowerShell code where a semicolon is required to end a statement. Otherwise, this was a great read. Thanks a bunch!

Basti Ortiz

I'd say "incorrect" is too strong of a word there. Indeed, PowerShell only requires semicolons for one-liners in the shell, but deeming them as "incorrect" syntax is too harsh. Personally, I use semicolons for consistency and as a "visual separator" of sorts. I just find it much more readable to see a semicolon at the end of a statement—similar to how it's easier to read a paragraph with sentences that end with a period.

Also, thank you, as well, for taking the time to read my article. Time is a limited resource nowadays, and I appreciate that you've given some to read my article. 😉

Robert Bleattler

Well, to be specific, the only time you need a semicolon, as a command separator or line terminator is when running multiple commands on the same line; typically in the active shell. There’d be little justification for doing so in a script file, as one would simply use a new line. In any case, thanks again for the time put in to write the article. PowerShell is simply my ‘bread and butter’ so to speak. ;)

Basti Ortiz

Ah, yes. I catch your drift now. I still plop in some semicolons nonetheless. It's just my way to understand the code better. It's a bit harder to read without periods, you know?

Vince Ramces Oliveros

What about enums? I personally go camelCase with constant variables. And I put SCREAMING_CASE on any enum values.

Anyways I always thought naming conventions should follow the grammar-based naming conventions, even though in some style guides, they're different in some aspects. I don't want them to label me as a code-nazi (sorry to the Germans) for correcting their code even though there are no errors. It's just that readability matters.

Basti Ortiz

Yes, that's completely fine! There's nothing wrong about a little variation. As long as your code conveys its intent, and you know exactly how to decipher it, you shouldn't really have to worry about "some dood" preaching about how a naming convention should be... unless you have to consider the other members of a team. In that case, it's probably best to follow their style guide over yours.

Vince Ramces Oliveros

Thank you for the reply. I think it was my first experience reading legacy code with unorganized naming conventions, back when I was an intern at a certain company. I had to write documentation, do a lot of refactoring, and delete dead code just to make sure that anyone could understand it.

Basti Ortiz • Edited

I can imagine the hell you had to go through, man. I 👏 applaud 👏 you for carrying on, though. That's a lot of work.

Stepan Stulov • Edited

I disagree about booleans being questions. They need to be affirmative statements that evaluate to true or false. Your code doesn't ask its reader a question, it gives an answer. But also, purely linguistically, compare:

if (isUserActive) // Bad
if (userIsActive) // Good

This also chains well with dot notation, compare:

bool isUserActive = user.isActive // Re-arranged, statement became question
bool userIsActive = user.IsActive // Word order preserved, remains a statement

Besides, throwing is/are/has/was/etc. to the beginning of the word smells like Hungarian notation, if you ask me.

"If is I developer" :)

Cheers

Basti Ortiz

Interestingly enough, I can definitely agree with this. Personally, I would stick with my convention solely for the fact that I've gotten used to it. As soon as I see a linking verb (such as "is" and "are") in the beginning of an identifier, I can immediately assume a Boolean value at first glance given that this naming convention only allows Boolean values to be represented by interrogative statements.

But this is not to discredit your suggestion, not at all. As said earlier, I can agree with it. It's just that I have grown accustomed to the way of thinking brought about by the naming convention in the article. In other words, it's just a matter of habit for me.

Basti Ortiz

Ah, thank you for adding this. I purposely didn't include my naming convention for class field visibility because I felt that it was a bit too far out of the scope of my article. It would also make my already long article much lengthier. 😅

I have never actually encountered anyone using camelCase for private members and PascalCase for public/protected members. This sounds new to me, but it's a very interesting idea. I like it. It effectively communicates one's intent with the members and their visibility. Perhaps I'll try it out for my next project.

Peter Witham

Great article that should almost be a pseudo coding standard defined right there.

Theofanis Despoudis

I would avoid uppercase variable names completely. Nowadays every editor can highlight constants, plus they look better to the eye.

Basti Ortiz • Edited

Well, for me, I find SCREAMING_CASE pretty helpful because it's literally screaming at me not to do anything stupid with it. Even if my text editor highlights constants differently, there would be less "visual aid" for me to see that a variable is indeed a constant.

// To me, this communicates its intent better...
const DONT_DO_ANYTHING_DUMB_PLS = 0;

// ...than this.
const dontDoAnythingDumbPls = 0;

// ...or this.
const dont_do_anything_dumb_pls = 0;

Honestly, it's just a matter of personal preference. The whole point of this article is "being able to communicate your intent" after all. I do get your point, though. It can get pretty intimidating to see variables screaming at your face. Although it is ugly, I have to live with it just so I can have the benefits of "visual aid" in addition to its distinct syntax highlighting.

Now that I think about it, since most variables in JavaScript nowadays are immutable (as good practice), the syntax highlighting might not even help me at all. All of my variables would literally be highlighted the same way, which ultimately defeats its purpose. I guess the SCREAMING_CASE serves as an additional line of defense for me in this case.