Herbert Breunung

Good File Structure

My last post ended on the note that code files need a dependable structure. Work becomes more productive and enjoyable if you know where everything is without having to read individual characters, just from scanning the contours. And you don't even have to memorize the order of sections, as long as the file structure adheres to one overarching rule: FROM THE GENERAL TO THE DETAIL. This rule is intuitive and helps you understand the code in one read. Let me explain.

Even readers whose native script runs right to left or top to bottom, such as Hebrew or traditional Japanese, start reading a code file in the upper left corner. At that point they may or may not have some knowledge of the content. So it is better to give them a broad picture of the content right at the start. Smaller and smaller details are added later as needed - FROM THE GENERAL TO THE DETAIL.

The most general part of a code file is the meta information, like author, date and license. In my opinion the license is way too much text to be included in a code file. Just insert a reference to the license file, or put that file in an expected place like the project's root directory, which is also legally binding. And keep the other meta info, including its formatting, to a minimum, since it's rarely of interest. But put it at the top or bottom so it's easy to skip.

Often you see one or more lines in the meta header that summarize the content or purpose of the file. The latter is much more important and should stand just above the name of the namespace / package / object, so that both form a unit that sets up your orientation and expectations. This part is crucial and worth spending time on. It helps the author to sharpen their understanding of what is to be achieved here, which can dramatically improve code clarity and thus quality. It also ensures that the namespace / package / object name is the explicit and concise summary of the summary. Finding such names is a good part of professional programming, and this arrangement supports you in that. Because if you feel some meaning is missing in the name, you have to add it to the summary. Just don't stop there, but think hard about which aspects of the summary can be deleted by choosing a better name. After you have changed the name and shortened the summary, you may want to add other information to the summary that was left out earlier to avoid bloat. Then the cycle starts again until it's good enough.
This technique can also be applied to any other identifier, such as variable and method names.

The next, less general information is the version number of this namespace and of the programming language used. Next come pragmas (optional language features) and library load commands: first the ones from the language core, then third-party libs, then the internal ones. Global constants, enums and variables follow. If there are more than a few items, group them for a better overview. Now only the routines, functions or methods are left. Visual separators help to distinguish between the head section just described and the various types of methods. The following paragraphs will be just about methods, because if you only have functions, you can group them by topic without thinking too much about it. But you should also apply some of the principles that are best taught in an object-oriented context.

It is logical to start with the constructor, and not only because it's the first method you will use and probably the first a reader will look for. The constructor also tells you about many of the arguments used in the other methods and, more importantly, about the internal data structure of the class. With that knowledge any subsequent method becomes far easier to grok. And - while we're on the topic of the life cycle of an object - the destructor method and even serialisation (if present) should be part of this section too, if only because code that employs the same special knowledge should be pooled together to minimize searching and make the code more self-documenting.

The only other methods that reach into parts of the internal data structure should be the getter and setter methods - also known as accessors. These make up the second block of methods. They are usually very small and give you a good and fast overview of the internal and external interface (API) and its data flow.
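The first two blocks might be sketched like this in Python. The `Point` class and its JSON format are invented for illustration. Python rarely uses an explicit destructor (`__del__` exists but is seldom needed), so the life cycle block here contains only construction and serialisation.

```python
import json


class Point:
    """A mutable 2D point (hypothetical example class)."""

    # --- life cycle: construction and serialisation --------------------

    def __init__(self, x: float, y: float) -> None:
        # The internal data structure is visible right here.
        self._x = float(x)
        self._y = float(y)

    @classmethod
    def from_json(cls, text: str) -> "Point":
        data = json.loads(text)
        return cls(data["x"], data["y"])

    def to_json(self) -> str:
        return json.dumps({"x": self._x, "y": self._y})

    # --- accessors -------------------------------------------------------

    @property
    def x(self) -> float:
        return self._x

    @x.setter
    def x(self, value: float) -> None:
        self._x = float(value)

    @property
    def y(self) -> float:
        return self._y

    @y.setter
    def y(self, value: float) -> None:
        self._y = float(value)
```

Only these two blocks touch `_x` and `_y` directly. Any method added further down would go through the accessors.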

Next are the methods that contain the most lines of code - let's call them workhorse methods. They should be few and well commented.

The fourth section contains helper functions and methods that are so specific to this class that they should not be abstracted into a namespace of their own. Some might protest and call this bad practice and a violation of the stated goal: being able to understand the class in one reading.
They want to know the content of a method before it is called, to have a thorough understanding of what is happening. While I sympathize with this stance, I think it contains misunderstandings. First off, most constructors call other methods - so it is not a rule we can comply with while holding to the proposed order. Secondly, an implication of our stated main rule is the sub-rule: from the public API inward. This alone places the auxiliary methods last. But the best defense recalls the purpose of a method. It is a piece of abstraction. If you need to see the internals, you have created a flawed abstraction. Either the method name does not tell you what is going on, or the method is too big (complex), or even worse: it has side effects.
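Here is a minimal sketch of that ordering in Python, with an invented `WordCounter` class: the workhorse method comes first and calls helpers that are defined only further down. The helper names are meant to say what they do, and the helpers are free of side effects, so the workhorse can be read without jumping ahead.

```python
class WordCounter:
    """Count word frequencies in a text (hypothetical example class)."""

    # --- life cycle ------------------------------------------------------

    def __init__(self, text: str) -> None:
        self._text = text

    # --- workhorse methods -----------------------------------------------

    def most_common(self, limit: int) -> list:
        """Return up to `limit` (word, count) pairs, most frequent first."""
        counts = self._count_words(self._normalize(self._text))
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:limit]

    # --- helpers ---------------------------------------------------------

    @staticmethod
    def _normalize(text: str) -> list:
        """Split into lower-case words with surrounding punctuation removed."""
        return [word.strip(".,;:!?").lower() for word in text.split()]

    @staticmethod
    def _count_words(words: list) -> dict:
        """Map each non-empty word to its number of occurrences."""
        counts = {}
        for word in words:
            if word:
                counts[word] = counts.get(word, 0) + 1
        return counts
```

For example, `WordCounter("The cat. The dog!").most_common(1)` returns `[("the", 2)]`.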

Sure, sticking to all this takes discipline at first (read: pain). But it pays off in the long run. Your code gets easier to read, maintain and extend (everything OO promised). Even writing new classes becomes easier, since you no longer have to think about a lot of details. You are free to concentrate on the irreducible problems that your class solves. So have fun and fine-tune the rules to your sensibilities and needs - but be consistent.
