I've been coding in Java since 2004, professionally since 2009. During these years I've come across several pitfalls that could easily have been avoided up front - if only someone had thought of them. In this post, I'm going to list some of them (or more specifically, how to avoid them), in no particular order.
Validate and Sanitise Your Input Data
Input data can come from your users, from another component within the system or from another system. This means that data validation and sanitation must be done on multiple levels.
The Difference Between Validation and Sanitation
Validation means that you check that your input data is correct. The result of this operation is essentially a boolean: either the input data is correct, or it is not.
Sanitation means that you take the input data and transform it until it becomes valid. The result of this operation is either valid input data or an exception. Let's take a simple example: phone numbers. People write phone numbers in very different ways, using all kinds of separation characters and groupings. However, writing a function that will remove all other characters except the + sign and the numbers 0-9 from a string is very easy. In addition, it lets the users enter phone numbers in their preferred format and guarantees that the numbers are stored in a consistent format in your system.
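The phone number example could look roughly like the sketch below. The class name PhoneNumbers and the final rule (an optional leading + followed by at least one digit) are my own assumptions for illustration; your specification may well demand something stricter.

```java
final class PhoneNumbers {

    private PhoneNumbers() {
    }

    /**
     * Removes every character except '+' and the digits 0-9, then checks the result.
     *
     * @param input the phone number as the user typed it
     * @return the cleaned phone number, never null
     * @throws IllegalArgumentException if the input cannot be turned into a valid number
     */
    static String sanitise(String input) {
        if (input == null) {
            throw new IllegalArgumentException("Phone number must not be null");
        }
        // Drop spaces, dashes, parentheses and anything else that is not '+' or a digit
        String cleaned = input.replaceAll("[^+0-9]", "");
        // Assumed rule: an optional leading '+' followed by at least one digit
        if (!cleaned.matches("\\+?[0-9]+")) {
            throw new IllegalArgumentException("Not a phone number: " + input);
        }
        return cleaned;
    }
}
```

With this sketch, PhoneNumbers.sanitise("+358 (0)40 123-4567") returns +3580401234567, while an input such as "abc" ends in an exception: exactly the "valid data or an exception" outcome described above.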
What to Validate
What you should validate depends on the use case and the specifications but here is a list to get you started (it is by no means exhaustive):
- Empty values and nulls
- String max lengths (for example: JPA defaults to 255 characters for string fields)
- Allowed and forbidden characters and patterns (such as e-mail addresses, phone numbers, postal codes and filenames)
- Checksums (such as social security numbers and bank account numbers)
- Numerical limits (such as maximum value, minimum value and number of decimals)
- The decimal point character (are you using . or , or both?)
- Script injection attacks (such as JavaScript and SQL)
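To make a few of these checks concrete, here is a minimal sketch. The class InputChecks and its method names are hypothetical, and the Luhn algorithm is just one widely used checksum that I picked as an illustration; social security and bank account numbers typically have their own country-specific rules. Note that each method returns a boolean, in line with the definition of validation above.

```java
import java.math.BigDecimal;

final class InputChecks {

    private InputChecks() {
    }

    /** Rejects null, empty or whitespace-only values and values longer than maxLength. */
    static boolean isValidText(String value, int maxLength) {
        return value != null
                && !value.trim().isEmpty()
                && value.length() <= maxLength;
    }

    /** Checks the minimum value, the maximum value and the number of decimals. */
    static boolean isWithinLimits(BigDecimal value, BigDecimal min, BigDecimal max, int maxDecimals) {
        return value != null
                && value.compareTo(min) >= 0
                && value.compareTo(max) <= 0
                && value.scale() <= maxDecimals;
    }

    /** Luhn checksum: every second digit from the right is doubled, the total must end in 0. */
    static boolean hasValidLuhnChecksum(String digits) {
        if (digits == null || !digits.matches("[0-9]+")) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int digit = digits.charAt(i) - '0';
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) {
                    digit -= 9; // same as adding the two digits of the product
                }
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

Script injection is deliberately left out of this sketch. For SQL, the usual defence is parameterised queries rather than a hand-written check.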
When and Where to Validate
Data should be sanitised and validated whenever and wherever it enters the system. Bad data should be stopped at the gates. Doing validation and sanitation in the user interface is great for improving the user experience, but it should not be the only place where this is done. The real validation and sanitation must always be done in the backend.
Dealing With Bad Legacy Data
Unfortunately it is not always possible to stop bad data at the gates. If you are dealing with a legacy system, the data may already be inside. A typical example could be a date column that used to be a varchar (yes, that happens) and now needs to be changed into a date, or an e-mail column that did not use any kind of validation at all and now contains all kinds of garbage.
You can deal with this in several ways:
- Fix the bad data while you migrate it from the old database to the new one
- Use double columns, one for bad values that could not be automatically migrated and another for correct values
- Use a custom value object that can distinguish between bad data and good data (read more about it here)
In any case you should never let bad legacy data be an excuse not to properly validate and sanitise new data!
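To illustrate the third option, here is one way such a value object might look for the varchar-to-date case. This is only a sketch under my own assumptions: the name LegacyDate is made up, and it treats anything that is not an ISO-8601 date (such as 2019-03-01) as bad data. A real migration would probably need to recognise more formats.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical value object wrapping a date that used to be stored as text. */
final class LegacyDate {

    private final String rawValue;  // what the legacy column contained, kept for manual fixing
    private final LocalDate parsed; // null when rawValue could not be interpreted

    private LegacyDate(String rawValue, LocalDate parsed) {
        this.rawValue = rawValue;
        this.parsed = parsed;
    }

    /** Never throws: bad legacy values are kept and flagged instead of rejected. */
    static LegacyDate fromLegacy(String rawValue) {
        if (rawValue == null) {
            return new LegacyDate(null, null);
        }
        try {
            // Assumption: only ISO-8601 dates count as good data
            return new LegacyDate(rawValue, LocalDate.parse(rawValue.trim()));
        } catch (DateTimeParseException e) {
            return new LegacyDate(rawValue, null);
        }
    }

    boolean isValid() {
        return parsed != null;
    }

    /** The parsed date, or empty if the legacy value was bad. */
    Optional<LocalDate> value() {
        return Optional.ofNullable(parsed);
    }

    /** The original text, which may be null. */
    String rawValue() {
        return rawValue;
    }
}
```

The point of the design is that code reading the data is forced to ask isValid() or unwrap the Optional, so bad values cannot silently pass as good ones.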
Design By Contract
I would guess most programmers that have received some kind of formal training are familiar with the design by contract principle. Let's recap:
- Preconditions state what must be true in order for the operation to succeed
- Postconditions state what will be true once the operation has finished successfully
- Invariants state what conditions remain unchanged before and after the operation has been performed
You should clearly define and document contracts for all your public APIs. This does not mean only the methods you write but also the input and output data (such as DTOs) and any exceptions. You can use JavaDocs and ordinary language for this; there is no need to go formal unless you want to.
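As an example of what a contract written in JavaDocs and ordinary language can look like, here is a small sketch. The Account class is invented for this illustration; what matters is that the precondition, postcondition, invariant and exception are all spelled out, and that the code enforces what the documentation promises.

```java
/**
 * A hypothetical bank account.
 *
 * <p>Invariant: the balance is never negative.
 */
final class Account {

    private long balanceInCents;

    /**
     * @param initialBalanceInCents the starting balance, must be 0 or greater
     * @throws IllegalArgumentException if the starting balance is negative
     */
    Account(long initialBalanceInCents) {
        if (initialBalanceInCents < 0) {
            throw new IllegalArgumentException("Initial balance must not be negative");
        }
        this.balanceInCents = initialBalanceInCents;
    }

    /**
     * Withdraws money from the account.
     *
     * <p>Precondition: amountInCents is greater than 0 and at most the current balance.
     *
     * <p>Postcondition: the balance has decreased by exactly amountInCents.
     *
     * @param amountInCents the amount to withdraw, in cents
     * @return the new balance in cents, never negative
     * @throws IllegalArgumentException if the precondition is violated,
     *         in which case the balance is left unchanged
     */
    long withdraw(long amountInCents) {
        if (amountInCents <= 0 || amountInCents > balanceInCents) {
            throw new IllegalArgumentException("Cannot withdraw " + amountInCents + " cents");
        }
        balanceInCents -= amountInCents;
        return balanceInCents;
    }

    /** @return the current balance in cents, never negative */
    long getBalanceInCents() {
        return balanceInCents;
    }
}
```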
A fellow developer (or yourself in six months) should never need to make assumptions about how to use your method or how to interpret its result. For example: I don't know how many times I've had to dig into the source code (sometimes quite deep) just to figure out whether certain output values can be null or not. Which brings me to the next pitfall...
NPEs Can Be Avoided, So Avoid Them!
We have all run into those pesky NullPointerExceptions in production and that is embarrassing. We really should not do that. New programming languages such as Kotlin are doing a great job in combatting them, but even in vanilla Java there are a few simple things you can do to greatly reduce the number of NPEs:
- Validate never-null parameters using Objects.requireNonNull()
- Use Optionals for return values that can be empty or null (and never use them for values that are never empty nor null)
- Never assume an Optional will always contain a value (it wouldn't be an Optional if it could not also be empty in some cases)
- Document your code carefully and read the documentation that others (or yourself six months ago) wrote
- Use @NotNull and @Nullable annotations if your IDE supports them - your IDE may be able to warn you about potential NPEs even before you compile your code
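The first three points can be sketched in a few lines. The CustomerRegistry class below is invented for this example. Objects.requireNonNull() still throws a NullPointerException, but it does so immediately at the boundary and with a clear message, instead of somewhere deep inside the code much later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical registry that maps customer names to e-mail addresses. */
final class CustomerRegistry {

    private final Map<String, String> emailsByName = new HashMap<>();

    /** Neither parameter may be null; this is checked before anything is stored. */
    void register(String name, String email) {
        Objects.requireNonNull(name, "name must not be null");
        Objects.requireNonNull(email, "email must not be null");
        emailsByName.put(name, email);
    }

    /** Returns the e-mail address, or an empty Optional if the customer is unknown. */
    Optional<String> findEmail(String name) {
        Objects.requireNonNull(name, "name must not be null");
        return Optional.ofNullable(emailsByName.get(name));
    }
}
```

On the calling side, something like registry.findEmail("bob").orElse("no e-mail on file") handles the empty case explicitly instead of assuming that a value is always present.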
If Possible: Reuse by Composition, Not by Inheritance
We programmers are lazy. If we can write something once and reuse it later, we will certainly try to (otherwise we copy-paste it). Especially in my earlier programming days my go-to method for doing this was through inheritance. Now I know better.
When I was a child, I loved to play with LEGOs. My cousin had a Playmobil castle and I loved to play with it as well. I could build all kinds of castles with it - but only castles. With my LEGOs I could build whatever I wanted to: castles, fire engines, space ships, boats, and so on.
In the software world, Playmobil would be reuse by inheritance and LEGO reuse by composition.
Reusing code by inheritance assumes that all future use cases will fit into a specific mold. In my experience this is rarely the case and if you find yourself making changes to your base class in order to support a requirement in a subclass, you are in trouble. If this happens, you probably did not have the correct level of abstraction in your base class (google the Single Level of Abstraction principle for more details).
If you instead go for reuse by composition, you build reusable building blocks that can be combined and used as needed. This is often a far more extensible and future-proof approach than inheritance and still allows you to reuse code and not repeat yourself.
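Here is a small sketch of what such building blocks might look like. All the names (MessageFormatter, MessageSink, Notifier and so on) are made up for this example. Instead of a base class that subclasses must fit into, the Notifier is assembled from two independent parts that can be swapped freely.

```java
import java.util.ArrayList;
import java.util.List;

/** Building block: decides what a message looks like. */
interface MessageFormatter {
    String format(String message);
}

/** Building block: decides where a message ends up. */
interface MessageSink {
    void write(String line);
}

final class UpperCaseFormatter implements MessageFormatter {
    @Override
    public String format(String message) {
        return message.toUpperCase();
    }
}

/** Collects lines in memory, which is handy in tests. */
final class InMemorySink implements MessageSink {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void write(String line) {
        lines.add(line);
    }

    List<String> lines() {
        return lines;
    }
}

/** Composed from a formatter and a sink instead of extending a base class. */
final class Notifier {
    private final MessageFormatter formatter;
    private final MessageSink sink;

    Notifier(MessageFormatter formatter, MessageSink sink) {
        this.formatter = formatter;
        this.sink = sink;
    }

    void send(String message) {
        sink.write(formatter.format(message));
    }
}
```

A new requirement, say a different format, becomes a new building block such as new Notifier(m -> "[" + m + "]", sink). Nothing in Notifier or in the existing blocks has to change, which is the opposite of editing a base class to please one subclass.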
Getting the reuse approach right is especially important if you are building a platform or a framework. And if you are doing that, you should ask yourself if you actually need to do that or if you are accidentally over-engineering (been there, done that).