This is the English version of my original post in Hungarian. Photo by Unsplash.
In the previous installment of this series, we saw why the null
value can be an annoying source of errors. But there are cases when you positively wish for a variable, that otherwise cannot be null
, to be able to have a null
value. Imagine, for example, a web site that stores the last time each user has logged in.
public class User
{
public string NickName { get; set; }
public string Email { get; set; }
public string PasswordHash { get; set; }
public DateTime LastLoginTime { get; set; }
}
If a user has just registered but they haven't yet logged in, we want the LastLoginTime
property to be somehow empty.
var user = new User();
user.LastLoginTime = null; // Compile-time error
Trying to set LastLoginTime
to null
is futile. DateTime
is a value type, so our code does not even compile.
Special Values
Our first idea to fix this problem is to designate a certain valid DateTime
value to signify a missing login time. We can use, for example, DateTime.MinValue
.
var user = new User();
user.LastLoginTime = DateTime.MinValue;
This of course works, but it is not at all obvious why we are using these magic values. It is always a good practice to introduce constants or readonly fields to explain the purpose of literal values in your code, so let's do that.
private readonly DateTime Never = DateTime.MinValue;
// later in a method:
var user = new User();
user.LastLoginTime = Never;
This is actually quite readable. But just by looking at the User
class, there is no way of knowing that DateTime.MinValue
has a special meaning. Of course we could document it in the XML comment of the property but honestly, when was the last time you read XML comments? Also, while it is virtually impossible that someone logged into our system in 1 A.D. and has never returned since then, and that allows us to safely use that date for our purposes, this is just a happy coincidence. The solution is by no means general. There may be cases when you need the full range of the value type in question, without any values remaining to be assigned a special meaning.
Boolean fields
This leads us to our second idea. Instead of using a magic value, we can move the empty state into a separate Boolean property.
public DateTime LastLoginTime { get; set; }
public bool HasLastLoginTime { get; set; }
Now we can explicitly state that the user has no last login time.
var user = new User();
user.HasLastLoginTime = false;
This solves our immediate problem but the previous one still remains. The consumer of the User
object needs to be aware of their responsibility to check HasLastLoginTime
every time they need the actual property value. Somehow we need to force that check.
Reference Wrapper Class
Our third idea solves exactly that. If we really need this field to behave like a reference type, why not make it one? Then trying to use a null value would immediately cause a NullReferenceException
, which, while unpleasant, pinpoints exactly where the problem is, and it does not allow invalid behavior to happen.
public object LastLoginTime { get; set; }
We could just use object as the type of the property but that way we lose any hint of type safety. If we already use a strongly typed language, let's not give up using the type system at the first sight of trouble.
A much nicer solution is to define a wrapper class, a reference type, that can hold a DateTime value.
public class NullableDateTime
{
public DateTime Value { get; }
public NullableDateTime(DateTime value)
{
Value = value;
}
public static explicit operator DateTime(NullableDateTime ndt)
{
if (ndt == null)
throw new NullReferenceException("Cannot convert null to DateTime");
return ndt.Value;
}
public static implicit operator NullableDateTime(DateTime dt) =>
new NullableDateTime(dt);
}
Defining conversion operators to and from DateTime makes using this type quite natural in most cases. If we update our User
class to use the new NullableDateTime
class as the type of the LastLoginTime
property,
public NullableDateTime LastLoginTime { get; set; }
we can simply assign a DateTime
value to it because an implicit conversion exists between the types. Assigning it to a DateTime
value is only possible using an explicit cast; that's how we warn the caller that this is not an ordinary DateTime
instance and suggest checking for null
.
var user = new User();
user.LastLoginTime = DateTime.Now;
DateTime lastLoginTime;
if (user.LastLoginTime != null)
{
lastLoginTime = (DateTime)user.LastLoginTime;
}
This is not a bad design at all, much better in fact than anything we did before. But it still suffers from a very inconvenient weakness. If we need to have nullable integers, doubles and other value types in our code, we need to define the same wrapper class for each one of them.
Generic Wrapper Class
This problem can be solved very easily using generics.
public class Nullable<T> where T : struct
{
public T Value { get; }
public Nullable(T value)
{
Value = value;
}
public static explicit operator T(Nullable<T> nullable)
{
if (nullable == null)
throw new NullReferenceException(
"Cannot convert null to " + typeof(T).Name);
return nullable.Value;
}
public static implicit operator Nullable<T>(T value) =>
new Nullable<T>(value);
}
By substituting the generic T
type parameter everywhere we previously used DateTime
, we arrive at the above solution which works surprisingly well. Notice that we constrained T
to be a struct, or value type. That also prevents creating nonsensical type declarations like Nullable<Nullable<int>>
, because our Nullable
is a reference type.
By changing the type of the LastLoginTime
property to our new creation,
public Nullable<DateTime> LastLoginTime { get; set; }
we can use it just like before, but we can also declare an infinite number of other nullable types. If we also implement Equals
and GetHashCode
, we feel that there is not much left to improve, and indeed this type could be used in a lot of real-world scenarios successfully.
Generic Wrapper Struct
But there is still a minor issue with performance. Using this nullable type a lot will result in a lot of allocations on the heap as new instances of our class are created. This would not happen if we used the original value types that we wrapped. If we combine our failed attempt that used a boolean property with the idea of the wrapper class, we can create a wrapper structure that has much better performance characteristics.
public struct Nullable<T> where T : struct
{
private T value;
public Nullable(T value)
{
this.value = value;
hasValue = true;
}
public bool HasValue { get; private set; }
public T Value
{
get
{
if (!HasValue)
throw new NullReferenceException();
return value;
}
}
public override bool Equals(object other)
{
if (!HasValue)
return other == null;
if (other == null)
return false;
return value.Equals(other);
}
public override int GetHashCode() => HasValue ? value.GetHashCode() : 0;
public override string ToString() => HasValue ? value.ToString() : "";
public static implicit operator Nullable<T>(T value) => new Nullable<T>(value);
public static explicit operator T(Nullable<T> value) => value.Value;
}
Note that in the default value of this struct, hasValue
is false, exactly how we want it. Almost perfect, except one issue mentioned earlier. This way you could nest Nullables which we wanted to prevent.
Support from the C# Compiler
And now the time has come to admit that I have been deceiving you the whole time. The .NET Framework already contains a Nullable<T>
type with an almost identical implementation that does not have this problem. See it for yourself.
This is achieved by cheating: the meaning of the where T: struct
generic constraint are defined in the C# specification such that:
The type argument must be a value type. Any value type except Nullable can be specified.
The C# language also offers a convenient shorthand notation for using nullable types. Instead of writing Nullable<T>
, you can just write T?
.
public DateTime? LastLoginTime { get; set; }
C# also makes nullable types behave more like a null object than a simple null
value. That is achieved by operator lifting - any operators already defined for the underlying type can also be used on the nullable type, and if any of the operands are null
, the result will also be null
.
int? a = 2;
int? b = 3;
int? c = null;
int? d = 2 + 3; // d == 5
int? e = b * c; // e == null
The so-called null coalescing operator is a binary operator that returns its first operand if it is not null, otherwise it returns the second one.
int? a = null;
int? b = 5;
int c = a ?? 10; // c == 10
int d = b ?? 10; // d == 5
An interesting property of this operator is that it can be chained.
int? x = null;
int? y = null;
int z = x ?? y ?? 10; // z == 10
The bool?
type has special behavior. Here is the "truth table" of the logical and and or operators, when applied to nullable values:
Support from the CLR
All the features of Nullable<T>
presented this far have been made possible either by the .NET Framework or the C# compiler. None of them required support from the runtime itself. But there is one peculiar feature that required a change in the CLR - the boxing behavior of nullable types. Discussing that is beyond the scope of this article but if you are interested, this Stack Overflow answer provides an excellent explanation.
Top comments (2)
Someone commented in your previous post about changes related to null that are coming in C# 8.0.
Includes the ability to have nullable reference types (in addition to value types): github.com/dotnet/csharplang/wiki/...
As someone who fell in love with option in F#, I'm really looking forward to this.
Good discussion of the nullable types, especially how they work with the null coalescing operator.
Might be worth mentioning null-conditional (?. and ?[]) as well. My personal favorite use-case is calling events.
templecoding.com/blog/2017/01/31/h...