An object that is not an object
JavaScript has its charming quirks, and one of them is the typeof operator, which determines the type of a value.
Expected behavior:
typeof 100 // number
typeof "string" // string
typeof {} // object
typeof Symbol // function
typeof undefinedVariable // undefined
And here is our favorite, the star of this article:
typeof null // object
JavaScript, like other programming languages, has types that can be divided into "primitive" types, which hold a single value (null, undefined, boolean, number, bigint, string, symbol), and "object" types, which have a complex structure.
Simply put, a boolean in JavaScript is not a very complicated structure, as it holds only one of two values: true or false.
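How such a value is stored depends on the engine. For instance, the modern Firefox engine (SpiderMonkey) uses a technique called "pointer tagging", where a single 64-bit word encodes the type together with either the value itself or the address of an object on the heap.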
Let's look at how booleans are handled in this implementation:
const flagTrue = true;
Keyword | Tag | Payload |
---|---|---|
false | JSVAL_TAG_BOOLEAN (0xFFFE*) | 0x000000000000 |
true | JSVAL_TAG_BOOLEAN (0xFFFE*) | 0x000000000001 |
Notice that the high bits define the data type, and the low bits hold the payload or the address of the object allocated on the heap.
So in this case, our true/false is stored in the payload as 1/0.
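The split between tag and payload can be sketched with BigInt arithmetic. This is only a simplified model that takes the 16-bit tag and 48-bit payload from the table above at face value; the names box, tagOf and payloadOf are hypothetical, and the real engine's layout may differ in its details.

```javascript
// Simplified model (an assumption): 16-bit tag in the high bits,
// 48-bit payload in the low bits of a 64-bit word.
const TAG_SHIFT = 48n;
const PAYLOAD_MASK = (1n << TAG_SHIFT) - 1n;
const TAG_BOOLEAN = 0xFFFEn; // tag value as listed in the table above

// Combine a tag and a payload into one 64-bit word.
function box(tag, payload) {
  return (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK);
}

// Read the type tag back from the high bits.
function tagOf(word) {
  return word >> TAG_SHIFT;
}

// Read the payload back from the low bits.
function payloadOf(word) {
  return word & PAYLOAD_MASK;
}

const boxedTrue = box(TAG_BOOLEAN, 1n);
console.log(boxedTrue.toString(16)); // fffe000000000001
console.log(tagOf(boxedTrue) === TAG_BOOLEAN, payloadOf(boxedTrue)); // true 1n
```

A single comparison on the high bits is enough to tell the type, and a single mask recovers the value, which is why engines favor this kind of encoding.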
You're probably wondering what this has to do with typeof null returning object instead of null.
To understand this, we need to go back 30 years to the original JavaScript implementation in Netscape, which used a 32-bit tagging scheme - completely different from modern engines.
Netscape, which at the time held the majority of the browser market, faced significant market demands and fierce competition from companies like Microsoft and Sun Microsystems. It hired Brendan Eich and tasked him with creating a prototype programming language that had to meet key criteria:
- be easy for a wide range of people (without static typing, compiler installation)
- allow users to manipulate the DOM at a basic level
After 10 days, a prototype programming language was born. It was first named "Mocha", then "LiveScript", and finally JavaScript, due to marketing pressure to leverage Java's popularity at the time. Despite the later fall of the Netscape browser, brought on by competition from Microsoft and the default installation of Internet Explorer on Windows, the language survived to this day and evolved.
The Netscape browser was written in C, as was the JavaScript implementation itself.
So let's move on to the typeof implementation in Netscape's JavaScript 1.3 engine, whose shell greeted programmers of that time with this message in response to the help command:
js> help()
JavaScript-C 1.3 1998 06 30
And the code implementing typeof looked like this:
JS_TypeOfValue(JSContext *cx, jsval v)
{
    JSType type;
    JSObject *obj;
    JSObjectOps *ops;
    JSClass *clasp;

    CHECK_REQUEST(cx);
    if (JSVAL_IS_VOID(v)) {
        type = JSTYPE_VOID;
    } else if (JSVAL_IS_OBJECT(v)) {
        obj = JSVAL_TO_OBJECT(v);
        if (obj &&
            (ops = obj->map->ops,
             ops == &js_ObjectOps
             ? (clasp = OBJ_GET_CLASS(cx, obj),
                clasp->call || clasp == &js_FunctionClass)
             : ops->call != 0)) {
            type = JSTYPE_FUNCTION;
        } else {
            type = JSTYPE_OBJECT;
        }
    } else if (JSVAL_IS_NUMBER(v)) {
        type = JSTYPE_NUMBER;
    } else if (JSVAL_IS_STRING(v)) {
        type = JSTYPE_STRING;
    } else if (JSVAL_IS_BOOLEAN(v)) {
        type = JSTYPE_BOOLEAN;
    }
    return type;
}
The macros defining the type tags in that version looked like this:
#define JSVAL_OBJECT 0x0 /* untagged reference to object */
#define JSVAL_INT 0x1 /* tagged 31-bit integer value */
#define JSVAL_DOUBLE 0x2 /* tagged reference to double */
#define JSVAL_STRING 0x4 /* tagged reference to string */
#define JSVAL_BOOLEAN 0x6 /* tagged boolean value */
Which translated to this memory representation (32-bit system):
Type | Tag (low bits) | Memory (32 bits) | Example value |
---|---|---|---|
Object | 000 (0x0) | [29-bit pointer][000] | 0x12345000 |
Integer | 1 (0x1) | [31-bit int value][1] | 0x00000055 (42) |
Double | 010 (0x2) | [29-bit pointer][010] | 0xABCDE002 → heap |
String | 100 (0x4) | [29-bit pointer][100] | 0x78901004 → "hello" |
Boolean | 110 (0x6) | [29-bit value][110] | 0x0000000E (true) |
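To make the encoding concrete, here is a rough JavaScript model of these tagged 32-bit words. It is a sketch, not the engine's code: the helper names (intToJsval, booleanToJsval, pointerToJsval, describe) are hypothetical, the special encoding of undefined is ignored, and it assumes what the macro comments suggest, namely that integers use only the lowest bit as a tag (leaving 31 bits for the value) while the other types use all three low bits.

```javascript
// Tag constants copied from the macros above.
const JSVAL_OBJECT = 0x0;
const JSVAL_INT = 0x1;
const JSVAL_DOUBLE = 0x2;
const JSVAL_STRING = 0x4;
const JSVAL_BOOLEAN = 0x6;
const JSVAL_TAGMASK = 0x7; // the low 3 bits

// Integers: shift the value left by one bit and set the lowest bit.
function intToJsval(i) {
  return ((i << 1) | JSVAL_INT) >>> 0;
}

// Booleans: the value sits directly above the 3 tag bits.
function booleanToJsval(b) {
  return (((b ? 1 : 0) << 3) | JSVAL_BOOLEAN) >>> 0;
}

// Pointers: assumed 8-byte aligned, so the tag fits into the zero bits.
function pointerToJsval(address, tag) {
  return ((address & ~JSVAL_TAGMASK) | tag) >>> 0;
}

// Recover the type of a word from its low bits alone.
function describe(word) {
  if (word & JSVAL_INT) return "int";
  switch (word & JSVAL_TAGMASK) {
    case JSVAL_OBJECT: return "object";
    case JSVAL_DOUBLE: return "double";
    case JSVAL_STRING: return "string";
    case JSVAL_BOOLEAN: return "boolean";
    default: return "unknown";
  }
}

console.log(intToJsval(42).toString(16)); // 55
console.log(booleanToJsval(true).toString(16)); // e
console.log(describe(pointerToJsval(0x78901000, JSVAL_STRING))); // string
console.log(describe(0)); // object
```

The last line already hints at the problem: a word of all zeros carries the object tag.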
Based on this information, we can create a simplified program by porting a few macros from Netscape to investigate this problem (the code is simplified for educational purposes):
#include <stdio.h>
#include <stdlib.h>

typedef unsigned long pruword;
typedef long prword;
typedef prword jsval;

#define PR_BIT(n) ((pruword)1 << (n))
#define PR_BITMASK(n) (PR_BIT(n) - 1)

#define JSVAL_TAGBITS 3
#define JSVAL_OBJECT 0x0 /* untagged reference to object */
#define JSVAL_TAGMASK PR_BITMASK(JSVAL_TAGBITS)
#define JSVAL_TAG(v) ((v) & JSVAL_TAGMASK)
#define OBJECT_TO_JSVAL(obj) ((jsval)(obj))
#define JSVAL_NULL OBJECT_TO_JSVAL(0)
#define JSVAL_IS_OBJECT(v) (JSVAL_TAG(v) == JSVAL_OBJECT)

struct JSObjectMap {
    int unused; /* placeholder: standard C does not allow empty structs */
};

struct JSObject {
    struct JSObjectMap *map;
};

/* Helper function to display the low 32 bits in binary */
void print_binary(unsigned long n) {
    for (int i = 31; i >= 0; i--) {
        printf("%d", (int)((n >> i) & 1));
    }
    printf("\n");
}

int main(void) {
    struct JSObject *obj = malloc(sizeof(struct JSObject));
    if (obj == NULL) {
        return 1;
    }
    jsval objectValue = OBJECT_TO_JSVAL(obj);
    jsval null = JSVAL_NULL;

    printf("Is object %d\n", JSVAL_IS_OBJECT(objectValue));
    printf("Is null an object %d\n", JSVAL_IS_OBJECT(null));
    printf("Binary representation of object: ");
    print_binary((unsigned long)objectValue);
    printf("Binary representation of null: ");
    print_binary((unsigned long)null);

    free(obj);
    return 0;
}
The output of this program is (the object's address, and therefore its binary representation, will vary between runs):
Is object 1
Is null an object 1
Binary representation of object: 01011000000010100011000111100000
Binary representation of null: 00000000000000000000000000000000
As you can see, the JSVAL_IS_OBJECT macro returns the same result for null and for a real object.
You're probably wondering why null and object are indistinguishable in this check.
The explanation lies in the tagging model described above, which uses bits of the stored value itself to identify the type.
Since JavaScript is a dynamically typed language, type information had to be stored somewhere at runtime, so in this case the creator decided to reserve the 3 low bits of each value for type identification.
Choosing 000 as the object identifier follows from how 32-bit architectures work and from hardware requirements related to memory alignment. Objects and arrays are more complex structures than primitive types, so they are allocated on the heap.
On a 32-bit architecture, the CPU loads data in 32-bit portions (4 bytes), and the memory allocator aligns object addresses to at least 4-byte boundaries. Every such address is divisible by 4, so in binary it always ends with two zeros (4 = 100 in binary). The engine needed three tag bits, however, so objects were aligned to 8-byte boundaries, which guarantees three trailing zeros (8 = 1000 in binary).
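The relationship between alignment and trailing zero bits is easy to check numerically. The sketch below uses made-up addresses and two hypothetical helpers, alignUp and lowBits; it only illustrates the arithmetic and says nothing about how the engine's allocator actually worked.

```javascript
// Round an address up to the next multiple of `alignment` (a power of two).
function alignUp(address, alignment) {
  return (address + alignment - 1) & ~(alignment - 1);
}

// Return the lowest `count` bits of an address as a binary string.
function lowBits(address, count) {
  return (address & ((1 << count) - 1)).toString(2).padStart(count, "0");
}

// With 8-byte alignment the three lowest bits are always zero.
console.log(lowBits(alignUp(0x1001, 8), 3)); // 000
console.log(lowBits(alignUp(0x100d, 8), 3)); // 000

// With only 4-byte alignment the third bit may still be set.
console.log(lowBits(alignUp(0x1004, 4), 3)); // 100
```

The last case shows why 4-byte alignment would not have been enough: the low bits 100 of such an address would be indistinguishable from the string tag.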
In the case of null, the representation is the value 0 (all zeros), which corresponds to the null pointer in C, defined on most platforms as ((void*)0), meaning a non-existent location in memory. Because null is represented as 0x00000000 and its three lowest bits are 000, the JSVAL_IS_OBJECT macro considers null to be an object!
Was it possible to fix this? Of course!
As we have seen, null is simply 0, a non-existent location in memory, while an object is something that exists. And, horror of horrors, a macro that correctly checked for null was already in the code, but it was not used in the typeof function!
#define JSVAL_IS_NULL(v) ((v) == JSVAL_NULL)
So the typeof function should have looked like this:
JS_TypeOfValue(JSContext *cx, jsval v)
{
    JSType type;
    JSObject *obj;
    JSObjectOps *ops;
    JSClass *clasp;

    CHECK_REQUEST(cx);
    if (JSVAL_IS_NULL(v)) { /* check if value is null! */
        type = JSTYPE_NULL;
    } else if (JSVAL_IS_VOID(v)) {
        type = JSTYPE_VOID;
    } else if (JSVAL_IS_OBJECT(v)) {
        obj = JSVAL_TO_OBJECT(v);
        if (obj &&
            (ops = obj->map->ops,
             ops == &js_ObjectOps
             ? (clasp = OBJ_GET_CLASS(cx, obj),
                clasp->call || clasp == &js_FunctionClass)
             : ops->call != 0)) {
            type = JSTYPE_FUNCTION;
        } else {
            type = JSTYPE_OBJECT;
        }
    } else if (JSVAL_IS_NUMBER(v)) {
        type = JSTYPE_NUMBER;
    } else if (JSVAL_IS_STRING(v)) {
        type = JSTYPE_STRING;
    } else if (JSVAL_IS_BOOLEAN(v)) {
        type = JSTYPE_BOOLEAN;
    }
    return type;
}
You can find an example implementation with extracted code that you can compile here:
https://gist.github.com/piotrzarycki/a3713de4e63fd275216900a74c8521e2
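The effect of the extra null check can also be simulated in a few lines of JavaScript. This is a deliberately reduced model, not a port of the C code: typeOfOriginal and typeOfFixed are hypothetical names, words are plain 32-bit numbers, and every non-object type is lumped together as "other".

```javascript
const JSVAL_TAGMASK = 0x7; // the low 3 bits
const JSVAL_OBJECT = 0x0;  // object tag
const JSVAL_NULL = 0x0;    // the null pointer: all bits zero

const isObjectTag = (word) => (word & JSVAL_TAGMASK) === JSVAL_OBJECT;
const isNull = (word) => word === JSVAL_NULL;

// Mirrors the original branch order: null falls into the object case.
function typeOfOriginal(word) {
  if (isObjectTag(word)) return "object";
  return "other";
}

// Mirrors the proposed fix: test for null before testing the tag.
function typeOfFixed(word) {
  if (isNull(word)) return "null";
  if (isObjectTag(word)) return "object";
  return "other";
}

console.log(typeOfOriginal(JSVAL_NULL)); // object
console.log(typeOfFixed(JSVAL_NULL)); // null
console.log(typeOfFixed(0x12345000)); // object
```

The order of the checks is the whole fix: the tag test alone cannot tell the zero word from an aligned pointer, so the comparison with zero has to come first.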
If the bug was so trivial to fix, why wasn't it fixed?
Well, by then millions of pages were already running JavaScript with this behavior. Their authors were aware of it and had written their code around it.
Moreover, in 2013 there was an official proposal to fix this behavior in the ECMAScript standard, but it was rejected precisely due to backward compatibility - too much existing code could have stopped working.
Therefore, 30 years later, this behavior still reminds us of the context in which JavaScript was created and of its historical design decisions. To check whether a value really is an object and not null, we need to handle it like this:
if (value !== null && typeof value === 'object') {
  // this is a real object!
}
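In practice this check is often wrapped in a small helper. The function name isObject below is just an illustrative choice, not a built-in.

```javascript
// Returns true only for real objects (including arrays), never for null.
function isObject(value) {
  return value !== null && typeof value === "object";
}

console.log(isObject({})); // true
console.log(isObject([])); // true
console.log(isObject(null)); // false
console.log(isObject("text")); // false
console.log(isObject(() => {})); // false
```

Note that arrays pass the check because typeof reports them as object, while functions do not because typeof reports them as function. Whether that is the desired behavior depends on the use case.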
Originally published at pzarycki.com