loading...

The “Natural order” misconception

peledzohar profile image Zohar Peled ・2 min read

Full version first published on my blog.

A common misconception is that rows in a relational database table has some inherent order. This is not true.

So many developers falls victim of this misconception, even though it’s being repeatedly refuted in multiple professional blogs and forums that I figured it can’t hurt to write about that in one more place.

I want to make this absolutely clear:

Relational database tables are unsorted by nature.

In the relational model published by E.F. Codd, relation are defined as an unsorted set of tuples.
In relational databases, relations are implemented as tables, and tuples are implemented as rows, and while it is true that rows are stored inside the database in a specific order,

there is no relational database that guarantees the order of the rows returned from a select statement without an order by clause.

Even if you do specify an order by clause, if the values of the column(s) listed in the order by clause are not unique, there order of the rows where such duplicate values exists is also arbitrary.

A quick example (T-SQL):


CREATE TABLE Demo
(
    Id int NOT NULL,
    Val char(1) NOT NULL
);

INSERT INTO Demo(Id, Val) VALUES 
(1, 'A'), 
(2, 'A'),
(3, 'B'),
(4, 'C');

-- 1: -- No order by clause at all
SELECT Id, Val
FROM Demo;

-- 2: -- Order by a constant
SELECT Id, Val
FROM Demo
ORDER BY @@SPID
 -- In SQL Server, @@SPID Returns the session ID of the current user process. 
 -- This order the rows using a constant value.

-- 3: -- Order by a non-unique value
SELECT Id, Val
FROM Demo
ORDER BY Val;

In examples 1 and 2 above, (no order by and order by a constant) are completely equivalent - the database is free to return the rows at any order.

In example 3 there's a column with duplicate values (at least for some of the rows) in the order by clause - so the rows that contains the duplicate value are arbitrarily ordered - there are two legitimate results for this query:

Both

Id  Val
1   A
2   A
3   B
4   C

and

Id  Val
2   A
1   A
3   B
4   C

can be returned by the database.

Please note that this rule also holds true for window functions such as row_number() and rank.

Posted on by:

peledzohar profile

Zohar Peled

@peledzohar

By day, try to work. By night, try to sleep.

Discussion

pic
Editor guide