
The case against "natural/localized" sorting

STrRedWolf ・3 min read

So my favourite text editor on the iPad updated -- Textastic went to version 8, and one of the new features is selectable sorting: by name, extension, date, or size. All fine and dandy, I thought.

I then went to find a file... and it wasn't where I was expecting it.

Here I am, on a commuter train going to Baltimore, pulling up my phone and frantically searching for a file I knew I created and named. Not in Textastic. Not in Dropbox, where it's synced. Not in the regular places...

I then relaunch Textastic... and find it... in the wrong spot. Same directory -- they changed how it's sorted.

It's only ASCII! It's so simple!

Let me put this into context. I'm writing a novel, mixing a weird bit of furry culture with the Station to Station nomadic train, called Throng: Going Station to Station. PLUG WARNING You can see the story of a multi-furry band tour by train at my Patreon PLUG WARNING.

The novel is written in sections, each between 5 and 10K in size. Chapters are at most 8 sections long, so to keep them in some order, I name them with a section number prefix... that's in hexadecimal.

Here's a small example:

01.Boston-1.md
02.Boston-2.md
03.Boston-3.md
04.Boston-4.md
05.Boston-5.md
06.NewHaven.md
08.NewYork.md
09.Secaucus.md
0A.Trenton.md
0B.Trenton-2.md
0C.Trenton-3.md
10.Philly.md
11.Philly-2.md
18.Baltimore.md
19.Baltimore-2.md
20.WashDC.md
24.WashDC.md
28.Pittsburgh.md
30.Cleveland.md
37.Gary.md
38.Chicago.md
39.Chicago.md
3A.Chicago.md
3B.Chicago.md

As the story progresses through various cities along existing real life rail lines, I'm naming the sections after them.
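
Why does this convention work with plain old literal sorting? A quick sketch in Python, assuming two-digit uppercase hex prefixes like the ones above (the `section_number` helper is just something I made up for illustration): sorting the names as raw strings gives the same order as sorting by the hex value of the prefix, because "0" through "9" come before "A" through "F" in ASCII.

```python
# A handful of the section files, deliberately shuffled.
names = [
    "0A.Trenton.md",
    "01.Boston-1.md",
    "3B.Chicago.md",
    "10.Philly.md",
    "09.Secaucus.md",
    "20.WashDC.md",
]


def section_number(name):
    """Parse the prefix before the first dot as a hexadecimal number."""
    return int(name.split(".", 1)[0], 16)


# Literal sort: compares the strings character by character.
literal = sorted(names)

# Numeric sort: compares the hex value of the prefix.
by_number = sorted(names, key=section_number)

print(literal)
print(literal == by_number)
```

This only holds as long as every prefix has the same width and the hex letters stay uppercase.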

So what do GTK's file picker, MacOS Finder, and Textastic do? They treat the leading digits as a number and ignore any leading zeros. This is so you can have this:

file 8
file 9
file 10
file 11
file 12
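
Roughly speaking, a "natural" sort can be sketched like this in Python. This is my own simplified approximation, not the actual algorithm used by GTK, Finder, or Textastic, and `natural_key` is a hypothetical helper name: split each name into runs of digits and non-digits, then compare the digit runs as integers.

```python
import re


def natural_key(name):
    """Split a name into text and digit runs; digit runs compare as integers."""
    # re.split with a capturing group alternates: text, digits, text, digits...
    # so odd positions are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["file 10", "file 9", "file 8", "file 12", "file 11"]

literal = sorted(names)                   # character by character
natural = sorted(names, key=natural_key)  # numbers compared as numbers

print(literal)
print(natural)
```

The literal order puts "file 10" ahead of "file 8" because "1" comes before "8" as a character. The natural order is the one listed above.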

Okay, that's fine... but what about my naming convention? Thankfully, Linux's "ls" only sorts this way if you give it the "-v" switch.

$ ls -v1 *.md
0A.Trenton.md
0B.Trenton-2.md
0C.Trenton-3.md
01.Boston-1.md
02.Boston-2.md
3A.Chicago.md
3B.Chicago.md
03.Boston-3.md
4A.Dallas.md
4B.Ft_Worth.md
4C.Ft_Worth.md
04.Boston-4.md
5A.Tuscon.md
05.Boston-5.md
06.NewHaven.md
7F.SanFrancisco.md
08.NewYork.md
09.Secaucus.md
9F.Metroburg-Home.md
10.Philly.md
11.Philly-2.md
18.Baltimore.md
19.Baltimore-2.md
20.WashDC.md
24.WashDC.md
28.Pittsburgh.md
30.Cleveland.md
37.Gary.md
38.Chicago.md
39.Chicago.md

And that's what I'm seeing in MacOS, in GTK's file picker, and now Textastic.

Why hate on strcmp?

It's Unicode (encoded as UTF-8). I'm running the "en_US.utf8" locale, mainly because it's the default. MacOS and iOS use it. GTK uses it. And now Textastic picks it up, because it's using iOS' NSString.localizedStandardCompare routine... and it sorts by context instead of byte-wise. And it's mandated by Unicode's spec.

That's the thing. It may be mandated but it's not right in all cases. I have the breaking case.

What can be done?

As a programmer, give users the option to turn off "natural" sorting for "literal" sorting. The more options, the better.
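
Here's a minimal sketch of what that option could look like, in Python. Everything named here (`SortMode`, `sort_names`, `_natural_key`) is hypothetical, and the natural key is a simplified stand-in for whatever the platform really does, so the exact "natural" order won't match "ls -v" or Finder line for line.

```python
import re
from enum import Enum


class SortMode(Enum):
    LITERAL = "literal"  # byte-wise, like strcmp
    NATURAL = "natural"  # digit runs compared as numbers


def _natural_key(name):
    # Odd positions of the split are the digit runs; compare those as integers.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def sort_names(names, mode=SortMode.NATURAL):
    """Sort file names according to the user's chosen mode."""
    if mode is SortMode.LITERAL:
        # Compare the UTF-8 bytes, so the result does not depend on locale.
        return sorted(names, key=lambda n: n.encode("utf-8"))
    return sorted(names, key=_natural_key)


names = [
    "0A.Trenton.md",
    "01.Boston-1.md",
    "10.Philly.md",
    "09.Secaucus.md",
    "3A.Chicago.md",
]

print(sort_names(names, SortMode.LITERAL))
print(sort_names(names, SortMode.NATURAL))
```

With the literal mode my hex-prefixed sections come out in story order. With the natural mode, "0A.Trenton.md" jumps to the top and "3A.Chicago.md" lands in the middle of the single digits, which is the same kind of breakage I'm seeing.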

As a user, file bugs. I did so for Textastic and GTK/XFCE.

Resist the urge to work around it. Linux has it right at the command line -- make it optional.

Posted by: STrRedWolf (@strredwolf)

Furry software engineer. I solve digital problems. Code tailored "WHILE U WAIT". Transit enthusiast & sci-fi/furry artist/author.
