DEV Community: fluffy

My least favorite question in all of tech recruiting

fluffy — Sun, 16 Apr 2023 23:02:01 +0000

“Are you frontend, backend, or full-stack?”

I really hate this question, for so many reasons.

First of all, it presupposes that there’s only two sorts of things that are done in software anymore: either you’re making websites (frontend) or services called by them (backend), or you’re someone who does both, but still using the frontend/backend dichotomy.

There are so many other kinds of software out there. Not all the world is Building Websites. Just off the top of my head there’s the extremely broad categories of graphics, platform, audio, gameplay, automation, embedded, infrastructure, distributed systems, and so much more.

Even in today’s dystopian push towards blockchain and machine learning, what kind of engineer works on the underlying systems there? It’s neither backend nor frontend.

“Are you frontend, backend, or full-stack?”

There’s a lot of assumptions baked into this question. It assumes that all engineers are making use of existing libraries and gluing them together, just to build websites. It doesn’t have any room for the people building the frameworks behind those disciplines. Would the people who implemented ReactJS or Node be considered any of those categories? What about people who build language bindings for libraries? Or game engines? Or device drivers and kernel modules? What about the people who are making the basis for how all those other things exist?

Where in frontend, backend, or full-stack do you see people who are building web browsers, or home automation equipment, or devising algorithms that solve problems of scalability? Where do image processing experts go? What about people who are solving problems in computational biology or large-scale storage or information security and privacy?

“Are you frontend, backend, or full-stack?”

While I have done some web development throughout my career it’s never been a focus. I have no interest in building websites professionally, and I’ve only used web technology as part of application platforms where the problems I was solving were more about how to make applications easy to develop for a wide variety of software platforms, including (but hardly limited to) the web. When I do build web stuff I’m doing it largely in plain, hand-written HTML and CSS, with as little Javascript as I can get away with, and as much as possible rendered server-side, without having a fleet of microservices exchanging data around so that I need a billion servers just to render a single page of HTML.

I’ve built frameworks for media applications that run directly on devices; while some of them used Javascript as their language, none of them were really web-based. I’ve built game engines for small, low-powered systems. I’ve built augmented and virtual reality experiences for PCs and mobile devices. I’ve worked on data warehouses, I’ve solved problems of computational complexity, I’ve implemented physics for interactive experiences.

Even when I worked at web-oriented companies I wasn’t building stuff “for the web.” At Amazon I worked on the original Kindle, converting scanned books into display-agnostic eBooks. I worked on the caching team, increasing the performance and decreasing the operational costs of a fleet of tens of thousands of servers. I worked on the image service team, managing the catalog image metadata and writing the image scaler and image processing routines that power major portions of the website. Is any of that frontend? Backend? Full-stack?

"Are you frontend, backend, or full-stack?"

Oh, you're a chef? Do you make pizza, salads, or both?

Oh, you're a mechanic? Do you work on tanks, motorcycles, or both?

Oh, you're an architect? Do you design dog houses, malls, or both?

“Are you frontend, backend, or full-stack?”

The only answer I can truthfully give: No.

Code: Radix sort revisited

fluffy — Sun, 14 Mar 2021 20:40:03 +0000

Around two years ago I wrote an article on the perils of relying on big-O notation, and in it I focused on a comparison between comparison-based sorting (via std::sort) and radix sort, based on the common bucketing approach.

Recently I came across a video on radix sort which presents an alternate counting-based implementation at the end, and claims that the tradeoff point between radix and comparison sort comes much sooner. My intuition said that even counting-based radix sort would still be slower than a comparison sort for any meaningful input size, but it’s always good to test one’s intuitions.

So, hey, it turns out I was wrong about something. (But my greater point still stands.)

Here are my implementations of bucket-based and count-based radix sort, in C++:

big-o-2.cpp (excerpt)

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

typedef std::vector<uint64_t> TestCase;
typedef std::vector<uint64_t> Bucket;

// LSD radix sort that distributes values into one growable bucket per digit value
template<size_t K = 8> void radix_bucket_sort(TestCase& input)
{
    constexpr size_t slots = size_t{1} << K;
    constexpr TestCase::value_type mask = slots - 1;
    constexpr size_t bits = sizeof(TestCase::value_type) * 8;
    for (size_t r = 0; r < bits; r += K) {
        Bucket radixes[slots];
        for (auto n : input) {
            radixes[(n >> r) & mask].push_back(n);
        }
        // concatenate the buckets in digit order; this keeps each pass stable
        input.clear();
        for (auto& bucket : radixes) {
            input.insert(input.end(), bucket.begin(), bucket.end());
        }
    }
}

// LSD radix sort that counts digit occurrences instead of filling buckets
template<size_t K = 8> void radix_count_sort(TestCase& input)
{
    constexpr size_t slots = size_t{1} << K;
    constexpr TestCase::value_type mask = slots - 1;
    constexpr size_t bits = sizeof(TestCase::value_type) * 8;

    TestCase output(input.size());

    for (size_t r = 0; r < bits; r += K) {
        // histogram of the current digit
        size_t counts[slots] = {0};
        for (auto n : input) {
            ++counts[(n >> r) & mask];
        }
        // inclusive prefix sum: counts[d] becomes one past the last output index for digit d
        size_t accum = 0;
        for (auto& n : counts) {
            n += accum;
            accum = n;
        }

        // walk the input backwards so equal digits keep their relative order
        for (auto iter = input.rbegin(); iter != input.rend(); ++iter) {
            output[--counts[(*iter >> r) & mask]] = *iter;
        }

        std::swap(input, output);
    }
}

For the full source, see big-o-2.cpp

And here are the time comparisons between std::sort, and both bucket and counting radix sort using a 4- and 8-bit radix: (raw data)

So, what’s going on here? And why are even the bucket-radix sort graphs different than last time?

It’s hard to do a like-for-like comparison with the previous set of implementations; this time around I was running on a very different computer (a Mac mini running on the M1 chip), with a newer version of the C++ compiler, a newer version of the C++ standard library, and who knows how many other differences. It’s pretty interesting that the power-of-two allocation overhead from bucketed radix, in particular, has more or less gone away; there’s possibly something about the M1 architecture which makes the vector resize take much less time, and clang’s robust C++17 support may also have reduced some of the copy overhead by using implicit move semantics.

But it’s pretty interesting to me that the following things are pretty apparent:

A 4-bit radix+bucket sort breaks even with std::sort at around N=13000
An 8-bit radix+bucket sort breaks even at around N=44000
Both 4- and 8-bit radix+count sort break even pretty much immediately (around N=600 and N=100, respectively)

Now, all that said, this still demonstrates a problem with just assuming that a lower big-O factor is better. All four of those radix sort implementations are \(\mathcal{O}(N)\), but the bucket-based ones still are slower than the \(\mathcal{O}(N \lg N)\) std::sort until fairly large input sizes, even with all of the overall performance improvements which have happened (after all, std::sort has gotten faster too).

And, of course, all four radix sorts have the same time complexity as each other, but they all scale with different factors; in particular, it doesn’t take that long for the radix+bucket sorts to overtake the 4-bit counting sort (which is, frankly, pretty surprising to me).

As always, the fact that an algorithm scales better in the long term doesn’t mean it’s suitable for all input sizes. Even in this best-case situation, std::sort still wins for input sizes of a few hundred values, and of course the maintenance overhead of using std::sort is way lower.

It’s also important to remember that these sorting functions can still only work on unsigned integers, or signed integers with a little tweaking. They are not directly applicable to floating-point values (at least not without first mapping the bit patterns onto sortable integer keys), much less things with more complicated tiebreaking rules (such as database rows).
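The "little tweaking" for signed integers is usually a key transformation: XOR-ing the sign bit maps two's-complement values onto unsigned keys that sort in the same order. Here is a minimal sketch of that idea, assuming 64-bit two's-complement integers. The names to_sortable_key and radix_count_sort_signed are hypothetical (they are not from big-o-2.cpp), and the counting pass uses a forward exclusive prefix sum rather than the reverse scan shown earlier.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: flipping the sign bit maps signed order onto unsigned order.
inline uint64_t to_sortable_key(int64_t v)
{
    return static_cast<uint64_t>(v) ^ (uint64_t{1} << 63);
}

// Hypothetical signed variant of an 8-bit counting radix sort.
inline void radix_count_sort_signed(std::vector<int64_t>& input)
{
    constexpr size_t K = 8;
    constexpr size_t slots = size_t{1} << K;
    constexpr uint64_t mask = slots - 1;

    std::vector<int64_t> output(input.size());
    for (size_t r = 0; r < 64; r += K) {
        std::array<size_t, slots> counts{};
        for (auto n : input) {
            ++counts[(to_sortable_key(n) >> r) & mask];
        }
        // exclusive prefix sum: counts[d] becomes the first output index for digit d
        size_t accum = 0;
        for (auto& c : counts) {
            size_t here = c;
            c = accum;
            accum += here;
        }
        // forward walk keeps equal digits in their original relative order
        for (auto n : input) {
            output[counts[(to_sortable_key(n) >> r) & mask]++] = n;
        }
        std::swap(input, output);
    }
}
```

The values themselves are stored untouched; only the digit extraction goes through the transformed key, so no second pass is needed to undo the mapping.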

And, heck, it’s also really easy to write code which is \(\mathcal{O}(N)\) but not optimal! As we saw with the previous article.

So, my conclusion is still the same: practical concerns trump theoretical complexity. But as a bonus conclusion, it’s okay to revisit things to see if they’ve changed.

Oh, and you might also want to consider that just because your parts of the algorithm have a certain complexity factor doesn’t mean that’s what the runtime performance will be; it’s easy to make something accidentally quadratic.


Advice to young web developers

fluffy — Mon, 15 Jun 2020 16:33:58 +0000

I’ve been making websites in some form or another since 1995. After 25 years of experience I think I’ve accumulated enough knowledge to know a few things. Here’s some things I’d like younger developers to think about, in no particular order:

Sometimes a website is just a website.
The browser is already a client; HTML is its language.
The web is built around server-side rendering.
You can provide your data in more than one way; consider HTML to be one of several possible data representations.
Scaling your server helps everyone. Expecting client-side scaling only helps people with the fastest computers and Internet connections.
Not everyone has (or can use) a mouse.
Not everyone has (or can use) a keyboard.
Not everyone has (or can use) a touchscreen.
Not everyone can see colors or pictures the same way you can.
Not everyone can process information the same way you do.
It is inhumane to move things around on people.
The browser’s native HTML parsing is far faster than anything you can write in JavaScript.
HTML is already an ideal representation of DOM nodes.
HTML is a rich framework.
You can probably do that layout change in CSS.
Before you roll your own UI component, consider that HTML probably provides it. If it doesn’t provide it, that’s probably for a reason. Attaching DOM events to a <div> or <span> is probably not the best way of doing things.
Not everything has to be a “single-page application.”
Even if you need to preserve client state between page loads (for e.g. music or video playback) you can let the browser do most of the heavy lifting by fetch()ing a new page and replacing your content container at the DOM level.
Infinite scrolls are inhumane. People need to be able to reach “the end.” There are forms of eternal torment described in religious texts that are less mean.
If you must do an infinite scroll (and you don’t), make sure that there’s nothing you need to reach at the bottom.
Give people consistent but random stimulus and you will be habit-forming. Getting people hooked on your product might seem like a good idea, but the tobacco industry feels the same way.
If you design with CDNs in mind, then a server round-trip won’t be slow.
It is okay to use multiple languages in a thing. Not everything has to be isomorphic.
Always validate your data server-side; anything that comes from the client is suspect.
To the developer, “isomorphic” code breaks down the barrier between client and server. To a malicious client, it means they have control over the server too.
Browsers change. Relying on browser-specific behavior means you’re relying on that one browser at that one point in time. Code to the standard, and test everywhere.
Use polyfills to support browsers that don’t yet support the standard you’re using.
It’s okay to copy others; it’s how we learn things. Just remember to learn from it.


A peculiar argument regarding accessibility

fluffy — Thu, 11 Jun 2020 00:51:04 +0000

I was reading the article Advocating for a Compassionate UI from Rally Health, a tech company that runs a benefits portal for my insurance company. I was reading it specifically because I’ve had various accessibility issues with their website and I wanted to see what their thoughts were regarding accessibility.

I found some of their arguments to be interesting but not compelling. For example, they talk about how accessibility needs are more prevalent than browser marketshare:

It might surprise you to see that many types of disabilities are more frequent than browsers typically given first-class support from developers and product owners. There’s no sense in supporting IE 11 if you haven’t already given full support to the colorblind and those with visual impairments.

One problem with this line of thinking is that many people prefer to use alternate browsers precisely because those browsers have extension ecosystems which allow them to adapt the web to their accessibility needs. For example, I vastly prefer to use Firefox, largely because it has extensions which allow me to address some of my motor and attention dysfunctions (I run an ad blocker for a reason and it’s not to screw over publishers!), and many of the very features they talk about are things that browsers-which-aren’t-Chrome have support for, such as being able to override color schemes and input methods.

Unfortunately, because Firefox doesn’t enjoy first-class support anymore, many, many sites break on it now, and those sites claim to be “accessible” because they support a handful of Chrome-specific accessibility features (which still don’t actually work well with adaptive technologies and the like). Even if more people knew about the accessibility features and add-ons that Firefox offers, they wouldn’t be able to use it as their full-time browser specifically because so many sites break on it.

The web itself is a platform; targeting a specific browser is shortsighted and terrible, and leads to a vicious chicken-and-egg cycle. Supporting marginal browsers is a really good proxy for supporting accessibility, as well. One of the things I do when I design a new site is to make sure that it’s usable from text-only browsers like lynx and w3m; I figure that if it works there, there’s a pretty good chance it’ll work on anything non-Firefox. (I also test on Safari, since that covers the parts of the WebKit universe I care about. Typically if something works on Safari it’ll work on Chrome, but the opposite is not true.)

There’s a few other things that are troublesome about that Rally article; for example, it talks a lot about visual accessibility, but uses medium-gray text and light-orange links (with no underlines!) on a white background.

Rally’s site itself also makes use of a bunch of things that fly in the face of this advice:

Designs, implicitly or explicitly, often require the use of custom elements such as dropdowns to allow for greater control over the appearance of the UI to match the desired brand guidelines. It’s often possible to leverage CSS to reskin native browser controls to match the required designs, but when that’s impossible, you may need to create a custom component.

Push back against these custom components as strongly as possible in favor of their native browser counterparts. The native implementations already support all the required accessibility features, and replicating that in a custom component will not be a trivial task. For reference, see the keyboard interactions that a collapsible dropdown needs to support to achieve AAA compliance.

When first signing in from a new browser, it prompts you for a one-time code. This one-time code entry is done using a custom control, that is not very respectful of the fact that people make typos.

Many elements within their frontend framework are based on dynamically-created, <div>-based components which do not respect tab cycling or browser-native autocomplete, and which move around as pages fill in, which messes both with my focus issues (hello, ADHD, listed as the top single “market share” on that chart!) and with my motor issues (I really hate having to use the mouse for things like that).

To be fair, this isn’t limited to Rally Health’s site. It seems like every single healthcare-related website is like this lately; PRIDE Study is particularly egregious about these issues as well: their web forms are very much Designed For Mobile™, involve a lot of scrolling (often without it being clear where one is supposed to scroll to), and commit many other accessibility sins, such as:

Making checkboxes and radio buttons look the same
Not having labels act as click targets for their respective controls
Not supporting tab-key selection
Forms done as dynamically-animating, changing “single page application” stuff when it isn’t necessary or desirable
Having entire sections of forms appear or disappear dynamically based on the selections you’re making (without any indication that they’re going to have such side effects)

The web has an input problem. Please, people, stop reinventing the wheel the long, and wrong, way around.

The danger of big-O notation

fluffy — Thu, 03 Oct 2019 21:42:12 +0000

A common pitfall I see programmers run into is putting way too much stock into Big O notation and using it as a rough analog for overall performance. It’s important to understand what the Big O represents, and what it doesn’t, before deciding to optimize an algorithm based purely on the runtime complexity.

What is Big O?

Big O notation is a commonly-used notation in computational complexity theory. Roughly, saying that an algorithm is O(f(N)) means that for an input of size N, the overall time taken will grow at a rate that is at worst f(N).

Or, in other words, if you zoom out on the graph of time-taken by an algorithm, you can overlay a curve of f(N) that is scaled in some way such that the time taken by the algorithm will always be underneath that curve.

The way I learned it (which is slightly different than the formulation given on Wikipedia) is that f(x) is O(g(x)) if there are some values k and C such that f(x) ≤ k∙g(x) for all x ≥ C.
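As a worked instance of that definition (the function here is my own illustrative choice, not something from the benchmarks): take f(x) = 3x + 5 and g(x) = x.

```latex
f(x) = 3x + 5 \;\le\; 3x + x \;=\; 4x \;=\; 4\,g(x) \qquad \text{for all } x \ge 5
```

So k = 4 and C = 5 satisfy the definition, and 3x + 5 is O(x). Note that the constants k and C are thrown away in the notation, and they are exactly the part that decides real-world performance.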

What does being “big-O something” tell you?

Not that much, in practical terms.

If a function is O(N), then it means that for a sufficiently-large input of size N, if you double the input then it will take roughly twice as long to execute.

Similarly, if it’s O(N²), then for a sufficiently-large input of size N, if you double the input then it will take roughly four times as long to execute.

And so on.

It allows you to sort various algorithms based on their overall complexity class; whatever term is most-dominating in an algorithm is what will eventually dominate in terms of runtime, for a sufficiently-large input. If your algorithm has two phases, one that’s O(N) and one that’s O(N²), then over the long term, for a sufficiently-large input, the O(N²) portion will dominate in terms of runtime, and so the algorithm as a whole is O(N²).

So if I have two algorithms, one O(N) and one O(N²), the O(N) one will be faster, right?

Oh heck no. Remember, these are just expressions of the dominant factor over long-term increases.
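To make that concrete, here is a toy cost model. The constants are invented purely for illustration: suppose the O(N) algorithm costs 1000·N steps and the O(N²) one costs N² steps. The function names are hypothetical as well.

```cpp
#include <cstdint>

// Hypothetical step counts; the constant 1000 is made up for illustration.
inline uint64_t linear_cost(uint64_t n) { return 1000 * n; }
inline uint64_t quadratic_cost(uint64_t n) { return n * n; }

// Smallest N at which the O(N) algorithm is strictly cheaper than the O(N^2) one.
inline uint64_t crossover()
{
    uint64_t n = 1;
    while (linear_cost(n) >= quadratic_cost(n)) {
        ++n;
    }
    return n;
}
```

Under these made-up constants the quadratic algorithm wins or ties for every N up to 1000, and crossover() returns 1001. Real constants depend on the hardware, the data, and the implementation, which is why measuring beats guessing.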

Bubble sort is O(N²) so we should never use it, right?

Well, not necessarily. If you’re only ever sorting small arrays – like on the order of 10 elements or so – it can quite likely be a lot faster than something like merge sort or quicksort. (Incidentally, quicksort is also O(N²) by a strict definition; the average case tends towards O(N lg N) but that’s assuming pivot selections that cut your sets in half every time. And doing a perfect pivot selection turns out to be really hard.)

Remember, O only describes the long-term growth of an algorithm. For an extreme example, trying to find a string that generates a specific fixed-size hash (colloquially called “reversing a hash” but that’s also a terrible misnomer) is O(1) – after all, the search space is always a constant size. There will only ever be 2¹²⁸ possible md5 hashes, so all you have to do to find a string that produces a particular md5sum (given no other inputs) is iterate over every 128-bit number represented as strings, right?

Well, that’ll take quite a long time to do. But it’ll always take on average the same amount of time! (Technically, a random amount which takes on average the same amount of time as generating 2¹²⁷ hashes.)

For a more practical example: general-purpose comparison-based sorting is provably only ever going to be, at best, O(N lg N). But there are actually sorting algorithms that execute with lower complexity; for example, radix sort can be implemented to take only O(N) time, at least given a fixed maximum key size. So, why don’t we always use radix sort when possible? Because in the general case, it’s hecking slow. It just happens to appear to scale very nicely.

For example, here’s a graph of std::sort vs. a radix sort using both a 4-bit and an 8-bit radix, for various input sizes; consider that regardless of radix size, radix sort is O(N) while std::sort is O(N lg N): (source code, raw data)

So, at least with this simple test, it takes an input size of around 40,000 before a 4-bit radix beats std::sort, and around 200,000 before 8-bit radix beats std::sort – even though both radix sorts are O(N) compared to the O(N lg N) of std::sort.

And it only gets worse since radix sort has various time jumps up around certain power-of-two boundaries; this is because as the buckets get bigger, more allocations (and reallocations) need to occur as they hit power-of-two boundary sizes, as well as the various cache thrashing that occurs as a result. This also reflects the fact that radix sort also requires a lot more memory than a comparison sort; radix sort can take up to two times the size of the input array in addition to the input itself, whereas the comparison sorts generally do as much in-place as possible.
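The reallocation behavior is easy to observe directly. This is a small sketch (count_reallocations is a hypothetical helper, not part of the benchmark source) that counts how often a std::vector's capacity changes while values are pushed one at a time. The growth factor is implementation-defined; I'm assuming a geometric policy such as doubling, which would explain jumps near power-of-two sizes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how many times a vector's capacity changes while n values are appended one at a time.
inline size_t count_reallocations(size_t n)
{
    std::vector<uint64_t> bucket;
    size_t reallocations = 0;
    size_t last_capacity = bucket.capacity();
    for (size_t i = 0; i < n; ++i) {
        bucket.push_back(i);
        if (bucket.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = bucket.capacity();
        }
    }
    return reallocations;
}
```

Each of those capacity changes means allocating a new block and moving every existing element, and a bucketed radix sort pays that cost in every bucket on every pass unless the buckets are reserved up front.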

The bucket-size overhead of 4-bit radix also makes it scale up much more quickly than 8-bit, and after a certain point it’s not even worth looking at the 4-bit radix time since it’s so much slower. But at least it scales linearly! Maybe after a few billion numbers it’ll start to beat std::sort again.

And yes, std::sort does scale with N lg N, and radix does scale linearly:

Other code metrics

Another thing to note is that the lower-big O algorithms tend to also be much more complicated to implement and maintain. std::sort is literally one line of code; my radix sort implementation is 15 lines just for sorting numbers, without comments, and using an algorithm that doesn’t make intuitive sense to most people.

A general rule of thumb is that the more lines of code you’re implementing yourself, the slower it’s likely to be in the common case, and the worse it is to maintain. Standard library implementations and idioms exist for a reason. Sure, it isn’t always the case that shorter code will be faster (after all, bubble sort is easily done in around 5 lines of code but is way worse than radix sort, performance-wise) but why implement something that at best only slightly improves performance while taking on extra maintenance burden?
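For reference, a bubble sort really does fit in about five lines of logic. This is my own sketch rather than code from the benchmark source, using the same uint64_t element type as the other examples.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Repeatedly bubbles the largest remaining value to the end of the unsorted prefix.
inline void bubble_sort(std::vector<uint64_t>& v)
{
    for (size_t end = v.size(); end > 1; --end) {
        for (size_t i = 1; i < end; ++i) {
            if (v[i - 1] > v[i]) std::swap(v[i - 1], v[i]);
        }
    }
}
```

Short and obvious, but every pass touches the whole unsorted prefix, so the work grows quadratically with the input size.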

This gets even more extreme in scripting languages like Python; an idiomatic sort of a list in Python is just sorted(input), which runs almost entirely in native code in the standard library and has been heavily optimized. Implementing your own sort in Python, no matter how efficient, is still going to run entirely through the Python interpreter.

Conclusion

I hope that rather than worrying only about the theoretical computational complexity of an algorithm, you’ll consider using it as a guideline for implementation and pay more attention to the practical performance.

Where do I go from here?

fluffy — Sun, 02 Jun 2019 01:21:23 +0000

So, I’m finding that I’m not particularly happy at my current job, but my various constraints make it very hard for me to find something else that would suit me better.

In particular, I have fibromyalgia which makes it difficult for me to do a huge amount of typing and, even moreso, limits what I can do in terms of oncall or crunch-type work. I’m also at a point in my career where I’m no longer enthusiastic about relearning everything under the sun for the flavor-of-the-week UI framework or whatever, and the chronic-pain brain fog makes it hard for me to juggle a bunch of external dependencies or manage other people.

I love programming and problem-solving from a “How do I do this difficult thing?” standpoint but the day-to-day of actually Getting Stuff Done is getting harder and harder for me to do. What are my options?

I definitely don’t have the right brain skills to be a manager, for example. And while architecting a new system from the ground up is absolutely in my wheelhouse, that's not really something that's a career path so much as the initiation of a new project — something you do when you're already established at a place.

I already tried the "do my own thing, try to make a career out of it" thing, for two years. While I got a lot of stuff done that I care about (particularly my web publishing pet project and a bunch of games and music), I never managed to turn any of it into a sustainable source of income. I have a decent amount of savings, but not enough to last me the rest of my life by any means.

I'm at a loss for what to do next. I'm in my early 40s, and I'm tired — exhausted, even — and burned out and disabled. I have so many things of my own that I want to make but I don't have the energy to do any of them, and I don't have the wherewithal to make them things that other people do either. I've worked hard all my life — harder than I should have been capable of, if anything — but now I feel completely spent and I don't know what to do.

There are many things I'd love to do and which I think I'd be good at, were it not for the issues my disability brings, but anything that puts me on a rigid schedule that can't accommodate how my body is doing on a day-to-day basis is an absolute no-go.

So, given that, does anyone have any ideas for what I possibly can do?

(As a note, please spare me any medical or ergonomics advice; I'm already working with doctors and trying to address the fibro issues as best I can. There is no one perfect cure for this poorly-understood condition, and if you haven't lived with this condition you definitely don't understand its implications on what I can or cannot do.)

Memories

fluffy — Wed, 10 Apr 2019 19:57:49 +0000

Much has been written about how Electron apps take a lot of memory; after all, each one is running its own instance of a web browser, and pulling in all of the overwhelming amounts of support code that implies. Slack can easily end up taking over 1GB of RAM, and Discord usually takes a few hundred MB as well. As someone who used to use IRC back in the 90s, when a single task taking even 1 MB of RAM was considered a lot, this feels rather horrifying:

On my iMac, with 24GB of RAM, that means that chat apps — doing the equivalent of an IRC client (granted, with a bit more visual stuff, but not that much) — were taking about 6% of my RAM!

But come to think of it, back in the mid 90s, when a typical computer had 8MB, an IRC client probably took around 500KB of RAM, which is also 6%. So have things really grown proportionally in that way?

Well, I've figured out a way of getting these chat apps to take half as much of my total RAM overall, but first, let's talk about my personal history of memory usage.

In 1983, my family got our first computer. It was a Commodore 64, an 8-bit microcomputer with 64KB of RAM (with no separate video memory) and while we started out with the Datasette tape drive, we soon upgraded to the 1541 disk drive, which let us store a whole whopping 170KB per side (340KB per disk) — albeit with the use of a hole punch.

I mostly used this machine for making art and music, and playing games.

A few years later we upgraded to a Commodore 128, but kept using the 1541 disk drive. This had 128KB of memory, though I mostly just ran C64 apps on it (as the family word processor, however, the C128-enhanced version of Pocket Writer was a huge upgrade for us).

In the late 80s we got a PC AT clone. 80286 at 12MHz (I think), with a whopping megabyte of RAM. This was the machine where I first got online, as well; we had a modem for the C64 but it was only 300 baud, and the 2400 baud modem on the 286 was actually useful. All online access was through a dialup account, and I doubt the modem program used more than 64KB of RAM (because DOS). So the ability to chat used (effectively) around 6%, sort of. We also had a 40MB hard drive for it.

I mostly used this machine for making art, playing games, and chatting online. (I still used the C64 for making music. Not the C128 — its SID chip sadly met a tragic end due to an incident with static electricity.)

In the early 90s we got a 486, with 4 MB of RAM. It mostly ran Windows, and we got online via AOL. I don't know how much RAM the client used, but it seems credible that it would have used around 250KB, which would have been 6%. We eventually upgraded to 8MB (that extra 4 MB of SIMMs cost something like $180 in 1994 dollars!). I believe we started out with a 100MB hard drive and eventually upgraded to a whopping 200MB.

I mostly used this machine for making art, playing games, making music (Pro Audio Spectrum 16, heck yes), and chatting online.

When I went off to college in 1995 I built myself a 486/100. I went all out, equipping it with 16MB of RAM, and had it dual-boot Windows and Linux. I think the hard drive was 1.2GB! It was so amazing.

I mostly used this machine for making art, playing games, chatting online (using mIRC, which I wouldn't be at all surprised if that used around 1MB of RAM — i.e. 6%), and making music.

Okay, so, life moves forward. More and more computers happen, storage gets bigger, RAM gets bigger.

Come forward to 2017 and I buy my current machine, an iMac, with 2TB of built-in storage, with a 1TB external drive for my media and a... rather large NAS. I'm swimming in storage capacity, and it still doesn't feel like enough space.

And I mostly use this machine for... making art and music, playing games, and chatting online.

And the chat clients still take 6% of my memory.

But wait! In the introduction, I said I had a means of making the chat clients only take 3% of my memory. And it was an easy fix!

Did I switch to one of the alternate clients like Sblack or Ripcord? Well, I've tried those out, but their UX doesn't quite work for my accessibility needs.

Did I switch to an IRC-based frontend? No, that removes too much of the functionality to be useful.

How about exotic things like forcing macOS to compress my RAM? Again, no!

The solution was much simpler than that...

Anyway, hopefully this will help me make art and music, play games, and chat online.

The problem with select() vs poll()

fluffy — Mon, 04 Feb 2019 23:21:39 +0000

The UNIX select() API should have been deprecated years ago. Unsafe operations like sscanf(), sprintf(), gets(), and so forth all provide compile-time deprecation warnings; select() is also incredibly dangerous and has a more modern, safer replacement (poll()), yet people continue to use it.

The problem is that it doesn't scale. In this case, "not scaling" doesn't just mean it's bad for performance, "not scaling" means it can destroy your call stack, crash your process, and leave it in a state that is incredibly difficult to debug.

There are actually two problems, and they are closely-related. The hint to both of them comes from the actual signature of the select function:

void FD_CLR(fd, fd_set *fdset);

void FD_COPY(fd_set *fdset_orig, fd_set *fdset_copy);

int FD_ISSET(fd, fd_set *fdset);

void FD_SET(fd, fd_set *fdset);

void FD_ZERO(fd_set *fdset);

int select(int nfds, fd_set *restrict readfds, fd_set *restrict writefds,
           fd_set *restrict errorfds, struct timeval *restrict timeout);

Do you see that nfds parameter? Here is what the BSD libc documentation has to say about it:

The first nfds descriptors are checked in each set; i.e., the descriptors from 0 through nfds-1 in the descriptor sets are examined. (Example: If you have set two file descriptors "4" and "17", nfds should not be "2", but rather "17 + 1" or "18".)

What this is hinting at is that fd_set's implementation is not a container that contains a maximum number of sparsely-populated file descriptors; instead, it is a bit vector saying whether to check the fds at a given offset. That is, in pseudocode, people expect it to be something like:

struct fd_set {
    int count;
    struct fd_record {
        int fd;
        bool check;
    } fds[FD_SETSIZE];
};

but it's really more like:

struct fd_set {
    bool check[FD_SETSIZE];
};

So this immediately shows what the two problems are, but I'll still explain it further (since this wasn't enough when I was helping diagnose a problem with some critical production code that was breaking). Both of them involve what happens when your process has a large number of file descriptors open (which is very common in large-scale services).

First problem: performance

Let's say you are trying to check the state of a single socket with fd of 907 — which happens to be within FD_SETSIZE on a modern Linux. So your code looks something like this:

fd_set fds;
FD_ZERO(&fds);
FD_SET(907, &fds);
select(908, &fds, NULL, NULL, NULL);

Here is what select() is doing internally (again, pseudocodeishly):

for (int i = 0; i < 908; i++) {
    if (readfds && readfds->check[i]) {
        readfds->check[i] = check_readable(i);
    }
    if (writefds && writefds->check[i]) {
        writefds->check[i] = check_writeable(i);
    }
    if (errorfds && errorfds->check[i]) {
        errorfds->check[i] = check_errored(i);
    }
}

In this case, since only fd 907 is actually being checked, it's looping futilely over 907 entries (fds 0 through 906) which don't need to be checked. If you're doing select() a lot, this can add up (remember that the structures described above are just pseudocode, and in reality there's a lot of bitfiddling/masking/etc. going on as well).

But this isn't the dire problem.

Problem 2: How it can destroy your stack

So let's look at the same thing as above, except now the fd in question is, say, 2048, which is quite a lot larger than FD_SETSIZE. Theoretically this is not possible, but in practice it is — it's pretty simple to raise the socket ulimit for circumstances where you need a lot of connections open. (For example, in a large-scale distributed cache. Again, speaking from experience on this.)

Let's annotate the code that calls it, first:

fd_set fds; // allocates a 1024-bit vector on the stack
FD_ZERO(&fds); // clears the 1024-bit vector
FD_SET(2048, &fds); // Sets the 2048th bit; congratulations, you've corrupted your stack a little bit
select(2049, &fds, NULL, NULL, NULL); // congratulations, you've just BLOWN AWAY your stack

Why does this blow away the stack? Well, keep in mind what select() is doing internally:

for (int i = 0; i < 2049; i++) {
    if (readfds && readfds->check[i]) {
        readfds->check[i] = check_readable(i);
    }
    if (writefds && writefds->check[i]) {
        writefds->check[i] = check_writeable(i);
    }
    if (errorfds && errorfds->check[i]) {
        errorfds->check[i] = check_errored(i);
    }
}

That is, for all the values up to 2048 (which, remember, is outside of the fd_set and well into your stack at this point), it's checking a bit to see if a garbage fd needs to be checked (which is relatively harmless), checking that fd (which is relatively harmless but kills your performance), and then sets that value in memory based on whether the fd is in that particular state (which is, no matter what, essentially randomizing the stack).

Congratulations, now not only will you get a weird error, you won't even be able to tell where it comes from because your debugger will give you complete garbage on the call trace.
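If it helps to see the out-of-bounds write without actually crashing anything, here is a toy model in Python. It is only a simulation: the bytearray stands in for a chunk of stack, FD_SETSIZE is assumed to be 1024, and fd_set_unchecked and bytes_outside_fd_set are made-up names that mimic what the FD_SET macro does with no bounds check.

```python
FD_SETSIZE = 1024                 # assumed value, typical on Linux
FD_SET_BYTES = FD_SETSIZE // 8    # 128 bytes for the bit vector

# Pretend "stack": the fd_set first, then 256 bytes of unrelated data
# (other locals, saved registers, return addresses...).
stack = bytearray(FD_SET_BYTES + 256)


def fd_set_unchecked(fd, memory):
    """Mimic FD_SET(): set one bit, with no bounds check at all."""
    memory[fd // 8] |= 1 << (fd % 8)


def bytes_outside_fd_set(memory):
    """Offsets of non-zero bytes that lie beyond the fd_set itself."""
    return [i for i in range(FD_SET_BYTES, len(memory)) if memory[i]]


fd_set_unchecked(907, stack)        # fine: byte 113 is inside the fd_set
print(bytes_outside_fd_set(stack))  # []

fd_set_unchecked(2048, stack)       # byte 256 is 128 bytes past the end
print(bytes_outside_fd_set(stack))  # [256]
```

In real C there is no bytearray to keep the damage contained; that stray bit lands in whatever happens to live next to the fd_set.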

The easy fix: use poll()

Hindsight is 20/20, of course, but poll() is the API that select() should have been in the first place. When select() was designed, UNIX had no asynchronous I/O mechanism at all, and the whole concept of a large-scale network server was unheard of. Everything was its own process, and you'd do something like accept(8) to accept only 8 simultaneous connections at a time, and each of those connections would likely open up a handful of files to serve up static content or whatever. In the early days, FD_SETSIZE was only 16; Winsock raised it to 64, and early Linux raised it to 256. That was still more than adequate for a reasonably-sized server at the time, and it was still generally the case that you'd be checking all of your fds at once in a single-threaded event loop (heck, back then, the whole concept of "thread" was just an abuse of vfork() and not something that any reasonable programmer would expect to use) and that there wouldn't be that many to check; the overhead of managing a variable-sized buffer would be greater than the savings of not having the conditional loop.

But nowadays, any given Internet server can be easily handling thousands, if not tens of thousands, of connections at once, and the FD_SET/select() API is fundamentally broken in a way which can never, ever make this safe.

Fortunately, there's another API, poll(), which is much more like what people expect out of select() in the first place.

struct pollfd {
    int    fd;       /* file descriptor */
    short  events;   /* events to look for */
    short  revents;  /* events returned */
};

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

Essentially, you tell it exactly how many fds you're asking about, and for each fd, exactly which events you want to know about. So now the poll() call only has to scan the actual specified fds, and only check the actual specified events — no checking readable, writable, and error conditions for ALL fds that are ≤ the highest one you're asking about.
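To put rough numbers on that, here is a tiny Python model that just counts how many descriptor slots each API has to walk. The function names are invented for illustration, and real kernels test bits a word at a time, so treat it as a sketch of the shape of the cost rather than a measurement.

```python
def select_slots_scanned(watched_fds):
    """Toy model: select() walks every slot from 0 up to the highest fd."""
    if not watched_fds:
        return 0
    return max(watched_fds) + 1  # this is the nfds argument


def poll_slots_scanned(watched_fds):
    """Toy model: poll() walks only the entries it was handed."""
    return len(watched_fds)


watched = [907]
print(select_slots_scanned(watched))  # 908
print(poll_slots_scanned(watched))    # 1
```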

In the common case, poll() is a simple drop-in replacement, and is generally simpler and easier to read; this code:

fd_set fds;
FD_ZERO(&fds);
FD_SET(500, &fds);
struct timeval tv;
tv.tv_sec = 5;
tv.tv_usec = 0;
if (select(501, &fds, NULL, NULL, &tv) > 0) {
    if (FD_ISSET(500, &fds)) { ... }
}

becomes:

struct pollfd pfd;
pfd.fd = 500;
pfd.events = POLLIN;
if (poll(&pfd, 1, 5000) > 0) {
    if (pfd.revents & POLLIN) { ... }
}

And, obviously, if you want to check multiple types of events on a socket, it becomes even simpler, as you only need a single set of pollfd structures and simple bit masking on the calling side.
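The same pattern is available from higher-level languages too. As a sketch, Python's standard select module exposes poll() on Unix-like systems (it is not available on Windows); wait_readable below is a hypothetical helper name, and the socketpair is only there to have something to wait on.

```python
import select
import socket


def wait_readable(sock, timeout_ms):
    """Return True if sock becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    # poll() returns a list of (fd, event mask) pairs for ready descriptors.
    for fd, events in poller.poll(timeout_ms):
        if fd == sock.fileno() and events & select.POLLIN:
            return True
    return False


left, right = socket.socketpair()
print(wait_readable(right, 0))     # False: nothing has been sent yet
left.send(b"ping")
print(wait_readable(right, 1000))  # True: data is waiting
left.close()
right.close()
```

Note that nothing here depends on how large the descriptor numbers are; only the descriptors that were registered get looked at.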

Why do people still write code with `select()` then?

I have a couple of theories. select() is the legacy API that was provided in all of the classic textbooks; universities possibly still teach networking using it, and there's a lot of sample code out there that uses select and even the best programmers still have a tendency to copy-paste at times.

Additionally, the fact that gcc doesn't issue a deprecation warning on it means that people haven't had any reason to change their practices. The man pages tend to contain warnings, but people generally just read the signature and return codes.

Current versions of certain libc variants will return an error if nfds ≥ FD_SETSIZE (for example, the one on OSX 10.8 will, although there is apparently no warning to this effect on even current Linux glibc), but at that point the stack is already corrupted and if your server has gotten to the point that a subtle hard-to-understand glibc error is occurring, you're already having a bad day and probably don't realize where the issue is coming from.

For what it's worth, one version of the Linux manpage for select() has this to say:

An fd_set is a fixed size buffer. Executing FD_CLR() or FD_SET() with a value of fd that is negative or is equal to or larger than FD_SETSIZE will result in undefined behavior. Moreover, POSIX requires fd to be a valid file descriptor.

"Undefined behavior" is putting it lightly. (And it makes no mention of how select() itself is also a ticking time bomb in that situation.)

What about `epoll` and libevent?

I purposefully did not discuss them in this article. While both are far superior to poll, they aren't simple drop-in replacements to select, and at least in the basic case of an Internet server the more important thing is not crashing mysteriously. Obviously with a better asynchronous eventing mechanism you will scale better, but scaling better is often a longer-term goal than scaling at all.

#ifndef guards vs. #pragma once

fluffy — Fri, 07 Dec 2018 02:04:58 +0000

After getting in an extended discussion about the supposed performance tradeoff between #pragma once and #ifndef guards vs. the argument of correctness or not (I was taking the side of #pragma once based on some relatively recent indoctrination to that end), I decided to finally test the theory that #pragma once is faster because the compiler doesn't have to try to re-#include a file that had already been included.

For the test, I automatically generated 500 header files with complex interdependencies, and had a .c file that #includes them all. I ran the test three ways, once with just #ifndef, once with just #pragma once, and once with both. I performed the test on a fairly modern system (a 2014 MacBook Pro running OSX, using Xcode's bundled Clang, with the internal SSD).

First, the test code:

#include <stdio.h>

//#define IFNDEF_GUARD
//#define PRAGMA_ONCE

int main(void)
{
    int i, j;
    FILE* fp;

    for (i = 0; i < 500; i++) {
        char fname[100];

        snprintf(fname, 100, "include%d.h", i);
        fp = fopen(fname, "w");

#ifdef IFNDEF_GUARD
        fprintf(fp, "#ifndef _INCLUDE%d_H\n#define _INCLUDE%d_H\n", i, i);
#endif
#ifdef PRAGMA_ONCE
        fprintf(fp, "#pragma once\n");
#endif


        for (j = 0; j < i; j++) {
            fprintf(fp, "#include \"include%d.h\"\n", j);
        }

        fprintf(fp, "int foo%d(void) { return %d; }\n", i, i);

#ifdef IFNDEF_GUARD
        fprintf(fp, "#endif\n");
#endif

        fclose(fp);
    }

    fp = fopen("main.c", "w");
    for (int i = 0; i < 100; i++) {
        fprintf(fp, "#include \"include%d.h\"\n", i);
    }
    fprintf(fp, "int main(void){int n = 0;");
    for (int i = 0; i < 100; i++) {
        fprintf(fp, "n += foo%d();\n", i);
    }
    fprintf(fp, "return n;}");
    fclose(fp);
    return 0;
}

And now, my various test runs:

# gcc pragma.c -DIFNDEF_GUARD
# ./a.out
# time gcc -E main.c  > /dev/null

real    0m0.164s
user    0m0.105s
sys     0m0.041s
# time gcc -E main.c  > /dev/null

real    0m0.140s
user    0m0.097s
sys     0m0.018s
# time gcc -E main.c  > /dev/null

real    0m0.193s
user    0m0.143s
sys     0m0.024s
# gcc pragma.c -DPRAGMA_ONCE
# ./a.out
# time gcc -E main.c  > /dev/null

real    0m0.153s
user    0m0.101s
sys     0m0.031s
# time gcc -E main.c  > /dev/null

real    0m0.170s
user    0m0.109s
sys     0m0.033s
# time gcc -E main.c  > /dev/null

real    0m0.155s
user    0m0.105s
sys     0m0.027s
# gcc pragma.c -DPRAGMA_ONCE -DIFNDEF_GUARD
# ./a.out
# time gcc -E main.c  > /dev/null

real    0m0.153s
user    0m0.101s
sys     0m0.027s
# time gcc -E main.c  > /dev/null

real    0m0.181s
user    0m0.133s
sys     0m0.020s
# time gcc -E main.c  > /dev/null

real    0m0.167s
user    0m0.119s
sys     0m0.021s
# gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin17.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

As you can see, the #pragma once-only runs were marginally faster on average than the #ifndef-only ones, but the gap is smaller than the variation between individual runs (and the runs with both were no faster at all), and it would be far overshadowed by the amount of time that actually building and linking the code would take. Perhaps with a large enough codebase it might actually lead to a difference in build times of a few seconds, but between modern compilers being able to optimize #ifndef guards, the fact that OSes have good disk caches, and the increasing speeds of storage technology, it seems that the performance argument is moot, at least on a typical developer system in this day and age. Older and more exotic build environments (e.g. headers hosted on a network share, building from tape, etc.) may change the equation somewhat, but in those circumstances it seems more useful to simply make a less fragile build environment in the first place.
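For a rough sense of how small the gap is, the "real" times from the runs above can be averaged with a few lines of Python. Three runs per configuration is a tiny sample, so this is a back-of-the-envelope summary rather than a proper benchmark; the dictionary labels are just names I made up for the three configurations.

```python
from statistics import mean

# "real" times in seconds, copied from the three runs of each configuration.
real_times = {
    "ifndef only": [0.164, 0.140, 0.193],
    "pragma once only": [0.153, 0.170, 0.155],
    "both": [0.153, 0.181, 0.167],
}

for name, runs in real_times.items():
    spread = max(runs) - min(runs)
    print(f"{name}: mean {mean(runs):.3f}s, spread {spread:.3f}s")
```

The means differ by well under a hundredth of a second, which is less than the spread between runs of the same configuration.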

The fact of the matter is, #ifndef is standardized with standard behavior whereas #pragma once is not, and #ifndef also handles weird filesystem and search path corner cases whereas #pragma once can get very confused by certain things, leading to incorrect behavior which the programmer has no control over. The main problem with #ifndef is programmers choosing bad names for their guards (with name collisions and so on) and even then it's quite possible for the consumer of an API to override those poor names using #undef - not a perfect solution, perhaps, but it's possible, whereas #pragma once has no recourse if the compiler is erroneously culling an #include.

Thus, even if #pragma once is (very slightly) faster, I don't agree that this in and of itself is a reason to use it over #ifndef guards.

Making a hash of data

fluffy — Thu, 06 Dec 2018 17:26:51 +0000

When I was replacing peewee with PonyORM in my web publishing engine, I was evaluating a few options, including moving away from an ORM entirely and simply storing the metadata in indexed tables in memory. This would have also helped to solve a couple of minor annoying design issues (such as improper encapsulation of the actual content state into the application instance), but I ended up not doing this.

A big reason why is that there don't actually seem to be any useful in-memory indexed table libraries for Python. Or many other languages.

Back in the 90s when I was a larval software engineer, the common practice for teaching data structures always started with the basics: linked lists, then binary trees (first naïve, then self-balancing ones such as AVL or red-black trees, possibly with a foray into B-trees), and then onto hash tables, typically starting with fixed hashes and then moving on to dynamic hashes. Hash tables themselves were often even treated as an afterthought, a performance optimization that wasn't useful in the general case! In the meantime, algorithms based on binary search on sorted lists and the like would also be a major part of the curriculum.

Somewhere along the line, though, it seems like everyone has moved away from ordered, indexed structures and towards everything being based on hash tables. Somehow as an industry we've all decided that it's only important to be able to do O(1) constant-time lookup on fixed keys with no ordering necessary between them.

As an aside, I don't think it has anything to do with experience, and everything to do with the environment in which people are being taught data structures; my CS education was very much focused on the way that data structures and algorithms work, but it seems like all of the younger programmers I talk to (oh god I sound so old now) were taught that data structures themselves aren't that important beyond being an implementation detail or the basic theory, and furthermore that computers have such vast resources available that you don't really need to care about this (oh god I am so old now).

C++ and Java both provide ordered associative structures: std::map and TreeMap<K, V>, for example. But whenever you ask a programmer (especially a younger one) about them these days, people just point out that they're "slow" because they're O(log₂ N) for key lookup, and that you should only use std::unordered_map or HashMap<K, V> instead, because they're O(1).

But focusing on this forgets a few things:

The big-O complexity factor matters for really large values of N, ignoring the constant overhead of the underlying computations (which, in the case of string hashing, can be pretty slow, especially compared to a handful of lexical comparisons)
Single-key lookup isn't the only dang thing you might want to be doing on your data!

Ordered indexes are really freaking useful for a lot of stuff. For example, trying to find all of the keys matching a prefix, or finding values which are greater than or less than a key (or doing a range query in general). Or quickly finding the next or previous entry in a content store.

All of these things require a full table scan -- meaning an O(N) operation -- in a hash-only scenario. Or if you want to do a range query and then sort it at the end, it takes O(N log₂ N), as you have to first filter the table (which is O(N)) and then sort it (which is O(N log₂ N)). How is this more efficient than just using an O(log₂ N) one-time lookup?
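As a small illustration, here is what those queries look like in Python against a sorted list, using the standard bisect module, next to the hash-only equivalent. The word list and the function names (range_query, prefix_query, prefix_query_hash) are made up for the example.

```python
import bisect

words = sorted(["apple", "apply", "apt", "banana", "band", "bandana", "cat"])


def range_query(sorted_keys, low, high):
    """All keys k with low <= k < high, found with two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:end]


def prefix_query(sorted_keys, prefix):
    """All keys starting with prefix; they sit next to each other when sorted."""
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


def prefix_query_hash(keys, prefix):
    """The hash-only equivalent: look at every key, then sort the matches."""
    return sorted(k for k in keys if k.startswith(prefix))


print(range_query(words, "apt", "bandana"))   # ['apt', 'banana', 'band']
print(prefix_query(words, "ban"))             # ['banana', 'band', 'bandana']
print(prefix_query_hash(set(words), "ban"))   # ['banana', 'band', 'bandana']
```

The ordered versions only touch the start of the range plus the matching entries, while the hash version has to visit every key regardless of how few match.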

Interviewing candidates

One of my recurring duties as a full-time software engineer was to interview other engineers as job candidates. One of my go-to problems was writing a Boggle solver. There are generally three phases to the solution:

Determine an overall algorithm
Given a function for determining if a prefix exists in the word list, traverse the board to find the solutions
Implement the function for determining if the prefix exists

Phase 3 is usually the hard part that most candidates have the most trouble with. There are very simple solutions to this problem, though. You could start out by sorting the wordlist in an array (O(N log₂ N)) and then do an O(log₂ N) prefix search for each word, or you can store it in a tree-type map (also O(N log₂ N) for the initial storage) and then do an O(log₂ N) lower-bound search for each word, or you can do what most candidates go with and either store the word list in a hash table (O(N)) and do a search through the whole table for every check (also O(N)), or they build a trie to store all the words with a flag as to whether a node is a leaf (i.e. if the word is complete), which is essentially an O(N) initial phase and an O(L) lookup (where L is the length of the word). These are all acceptable solutions, but the amount of code you have to write for each thing is... highly variable.

For example, in C++, here is how you do a prefix search on a std::set:

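bool has_prefix(const std::set<std::string>& wordlist, const std::string& prefix) {
    // lower_bound finds the first word >= prefix; any word starting with prefix sorts there
    auto iter = wordlist.lower_bound(prefix);
    return iter != wordlist.end() && iter->compare(0, prefix.length(), prefix) == 0;
}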

Or here is how you do it in Java with a TreeSet<String>:

boolean has_prefix(TreeSet<String> wordlist, String prefix) {
    // ceiling() returns the smallest element >= prefix, or null if there is none
    String first = wordlist.ceiling(prefix);
    return first != null && first.startsWith(prefix);
}

Or if you're in a language without a tree-based set concept, such as Python, and you store your dictionary in a sorted list:

def has_prefix(wordlist, prefix):
    import bisect
    index = bisect.bisect_left(wordlist, prefix)
    return index < len(wordlist) and wordlist[index].startswith(prefix)
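To show where that function fits, here is a sketch of phase 2 in Python: a depth-first walk of the board that prunes any path whose prefix isn't in the word list. This is not the solver I timed for this article, just a minimal illustration. has_prefix is repeated so the block stands alone, while is_word, solve, the three-letter minimum, and the tiny 2x2 board are all assumptions made for the example (it also ignores details like the "Qu" tile).

```python
import bisect


def has_prefix(wordlist, prefix):
    index = bisect.bisect_left(wordlist, prefix)
    return index < len(wordlist) and wordlist[index].startswith(prefix)


def is_word(wordlist, word):
    index = bisect.bisect_left(wordlist, word)
    return index < len(wordlist) and wordlist[index] == word


def solve(board, wordlist, min_length=3):
    """Find every word reachable by walking adjacent cells without reuse."""
    rows, cols = len(board), len(board[0])
    found = set()

    def visit(r, c, prefix, used):
        prefix += board[r][c]
        if not has_prefix(wordlist, prefix):
            return  # no word starts this way, so prune the whole branch
        if len(prefix) >= min_length and is_word(wordlist, prefix):
            found.add(prefix)
        used.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in used):
                    visit(nr, nc, prefix, used)
        used.remove((r, c))

    for r in range(rows):
        for c in range(cols):
            visit(r, c, "", set())
    return sorted(found)


words = sorted(["ace", "cab", "cat", "tab", "tea"])
board = ["ca",
         "bt"]
print(solve(board, words))  # ['cab', 'cat', 'tab']
```

The traversal itself never changes; swapping the data structure only changes has_prefix and is_word.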

I'm not sure where along the line this concept got lost, or why whenever I ask people about indexed data structures in their language of choice they look at me like I've grown an extra head or two. Even among engineers who are aware of indexed data structures, the pushback I get is that it's not really relevant, and that we should all just be building everything out of hash tables or doing full table scans all the time. I feel like this all came from about a decade ago when suddenly Map-Reduce thinking took over the industry, and all software development was based around Big Data and Massive Scaling and parallel clustering of stuff. Which is just really weird to me. Like, not all problems exist at that scale, you know?

(And I mean Map-Reduce is useful at small scales too, it's just not the universal way of reasoning about things. Especially when the excuse for it boils down to, "Eh, there's CPU power to spare, why bother?")

Further interview frustrations

When trying to introspect into software engineers who just rely on a hash table for everything, I try to figure out if they even know how a hash table works.

Almost without fail, candidates who are asked about the underlying data structure end up starting out by taking the key, building a hash on it, and then inserting the key into a binary tree. This is pretty maddening! It means that they're paying for the complexity of a binary tree and having the limited capabilities of a hash table -- a worst of both worlds situation. No in-order traversal combined with O(log₂ N) lookups. And when I complain to others about candidates not understanding these fundamentals, the feedback I get is that maybe I shouldn't be expecting people to remember introductory computer science courses (usually with an implication that the person I'm talking to doesn't know it either).

How have we gotten to this point?
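For reference, the kind of answer I'm hoping for looks something like the following Python sketch of a separate-chaining hash table. It's one common textbook design, not how any particular language runtime does it; the class name, the initial bucket count, and the growth threshold are arbitrary choices for illustration.

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table (illustrative, not production code)."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]
        self.size = 0

    def _bucket(self, key):
        # The hash picks an array slot directly; no tree is involved.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > 2 * len(self.buckets):
            self._grow()

    def get(self, key, default=None):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

    def _grow(self):
        # Double the bucket array and rehash so chains stay short.
        old_items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for key, value in old_items:
            self._bucket(key).append((key, value))


table = ChainedHashTable()
for number in range(100):
    table.put(f"key{number}", number)
print(table.get("key42"))    # 42
print(table.get("missing"))  # None
```

The important part is that the hash is used as an array index, which is where the O(1) average lookup comes from; hashing and then descending a tree throws that away.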

C++

Anyway, just out of curiosity I decided to do a timing comparison between a few different ways of implementing a Boggle solver; the only difference between these algorithms is the implementation of has_prefix based on the data structure in question. (Also note that the code itself is crusty stuff I wrote many years ago for a very specific challenge and all I've done with it for this article is to change the data structures in use. It is certainly not high-quality code and I suspect if I were to actually look at most of the code I'd be a bit aghast at it right now.)

Using an ordered set:

$ g++ -O3 -Wall --std=c++11 boggle-orderedset.cpp
$ time ./a.out < board.txt > /dev/null

real    0m0.173s
user    0m0.160s
sys     0m0.010s

Using a sorted vector:

$ g++ -O3 -Wall --std=c++11 boggle-sortedvector.cpp
$ time ./a.out < board.txt > /dev/null

real    0m0.048s
user    0m0.039s
sys     0m0.007s

Using a hash table:

$ g++ -O3 -Wall --std=c++11 boggle-hashtable.cpp
$ time ./a.out < board.txt > /dev/null

real    0m44.075s
user    0m43.867s
sys     0m0.110s

Using a hand-rolled trie:

$ g++ -O3 -Wall --std=c++11 boggle-trie.cpp
$ time ./a.out < board.txt > /dev/null

real    0m0.362s
user    0m0.320s
sys     0m0.038s

In the above solutions, the surprising thing is that the sorted vector is so much faster than the ordered set; however, what's not surprising is that both of those are hundreds of times faster than the hash table approach (roughly 250x and 900x, respectively), and that the trie approach is somewhat slower than the ordered set. Granted, the trie implementation could be a bit better (for example, the actual search algorithm could track where it is in the trie rather than searching from root every time), but the amount of code written is also important:

$ wc -l *.cpp | sort -n
     131 boggle-orderedset.cpp
     132 boggle-sortedvector.cpp
     137 boggle-hashtable.cpp
     161 boggle-trie.cpp
     561 total

So, the sorted vector is only slightly more complicated to implement (in fact the only line count difference is the call to std::sort after the dictionary is loaded -- which is actually not even necessary if your dictionary is sorted to begin with), and yet it's the fastest of all these by far.

Okay, so that algorithm doesn't actually make use of an indexed data structure. But the thought process that leads to implementing it is along the same lines as the thought process that leads to using an ordered indexed data structure; in effect, the vector is an index, viewed in the greater sense. And, whenever I've interviewed a candidate with this problem, not one has gone with a sorted array and a binary search! (I have had a couple at least go with an ordered set though. But nearly everyone goes with the trie -- and most of them have never even heard of a trie before, and just sort of, like, invent it on the spot. Which is cool, but still...)

Also, the sorted vector approach only really works performance-wise if your input set is static. Once you start adding stuff to it, well, each addition is a potentially O(N) operation, which can end up becoming incredibly costly very fast. For example, if you're writing, say, a software load balancer, and your table keeps track of your backing servers' load levels, every update to that requires changing a row in the index, and if you have a lot of rows (say, you're working at the sort of scale where map-reduce thinking takes over), every single update starts to add up very quickly.
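The same tradeoff is easy to see in a Python sketch, where a sorted list plays the role of the sorted vector. The load values are hypothetical; the point is only that finding the spot is cheap while making room for the new element is not.

```python
import bisect

loads = [3, 8, 15, 23, 42]   # hypothetical sorted load levels

# Finding the insertion point is a binary search: O(log N).
position = bisect.bisect_left(loads, 10)

# Actually inserting shifts every later element over by one: O(N).
loads.insert(position, 10)
print(position, loads)  # 2 [3, 8, 10, 15, 23, 42]
```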

Anyway, with just standard C++ you can build your own indexes yourself, or you can use Boost.MultiIndex, which maintains arbitrarily many indexes for you. It's pretty neat.

Python

Anyway, back to my original conundrum. I wanted to investigate moving Publ's data store into an in-memory table, with various indexes for the necessary sort criteria. The easiest approach was to stick with a database, despite that having very poor encapsulation (since the object storage is essentially global state). But what else could I have done?

When I was asking around, everyone kept on pointing me towards collections.OrderedDict, which is a dict which maintains the order of its keys. But by "maintains the order" it actually means maintaining the insertion order, not any ongoing sort order. This isn't actually useful for the purpose of maintaining an index. (And even if it were, it doesn't provide any primitives for performing a range query or sibling iteration or whatever.)
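A quick demonstration of the difference, with made-up keys:

```python
from collections import OrderedDict

entries = OrderedDict()
for key in ["pear", "apple", "mango"]:
    entries[key] = len(key)

print(list(entries))    # ['pear', 'apple', 'mango']  (insertion order)
print(sorted(entries))  # ['apple', 'mango', 'pear']  (needs an explicit sort)
```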

However, the blist package is a rather comprehensive B-tree implementation, and in particular it provides a sorteddict, which does indeed maintain the sort order of its keys. It also provides a KeysView which allows you to bisect_left the keys themselves, in order to traverse items starting at a particular point. It's certainly not the worst thing to do. So, in the future, if I ever decide to get rid of an ORM entirely, this is what I will probably focus on using. However, this adds a bit more complexity in that if you know your sort key is going to change at all, you need to remember to remove your item from the b-tree and then readd it after it's been updated. (Fortunately this isn't all that different from a typical CRUD mechanism anyway.)

And of course, Publ's use case calls for a lot of reads and not a lot of writes, so it's also not that big a deal to periodically rebuild indexes as a sorted list and just use bisect_left that way. It's still something that needs to be managed directly, though.
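As a sketch of what that might look like, here is a minimal read-mostly index built on a sorted list of (sort key, entry id) pairs. This is not Publ's actual code or API; SortedIndex, neighbors, since, and the sample records are all hypothetical, and it assumes the entry being looked up is present in the index.

```python
import bisect


class SortedIndex:
    """Read-mostly index: a sorted list of (sort_key, entry_id) pairs."""

    def __init__(self, entries, sort_key):
        # entries: dict of entry_id -> record; sort_key: function of a record.
        self._sort_key = sort_key
        self.rebuild(entries)

    def rebuild(self, entries):
        """Re-sort from scratch; cheap enough when writes are rare."""
        self._rows = sorted((self._sort_key(rec), eid) for eid, rec in entries.items())

    def neighbors(self, record, entry_id):
        """Return (previous_id, next_id) around an entry, or None at the ends."""
        pos = bisect.bisect_left(self._rows, (self._sort_key(record), entry_id))
        prev_id = self._rows[pos - 1][1] if pos > 0 else None
        next_id = self._rows[pos + 1][1] if pos + 1 < len(self._rows) else None
        return prev_id, next_id

    def since(self, low):
        """Entry ids whose sort key is >= low, in order."""
        pos = bisect.bisect_left(self._rows, (low,))
        return [eid for _, eid in self._rows[pos:]]


posts = {
    1: {"title": "select vs poll", "date": "2019-02-04"},
    2: {"title": "pragma once", "date": "2018-12-07"},
    3: {"title": "hash of data", "date": "2018-12-06"},
}
by_date = SortedIndex(posts, sort_key=lambda rec: rec["date"])
print(by_date.neighbors(posts[2], 2))  # (3, 1)
print(by_date.since("2018-12-07"))     # [2, 1]
```

Any write means calling rebuild() again, which is exactly the "needs to be managed directly" part.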

Maybe if/when I decide to de-ORMify Publ I'll just end up writing a library to make indexed tables easier to manage. I sure as heck can't find anything that exists as it is. (If anyone does know of anything, please let me know!)

(I mean, unless I can find something that's similar to Berkeley DB and is actually supported in Python 3 and is MIT-license-compatible...)

(note to self: lmdb is a good candidate)

Java

As mentioned previously, Java provides both TreeMap/TreeSet and HashMap/HashSet. But for some reason people are constantly taught to only use the HashMap/HashSet versions, and this leads to people never even knowing that TreeMap/TreeSet exist, let alone considering why they might want to use them. I find this incredibly baffling.

Lua and JavaScript

Lua and JavaScript are very popular because of their simplicity; both of them make the same simplifying assumption in that all data structures are hash tables -- including basic arrays. In many cases these get optimized under the hood to act as basic arrays but there are also many situations where that ends up falling apart, and this is why in Lua in particular you generally want to use ipairs instead of pairs, especially if order matters.

The result of this is that there's also absolutely no concept of an indexed, ordered associative array. Either your array traversal is out-of-order (or, in JavaScript, insertion-order, as JS's array acts more or less like Python's collections.OrderedDict, except when it doesn't), or you're getting all of your keys out of the array, sorting that, and then iterating on that array instead. Which adds even more layers of complexity, and reduces performance further.

In Lua you're not likely to be writing anything which makes use of indexed structures (and if you are, you're probably implementing that stuff in C or C++ and calling into it with ffi), but JavaScript? That's used to run a lot of web services, and while node.js provides an ffi binding, that's generally seen as a last resort. So what do people do when they need to handle indexes in node.js services?

Well, in my experience, they seem to just defer it to an ORM, or shrug and suffer the poor performance.

Go

I haven't touched Go in a long time, but the last time I did, the golang orthodoxy was that you don't need ordered indexes. From some cursory websearches I'm finding that this appears to still be the case.

Fortunately, there are container libraries which correct this. It looks like TreeMap even now provides floor and ceiling which can then be used to implement range queries. It looks like it only gained this functionality incredibly recently (i.e. a month ago as of this writing).

C#

C# provides a Dictionary class.

You know what you can do with a real-life dictionary? You can quickly find a word you want, and then see which words come before and after it.

You know what you can't do with a C# Dictionary?

Okay, so C# does also provide OrderedDictionary but this doesn't, as far as I can tell, provide any iteration methods for getting the next or previous entries, or any sort of range queries at all. Presumably these are possible via LINQ, though.

Oh I guess I forgot to write a conclusion

So yeah uh. It's so weird to me that this is the way software has gone. Everything's a big pile of hash tables and nobody is expected to treat anything differently or learn any algorithms that make use of other storage representations. I find that very sad.
