What Is an Array, Really? I'm Writing a Book to Find Out

#opensource #computerscience #beginners #architecture

I've written plenty of code that uses arrays. Looped over them, indexed into them, passed them around. But if you stopped me and asked what an array actually is — not the syntax, not the API, the thing itself — I wouldn't have had a good answer. I knew how to use arrays. I didn't actually understand them.

That bothered me enough to start digging, and the question didn't stay contained. It went through the runtime, through memory layout, through the hardware underneath, until it landed on transistors, voltages, and the basic question of how a physical thing can represent information at all.

What started as a single article grew, over about two years, into a longer essay, then into a multi-chapter book, and finally into something too large for a single volume. That's the project now: ARLIZ (Arrays, Reasoning, Logic, Identity, Zero). The name came first because it sounded right on its own; the acronym was fitted to it afterward. It's open source, free, and being written in public, chapter by chapter, on GitHub. There's also a small project site if you'd rather browse than dig through a repo.

Why one book needed to become three

An array doesn't really live by itself. Understanding why an access costs what it costs, or why a particular loop order is faster, means understanding three layers stacked on top of each other. So the project split into three volumes, each one a prerequisite for the next:

Volume I — Zero to Bit. How does a computer encode information at all, starting from a voltage difference across a transistor? This covers binary switching, number systems, integers, floating-point, characters, byte ordering, pointers, alignment, and serialization — the representational vocabulary everything else depends on.
Volume II — Silicon Horizon. The hardware that actually executes code: logic gates, memory cells, cache hierarchies, instruction set architectures, pipelining, SIMD, GPU execution. This is where "why is my loop slow" gets a real answer instead of a shrug.
Volume III — Array Odyssey. Arrays themselves, in full — memory layout, every major variant (dynamic, sparse, bit arrays, circular buffers), the structures built on top of them (stacks, heaps, hash tables, segment trees), parallel and distributed processing, and where arrays actually do their work today: machine learning, linear algebra, signal processing, even quantum computing.

The dependency runs one way only. Volume II assumes Volume I's vocabulary; Volume III assumes both. Each volume is readable on its own, but the order is deliberate: voltage, then hardware, then arrays.
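The loop-order question mentioned above is easy to make concrete. Below is a minimal C sketch, not taken from ARLIZ, assuming a matrix stored row-major in one flat array (element (r, c) at index r * cols + c, the same order C uses for its built-in two-dimensional arrays). The names flat_index, sum_row_order, and sum_col_order are hypothetical. Both functions compute the same sum, but the row-order loop reads memory with a stride of one element, while the column-order loop jumps cols elements on every step.

```c
#include <stddef.h>

/* Row-major layout (assumed for this sketch): element (r, c) of a
 * rows x cols matrix lives at flat index r * cols + c. */
size_t flat_index(size_t r, size_t c, size_t cols) {
    return r * cols + c;
}

/* Row by row: the inner loop advances c, so consecutive reads are one
 * element apart in memory (unit stride). */
long sum_row_order(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[flat_index(r, c, cols)];
    return total;
}

/* Column by column: the inner loop advances r, so consecutive reads are
 * cols elements apart. Same result, very different access pattern. */
long sum_col_order(const int *m, size_t rows, size_t cols) {
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[flat_index(r, c, cols)];
    return total;
}
```

On cache-based hardware the unit-stride version usually makes better use of each cache line it loads, which is why it tends to run faster on large matrices. The sketch itself makes no timing claims; explaining that gap properly is what Volume II is for.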

Why this isn't another "arrays in 10 minutes" tutorial

Most resources teach arrays top-down: here's the syntax, here's O(1) access, next topic. ARLIZ goes the other direction, bottom-up. The premise is simple — an array isn't a language feature. It's a mathematical object, a function from an index set to a value set, that happens to map cleanly onto how memory and silicon actually work. Once that mapping is visible, a lot of "why is this fast, why is that slow" questions stop being mysterious.
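Here is a minimal C sketch of that mapping, assuming 32-bit integer elements; element_offset and lookup are illustrative names, not part of ARLIZ. Viewed as a function, the array maps each index i in {0, ..., n-1} to a value, and the part that suits hardware so well is that the mapping is plain arithmetic: element i starts i * sizeof(element) bytes past the base address. C itself defines a[i] as *(a + i), which is the same computation with the scaling done implicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element i in a contiguous array whose elements are
 * elem_size bytes wide: the core of "index -> address". */
size_t element_offset(size_t i, size_t elem_size) {
    return i * elem_size;
}

/* Evaluate the array-as-function f(i) by hand: start at the base
 * address, move element_offset(i, ...) bytes forward, read one element.
 * This is what a[i] does for you. No bounds check is performed, so the
 * caller must keep i < n. */
int32_t lookup(const int32_t *base, size_t i) {
    const unsigned char *bytes = (const unsigned char *)base;
    const int32_t *slot =
        (const int32_t *)(bytes + element_offset(i, sizeof *base));
    return *slot;
}
```

With int32_t a[4] = {10, 20, 30, 40}, lookup(a, 2) reads the 4 bytes at offset 8 and returns 30, the same value as a[2].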

The one fully written chapter so far doesn't even start with arrays. It starts with a question one layer further back: what is data, before any discussion of how it's represented or stored? That sounds like a detour, but it's intentional — conflating "data" with "information" is exactly the kind of confusion that leads to systems that produce numbers when people actually need answers.

Where the project actually stands

Worth being direct about this: ARLIZ is a living draft, not a finished book.

Volume I has one fully written chapter ("The Nature of Data") and one chapter that currently exists only as a title and a short outline.
Volumes II and III exist as detailed chapter plans — hundreds of topics mapped out in the order they'll be covered — with no prose written yet.

What does exist is the infrastructure to support writing it for real, in the open:

Every push to main triggers an automated pre-release build, so the PDF available for download always matches the latest source.
Each volume compiles independently from one shared LaTeX template plus a per-volume config, so changes to one volume can't silently break another.
latexmk and biber run in GitHub Actions, producing downloadable PDFs without any manual build step.

You can pull the current PDF from the releases page right now, in whatever half-finished state it happens to be.

Why I'm asking for visibility, not just code

A project like this doesn't fail from lack of effort. It fails from nobody finding it before the second or third volume exists to prove the idea works. Right now ARLIZ is one written chapter and a long, honest outline for everything after it — exactly the stage where a small amount of attention goes furthest. If the premise sounds interesting, a star on the repo costs nothing and is the easiest way to help it surface for the next person who's also wondered what an array actually is.

If you want to do more than star it

This is also the stage where outside input changes the book the most. A few concrete ways in:

Read a chapter and say what didn't land. "The Nature of Data" in Volume I is the one finished chapter — if an explanation is unclear or an example doesn't help, open an issue with the file and section.
Suggest where something belongs. Volumes II and III are fully outlined but unwritten; if a topic is missing or misplaced, say so in a discussion.
Write something. The contributing guide covers the LaTeX conventions, branching, and commit-message format for anyone who wants to draft a chapter, an example, or a diagram directly.

The book content is CC BY-SA 4.0 and the tooling is MIT, so anything contributed stays free and reusable.

Repo: github.com/papyrxis/Arliz
Site: papyrxis.github.io/Arliz

If you've ever had a "wait, what is this thing, really" moment about something you use every day in code, that's basically the entire origin story of this project — I'd be glad to hear what yours was.
