Stephen Sisk

Posted on Aug 27, 2021 • Originally published at stephensisk.net

I'm building a modern spreadsheet

#sql #tooling #spreadsheet #showdev

I like spreadsheets.

As a programmer, I find the simple UI and immediately propagating updates to be powerful and refreshing. Considering that it was invented over 40 years ago it's truly an achievement that the core concept is relevant today.

But I also look at the UI and see something that hasn't changed much from the days of VisiCalc.

*VisiCalc, circa 1978*

*Spreadsheet, circa 2021*

We moved away from neon blue towards grey and discovered non-monospaced fonts! 😉

What should spreadsheets look like without baggage from the past? Not designed by a human primarily used to working by writing on a chalkboard or paper, but someone from the far future.... 2021!

The traditional spreadsheet UI is a liability

Mistakes caused by spreadsheets have unfortunately grown alongside spreadsheets. It's hard to overstate that, given that the EU has a special interest group devoted to discussing and reducing the risk from spreadsheets, and I think the scope of the problem warrants a look at contributing factors of those errors and ways to reduce them.

The changes I outline below were originally just a way to scratch my itch as a programmer for a spreadsheet tool that combined the simplicity of spreadsheets with the powerful tools I use every day as a software engineer, but I realized that some of the changes I wanted were the result of lessons in avoiding programming errors that the software industry as a whole has had to learn.

We must continually ask if the software we use every day for critical tasks can be improved - my assertion that when it comes to spreadsheets, the answer is yes.

Let's start with equations and basic math

Let's look at a simple spreadsheet equation:

B2-B1

What does this mean? Should we have used B3 instead of B2? Should it have been B1-B2 instead? These are unanswerable questions without manually checking the other cells in the spreadsheet.

What if instead you could write:

profit=income-expenses

🤯

Variable names in a spreadsheet! [1]

So how can we get this magical world where we get to use variables naturally in spreadsheets?

Let's look at a simple spreadsheet:

Note that the spreadsheet already has labels we could use for variable names, but the spreadsheet doesn't realize there's a connection, and figuring out there's a connection would be tricky and error prone.

Instead, we could think about formalizing it, and let each cell have a label and a value. That makes a lot of sense! However, there's a reason existing spreadsheets haven't done that, and it's called.. the grid. The grid layout is flexible and makes creating ad hoc tables easy, but anything can go anywhere, which makes it hard to guess what's a label and what's a value.

So let's talk about a design that lets us bake labels and values in from the start.

Here, we still have cells (one label/value pair), and we line them up vertically. Users can create new cells below existing ones. And we can use those labels as variable names.

This lends itself more to being a document rather than a traditional grid. We can allow users to type in their labels and values and use = as a natural delimiter to separate the two. This usage of = flows naturally from existing spreadsheet formulas.

Let's see it in action!

I think this design works pretty well. And everything doesn't have to be a single column - a newspaper style layout, with multiple columns would allow for using the full screen just like a regular spreadsheet. Larger items like charts could span multiple columns.

For mobile, the layout lends itself well to flowing and adapting to a compressed layout, resulting in something like:

So a vertical layout incorporating labels and cells gives us the power of variable naming without needing to give up on the nice layout of the grid.

Further thoughts:

We can easily autogenerate a name (eg. cell1, A1, etc) if the user doesn't want to enter a label.
Autocomplete with variable names would make this very user friendly
This UI can be made more user friendly by providing a menu/toolbox of items that can be added to the worksheet (calculations/tables/etc) - this means users don't need to memorize a list of commands.
I didn't discuss user defined functions/higher order functions, but they are definitely on the road map.

[1] I hear someone in the back yelling "Spreadsheets do support naming cells!", but... you'll need to go in and manually name the cell a second time - and if the label text changes, the variable name of the cell doesn't change. Also, I'll bet half the folks reading this didn't even realize this functionality existed, and if they do, they don't use it most of the time. I also hear that one other guy waaay in the back yelling that A1 is still a variable name.

But what about tables?

"The grid is perfect for tables!" I hear you saying. "It is nothing but a table".

But when you want to have columns with uniform calculations it's a bit of a pain to copy the formula into each one, and of course who doesn't love formatting a spreadsheet table for the umpteenth time? [2]

Let's take a look at what a calculation table could look like in our new worksheet format:

This is a design sketch and it needs more work. But the core idea is that you simply state that you want a table by entering TABLE and it pops up a simple table creator. The first and second rows define the initial value and formula for the column. You can see the usage of the prev keyword to get values from the previous row.

You'll note that since the app knows it's a table, it's able to nicely format the table header and properly align the contents without any actions by the user.

Note that once the app knows it's a calculated table, you don't need to show the entire table - the user can expand it to see more or less of the table as they see fit, and the document will naturally flow around it.

[2] Yup, I'm aware Excel has formal table support - I bet even fewer folks know about it and use it regularly than named cells. And google sheets support is abysmal

Data Tables & Manipulation

or: I just want to use SQL instead of trying to figure out what "values" in a pivot table means

The last section ignored a big way we use spreadsheet tables: user entered data tables (eg. "spreadsheet as a makeshift database"). I think it'd easy to create that: it'd be as easy as detecting that the user doesn't add a formula as the contents of the cell on the second row.

Once you have data tables, how are you going to process that data?

When I'm coding, I love using functions like Map/Reduce/Filter to manipulate and produce new tables of data (those are equivalent to SQL's SELECT, WHERE, GROUP BY, etc.. if you're more familiar with those), and I'd love to have that functionality available in spreadsheets.

This is another place where the grid becomes problematic. All of those functions output another table, and you need to put that output table somewhere. Thus, they force you to make a compromise on how to display your output. Are you going to stick it all inside one cell where you can't see it? "Spill" the table out across multiple cells? Hope you don't have any data around the cell doing the calculation that get overwritten! (or overwritten later if the size of the output changes).

This is where our new layout really shines. We can just have one cell per operation and output a new table each time.

Further thoughts:

We also don't have to keep user entered data and calculations in separate tables - you can just do it on a column by column basis (some columns in a table can define a calculation, others can contain user entered data).
Having an easy way to add CSV files, other sources of data from the web, and other worksheets sure seems useful

Units

I'm just going to show an example worksheet here, because I think it's pretty self-explanatory:

So now the app understands that$ is a unit that should be carried through to other values - we didn't have to special that total_income was a dollar, it just inherited that unit from the values it read fro.

The _ symbol is ignored in numbers and is useful for making numbers more readable without interfering with function calls*[3].*

... and now that we know it's currency, we can do smart things with the decimals. The "correct" result for 7/2 is 3.5, but we know to format it with 2 decimal places because it's currency. We didn't need to change any formatting settings - it just knows to display with 2 decimal places for currency if there's any at all. You will also note that above in the other_cost_two example, we didn't need to tell it not to display the cents, it could pick up from context that we didn't need to display decimals.

Percents work with their own special rules, knowing that the $ unit should carry through to take_home_income, and that 15% is actually .0015 when multiplying.

... and we can support other types of units, like distances.

You didn't think I'd forget SI symbols, did you?

I think adding units to spreadsheets is exciting, if only for the reason that it reduces the amount of formatting that needs to be manually done by the user.

[3] Function calls + using comma separators in numbers results in chaos. What numbers are being passed to this function? max(100,000, 2, 000) ? You can interpret that by using spaces, but it's a bit iffy. Using the _ symbol as a separator is a convention becoming popular in programming languages for solving this problem. _ is also culture agnostic, so europeans (number format: 2.000,00) and americans (number format: 2,000.00) can finally get along.

Scenarios

While I like using spreadsheets for all kinds of calculations, my personal biggest use case is for investment/retirement planning.

And in that world, being able to explore different scenarios is pretty important. So let's see how we can explore scenarios with worksheets.

Here's an example of a plain old savings worksheet in action:

savings_at_retirement uses a notation I'm not in love with: tableName.column[row], but I think this communicates the idea pretty clearly: You can read individual cell values out of a table. (I'll probably move to tableName[column][row] since that seems much cleaner)

Now, if we wanted to experiment with various savings plans, we could manually change the savings_rate, income, stock_rate_of_return, etc... But why not allow for easy scenario exploration with multiple values?

This UI is definitely a work in progress.

Here we introduce the VARY keyword in stock_rate_of_return, which allows that variable to have multiple values and causes any values downstream of it to produce multiple outputs, one for each of the values.

Introducing the VARY had two downstream effects:

The Next/Prev controls on savings_over_time allow you to switch through the various values of stock_rate_of_return.
You can also see that there's a nice roll up in savings_at_retirement showing the end result of savings for each of the values of stock_rate_of_return.

But why are we restricting ourselves to one variable at a time? Here I show 3 different VARYs working at once:

You'll note that both savings_over_time and savings_over_retirement are updated to show the new values of the VARYs

...and of course if you change the values of VARY, everything will instantly update.

I think this is a very natural but powerful way to express different scenarios, and a simple tool for explore those scenarios.

Feedback, please!

Would you like to see more? Sign up for the beta waitlist

Subscribe to my dev diary newsletter if you'd like to get ~weekly updates on the nitty gritty of my development work, or follow me here on DEV.TO

I'd love to hear your feedback:

Follow me on twitter

Email: stephen [at] qrunch [dot] studio

Top comments (1)

Mihail Malo • Sep 24 '22

I remember seeing something rather similar around, with floating tables and lists connected by lines. Have you seen it?