Formatting Large Numbers in Python

Introduction

The Python interpreter can manipulate very large numbers with ease. We humans, however, have a harder time reading numbers once they reach a certain size. For example, at the time of writing, 7723151489 was one (unerringly precise) estimate of the world’s population. How readable is that number to you? So long as we just want to work with it, we can use it as it is and let the interpreter do its stuff. If we need to display that information to a user, however, as we probably will at some point, we’ll want to present it in a more user-friendly format. That’s what we’ll look at here.

Let’s zoom in a little, though, and imagine we’re writing a simple program to store country information, including population size…

(This article will assume some basic knowledge of Python’s formatted string literals, also known as f-strings. A very brief example should serve as a primer:

name = "mark"
print(f"Hello, {name.title()}. You're looking great today!")

On the second line, we prefix a string literal (anything within quote marks is a string literal) with ‘f’ and, within the curly brackets, include what is known as a ‘replacement field’. The f-string is evaluated at run-time and, from the above example, we get the following output:

Hello, Mark. You're looking great today!

Simple!)

Underscore in Numeric Literals

We’ll begin with the following, capturing some essential information on three countries…

countries = {
    "scotland": {"capital": "edinburgh", "population": 5438000},
    "belgium": {"capital": "brussels", "population": 11250000},
    "germany": {"capital": "berlin", "population": 82665600},
}

We can already see that the population figures are rather hard to scan. Not a problem for the Python interpreter but, employing the excellent maxim that our code should be easily readable by humans, it’s something we should address. Happily, Python 3.6 made this straightforward — we can simply introduce underscores (for more information on this, check out PEP 515 — Underscores in Numeric Literals).

countries = {
    "scotland": {"capital": "edinburgh", "population": 5_438_000},
    "belgium": {"capital": "brussels", "population": 11_250_000},
    "germany": {"capital": "berlin", "population": 82_665_600},
}

Much better! But, what happens when we display this information? If we run the following…

for country, info in countries.items():
    print(
        f"\n{country.title()}'s capital city is {info['capital'].title()}. "
        f"{country.title()} has a population of around {info['population']}."
    )

… we receive this:

Scotland's capital city is Edinburgh. Scotland has a population of around 5438000.

Belgium's capital city is Brussels. Belgium has a population of around 11250000.

Germany's capital city is Berlin. Germany has a population of around 82665600.

The formatting we introduced with our underscores hasn’t been preserved (PEP 515 explains that, ‘… the underscores have no semantic meaning, and literals are parsed as if the underscores were absent’), and we’re back to the problem of rather hard-to-comprehend numbers. The ‘fix’ is very simple, but before we get to it, we need to understand a little about format spec and its accompanying format-spec mini-language.
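To make the PEP 515 point concrete, here is a minimal sketch (the variable name is just for illustration) showing that an underscored literal is the very same integer as its plain equivalent, which is why the underscores vanish when we print it:

```python
# Underscores in a numeric literal are purely visual; the stored value is unchanged.
population = 5_438_000

print(population == 5438000)  # True
print(population)             # 5438000

# Since Python 3.6, int() also accepts underscores in strings it parses.
print(int("5_438_000"))       # 5438000
```

The underscores help whoever reads the source code, but they never reach the user.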

Format Spec and Format-Specification Mini-Language

As is often the case, Python offers a very simple, native way to solve our problem — format strings. They allow us to manipulate our strings in a variety of ways using ‘field name’, ‘conversion’, and ‘format spec’ identifiers within the replacement field. Not all are required and, in fact, the only one we need here is format spec.

Format spec is introduced with a colon. We follow this colon with any of a number of options, each lending our string a particular format quality. These options are described by the Format-Specification Mini-Language — as the name suggests, it’s the syntax available to us in the format spec field. For example, the ‘sign’ option allows us to direct whether all numbers should be prefixed by a relevant sign (positive or negative); whether only negative numbers should have a sign (the default behaviour); or whether negative numbers have a negative sign while positives are prefixed by a leading space. These examples are shown in the code below…

>>> '{:+g}; {:+g}'.format(100, -50)
'+100; -50'  # All numbers have a relevant sign
>>> '{:-g}; {:-g}'.format(100, -50)
'100; -50'   # Only negative numbers have a sign
>>> '{: g}; {: g}'.format(100, -50)
' 100; -50'  # Positive numbers have a leading space

The options are extensive, and I’d encourage you to take a look at the spec for the full range.
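To give a flavour of that range, here is a small sketch of a few other format spec options, this time written inside f-string replacement fields. The values and variable names are made up for illustration, and this is only a sample of what the mini-language offers:

```python
value = 1234.5678

two_places = f"{value:.2f}"      # fixed-point, two decimal places: 1234.57
right_aligned = f"{value:>12.1f}"  # right-aligned in a field 12 characters wide
zero_padded = f"{42:08d}"        # padded with zeros to width 8: 00000042
percentage = f"{0.256:.1%}"      # multiplied by 100 with a % sign: 25.6%
hexadecimal = f"{255:#x}"        # hexadecimal with a 0x prefix: 0xff

for text in (two_places, right_aligned, zero_padded, percentage, hexadecimal):
    print(repr(text))
```

In each case the part after the colon is the format spec, exactly as in the ‘sign’ examples above.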

The Solution

The solution to our problem is simple. We follow the format spec identifier (a colon) with another of the format-spec mini-language’s many options — a comma (see PEP 378 — Format Specifier for Thousands Separator).

for name, info in countries.items():
    print(
        f"\n{name.title()}'s capital city is {info['capital'].title()}. "
        f"{name.title()} has a population of around "
        f"{info['population']:,}."
    )

Running this code produces perfectly formatted information which our users can easily read…

Scotland's capital city is Edinburgh. Scotland has a population of around 5,438,000.

Belgium's capital city is Brussels. Belgium has a population of around 11,250,000.

Germany's capital city is Berlin. Germany has a population of around 82,665,600.

(Note that we could have reproduced the underscore style from our source code, for example 82_665_600, by using an underscore instead of a comma after the format spec identifier.)
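As a quick sketch of that last note, the two separators can be compared side by side. The variable names here are illustrative, and the final line assumes we also want to round a float, which the article’s population data doesn’t actually need:

```python
population = 82_665_600

with_commas = f"{population:,}"       # comma as the thousands separator
with_underscores = f"{population:_}"  # underscore as the thousands separator

# A separator can be combined with other options, such as a precision.
rounded = f"{1234567.891:,.2f}"

print(with_commas)       # 82,665,600
print(with_underscores)  # 82_665_600
print(rounded)           # 1,234,567.89
```

The underscore option arrived in Python 3.6 alongside underscores in numeric literals, so both forms are available wherever f-strings are.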