DEV Community

Maciej Michalec

Pattern matching in Python

The sixth alpha version of Python 3.10 was released today (the final release is planned for October). It introduces a great new feature, specified in PEP 634: pattern matching. Let's see how it works!

Like a switch

The syntax somewhat resembles the switch statement known from other programming languages:

from random import randint

day_of_week = randint(1, 7)
match day_of_week:
    case 1:
        print("So we're starting...")
    case 5:
        print("Yeah, Friday!")
    case _:
        print(f"{day_of_week}. day of the week")

So the values are checked from top to bottom, and there's something like a default label (the _ wildcard). In other words, we can easily replace many if/elif/else chains with a simpler syntax.

...but much better

Fortunately, pattern matching has so much more potential! I'm pretty sure that software developers using functional programming languages (like Haskell, Scala or OCaml) already know what I mean.

Now we can use Python to match some more interesting patterns:

match mysterious:
    case 1, y:
        doSomething()
    case (_, y):
        doSomethingElse()
    case str(pair) if len(pair.split(",")) == 2:
        processString()
    case Pair(10, _) as pair:
        usePair(pair)

Dealing with different data

Pattern matching can be used with various data types, starting from lists, through dictionaries, to types defined by the user. In the next paragraphs I'll show you some useful examples.

Lists

When we don't know the length of the list we're dealing with, pattern matching comes to the rescue.

The code is more readable because we don't need to call len() explicitly (of course, the length is still checked under the hood).

match some_list:
    case []:
        print("Empty list")
    case [x]:
        print(f"Single-element list: {x}")
    case [x, y]:
        print(f"List containing only two elements: {x} and {y}")
    case [x, y, z]:
        print(f"List with three elements: {x}, {y} and {z}")
    case [x, y, *tail]:
        print(f"List with more than three elements. Here are the first two: {x} and {y}")

Dicts

The syntax for dicts also tends to be quite intuitive. We can match the specific keys and values like in the example below:

match employee_record:
    case {"age": age, **personal_data}:
        employee.set_year_of_birth(age=age)
        employee.update_personal_data(personal_data)
    case {"position": "engineer", "salary": salary}:
        update_salary(employee, salary)
    case dict(x) if not x:
        raise Exception("no data to process")

Please note that case {"key": value} doesn't mean that only a single-element dictionary will be matched. It will match any dictionary containing "key", no matter how many other keys it has.

That's why we use case dict(x) if not x to match the empty dictionary. We could also use, for example, case {} as x if not x, but not a plain case {}. The last one will match any dict, not only the empty one.

More interesting is the use of multiple nested sub-patterns:

x = [1, {"foo": "bar", "foo2": (10, 20)}, 3, 4]
match x:
    case [1, {"foo": val} as d, *_]:
        print(f"Foo value: {val}, whole dict: {d}")
    case _:
        print("Not matched")

Imagine how you would have to extract the value of "foo" without pattern matching. I bet it wouldn't be any more readable.

Custom types

Matching dataclasses is the simplest case among user-defined types. In order to perform object-deconstruction, we use a syntax which, intentionally, is similar to the syntax of object-construction:

from dataclasses import dataclass

@dataclass
class Pair:
    first: int
    second: int

pair = Pair(10, 20)
match pair:
    case Pair(0, x):
        print("Case #1")
    case Pair(x, y) if x == y:
        print("Case #2")
    case Pair(first=x, second=20):
        print("Case #3")
    case Pair() as p:  # the parentheses matter: a bare "Pair" would capture any value and rebind the name Pair
        print("Case #4")

As you can see, we can use both keyword and positional arguments. The situation changes when it comes to custom types that don't use the @dataclass decorator:

class Pair:
    def __init__(self, first: int, second: int):
        self.first = first
        self.second = second

In that case only keyword arguments are allowed. So you can write case Pair(first=x, second=y) or case Pair() as p, but case Pair(x, y) will raise the following exception:

TypeError: Pair() accepts 0 positional sub-patterns (2 given)

What can we do about that? Use the __match_args__ attribute! It defines the positional parameters and their order, so for the class below we can use the same patterns as for the dataclass.

class Pair:
    __match_args__ = ("first", "second")  # must be a tuple in the final Python 3.10 release
    def __init__(self, first: int, second: int):
        self.first = first
        self.second = second

Caveats

In most cases the behavior of pattern matching in Python is rather intuitive. However, there are a few things you need to be aware of to avoid bugs in your applications.

Wildcards

As we know from the previous examples, these two syntax constructs are allowed:

  • case {"age": age, **personal_data}
  • case [x, *_]

In both cases we're matching the remaining elements of the container. The variable personal_data will be a dictionary containing all the items except "age", while the *_ wildcard pattern means that we expect a list of length >= 1.

However, the syntax case {"age": age, **_} is forbidden. Why? It would be redundant, because case {"age": age} on its own already means "a dictionary containing at least the key age".

Consts

The second problem is using constants in patterns. Look at the example below and try to guess what it will print:

HTTP_404 = 404
HTTP_500 = 500

response = (500, "error")
match response:
    case HTTP_404, content:
        print("Not found")
    case HTTP_500, content:
        print("Server error")

The message "Not found" will be printed! Moreover, the HTTP_404 variable will now hold the value 500. Surprised? That's how variable binding works in Python pattern matching: a bare name in a pattern is a capture target, not a value to compare against.

If you want to use constants inside a pattern, you need to use so-called "dotted constants" (dotted names, which PEP 634 calls value patterns). It can be a member of an enum or of any other class, for example:

class Error:
    HTTP_404 = 404
    HTTP_500 = 500

response = (500, "error")
match response:
    case Error.HTTP_404, content:
        print("Not found")
    case Error.HTTP_500, content:
        print("Server error")

Now the program finally prints "Server error", and neither of our constants changes its value.

Try it yourself

If you want to play around with pattern matching before the final release of Python 3.10, you can download version 3.10 alpha 6 from the official site.

Alternatively, you can use the mybinder.org website, which lets you run interactive Python notebooks. Click here to open the one prepared by Guido van Rossum, which uses the experimental version of Python 3.10.

Summary

Pattern matching is a feature characteristic of functional programming languages like Haskell, OCaml or Scala, but it also appears in many multi-paradigm languages such as C#, Ruby or Rust.

Personally, I'm very glad that Python is joining those ranks and becoming even more functional. And I can't wait to get rid of the ifs in favor of pattern matching. ;)

Top comments (2)

John Copella

That dotted-constant thing is infuriating and is going to trip up everyone who tries to use it.

aminify

I wish python had a way to declare a variable as constant, so that such constant problems wouldn't even occur