Swastik Baranwal

Python: Pattern Matching Proposal

Python keeps evolving, adding new features and proposals with each release. This time the proposal is pattern matching, in the form of a match statement.

Origins

The work has several origins:

  • Many statically compiled languages (especially functional ones) have a match expression, for example Scala, Rust, F#;
  • Several extensive discussions on python-ideas, culminating in a summarizing blog post by Tobias Kohn;
  • An independently developed draft PEP by Ivan Levkivskyi.

Draft

The draft was written by Guido van Rossum and can be found here.

Semantics

The proposed large scale semantics for choosing the match is to choose the first matching pattern and execute the corresponding suite. The remaining patterns are not tried. If there are no matching patterns, the statement 'falls through', and execution continues at the following statement.

Essentially this is equivalent to a chain of if ... elif ... else statements. Note that unlike for the previously proposed switch statement, the pre-computed dispatch dictionary semantics does not apply here.

There is no default or else case - instead the special wildcard _ can be used (see the section on name_pattern) as a final 'catch-all' pattern.

Syntax

Literal Pattern

match number:
    case 0:
        print("Nothing")
    case 1:
        print("Just one")
    case 2:
        print("A couple")
    case -1:
        print("One less than nothing")
    case 1-1j:
        print("Good luck with that...")

Raw strings and byte strings are supported. F-strings are not allowed (since in general they are not really literals).

Name Pattern

A name pattern serves as an assignment target for the matched expression:

match greeting:
    case "":
        print("Hello!")
    case name:
        print(f"Hi {name}!")

Constant Value Pattern

It is used to match against constants and enum values. Every dotted name in a pattern is looked up using normal Python name resolution rules, and the value is used for comparison by equality with the matching expression (same as for literals).

from enum import Enum

class Color(Enum):
    BLACK = 1
    RED = 2

BLACK = 1
RED = 2

match color:
    case .BLACK | Color.BLACK:
        print("Black suits every color")
    case BLACK:  # This will just assign a new value to BLACK.
        ...

Sequence Pattern

A sequence pattern follows the same semantics as unpacking assignment. Like unpacking assignment, both tuple-like and list-like syntax can be used, with identical semantics. Each element can be an arbitrary pattern; there may also be at most one *name pattern to catch all remaining items:

match collection:
    case 1, [x, *others]:
        print("Got 1 and a nested sequence")
    case (1, x):
        print(f"Got 1 and {x}")

To match a sequence pattern the target must be an instance of collections.abc.Sequence, and it cannot be any kind of string (str, bytes, bytearray). It cannot be an iterator.

The _ wildcard can be starred to match sequences of varying lengths. For example:

  • [*_] matches a sequence of any length.
  • (_, _, *_) matches any sequence of length two or more.
  • ["a", *_, "z"] matches any sequence of length two or more that starts with "a" and ends with "z".

Mapping Pattern

A mapping pattern is a generalization of iterable unpacking to mappings. Its syntax is similar to a dictionary display, but each key and each value is a pattern: "{" (pattern ":" pattern)+ "}". A **name pattern is also allowed, to extract the remaining items. Only literal and constant value patterns are allowed in key positions:

import constants

match config:
    case {"route": route}:
        process_route(route)
    case {constants.DEFAULT_PORT: sub_config, **rest}:
        process_config(sub_config, rest)

The target must be an instance of collections.abc.Mapping. Extra keys in the target are ignored even if **rest is not present. This is different from sequence pattern, where extra items will cause a match to fail. But mappings are actually different from sequences: they have natural structural sub-typing behavior, i.e., passing a dictionary with extra keys somewhere will likely just work.

For this reason, **_ is invalid in mapping patterns; it would always be a no-op that could be removed without consequence.

Class Pattern

A class pattern provides support for destructuring arbitrary objects. There are two possible ways of matching on object attributes: by position like Point(1, 2), and by name like User(id=id, name="Guest"). These two can be combined, but positional match cannot follow a match by name. Each item in a class pattern can be an arbitrary pattern. A simple example:

match shape:
    case Point(x, y):
        ...
    case Rectangle(x0, y0, x1, y1, painted=True):
        ...

Whether a match succeeds or not is determined by calling a special __match__() method on the class named in the pattern (Point and Rectangle in the example), with the value being matched (shape) as the only argument. If the method returns None, the match fails; otherwise the match continues with respect to the attributes of the returned proxy object (see the details in the runtime section).

This PEP only fully specifies the behavior of __match__() for object and some builtin and standard library classes; custom classes are only required to follow the protocol specified in the runtime section.

Combining Multiple Patterns

Multiple alternative patterns can be combined into one using |. This means the whole pattern matches if at least one alternative matches. Alternatives are tried from left to right and short-circuit: subsequent alternatives are not tried once one has matched. For example:

match something:
    case 0 | 1 | 2:
        print("Small number")
    case [] | [_]:
        print("A short sequence")
    case str() | bytes():
        print("Something string-like")
    case _:
        print("Something else")

The alternatives may bind variables, as long as each alternative binds the same set of variables (excluding _). For example:

match something:
    case 1 | x:  # Error!
        ...
    case x | 1:  # Error!
        ...
    case one := [1] | two := [2]:  # Error!
        ...
    case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
        ...
    case [x] | x:  # Valid, both arms bind 'x'
        ...

Guards

Each top-level pattern can be followed by a guard of the form if expression. A case clause succeeds if the pattern matches and the guard evaluates to a true value. For example:

match input:
    case [x, y] if x > MAX_INT and y > MAX_INT:
        print("Got a pair of large numbers")
    case x if x > MAX_INT:
        print("Got a large number")
    case [x, y] if x == y:
        print("Got equal items")
    case _:
        print("Not an outstanding input")

If evaluating a guard raises an exception, it is propagated onwards rather than failing the case clause. Names that appear in a pattern are bound before the guard is evaluated, so they stay bound even if the guard fails. So this will work:

value = [0]

match value:
    case [x] if x:
        ...  # This is not executed
    case _:
        ...
print(x)  # This will print "0"

Note that guards are not allowed for nested patterns, so that [x if x > 0] is a SyntaxError and 1 | 2 if 3 | 4 will be parsed as (1 | 2) if (3 | 4).

Named sub-patterns

It is often useful to match a sub-pattern and to bind the corresponding value to a name. For example, it can be useful to write more efficient matches, or simply to avoid repetition. To simplify such cases, a name pattern can be combined with another arbitrary pattern using named sub-patterns of the form name := pattern. For example:

match get_shape():
    case Line(start := Point(x, y), end) if start == end:
        print(f"Zero length line at {x}, {y}")

Note that the name pattern used in the named sub-pattern can be used in the match suite, or after the match statement. However, the name will only be bound if the sub-pattern succeeds. Another example:

match group_shapes():
    case [], [point := Point(x, y), *other]:
        print(f"Got {point} in the second group")
        process_coordinates(x, y)
        ...

Technically, most such examples can be rewritten using guards and/or nested match statements, but this will be less readable and/or will produce less efficient code.

_ is not a valid name here.

More

This article only covers the main features and syntax. For more information please refer to:

gvanrossum / patma

Pattern Matching

This repo contains an issue tracker, examples, and early work related to PEP 622: Structural Pattern Matching. The current version of the proposal is PEP 634, which was accepted by the Steering Council on February 8, 2021. The motivation and rationale are written up in PEP 635 and a tutorial is in PEP 636. The tutorial below is also included in PEP 636 as Appendix A.

Updates to the PEPs should be made in the PEPs repo.

Implementation

A full reference implementation written by Brandt Bucher is available as a fork of the CPython repo…

Note

This is only a proposal, so minor details may change, but most of the design is settled. You can check out the issues listed in the repository.