DEV Community

Sofia R
Sofia R

Posted on

A deep dive into coverage checkers for pattern matching

If you’re a programmer that uses functional languages or even some mainstream languages that recently introduced pattern matching, you probably have already seen a message like this one:

This article is about this kind of error message that saves everyone’s time when matching big sequences of patterns that must not throw a runtime exception. We are going to start looking at a paper called “Warnings for pattern match” by Luc Margaret and then we will move to a paper called “Elaborating Dependent (Co)pattern matching” by Jesper Cockx to analyze another strategy.

What exactly is coverage checking?

The coverage checker's main job is to discover if a matrix of patterns is or not covering all the possible cases, but it’s not limited to that, it can check if a row of patterns is useful or not to the rest e.g.

That corresponds to:

// Ignore these 0s and 1s,
match x, y {
    Nil, _ => 0,
    _, Nil => 1,
}

match x, y {
    Nil,        _ => 0,
    _,        Nil => 1,
    One(_),     _ => 0,
    _,     One(_) => 1,
    Cons(_,_),  _ => 0,
    _, Cons(_, _) => 1,
}
Enter fullscreen mode Exit fullscreen mode

If we consider each pattern of the matrix of type List x = Nil | One(x) | Cons(x, List x) then P is a non-exhaustive matrix due to the lack of the pattern One(_) (One _), In contrast, Q is a exhaustive matrix because it covers all possible cases.

Usefulness and Exhaustiveness

One important concept here is that the usefulness of a pattern is closely related to the concept of exhaustiveness. Let's take a look at this pattern match in rust

match x {
    true  => 1,
    false => 2,
    _     => 3
}
Enter fullscreen mode Exit fullscreen mode

That corresponds to the matrix:

The presence of _ (Wildcard) in this pattern is not useful for the rest of the matrix as this matrix is already exhausted by the presence of the true and false matches. That implies that if _ is useful for the rest of the matrix, then the matrix itself has not exhausted all possibilities. Keep that in mind because we are going to use that in one of the next sections.

Completeness of a signature

This is a concept that is useful for discovering if we should or not specialize or not our matrix, for this we only need to get the first column and get all the names of the constructors, e.g.

Here, the Sigma symbol represents the set of constructors that are in the first column of the matrix, and as we can see, it’s only true and false that constitutes a complete signature of the type Bool.

The example

We need to do a bunch of processes to discover if it’s exhaustive for a certain pattern, let's take this next example for the rest of the tutorial:

// Simplified version of a rust code.

enum Bool    { True, False }
enum List<T> { Nil, Cons(T, List<T>) }

pub fn my_test(list: List<Bool>) {
    match list {
        Cons(True, y)    => 1
        Cons(False, Nil) => 2
        Nil              => 3
  }
}
Enter fullscreen mode Exit fullscreen mode

That corresponds to

If you take a closer look you will notice that what lacks is the pattern Cons(False, Cons(_, _)) that we need to discover in the next sections!

Specialization

We specialize the tree in order to discover if it’s exhaustive for a certain pattern, for example, if we want to discover if a matrix like:

Is exhaustive for the constructor false, we simply remove the rows that do not start with false and we remove the first column of the rest. If the matrix is empty (0x0) then it’s not exhaustive for this certain pattern, if the matrix has no columns but has at least one line (Nx0), then it’s exhaustive for this pattern, e.g:

Image description

In this example, we represent the specialization of a matrix R by a constructor c as S(c, R). It looks like there’s no difference between S(false, R) and S(false, T) but the first specialization removed the line with true because we are specializing it to false resulting in 0 lines and 0 columns matrix, the second one just removed the first column because the first column matches the false resulting in a 1 line and 0 columns matrix. This difference means that the first matrix R is non-exhaustive and T is exhaustive.

So the rules that we are going to follow are the following:

  1. If the first pattern Pi1 is a constructor then we just remove the name, and we inline the constructor, e.g Cons(x, y), z would turn into x,y,z only if the name of the constructor is the same as the one that we are specializing.
  2. If the first pattern has a different name c' != c then we just remove the row.
  3. If the pattern is a wildcard then we have to clone it the same number of times as the arguments of the constructor, e.g. if we are specializing _ to Cons then we clone it twice at the beginning of the row.
  4. Ignore this one, we are not going to cover or-patterns in this tutorial.

Another type of specialization is the default matrix that is defined by

  1. If the first pattern is a constructor we simply remove the row
  2. if it’s a wildcard we remove the first pattern of the row
  3. Ignore it again 😭

Splitting

When we have a complete signature we split and specialize the matrix into multiple ones one for each constructor of the type, e.g

Will get specialized to both Cons and Nil, starting with the Cons:

The matrix C is exhaustive because the line with Nil got specialized to a 0 column but one-liner matrix and the matrix B is not finished yet so we need to split it a little bit more to find a non-complete signature and follow another procedure.

Now the matrix B has a complete signature for Bool so we split it both for True and False

Image description

The matrix D only contains variables so we must treat it as a wildcard and apply a specialization for a wildcard that will lead to a 1x0 matrix, so this branch is ok!

Image description

But the matrix E has an incomplete signature (because it lacks Cons) so we must follow another procedure: We need to get the default matrix of the matrix so

Image description

And we found a non-exhaustive case! If we go backward and try to see this branch we will find this:

  • We specialized in Cons
  • We specialized in False
  • We got the default matrix because Cons is not present
  • An empty matrix!

And then using this information we can reconstitute that the pattern that lacks is Cons(False, Cons(_, _).

The algorithm

Let's introduce a new notation loosely based on the Jesper cockx paper, This notation will include three things that we need to check the usefulness of a pattern: the matrix, the case, and the types that from now on we will call “a problem”.

Image description

  1. The first part has a symbol “⊢” that separates the types of patterns (in blue) and the name of the matrix (in red), in the other side we have | that separates the matrix name from the “case” (in green) that is something that we are checking for the usefulness right now.
  2. The pattern matrix that we are already tired of seeing in this article

The algorithm is the following:

  1. If the matrix has 0 columns and 0 lines then the case is useful for the rest of the matrix
  2. If the matrix has 0 columns but more than 0 lines then the case is useless for the rest of the matrix.
  3. If the first pattern of the case is a wildcard then
    1. If all the patterns in the first column are wildcards, we can get the default matrix of the entire problem and check recursively.
    2. If the signature of the first pattern is incomplete then we get the default matrix of the problem and we check recursively
    3. If the signature of the first pattern of the matrix is complete we have to split the problem into sub-problems of the first type of the matrix and the case by the constructor and we check recursively.
  4. If the first pattern of the case is a constructor then
    1. We just need to specialize the entire matrix and the case for this specific constructor :)

Show me the code!!

Ok, let's start by defining the type of a pattern. In this example we don’t care a lot about variables, you can just substitute them for wildcards.

-- The definition of a pattern. Here we don't care about variables because they're the same as
-- variables in our concept
data Pat
    = Wildcard
    | Constructor Label [Pat]
Enter fullscreen mode Exit fullscreen mode

Then we can define the specialization of the patterns and the matrix


-- Type synonyms for a constructor name
type Label = String

-- Here we specialize just the row based on the first pattern
specialize :: [Pat] -> Label -> Int -> [[Pat]]
specialize row label size = do
    let first = head row
    case first of
        Wildcard -> [replicate size Wildcard ++ tail row]
        Constructor name args
            | name == label -> [args ++ tail row]
            | otherwise     -> []

-- Here we specialize the entire matrix
specializeMatrix :: [[Pat]] -> Label -> Int -> [[Pat]]
specializeMatrix matrix label size = concatMap (\s -> specialize s label size) matrix
Enter fullscreen mode Exit fullscreen mode

Now we have to define types, type constructors and substitution to do instantiation

-- Types are either type variables or type constructors
data Type
    -- This type only exists in order to make the type substitution easier
    = TypeVar Int
    -- This type is a type constructor with a name and a list of arguments like `Int´ or `List x`
    | TypeCon String [Type]

-- Type substitution in order to make "instantiation"
substitute :: Int -> Type -> Type -> Type
substitute name replacement (TypeCon name' args) =
    TypeCon name' (map (substitute name replacement) args)
substitute name replacement (TypeVar name')
    | name == name' = replacement
    | otherwise     = TypeVar name'

-- Substitutes a list of patterns
substituteList :: [Type] -> Type -> Type
substituteList rep typ = foldr (uncurry substitute) typ (zip [0..] rep)
Enter fullscreen mode Exit fullscreen mode

Now we can define what’s a problem

data Problem = Problem
    { problemMatrix :: [[Pat]]
    , problemCase   :: [Pat]
    , problemTypes  :: [Type]
    }
Enter fullscreen mode Exit fullscreen mode

And a context so we can get type definitions and constructors

data Context = Context
    { ctxTypes :: [(String, [String])]
    , ctxCons  :: [(String, [Type])]
    }
Enter fullscreen mode Exit fullscreen mode

Now we can define what is a complete and incomplete signature, the function that determines if the signature is complete or not, and a function to get the first pattern label.

data Completeness
    = Complete [String]
    | Incomplete [String]

getPatCons :: Pat -> Maybe Label
getPatCons Wildcard          = Nothing
getPatCons (Constructor l _) = Just l

isSignatureComplete :: Context -> String -> [String] -> Completeness
isSignatureComplete ctx typename names =
    let names'  = Maybe.fromJust (List.lookup typename (ctxTypes ctx))
        missing = filter (`notElem` names') names
    in if null missing then Complete names else Incomplete missing
Enter fullscreen mode Exit fullscreen mode

Two functions that determine if the matrix is empty or it’s complete (has one or more lines and zero columns)

isEmpty :: Problem -> Bool
isEmpty problem = null (problemMatrix problem)

isComplete :: Problem -> Bool
isComplete problem = any null (problemMatrix problem)
Enter fullscreen mode Exit fullscreen mode

And then the default matrix function

defaultRow :: [Pat] -> [[Pat]]
defaultRow row = do
    let first = head row
    case first of
        Wildcard        -> [tail row]
        Constructor _ _ -> []

defaultMatrix :: Problem -> Problem
defaultMatrix problem =
    let matrix = concatMap defaultRow (problemMatrix problem)
        types  = tail (problemTypes problem)
        case'  = concat $ defaultRow (problemCase problem)
    in Problem matrix case' types
Enter fullscreen mode Exit fullscreen mode

Now we can specialize the entire problem for a certain constructor

typeName :: Problem -> (String, [Type])
typeName problem =
    case head (problemTypes problem) of
        TypeVar _         -> error "This should not exists here!"
        TypeCon name args -> (name, args)

specializeConstructor :: Context -> Label -> [Type] -> Problem -> Bool
specializeConstructor ctx constructor args problem =
    -- we get the type of the constructor
    let constructorTy = Maybe.fromJust (List.lookup constructor (ctxCons ctx))
        -- We specialize the problem matrix to this constructor
        problem'      = specializeMatrix (problemMatrix problem) constructor (length constructorTy)
        -- We substitute the type variables with the arguments
        typ           = map (substituteList args) constructorTy
        -- We get the types of the problem
        types'        = typ ++ tail (problemTypes problem)
        --We specialize the cases for this constructor
        cases         = concat $ specialize (problemCase problem) constructor (length constructorTy)
        -- We create the new problem
        problem''     = Problem problem' cases types'
    in useful ctx problem''
Enter fullscreen mode Exit fullscreen mode

And define the split function

split :: Context -> Label -> [Type] -> Problem -> Bool
split context typ args problem = do
    let typ' = Maybe.fromJust (List.lookup typ (ctxTypes context))
    any (\label -> specializeConstructor context label args problem) typ'
Enter fullscreen mode Exit fullscreen mode

And finally the useful function that checks if the case is useful

isWildcard :: Pat -> Bool
isWildcard Wildcard = True
isWildcard _        = False

isWildcardRow :: [Pat] -> Bool
isWildcardRow = isWildcard . head

isWildCardMatrix :: [[Pat]] -> Bool
isWildCardMatrix = all isWildcardRow

useful :: Context -> Problem -> Bool
useful ctx problem
    | isEmpty problem = True
    | isComplete problem = False
    | otherwise          = do
        let (name, args) = typeName problem
        case head (problemCase problem) of
            Wildcard ->
                if isWildCardMatrix (problemMatrix problem)
                    then useful ctx (defaultMatrix problem)
                    else let names = Maybe.mapMaybe (getPatCons . head) (problemMatrix problem) in
                         case isSignatureComplete ctx name names of
                            Complete _   -> split ctx name args problem
                            Incomplete _ -> useful ctx (defaultMatrix problem)
            Constructor name' _ -> specializeConstructor ctx name' args problem
Enter fullscreen mode Exit fullscreen mode

We can test it using something like:

main :: IO ()
main =  do
    let ctx = Context
            { ctxTypes = [ ("List", ["Nil", "Cons"])
                         , ("Bool", ["True", "False"])]
            , ctxCons  = [ ("Nil", [])
                         , ("Cons", [TypeVar 0, TypeCon "List" [TypeVar 0]])
                         , ("True", [])
                         , ("False", [])]
            }

    let problem = Problem
            { problemMatrix = [[Constructor "Cons" [Constructor "True" [], Wildcard]]
                              ,[Constructor "Cons" [Constructor "False" [], Constructor "Nil" []]]
                              ,[Constructor "Nil" []]
                              ,[Constructor "Cons" [Constructor "False" [], Constructor "Cons" [Wildcard, Wildcard]]]]
            , problemCase   = [Wildcard]
            , problemTypes  = [TypeCon "List" [TypeCon "Bool" []]]
            }

    print $ useful ctx problem
Enter fullscreen mode Exit fullscreen mode

And that’s all for today. In the next part, I’ll introduce the Jesper Cockx algorithm for checking coverage in the presence of dependent types :)

Top comments (0)