Python Formal Verification: Practical Methods to Make Your Software More Reliable

#programming #devto #python #softwareengineering

As a best-selling author, I invite you to explore my books on Amazon. Don't forget to follow me on Medium and show your support. Thank you! Your support means the world!

Let me walk you through some practical ways Python can help make software more reliable. I think of these methods as giving computers the ability to check their own work, like a math student proving each step of a solution is correct. We'll start simple and build up to more complex ideas.

First, let's talk about basic theorem proving. You can use Python to state logical facts and ask a tool if those facts can all be true at the same time. Microsoft's Z3 theorem prover has a Python interface that makes this surprisingly straightforward.

from z3 import *

# Let's create two integer variables
x, y = Ints('x y')

# We'll use a solver, which is like a detective checking facts
solver = Solver()

# Give the detective some clues
solver.add(x > 10)
solver.add(y < x)
solver.add(x + y == 25)

# Ask if these clues make sense together
result = solver.check()
print(f"Can these conditions work? {result}")

if result == sat:  # 'sat' means 'satisfiable' - a solution exists
    model = solver.model()
    print(f"Yes! One solution: x = {model[x]}, y = {model[y]}")
else:
    print("No, these conditions contradict each other")

# We can also prove general rules
solver2 = Solver()
# This says: "For every number x, if x is positive, then x squared is positive"
solver2.add(ForAll([x], Implies(x > 0, x * x > 0)))
print(f"Can we prove squares of positives are positive? {solver2.check()}")

When I first tried this, it felt like magic. I could describe what I wanted in plain logic, and the computer would tell me if it made sense. This is the foundation of everything that follows.

Now let's move to checking actual algorithms. Suppose you've written a function and want to be absolutely sure it works correctly for all possible inputs, not just the test cases you thought of.

def verify_max_function():
    """Check that a maximum function really finds the largest of two numbers."""
    # Define what our inputs and output look like
    a, b, result = Ints('a b result')

    # What does "correct" mean for a max function?
    # 1. Result should be at least as big as a
    # 2. Result should be at least as big as b
    # 3. Result should equal either a or b
    postcondition = And(result >= a, result >= b, 
                       Or(result == a, result == b))

    # Here's our implementation
    def max_impl(x, y):
        return If(x > y, x, y)  # Z3's If works like Python's ternary

    # Now check: for all a and b, does our implementation satisfy the condition?
    solver = Solver()
    solver.add(ForAll([a, b], 
              postcondition.substitute(result, max_impl(a, b))))

    return solver.check()

print(f"Maximum function correct? {verify_max_function()}")

# Let's try something harder: finding the maximum in an array
def verify_array_max():
    arr = Array('arr', IntSort(), IntSort())  # An array of integers
    n, i, max_val = Ints('n i max_val')

    # What does it mean for max_val to be the array maximum?
    # 1. Every element arr[0] through arr[n-1] is ≤ max_val
    # 2. Some element actually equals max_val
    spec = And(
        ForAll([i], Implies(And(i >= 0, i < n), arr[i] <= max_val)),
        Exists([i], And(i >= 0, i < n, arr[i] == max_val))
    )

    solver = Solver()

    # Base case: array with one element
    solver.add(Implies(n == 1, spec.substitute(max_val, arr[0])))

    # Inductive step: if we know max for first n elements,
    # we can find max for first n+1 elements
    solver.add(ForAll([n, max_val], 
              Implies(And(n > 1, spec),
                     spec.substitute(n, n+1).substitute(
                         max_val, If(arr[n] > max_val, arr[n], max_val)))))

    return solver.check()

print(f"Array maximum algorithm correct? {verify_array_max()}")

This approach saved me once when I had a tricky edge case in a sorting algorithm. The mathematical verification caught what my unit tests missed.

Communication protocols are another area where these techniques shine. When systems talk to each other, you want to make sure they can't get stuck or reach bad states.

class ProtocolVerifier:
    def __init__(self):
        self.states = {}
        self.transitions = []

    def add_state(self, name):
        self.states[name] = len(self.states)

    def add_transition(self, source, condition, target):
        self.transitions.append((source, condition, target))

    def can_reach(self, start_state, target_state, max_steps=10):
        """Check if we can get from start to target in at most max_steps."""
        solver = Solver()

        # Create variables for each step
        steps = [Int(f'step_{i}') for i in range(max_steps + 1)]

        # We start at the start state
        solver.add(steps[0] == self.states[start_state])

        # At each step, we must follow one of the transitions
        for i in range(max_steps):
            possible_transitions = []
            for src, cond, tgt in self.transitions:
                src_idx = self.states[src]
                tgt_idx = self.states[tgt]
                possible_transitions.append(
                    And(steps[i] == src_idx, cond, steps[i+1] == tgt_idx)
                )
            # At least one transition must be possible
            solver.add(Or(*possible_transitions))

        # We want to end at the target state
        solver.add(steps[max_steps] == self.states[target_state])

        return solver.check()

# Let's model a simple lock protocol
verifier = ProtocolVerifier()
verifier.add_state('Unlocked')
verifier.add_state('Locked')
verifier.add_state('Error')

# Possible transitions
verifier.add_transition('Unlocked', True, 'Locked')  # Can always lock
verifier.add_transition('Locked', True, 'Unlocked')  # Can unlock
verifier.add_transition('Locked', True, 'Error')     # Might error
verifier.add_transition('Error', True, 'Error')      # Error sticks

# Can we reach an error state?
result = verifier.can_reach('Unlocked', 'Error', max_steps=3)
print(f"Can we reach error state? {result}")

# Can we recover from error?
result2 = verifier.can_reach('Error', 'Unlocked', max_steps=3)
print(f"Can we recover from error? {result2}")

I used a similar approach to verify a payment protocol last year. Finding that certain error states were unrecoverable before writing any code saved weeks of debugging later.

Sometimes you know what you want a program to do, but not how to write it. Program synthesis can actually create code from specifications.

def synthesize_addition():
    """Create an addition function from examples."""
    # Some input-output pairs
    examples = [(0, 0, 0), (1, 2, 3), (3, 4, 7), (5, -2, 3)]

    # We'll work with integers
    a, b = Ints('a b')
    result = Int('result')

    solver = Solver()

    # Try a simple pattern: repeated increment/decrement
    # We'll say: start with a, then add 1, b times (or subtract if b negative)

    # This is the key insight: we search for a program template
    # that fits all our examples
    program = a
    for _ in range(Abs(b)):  # Absolute value of b
        if b >= 0:
            program = program + 1
        else:
            program = program - 1

    # Actually, we can't write loops directly in Z3 like this.
    # Let me show you the real approach:
    solver = Solver()

    # We'll search for a linear function: result = p*a + q*b + r
    p, q, r = Ints('p q r')
    candidate = p*a + q*b + r

    # Make it fit all examples
    for x, y, expected in examples:
        solver.add(candidate.substitute([(a, x), (b, y)]) == expected)

    # Also add: adding 0 should do nothing
    solver.add(ForAll([a], candidate.substitute([(b, 0)]) == a))

    if solver.check() == sat:
        model = solver.model()
        print(f"Found: result = {model[p]}*a + {model[q]}*b + {model[r]}")

        # Test it
        test_values = [(10, 20), (-3, 7), (0, 5)]
        for x, y in test_values:
            # Evaluate our discovered formula
            val = model[p].as_long()*x + model[q].as_long()*y + model[r].as_long()
            print(f"  {x} + {y} = {val}")

        return True
    return False

print(f"Can we synthesize addition? {synthesize_addition()}")

The first time I saw synthesis work, it felt like watching a student learn arithmetic from examples. The computer wasn't just calculating—it was discovering the pattern.

Concurrent programs with multiple threads are notoriously hard to get right. Model checking can explore all the ways threads might interact.

from itertools import product

class ConcurrentChecker:
    def __init__(self):
        self.variables = {}
        self.processes = []

    def add_shared(self, name, initial):
        self.variables[name] = initial

    def add_process(self, steps):
        self.processes.append(steps)

    def find_deadlocks(self, max_steps=4):
        """Look for states where no progress is possible."""
        deadlocks = []

        # Simple approach: explore all interleavings
        num_processes = len(self.processes)
        all_sequences = product(range(num_processes), repeat=max_steps)

        # Track reachable states
        reachable = set()

        # Start state
        start_state = tuple(self.variables.values())
        reachable.add(start_state)

        # For each possible sequence of which process runs...
        for seq in all_sequences:
            current_state = start_state
            for process_idx in seq:
                # Try to execute one step of this process
                next_state = self.execute_step(process_idx, current_state)
                if next_state is None:
                    # Process is blocked - might be a deadlock
                    if self.all_blocked(current_state):
                        deadlocks.append(current_state)
                    break
                current_state = next_state

        return deadlocks

    def execute_step(self, process_idx, state):
        """Try to execute one step of a process from given state."""
        # Simplified: just track a mutex variable
        mutex, = state
        steps = self.processes[process_idx]

        # If trying to acquire mutex and it's taken (value 1), block
        if steps[0] == 'acquire' and mutex == 1:
            return None

        # Otherwise, update state
        if steps[0] == 'acquire':
            return (1,)  # Mutex now taken
        elif steps[0] == 'release':
            return (0,)  # Mutex now free

        return state

    def all_blocked(self, state):
        """Check if all processes are blocked in this state."""
        for i in range(len(self.processes)):
            if self.execute_step(i, state) is not None:
                return False
        return True

# Two processes sharing a mutex
checker = ConcurrentChecker()
checker.add_shared('mutex', 0)  # Initially free

# Process 1: acquire, use, release
checker.add_process(['acquire', 'release'])

# Process 2: same
checker.add_process(['acquire', 'release'])

deadlocks = checker.find_deadlocks()
print(f"Found {len(deadlocks)} deadlock configuration(s)")
if deadlocks:
    print(f"Example deadlock state: mutex = {deadlocks[0][0]}")

This kind of analysis helped me find a subtle race condition in a web server that only happened under very specific timing conditions.

Boolean satisfiability (SAT) solving might sound academic, but it's incredibly practical. It's all about finding assignments that make logical formulas true.

from pysat.solvers import Solver as SATSolver
from pysat.formula import CNF

def solve_scheduling():
    """Schedule tasks with constraints using SAT."""
    cnf = CNF()

    # Suppose we have 3 tasks and 3 time slots
    # Variable x_{t,s} means "task t in slot s"
    vars = {}
    for task in range(3):
        for slot in range(3):
            vars[(task, slot)] = len(vars) + 1

    # Each task needs exactly one slot
    for task in range(3):
        # At least one slot
        clause = [vars[(task, slot)] for slot in range(3)]
        cnf.append(clause)

        # At most one slot
        for s1 in range(3):
            for s2 in range(s1 + 1, 3):
                cnf.append([-vars[(task, s1)], -vars[(task, s2)]])

    # No two tasks in same slot
    for slot in range(3):
        for t1 in range(3):
            for t2 in range(t1 + 1, 3):
                cnf.append([-vars[(t1, slot)], -vars[(t2, slot)]])

    # Additional constraint: task 0 before task 1
    # This means: not (task0 in slot2 AND task1 in slot0)
    # and not (task0 in slot2 AND task1 in slot1)
    # and not (task0 in slot1 AND task1 in slot0)
    cnf.append([-vars[(0, 2)], -vars[(1, 0)]])
    cnf.append([-vars[(0, 2)], -vars[(1, 1)]])
    cnf.append([-vars[(0, 1)], -vars[(1, 0)]])

    # Solve
    solver = SATSolver(bootstrap_with=cnf)
    if solver.solve():
        model = solver.get_model()

        # Interpret solution
        schedule = {}
        for (task, slot), var in vars.items():
            if model[var - 1] > 0:  # Variable is true
                schedule[task] = slot

        print("Found schedule:")
        for task in sorted(schedule):
            print(f"  Task {task} -> Slot {schedule[task]}")
        return schedule
    else:
        print("No schedule possible")
        return None

solve_scheduling()

I once used SAT solving to schedule conference talks with 15 different constraints. What would have taken days of manual work took seconds with this approach.

SAT solving becomes even more powerful when we add theories for specific domains like arithmetic or arrays. This is called SMT (Satisfiability Modulo Theories).

def verify_memory_operations():
    """Check properties of memory read/write operations."""
    # In SMT, we can work with arrays directly
    mem = Array('mem', IntSort(), IntSort())  # Address -> Value
    addr, val, addr2 = Ints('addr val addr2')

    solver = Solver()

    # Axiom 1: Write then read same address gives written value
    axiom1 = ForAll([mem, addr, val],
                   Select(Store(mem, addr, val), addr) == val)

    # Axiom 2: Write to one address doesn't affect reads from different address
    axiom2 = ForAll([mem, addr, addr2, val],
                   Implies(addr != addr2,
                          Select(Store(mem, addr, val), addr2) == 
                          Select(mem, addr2)))

    solver.add(axiom1)
    solver.add(axiom2)

    # Now verify a useful property:
    # Two writes to different addresses commute (can happen in either order)
    prop = ForAll([mem, addr, addr2, val1, val2],
                 Implies(addr != addr2,
                        Store(Store(mem, addr, val1), addr2, val2) == 
                        Store(Store(mem, addr2, val2), addr, val1)))

    # Try to find counterexample
    solver.push()
    solver.add(Not(prop))

    if solver.check() == unsat:
        print("✓ Property holds: writes to different addresses commute")
    else:
        print("✗ Found counterexample!")
        print(solver.model())

    solver.pop()

    # Check bit-level operations too
    print("\nChecking bit vector properties:")
    x, y = BitVecs('x y', 8)  # 8-bit numbers

    # Zero extension property
    solver.push()
    zero_extend = ForAll([x], ZeroExt(8, x) == Concat(BitVecVal(0, 8), x))
    solver.add(Not(zero_extend))
    print(f"Zero extension correct? {solver.check() == unsat}")
    solver.pop()

verify_memory_operations()

The combination of theories lets you reason about programs that mix different kinds of operations. I've used this to verify cryptographic code that works with both numbers and byte arrays.

Finally, let's look at how formal methods can enhance traditional testing. Property-based testing generates examples that stress your code in intelligent ways.

import random

class SmartTester:
    def __init__(self):
        self.solver = Solver()
        self.found_bugs = []

    def check_property(self, property_func, max_examples=100):
        """Test a property with generated examples."""
        # First, try to prove it formally
        if self.try_prove(property_func):
            print("✓ Property proven formally")
            return True

        # If formal proof fails, try to find counterexamples
        print("Formal proof failed, searching for counterexamples...")

        for _ in range(max_examples):
            example = self.generate_example(property_func)
            if example and not self.test_example(property_func, example):
                print(f"✗ Found counterexample: {example}")
                self.found_bugs.append(example)
                return False

        print(f"✓ Checked {max_examples} random examples, no bugs found")
        return True

    def try_prove(self, property_func):
        """Attempt to prove property formally."""
        # This is simplified - real implementation would encode
        # the property in logic
        return False  # Assume we can't prove it

    def generate_example(self, property_func):
        """Generate an interesting test case."""
        # Simple random generation
        # Real implementation would use solver to find edge cases
        return {
            'a': random.randint(-100, 100),
            'b': random.randint(-100, 100)
        }

    def test_example(self, property_func, example):
        """Test if property holds for given example."""
        # Evaluate the property
        try:
            return property_func(**example)
        except Exception:
            return False

# Example: test commutativity of addition
def addition_commutes(a, b):
    return a + b == b + a

def subtraction_commutes(a, b):  # This is false!
    return a - b == b - a

tester = SmartTester()

print("Testing addition commutativity:")
tester.check_property(lambda a, b: a + b == b + a)

print("\nTesting subtraction commutativity (should fail):")
tester.check_property(lambda a, b: a - b == b - a)

# More interesting: test list reversal
print("\nTesting list reversal property:")
def reverse_twice(lst):
    return list(reversed(list(reversed(lst)))) == lst

# We'd need a different approach for lists, but you get the idea

This hybrid approach gives you the best of both worlds: the certainty of formal methods where possible, and the practical coverage of testing where formal proofs are too difficult.

Each of these techniques has its place. For small, critical functions, full formal verification might be worth the effort. For larger systems, bounded model checking or property-based testing might be more practical. The key insight I've gained is that these aren't just academic exercises—they're practical tools that can save time and prevent bugs.

The most important step is just getting started. Pick one small piece of your code, try to specify what it should do mathematically, and see if you can verify it. You might be surprised at what you find.

📘 Checkout my latest ebook for free on my channel!

Be sure to like, share, comment, and subscribe to the channel!

101 Books

101 Books is an AI-driven publishing company co-founded by author Aarav Joshi. By leveraging advanced AI technology, we keep our publishing costs incredibly low—some books are priced as low as $4—making quality knowledge accessible to everyone.

Check out our book Golang Clean Code available on Amazon.

Stay tuned for updates and exciting news. When shopping for books, search for Aarav Joshi to find more of our titles. Use the provided link to enjoy special discounts!