Shrijith Venkatramana

Posted on May 30

Lean4 Might Be the Missing Piece in AI: Why Theorem Provers Are Suddenly Everywhere

#ai #programming #webdev #productivity

Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. Star Us to help devs discover the project. Do give it a try and share your feedback for improving the product.

Most discussions about AI focus on larger models, larger datasets, and larger GPUs.

But there is an uncomfortable reality that every engineer building production AI systems eventually runs into:

LLMs can produce convincing answers, but they cannot guarantee correctness.

Ask an LLM to write code, reason about a distributed system, derive a mathematical formula, or analyze a security protocol. The result might be brilliant. It might also be subtly wrong.

The problem isn't intelligence.

The problem is verification.

That is why a relatively obscure technology from the world of formal methods is suddenly attracting attention:

Lean4.

A theorem prover originally designed for mathematicians is increasingly being viewed as a way to build AI systems that can not only generate answers, but actually prove that those answers are correct.

Let's look at what Lean4 is, how it works, and why some researchers believe theorem provers may become a critical layer in future AI systems.

The Fundamental Problem: LLMs Don't Know What's True

Large language models operate by predicting likely sequences of tokens.

That sounds obvious, but the implications are important.

When ChatGPT generates a response, it isn't checking whether a statement is true.

It is generating text that statistically resembles text associated with the prompt.

Consider a simple coding example:

def is_sorted(arr):
    return all(arr[i] < arr[i+1]
               for i in range(len(arr)-1))

Looks reasonable.

But there is a subtle bug.

[1,1,2,3]

is sorted, yet the function returns False because it uses < instead of <=.

Many tests might pass.

A code reviewer might miss it.

An LLM might confidently explain why the implementation is correct.

None of these establish correctness.

Testing can show the presence of bugs.

It cannot prove the absence of bugs.

That distinction is what theorem proving is about.

What Exactly Is Lean4?

Lean4 is two things:

A programming language
A theorem prover

The theorem prover part is the interesting piece.

Instead of writing code and then testing it, you describe properties that must always hold.

Lean then requires a mathematical proof that those properties are true.

For example, consider a simple theorem:

For every natural number n, n + 0 = n

In Lean this becomes something that must be formally proven.

The system does not accept hand-wavy reasoning.

Every logical step must be justified.

If any step is invalid, the proof fails.

This is fundamentally different from traditional software validation.

Traditional testing:

Input A -> Pass
Input B -> Pass
Input C -> Pass

Formal proof:

For all valid inputs:
    Property P always holds

The theorem checker verifies the proof mechanically.

No intuition.

No assumptions.

No trust.

Only proof.

Why Lean Feels Different From Traditional Formal Methods

Formal verification has existed for decades.

Historically it suffered from two problems:

Tools were difficult to use
Formalization was extremely expensive

Lean changes the equation in several ways.

First, it is designed as a practical programming language.

Second, it has a large ecosystem called Mathlib containing thousands of formally verified definitions and theorems.

Instead of proving everything from scratch, developers can build on existing verified foundations.

For example:

Natural numbers
Integers
Groups
Rings
Calculus
Probability
Linear algebra

Much of this already exists inside the ecosystem.

This makes Lean feel closer to software engineering than traditional theorem proving systems.

You are often composing verified building blocks rather than creating everything from first principles.

The AI + Lean Workflow Is What Makes This Interesting

The most exciting development is not Lean itself.

It's the combination of Lean and LLMs.

Think about the typical AI workflow today:

Prompt
    ↓
LLM
    ↓
Answer

Now compare that with an emerging architecture:

Prompt
    ↓
LLM
    ↓
Candidate Solution
    ↓
Lean
    ↓
Verification
    ↓
Accepted / Rejected

The LLM becomes a generator.

Lean becomes a verifier.

This separation is powerful.

Humans already work this way.

A mathematician may invent a proof.

A journal referee verifies it.

An engineer may write code.

Tests verify it.

An architect proposes a design.

Structural calculations verify it.

The same pattern can apply to AI systems.

Generation and verification become separate concerns.

A Concrete Example: Finding Bugs Automatically

Imagine an LLM generating a sorting algorithm.

The desired property is:

For any list L:

sort(L) returns:
    1. A permutation of L
    2. Elements in non-decreasing order

An LLM might generate:

def sort(xs):
    return sorted(set(xs))

At first glance it appears to work.

But duplicates disappear.

[1,1,2]

becomes:

[1,2]

The algorithm violates the permutation property.

A theorem prover can catch this immediately.

The interesting part is that verification is not based on finding a counterexample through testing.

The proof obligation itself fails.

The algorithm cannot be proven correct.

This is fundamentally stronger than conventional testing approaches.

Why This Matters Beyond Mathematics

Many people hear "theorem prover" and assume this is only useful for mathematicians.

That is increasingly false.

Formal verification is already used in areas such as:

Compilers

The famous CompCert compiler demonstrates that compiler correctness can be formally proven.

Cryptography

Security protocols often rely on formal proofs.

A tiny mistake can compromise billions of dollars.

Aerospace

Flight control systems require exceptionally high confidence.

Finance

Smart contracts and trading infrastructure can benefit from machine-checked guarantees.

AI Agents

Agents increasingly perform actions instead of merely generating text.

As autonomy increases, verification becomes more valuable.

The more expensive a mistake becomes, the more attractive formal guarantees become.

The Bigger Picture: Probabilistic Intelligence + Deterministic Verification

There is a tendency to think of theorem provers and LLMs as competing technologies.

They're not.

In many ways they complement each other.

LLMs are excellent at:

Search
Exploration
Creativity
Pattern matching
Generating candidate solutions

Theorem provers are excellent at:

Verification
Correctness
Logical consistency
Mathematical guarantees

One generates.

The other validates.

A useful analogy is software development itself.

We don't replace programmers with compilers.

We use compilers to verify what programmers produce.

Future AI systems may look similar:

LLM = Generator

Theorem Prover = Verifier

The combination is potentially far more powerful than either component alone.

Final Thoughts

For years the AI industry has largely optimized for capability.

Can the model write code?

Can it solve math problems?

Can it reason?

Those are important questions.

But another question is becoming increasingly important:

How do we know the answer is actually correct?

Theorem provers such as Lean4 offer one possible answer.

They provide a mechanism for transforming "the model thinks this is right" into "this has been formally verified."

Whether Lean itself becomes dominant remains to be seen.

But the broader idea—combining probabilistic generation with formal verification—feels less like a niche research direction and more like a plausible next step in the evolution of AI systems.

What do you think?

Will theorem provers become a standard component of future AI stacks, or will they remain specialized tools used only in high-assurance domains?

*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

HexmosTech / git-lrc

Free, Micro AI Code Reviews That Run on Commit

git-lrc

Free, Micro AI Code Reviews That Run on Commit

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
🔁 Build a…

View on GitHub

Top comments (1)

Harjot Singh • May 31

Theorem provers showing up in AI is one of the more exciting threads, it's the strongest possible answer to "how do you know the output is correct." For most generated code you settle for tests and types as a cheap proxy; Lean-style verification is the real thing where it's tractable. The catch is exactly tractability, formal proof is gold for the narrow core (a parser, a state machine, a protocol) but you can't prove a whole CRUD app. The pragmatic stack is formal where it pays, deterministic checks plus an independent judge everywhere else. That layered correctness is how I think about it in Moonshift. Where do you see provers landing first in practice, spec/contracts or critical algorithms?