Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. Star Us to help devs discover the project. Do give it a try and share your feedback for improving the product.
Most discussions about AI focus on larger models, larger datasets, and larger GPUs.
But there is an uncomfortable reality that every engineer building production AI systems eventually runs into:
LLMs can produce convincing answers, but they cannot guarantee correctness.
Ask an LLM to write code, reason about a distributed system, derive a mathematical formula, or analyze a security protocol. The result might be brilliant. It might also be subtly wrong.
The problem isn't intelligence.
The problem is verification.
That is why a relatively obscure technology from the world of formal methods is suddenly attracting attention:
Lean4.
A theorem prover originally designed for mathematicians is increasingly being viewed as a way to build AI systems that can not only generate answers, but actually prove that those answers are correct.
Let's look at what Lean4 is, how it works, and why some researchers believe theorem provers may become a critical layer in future AI systems.
The Fundamental Problem: LLMs Don't Know What's True
Large language models operate by predicting likely sequences of tokens.
That sounds obvious, but the implications are important.
When ChatGPT generates a response, it isn't checking whether a statement is true.
It is generating text that statistically resembles text associated with the prompt.
Consider a simple coding example:
def is_sorted(arr):
return all(arr[i] < arr[i+1]
for i in range(len(arr)-1))
Looks reasonable.
But there is a subtle bug.
[1,1,2,3]
is sorted, yet the function returns False because it uses < instead of <=.
Many tests might pass.
A code reviewer might miss it.
An LLM might confidently explain why the implementation is correct.
None of these establish correctness.
Testing can show the presence of bugs.
It cannot prove the absence of bugs.
That distinction is what theorem proving is about.
What Exactly Is Lean4?
Lean4 is two things:
- A programming language
- A theorem prover
The theorem prover part is the interesting piece.
Instead of writing code and then testing it, you describe properties that must always hold.
Lean then requires a mathematical proof that those properties are true.
For example, consider a simple theorem:
For every natural number n, n + 0 = n
In Lean this becomes something that must be formally proven.
The system does not accept hand-wavy reasoning.
Every logical step must be justified.
If any step is invalid, the proof fails.
This is fundamentally different from traditional software validation.
Traditional testing:
Input A -> Pass
Input B -> Pass
Input C -> Pass
Formal proof:
For all valid inputs:
Property P always holds
The theorem checker verifies the proof mechanically.
No intuition.
No assumptions.
No trust.
Only proof.
Why Lean Feels Different From Traditional Formal Methods
Formal verification has existed for decades.
Historically it suffered from two problems:
- Tools were difficult to use
- Formalization was extremely expensive
Lean changes the equation in several ways.
First, it is designed as a practical programming language.
Second, it has a large ecosystem called Mathlib containing thousands of formally verified definitions and theorems.
Instead of proving everything from scratch, developers can build on existing verified foundations.
For example:
Natural numbers
Integers
Groups
Rings
Calculus
Probability
Linear algebra
Much of this already exists inside the ecosystem.
This makes Lean feel closer to software engineering than traditional theorem proving systems.
You are often composing verified building blocks rather than creating everything from first principles.
The AI + Lean Workflow Is What Makes This Interesting
The most exciting development is not Lean itself.
It's the combination of Lean and LLMs.
Think about the typical AI workflow today:
Prompt
↓
LLM
↓
Answer
Now compare that with an emerging architecture:
Prompt
↓
LLM
↓
Candidate Solution
↓
Lean
↓
Verification
↓
Accepted / Rejected
The LLM becomes a generator.
Lean becomes a verifier.
This separation is powerful.
Humans already work this way.
A mathematician may invent a proof.
A journal referee verifies it.
An engineer may write code.
Tests verify it.
An architect proposes a design.
Structural calculations verify it.
The same pattern can apply to AI systems.
Generation and verification become separate concerns.
A Concrete Example: Finding Bugs Automatically
Imagine an LLM generating a sorting algorithm.
The desired property is:
For any list L:
sort(L) returns:
1. A permutation of L
2. Elements in non-decreasing order
An LLM might generate:
def sort(xs):
return sorted(set(xs))
At first glance it appears to work.
But duplicates disappear.
[1,1,2]
becomes:
[1,2]
The algorithm violates the permutation property.
A theorem prover can catch this immediately.
The interesting part is that verification is not based on finding a counterexample through testing.
The proof obligation itself fails.
The algorithm cannot be proven correct.
This is fundamentally stronger than conventional testing approaches.
Why This Matters Beyond Mathematics
Many people hear "theorem prover" and assume this is only useful for mathematicians.
That is increasingly false.
Formal verification is already used in areas such as:
Compilers
The famous CompCert compiler demonstrates that compiler correctness can be formally proven.
Cryptography
Security protocols often rely on formal proofs.
A tiny mistake can compromise billions of dollars.
Aerospace
Flight control systems require exceptionally high confidence.
Finance
Smart contracts and trading infrastructure can benefit from machine-checked guarantees.
AI Agents
Agents increasingly perform actions instead of merely generating text.
As autonomy increases, verification becomes more valuable.
The more expensive a mistake becomes, the more attractive formal guarantees become.
The Bigger Picture: Probabilistic Intelligence + Deterministic Verification
There is a tendency to think of theorem provers and LLMs as competing technologies.
They're not.
In many ways they complement each other.
LLMs are excellent at:
- Search
- Exploration
- Creativity
- Pattern matching
- Generating candidate solutions
Theorem provers are excellent at:
- Verification
- Correctness
- Logical consistency
- Mathematical guarantees
One generates.
The other validates.
A useful analogy is software development itself.
We don't replace programmers with compilers.
We use compilers to verify what programmers produce.
Future AI systems may look similar:
LLM = Generator
Theorem Prover = Verifier
The combination is potentially far more powerful than either component alone.
Final Thoughts
For years the AI industry has largely optimized for capability.
Can the model write code?
Can it solve math problems?
Can it reason?
Those are important questions.
But another question is becoming increasingly important:
How do we know the answer is actually correct?
Theorem provers such as Lean4 offer one possible answer.
They provide a mechanism for transforming "the model thinks this is right" into "this has been formally verified."
Whether Lean itself becomes dominant remains to be seen.
But the broader idea—combining probabilistic generation with formal verification—feels less like a niche research direction and more like a plausible next step in the evolution of AI systems.
What do you think?
Will theorem provers become a standard component of future AI stacks, or will they remain specialized tools used only in high-assurance domains?
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*
Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.
HexmosTech
/
git-lrc
Free, Micro AI Code Reviews That Run on Commit
| 🇩🇰 Dansk | 🇪🇸 Español | 🇮🇷 Farsi | 🇫🇮 Suomi | 🇯🇵 日本語 | 🇳🇴 Norsk | 🇵🇹 Português | 🇷🇺 Русский | 🇦🇱 Shqip | 🇨🇳 中文 |
git-lrc
Free, Micro AI Code Reviews That Run on Commit
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.
See It In Action
See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements
git-lrc-intro-60s.mp4
Why
- 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
- 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
- 🔁 Build a…
Top comments (0)