This article focuses only on AlphaGeometry DSL itself. It does not cover model training, search strategy, or paper results.
The goal is to treat the DSL as an engineering-facing protocol document and answer four questions:
- How problem input is encoded
- How actions are defined in defs.txt
- How geometric relations are mapped into predicates
- How numerical construction and symbolic reasoning are connected
If you want to reproduce AlphaGeometry, build a geometry data generator, or design a compatibility layer for a custom solver, understanding the DSL protocol is a prerequisite.
1. Role of the DSL
AlphaGeometry DSL is a domain-specific language for geometric construction and relation expression. It mainly serves 3 purposes:
- Express initial geometric premises
- Express executable construction actions
- Express target geometric goals to be verified
Its output is not a final proof text, but a set of geometric relations consumable by a reasoning engine.
From an implementation perspective, the DSL is closer to an intermediate representation:
- Upstream, it connects to problem descriptions or data generators
- In the middle, it connects to action definitions and relation expansion
- Downstream, it connects to rules.txt and DDAR reasoning
The protocol is centered on relation generation rather than diagram drawing.
2. Problem File Structure
A complete problem is usually written as:
problem_name
premises ? goal
Example:
orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c
It can be decomposed into 3 sections:
- Initial premises and free objects: a b c = triangle;
- Construction based on known objects: h = on_tline b a c, on_tline c a b
- Target predicate: ? perp a h b c
This DSL fragment is not a natural-language solution. It is a geometric program:
Premises
-> constructions
-> predicate graph
-> goal checking
After parsing, the system usually needs to produce:
- An initial object table
- An action invocation sequence
- An initial predicate set
- A target to be checked
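The parsing step can be sketched in a few lines of Python. This is only an illustration that assumes the layout shown above (problem name first, then premises, then the goal after a question mark). The names ParsedProblem and parse_problem are hypothetical and are not taken from the AlphaGeometry code base.

```python
from dataclasses import dataclass


@dataclass
class ParsedProblem:
    name: str
    clauses: list[str]  # one entry per ';'-separated construction clause
    goal: list[str]     # goal predicate name followed by its point arguments


def parse_problem(text: str) -> ParsedProblem:
    """Split a problem into its name, construction clauses, and goal."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # The first line is the name; the remaining lines form one logical body.
    name, body = lines[0], " ".join(lines[1:])
    premises, _, goal = body.partition("?")
    clauses = [c.strip() for c in premises.split(";") if c.strip()]
    return ParsedProblem(name, clauses, goal.split())


problem = parse_problem("""
orthocenter
a b c = triangle; h = on_tline b a c, on_tline c a b ? perp a h b c
""")
print(problem.name)     # orthocenter
print(problem.clauses)  # ['a b c = triangle', 'h = on_tline b a c, on_tline c a b']
print(problem.goal)     # ['perp', 'a', 'h', 'b', 'c']
```

The object table, action sequence, and initial predicate set would then be built from the clauses, which this sketch leaves as raw strings.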
3. Basic Syntax Conventions
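The core expression form of the DSL is, schematically (concrete problem text separates the parameters with spaces and uses no parentheses):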
output = action(parameters)
Example:
h = on_tline b a c, on_tline c a b
This means that point h satisfies two construction constraints at the same time:
- Draw a line through b perpendicular to ac
- Draw a line through c perpendicular to ab
Therefore, h is the intersection of the two perpendicular lines.
This style has two main properties:
- Output variables are uniformly represented as point variables
- Geometric objects are represented implicitly through point sets rather than independent object types
For example:
line a b
circle o a
- line a b denotes the line defined by points a and b
- circle o a denotes the circle centered at o passing through a
This design simplifies the parser and relation graph structure, but it requires the predicate system to be expressive enough to cover higher-level semantics of lines, circles, and angles.
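As a rough illustration of the clause syntax described in this section, the sketch below splits one clause into its output points and a list of (action, arguments) pairs. The function name parse_clause is an assumption made for this example, not part of AlphaGeometry.

```python
def parse_clause(clause: str) -> tuple[list[str], list[tuple[str, list[str]]]]:
    """Split 'outputs = action args, action args' into points and constructions."""
    left, _, right = clause.partition("=")
    outputs = left.split()
    constructions = []
    # Several comma-separated constructions may constrain the same output points.
    for part in right.split(","):
        tokens = part.split()
        if tokens:
            constructions.append((tokens[0], tokens[1:]))
    return outputs, constructions


outputs, constructions = parse_clause("h = on_tline b a c, on_tline c a b")
print(outputs)        # ['h']
print(constructions)  # [('on_tline', ['b', 'a', 'c']), ('on_tline', ['c', 'a', 'b'])]
```

Note that every token is a point name or an action name. No line or circle object appears anywhere in the parsed result, which matches the point-only convention described above.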
4. Action Definition Structure in defs.txt
defs.txt is the action registry. Each action usually contains 5 parts:
action_name outputs inputs
variable dependency
input conditions
geometric constraints
numerical constructions
Example:
midpoint x a b
x : a b
a b = diff a b
x : coll x a b, cong x a x b
midp a b
The role of each part is as follows.
1. Action signature
midpoint x a b
This means:
- The action name is midpoint
- The output point is x
- The input points are a b
2. Variable dependency
x : a b
This means output variable x depends on inputs a and b.
This part is typically used for dependency graphs or variable scope management.
3. Input validity conditions
a b = diff a b
This means a and b must be distinct points.
This layer is used to prevent degenerate constructions and does not directly generate proof relations.
4. Geometric constraints
x : coll x a b, cong x a x b
This means the following predicates must hold after the action is completed:
- coll x a b, meaning x is collinear with a and b
- cong x a x b, meaning XA = XB
This part defines the symbolic semantics of the action and is the main input consumed by the downstream reasoning system.
5. Numerical construction interface
midp a b
This means the numerical engine should invoke a midpoint construction.
Typical uses of the numerical layer include:
- Generating concrete coordinate instances
- Checking whether a construction degenerates
- Providing numerical truth checks for predicates
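The five-part layout can be loaded into a small record type. The sketch below assumes every block has exactly five non-empty lines in the order shown for midpoint; ActionDef and parse_definition are hypothetical names chosen for illustration, and real definition files may need more lenient handling.

```python
from dataclasses import dataclass


@dataclass
class ActionDef:
    name: str
    args: list[str]         # signature arguments, outputs first
    dependency: str         # e.g. 'x : a b'
    conditions: str         # e.g. 'a b = diff a b'
    constraints: list[str]  # symbolic predicates produced by the action
    numerics: list[str]     # numerical construction calls


def parse_definition(block: str) -> ActionDef:
    """Parse one five-line definition block (assumed layout, see lead-in)."""
    sig, dep, cond, cons, num = [ln.strip() for ln in block.strip().splitlines()]
    name, *args = sig.split()
    # Drop the 'x :' prefix, then split the predicate list on commas.
    _, _, predicates = cons.partition(":")
    constraints = [p.strip() for p in predicates.split(",") if p.strip()]
    numerics = [n.strip() for n in num.split(",") if n.strip()]
    return ActionDef(name, args, dep, cond, constraints, numerics)


midpoint = parse_definition("""
midpoint x a b
x : a b
a b = diff a b
x : coll x a b, cong x a x b
midp a b
""")
print(midpoint.name)         # midpoint
print(midpoint.args)         # ['x', 'a', 'b']
print(midpoint.constraints)  # ['coll x a b', 'cong x a x b']
print(midpoint.numerics)     # ['midp a b']
```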
5. Predicate System
Predicates are the core input format of the AlphaGeometry reasoning system.
Common core predicates include:
| Predicate | Meaning |
|---|---|
| coll A B C | Three points are collinear |
| cong A B C D | AB = CD |
| perp A B C D | AB ⟂ CD |
| para A B C D | AB ∥ CD |
| eqangle ... | Angles are equal |
| cyclic A B C D | Four points are concyclic |
These predicates enter the rule system defined in rules.txt and trigger further inference.
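To make the meanings of these predicates concrete, here is a minimal numerical check for four of them on plain coordinate tuples. It is a sketch under assumptions: the tolerance EPS and the helper names are choices made for this example, and cyclic and eqangle are left out to keep it short.

```python
import math

Point = tuple[float, float]
EPS = 1e-9  # assumed tolerance for floating-point comparisons


def _vec(p: Point, q: Point) -> Point:
    return (q[0] - p[0], q[1] - p[1])


def _cross(u: Point, v: Point) -> float:
    return u[0] * v[1] - u[1] * v[0]


def _dot(u: Point, v: Point) -> float:
    return u[0] * v[0] + u[1] * v[1]


def coll(a: Point, b: Point, c: Point) -> bool:
    """Three points are collinear when ab and ac have zero cross product."""
    return abs(_cross(_vec(a, b), _vec(a, c))) < EPS


def cong(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Segments ab and cd have equal length."""
    return abs(math.dist(a, b) - math.dist(c, d)) < EPS


def perp(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Directions ab and cd have zero dot product."""
    return abs(_dot(_vec(a, b), _vec(c, d))) < EPS


def para(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Directions ab and cd have zero cross product."""
    return abs(_cross(_vec(a, b), _vec(c, d))) < EPS


a, b, m, c = (0.0, 0.0), (4.0, 0.0), (2.0, 0.0), (2.0, 3.0)
print(coll(m, a, b), cong(m, a, m, b))   # True True
print(perp(a, b, m, c), para(a, b, m, c))  # True False
```

A check like this only confirms a relation on one coordinate instance. It does not replace the symbolic derivation performed by the rule system.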
A typical flow is:
construction
-> initial predicates
-> rule firing
-> new predicates
-> goal reached / not reached
So the core value of the DSL is not the action catalog itself, but its predicate generation capability.
Whether an action is useful mainly depends on:
- Which predicates it introduces
- Whether those predicates are likely to trigger rule chains
- Whether they significantly shorten the proof path to the goal
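The flow from initial predicates to a reached goal can be illustrated with a toy forward-chaining loop. Everything here is an assumption made for illustration: the fact encoding, the function names, and the single rule (two lines perpendicular to the same line are parallel). It does not reproduce the rules.txt syntax or the DDAR implementation.

```python
from itertools import permutations


def line(p: str, q: str) -> frozenset:
    """A line is identified by an unordered pair of point names."""
    return frozenset((p, q))


def saturate(facts: set) -> set:
    """Apply one toy rule until no new fact appears.

    Rule (illustrative only): perp(l1, l2) and perp(l2, l3) with l1 != l3
    gives para(l1, l3). Facts are stored as (name, frozenset_of_two_lines).
    """
    facts = set(facts)
    while True:
        perps = [f[1] for f in facts if f[0] == "perp"]
        new = set()
        for p1, p2 in permutations(perps, 2):
            shared = p1 & p2
            if len(shared) == 1:
                # Each perp fact contributes its one non-shared line.
                (l1,), (l3,) = p1 - shared, p2 - shared
                new.add(("para", frozenset((l1, l3))))
        if new <= facts:
            return facts
        facts |= new


initial = {
    ("perp", frozenset((line("a", "b"), line("c", "d")))),
    ("perp", frozenset((line("c", "d"), line("e", "f")))),
}
goal = ("para", frozenset((line("a", "b"), line("e", "f"))))
closed = saturate(initial)
print(goal in closed)  # True
```

The example shows why predicate output matters: the goal is reachable only because the constructions supplied two perp facts that share a line.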
6. Common Action Types
1. Basic objects
free a
triangle a b c
quadrangle a b c d
- free generates a free point
- triangle generates the 3 base points of a triangle
- quadrangle generates the 4 base points of a quadrilateral
2. Points on a line or circle
on_line x a b
on_circle x o a
on_pline x a b c
on_tline x a b c
- on_line corresponds to a collinearity constraint
- on_circle corresponds to an equal-radius constraint
- on_pline corresponds to a parallel constraint
- on_tline corresponds to a perpendicular constraint
3. Intersection constructions
intersection_ll x a b c d
intersection_lc x a o b
intersection_cc x o w a
These represent:
- line-line intersection
- line-circle intersection
- circle-circle intersection
4. Basic geometric constructions
midpoint x a b
foot x a b c
mirror x a b
- midpoint generates a midpoint
- foot generates a foot of the perpendicular
- mirror generates a symmetric point
The key property of these actions is that a single invocation introduces several relations at once.
5. Triangle centers
For example:
circumcenter x a b c
incenter x a b c
excenter x a b c
centroid x y z i a b c
ninepoints x y z i a b c
These actions typically introduce multiple relation groups at once, such as equidistance, angle bisection, and perpendicular bisectors.
6. Special polygons
For example:
square a b x y
rectangle a b c d
parallelogram a b c x
trapezoid a b c d
eq_trapezoid a b c d
These primitives have stronger initial relations and are better suited for generating structured problems or high-constraint training samples.
7. Worked Example
Example:
orthocenter
a b c = triangle;
h = on_tline b a c, on_tline c a b
? perp a h b c
The execution process is as follows.
Step 1: Parse the premises
triangle a b c generates the base point set and non-degeneracy conditions.
Step 2: Execute the construction
h is defined as the intersection of the following two constraints:
- A line through b perpendicular to ac
- A line through c perpendicular to ab
Step 3: Materialize predicates
The construction is converted into:
perp b h a c
perp c h a b
Step 4: Verify the goal
The system checks whether it can derive:
perp a h b c
If yes, the goal is established. Otherwise, the problem is not proved under the current construction and rule set.
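The same example can be sanity-checked numerically on one concrete triangle. The sketch below is not a proof and not the solver's method; it only constructs h from the two perpendicular lines and tests the goal on coordinates. The helper names and the chosen triangle are assumptions for illustration.

```python
Point = tuple[float, float]


def perp_line_through(p: Point, a: Point, b: Point) -> tuple[float, float, float]:
    """Line through p perpendicular to ab, as (nx, ny, k) with nx*x + ny*y = k."""
    nx, ny = b[0] - a[0], b[1] - a[1]  # direction of ab is the normal of the new line
    return nx, ny, nx * p[0] + ny * p[1]


def intersect(l1: tuple[float, float, float], l2: tuple[float, float, float]) -> Point:
    """Intersect two lines given as (nx, ny, k) using Cramer's rule."""
    a1, b1, k1 = l1
    a2, b2, k2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; construction is degenerate")
    return ((k1 * b2 - k2 * b1) / det, (a1 * k2 - a2 * k1) / det)


a, b, c = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)

# h = on_tline b a c, on_tline c a b
h = intersect(perp_line_through(b, a, c), perp_line_through(c, a, b))

# Goal: perp a h b c, i.e. the dot product of ah and bc vanishes.
dot = (h[0] - a[0]) * (c[0] - b[0]) + (h[1] - a[1]) * (c[1] - b[1])
print(h)                # (1.0, 1.0)
print(abs(dot) < 1e-9)  # True
```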
8. Execution Pipeline
From problem text to goal verification, the typical pipeline is:
Problem DSL
-> parse.py
-> action expansion via defs.txt
-> geometry graph
-> predicate inference via rules.txt
-> DDAR solver
The responsibility of each stage is:
- Parse the problem text and identify premises, constructions, and goals
- Look up the corresponding action definition in defs.txt
- Expand variable dependencies, input conditions, and geometric constraints
- Write the constraints into the geometry relation graph
- Trigger new predicates according to rules.txt
- Run reachability checks against the target predicate
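The expansion step, in which a definition's constraint templates are instantiated with the points of a concrete call, can be sketched as a simple renaming. The function name expand and its argument layout are hypothetical; the templates are the midpoint constraints quoted earlier in this article.

```python
def expand(signature: str, constraints: list[str], call_args: list[str]) -> list[tuple[str, ...]]:
    """Instantiate constraint templates by renaming formal points to actual ones."""
    _name, *formals = signature.split()
    if len(formals) != len(call_args):
        raise ValueError("argument count does not match the signature")
    binding = dict(zip(formals, call_args))
    predicates = []
    for template in constraints:
        head, *points = template.split()
        # Only the point arguments are renamed; the predicate name is kept.
        predicates.append((head, *(binding[p] for p in points)))
    return predicates


predicates = expand(
    "midpoint x a b",
    ["coll x a b", "cong x a x b"],
    ["m", "p", "q"],
)
print(predicates)
# [('coll', 'm', 'p', 'q'), ('cong', 'm', 'p', 'm', 'q')]
```

The resulting tuples are what would be written into the relation graph before any rule fires.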
Numerical construction and symbolic reasoning usually run side by side in this pipeline:
- The numerical layer handles instantiation and truth checking
- The symbolic layer handles strict inference and proof tracing
9. Implementation Notes
1. Action design should optimize for relation output
Whether an action is worth keeping should be judged by the quality of the predicates it introduces, not by whether the geometric meaning feels intuitive.
2. Degeneracy must be handled explicitly
Cases such as coincident points, parallel lines without an intersection, and zero-radius circles should be intercepted either in input conditions or in the numerical layer.
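One way to intercept a degenerate input in the numerical layer is to check the condition before constructing anything. The sketch below guards a midpoint construction with a numerical version of the diff condition. The exception class, the tolerance, and the function names are assumptions made for this example.

```python
import math

Point = tuple[float, float]


class DegenerateConstruction(ValueError):
    """Raised when an action's inputs make the construction ill-defined."""


def require_diff(a: Point, b: Point, eps: float = 1e-9) -> None:
    """Numerical counterpart of the 'diff a b' input condition."""
    if math.dist(a, b) < eps:
        raise DegenerateConstruction("points coincide")


def midp(a: Point, b: Point) -> Point:
    """Midpoint of ab, rejected when a and b coincide."""
    require_diff(a, b)
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


print(midp((0.0, 0.0), (2.0, 4.0)))  # (1.0, 2.0)
try:
    midp((1.0, 1.0), (1.0, 1.0))
except DegenerateConstruction as err:
    print("rejected:", err)  # rejected: points coincide
```

Raising a dedicated exception lets a data generator discard the sample and retry, rather than passing a meaningless point to the symbolic layer.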
3. Predicate coverage determines the expressive ceiling
If the system can only express collinearity, parallelism, and perpendicularity, the representational power for harder geometry problems will become limited very quickly.
4. The numerical interface should not be omitted
If symbolic definitions exist without numerical construction interfaces, the cost of data generation, debugging, and truth checking rises substantially.
10. Recommended Minimal Implementable Subset
If you want a minimal version compatible with the AlphaGeometry approach, prioritize support for the following actions:
- triangle
- on_line
- on_tline
- on_pline
- intersection_ll
- midpoint
- foot
And support at least the following predicates:
- coll
- perp
- para
- cong
- cyclic
- eqangle
This is a small but workable protocol core.
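A minimal core like this can be kept in a small registry and checked for internal consistency. In the sketch below the constraint templates are written from the geometric meaning of each action and are assumptions, not copies of defs.txt; in particular, foot x a b c is assumed to mean the foot of the perpendicular from a onto line bc. The names MINIMAL_ACTIONS, SUPPORTED_PREDICATES, and validate are hypothetical.

```python
# Hypothetical registry: action signature -> constraint templates (assumed).
MINIMAL_ACTIONS = {
    "triangle a b c": [],
    "on_line x a b": ["coll x a b"],
    "on_tline x a b c": ["perp x a b c"],
    "on_pline x a b c": ["para x a b c"],
    "intersection_ll x a b c d": ["coll x a b", "coll x c d"],
    "midpoint x a b": ["coll x a b", "cong x a x b"],
    "foot x a b c": ["perp x a b c", "coll x b c"],
}
SUPPORTED_PREDICATES = {"coll", "perp", "para", "cong", "cyclic", "eqangle"}


def validate(registry: dict[str, list[str]], predicates: set[str]) -> list[str]:
    """Return the problems found; an empty list means the registry is consistent."""
    errors = []
    for signature, templates in registry.items():
        _name, *points = signature.split()
        for template in templates:
            head, *args = template.split()
            if head not in predicates:
                errors.append(f"{signature}: unknown predicate {head}")
            unknown = [p for p in args if p not in points]
            if unknown:
                errors.append(f"{signature}: unbound points {unknown}")
    return errors


print(validate(MINIMAL_ACTIONS, SUPPORTED_PREDICATES))  # []
```

Running a check like this whenever an action is added catches templates that mention a predicate the reasoning layer cannot consume.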
11. Protocol Essence
AlphaGeometry DSL can be summarized as:
Geometry Construction DSL
+ Predicate Interface
+ Rule-System Input Layer
Its main value is not in describing diagrams, but in compressing geometry problems into an executable, verifiable, and inferable protocol layer.
12. Reliable Recommendation: Dino-GSP
If you need a geometry representation environment that is more open than AlphaGeometry DSL and better suited for product and ecosystem integration, take a look at Dino-GSP.
It also represents geometric objects as executable structures and defines its own DSL and constraint representation layer to support:
- More open geometry construction and editing workflows
- Ecosystem integration for teaching, content production, and AI geometry applications
- Programmable figure generation, constraint validation, and auxiliary structure construction
If AlphaGeometry DSL is closer to an internal protocol for solvers and research systems, Dino-GSP is closer to an extensible product layer and an open ecosystem interface.