Nail Sharipov

Posted on Jun 16

Earcut64: Zero-Allocation Triangulation for Tiny Polygons

#algorithms #rust #showdev #gamedev

Earcut looks like a textbook triangulation algorithm, yet it can outperform more complex methods on small, simple polygons.

By limiting ourselves to ≤ 64 vertices we can keep everything on the stack, encode the contour in a single u64 to get maximum performance.

Why Earcut at All?

Earcut is almost a teaching example: pick any non-self-intersecting polygon, find an "ear" (three consecutive vertices that form an empty inner triangle), cut it off, repeat.

Simple? Yes. Slow? Sometimes. But on contours with few points Earcut can be the fastest tool in the shed, as my benchmarks against iTriangle and Mapbox implementations have shown.

That observation sparked an idea: what if we crafted a micro-optimized version of Earcut specialised for single-contour shapes with ≤ 64 vertices? No heap, one bit-mask, maximum speed.

1. How Classic Earcut Works

Walk through every triple of consecutive vertices.
For each triple check:
- The triangle is oriented CCW.
- It contains no other vertices inside.
If both hold, you found an ear—remove its middle vertex and continue.

Worst-case complexity: O(N³). In practice most polygons behave better (closer to O(N²)) because ears appear quickly.

2. "Bigger-Ear" Variant

Instead of cutting one triangle at a time, we try to grow a larger convex "ear" that can be triangulated in bulk.

Why convex?

Triangulating a convex polygon is trivial.
Checking "is point inside?" is cheaper.

Empty-check methods

Even-Odd ray
Cross-product sign — faster because you can abort on the first sign flip.

All cross-products same sign - point is inside
Sign flips - point is outside

3. Finding the Maximum Ear

If the convex ear still hides internal points, rewind from the tail until they disappear.
Key trick: keep the first vertex fixed—the main sweep will revisit it anyway.

To pick the "nearest " internal point:

Build vectors from the first vertex to every candidate.
Compare them pair-wise via cross-product (arccos is both slow and imprecise).
The vector with the smallest positive angle to the first edge defines the limit.

Cut everything up to that limit and triangulate locally.

4. Speed Hacks

Bounding-box filter — drop points obviously outside the ear’s AABB.
Early rejection — if the first triangle is already dirty, skip the ear.
Heap of nearest points — we only care about the closest violators.
Incremental testing — check the next point starting from the last valid triangle.

Optimized pass — step‑by‑step

The image above illustrates how the incremental filter + heap strategy works:

Gray points — vertices discarded by the ear’s axis‑aligned bounding box.
Orange points — remaining candidates sorted by angle to the first edge(red).
Green segments — final ear triangles kept in the mesh.

Candidate 1: its vector falls inside triangle 4, yet the point itself lies outside the ear - keep scanning.

Candidate 2: tested starting from triangle 4, ends up in triangle 5 and is still outside.

Candidate 3: tested from triangle 5, reaches triangle 6 and finally lands inside — search stops.

Result: the ear is triangulated using triangles 1 – 5.

No internal point found?

Two scenarios:

All candidates examined - no obstructing vertices; cut the ear entirely.
Heap overflow - only part of the candidates checked; seen points are outside but we lack guarantees for the rest. Cut only up to the last verified triangle — safe but maybe sub‑optimal.

This compromise preserves speed without sacrificing robustness. Experiments showed that a heap of 1–5 nearest points offers the best trade-off; larger heaps slow things down without benefit.

5. Earcut64: Stack-Only, Bit-Mask Navigation

For contours of ≤ 64 points, store presence bits in a single u64

0b110111 - vertex 3 was removed

Bit math for Navigation:

impl BitMaskNav for u64 {
    #[inline(always)]
    fn ones_index_to_end(index: usize) -> Self {
        debug_assert!(index <= 64);
        let shift = index & 63;
        u64::MAX << shift   // 1-bits from index (inclusive) to the end
    }

    /// Finds the next available bit (index of vertex) after `after`, wrapping if needed
    #[inline(always)]
    fn next_wrapped_index(self, after: usize) -> usize {
        debug_assert!(after <= 63);
        let front = self & u64::ones_index_to_end(after + 1);
        if front != 0 {
            front.trailing_zeros() as usize
        } else {
            self.trailing_zeros() as usize
        }
    }
}

Filtering internal points

let candidates = available & !ear_indices;
/*
available   – all remaining vertices
ear_indices – vertices currently forming the ear
candidates  – potential internal points
*/

Zero allocations, zero copies - everything lives in registers or on the stack.

6. Benchmarks

Below are average runtimes (µs) from the first two public tests on my benchmark page. Hardware: 3 GHz 6‑core Intel i5‑8500.

Star

Count	Earcut64	Monotone	Earcut Rust	Earcut C++
8	0.28	0.5	0.73	0.42
16	0.64	1.6	1.23	0.5
32	1.61	3.9	2.6	1.2
64	4.45	8.35	5.6	3.3
128	-	17.8	12.6	8.4
256	-	37.5	29.1	22.9
512	-	79.7	80.7	72.7
1024	-	172	259	209
2048	-	388	736	641
4096	-	898	3158	2804
8192	-	1824	13435	11479
16384	-	3846	51688	44017

Spiral

Count	Earcut64	Monotone	Earcut Rust	Earcut C++
8	0.35	0.7	0.77	0.42
16	1.2	1.4	1.66	0.77
32	4.2	3.0	6.25	3.4
64	16.1	6.2	18.6	19.8
128	-	12.8	71.6	66
256	-	26.7	295	306
512	-	55.5	1230	1438
1024	-	120	5301	7595
2048	-	279	22682	50140
4096	-	685	96933	376060
8192	-	1435	416943	3.7kk
16384	-	3080	1812147	43.4kk

Note: Earcut64 and Monotone in these tables are both algorithms from my open‑source Rust library iTriangle.

Benchmark takeaway

Ear-based triangulators shine on simple shapes and quickly fall off on pathological ones — exactly what theory predicts.
Mapbox Earcut (C++) is the fastest in the 16 – 512‑point range for simple contours, which makes sense given its map‑rendering focus.
Earcut64 holds its own and even outperforms the professional library in several small‑polygon cases.
The pure‑Rust Earcut port stays surprisingly strong on very complex contours; within the Earcut family.

7. Why It Matters — and What’s Next?

Earcut64 it’s fully stack-allocated, zero-copy, and designed for maximum inlining by the compiler.

That makes it a perfect candidate for GPU porting.

But more importantly — it's part of iTriangle, an integer-based triangulation library validated against billions of randomized tests, from clean stars to extreme degeneracies.

iTriangle guarantees watertight, non-overlapping triangle meshes — no crashes, no invalid output, no surprises and it's a perfect solution for: