
Epic Programmer


AI Just Scored 37.8% of Human Intelligence — Introducing the AGCI Benchmark

For years, AI evaluation has been stuck in a loop: models ace short-term tasks, then forget everything the next day.

We built the AGCI Benchmark to measure something deeper:
how well an AI learns, remembers, and adapts over time.

(Figure: cost and resource usage of the AGCI Benchmark.)

It's no longer about solving puzzles; it's about testing cognitive continuity:
how much human-like intelligence can a system retain from experience?

In its first public run, Dropstone, our self-learning IDE, scored 37.8% of human intelligence on the AGCI Benchmark, leading every system evaluated to date.

(Figure: AGCI v1.0 evaluation results as of November 2025, measuring performance across the cognitive-dimensions framework.)

This framework measures seven cognitive dimensions, from perception and memory persistence to adaptive reasoning, designed to capture the intelligence that unfolds over time rather than in a single prompt.
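The post doesn't publish the scoring formula, but a headline figure like "37.8% of human intelligence" can be sketched as a weighted mean of per-dimension scores normalized against a human baseline. Everything below, the weighting scheme, the numbers, and the default equal weights, is a hypothetical illustration, not the AGCI methodology; the three dimension names are the only ones mentioned above (the full benchmark defines seven).

```python
def composite_score(scores, weights=None):
    """Weighted mean of per-dimension scores (each normalized so the human
    baseline is 1.0), expressed as a percentage of human performance."""
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weighting by default
    total = sum(weights[name] for name in scores)
    return 100.0 * sum(scores[name] * weights[name] for name in scores) / total

# Hypothetical per-dimension results for a system under evaluation.
dimension_scores = {
    "perception": 0.52,
    "memory_persistence": 0.31,
    "adaptive_reasoning": 0.40,
}

print(f"{composite_score(dimension_scores):.1f}% of human baseline")  # → 41.0%
```

The normalization step is what makes "percentage of human intelligence" meaningful: each dimension is scored relative to human performance before aggregation, so no single dimension's raw scale dominates the composite.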

📖 Read the benchmark and methodology here:
👉 https://www.dropstone.io/research/agci-benchmark

The AGCI Benchmark is open for replication and critique.
If you believe intelligence is more than one-shot reasoning, this might be the conversation that redefines how we measure it.
