For years, AI evaluation has been stuck in a loop — models acing short-term tasks, then forgetting everything the next day.
We built the AGCI Benchmark to measure something deeper:
how well an AI learns, remembers, and adapts over time.
It’s not about solving puzzles anymore — it’s about testing cognitive continuity.
How much human-like intelligence can a system accumulate and retain through experience?
In its first public run, Dropstone, our self-learning IDE, scored 37.8% of the human baseline on the AGCI Benchmark, the highest result of any system evaluated to date.
The framework measures seven cognitive dimensions, from perception and memory persistence to adaptive reasoning, and is designed to capture the intelligence that unfolds over time, not in a single prompt.
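To make the headline number concrete, here is a minimal sketch of how a composite score like this could be computed, assuming each dimension is scored against a human baseline and the composite is a weighted mean across the seven. The four unnamed dimensions, the equal weights, and the normalization are placeholder assumptions for illustration, not the published methodology.

```python
# Illustrative sketch only: NOT the published AGCI methodology.
# Assumes per-dimension scores normalized against a human baseline,
# combined as a weighted mean into a single percentage.

# Three dimensions are named in the post; the remaining four are placeholders.
DIMENSIONS = [
    "perception",
    "memory_persistence",
    "adaptive_reasoning",
    "dimension_4",  # placeholder name (assumption)
    "dimension_5",  # placeholder name (assumption)
    "dimension_6",  # placeholder name (assumption)
    "dimension_7",  # placeholder name (assumption)
]

def composite_score(system: dict, human_baseline: dict,
                    weights: dict | None = None) -> float:
    """Weighted mean of per-dimension scores, each expressed as a
    fraction of the human baseline, returned as a percentage."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}  # equal weights assumed
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted_sum = sum(
        weights[d] * (system[d] / human_baseline[d]) for d in DIMENSIONS
    )
    return 100.0 * weighted_sum / total_weight

# Example: a system at half the human baseline on every dimension scores 50.0.
system = {d: 0.5 for d in DIMENSIONS}
baseline = {d: 1.0 for d in DIMENSIONS}
print(composite_score(system, baseline))  # -> 50.0
```

Under a scheme like this, a figure such as 37.8% would simply mean weighted per-dimension performance averaging to roughly a third of human level; how the benchmark actually weights and normalizes the dimensions is specified in the methodology linked below.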
📖 Read the benchmark and methodology here:
👉 https://www.dropstone.io/research/agci-benchmark
The AGCI Benchmark is open for replication and critique.
If you believe intelligence is more than one-shot reasoning, this might be the conversation that redefines how we measure it.