DEV Community

Sid


Which AI models are actually "brain-like"? I built an open-source benchmark to measure it

Meta released TRIBE v2 last week - a foundation model that predicts fMRI brain activation from video, audio, and text. The question I kept coming back to was:

How do we actually compare AI models to the brain in a rigorous, statistical way?

So I built CortexLab - an open-source toolkit that adds the missing analysis layer on top of TRIBE v2.

The core idea

Take any model (CLIP, DINOv2, V-JEPA2, LLaMA) and ask:

  • Do its internal features align with predicted brain activity patterns?
  • Which brain regions does it match?
  • Is that alignment statistically significant?

What you can do with it

Compare models against the brain

  • RSA, CKA, Procrustes similarity scoring
  • Permutation testing, bootstrap CIs, FDR correction per ROI
  • Noise ceiling estimation (upper bound on achievable alignment)
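To make the statistics concrete, here's a minimal numpy-only sketch of RSA with a permutation test. This is my own illustration of the general technique, not CortexLab's implementation: build a representational dissimilarity matrix (RDM) for each feature set, correlate their upper triangles, and build a null distribution by shuffling stimulus order.

```python
import numpy as np

def rsa_score(X, Y):
    """RSA sketch: X, Y are (n_stimuli, n_features) arrays. Build an RDM
    (1 - Pearson correlation between stimulus rows) for each, then
    correlate the two RDMs' upper triangles."""
    def rdm(Z):
        Z = Z - Z.mean(axis=1, keepdims=True)
        Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        return 1.0 - Z @ Z.T  # Z @ Z.T is the row-wise Pearson matrix
    iu = np.triu_indices(X.shape[0], k=1)
    return np.corrcoef(rdm(X)[iu], rdm(Y)[iu])[0, 1]

def permutation_p(X, Y, n_perm=1000, seed=0):
    """One-sided p-value: permute Y's stimulus order to break the
    stimulus correspondence and see how often the null beats us."""
    rng = np.random.default_rng(seed)
    observed = rsa_score(X, Y)
    null = np.array([rsa_score(X, Y[rng.permutation(len(Y))])
                     for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)
```

The `(1 + hits) / (1 + n_perm)` form keeps the p-value away from an impossible exact zero.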

Analyze brain responses

  • Cognitive load scoring across 4 dimensions (visual, auditory, language, executive)
  • Peak response latency per ROI (reveals cortical processing hierarchy)
  • Lag correlations and sustained vs transient response decomposition
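The latency idea reduces to a lagged cross-correlation: slide an ROI time series against a stimulus regressor and keep the lag of peak correlation. A hypothetical helper (my sketch, not CortexLab code):

```python
import numpy as np

def peak_lag(stimulus, roi_ts, max_lag=10):
    """Return the lag (in TRs) at which roi_ts best correlates with
    stimulus; a positive lag means the ROI responds after the stimulus."""
    corrs = []
    for lag in range(max_lag + 1):
        s = stimulus[: len(stimulus) - lag] if lag else stimulus
        r = roi_ts[lag:]
        corrs.append(np.corrcoef(s, r)[0, 1])
    best = int(np.argmax(corrs))
    return best, corrs[best]
```

Running this per ROI and sorting by peak lag is one way to surface the early-visual-to-association-cortex processing hierarchy.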

Study brain networks

  • ROI connectivity matrices with partial correlation
  • Network clustering, modularity, degree/betweenness centrality
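Partial correlation can be read straight off the inverse covariance (precision) matrix, which is the standard trick for controlling for all other ROIs at once. A small numpy sketch (again my own illustration, with a hypothetical thresholded degree-centrality helper):

```python
import numpy as np

def partial_correlation(ts):
    """ts: (time, n_roi). Pairwise partial correlation, controlling for
    all other ROIs, from the normalized negative precision matrix."""
    prec = np.linalg.inv(np.cov(ts, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

def degree_centrality(conn, threshold=0.2):
    """Binary degree: count edges whose |partial r| exceeds threshold
    (diagonal excluded)."""
    adj = (np.abs(conn) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return adj.sum(axis=1)
```

On a chain A → B → C, the A–C partial correlation collapses toward zero because B explains it away, which is exactly why partial correlation is preferred over plain correlation for connectivity matrices.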

Real-time inference

  • Sliding-window streaming predictions for BCI-style pipelines
  • Cross-subject adaptation with minimal calibration data
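A sliding-window streamer is essentially a fixed-length buffer that emits a prediction once full. A minimal sketch, where `model` stands in for any TRIBE-style encoder (hypothetical callable, not the actual CortexLab interface):

```python
import numpy as np
from collections import deque

class SlidingWindowPredictor:
    """Buffer the last `window` feature frames; once the buffer is full,
    every new frame triggers a prediction on the current window."""
    def __init__(self, model, window=16):
        self.model = model          # (window, n_features) -> prediction
        self.buffer = deque(maxlen=window)

    def push(self, frame):
        self.buffer.append(frame)
        if len(self.buffer) < self.buffer.maxlen:
            return None             # not enough context yet
        return self.model(np.stack(self.buffer))
```

The `deque(maxlen=...)` handles frame eviction automatically, so each `push` is O(window) for the stack plus whatever the model costs.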

Example results

Benchmark output comparing 4 models (synthetic data, so scores reflect alignment method properties, not real brain claims):

  clip-vit-b32:
       rsa: +0.0407  (p=0.104, CI=[0.011, 0.203])
       cka: +0.8561  (p=0.174, CI=[0.903, 0.937])

  dinov2-vit-s:
       rsa: -0.0052  (p=0.542, CI=[-0.042, 0.164])
       cka: +0.8434  (p=0.403, CI=[0.895, 0.932])

  vjepa2-vit-g:
       rsa: +0.0121  (p=0.333, CI=[-0.010, 0.166])
       cka: +0.8731  (p=0.438, CI=[0.915, 0.944])

  llama-3.2-3b:
       rsa: -0.0075  (p=0.642, CI=[-0.026, 0.145])
       cka: +0.8848  (p=0.731, CI=[0.922, 0.949])

Why this isn't just TRIBE v2

TRIBE v2 gives raw vertex-level brain predictions. CortexLab adds:

  • Statistical testing (is this score meaningful?)
  • Interpretability (which ROIs, which modality, how does it evolve over time?)
  • Model comparison framework (is model A significantly better than model B?)
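One simple way to answer "is model A significantly better than model B?" is a paired sign-flip permutation test on per-stimulus score differences. This sketches the general technique, not necessarily CortexLab's exact procedure:

```python
import numpy as np

def paired_sign_flip_test(scores_a, scores_b, n_perm=2000, seed=0):
    """One-sided test of mean(scores_a) > mean(scores_b) for paired
    per-stimulus alignment scores: randomly flip the sign of each
    paired difference to simulate the no-difference null."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(scores_a) - np.asarray(scores_b)
    observed = diff.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p
```

Because the test is paired per stimulus, it is far more sensitive than comparing two overall scores, and the resulting p-values can feed the same FDR correction used per ROI.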

Without that, you have predictions. With this, you can draw conclusions.

Interactive demo (no GPU needed)

There's a Streamlit dashboard with biologically realistic synthetic data (HRF convolution, modality-specific activation, spatial smoothing). You can explore all analysis tools interactively.
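HRF convolution in a synthetic generator usually looks like this double-gamma sketch. The parameters below are the conventional peak-near-5s / undershoot-near-15s values from standard neuroimaging practice, not necessarily what the dashboard uses:

```python
import numpy as np
from math import gamma

def double_gamma_hrf(tr=1.0, duration=32.0):
    """Canonical double-gamma hemodynamic response function, sampled
    at the repetition time: a positive gamma peaking ~5 s minus a
    scaled gamma modeling the post-stimulus undershoot."""
    t = np.arange(0, duration, tr)
    peak = t ** 5 * np.exp(-t) / gamma(6)
    undershoot = t ** 15 * np.exp(-t) / gamma(16)
    return peak - 0.35 * undershoot

def bold_from_stimulus(stimulus, tr=1.0):
    """Convolve an event/boxcar stimulus train with the HRF to get a
    synthetic BOLD time series, truncated to the stimulus length."""
    return np.convolve(stimulus, double_gamma_hrf(tr))[: len(stimulus)]
```

A unit impulse at t=0 produces a response peaking about 5 TRs later, followed by a shallow negative undershoot, which is the shape that makes synthetic data "biologically realistic" rather than instantaneous.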

Links:

The repo ships with 76 tests, is licensed CC BY-NC 4.0, and already has 3 external contributors.

Looking for feedback

Especially interested in:

  • Better alignment metrics beyond RSA/CKA/Procrustes
  • Neuroscience validity of the ROI-to-cognitive-dimension mapping
  • Ideas for real-world benchmarks (datasets, model comparisons)

Happy to answer questions about the implementation or methodology.
