GNN vs. Trees: High-Speed Hybrid Architecture for XLA Runtime Prediction

#architecture #deeplearning #machinelearning #performance

Introduction
A common trap in machine learning engineering is deploying over-parameterized models where simpler, structurally informed pipelines can deliver comparable accuracy at a fraction of the cost. To test this hypothesis, I spent 24 hours reworking the "Google - Fast or Slow? Predict AI Model Runtime" competition from 2023.

The Challenge
Google's XLA compiler has to pick a memory layout and a tile configuration for each complex operation graph it compiles. A poor choice causes dramatic slowdowns, but benchmarking every candidate configuration on actual TPUs is too expensive. What is needed is a cheap proxy model that ranks configurations by predicted runtime.
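To make "rank configurations by speed" concrete, here is a minimal sketch in plain Python. It orders candidate configurations by a predicted score and checks the ordering against measured runtimes with Kendall's tau. The numbers are made up for illustration, and Kendall's tau is used here as a generic rank-agreement check, not necessarily the competition's official scoring.

```python
from itertools import combinations

def rank_configs(predicted_scores):
    """Return config indices ordered from fastest (lowest score) to slowest."""
    return sorted(range(len(predicted_scores)), key=lambda i: predicted_scores[i])

def kendall_tau(predicted, measured):
    """Kendall rank correlation (tau-a, no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical data: four candidate configurations for one graph.
measured_ms = [4.2, 1.1, 3.0, 2.5]        # ground-truth runtimes
predicted_scores = [0.9, 0.1, 0.4, 0.6]   # proxy model output (lower = faster)

order = rank_configs(predicted_scores)
tau = kendall_tau(predicted_scores, measured_ms)
print(order, round(tau, 3))
```

Only the relative order matters: the proxy never has to predict the runtime in milliseconds, it just has to put the fast configurations first.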

The Standard Approach vs. My Hybrid Architecture
Top-tier competition submissions favored large, deep GNNs coupled with heavy MLP ranking heads optimized via Pairwise Margin Ranking Loss. While powerful, these architectures leave room to reduce both memory footprint and inference latency.
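For readers who have not met it, a pairwise margin ranking loss can be sketched in a few lines of plain Python. This is a generic textbook form, not the exact loss of any particular submission; the function name and the default margin of 1.0 are my own choices for illustration.

```python
def pairwise_margin_ranking_loss(scores, runtimes, margin=1.0):
    """Mean hinge penalty over all pairs where config i is truly faster than j.

    A pair is penalized unless the faster config's score is lower than the
    slower config's score by at least `margin`.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if runtimes[i] < runtimes[j]:  # i should be ranked ahead of j
                total += max(0.0, margin - (scores[j] - scores[i]))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

runtimes = [1.0, 2.0, 3.0]
good_loss = pairwise_margin_ranking_loss([0.0, 2.0, 4.0], runtimes)  # correct order
bad_loss = pairwise_margin_ranking_loss([4.0, 2.0, 0.0], runtimes)   # reversed order
print(good_loss, bad_loss)
```

The loss is zero once every pair is separated by the margin in the right direction, which is why it trains a ranker and not a runtime estimator.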

My solution shifts the heavy lifting from continuous gradient propagation to structured feature engineering:

Statistical Feature Extraction: Cheap, closed-form global graph metrics summarize each graph's computational footprint.
Shallow Message Passing: A lightweight, 2-layer GNN serves purely as a fast, localized node feature extractor.
Gradient Boosted Trees: The combined feature vectors are fed directly into a scikit-learn HistGradientBoostingRegressor.

Five Technical Benefits for Google Infrastructure

Faster compilation autotuning, because tree-based regression replaces deep neural network inference.
Lower peak RAM consumption while evaluating graph configurations.
No reliance on hardware measurements for identifying high-latency graph segments.
Clean scalability when processing large NLP and XLA layout workloads.
Higher hardware utilization across distributed production clusters.
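Claims about speed and peak memory are easy to check for any predictor. Below is a small measurement helper using only the Python standard library; the helper name and the dummy predictor are hypothetical placeholders. Note that tracemalloc only sees allocations made through Python's allocator, so treat the peak figure as a rough lower bound on real process memory.

```python
import time
import tracemalloc

def profile_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

def dummy_predictor(n):
    """Placeholder for a real model's predict call."""
    return [i * 0.5 for i in range(n)]

result, elapsed_s, peak_bytes = profile_call(dummy_predictor, 100_000)
print(f"{elapsed_s * 1e3:.2f} ms, peak {peak_bytes / 1e6:.2f} MB")
```

Running the same helper on a deep GNN ranker and on the hybrid pipeline, with identical inputs, gives a like-for-like comparison of latency and traced memory.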

The Nexus to Frontier AI (Gemini)
Compiler optimizations at the XLA layer are not theoretical exercises. The ability to predict graph runtimes and choose efficient memory layouts is part of what allows Google to train and serve massive LLMs like Gemini efficiently.

Codebase & Notebooks
The entire project has been converted from a research script into a modular, production-ready MLOps package.

Review the architecture and run the experiments:
GitHub:
Kaggle: