Anand Vashishtha

Building KaggleIngest: How I Bridged Kaggle Data with AI Coding Assistants

[Image: KaggleIngest home page]
Provide rich context about Kaggle competitions to AI coding assistants
If you've ever tried to use an AI coding assistant for a Kaggle competition, you know the struggle:

  • Hundreds of notebooks to sift through
  • Context windows that fill up with imports and visualizations
  • No easy way to extract the valuable insights

I built KaggleIngest to solve this.

What is KaggleIngest?

It's an open-source tool that:

  1. Takes any Kaggle competition or dataset URL
  2. Ranks and downloads the top notebooks
  3. Extracts valuable patterns (skipping boilerplate)
  4. Outputs token-optimized context for LLMs

Live Demo: kaggleingest.com
GitHub: github.com/Anand-0037/KaggleIngest

The Tech Stack

Layer    | Technology
---------|--------------------------------------
Frontend | React 19 + Vite + TanStack Query
Backend  | FastAPI + Python 3.13 + Redis
Deploy   | Vercel (frontend) + Render (backend)

Key Technical Challenges

1. Kaggle SDK Quirks

The official Kaggle SDK has some... interesting behaviors. When credentials are missing, it calls exit(1) instead of raising a normal exception:

# This crashes your entire app!
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()  # exit(1) if no credentials

My solution: wrap the client call in a try/except that catches SystemExit:

try:
    kaggle_service.get_client()  # my thin wrapper around KaggleApi
except SystemExit as e:
    # The SDK tried to kill the process; degrade gracefully instead
    logger.warning(f"Kaggle auth failed: {e}")
    return {"kaggle": False}
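
For context, get_client() lazily builds and caches the SDK client so a missing kaggle.json can't crash the app at import time. Here's a minimal sketch of the idea; the module layout and caching are illustrative, not the exact repo code:

# kaggle_service.py -- simplified sketch, not the exact repo code
import logging

logger = logging.getLogger(__name__)

_client = None  # cache the authenticated client across calls

def get_client():
    """Lazily build an authenticated KaggleApi client.

    The import happens here, not at module load, so missing
    credentials can't take the whole app down at startup.
    """
    global _client
    if _client is None:
        from kaggle.api.kaggle_api_extended import KaggleApi
        api = KaggleApi()
        api.authenticate()  # may call exit(1) -> SystemExit for the caller
        _client = api
    return _client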

2. Smart Notebook Ranking

Not all notebooks are equal. A 5-year-old notebook with 1000 upvotes might be less useful than a recent one with 100.

I use a scoring formula:

score = log(upvotes + 1) * time_decay_factor

Where time_decay_factor decreases for older notebooks.
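
In Python that looks something like the sketch below. I'm showing exponential decay with a one-year half-life as one concrete choice; the constant is illustrative and worth tuning:

import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 365  # illustrative: a notebook's score halves per year of age

def notebook_score(upvotes: int, last_updated: datetime) -> float:
    """Log-damped upvotes, decayed by notebook age."""
    age_days = (datetime.now(timezone.utc) - last_updated).days
    time_decay_factor = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return math.log(upvotes + 1) * time_decay_factor

With these numbers, a 5-year-old notebook keeps about 3% of its raw score (0.5^5), so a fresh notebook with 100 upvotes outranks a stale one with 1000.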

3. Token Optimization

LLM tokens are expensive, so I compact notebook metadata with TOON (Token-Optimized Object Notation):

// Standard JSON: 150 tokens
{
  "notebook_title": "Introduction to Ensembling",
  "notebook_author": "arthurtok",
  "upvotes": 3847
}

// TOON: 90 tokens
{"t":"Introduction to Ensembling","a":"arthurtok","v":3847}

That's 40% fewer tokens for the same information.
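
The compaction itself is a simple key-mapping pass before serialization. A minimal sketch (this particular field map is illustrative):

import json

# Illustrative mapping from verbose JSON keys to short TOON keys
TOON_KEYS = {
    "notebook_title": "t",
    "notebook_author": "a",
    "upvotes": "v",
}

def to_toon(record: dict) -> str:
    """Serialize a record with abbreviated keys and no padding whitespace."""
    compact = {TOON_KEYS.get(k, k): v for k, v in record.items()}
    return json.dumps(compact, separators=(",", ":"))

print(to_toon({
    "notebook_title": "Introduction to Ensembling",
    "notebook_author": "arthurtok",
    "upvotes": 3847,
}))
# -> {"t":"Introduction to Ensembling","a":"arthurtok","v":3847}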

Try It Yourself

  1. Go to kaggleingest.com
  2. Paste a Kaggle URL (try: https://www.kaggle.com/competitions/titanic)
  3. Download the context file
  4. Feed it to your favorite LLM

If this was helpful, star the project on GitHub!

Questions? Drop them in the comments!
