Anand Vashishtha

Building KaggleIngest: How I Bridged Kaggle Data with AI Coding Assistants

[Image: KaggleIngest home page]
Provide rich context about Kaggle competitions to AI coding assistants
If you've ever tried to use an AI coding assistant for a Kaggle competition, you know the struggle:

  • Hundreds of notebooks to sift through
  • Context windows that fill up with imports and visualizations
  • No easy way to extract the valuable insights

I built KaggleIngest to solve this.

What is KaggleIngest?

It's an open-source tool that:

  1. Takes any Kaggle competition or dataset URL
  2. Ranks and downloads the top notebooks
  3. Extracts valuable patterns (skipping boilerplate)
  4. Outputs token-optimized context for LLMs

Live Demo: kaggleingest.com
GitHub: github.com/Anand-0037/KaggleIngest

The Tech Stack

Layer    | Technology
---------|--------------------------------------
Frontend | React 19 + Vite + TanStack Query
Backend  | FastAPI + Python 3.13 + Redis
Deploy   | Vercel (frontend) + Render (backend)

Key Technical Challenges

1. Kaggle SDK Quirks

The official Kaggle SDK has some... interesting behaviors. When credentials are missing, it calls exit(1) instead of raising a normal exception:

# This crashes your entire app!
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()  # exit(1) if no credentials

My solution: wrap the client call in a try/except that catches SystemExit:

try:
    kaggle_service.get_client()  # my thin wrapper around KaggleApi
except SystemExit as e:
    # The SDK tried to kill the process; degrade gracefully instead
    logger.warning(f"Kaggle auth failed: {e}")
    return {"kaggle": False}
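
For context, get_client() lazily builds and caches the SDK client so a missing kaggle.json can't crash the app at import time. Here's a minimal sketch of the idea; the module layout and caching are illustrative, not the exact repo code:

# kaggle_service.py -- simplified sketch, not the exact repo code
import logging

logger = logging.getLogger(__name__)

_client = None  # cache the authenticated client across calls

def get_client():
    """Lazily build an authenticated KaggleApi client.

    The import happens here, not at module load, so missing
    credentials can't take the whole app down at startup.
    """
    global _client
    if _client is None:
        from kaggle.api.kaggle_api_extended import KaggleApi
        api = KaggleApi()
        api.authenticate()  # may call exit(1) -> SystemExit for the caller
        _client = api
    return _client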

2. Smart Notebook Ranking

Not all notebooks are equal. A 5-year-old notebook with 1000 upvotes might be less useful than a recent one with 100.

I use a scoring formula:

score = log(upvotes + 1) * time_decay_factor

Where time_decay_factor decreases for older notebooks.
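
In Python that looks something like the sketch below. I'm showing exponential decay with a one-year half-life as one concrete choice; the constant is illustrative and worth tuning:

import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 365  # illustrative: a notebook's score halves per year of age

def notebook_score(upvotes: int, last_updated: datetime) -> float:
    """Log-damped upvotes, decayed by notebook age."""
    age_days = (datetime.now(timezone.utc) - last_updated).days
    time_decay_factor = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return math.log(upvotes + 1) * time_decay_factor

With these numbers, a 5-year-old notebook keeps about 3% of its raw score (0.5^5), so a fresh notebook with 100 upvotes outranks a stale one with 1000.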

3. Token Optimization

LLM tokens are expensive, so I compact notebook metadata with TOON (Token-Optimized Object Notation):

// Standard JSON: 150 tokens
{
  "notebook_title": "Introduction to Ensembling",
  "notebook_author": "arthurtok",
  "upvotes": 3847
}

// TOON: 90 tokens
{"t":"Introduction to Ensembling","a":"arthurtok","v":3847}

That's 40% fewer tokens for the same information.
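
The compaction itself is a simple key-mapping pass before serialization. A minimal sketch (this particular field map is illustrative):

import json

# Illustrative mapping from verbose JSON keys to short TOON keys
TOON_KEYS = {
    "notebook_title": "t",
    "notebook_author": "a",
    "upvotes": "v",
}

def to_toon(record: dict) -> str:
    """Serialize a record with abbreviated keys and no padding whitespace."""
    compact = {TOON_KEYS.get(k, k): v for k, v in record.items()}
    return json.dumps(compact, separators=(",", ":"))

print(to_toon({
    "notebook_title": "Introduction to Ensembling",
    "notebook_author": "arthurtok",
    "upvotes": 3847,
}))
# -> {"t":"Introduction to Ensembling","a":"arthurtok","v":3847}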

Try It Yourself

  1. Go to kaggleingest.com
  2. Paste a Kaggle URL (try: https://www.kaggle.com/competitions/titanic)
  3. Download the context file
  4. Feed it to your favorite LLM

If this was helpful, star the project on GitHub!

Questions? Drop them in the comments!
