
Provide rich context about Kaggle competitions to AI coding assistants
If you've ever tried to use an AI coding assistant for a Kaggle competition, you know the struggle:
- Hundreds of notebooks to sift through
- Context windows that fill up with imports and visualizations
- No easy way to extract the valuable insights
I built KaggleIngest to solve this.
## What is KaggleIngest?
It's an open-source tool that:
- Takes any Kaggle competition or dataset URL
- Ranks and downloads the top notebooks
- Extracts valuable patterns (skipping boilerplate)
- Outputs token-optimized context for LLMs
**Live Demo:** [kaggleingest.com](https://kaggleingest.com)
**GitHub:** [github.com/Anand-0037/KaggleIngest](https://github.com/Anand-0037/KaggleIngest)
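Conceptually the flow is rank, extract, render. Here is a minimal end-to-end sketch of that idea; every name, type, and the "boilerplate" heuristic below are illustrative placeholders, not KaggleIngest's actual code:

```python
from dataclasses import dataclass

# Hypothetical shapes and names for illustration only -- not the tool's real internals.

@dataclass
class Notebook:
    title: str
    author: str
    upvotes: int
    source: str  # raw notebook code

def rank_notebooks(notebooks: list[Notebook], top_k: int = 5) -> list[Notebook]:
    # Placeholder ranking by upvotes; the real tool also applies time decay (see below).
    return sorted(notebooks, key=lambda nb: nb.upvotes, reverse=True)[:top_k]

def extract_patterns(nb: Notebook) -> str:
    # Placeholder "boilerplate" filter: drop blank lines and plain import lines.
    keep = [line for line in nb.source.splitlines()
            if line.strip() and not line.lstrip().startswith(("import ", "from "))]
    return "\n".join(keep)

def build_context(notebooks: list[Notebook]) -> str:
    ranked = rank_notebooks(notebooks)
    return "\n\n".join(f"# {nb.title} ({nb.author})\n{extract_patterns(nb)}"
                       for nb in ranked)
```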
## The Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 19 + Vite + TanStack Query |
| Backend | FastAPI + Python 3.13 + Redis |
| Deploy | Vercel (frontend) + Render (backend) |
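Redis is there mainly so repeated requests for the same competition don't re-download and re-process everything. A minimal sketch of that caching pattern in FastAPI; the endpoint path, key scheme, and one-hour TTL are my assumptions, not the actual backend:

```python
import redis
from fastapi import FastAPI

app = FastAPI()
cache = redis.Redis.from_url("redis://localhost:6379", decode_responses=True)

def build_context_for(url: str) -> str:
    # Placeholder for the real pipeline (download, rank, extract, render).
    return f"context for {url}"

# Hypothetical endpoint: path, key scheme, and TTL are illustrative only.
@app.get("/ingest")
def ingest(url: str) -> dict:
    key = f"ingest:{url}"
    cached = cache.get(key)
    if cached is not None:
        return {"context": cached, "cached": True}
    context = build_context_for(url)  # the expensive Kaggle work happens here
    cache.setex(key, 3600, context)   # keep the result for an hour
    return {"context": context, "cached": False}
```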
## Key Technical Challenges
### 1. Kaggle SDK Quirks
The official Kaggle SDK has some... interesting behaviors. When credentials are missing, it calls `exit(1)`:

```python
# This crashes your entire app!
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # exit(1) if no credentials
```
My solution: wrap the client call in a try/except that catches `SystemExit`:
```python
try:
    kaggle_service.get_client()
except SystemExit as e:
    logger.warning(f"Kaggle auth failed: {e}")
    return {"kaggle": False}
```
### 2. Smart Notebook Ranking
Not all notebooks are equal. A 5-year-old notebook with 1000 upvotes might be less useful than a recent one with 100.
I use a scoring formula:

```
score = log(upvotes + 1) * time_decay_factor
```

where `time_decay_factor` decreases as the notebook gets older.
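A minimal sketch of that scoring, assuming an exponential decay; the one-year half-life here is an illustrative choice, not necessarily what KaggleIngest ships with:

```python
import math
from datetime import datetime, timezone

def notebook_score(upvotes: int, last_updated: datetime,
                   half_life_days: float = 365.0) -> float:
    """score = log(upvotes + 1) * time_decay_factor

    The decay factor halves every `half_life_days` of notebook age, so
    fresher notebooks win out over older, similarly-upvoted ones.
    """
    age_days = (datetime.now(timezone.utc) - last_updated).days
    time_decay_factor = 0.5 ** (age_days / half_life_days)
    return math.log(upvotes + 1) * time_decay_factor
```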
### 3. Token Optimization
LLM tokens are expensive. I used TOON (Token-Optimized Object Notation) to compress the notebook metadata:
```jsonc
// Standard JSON: 150 tokens
{
  "notebook_title": "Introduction to Ensembling",
  "notebook_author": "arthurtok",
  "upvotes": 3847
}

// TOON: 90 tokens
{"t":"Introduction to Ensembling","a":"arthurtok","v":3847}
```
That's 40% fewer tokens for the same information.
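The compaction step itself is simple: re-key the metadata with short names and strip the whitespace. A sketch, assuming a fixed short-key mapping; the key names below are illustrative and not the full TOON spec:

```python
import json

# Illustrative short-key mapping; the real TOON schema may differ.
KEY_MAP = {"notebook_title": "t", "notebook_author": "a", "upvotes": "v"}

def to_toon(record: dict) -> str:
    """Re-key a metadata record and serialize it without whitespace."""
    compact = {KEY_MAP.get(k, k): v for k, v in record.items()}
    return json.dumps(compact, separators=(",", ":"))

print(to_toon({
    "notebook_title": "Introduction to Ensembling",
    "notebook_author": "arthurtok",
    "upvotes": 3847,
}))
# -> {"t":"Introduction to Ensembling","a":"arthurtok","v":3847}
```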
## Try It Yourself
- Go to [kaggleingest.com](https://kaggleingest.com)
- Paste a Kaggle URL (try: https://www.kaggle.com/competitions/titanic)
- Download the context file
- Feed it to your favorite LLM
Star on GitHub if this was helpful!
Questions? Drop them in the comments!