
Farouk Boukil


๐—ช๐—ต๐—ฎ๐˜ ๐—ถ๐—ณ ๐ซ๐ž๐ฅ๐ข๐š๐›๐ฅ๐ฒ ๐—ฎ๐˜‚๐˜๐—ผ๐—บ๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐˜†๐—ผ๐˜‚๐—ฟ ๐—ฑ๐—ฎ๐˜๐—ฎ ๐˜€๐—ฐ๐—ถ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐ญ๐š๐ฌ๐ค๐ฌ ๐˜„๐—ฎ๐˜€ ๐Ÿ๐ข๐ง๐š๐ฅ๐ฅ๐ฒ ๐˜„๐—ถ๐˜๐—ต๐—ถ๐—ป ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต?!

We all know the grind of working with data, even with AI tools: every experiment starts with re-explaining everything, and every iteration requires you to prompt, wait, review, correct, and repeat. And the moment you close the session, everything learned is gone.

This makes us the bottleneck, and that holds back human-AI collaboration...

So I built OpenDataSci, an autonomous agent purpose-built for DS/ML, and tested it on Kaggle. I entered a recent competition and ran the agent with no hints and no guidance while I ironed my shirts.

In one shot, using Anthropic's Claude Sonnet 4.6, it reached an AUC of 0.95: a top-30% finish among 3K+ teams and 36K+ submissions. (More on this in the README.)
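For readers newer to the metric: ROC AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as half. The post only says "AUC", so this sketch assumes a binary target scored with ROC AUC. `roc_auc` is a hypothetical helper written for illustration, not part of OpenDataSci, and its pairwise loop is O(n²); on real data you would use scikit-learn's `roc_auc_score` instead.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC as the fraction of (positive, negative)
    pairs where the positive example is scored higher (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:          # compare every positive score...
        for n in neg:      # ...against every negative score
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 3 of the 4 positive/negative pairs are ordered correctly
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this reading, an AUC of 0.95 means roughly 95% of positive/negative pairs are ranked in the right order.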

The top-ranked solution outperformed this agent by merely 0.004 AUC, but at the cost of massive manual effort, even with popular AI tools. Its authors needed a dozen model families, deep learning, 400-feature notebooks, AutoML sweeps across many libraries, and 186 carefully ensembled models. Essentially a few weeks' worth of effort and time!!

OpenDataSci abstracts away all the complexity and has so much to offer for DS/ML automation:

→ Owns the entire development lifecycle from EDA to final evaluation
→ Plans, codes, and executes autonomously in a secure local sandbox
→ Self-reviews and corrects before anything reaches you
→ Remembers your data across sessions, gets smarter each run
→ Runs parallel experiments and ensembles
→ Has advanced context management for token efficiency and quality
→ Ships with predefined skills for DS/ML, so it knows how to do things right
→ Bring your own knowledge: out-of-the-box support for custom skills
→ Works with any major LLM provider (Anthropic, OpenAI, Bedrock, Vertex AI, Ollama, vLLM, and any OpenAI-compatible server).

This and so much more!! You set the goal. It does the work. No data science knowledge required.

🔗 https://github.com/f4roukb/open-data-sci
📦 pip install open-data-sci

Spin it up on your data and see what it achieves!

Top comments (1)

Nazar Boyko

The number that jumped out at me isn't the AUC; it's the gap. Top-1 beat your agent by 0.004 but needed a dozen model families, 186 models stacked together, and weeks of work. That's a strong argument that an agent can get you most of the way fast, and the last sliver is where human effort still earns its keep. The one thing I'd hold loosely is that this is a single competition, so I'd be curious whether the same one-shot result holds on messier problems like time series or images before reading too much into it.