<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: eram</title>
    <description>The latest articles on DEV Community by eram (@eram).</description>
    <link>https://dev.to/eram</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1103051%2F5ba0e461-c5f0-4db1-bc5e-4e70d1f86e7b.jpeg</url>
      <title>DEV Community: eram</title>
      <link>https://dev.to/eram</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eram"/>
    <language>en</language>
    <item>
      <title>How to Make Your Data Science Project the Beyoncé of the Boardroom</title>
      <dc:creator>eram</dc:creator>
      <pubDate>Wed, 10 Sep 2025 08:49:25 +0000</pubDate>
      <link>https://dev.to/eram/how-to-make-your-data-science-project-the-beyonce-of-the-boardroom-47kj</link>
      <guid>https://dev.to/eram/how-to-make-your-data-science-project-the-beyonce-of-the-boardroom-47kj</guid>
      <description>&lt;p&gt;&lt;em&gt;(…and not another sad statistic in a Gartner report)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Gartner just dropped another &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027" rel="noopener noreferrer"&gt;sobering forecast&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;by 2027, more than 40% of agentic AI projects will be scrapped — victims of ballooning costs, intangible ROI, and governance headaches.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here’s the thing: success in data science isn’t about dodging failure; it’s about designing your process so that success becomes the default setting. That means setting goals that actually make sense, being brutally honest about what AI can and can’t do for your business, planning like you’re building a rocket, treating your data like royalty, modeling with discipline, building apps that can take a punch, and never — &lt;strong&gt;ever&lt;/strong&gt; — taking your eyes off the ball once you launch.&lt;/p&gt;

&lt;p&gt;In this post, I’ll break down each of those moves into practical, field‑tested steps. Nail them!&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Set Your Business Objective Like a Pro
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Field Case - When “Perfect” Became the Problem&lt;/strong&gt;&lt;br&gt;
A fintech startup proudly declared their goal: “Zero fraud.” Noble? Sure. Achievable? About as likely as finding a parking spot in downtown Boston at 6 p.m. Within weeks, they were rejecting half their legitimate customers. Fraud rates dropped, but so did revenue — and customer goodwill. The pivot — “reduce fraud by 40% while keeping approval rates above 90%” — turned them from villains into heroes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old Fail&lt;/strong&gt;: Setting moon‑shot goals like “100% accuracy,” “no false positives,” or “Chat itself is the product,” without defining a deliverable or business value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Wins&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write your goal in plain business language so anyone - from your CFO to your intern - can understand it.&lt;/li&gt;
&lt;li&gt;Attach a number and a time frame: “increase retention by 15% in six months” beats “make customers happier.”&lt;/li&gt;
&lt;li&gt;Tie the metric directly to revenue, cost savings, or risk reduction so it matters to decision‑makers.&lt;/li&gt;
&lt;li&gt;Pressure‑test the goal with a “what if” scenario — if hitting it would break another part of the business, it’s not the right goal.&lt;/li&gt;
&lt;li&gt;Keep a “goal sanity” checklist and revisit it quarterly to make sure you’re still solving the right problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Be Realistic (But Still Dreamy)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Field Case - The Dashboard That Paid the Bills&lt;/strong&gt;&lt;br&gt;
A retail chain wanted “AI that predicts fashion trends” — the kind of moonshot that looks great in a pitch deck. Three months later, they realized the real money was in predicting inventory shortages. Less sexy, more profitable. Their “trend predictor” became a humble dashboard that saved millions in lost sales — and nobody cared that it didn’t make the cover of Wired.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old Fail&lt;/strong&gt;: Pretending your core product is AI when it’s actually a food delivery app, a laundry service, or a retail chain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Moves&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Audit your current processes and find the bottlenecks or blind spots AI could fix.&lt;/li&gt;
&lt;li&gt;Prioritize use cases that improve existing revenue streams before chasing “industry disruption.”&lt;/li&gt;
&lt;li&gt;Celebrate unglamorous wins — the boring stuff often pays the biggest bills.&lt;/li&gt;
&lt;li&gt;Keep the “dream” projects in a sandbox until the basics are delivering measurable ROI.&lt;/li&gt;
&lt;li&gt;Build a roadmap that layers quick wins first, then progressively more ambitious projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqyeza8fl37y1e5v357d.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqyeza8fl37y1e5v357d.webp" alt="No, Yolo! This is not a dog!" width="526" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;
No, Yolo! This is not a dog!
&lt;/small&gt;&lt;/center&gt;

&lt;h2&gt;
  
  
  3. You Need a Team to Build a Rocket to Mars (Because You Kind of Are)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Field Case - A Pregnancy Problem&lt;/strong&gt;&lt;br&gt;
A healthcare AI project was meant to flag “high risk” patients. But because domain experts were skipped in the planning phase, the model ended up flagging “high risk” patients… who were actually just pregnant. Without someone who understands the data’s context, your “life‑saving” model can become a very expensive pregnancy test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old Fail&lt;/strong&gt;: Missing diversity in the team, underestimating dataset work, rushing timelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Success&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure every project team has at least one domain expert who can sanity‑check assumptions and understand the data.&lt;/li&gt;
&lt;li&gt;Budget 80% of your timeline for data collection, cleaning, and labeling - it’s not glamorous, but it’s where the magic happens.&lt;/li&gt;
&lt;li&gt;Set delivery dates based on realistic estimates, not investor‑friendly fantasies.&lt;/li&gt;
&lt;li&gt;Build in checkpoints where the team can pause and reassess before committing to the next phase.&lt;/li&gt;
&lt;li&gt;Document every assumption so you can revisit and adjust them as you learn.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Treat Your Data Like a VIP Guest
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Field Case - When Cats Became Guitars&lt;/strong&gt;&lt;br&gt;
An image‑classification project trained on photos where cats happened to be sitting next to guitars. The label? “Guitar.” The result? Every cat became a guitar. Technically “accurate,” but useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old Fail&lt;/strong&gt;: Too little data, dirty data, missing field data, or mislabeled examples that poison the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Take&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run automated checks for missing values, duplicates, and inconsistent labels (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Have humans spot‑check random samples for labeling sanity - machines can’t catch every nuance.&lt;/li&gt;
&lt;li&gt;Test your model against adversarial examples - like an apple with “iPod” taped to it - before shipping.&lt;/li&gt;
&lt;li&gt;Keep a “data hygiene” log so you can trace and fix issues quickly when they pop up.&lt;/li&gt;
&lt;li&gt;Establish a recurring “data audit day” where the team reviews and cleans the dataset.&lt;/li&gt;
&lt;/ul&gt;
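
&lt;p&gt;To make the first bullet concrete, here’s a minimal sketch of an automated hygiene check in pandas; the column names (&lt;code&gt;image_id&lt;/code&gt;, &lt;code&gt;label&lt;/code&gt;) and the file name are hypothetical placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

def run_hygiene_checks(df):
    """Count common data-quality problems in one pass."""
    report = {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }
    # Inconsistent labels: the same item tagged differently across rows.
    if {"image_id", "label"}.issubset(df.columns):
        per_item = df.groupby("image_id")["label"].nunique()
        report["conflicting_labels"] = int((per_item &gt; 1).sum())
    return report

df = pd.read_csv("training_data.csv")  # hypothetical file name
print(run_hygiene_checks(df))
&lt;/code&gt;&lt;/pre&gt;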

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm96e4q5c54mvcz5lc77z.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm96e4q5c54mvcz5lc77z.webp" alt="Attack on an apple" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;
Attack on an apple
&lt;/small&gt;&lt;/center&gt;

&lt;h2&gt;
  
  
  5. Model Like You Mean It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Field Case - Lost in Translation&lt;/strong&gt;&lt;br&gt;
A social media sentiment model trained only on X (aka Twitter) slang failed miserably on LinkedIn posts. “Crushing it” meant “awesome” on X, but on LinkedIn it often meant “burnout incoming.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old Fail&lt;/strong&gt;: Jumping to conclusions, skipping cross‑validation, choosing algorithms heavier than your infrastructure can handle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Playbook&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always split your data into training, validation, and test sets - and actually use them (a sketch follows this list).&lt;/li&gt;
&lt;li&gt;Match algorithm complexity to your deployment environment. A 200‑layer neural net inside a mobile app? No!&lt;/li&gt;
&lt;li&gt;Test on data from different sources to catch context‑drift issues early.&lt;/li&gt;
&lt;li&gt;Monitor for model decay and retrain before performance drops below acceptable thresholds.&lt;/li&gt;
&lt;li&gt;Keep a “model graveyard” of past experiments so you don’t repeat mistakes.&lt;/li&gt;
&lt;/ul&gt;
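
&lt;p&gt;Here’s a minimal sketch of that first bullet with scikit-learn; the 70/15/15 ratio and the toy dataset are illustrative choices, not recommendations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for your real feature matrix and labels.
X, y = make_classification(n_samples=1000, random_state=42)

# Carve out 30%, then split it half-and-half into validation and test,
# yielding a 70/15/15 train/val/test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp
)
# Train on X_train, tune on X_val, and touch X_test exactly once.
&lt;/code&gt;&lt;/pre&gt;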

&lt;h2&gt;
  
  
  6. Build Applications That Can Survive the Real World
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Field Case - The Chatbot That Went Rogue&lt;/strong&gt;&lt;br&gt;
An AI chatbot launched without proper safeguards. Within 24 hours, it was spewing offensive content because users figured out how to “train” it in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old Fail&lt;/strong&gt;: No safeguards, scaling issues, switching to auto‑pilot too soon, not preparing for attacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Edge&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simulate hostile user behavior before launch to see how your system reacts.&lt;/li&gt;
&lt;li&gt;Keep a human review step in place until the model has proven itself in production.&lt;/li&gt;
&lt;li&gt;Add anomaly detection and rate‑limiting to prevent abuse at scale (see the token‑bucket sketch after this list).&lt;/li&gt;
&lt;li&gt;Maintain a rapid‑response plan for rolling back or disabling features if something goes sideways.&lt;/li&gt;
&lt;li&gt;Train your ops team to recognize and respond to early warning signs of failure.&lt;/li&gt;
&lt;/ul&gt;
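
&lt;p&gt;For the rate‑limiting bullet, here’s a minimal token‑bucket sketch in Python; the capacity and refill numbers are illustrative, and a real service would keep one bucket per user or API key:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import time

class TokenBucket:
    def __init__(self, capacity=10, refill_per_sec=1.0):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        """Refill based on elapsed time, then spend one token if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens &gt;= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_sec=0.5)
if not bucket.allow():
    print("429: slow down")  # reject or queue the request
&lt;/code&gt;&lt;/pre&gt;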

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfcu7fjh2ewaajpx28sk.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfcu7fjh2ewaajpx28sk.webp" alt="Now with extra seamlessness — — who needs visible craft?" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;
Now with extra seamlessness — who needs visible craft?
&lt;/small&gt;&lt;/center&gt;

&lt;h2&gt;
  
  
  7. Monitor, Measure, Optimize — Forever
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Field Case - The Three‑Minute Ride That Wasn’t&lt;/strong&gt;&lt;br&gt;
A ride‑sharing app’s ETA model started showing “3 minutes” for every ride, no matter the distance. Passengers were thrilled for about 30 seconds — until they realized the number never changed. Drivers were confused, support tickets piled up, and social media had a field day. The culprit? A server clock drifted by 17 minutes, throwing off the calculations. Monitoring caught it — but only after a week of chaos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old Fail&lt;/strong&gt;: Assuming it just works, missing KPIs, skipping A/B testing, ignoring real‑user feedback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Levers&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define success metrics before launch and track them continuously.&lt;/li&gt;
&lt;li&gt;Run A/B tests on model updates to measure real‑world impact.&lt;/li&gt;
&lt;li&gt;Collect and act on feedback from actual users, not just your dev team.&lt;/li&gt;
&lt;li&gt;Set up alerts for anomalies so you can fix issues before they become PR disasters (a minimal drift check follows this list).&lt;/li&gt;
&lt;li&gt;Conduct regular “postmortems” on both successes and setbacks to keep learning.&lt;/li&gt;
&lt;/ul&gt;
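
&lt;p&gt;As a sketch of the alerting bullet, here’s a simple drift check that flags a metric when it lands far outside its trailing window; the window size, threshold, and ETA numbers are all made up for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from statistics import mean, stdev

def is_anomalous(history, latest, window=50, threshold=3.0):
    """Flag latest when it sits outside threshold sigmas of the window."""
    recent = history[-window:]
    if len(recent) &gt;= 10:  # need enough points for a stable baseline
        mu, sigma = mean(recent), stdev(recent)
        if sigma &gt; 0:
            return abs(latest - mu) &gt; threshold * sigma
    return False

eta_error_minutes = [2.1, 1.9, 2.4, 2.0, 2.2] * 10  # trailing ETA errors
print(is_anomalous(eta_error_minutes, latest=17.0))  # True: page someone
&lt;/code&gt;&lt;/pre&gt;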

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26vmxhkklnzqxasgzl6y.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26vmxhkklnzqxasgzl6y.webp" alt="Google Translate doing its best at speaking Hausa" width="636" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;
Google Translate doing its best at speaking Hausa
&lt;/small&gt;&lt;/center&gt;

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;This post is a re‑imagining of &lt;a href="https://www.slideshare.net/slideshow/why-so-many-data-science-projects-fail-112757707/112757707" rel="noopener noreferrer"&gt;a lecture I first gave back in 2019&lt;/a&gt;. Strangely enough, the challenges - and the remedies - have outlived the seismic shifts brought on by the rise of LLMs. The tech has evolved, the buzzwords have changed, but the fundamentals still decide who wins and who flames out. Get those fundamentals right, and your project won’t just survive. It’ll headline the main stage, strut in the spotlight, and have the whole boardroom singing along. 🙂&lt;/p&gt;

&lt;p&gt;In a follow‑up post, I’ll dive deeper into my latest AI/LLM reflections - what’s changed, what hasn’t, and where I think the next big wins will come from. Stay tuned.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally posted &lt;a href="https://itnext.io/how-to-make-your-data-science-project-the-beyonc%C3%A9-of-the-boardroom-c1cb8a9681ba" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devjournal</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Vibe Coding from the Trenches</title>
      <dc:creator>eram</dc:creator>
      <pubDate>Wed, 14 May 2025 08:39:28 +0000</pubDate>
      <link>https://dev.to/eram/vibe-coding-from-the-trenches-446</link>
      <guid>https://dev.to/eram/vibe-coding-from-the-trenches-446</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;“I see you’re trying to show me the contents of that file, but it appears to be empty. Ah, now I see the issue. Let me try a different approach to debug this…”&lt;br&gt;
— Cursor Agent 250511.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Alright, intro!
&lt;/h2&gt;

&lt;p&gt;Let’s talk about the latest ripple making waves across the tech pond. You probably caught the buzz last week when Shopify’s CEO, Tobi Lütke, dropped an &lt;a href="https://x.com/tobi/status/1909251946235437514" rel="noopener noreferrer"&gt;internal memo&lt;/a&gt; that basically set a new baseline: &lt;strong&gt;using AI isn’t just encouraged anymore, it’s practically expected.&lt;/strong&gt; The memo even suggested teams need to justify not using AI before asking for more resources or headcount. Forbes quickly jumped on this, slapping a catchy (and maybe slightly terrifying?) &lt;a href="https://www.forbes.com/sites/josipamajic/2025/04/08/hire-ai-not-humans-shopify-ceos-radical-mandate-catching-vc-attention/" rel="noopener noreferrer"&gt;“Hire AI, Not Humans”&lt;/a&gt; headline on it.&lt;/p&gt;

&lt;p&gt;Now, before we dive deep, let’s be clear: this isn’t another think piece about the grand future of AGI, whether LLMs will pass the Turing test next Tuesday, or how the job market is going to morph entirely (though, yeah, those are big questions!).&lt;/p&gt;

&lt;p&gt;Instead, I want to zoom in on something super relevant to those of us actually &lt;em&gt;building&lt;/em&gt; things: the hype around &lt;strong&gt;Agentic Coders&lt;/strong&gt;, or “&lt;strong&gt;Vibe Coding&lt;/strong&gt;” as it is now trendy to call it. The promise is seductive, right? If AI can whip up graphics, translate languages on the fly, or slash presentation prep time, surely it can handle software engineering? Can startups really run lean with 5 devs instead of 50, like some breathless posts suggest? And personally, as an engineer, is this the silver bullet that finally unlocks that &lt;a href="https://medium.com/ingeniouslysimple/the-origins-of-the-10x-developer-2e0177ecef60" rel="noopener noreferrer"&gt;mythical “10x developer”&lt;/a&gt; status we all heard about?&lt;br&gt;
(Co-pilot won’t let you create Studio Ghibli style illustrations any longer, but ChatGPT still does…)&lt;/p&gt;

&lt;h2&gt;
  
  
  Here we go: My Reality Check on “Vibe Coding”
&lt;/h2&gt;

&lt;p&gt;To find out, I rolled up my sleeves and spent the last couple of months putting &lt;strong&gt;six different Vibe Coding products&lt;/strong&gt; through their paces on real work. Here’s the lowdown from the trenches:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Not-So-Groovy Parts (Where the Vibe Gets Weird):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination Station&lt;/strong&gt;: Oh boy. I saw AI agents confidently referencing libraries that simply don’t exist, either in my codebase or anywhere on npm/PyPI/etc. They’d suggest calling phantom methods on perfectly good, existing libraries. They even started inventing configuration keys in my YAML and JSON files like they were writing avant-garde poetry for my linters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fantasy Code Chronicles&lt;/strong&gt;: In one memorable instance, the agent decided the best way to parse something was with a complex regex. Anyone who’s wrestled with parsing knows: that’s usually a job for a lexer/parser combo. It wasn’t just suboptimal; it was the wrong tool for the job.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regex Roadblock &amp;amp; Model Hopping&lt;/strong&gt;: Speaking of regex, another time, the agent wrote a flawed regex pattern… and then got completely stuck trying to fix its own mistake. I eventually had to switch to a different AI model (looking at you, o3-mini) just to get past it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The “Ship It” Temptation&lt;/strong&gt;: This is a sneaky one. The AI spits out code that looks plausible. It’s super tempting for a dev (especially one under pressure) to just grab it and move on without truly understanding the how or why. That’s tech debt waiting to happen, and good luck debugging it later if you didn’t grasp it initially.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Quality — Duplication Drama&lt;/strong&gt;: I saw the same logic rewritten multiple times in slightly different ways, leading to classic code duplication. Not exactly DRY (Don’t Repeat Yourself).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Quality — Junior Moves&lt;/strong&gt;: Experienced devs spend tons of time refactoring, generalizing, using inheritance, tweaking existing functions and classes. The agents? They often behaved like enthusiastic junior devs, preferring to “invent the world” by writing brand new code for every little thing instead of building smartly on what’s already there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent Breaking Changes&lt;/strong&gt;: One agent tweaked my Linter config. Seemed innocent enough, until I realized it necessitated small, annoying changes across the entire codebase. And the agent? Blissfully unaware of the ripple effect it caused.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Library Loyalty Issues&lt;/strong&gt;: I use a specific, slightly less common open-source library (&lt;a href="https://deepkit.io/documentation/runtime-types" rel="noopener noreferrer"&gt;deepkit/types&lt;/a&gt;). The agent seemed determined to either replace it entirely or just ignore it, defaulting to more mainstream choices like Joi or even React libraries (in a backend context!). This got really obvious when there was a bug — instead of fixing the code using my preferred library, it just rewrote the whole section using a different library. Not helpful!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security? What Security?&lt;/strong&gt;: This was genuinely concerning. Agents generated code that introduced vulnerabilities — specifically, forgetting to add input validation for data coming from external sources and neglecting to implement authorization checks on newly created API endpoints. Yikes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lost in Translation (Jargon &amp;amp; Ambiguity)&lt;/strong&gt;: I often slipped up, using slightly “wrong” or ambiguous terms in my prompts. Asking for changes related to a ‘privilege’ when my code uses the term ‘permission’. Mentioning ‘swagger’ instead of the precise ‘OpenAPI’. Using internal shorthand like ‘CVE’ interchangeably with ‘vulnerability’ or ‘weakness’. A human teammate, even a junior one, would likely pause, ask for clarification, or realize a simple request shouldn’t require building a whole new universe of code. The agent? It often went full bazooka, interpreting a minor tweak request as a signal to construct entirely new models and libraries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Ripple Effect (also a Lost in Translation)&lt;/strong&gt;: Tools like Morph.ai promise seamless integration, letting agents act directly on JIRA tickets. Cool concept, but now your Product Managers and Business Analysts need to be super precise with their language, basically speaking the same internal dialect as your codebase. If a PM writes a story about changing a ‘product’ but your system calls it a ‘catalog ID’, the agent might get hopelessly confused or build the wrong thing entirely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Honestly? There were definitely moments I felt like I was wasting more time, painstakingly explaining the requirements, trying to guide the refactoring process, pointing out missed edge cases. Absolutely had a few “why didn’t I just write this myself?!” &lt;a href="https://thoughtworks.medium.com/https-www-thoughtworks-com-insights-blog-generative-ai-claude-code-codeconcise-experiment-b3b1f31d718c" rel="noopener noreferrer"&gt;breakdowns&lt;/a&gt;, cursing the digital void.&lt;/p&gt;

&lt;p&gt;And yeah, you could argue that some of these issues — misunderstanding requirements, needing guidance on refactoring — sound like onboarding a new human team member. True! But here’s the kicker: the agent delivers its answers with absolute, unwavering confidence. It never doubts itself. Imagine having a teammate who knows they’re right, even when they’re completely off-base. Annoying, right? You wouldn’t want that vibe on your team.&lt;/p&gt;

&lt;h2&gt;
  
  
  But What About Those Slick Demos?
&lt;/h2&gt;

&lt;p&gt;We’ve all seen them — those &lt;a href="https://www.youtube.com/watch?v=BOpX2VYNofA" rel="noopener noreferrer"&gt;mind-blowing demos&lt;/a&gt; from tools like v0 or Lovable, where an entire app seemingly springs into existence from a simple prompt. And yes, they are awesome! For whipping up a quick prototype, testing a small concept, or scaffolding something based on simple templates, they can definitely work and speed things up.&lt;/p&gt;

&lt;p&gt;But scaling that to a real-world, production-grade application? One with multiple microservices, hundreds or thousands of files, strict linting rules, established coding patterns, and specific company best practices? Sorry, but we’re not there yet. Check back next year, maybe?&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Puzzles:
&lt;/h2&gt;

&lt;p&gt;Can these agents help optimize algorithms or find performance bottlenecks? Sometimes, maybe. They might point you in a vaguely correct direction. But just as often, I found they’d latch onto my (potentially) flawed hunch and dig the rabbit hole even deeper, suggesting ineffective fixes or repeating suggestions I’d already ruled out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Okay, Okay — FOCUS! Where Does Vibe Coding Actually Shine?
&lt;/h2&gt;

&lt;p&gt;It’s not all &lt;a href="https://news.ycombinator.com/item?id=43448432" rel="noopener noreferrer"&gt;doom and gloom&lt;/a&gt;! Based on my experiments, these tools are genuinely useful for specific tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unit Test Automation&lt;/strong&gt;: This is a big one. Pointing an AI at existing code and asking it to generate unit tests? It’s often surprisingly good at this, saving a ton of boilerplate time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refactoring Assistance (with Guardrails)&lt;/strong&gt;: Once you have those solid tests, AI can be a great partner in refactoring older, gnarlier code. It can help explain undocumented logic and then assist in rewriting it using more modern approaches. Crucially, the tests keep it honest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reducing Handoff Friction&lt;/strong&gt;: Need to jump into a part of the codebase you’ve never touched before? An AI agent can help you add small features or make targeted changes without needing a deep, upfront understanding of the entire module. Think code bits, not massive features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language Learning Accelerator&lt;/strong&gt;: If you’re a JavaScript pro needing to dabble in Java (or vice-versa), these tools are fantastic tutors, helping you translate concepts and syntax quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Code Review Buddy&lt;/strong&gt;: When you think your code is ready, running it past an AI reviewer can surface genuinely insightful suggestions for improvement, catching things you might have missed. Like, 25% good suggestions and 75% nagging. But still.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip Google and forget Stack Overflow&lt;/strong&gt;. This one is 75% good for the vibe coder. If you know what your question should be, it will give you a concise answer, sparing you from endless scrolling on the internet.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line: Peer Programmer, Not Replacement (Yet?)
&lt;/h2&gt;

&lt;p&gt;Let’s crunch some rough numbers. Assume a developer spends about 5 hours a day actually heads-down coding (the rest is meetings, design, reviews, learning, lunching, etc.). If AI tools make that coding time, say, 30% more efficient — which feels plausible to me — that saves about 1.5 of those 5 hours, or roughly 19% of an 8-hour day: a 15–20% overall productivity boost for a typical developer.&lt;/p&gt;

&lt;p&gt;That’s &lt;strong&gt;huge&lt;/strong&gt;! It’s a significant improvement and absolutely justifies Tobi Lütke’s stance that reflexive AI usage should be the baseline.&lt;/p&gt;

&lt;p&gt;But does it mean we’re replacing devs wholesale or creating legions of 10x engineers overnight? &lt;strong&gt;Nope&lt;/strong&gt;. &lt;a href="https://www.lesswrong.com/posts/4mvphwx5pdsZLMmpY/recent-ai-model-progress-feels-mostly-like-bullshit" rel="noopener noreferrer"&gt;Not anytime soon&lt;/a&gt;. It’s powerful augmentation, a new tool in the belt that makes good developers better and faster at certain things. It helps, it assists, it speeds up knowledge building. But the human element — the critical thinking, the architectural vision, the nuanced understanding of requirements, the collaborative problem-solving, the ability to doubt and question — remains absolutely essential.&lt;/p&gt;

&lt;p&gt;So yeah, embrace the AI assistants. Expect them to be part of the workflow. But keep your brilliant human developers close — you’re going to need them more than ever to wield these powerful new tools effectively.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
