marcosomma
I Am Tired of Fake AI Expertise

I have spent the last year trying to talk about AI as an engineering discipline.

Not AI as a content machine. Not AI as a growth trick. Not AI as a stream of screenshots, prompt hacks, and recycled takes written by the same models people claim to master.

I mean AI as systems work.

Orchestration. Validation. Data quality. Observability. Evaluation. Failure handling. Context boundaries. Retry policies. Structured outputs. Cost control at the workflow level. Real interfaces between probabilistic components and deterministic software.

And honestly, part of the reason I stepped back from that conversation is simple: too much of the public AI discourse is being led by people who do not build real AI systems.

They are loud. They are polished. They are confident. They are often rewarded for being confidently wrong.

That is the part that disappoints me.

The current wave of self proclaimed "AI experts" is flattening a difficult field into a set of cheap slogans. A domain that requires serious expertise is being turned into social media theatre. And the result is not just annoying. It is actively harmful.

It is making people misunderstand what AI is, how it fails, where it costs money, and what actually makes it useful in production.

The field is being narrated by people who optimize for reach, not rigor

Recently I saw yet another high visibility post making a big point about format optimization and token savings, as if shaving a few characters from JSON were some major breakthrough in AI engineering.

This is the kind of thing that gets thousands of likes.

A side by side screenshot.
A catchy claim.
A simple narrative.
A fake sense of leverage.

And once again the message was basically this: if you are still doing things the old way, you are wasting money.

This is the language of marketing, not engineering.

The problem is not that someone shared an imperfect idea. Imperfect ideas are fine. Early exploration is fine. Public discussion is fine. We all get things wrong.

The problem is the posture of expertise around it.

There is a massive difference between saying, "I tried this and here are the results, caveats, and failure modes," and saying, "Here is the better way," when the claim is based on shallow intuition, weak evidence, and no visible system level execution.

That difference matters.

Because a lot of people reading those posts are not experienced enough to detect the gap.

They see confidence and assume competence.
They see engagement and assume validity.
They see a title and assume credibility.

And that is how misinformation spreads in technical fields. Not through obvious lies, but through reduction. Through oversimplification. Through confident framing of weak ideas.

Tokens have become marketing

One of the worst examples of this is token discourse.

Tokens matter. Of course they matter. Costs matter. Latency matters. Compression matters. Input design matters.

But token count has become the vanity metric of AI engineering.

It is easy to post about because it looks measurable. It fits inside a screenshot. It creates a simple hero story. "Look, I reduced 40 percent of the tokens." Great. And what happened to reliability? What happened to parse consistency? What happened to failure recovery? What happened to total workflow cost after retries, validation, tool calls, and fallback paths?

That is the real question.

A shorter prompt is not automatically a better system.
A smaller payload is not automatically a better architecture.
A new text format is not automatically a better interface for a stochastic model.

Sometimes saving tokens means losing robustness.
Sometimes saving tokens means increasing ambiguity.
Sometimes saving tokens means moving complexity downstream into validation and repair.
Sometimes saving tokens means nothing at all, because the real cost of the system is somewhere else.
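The retry point is easy to make concrete with back-of-the-envelope arithmetic. This is a sketch, not a benchmark: every number below (token counts, price, failure rates) is hypothetical, and the only claim is the shape of the math. The expected number of attempts per successful result under independent retries is `1 / (1 - failure_rate)`, so a "40 percent cheaper" prompt that parses less reliably can cost more per usable output.

```python
def expected_cost(tokens_per_call: float, price_per_token: float,
                  failure_rate: float) -> float:
    """Expected spend per *successful* result when failed attempts are retried.

    Assumes independent attempts, so expected attempts form a geometric
    series: 1 / (1 - failure_rate). All inputs here are hypothetical.
    """
    expected_attempts = 1 / (1 - failure_rate)
    return tokens_per_call * price_per_token * expected_attempts

PRICE = 0.000002  # hypothetical $/token

# Robust, verbose prompt: more tokens, fewer parse failures.
verbose = expected_cost(1000, PRICE, failure_rate=0.02)

# Terse prompt: 40% fewer tokens, but (hypothetically) flakier output.
terse = expected_cost(600, PRICE, failure_rate=0.45)

print(f"verbose: ${verbose:.6f} per good result")
print(f"terse:   ${terse:.6f} per good result")
```

With these made-up numbers, the terse prompt ends up more expensive per good result, before you even count validation and fallback paths. The point is not the specific figures; it is that prompt cost is only one term in the system cost.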

This is what too many public AI voices still fail to understand.

AI cost is not just prompt cost.
AI quality is not just output prettiness.
AI engineering is not just model interaction.

The real economy of AI is at the system level.

The real cost is in everything around the model

If you have ever shipped a real AI feature, you know where the effort goes.

It goes into making sure the right context is available at the right moment.
It goes into preventing irrelevant context from leaking in.
It goes into checking whether the model output is complete, valid, safe, and usable.
It goes into retries when the model drifts.
It goes into routing when one step should not be handled by the same prompt as another.
It goes into fallback strategies when the first attempt is weak.
It goes into evaluating whether a result is acceptable before it reaches a user.
It goes into observability so you can explain why the system behaved the way it did.
It goes into datasets so your judgments are not based on vibes.
It goes into data quality so the model is not forced to reason on garbage.
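The middle of that list, checking outputs, retrying on drift, failing over in a controlled way, can be sketched as a minimal wrapper around one probabilistic step. This is illustrative scaffolding, not a real implementation: `call_model` is a hypothetical stand-in for whatever client you actually use, and the validation here is only a parse check plus a completeness check.

```python
import json

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns raw text."""
    raise NotImplementedError

def run_step(prompt: str, required_keys: set[str],
             max_attempts: int = 3, model=call_model) -> dict:
    """Validate-and-retry wrapper around one probabilistic step."""
    last_error = None
    for _ in range(max_attempts):
        raw = model(prompt)
        try:
            data = json.loads(raw)          # parse check: is it even JSON?
        except json.JSONDecodeError as e:
            last_error = f"invalid JSON: {e}"
            continue                        # retry when the model drifts
        missing = required_keys - set(data)
        if missing:                         # completeness check
            last_error = f"missing keys: {missing}"
            continue
        return data                         # usable, validated output
    # Fallback path: surface a controlled failure instead of garbage.
    raise RuntimeError(f"step failed after {max_attempts} attempts: {last_error}")
```

Notice that the retries themselves burn tokens. That is the structural point: the deterministic shell is where the real spend and the real reliability live, not in the prompt string.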

That is where the tokens get burned.

And that is correct.

Those tokens are not waste. They are the cost of making a probabilistic component useful inside a product.

This is what so much AI content gets backwards. It treats the model call as the whole system. It assumes the right prompt is the product. It implies that if you phrase the question well enough, the problem is solved.

That is not how production works.

A prompt is an input. A model is a stochastic component. A product is a controlled system around them.

If you collapse those distinctions, you are not doing AI engineering. You are gambling.

Prompting alone is not engineering. It is gambling.

I keep repeating this line because I think it cuts to the center of the problem.

Prompting alone is not engineering. It is gambling.

Yes, prompting matters. Yes, prompt design can improve outcomes. Yes, well structured instructions can reduce confusion and guide the model.

But prompting is not a substitute for architecture.

It is not a substitute for validation.
It is not a substitute for proper interfaces.
It is not a substitute for evaluation.
It is not a substitute for state control.
It is not a substitute for business rules.
It is not a substitute for deterministic code where deterministic code should exist.

And yet an absurd amount of public AI discourse still acts as if prompting is the main skill. As if being fluent in prompt phrasing is equivalent to understanding AI systems.

It is not.

A person can be very good at prompting and still have almost no understanding of reliability engineering, retrieval quality, orchestration design, evaluation methodology, observability, or failure containment.

That is why I have become increasingly skeptical of AI advice that starts and ends with "here is a better prompt."

A better prompt for what?
Under which constraints?
With what model?
Against which dataset?
Measured how?
Compared to what baseline?
Under what latency budget?
With what failure rate?
With what retry policy?
Inside what workflow?
At what scale?
For which users?
Against which acceptance criteria?

Without those questions, we are not discussing engineering. We are discussing prompt aesthetics.
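A minimal version of that discipline fits in a few lines: compare a prompt variant against a baseline on a labeled dataset, with an acceptance criterion and a margin chosen in advance. Everything here is hypothetical scaffolding (the grader, the dataset shape, the model callable); real eval harnesses are far richer, but even this sketch forces the questions above to have answers.

```python
def grade(output: str, expected: str) -> bool:
    """Task-specific acceptance check; exact match, for this sketch only."""
    return output.strip().lower() == expected.strip().lower()

def pass_rate(model, prompt_template: str,
              dataset: list[tuple[str, str]]) -> float:
    """Fraction of labeled examples a prompt passes: the unit of comparison."""
    passed = sum(grade(model(prompt_template.format(x=x)), y)
                 for x, y in dataset)
    return passed / len(dataset)

def better(model, baseline: str, variant: str,
           dataset: list[tuple[str, str]], margin: float = 0.05) -> bool:
    """A variant only "wins" if it beats the baseline by a pre-chosen margin,
    measured on data you did not tune the prompt against."""
    return pass_rate(model, variant, dataset) >= \
           pass_rate(model, baseline, dataset) + margin
```

Run that against a held-out set and "here is a better prompt" becomes a claim you can check instead of a claim you have to believe.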

2025 was the year of demos. 2026 should be different.

I can understand how we got here.

In 2025, the industry was still drunk on demos. That phase made sense. Everything felt new. Chat interfaces looked magical. People discovered that a model could generate code, write marketing copy, extract structure from text, summarize documents, and imitate expertise with frightening smoothness.

So of course the conversation was dominated by novelty.

People were exploring.
People were guessing.
People were posting every new trick they found.
The market rewarded velocity, not discipline.

Fine.

But we are not there anymore.

In 2026, this excuse is weaker. We have already seen enough failures, hallucinations, broken agents, fake automation, and "AI powered" wrappers to know that prompting your way through complexity does not scale.

We should be having better conversations by now.

We should be talking more about evaluation design than prompt poetry.
We should be talking more about system boundaries than persona tuning.
We should be talking more about retrieval quality than format gimmicks.
We should be talking more about workflow control than chatbot charisma.

Instead, too many large accounts are still posting beginner level content with expert level confidence.

That is not harmless. It distorts the learning environment for everyone coming into the field.

This is why so much AI still does not work

A lot of people ask why AI products still feel fragile.

Why do they fail on edge cases?
Why do they break in production?
Why do they look impressive in demos and weak in real usage?
Why do teams burn money without creating durable value?
Why do so many "agents" look like wrappers with marketing?

This is part of the answer.

Because too many people still think AI is an oracle.

They still approach it like a mystical reasoning engine that only needs the right wording. They still believe the model is the product. They still imagine that clever prompting is a replacement for engineering discipline.

So they underinvest in everything that actually makes the system work.

They underinvest in ground truth data.
They underinvest in evals.
They underinvest in routing logic.
They underinvest in structured interfaces.
They underinvest in observability.
They underinvest in negative testing.
They underinvest in validation.
They underinvest in deterministic controls.

Then they are surprised when the system behaves like a stochastic component with partial competence and unstable boundaries.

That surprise is not a model failure. It is a design failure.

AI does not fail because it is useless.
AI fails because people keep trying to deploy it as magic.

Expertise should be demonstrated, not announced

The most frustrating part is not being wrong. Everyone is wrong sometimes.

The most frustrating part is the performance of expertise.

The field is full of titles, badges, self descriptions, and aesthetic authority. "Top voice." "AI expert." "Thought leader." "Award winning." Fine. None of that tells me whether you understand evaluation drift, state leakage, retrieval contamination, schema reliability, fallback routing, or cost accumulation across a multi step pipeline.

Show me the system.
Show me the logs.
Show me the benchmark.
Show me the constraints.
Show me the failure modes.
Show me the tradeoffs.
Show me the production scars.

That is what builds credibility.

I trust practitioners who expose uncertainty and show their work. I trust people who can explain not just what succeeded, but what broke and why. I trust engineers who understand that AI is not one prompt and one output, but an unstable component that becomes useful only when surrounded by structure.

I do not trust polished certainty without evidence.

And I think more of us need to say that openly.

We need less AI theatre and more systems thinking

This article is not a call to stop experimenting. It is the opposite.

Experiment more. Build more. Test more. Share results more.

But stop pretending that shallow takes are deep expertise.
Stop teaching people that token screenshots are system design.
Stop selling prompting as if it were engineering.
Stop flattening a hard field into content loops.

If you want better AI products, treat AI like what it is: a probabilistic system component that must be constrained, validated, observed, and integrated with care.

That is less sexy than "10 prompts that changed my workflow."
It is less viral than side by side screenshots.
It is less accessible than fake certainty.

But it is real.

And right now, real is exactly what this field needs more of.

Because the problem is no longer that AI is misunderstood by outsiders.

The problem is that too much of it is being misexplained by insiders.

If we want the field to mature, we need fewer self proclaimed experts and more actual practitioners.

Not louder people.
Better ones!
