Direct model integration is not a crime.
But if image recognition in your product is not the whole point of the system and is just one feature, an unpleasant thing shows up very quickly: the backend starts doing work that is not really its job.
I am just describing my own case. Maybe it is not so important for you. But it matters to me.
This is not a holy war against SDKs.
SDKs are not the real issue here.
The real issue is where all the mess around the model should live so the application does not slowly start rotting from the inside.
Because one thing is when AI is the product. In that case, yes, build around it.
It is a different story when AI is just one function inside a larger system. A user uploads an image, and you want structured data out of it. A receipt. A document. A label. A business card. A book cover. It does not matter. The object changes clothes, but the mechanics stay the same.
That is where I like a simple boundary: the app owns auth, validation, and business state, while n8n owns the model-facing orchestration.
We Have Seen This Before
The first move is usually predictable.
You need image recognition, so you wire the model directly into the backend.
Sometimes that is fine. Really fine. No need to turn it into drama.
But then the usual human mess begins:
- prompts need tuning
- model responses need cleaning
- something needs normalization
- something needs retries
- something needs a fallback
- somebody wants provider switching
- and one day you realize the model is not returning data, it is returning a semi-liquid mistake
That is the point where the backend starts absorbing concerns that are not really application concerns.
At first it looks harmless.
Later it does not.
Where the Real Problem Is
The problem, as usual, is not where people like to look for it.
The problem is not whether the model can look at an image.
That is the easy part.
The real problem is where all the surrounding mechanics end up living.
Once prompts, branching, retries, output shaping, provider-specific details, and callback behavior get fused into backend code, even a small AI change starts behaving like an application release.
Sometimes not even a small one.
For a feature that is only one part of a bigger system, that is a bad trade.
In cases like this, I want a different split:
- business state stays inside the app
- everything model-facing moves into a layer that is easier to change, inspect, and repair
In practice, that layer is often n8n.
Short Version
- the application owns auth, validation, state, and the final business decision
- n8n handles preprocessing, model calls, branching, retries, and transformation
- what comes back into the app is not magic but a contract-bound structured payload
- the model-facing logic stays easier to replace, observe, and iterate on
What You Are Actually Building
I do not think about this as "AI inside the app."
It is a different construction.
It is a narrow integration boundary:
- the app accepts an image
- n8n runs the workflow
- the app receives normalized output and does its normal work
That is it.
Put more bluntly, the app should remain an app, not turn into a nervous half-erased orchestrator for external APIs.
The Basic Split
The split is simple:
- the application owns transport, auth, validation, and business state
- n8n owns the model-facing orchestration
That means the application does not need to know:
- which model is currently in use
- how prompts are written
- how retries are configured
- how the image is preprocessed
- whether the provider changes next month
In practice, this usually leaves the app with two ordinary endpoints:
- one accepts the image and starts the workflow
- one accepts normalized output and performs normal business work
And honestly, that is a good sign. When a system does not start growing extra limbs the moment you add AI, things are usually going in the right direction.
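To make the split concrete, here is a minimal sketch of those two ordinary endpoints as plain functions. All names, the webhook URL, and the payload shape are illustrative assumptions, not a fixed API; in a real app these would sit behind your web framework of choice.

```python
import uuid

# Assumed n8n entry point; in practice this is your workflow's webhook URL.
N8N_WEBHOOK_URL = "https://n8n.internal/webhook/recognize-image"

def start_recognition(image_ref: str, user_id: str) -> dict:
    """Endpoint 1: validate, assign a correlation ID, hand off to n8n."""
    request_id = str(uuid.uuid4())
    # A real implementation would POST this payload to N8N_WEBHOOK_URL;
    # here we only build the handoff so the boundary stays visible.
    handoff = {"request_id": request_id, "image_ref": image_ref, "user_id": user_id}
    return {"status": "accepted", "request_id": request_id, "handoff": handoff}

def receive_result(payload: dict) -> dict:
    """Endpoint 2: accept normalized output and do normal business work."""
    required = {"request_id", "entity_type", "fields"}
    if not required.issubset(payload):
        return {"status": "rejected", "reason": "contract violation"}
    # ...apply domain logic, store audit trail...
    return {"status": "stored", "request_id": payload["request_id"]}
```

Note that neither function mentions a model, a prompt, or a provider; that is the whole point of the boundary.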
Architecture
Browser
│
▼
Your App
│
├── accepts image + user context
├── assigns request ID / idempotency key
▼
n8n
│
├── preprocess image
├── fetch domain context if needed
├── call the model
├── parse and normalize the result
├── run confidence / policy check
├── route to fallback or manual review if needed
└── send a signed callback to the app
▼
Your App
│
├── validate the contract
├── verify callback signature
├── apply domain logic
└── store audit trail
What matters to me here is not only where the model call happens.
What matters more is this:
- who owns state
- who validates output
- how retries live
- what exactly crosses the boundary back into the app
Because anyone can draw a happy path.
Then the real world shows up and starts hitting it in the head.
The Difference in One Table
| Aspect | AI in backend | AI orchestration via n8n |
|---|---|---|
| Prompts | live in application code | live in workflow logic |
| Retries / branching | mixed into backend behavior | live in the orchestration layer |
| Model switching | drags backend risk with it | changes at workflow level |
| Output shaping | another special code path | handled inside workflow transformations |
| Audit / execution visibility | needs to be built by hand | much of it is visible in the execution path |
| Business state | mixed with AI flow | stays in the app |
What the Application Should Actually Do
I generally like the app to stay boring in places like this.
Boring is a good sign here.
Endpoint 1: start the workflow
It should:
- accept the image
- assign a request ID or correlation ID
- validate size, MIME type, and coarse input constraints
- pass either the file or a storage reference into n8n
- pass user-scoped context if it is actually needed
- return either an immediate result or an accepted-processing response
It should not:
- build prompts
- call model SDKs directly
- parse model output
- implement retry policy for model failures
- carry vendor-specific tuning concerns
If the payload is heavy or sensitive, I would much rather pass a short-lived storage reference than drag raw image bytes through every hop. Because I can. And because it usually means less swearing later.
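The coarse input checks are the kind of boring code this endpoint should own. A small sketch, with assumed limits (a 10 MB cap, JPEG/PNG/WebP only); the numbers are illustrative, not a recommendation:

```python
# Illustrative limits; tune them to your product, not the other way around.
MAX_BYTES = 10 * 1024 * 1024
ALLOWED_MIME = {"image/jpeg", "image/png", "image/webp"}

def validate_upload(mime_type: str, size_bytes: int) -> tuple[bool, str]:
    """Coarse checks the app runs before anything touches n8n or a model."""
    if mime_type not in ALLOWED_MIME:
        return False, f"unsupported type: {mime_type}"
    if size_bytes > MAX_BYTES:
        return False, "payload too large"
    return True, "ok"
```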
Endpoint 2: accept normalized output
It should:
- accept structured JSON from n8n
- verify callback authenticity
- enforce idempotency on repeated delivery
- validate the payload against a stable contract
- create, update, or link entities
- record the result for audit and debugging
- return a normal application response
It should not:
- care which exact model produced the result
- care how the image was analyzed
- care how retries and fallbacks lived upstream
- treat AI payload as some kind of sacred business object
If tomorrow the same payload comes from a CSV import, a human form, or an internal service, this endpoint should behave the same way.
That is what a correct boundary looks like.
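Contract validation and idempotency are the two checks that make that boundary hold. A minimal sketch, where the required field names and the in-memory idempotency store are assumptions (a real app would use a persistent store):

```python
# Assumed contract fields; your actual contract will differ.
REQUIRED_FIELDS = {"request_id", "entity_type", "fields", "confidence"}

# Stand-in for a persistent idempotency store (e.g. a unique DB constraint).
_processed: set[str] = set()

def handle_callback(payload: dict) -> str:
    """Validate the contract, then process each request_id exactly once."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        return f"rejected: missing {sorted(missing)}"
    rid = payload["request_id"]
    if rid in _processed:
        return "duplicate: already processed"  # safe to acknowledge again
    _processed.add(rid)
    # ...create/update/link entities, record audit trail...
    return "stored"
```

Because repeated delivery is acknowledged rather than reprocessed, n8n can retry the callback freely without corrupting business state.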
Why I Like This Pattern
Because it puts responsibility where it belongs.
The app keeps:
- authentication
- permissions
- domain validation
- database writes
- auditability
- final business decisions
n8n keeps:
- model orchestration
- prompt engineering
- image preprocessing
- retry behavior
- workflow branching
- fallback behavior
- execution visibility
- vendor switching
I do not think of n8n here as "the place where prompts live."
That description is too thin.
For me it is the operational layer around the model call. The layer where all the surrounding mess lives without being smeared across backend code.
The Most Important Part: Domain Context
This is where it actually gets interesting.
Because the whole story is not "send an image to AI."
Any fool can do that.
The interesting part starts when, before the model call, you pull live domain context from the app into the workflow:
- category trees
- allowed statuses
- existing entities
- formatting rules
- language codes
- internal taxonomies
Then instead of asking:
"What category does this belong to?"
you ask:
"Choose the best option from this exact list of categories that already exists in the system."
At that point the answer starts moving from "well, sounds plausible" toward "this can actually be fed back into the machine."
That difference is not cosmetic.
Without context, AI usually gives you text.
With context, it sometimes starts giving you usable input.
That is a very different conversation.
A Realistic Flow
In practice, this usually looks like:
- the client uploads an image to the app
- the app validates the request and assigns a request ID
- the app passes the file or storage reference to n8n
- n8n normalizes the image
- n8n pulls domain context
- n8n sends image and context to the model
- n8n parses and normalizes the result
- n8n runs confidence or policy checks
- n8n either calls back or routes the case to manual review
- the app validates the callback and applies domain logic
For a huge number of real tasks, that is more than enough.
Why n8n Fits Here At All
Because this is mostly an orchestration problem, not a "write some more backend code" problem.
You need a layer that can:
- wait for external APIs
- branch on errors
- survive transient failures
- transform payloads
- insert a review step
- call back into the app
- show execution history
Yes, you can drag all of that into the application.
You can also write your own bus, your own retry engine, your own tracing layer, and your own wrapper around the model.
You can.
The only question is why that should be the default move when AI is one function in the system, not the meaning of the system.
Security Notes
This only works while trust boundaries do not turn into a circus.
Provider secrets should not live in the app
The model vendor key is better off in n8n, not in the frontend and not in every corner of the main application.
The workflow entry point needs protection
Do not expose a webhook that anybody can use to throw arbitrary images at your system. That is a bad idea not because it is morally naughty, but because it will hurt later and may also cost you money.
Callbacks need adult handling
Use signed callbacks, a trusted caller policy, or both. Callback authenticity should be part of the contract, not a matter of faith.
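One common way to do this is an HMAC over the raw callback body with a shared secret. A sketch using Python's standard library; the header name and secret handling are up to you, but the constant-time comparison is the part that matters:

```python
import hashlib
import hmac

def sign_callback(secret: bytes, body: bytes) -> str:
    """n8n side: compute an HMAC-SHA256 signature over the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_callback(secret: bytes, body: bytes, signature: str) -> bool:
    """App side: recompute and compare in constant time."""
    expected = sign_callback(secret, body)
    return hmac.compare_digest(expected, signature)
```

Sign the raw bytes before any JSON parsing, and compare with `hmac.compare_digest` rather than `==` so the check does not leak timing information.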
Do not log sensitive payloads carelessly
That includes:
- raw image bytes
- base64 payloads
- bearer tokens
- full model payloads with user data
This is exactly the kind of thing that comes back later carrying an axe.
What This Gives You
If you go this way, several very concrete things get easier:
- changing prompts without rewriting application logic
- testing workflow behavior separately from core app behavior
- adding retries, branching, and review steps without backend sprawl
- changing models with less backend coupling
- feeding more domain context later
- keeping a clearer execution path for debugging
What You Still Pay For
Nothing here is free.
You are adding:
- one more moving part
- one more network hop
- one more execution surface
- callback idempotency concerns
- audit and traceability work
- evaluation drift if the model starts behaving differently over time
If n8n goes down, the feature goes down with it.
If the callback contract is sloppy, the integration will start rotting.
If nobody watches executions, retries, and bad outputs, the workflow will quietly degrade without much warning.
That is an honest price.
I still prefer it to hiding the same complexity deeper in application code where it is harder to see and more annoying to change.
The Rule I Keep
If the application owns business state, let it keep owning business state.
If the workflow owns model calls, retries, validation gates, and payload cleanup, let it keep owning those too.
Do not mix the two just because "well, technically we can."
Technically, we can do lots of stupid things.
Final Take
I think the question "how do I avoid SDKs forever?" is slightly crooked to begin with.
The more useful question is:
"Where should the model-facing logic live so the rest of the system does not become fragile, muddy, and expensive to maintain?"
When image recognition is just one feature inside a bigger product, n8n acting as the orchestration layer is often a perfectly sane answer.