Direct model integration is not a crime.
But if image recognition in your product is not the whole point of the system and is just one feature, an unpleasant thing shows up very quickly: the backend starts doing work that is not really its job.
I am just describing my own case. Maybe it is not so important for you. But it matters to me.
This is not a holy war against SDKs.
SDKs are not the real issue here.
The real issue is where all the mess around the model should live so the application does not slowly start rotting from the inside.
Because one thing is when AI is the product. In that case, yes, build around it.
It is a different story when AI is just one function inside a larger system. A user uploads an image, and you want structured data out of it. A receipt. A document. A label. A business card. A book cover. It does not matter. The object changes clothes, but the mechanics stay the same.
That is where I like a simple boundary: the app owns auth, validation, and business state, while n8n owns the model-facing orchestration.
We Have Seen This Before
The first move is usually predictable.
You need image recognition, so you wire the model directly into the backend.
Sometimes that is fine. Really fine. No need to turn it into drama.
But then the usual human mess begins:
- prompts need tuning
- model responses need cleaning
- something needs normalization
- something needs retries
- something needs a fallback
- somebody wants provider switching
- and one day you realize the model is not returning data, it is returning a semi-liquid mistake
That is the point where the backend starts absorbing concerns that are not really application concerns.
At first it looks harmless.
Later it does not.
Where the Real Problem Is
The problem, as usual, is not where people like to look for it.
The problem is not whether the model can look at an image.
That is the easy part.
The real problem is where all the surrounding mechanics end up living.
Once prompts, branching, retries, output shaping, provider-specific details, and callback behavior get fused into backend code, even a small AI change starts behaving like an application release.
Sometimes not even a small one.
For a feature that is only one part of a bigger system, that is a bad trade.
In cases like this, I want a different split:
- business state stays inside the app
- everything model-facing moves into a layer that is easier to change, inspect, and repair
In practice, that layer is often n8n.
Short Version
- the application owns auth, validation, state, and the final business decision
- n8n handles preprocessing, model calls, branching, retries, and transformation
- what comes back into the app is not magic but a contract-bound structured payload
- the model-facing logic stays easier to replace, observe, and iterate on
What You Are Actually Building
I do not think about this as "AI inside the app."
It is a different construction.
It is a narrow integration boundary:
- the app accepts an image
- n8n runs the workflow
- the app receives normalized output and does its normal work
That is it.
Put more bluntly, the app should remain an app, not turn into a nervous half-erased orchestrator for external APIs.
The Basic Split
The split is simple:
- the application owns transport, auth, validation, and business state
- n8n owns the model-facing orchestration
That means the application does not need to know:
- which model is currently in use
- how prompts are written
- how retries are configured
- how the image is preprocessed
- whether the provider changes next month
In practice, this usually leaves the app with two ordinary endpoints:
- one accepts the image and starts the workflow
- one accepts normalized output and performs normal business work
And honestly, that is a good sign. When a system does not start growing extra limbs the moment you add AI, things are usually going in the right direction.
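To make the split concrete, here is a minimal sketch of those two ordinary endpoints as plain functions. All names, the webhook URL, and the payload shape are illustrative assumptions, not a fixed API; in a real app these would sit behind your web framework of choice.

```python
import uuid

# Assumed n8n entry point; in practice this is your workflow's webhook URL.
N8N_WEBHOOK_URL = "https://n8n.internal/webhook/recognize-image"

def start_recognition(image_ref: str, user_id: str) -> dict:
    """Endpoint 1: validate, assign a correlation ID, hand off to n8n."""
    request_id = str(uuid.uuid4())
    # A real implementation would POST this payload to N8N_WEBHOOK_URL;
    # here we only build the handoff so the boundary stays visible.
    handoff = {"request_id": request_id, "image_ref": image_ref, "user_id": user_id}
    return {"status": "accepted", "request_id": request_id, "handoff": handoff}

def receive_result(payload: dict) -> dict:
    """Endpoint 2: accept normalized output and do normal business work."""
    required = {"request_id", "entity_type", "fields"}
    if not required.issubset(payload):
        return {"status": "rejected", "reason": "contract violation"}
    # ...apply domain logic, store audit trail...
    return {"status": "stored", "request_id": payload["request_id"]}
```

Note that neither function mentions a model, a prompt, or a provider; that is the whole point of the boundary.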
Architecture
Browser
│
▼
Your App
│
├── accepts image + user context
├── assigns request ID / idempotency key
▼
n8n
│
├── preprocess image
├── fetch domain context if needed
├── call the model
├── parse and normalize the result
├── run confidence / policy check
├── route to fallback or manual review if needed
└── send a signed callback to the app
▼
Your App
│
├── validate the contract
├── verify callback signature
├── apply domain logic
└── store audit trail
What matters to me here is not only where the model call happens.
What matters more is this:
- who owns state
- who validates output
- how retries live
- what exactly crosses the boundary back into the app
Because anyone can draw a happy path.
Then the real world shows up and starts hitting it in the head.
The Difference in One Table
| Aspect | AI in backend | AI orchestration via n8n |
|---|---|---|
| Prompts | live in application code | live in workflow logic |
| Retries / branching | mixed into backend behavior | live in the orchestration layer |
| Model switching | drags backend risk with it | changes at workflow level |
| Output shaping | another special code path | handled inside workflow transformations |
| Audit / execution visibility | needs to be built by hand | much of it is visible in the execution path |
| Business state | mixed with AI flow | stays in the app |
What the Application Should Actually Do
I generally like the app to stay boring in places like this.
Boring is a good sign here.
Endpoint 1: start the workflow
It should:
- accept the image
- assign a request ID or correlation ID
- validate size, MIME type, and coarse input constraints
- pass either the file or a storage reference into n8n
- pass user-scoped context if it is actually needed
- return either an immediate result or an accepted-processing response
It should not:
- build prompts
- call model SDKs directly
- parse model output
- implement retry policy for model failures
- carry vendor-specific tuning concerns
If the payload is heavy or sensitive, I would much rather pass a short-lived storage reference than drag raw image bytes through every hop. Because I can. And because it usually means less swearing later.
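The coarse input checks are the kind of boring code this endpoint should own. A small sketch, with assumed limits (a 10 MB cap, JPEG/PNG/WebP only); the numbers are illustrative, not a recommendation:

```python
# Illustrative limits; tune them to your product, not the other way around.
MAX_BYTES = 10 * 1024 * 1024
ALLOWED_MIME = {"image/jpeg", "image/png", "image/webp"}

def validate_upload(mime_type: str, size_bytes: int) -> tuple[bool, str]:
    """Coarse checks the app runs before anything touches n8n or a model."""
    if mime_type not in ALLOWED_MIME:
        return False, f"unsupported type: {mime_type}"
    if size_bytes > MAX_BYTES:
        return False, "payload too large"
    return True, "ok"
```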
Endpoint 2: accept normalized output
It should:
- accept structured JSON from n8n
- verify callback authenticity
- enforce idempotency on repeated delivery
- validate the payload against a stable contract
- create, update, or link entities
- record the result for audit and debugging
- return a normal application response
It should not:
- care which exact model produced the result
- care how the image was analyzed
- care how retries and fallbacks lived upstream
- treat AI payload as some kind of sacred business object
If tomorrow the same payload comes from a CSV import, a human form, or an internal service, this endpoint should behave the same way.
That is what a correct boundary looks like.
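Contract validation and idempotency are the two checks that make that boundary hold. A minimal sketch, where the required field names and the in-memory idempotency store are assumptions (a real app would use a persistent store):

```python
# Assumed contract fields; your actual contract will differ.
REQUIRED_FIELDS = {"request_id", "entity_type", "fields", "confidence"}

# Stand-in for a persistent idempotency store (e.g. a unique DB constraint).
_processed: set[str] = set()

def handle_callback(payload: dict) -> str:
    """Validate the contract, then process each request_id exactly once."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        return f"rejected: missing {sorted(missing)}"
    rid = payload["request_id"]
    if rid in _processed:
        return "duplicate: already processed"  # safe to acknowledge again
    _processed.add(rid)
    # ...create/update/link entities, record audit trail...
    return "stored"
```

Because repeated delivery is acknowledged rather than reprocessed, n8n can retry the callback freely without corrupting business state.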
Why I Like This Pattern
Because it puts responsibility where it belongs.
The app keeps:
- authentication
- permissions
- domain validation
- database writes
- auditability
- final business decisions
n8n keeps:
- model orchestration
- prompt engineering
- image preprocessing
- retry behavior
- workflow branching
- fallback behavior
- execution visibility
- vendor switching
I do not think of n8n here as "the place where prompts live."
That description is too thin.
For me it is the operational layer around the model call. The layer where all the surrounding mess lives without being smeared across backend code.
The Most Important Part: Domain Context
This is where it actually gets interesting.
Because the whole story is not "send an image to AI."
Any fool can do that.
The interesting part starts when, before the model call, you pull live domain context from the app into the workflow:
- category trees
- allowed statuses
- existing entities
- formatting rules
- language codes
- internal taxonomies
Then instead of asking:
"What category does this belong to?"
you ask:
"Choose the best option from this exact list of categories that already exists in the system."
At that point the answer starts moving from "well, sounds plausible" toward "this can actually be fed back into the machine."
That difference is not cosmetic.
Without context, AI usually gives you text.
With context, it sometimes starts giving you usable input.
That is a very different conversation.
A Realistic Flow
In practice, this usually looks like:
- the client uploads an image to the app
- the app validates the request and assigns a request ID
- the app passes the file or storage reference to n8n
- n8n normalizes the image
- n8n pulls domain context
- n8n sends image and context to the model
- n8n parses and normalizes the result
- n8n runs confidence or policy checks
- n8n either calls back or routes the case to manual review
- the app validates the callback and applies domain logic
For a huge number of real tasks, that is more than enough.
Why n8n Fits Here At All
Because this is mostly an orchestration problem, not a "write some more backend code" problem.
You need a layer that can:
- wait for external APIs
- branch on errors
- survive transient failures
- transform payloads
- insert a review step
- call back into the app
- show execution history
Yes, you can drag all of that into the application.
You can also write your own bus, your own retry engine, your own tracing layer, and your own wrapper around the model.
You can.
The only question is why that should be the default move when AI is one function in the system, not the meaning of the system.
Security Notes
This only works while trust boundaries do not turn into a circus.
Provider secrets should not live in the app
The model vendor key is better off in n8n, not in the frontend and not in every corner of the main application.
The workflow entry point needs protection
Do not expose a webhook that anybody can use to throw arbitrary images at your system. That is a bad idea not because it is morally naughty, but because it will hurt later and may also cost you money.
Callbacks need adult handling
Use signed callbacks, a trusted caller policy, or both. Callback authenticity should be part of the contract, not a matter of faith.
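One common way to do this is an HMAC over the raw callback body with a shared secret. A sketch using Python's standard library; the header name and secret handling are up to you, but the constant-time comparison is the part that matters:

```python
import hashlib
import hmac

def sign_callback(secret: bytes, body: bytes) -> str:
    """n8n side: compute an HMAC-SHA256 signature over the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_callback(secret: bytes, body: bytes, signature: str) -> bool:
    """App side: recompute and compare in constant time."""
    expected = sign_callback(secret, body)
    return hmac.compare_digest(expected, signature)
```

Sign the raw bytes before any JSON parsing, and compare with `hmac.compare_digest` rather than `==` so the check does not leak timing information.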
Do not log sensitive payloads carelessly
That includes:
- raw image bytes
- base64 payloads
- bearer tokens
- full model payloads with user data
This is exactly the kind of thing that comes back later carrying an axe.
What This Gives You
If you go this way, several very concrete things get easier:
- changing prompts without rewriting application logic
- testing workflow behavior separately from core app behavior
- adding retries, branching, and review steps without backend sprawl
- changing models with less backend coupling
- feeding more domain context later
- keeping a clearer execution path for debugging
What You Still Pay For
Nothing here is free.
You are adding:
- one more moving part
- one more network hop
- one more execution surface
- callback idempotency concerns
- audit and traceability work
- evaluation drift if the model starts behaving differently over time
If n8n goes down, the feature goes down with it.
If the callback contract is sloppy, the integration will start rotting.
If nobody watches executions, retries, and bad outputs, the workflow will quietly degrade without much warning.
That is an honest price.
I still prefer it to hiding the same complexity deeper in application code where it is harder to see and more annoying to change.
The Rule I Keep
If the application owns business state, let it keep owning business state.
If the workflow owns model calls, retries, validation gates, and payload cleanup, let it keep owning those too.
Do not mix the two just because "well, technically we can."
Technically, we can do lots of stupid things.
Final Take
I think the question "how do I avoid SDKs forever?" is slightly crooked to begin with.
The more useful question is:
"Where should the model-facing logic live so the rest of the system does not become fragile, muddy, and expensive to maintain?"
When image recognition is just one feature inside a bigger product, n8n acting as the orchestration layer is often a perfectly sane answer.