This is Part 4 of a series building a production-ready semantic search API with Java, Spring Boot, and pgvector.
Part 1 covered the architecture.
Part 2 defined the schema.
Part 3 handled the embeddings — how text becomes vectors.
Each piece worked in isolation.
But systems don't fail in isolation — they fail at the boundaries.
If you've ever built a feature that worked perfectly on its own but broke the moment you connected it to everything else — this article is about preventing that.
At this point, we have a schema that can store documents and an embedding layer that can generate vectors.
But nothing connects them. A document has nowhere to go. A query has no pipeline.
This is where the service layer comes in.
This is a production-style implementation — not a demo. The full project structure, tests, and configuration are available on GitHub.
What Does the Service Layer Actually Do?
The database stores state, but it doesn't understand it.
PENDING, READY, and FAILED only become meaningful once the service layer defines when those transitions happen and what triggers them.
When a document arrives, the service decides the order of operations — save first, embed second, update on success, record failure explicitly if something goes wrong.
Search follows the same pattern. A query doesn't go straight to the database. It's first converted into an embedding, then passed through a query that applies lifecycle constraints, metadata filters, and scoring thresholds.
The service layer controls that entire pipeline.
The service layer owns one thing: the rules that make the system predictable.
Without it, the system is just a collection of correct but disconnected components.
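Those rules can be made explicit in code. Here is a minimal sketch of the lifecycle as a Java enum; the statuses match the ones used throughout this series, but the allowed-transition table is an illustrative assumption, not code from the project:

```java
// DocumentStatus matches the statuses used throughout this series.
// The transition rules below are an illustrative assumption about
// what the service layer enforces.
public enum DocumentStatus {
    PENDING, READY, FAILED;

    // True if the service layer would allow this transition.
    public boolean canTransitionTo(DocumentStatus next) {
        switch (this) {
            case PENDING: return next == READY || next == FAILED; // embed succeeded / failed
            case READY:   return next == PENDING;                 // content updated, re-embed
            case FAILED:  return next == PENDING;                 // retry re-queues the document
            default:      return false;
        }
    }
}
```

Centralizing the transition table like this means there is exactly one place to read when you ask "can a FAILED document become READY directly?" (here: no, it must go back through PENDING).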
HTTP Request
     │
     ▼
Controller Layer   ← validates input, delegates to service
     │
     ▼
Service Layer      ← all decisions happen here
     │                        │
     ▼                        ▼
Repository Layer         Embedding Layer
(JPA + JdbcTemplate)     (EmbeddingClient interface)
     │                        │
     ▼                        ▼
PostgreSQL + pgvector    OpenAI API
The Interface That Keeps Everything Clean
The service layer exposes one interface to the rest of the application:
public interface DocumentService {
    CreateDocumentResponse create(CreateDocumentRequest request);
    DocumentResponse getById(Long id);
    SearchResponse search(SearchRequest request);
}
Controllers depend on the interface, not the implementation.
Defining the contract as an interface and hiding the implementation behind it is what makes the system testable and changeable without cascading updates across the codebase.
The more important detail is what does not cross this boundary.
The Document entity never does — by design. Controllers receive DTOs, not persistence objects.
That separation means the database schema and the API contract can evolve independently. The schema can change without breaking clients. The API can change without rewriting persistence logic.
Why this matters to you: If you've ever had a database change break your API — or an API change force a database rewrite — this boundary is what prevents that. Define it early and hold it firmly.
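For illustration, the DTOs on each side of that boundary can be plain records. The response fields (id, status) match the create() method shown later in this article; the request fields and the metadata map are my assumptions:

```java
import java.util.Map;

// Hypothetical DTO shapes for the service boundary. The response
// fields (id, status) match the article's create() method; the
// request fields and metadata map are assumptions.
public class Dtos {

    public enum DocumentStatus { PENDING, READY, FAILED }

    // What the controller hands to the service.
    public record CreateDocumentRequest(
            String title,
            String content,
            Map<String, String> metadata) {}

    // What the service hands back — never the Document entity itself.
    public record CreateDocumentResponse(Long id, DocumentStatus status) {}
}
```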
What Happens When Embedding Fails?
From the outside, creating a document looks simple. Send a document, get an ID back.
Inside the service, everything is built around one assumption: the second step might fail.
@Override
public CreateDocumentResponse create(CreateDocumentRequest request) {
    // Deliberately not @Transactional as a whole: saveAsPending commits
    // in its own transaction, so the PENDING row survives even if the
    // embedding step fails afterwards.
    Document saved = saveAsPending(request);
    embedAndPersist(
        saved.getId(),
        saved.getTitle(),
        saved.getContent()
    );
    return new CreateDocumentResponse(
        saved.getId(),
        DocumentStatus.READY
    );
}
Two calls, two distinct operations.
The first saves the document immediately with a status of PENDING.
The document exists in the database before any embedding call is made.
If the application crashes at this point, the document is already there with a recoverable state.
The second calls the OpenAI API, generates the embedding, and updates the document to READY.
If this step fails, the document moves to FAILED instead, and the error is stored directly in the database.
POST /documents
      │
      ▼
saveAsPending()
status = PENDING          ← document is safe in the database
      │
      ▼
embedAndPersist()
      │
   ┌──┴──────────────┐
   │                 │
   ▼                 ▼
status = READY    status = FAILED
searchable        error stored in DB
                  excluded from search
There's an alternative that looks simpler — embed first, then save.
It removes a step but removes visibility. If embedding fails in that model, the document never exists. There's no record, no state, nothing to debug.
By saving first, every attempt leaves a trace.
Failures don't disappear.
They become data.
This pattern — save first, embed second — is the difference between a failure you can debug and one that just disappears.
Here's how the failure handling actually works:
private void embedAndPersist(Long documentId, String title, String content) {
    try {
        float[] embedding = embeddingClient.embed(title + "\n\n" + content);
        int updated = jdbcTemplate.update(SQL_UPDATE_EMBEDDING,
                toPgVectorLiteral(embedding), documentId);
        if (updated != 1) {
            throw new IllegalStateException(
                    "Unexpected row count updating embedding for document id=" + documentId);
        }
    } catch (IllegalStateException e) {
        throw e;
    } catch (Exception e) {
        markFailed(documentId, e.getMessage());
        throw new RuntimeException("Embedding failed for document id=" + documentId, e);
    }
}
Three decisions here worth understanding:
Title and content are concatenated for embedding.
title + "\n\n" + content gives the model full context. A document titled "Payment Failure Handling Policy" with content about retry logic produces a richer embedding than the content alone.
IllegalStateException is re-thrown unchanged. If the update affects zero or more than one row, something is wrong with the database state — not the embedding call. That error should propagate as-is rather than being wrapped as an embedding failure.
Everything else triggers markFailed. Network timeouts, rate limits, malformed responses — any exception that isn't an IllegalStateException records the failure and re-throws. The caller sees the failure. The database gets a record of what went wrong.
Most API integration failures are silent. This makes them loud.
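markFailed itself is referenced above but not shown. Here is a plausible sketch, assuming a documents table with status and embedding_error columns; the SQL and the 1000-character cap are my assumptions, and the update call is abstracted behind a functional interface so the sketch compiles without Spring or a database:

```java
import java.util.function.BiConsumer;

// A plausible markFailed. The SQL, the embedding_error column, and the
// 1000-character cap are assumptions; in the real service the update
// parameter would be jdbcTemplate::update bound to SQL_MARK_FAILED.
public final class MarkFailedSketch {

    static final String SQL_MARK_FAILED =
        "UPDATE documents SET status = 'FAILED', embedding_error = ? WHERE id = ?";

    // Stand-in for jdbcTemplate.update(SQL_MARK_FAILED, message, id).
    static void markFailed(BiConsumer<String, Long> update, Long documentId, String error) {
        update.accept(truncate(error, 1000), documentId);
    }

    // Defensive cap so an oversized provider error message cannot
    // overflow the column.
    static String truncate(String s, int max) {
        if (s == null) return null;
        return s.length() <= max ? s : s.substring(0, max);
    }

    public static void main(String[] args) {
        markFailed(
            (msg, id) -> System.out.println(id + " -> FAILED: " + msg),
            42L,
            "429 Too Many Requests");
    }
}
```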
Search — The Pipeline That Ties Everything Together
Search is the most complex operation in the service. It touches the embedding layer, the repository, and the database — and it has to coordinate all three correctly.
What makes it manageable is not reducing that complexity, but containing it deliberately.
The orchestration method is deliberately small:
@Override
public SearchResponse search(SearchRequest request) {
    String qVector = embedQuery(request.getQuery());
    List<SearchResultItem> items = fetchResults(
            request,
            qVector
    );
    int total = countResults(
            qVector,
            request.getFilters(),
            request.getMinScore()
    );
    return new SearchResponse(
            request.getPage(),
            request.getSize(),
            total,
            items
    );
}
Four steps, the first three delegating to private methods with clear names.
The method reads like a description of the search process — embed the query, fetch the results, count the total, return the response.
The how is pushed down into methods that can be reasoned about in isolation.
private String embedQuery(String query) {
    return toPgVectorLiteral(embeddingClient.embed(query));
}
The query goes through the same embedding client used for documents.
That symmetry matters — the query and the stored documents exist in the same vector space. Without it, similarity search would be meaningless.
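toPgVectorLiteral appears throughout the service but isn't shown either. A plausible sketch, assuming pgvector's text-literal format (a string like [0.1,0.2,0.3] cast with ::vector); the exact formatting in the project may differ:

```java
// A plausible toPgVectorLiteral. pgvector parses text literals of the
// form "[x1,x2,...]" when cast with ::vector; precision handling here
// is an assumption.
public final class PgVectorLiterals {

    static String toPgVectorLiteral(float[] embedding) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(toPgVectorLiteral(new float[] {0.1f, 0.2f, 0.3f}));
        // prints [0.1,0.2,0.3]
    }
}
```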
The SQL is constructed in two layers: the inner query selects candidates and computes similarity, while the outer query applies score thresholds and pagination.
The split isn't stylistic. PostgreSQL cannot reference a SELECT alias in a WHERE clause at the same query level — which is why cosine_distance must be resolved in a subquery before the score threshold can filter on it.
SELECT * FROM (
    SELECT id, title, content, metadata,
           (embedding <=> ?::vector) AS cosine_distance
    FROM documents
    WHERE status = 'READY'
      AND embedding IS NOT NULL
      AND (metadata->>'category') = ?
) AS sub
WHERE (((1.0 - cosine_distance) + 1.0) / 2.0) >= ?
ORDER BY cosine_distance ASC
LIMIT ? OFFSET ?;
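The outer WHERE clause also encodes the score definition: cosine similarity is 1 minus cosine distance, and the score rescales similarity from [-1, 1] into [0, 1]. A worked check in Java:

```java
// The score math from the SQL above, spelled out. pgvector's <=>
// operator returns cosine distance; similarity and score derive from it.
public final class ScoreFormula {

    static double cosineSimilarity(double cosineDistance) {
        return 1.0 - cosineDistance;
    }

    static double score(double cosineDistance) {
        // Rescale similarity from [-1, 1] into [0, 1], matching the SQL:
        // (((1.0 - cosine_distance) + 1.0) / 2.0)
        return (cosineSimilarity(cosineDistance) + 1.0) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(score(0.0)); // identical direction -> 1.0
        System.out.println(score(1.0)); // orthogonal -> 0.5
        System.out.println(score(2.0)); // opposite -> 0.0
    }
}
```

A minScore of 0.7, for example, corresponds to a cosine distance of at most 0.6.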
If you've ever wondered why your JPA queries feel limiting for complex use cases — this is where you cross that line deliberately.
Why JPA Isn’t Enough for Vector Search
The search query isn't static.
Metadata filters, score thresholds, and pagination all change the SQL at runtime.
At that point the abstraction provided by JPA starts to break down — you're no longer mapping objects, you're constructing a query.
That's where QueryBuilder comes in:
private static class QueryBuilder {
    private final StringBuilder sql;
    private final List<Object> params = new ArrayList<>();

    QueryBuilder(String baseSql, String firstParam) {
        this.sql = new StringBuilder(baseSql);
        this.params.add(firstParam);
    }

    QueryBuilder(String baseSql, QueryBuilder source) {
        this.sql = new StringBuilder(baseSql);
        this.params.addAll(source.params);
    }
}
The two constructors mirror the structure of the query – inner and outer.
The first builds the inner query.
The second builds the outer query, inheriting parameters from the inner one without tracking them manually.
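To make that concrete, here is a runnable sketch of the two-constructor pattern with illustrative SQL fragments; the statements in the actual project are fuller than these:

```java
import java.util.ArrayList;
import java.util.List;

// A runnable sketch of the inner/outer constructor pattern. The SQL
// fragments are illustrative, not the article's full search statement.
public final class TwoLayerQueryDemo {

    static final class QueryBuilder {
        final StringBuilder sql;
        final List<Object> params = new ArrayList<>();

        QueryBuilder(String baseSql, String firstParam) {   // inner query
            this.sql = new StringBuilder(baseSql);
            this.params.add(firstParam);
        }

        QueryBuilder(String baseSql, QueryBuilder source) { // outer query
            this.sql = new StringBuilder(baseSql);
            this.params.addAll(source.params);              // inherit parameters in order
        }
    }

    public static void main(String[] args) {
        QueryBuilder inner = new QueryBuilder(
            "SELECT id, (embedding <=> ?::vector) AS cosine_distance FROM documents",
            "[0.1,0.2,0.3]");                               // query vector literal

        QueryBuilder outer = new QueryBuilder(
            "SELECT * FROM (" + inner.sql + ") AS sub WHERE "
                + "(((1.0 - cosine_distance) + 1.0) / 2.0) >= ?",
            inner);
        outer.params.add(0.7);                              // minScore

        System.out.println(outer.params);                   // [[0.1,0.2,0.3], 0.7]
    }
}
```

The key property: parameter order tracks SQL order automatically, because the outer builder copies the inner parameter list before any outer parameters are appended.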
Where injection risk actually lives:
void applyFilters(Map<String, String> filters) {
    if (filters == null || filters.isEmpty()) return;
    for (Map.Entry<String, String> entry : filters.entrySet()) {
        String key = entry.getKey();
        if (key == null || !key.matches("^[a-zA-Z0-9_-]{1,64}$")) {
            throw new IllegalArgumentException("Invalid metadata filter key: " + key);
        }
        sql.append(" AND (metadata->>'").append(key).append("') = ?\n");
        params.add(entry.getValue());
    }
}
The filter key is appended directly into the SQL string. SQL doesn't allow placeholders for column names or JSON path expressions — which means this is where injection risk enters the system.
The regex is not a convenience. It is the only control point between user input and the database.
^[a-zA-Z0-9_-]{1,64}$ — only alphanumeric characters, underscores, and hyphens.
Anything else is rejected before it reaches the database. Filter values, on the other hand, always go through JDBC parameters and are safe regardless of input.
This split — validated keys, parameterised values — is what makes the query both flexible and secure.
This is one of those cases where the 'boring' regex is doing serious security work. Don't skip it.
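A quick, standalone check of that regex against the kind of input it exists to reject:

```java
// The filter-key validation from applyFilters, isolated. The malicious
// key below is the sort of input that must never reach the SQL string.
public final class FilterKeyCheck {

    static boolean isValidKey(String key) {
        return key != null && key.matches("^[a-zA-Z0-9_-]{1,64}$");
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("category"));       // true
        System.out.println(isValidKey("user-id_2"));      // true
        System.out.println(isValidKey("a') OR ('1'='1")); // false: quotes, spaces, parens rejected
        System.out.println(isValidKey(""));               // false: at least one character required
    }
}
```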
Key validation handles injection risk. The other challenge in query construction is where to apply the score threshold.
Score filtering is applied on the outer query — not the inner one. cosine_distance is defined in the inner query's SELECT clause.
PostgreSQL cannot reference that alias in a WHERE clause at the same level. Wrapping it as a subquery makes it a real column in the outer scope — which is what allows minScore to work at all.
This is the point where you stop “using an ORM” and start designing queries deliberately.
Updating a Document Means Updating Its Embedding Too
Updating a document is not the same as updating a database row.
When content changes, the stored embedding becomes stale. A document about "payment retry logic" gets updated to "refund processing."
But the embedding still points toward payment retries. Searches for "refund policy" would miss it. Searches for "payment retries" would still find it — incorrectly.
The update operation handles this explicitly:
private void applyUpdates(Document doc, UpdateDocumentRequest request) {
    doc.setTitle(request.getTitle());
    doc.setContent(request.getContent());
    doc.setMetadata(request.getMetadata());
    doc.setStatus(DocumentStatus.PENDING);
    doc.setEmbeddingError(null);
    documentRepository.save(doc);
}
The moment content changes, the embedding becomes invalid.
The system makes that explicit by resetting the document to PENDING, removing it from search until a new embedding is generated.
This trades availability for correctness — a document disappearing briefly is preferable to returning incorrect results.
findOrThrow is called again after embedAndPersist so the response reflects the document's final state — including the updated status and embeddingUpdatedAt timestamp — not the state before the embedding ran.
This is easy to miss when you first build it. If a document update doesn't trigger a re-embed, your search results will silently drift out of sync with your content.
One Place for All Your Errors
Errors in this system fall into two categories — errors the caller caused and errors the system encountered.
Those two cases should not look the same.
A missing document returns a 404. Invalid input returns a 400. An embedding failure returns a 500.
What matters more than the distinction is consistency — every error, regardless of where it originates, returns the same shape:
{
    "code": "NOT_FOUND",
    "message": "Document not found: 42"
}
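The ErrorResponse behind that shape isn't shown in the article; a record is a natural fit, since the handler only ever constructs and serializes it:

```java
// A minimal ErrorResponse matching the JSON error shape above. Its
// exact definition in the project isn't shown; a record is assumed.
public record ErrorResponse(String code, String message) {}
```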
That consistency is enforced in one place — GlobalExceptionHandler.
@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(ResourceNotFoundException.class)
    public ResponseEntity<ErrorResponse> handleNotFound(
            ResourceNotFoundException ex
    ) {
        return ResponseEntity.status(404)
                .body(new ErrorResponse(
                        "NOT_FOUND",
                        ex.getMessage()
                ));
    }

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<ErrorResponse> handleValidation(
            MethodArgumentNotValidException ex
    ) {
        String message = ex.getBindingResult()
                .getFieldErrors()
                .stream()
                .map(e -> e.getField() + ": " + e.getDefaultMessage())
                .collect(Collectors.joining(", "));
        return ResponseEntity.status(400)
                .body(new ErrorResponse(
                        "VALIDATION_ERROR",
                        message
                ));
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<ErrorResponse> handleGeneral(
            Exception ex
    ) {
        return ResponseEntity.status(500)
                .body(new ErrorResponse(
                        "INTERNAL_ERROR",
                        "An unexpected error occurred"
                ));
    }
}
The @RestControllerAdvice annotation makes it active across all controllers without being wired into any of them.
The service layer throws exceptions. The handler translates them. The controllers never see error handling code.
A client that always receives code and message can handle all errors with one piece of logic.
A client that receives different shapes from different endpoints has to handle each one separately.
One handler, consistent responses everywhere — your frontend team will thank you.
How the Lifecycle Keeps Bad Data Out of Search
The document lifecycle isn't just about tracking failures. It's what keeps invalid data out of search results entirely.
Every search query filters on two conditions before any similarity calculation runs:
WHERE status = 'READY'
AND embedding IS NOT NULL
A PENDING document is excluded. A FAILED document is excluded.
This is where the schema design from Part 2 pays off — the composite index on (status, created_at DESC) exists specifically to support this filtering pattern.
Without it, every search scans the full table and discards non-ready documents. With it, PostgreSQL jumps directly to the relevant subset.
PENDING ──────────────────────────────┐
   │                                  │
   ▼                                  │
embedAndPersist()                     │
   │                                  │
┌──┴──────────────┐                   │
│                 │                   │
▼                 ▼                   ▼
READY          FAILED           not searchable
searchable     error in DB
               not searchable
The lifecycle isn't just about correctness. It's a performance optimization.
If you've ever had stale or incomplete data show up in search results with no explanation — a lifecycle model like this is what prevents it.
The System Now Works
With the service layer in place, the system finally behaves like a system.
A document arrives at POST /documents. The controller validates the request and delegates to the service.
The service saves the document as PENDING, calls the embedding client, and updates the status to READY.
The document is now stored with a valid embedding and visible to search.
A search query arrives at POST /search.
The service embeds the query, builds the SQL dynamically through QueryBuilder, applies filters and score thresholds, and returns ranked results with three score fields — cosineDistance, cosineSimilarity, and score.
Every layer has exactly one job. Every failure is visible. Every response has a consistent shape.
The system that began as an architecture in Part 1, a schema in Part 2, and an embedding layer in Part 3 is now a complete, working API.
What's Next
The service layer completes the system. Everything now works end to end.
But working systems still have flaws.
In the next article, I’ll step back from the implementation and break down what this system gets right, what it gets wrong, and what I would change if I were to build it again.
See you there.

Top comments (7)
“Systems don’t fail in isolation — they fail at the boundaries” is such a strong line.
The save-first-then-process pattern for embeddings makes a lot of sense too, especially for debugging and visibility.
Also liked how you kept the service layer focused on orchestration instead of leaking entities across layers.
Curious, have you ever seen this approach become a bottleneck as the system grows, or does it scale well with clear boundaries?
Thanks!
That line actually came from a debugging session where everything looked fine on its own, but broke the moment the pieces had to work together. That experience ended up shaping most of the design here.
On scaling, the save-first pattern has held up well so far since the write and embed steps are already decoupled. The next step is usually moving embedding off the request path entirely — letting a background worker or queue process PENDING documents asynchronously. The lifecycle model makes that transition pretty natural without changing the API.
Where I’ve seen some friction is around reporting or analytics. Strict boundaries mean you can’t just join across everything, so it adds a bit of overhead. In practice, that usually leads to introducing a separate read model rather than loosening the boundaries.
That makes a lot of sense, especially moving embedding off the request path. Feels like that’s the natural evolution once things start getting real traffic.
The read model point is interesting too. I’ve seen teams try to “cheat” around that by breaking boundaries for quick analytics, and it usually turns into a mess later.
Did you end up going with something like a separate reporting DB or event-driven sync for the read side, or still keeping it simple for now?
Yeah, that’s been my approach so far — keeping it simple on the read side and avoiding breaking boundaries just for convenience.
I’ve seen the same thing where teams “cheat” for quick analytics and it turns into a mess later.
If this were to grow, I’d probably lean toward an event-driven approach. The lifecycle transitions already give you natural hooks to build a read model without changing the core system.
Trying to keep the write path clean and let the read side evolve only when it’s actually needed.
That’s a solid approach.
The “natural hooks” point is key. Most systems already have the signals, people just ignore them and bolt on shortcuts later.
Event-driven fits well here, but only once there’s real pressure. Doing it too early just adds complexity without payoff.
Keeping the write path clean is probably the highest leverage decision. You can always evolve the read side, but untangling a messy write path is painful.
Curious, at what point would you actually introduce the event layer? When reads get slow, or when use cases start diverging?
For introducing an event layer, I’d lean toward use-case pressure over raw performance. If reads are just getting slower, you can usually get pretty far with indexing or query tuning. But once you start seeing different consumers needing different views of the same data — analytics, monitoring, downstream workflows — that’s when it starts to make sense.
The lifecycle transitions already give a clean signal (PENDING → READY → FAILED), so at that point it's less of a redesign and more of exposing what's already there.
Until then, I try to keep it simple and avoid adding infrastructure too early — I wrote a bit more about that tradeoff here.
Nice, that makes a lot of sense.
Treating events as something you reveal from lifecycle transitions, not design upfront, is the key. Most systems already have the signals, they just don’t use them.
Waiting for real use-case divergence also feels right. Performance issues can usually be handled without adding that complexity.
Curious, when you add it, would you keep events internal for read models or expose them externally as well?