Ozioma Ochin

Posted on • Originally published at ozi.hashnode.dev

The Service Layer: Where Separate Components Become a System

This is Part 4 of a series building a production-ready semantic search API with Java, Spring Boot, and pgvector.

Part 1 covered the architecture.

Part 2 defined the schema.

Part 3 handled the embeddings — how text becomes vectors.

Each piece worked in isolation.

But systems don't fail in isolation — they fail at the boundaries.

If you've ever built a feature that worked perfectly on its own but broke the moment you connected it to everything else — this article is about preventing that.

At this point, we have a schema that can store documents and an embedding layer that can generate vectors.

But nothing connects them. A document has nowhere to go. A query has no pipeline.

This is where the service layer comes in.

This is a production-style implementation — not a demo. The full project structure, tests, and configuration are available on GitHub.

What Does the Service Layer Actually Do?

The database stores state, but it doesn't understand it.

PENDING, READY, and FAILED only become meaningful once the service layer defines when those transitions happen and what triggers them.

When a document arrives, the service decides the order of operations — save first, embed second, update on success, record failure explicitly if something goes wrong.

Search follows the same pattern. A query doesn't go straight to the database. It's first converted into an embedding, then passed through a query that applies lifecycle constraints, metadata filters, and scoring thresholds.

The service layer controls that entire pipeline.

The service layer owns one thing: the rules that make the system predictable.

Without it, the system is just a collection of correct but disconnected components.

HTTP Request
     │
     ▼
Controller Layer       ← validates input, delegates to service
     │
     ▼
Service Layer          ← all decisions happen here
     │                    │
     ▼                    ▼
Repository Layer      Embedding Layer
(JPA + JdbcTemplate)  (EmbeddingClient interface)
     │                    │
     ▼                    ▼
PostgreSQL + pgvector  OpenAI API

The Interface That Keeps Everything Clean

The service layer exposes one interface to the rest of the application:

public interface DocumentService {
    CreateDocumentResponse create(CreateDocumentRequest request);
    DocumentResponse getById(Long id);
    SearchResponse search(SearchRequest request);
}

Controllers depend on the interface, not the implementation.

Defining the contract as an interface and hiding the implementation behind it is what makes the system testable and changeable without cascading updates across the codebase.

The more important detail is what does not cross this boundary.

The Document entity never crosses this boundary — by design. Controllers receive DTOs, not persistence objects.

That separation means the database schema and the API contract can evolve independently. The schema can change without breaking clients. The API can change without rewriting persistence logic.

Why this matters to you: If you've ever had a database change break your API — or an API change force a database rewrite — this boundary is what prevents that. Define it early and hold it firmly.
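To make the boundary concrete, here is a sketch of what the DTOs crossing it might look like. The field names are assumptions; the article only shows the service interface.

```java
// Hypothetical DTO sketch: plain records that cross the service boundary,
// while the JPA Document entity stays behind it. Field names are assumed.
enum DocumentStatus { PENDING, READY, FAILED }

record CreateDocumentResponse(Long id, DocumentStatus status) {}
record DocumentResponse(Long id, String title, String content, DocumentStatus status) {}
```

Because these are plain records with no persistence annotations, the schema can gain, rename, or drop columns without any client noticing.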

What Happens When Embedding Fails?

From the outside, creating a document looks simple. Send a document, get an ID back.

Inside the service, everything is built around one assumption: the second step might fail.

@Override
public CreateDocumentResponse create(CreateDocumentRequest request) {

    // Deliberately not @Transactional: a single transaction around both steps
    // would roll back the PENDING insert (and the FAILED update) when
    // embedAndPersist throws, defeating the save-first design. saveAsPending
    // commits in its own transaction, so the document survives a failure here.
    Document saved = saveAsPending(request);

    embedAndPersist(
            saved.getId(),
            saved.getTitle(),
            saved.getContent()
    );

    return new CreateDocumentResponse(
            saved.getId(),
            DocumentStatus.READY
    );
}

Two lines, two distinct operations.

The first saves the document immediately with a status of PENDING.

The document exists in the database before any embedding call is made.

If the application crashes at this point, the document is already there with a recoverable state.

The second calls the OpenAI API, generates the embedding, and updates the document to READY.

If this step fails, the document moves to FAILED instead, and the error is stored directly in the database.

POST /documents
      │
      ▼
saveAsPending()
status = PENDING ← document is safe in the database
      │
      ▼
embedAndPersist()
      │
   ┌──┴──────────────┐
   │                 │
   ▼                 ▼
status = READY   status = FAILED
searchable       error stored in DB
                 excluded from search

There's an alternative that looks simpler — embed first, then save.

It removes a step but removes visibility. If embedding fails in that model, the document never exists. There's no record, no state, nothing to debug.

By saving first, every attempt leaves a trace.

Failures don't disappear.

They become data.

This pattern — save first, embed second — is the difference between a failure you can debug and one that just disappears.

Here's how the failure handling actually works:

private void embedAndPersist(Long documentId, String title, String content) {
    try {
        float[] embedding = embeddingClient.embed(title + "\n\n" + content);
        int updated = jdbcTemplate.update(SQL_UPDATE_EMBEDDING,
                toPgVectorLiteral(embedding), documentId);
        if (updated != 1) {
            throw new IllegalStateException(
                    "Unexpected row count updating embedding for document id=" + documentId);
        }
    } catch (IllegalStateException e) {
        throw e;
    } catch (Exception e) {
        markFailed(documentId, e.getMessage());
        throw new RuntimeException("Embedding failed for document id=" + documentId, e);
    }
}

Three decisions here worth understanding:

  1. Title and content are concatenated for embedding. title + "\n\n" + content gives the model full context. A document titled "Payment Failure Handling Policy" with content about retry logic produces a richer embedding than the content alone.

  2. IllegalStateException is re-thrown unchanged. If the update affects zero or more than one row, something is wrong with the database state — not the embedding call. That error should propagate as-is rather than being wrapped as an embedding failure.

  3. Everything else triggers markFailed. Network timeouts, rate limits, malformed responses — any exception that isn't an IllegalStateException records the failure and re-throws. The caller sees the failure. The database gets a record of what went wrong.

Most API integration failures are silent. This makes them loud.
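Decision (1) is small enough to pin down as a helper. `embeddingInput` is a name I'm assuming; the service inlines the expression:

```java
// The text handed to the embedding model: title and content joined by a
// blank line, so the model sees the document's full context.
static String embeddingInput(String title, String content) {
    return title + "\n\n" + content;
}
```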

Search — The Pipeline That Ties Everything Together

Search is the most complex operation in the service. It touches the embedding layer, the repository, and the database — and it has to coordinate all three correctly.

What makes it manageable is not reducing that complexity, but containing it deliberately.

The orchestration method is deliberately small:

@Override
public SearchResponse search(SearchRequest request) {

    String qVector = embedQuery(request.getQuery());

    List<SearchResultItem> items = fetchResults(
            request,
            qVector
    );

    int total = countResults(
            qVector,
            request.getFilters(),
            request.getMinScore()
    );

    return new SearchResponse(
            request.getPage(),
            request.getSize(),
            total,
            items
    );
}

Four lines. Each delegates to a private method with a clear name.

The method reads like a description of the search process — embed the query, fetch the results, count the total, return the response.

The how is pushed down into methods that can be reasoned about in isolation.

private String embedQuery(String query) {
    return toPgVectorLiteral(embeddingClient.embed(query));
}

The query goes through the same embedding client used for documents.

That symmetry matters — the query and the stored documents exist in the same vector space. Without it, similarity search would be meaningless.
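toPgVectorLiteral is referenced throughout but never shown in the article. A plausible sketch, assuming pgvector's bracketed text format `[x1,x2,...]`:

```java
// Convert a float[] into pgvector's text literal, e.g. [0.1,0.2,0.3].
// Sketch only; the actual project may format values differently.
static String toPgVectorLiteral(float[] v) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < v.length; i++) {
        if (i > 0) sb.append(',');
        sb.append(v[i]);
    }
    return sb.append(']').toString();
}
```

The resulting string is then bound as a JDBC parameter and cast with `?::vector` in the SQL.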

The SQL is constructed in two layers: the inner query selects candidates and computes similarity, while the outer query applies score thresholds and pagination.

The split isn't stylistic. PostgreSQL cannot reference a SELECT alias in a WHERE clause at the same query level — which is why cosine_distance must be resolved in a subquery before the score threshold can filter on it.

SELECT * FROM (
    SELECT id, title, content, metadata,
           (embedding <=> ?::vector) AS cosine_distance
    FROM documents
    WHERE status = 'READY'
      AND embedding IS NOT NULL
      AND (metadata->>'category') = ?
) AS sub
WHERE (((1.0 - cosine_distance) + 1.0) / 2.0) >= ?
ORDER BY cosine_distance ASC
LIMIT ? OFFSET ?;
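The score expression in that WHERE clause is easier to read as plain arithmetic. Cosine distance from `<=>` is 1 minus cosine similarity, and the final score rescales similarity from [-1, 1] into [0, 1]:

```java
// score = ((1 - cosine_distance) + 1) / 2, matching the SQL above.
static double scoreFromDistance(double cosineDistance) {
    double cosineSimilarity = 1.0 - cosineDistance; // back out similarity
    return (cosineSimilarity + 1.0) / 2.0;          // rescale to [0, 1]
}
```

So a perfect match (distance 0) scores 1.0, an orthogonal vector (distance 1) scores 0.5, and an opposite vector (distance 2) scores 0.0.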

If you've ever wondered why your JPA queries feel limiting for complex use cases — this is where you cross that line deliberately.

Why JPA Isn’t Enough for Vector Search

The search query isn't static.

Metadata filters, score thresholds, and pagination all change the SQL at runtime.

At that point the abstraction provided by JPA starts to break down — you're no longer mapping objects, you're constructing a query.

That's where QueryBuilder comes in:

private static class QueryBuilder {

   private final StringBuilder sql;
   private final List<Object> params = new ArrayList<>();

   QueryBuilder(String baseSql, String firstParam) {
       this.sql = new StringBuilder(baseSql);
       this.params.add(firstParam);
   }

   QueryBuilder(String baseSql, QueryBuilder source) {
       this.sql = new StringBuilder(baseSql);
       this.params.addAll(source.params);
   }
}

The two constructors mirror the structure of the query – inner and outer.

The first builds the inner query.

The second builds the outer query, inheriting parameters from the inner one without tracking them manually.

Where injection risk actually lives:

void applyFilters(Map<String, String> filters) {
   if (filters == null || filters.isEmpty()) return;

   for (Map.Entry<String, String> entry : filters.entrySet()) {
       String key = entry.getKey();

       if (key == null || !key.matches("^[a-zA-Z0-9_-]{1,64}$")) {
           throw new IllegalArgumentException("Invalid metadata filter key: " + key);
       }

       sql.append("  AND (metadata->>'").append(key).append("') = ?\n");
       params.add(entry.getValue());
   }
}

The filter key is appended directly into the SQL string. SQL doesn't allow placeholders for column names or JSON path expressions — which means this is where injection risk enters the system.

The regex is not a convenience. It is the only control point between user input and the database.

^[a-zA-Z0-9_-]{1,64}$ — only alphanumeric characters, underscores, and hyphens.

Anything else is rejected before it reaches the database. Filter values, on the other hand, always go through JDBC parameters and are safe regardless of input.

This split — validated keys, parameterised values — is what makes the query both flexible and secure.

This is one of those cases where the 'boring' regex is doing serious security work. Don't skip it.
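Pulled out of the builder, the control point is a one-liner, which makes it easy to unit test against hostile input:

```java
// A key is usable in the SQL string only if it matches the whitelist.
// Values never need this check; they always travel as JDBC parameters.
static boolean isValidFilterKey(String key) {
    return key != null && key.matches("^[a-zA-Z0-9_-]{1,64}$");
}
```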

Key validation handles injection risk. The other challenge in query construction is where to apply the score threshold.

Score filtering is applied on the outer query — not the inner one. cosine_distance is defined in the inner query's SELECT clause.

PostgreSQL cannot reference that alias in a WHERE clause at the same level. Wrapping it as a subquery makes it a real column in the outer scope — which is what allows minScore to work at all.

This is the point where you stop “using an ORM” and start designing queries deliberately.

Updating a Document Means Updating Its Embedding Too

Updating a document is not the same as updating a database row.

When content changes, the stored embedding becomes stale. A document about "payment retry logic" gets updated to "refund processing."

But the embedding still points toward payment retries. Searches for "refund policy" would miss it. Searches for "payment retries" would still find it — incorrectly.

The update operation handles this explicitly:

private void applyUpdates(Document doc, UpdateDocumentRequest request) {
    doc.setTitle(request.getTitle());
    doc.setContent(request.getContent());
    doc.setMetadata(request.getMetadata());
    doc.setStatus(DocumentStatus.PENDING);
    doc.setEmbeddingError(null);
    documentRepository.save(doc);
}

The moment content changes, the embedding becomes invalid.

The system makes that explicit by resetting the document to PENDING, removing it from search until a new embedding is generated.

This trades availability for correctness — a document disappearing briefly is preferable to returning incorrect results.

findOrThrow is called again after embedAndPersist so the response reflects the document's final state — including the updated status and embeddingUpdatedAt timestamp — not the state before the embedding ran.

This is easy to miss when you first build it. If a document update doesn't trigger a re-embed, your search results will silently drift out of sync with your content.

One Place for All Your Errors

Errors in this system fall into two categories — errors the caller caused and errors the system encountered.

Those two cases should not look the same.

A missing document returns a 404. Invalid input returns a 400. An embedding failure returns a 500.

What matters more than the distinction is consistency — every error, regardless of where it originates, returns the same shape:

{
  "code": "NOT_FOUND",
  "message": "Document not found: 42"
}
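The ErrorResponse type the handler below returns isn't defined in the article; assuming it simply mirrors that JSON shape, a record is enough:

```java
// Assumed shape of ErrorResponse: two fields, matching the JSON above.
record ErrorResponse(String code, String message) {}
```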

That consistency is enforced in one place — GlobalExceptionHandler.

@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(ResourceNotFoundException.class)
    public ResponseEntity<ErrorResponse> handleNotFound(
            ResourceNotFoundException ex
    ) {
        return ResponseEntity.status(404)
                .body(new ErrorResponse(
                        "NOT_FOUND",
                        ex.getMessage()
                ));
    }

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<ErrorResponse> handleValidation(
            MethodArgumentNotValidException ex
    ) {
        String message = ex.getBindingResult()
                .getFieldErrors()
                .stream()
                .map(e -> e.getField() + ": " + e.getDefaultMessage())
                .collect(Collectors.joining(", "));

        return ResponseEntity.status(400)
                .body(new ErrorResponse(
                        "VALIDATION_ERROR",
                        message
                ));
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<ErrorResponse> handleGeneral(
            Exception ex
    ) {
        return ResponseEntity.status(500)
                .body(new ErrorResponse(
                        "INTERNAL_ERROR",
                        "An unexpected error occurred"
                ));
    }
}

The @RestControllerAdvice annotation makes it active across all controllers without being wired into any of them.

The service layer throws exceptions. The handler translates them. The controllers never see error handling code.

A client that always receives code and message can handle all errors with one piece of logic.

A client that receives different shapes from different endpoints has to handle each one separately.

One handler, consistent responses everywhere — your frontend team will thank you.

How the Lifecycle Keeps Bad Data Out of Search

The document lifecycle isn't just about tracking failures. It's what keeps invalid data out of search results entirely.

Every search query filters on two conditions before any similarity calculation runs:


WHERE status = 'READY'
    AND embedding IS NOT NULL

A PENDING document is excluded. A FAILED document is excluded.

This is where the schema design from Part 2 pays off — the composite index on (status, created_at DESC) exists specifically to support this filtering pattern.

Without it, every search scans the full table and discards non-ready documents. With it, PostgreSQL jumps directly to the relevant subset.

PENDING ──────────────────────────────┐
   │                                  │
   ▼                                  │
embedAndPersist()                     │
   │                                  │
┌──┴──────────────┐                   │
│                 │                   │
▼                 ▼                   ▼
READY          FAILED            not searchable
searchable     error in DB
               not searchable


The lifecycle isn't just about correctness. It's a performance optimization.

If you've ever had stale or incomplete data show up in search results with no explanation — a lifecycle model like this is what prevents it.
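The diagram above can be captured as a small transition table. This is an illustrative sketch, not code from the project; the enum repeats the three statuses and the map encodes which moves the service allows:

```java
import java.util.Map;
import java.util.Set;

enum DocumentStatus { PENDING, READY, FAILED }

// Sketch of the lifecycle rules the service layer enforces implicitly.
final class Lifecycle {
    private static final Map<DocumentStatus, Set<DocumentStatus>> ALLOWED = Map.of(
            DocumentStatus.PENDING, Set.of(DocumentStatus.READY, DocumentStatus.FAILED),
            DocumentStatus.READY,   Set.of(DocumentStatus.PENDING), // re-embed on update
            DocumentStatus.FAILED,  Set.of(DocumentStatus.PENDING)  // retry
    );

    static boolean canTransition(DocumentStatus from, DocumentStatus to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }

    // Mirrors WHERE status = 'READY' in every search query.
    static boolean isSearchable(DocumentStatus status) {
        return status == DocumentStatus.READY;
    }
}
```

Note that READY can only move back to PENDING, never straight to FAILED: failure is something that happens to an embedding attempt, and attempts only run on PENDING documents.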

The System Now Works

With the service layer in place, the system finally behaves like a system.

A document arrives at POST /documents. The controller validates the request and delegates to the service.

The service saves the document as PENDING, calls the embedding client, and updates the status to READY.

The document is now stored with a valid embedding and visible to search.

Search

A search query arrives at POST /search.

The service embeds the query, builds the SQL dynamically through QueryBuilder, applies filters and score thresholds, and returns ranked results with three score fields — cosineDistance, cosineSimilarity, and score.

Every layer has exactly one job. Every failure is visible. Every response has a consistent shape.

The system that started as an architecture sketch in Part 1 is now a complete, working API.

What's Next

The service layer completes the system. Everything now works end to end.

But working systems still have flaws.

In the next article, I’ll step back from the implementation and break down what this system gets right, what it gets wrong, and what I would change if I were to build it again.

See you there.
