Most semantic search tutorials treat embeddings as a single line of code — call the API, get a vector, store it.
In practice, this is the part of the system where the most subtle bugs live. Not the kind that throws exceptions, but the kind that silently produces wrong similarity scores, wrong rankings, and search results that look correct but feel off.
When I first built this service, I expected the difficult parts to be the database schema and the search query. Instead, most of the time went into the embedding layer. Small mistakes here don’t crash the application. They just make search behave strangely.
Three things make this layer trickier than it looks.
First, the API call is external. It can fail because of network issues, rate limits, or invalid requests, and the failure is not always obvious from the client side.
Second, the response parsing has silent failure modes. A wrong field name, a missing element, or a partially parsed response can still produce a vector — just not the right one.
Third, the normalization step is easy to get wrong, skip entirely, or apply twice. When that happens, similarity scores change even though the text hasn’t.
In Part 2, the schema was designed to store embeddings safely, track their lifecycle, and support retries when something goes wrong. Now we need to generate those embeddings correctly.
That responsibility lives entirely inside the embedding layer.
What the embedding layer is responsible for
Before looking at any implementation, it helps to define what the embedding layer is supposed to do — and just as importantly, what it is not supposed to do.
At a high level, the layer has one job: convert text into a vector that can be stored and compared in the database.
That sounds simple, but several steps are involved: sending the text to the API, validating the response, parsing the JSON, converting to a float array, and normalizing the vector before returning it.
Everything else belongs somewhere else.
The embedding layer does not know about the database.
It does not know about documents, metadata, or search queries.
Its only responsibility is converting text into a vector and returning it to whoever asked.
That boundary is what makes this layer testable, replaceable, and easy to reason about in isolation.
The service layer can call it without knowing what happens inside. The tests can mock it without spinning up an HTTP client. A different provider can be swapped in without touching anything outside this layer.
That boundary is captured by a single interface.
The EmbeddingClient interface
Before looking at the OpenAI implementation, the most important design decision in this layer is the interface.
public interface EmbeddingClient {
    float[] embed(String text);
}
This interface is intentionally small, but it defines the boundary for the entire embedding layer.
The service layer depends on this contract, not on any specific provider. As far as the rest of the application is concerned, embedding is simply a function that takes text and returns a vector.
How that vector is produced is an implementation detail.
One method, one responsibility.
The embedding layer should not expose HTTP details, JSON parsing, or model configuration.
All of that stays behind the implementation.
The return type is also a deliberate choice. The method returns a float[], not a List and not a custom wrapper type.
The database layer ultimately writes this value into a VECTOR column, and pgvector expects a primitive float array. Returning anything else would only introduce unnecessary conversion code between layers.
Depending on the interface rather than the implementation means the provider is swappable.
The class that implements this interface today is called OpenAiEmbeddingClient, but nothing in the service layer depends on that fact.
The same interface could later be backed by a local model, a different provider, or even a mock implementation for tests.
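As a sketch of what that swap could look like, here is a hypothetical test double (the class name and vector values are invented; the interface is repeated so the snippet is self-contained):

```java
// The interface from the article, repeated for self-containment.
interface EmbeddingClient {
    float[] embed(String text);
}

// Hypothetical test double: returns the same fixed vector for every input,
// with no HTTP call involved.
class FixedVectorEmbeddingClient implements EmbeddingClient {
    private final float[] vector;

    FixedVectorEmbeddingClient(float[] vector) {
        this.vector = vector.clone(); // defensive copy on the way in
    }

    @Override
    public float[] embed(String text) {
        return vector.clone(); // defensive copy on the way out
    }
}
```

Because the service layer only sees `EmbeddingClient`, a double like this drops in without any other change.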
Wiring the client with Spring
The client is registered as a Spring component and configured through constructor injection.
@Component
public class OpenAiEmbeddingClient implements EmbeddingClient {

    private final ObjectMapper mapper;
    private final String apiKey;
    private final String model;

    public OpenAiEmbeddingClient(
            ObjectMapper mapper,
            @Value("${openai.apiKey}") String apiKey,
            @Value("${openai.embeddingModel}") String model) {
        this.mapper = mapper;
        this.apiKey = apiKey;
        this.model = model;
    }
}
The values for the API key and model come from application configuration.
openai.apiKey=${OPENAI_API_KEY}
openai.embeddingModel=${OPENAI_EMBEDDING_MODEL:text-embedding-3-small}
Reading the API key from an environment variable is not just a convention; it is a requirement for any service that runs outside a local machine.
Hardcoding credentials in source code makes rotation difficult and leaks secrets into version control. Using ${OPENAI_API_KEY} allows the same code to run locally, in CI, and in production without changes.
The model name is also injected rather than hardcoded, but with a default value. The syntax ${OPENAI_EMBEDDING_MODEL:text-embedding-3-small} means the property is optional.
If no environment variable is provided, the client falls back to text-embedding-3-small. This makes local setup easier while still allowing the model to be changed without recompiling the application.
Constructor injection is used instead of field injection for a reason. All dependencies are provided when the object is created, and the fields can remain final.
This makes the class easier to test and prevents partially constructed instances. It also keeps the configuration visible at the entry point of the class instead of scattered across annotations.
At this point the embedding layer has a clear boundary and a concrete implementation. The remaining work is inside the client itself: building the HTTP request, validating the response, and turning the result into a normalized vector.
The full source code — including OpenAiEmbeddingClient, EmbeddingUtils, and all three Flyway migrations — is available on GitHub.
The embed() orchestration method
@Override
public float[] embed(String text) {
    try {
        HttpResponse<String> response = sendRequest(text != null ? text : "");
        validateResponse(response);
        return parseEmbedding(response.body());
    } catch (RuntimeException e) {
        throw e;
    } catch (Exception e) {
        throw new RuntimeException("Failed to get embedding from OpenAI", e);
    }
}
This method is intentionally small. It does not contain the implementation details of the HTTP call, response parsing, or normalization. Instead, it orchestrates the process by delegating each step to a private method.
Keeping the public method short makes the flow easy to read. The code describes what happens without showing how it happens: send the request, validate the response, parse the embedding.
The null guard at the entry point is intentional:
text != null ? text : ""
The embedding call should never fail because the caller passed a null value. Converting null to an empty string ensures the method always produces a result, even if the input is missing.
Handling this at the boundary keeps the rest of the code simpler because the private methods never need to check for null.
The exception handling follows the same idea of keeping the boundary clean. Runtime exceptions are rethrown unchanged, while checked exceptions are wrapped in a RuntimeException.
The caller never has to deal with checked exceptions coming from the embedding layer, and the service layer can treat embedding failures like any other runtime error.
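The same pattern can be shown in isolation. This is a hedged sketch, not code from the article; the Callable-based helper is invented purely to demonstrate the rethrow-or-wrap boundary:

```java
import java.util.concurrent.Callable;

// Checked exceptions are wrapped exactly once at the boundary; anything
// already unchecked passes through unchanged.
class BoundaryWrapping {
    static float[] callAtBoundary(Callable<float[]> call) {
        try {
            return call.call();
        } catch (RuntimeException e) {
            throw e; // already unchecked, rethrow as-is
        } catch (Exception e) {
            throw new RuntimeException("Failed to get embedding", e);
        }
    }
}
```

Callers see a single exception type either way, and the original checked exception survives as the cause for debugging.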
Building the HTTP request
private HttpResponse<String> sendRequest(String text) throws Exception {
    String body = mapper.writeValueAsString(
            Map.of("model", model, "input", text)
    );

    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(OPENAI_EMBEDDINGS_URL))
            .header("Authorization", "Bearer " + apiKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
            .build();

    return httpClient.send(request,
            HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8)
    );
}
Several small decisions in this method prevent bugs that are difficult to trace later.
The API URL is stored in a constant at the top of the class instead of being written inline.
private static final String OPENAI_EMBEDDINGS_URL =
        "https://api.openai.com/v1/embeddings";
Defining the URL once makes it visible and easy to verify. A single missing character — embedding instead of embeddings — produces a 404, and the error body OpenAI returns for an unknown endpoint does little to point you at the typo.
The request body is built using Jackson instead of concatenating strings.
mapper.writeValueAsString(
        Map.of("model", model, "input", text)
);
Manually building JSON is fragile. A missing quote, an extra comma, or an unescaped character in the input text can produce a request that looks correct in code but fails at runtime.
Using the ObjectMapper guarantees that the JSON is valid every time.
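The failure mode is easy to reproduce without any library. The naiveBody helper below is hypothetical and shows only the broken approach; ObjectMapper escapes the inner quotes correctly:

```java
// What string concatenation does wrong: a quote inside the input text
// lands in the output unescaped, producing invalid JSON.
class NaiveJsonDemo {
    static String naiveBody(String model, String input) {
        return "{\"model\":\"" + model + "\",\"input\":\"" + input + "\"}";
    }
}
```

For ordinary inputs the naive version looks identical to the Jackson output, which is exactly why the bug is easy to ship.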
The request explicitly uses UTF-8 when writing the body.
HttpRequest.BodyPublishers.ofString(
        body,
        StandardCharsets.UTF_8
)
Relying on the platform default charset can lead to different behaviour between local development and production.
Specifying UTF-8 ensures the request is encoded the same way in every environment.
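A minimal stdlib-only sketch shows why: the same string encodes to different byte sequences under different charsets, so a body built with the platform default is not portable. (The helper names here are illustrative.)

```java
import java.nio.charset.StandardCharsets;

// The same text yields different byte counts under different charsets:
// UTF-8 encodes "é" as two bytes, ISO-8859-1 as one.
class CharsetDemo {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }
}
```

If the Content-Length and the server's decoding assume UTF-8 but the body was encoded with a different default, non-ASCII input text is silently corrupted.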
The method returns the raw HTTP response instead of parsing it immediately. This keeps responsibilities separate. The request method only sends the request. Validation and parsing happen in the next steps.
Validating and parsing the response
Not every API response is a success. Before anything is parsed, the response status needs to be verified, and the parsing itself has subtle failure modes worth understanding.
A typical response from the embeddings API looks like this:
{
  "data": [
    {
      "embedding": [0.023, -0.181, 0.442, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
The first step is to verify that the request actually succeeded.
private void validateResponse(HttpResponse<String> response) {
    if (response.statusCode() / 100 != 2) {
        throw new RuntimeException(
                "OpenAI embeddings failed: HTTP "
                        + response.statusCode()
                        + " body=" + response.body()
        );
    }
}
Instead of checking for a single status code, the method verifies that the response is in the 2xx range.
response.statusCode() / 100 != 2
Integer division keeps only the hundreds digit, so this condition catches any non-2xx response with one comparison.
This includes rate limits, server errors, and invalid requests, all of which should stop the embedding process immediately.
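The arithmetic behind that condition can be checked in isolation. A minimal sketch, with a hypothetical isSuccess helper wrapping the same expression:

```java
// The same check as in validateResponse: integer division drops everything
// below the hundreds digit, so one comparison covers 200-299.
class StatusCheck {
    static boolean isSuccess(int statusCode) {
        return statusCode / 100 == 2;
    }
}
```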
Once the response is known to be valid, the next step is to extract the vector.
private float[] parseEmbedding(String responseBody) throws Exception {
    JsonNode embedding = mapper.readTree(responseBody)
            .path("data")
            .get(0)
            .path("embedding");

    float[] out = new float[embedding.size()];
    for (int i = 0; i < embedding.size(); i++) {
        out[i] = (float) embedding.get(i).asDouble();
    }
    return EmbeddingUtils.l2Normalized(out);
}
The parsing code uses path() instead of get() for the named lookups — and the difference matters.
path() returns a MissingNode when a field does not exist, while get() would return null and turn the next chained call into a NullPointerException.
The one get(0) that remains is the index lookup into the data array; if that array is missing or empty, parsing fails immediately rather than returning a wrong vector.
The values are read as doubles and then cast to float.
(float) embedding.get(i).asDouble()
Jackson parses JSON numbers as double by default. Reading each value with asDouble() keeps full precision through parsing, and the cast to float happens only at the last step, matching the type pgvector expects.
The vector is not returned directly after parsing — it passes through one more step first.
L2 normalization: what it is and why it matters
Normalization is the final step before the vector is returned.
public static float[] l2Normalized(float[] v) {
    double sumOfSquares = 0.0;
    for (float f : v) sumOfSquares += (double) f * f;

    double norm = Math.sqrt(sumOfSquares);
    if (norm == 0.0) return v.clone();

    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
        out[i] = (float) (v[i] / norm);
    }
    return out;
}
In geometric terms, this moves every vector onto the surface of a unit sphere.
After normalization, the magnitude of the vector no longer depends on the length of the input text, only on its direction in the embedding space.
This matters because similarity search uses cosine distance.
Cosine similarity compares the angle between two vectors, not their length. If vectors are not normalized, longer vectors can produce larger dot products even when the meaning is not closer.
Without normalization, two documents about the same topic but different lengths can score differently against the same query. Not because one is more relevant, but because one is longer.
Normalization removes this length bias and makes similarity depend only on semantic direction.
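A small numeric example makes the bias concrete. The sketch below reuses the l2Normalized method from the article with two made-up vectors that point in the same direction but differ in magnitude:

```java
// Demonstration of the length bias removed by normalization.
// l2Normalized is copied from the article; the vectors are invented.
class LengthBiasDemo {

    static float[] l2Normalized(float[] v) {
        double sumOfSquares = 0.0;
        for (float f : v) sumOfSquares += (double) f * f;
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) return v.clone();
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (double) a[i] * b[i];
        return sum;
    }
}
```

Raw dot products reward the longer vector even though both point the same way; after normalization, both score identically against any query.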
The method also handles the edge case where the vector length is zero.
if (norm == 0.0) {
    return v.clone();
}
Returning a clone instead of the original array prevents the caller from accidentally mutating the input.
Recent embedding models already return normalized vectors, including text-embedding-3-small. The explicit normalization here is defensive.
It guarantees correct behaviour even if the model changes later, and it documents the assumption directly in code instead of relying on external behaviour.
Why the embedding layer is behind an interface
When the interface was introduced in an earlier section, the implementation behind it was simple. Now that the full implementation is visible — HTTP requests, response validation, parsing, normalization — the value of keeping all of that behind a single method becomes clearer.
A mock implementation can return a fixed vector without making an HTTP call, which allows the service layer to be tested without depending on the external API.
This separation may look unnecessary when the system is small, but it becomes important as soon as the embedding logic grows.
The client now handles HTTP requests, response validation, parsing, and normalization. Keeping all of that behind a single method prevents those details from leaking into the rest of the application.
What's Next
Part 4 moves up one level to the service layer — where everything built so far is orchestrated into a complete API.
See you in Part 4!