<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ozioma Ochin</title>
    <description>The latest articles on DEV Community by Ozioma Ochin (@oozioma).</description>
    <link>https://dev.to/oozioma</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3774459%2F958ff1df-b66f-43b0-802f-ae983de3abb9.png</url>
      <title>DEV Community: Ozioma Ochin</title>
      <link>https://dev.to/oozioma</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oozioma"/>
    <language>en</language>
    <item>
      <title>Semantic Search Is an Architecture Problem</title>
      <dc:creator>Ozioma Ochin</dc:creator>
      <pubDate>Sat, 11 Apr 2026 23:32:19 +0000</pubDate>
      <link>https://dev.to/oozioma/semantic-search-is-an-architecture-problem-5h8l</link>
      <guid>https://dev.to/oozioma/semantic-search-is-an-architecture-problem-5h8l</guid>
      <description>&lt;p&gt;Most semantic search systems don’t fail because of embeddings. They fail because of how the system is designed around them.&lt;/p&gt;

&lt;p&gt;When I started building a semantic search API with Spring Boot and pgvector, I expected the hard parts to be vector math and database configuration. Generating embeddings and computing similarity felt like the core of the problem.&lt;/p&gt;

&lt;p&gt;They weren’t.&lt;/p&gt;

&lt;p&gt;The system worked. Documents were stored, embeddings were generated, and search returned results that looked reasonable.&lt;/p&gt;

&lt;p&gt;But they weren’t reliable. Some queries felt off. Others looked correct but weren’t useful. Small changes in input produced disproportionately different results.&lt;/p&gt;

&lt;p&gt;Nothing was obviously broken — but the system wasn’t behaving in a way I could trust.&lt;/p&gt;

&lt;p&gt;Semantic search isn’t defined by how you generate embeddings. It’s defined by how your system structures data before embedding, enforces consistency during ingestion, and filters and ranks results after retrieval.&lt;/p&gt;

&lt;p&gt;In other words, semantic search is an architecture problem.&lt;/p&gt;

&lt;p&gt;This article breaks down five decisions that proved harder than expected while building a production-ready semantic search API in Java — and why they had more impact on search quality than the embedding model.&lt;/p&gt;

&lt;p&gt;The full source code is on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Each lesson is grounded in that implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1 — Embeddings Don’t Solve Retrieval. They Define Its Boundaries.
&lt;/h2&gt;

&lt;p&gt;The most common mistake in semantic search is treating embeddings as the solution to retrieval.&lt;/p&gt;

&lt;p&gt;Embeddings don’t solve retrieval; they define the space in which retrieval happens. That space is shaped by decisions made before the embedding call runs.&lt;/p&gt;

&lt;p&gt;In my implementation, embeddings are generated by combining the document’s title and content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddingClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n\n"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At a glance, this looks like a simple implementation detail. It isn’t — it’s an architectural decision.&lt;/p&gt;

&lt;p&gt;That single line determines how every document enters the vector space.&lt;/p&gt;

&lt;p&gt;It defines what context is preserved, what signals are amplified, and what noise is introduced.&lt;/p&gt;

&lt;p&gt;Once the embedding is created, those decisions are fixed — every query, similarity calculation, and ranking outcome depends on them.&lt;/p&gt;

&lt;p&gt;This is where the gap between similarity and usefulness begins.&lt;/p&gt;

&lt;p&gt;A document titled &lt;em&gt;“Payment Failure Handling Policy”&lt;/em&gt; produces a stronger, more useful embedding when the title is included — it anchors the content with meaningful context. But a document titled &lt;em&gt;“Notes”&lt;/em&gt; or &lt;em&gt;“Draft”&lt;/em&gt; does the opposite. The model encodes both the generic label and the actual content, pulling the vector in competing directions.&lt;/p&gt;

&lt;p&gt;The result is a system that returns technically similar results that aren’t useful.&lt;/p&gt;

&lt;p&gt;The inconsistency comes from the input itself. Embeddings are the output of your ingestion architecture. If that architecture is inconsistent, your search results will be too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd do differently:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would define the embedding input as a system-level contract before writing any embedding code.&lt;/p&gt;

&lt;p&gt;Not a guideline. A contract.&lt;/p&gt;

&lt;p&gt;Every document should enter the embedding pipeline in a consistent, explicitly defined format, regardless of who created it or when it was ingested. The structure — which fields are included, how they are ordered, and how they are separated — must be fixed and enforced at the ingestion boundary.&lt;/p&gt;

&lt;p&gt;In practice, this means formalizing a canonical input shape and rejecting or transforming anything that doesn’t conform. Titles like “Notes” or “Draft” should not be treated as meaningful context; they should be excluded or normalized before embedding so they don’t distort the representation.&lt;/p&gt;
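
&lt;p&gt;A minimal sketch of what such a contract could look like in Java. The &lt;code&gt;EmbeddingInput&lt;/code&gt; class and its list of generic titles are hypothetical and not taken from the repository; only the &lt;code&gt;"\n\n"&lt;/code&gt; separator mirrors the line shown earlier.&lt;/p&gt;

```java
import java.util.Locale;

/** Hypothetical helper: the single place where embedding input is assembled. */
final class EmbeddingInput {

    // Same separator as the original embed(title + "\n\n" + content) call.
    private static final String SEPARATOR = "\n\n";

    private EmbeddingInput() {}

    /** Illustrative check; a real list of placeholder titles would come from the corpus. */
    static boolean isGenericTitle(String title) {
        String normalized = title == null ? "" : title.strip().toLowerCase(Locale.ROOT);
        return switch (normalized) {
            case "", "notes", "draft", "untitled" -> true;
            default -> false;
        };
    }

    /** Builds the canonical input: meaningful title first, then content. */
    static String of(String title, String content) {
        if (content == null || content.isBlank()) {
            throw new IllegalArgumentException("content must not be blank");
        }
        String body = content.strip();
        // Generic titles are dropped so they do not dilute the content signal.
        return isGenericTitle(title) ? body : title.strip() + SEPARATOR + body;
    }
}
```

&lt;p&gt;With this in place, every ingestion path would call &lt;code&gt;EmbeddingInput.of(title, content)&lt;/code&gt; rather than concatenating strings itself, so the answer to “what did the model see?” lives in one method.&lt;/p&gt;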

&lt;p&gt;The benefit isn’t just cleaner vectors. It’s determinism.&lt;/p&gt;

&lt;p&gt;When search quality degrades, the first question is always: &lt;em&gt;what did the model actually see?&lt;/em&gt; Without a defined input contract, answering that requires tracing through every ingestion path and reconstructing the input post hoc. With a contract, the input is predictable, and the failure surface is narrow.&lt;/p&gt;

&lt;p&gt;Embedding quality is constrained by the structure of the input. Once vectors are generated, those decisions are fixed. Retrieval can only reflect what was decided upstream.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Embedding quality is bounded by the structure of the input, so consistency has to be enforced before the embedding call ever runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Lesson 2 — Search Quality Is Determined at Ingestion, Not at Query
&lt;/h2&gt;

&lt;p&gt;Most efforts to improve semantic search focus on the query layer — thresholds, ranking logic, or retrieval strategies.&lt;/p&gt;

&lt;p&gt;But search quality is determined long before a query is executed: by how data is structured, validated, and normalized at ingestion.&lt;/p&gt;

&lt;p&gt;In the initial version of this system, the schema was intentionally flexible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This made the system easy to use. It also allowed documents of different shapes to be embedded and retrieved using the same logic.&lt;/p&gt;

&lt;p&gt;Short notes and long-form documents were encoded into the same vector space without regard for scale or structure. Metadata varied between objects, strings, and primitives.&lt;/p&gt;

&lt;p&gt;The search layer operated correctly, but the data it operated on did not follow a consistent contract.&lt;/p&gt;

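&lt;p&gt;Similarity scores were mathematically valid, but not reliable: a score computed for a two-line note and one computed for a long-form document were hard to compare.&lt;/p&gt;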

&lt;p&gt;This is where semantic search stops being a retrieval problem and becomes a data modeling problem. The query can only work with what the system has already allowed into the index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd do differently:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would treat ingestion as a constrained interface, not a flexible one.&lt;/p&gt;

&lt;p&gt;Every field should be validated against an explicit structural contract before it is stored or embedded. Metadata, in particular, should be restricted to a consistent shape — a JSON object with defined keys — rather than accepting any valid JSON representation.&lt;/p&gt;

&lt;p&gt;That constraint belongs at the API boundary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@JsonValidator&lt;/span&gt; 
&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;JsonNode&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this system, that validation arrived late. Before it existed, metadata was accepted as any valid &lt;code&gt;JSON&lt;/code&gt; — including strings and primitives. The consequence was a &lt;a href="https://github.com/buenas/-semantic-search-service/blob/main/src/main/resources/db/migration/V3__normalize_metadata_jsonb.sql" rel="noopener noreferrer"&gt;V3&lt;/a&gt; Flyway migration to repair data already in the database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'raw'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;jsonb_typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A string stored as &lt;code&gt;"category=billing"&lt;/code&gt; instead of &lt;code&gt;{"category": "billing"}&lt;/code&gt; passes every &lt;code&gt;JSON&lt;/code&gt; validation check and fails silently at query time — invisible to every filter that expects key-based access. That migration should never have been necessary. The validator at the API boundary was the fix — the migration was the cost of not having it earlier.&lt;/p&gt;

&lt;p&gt;The boundary for enforcing correctness is the point of entry. By the time data reaches storage or embedding, it should already conform to the system’s expectations. Without that constraint, the system does not degrade gracefully — it drifts.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Retrieval quality is set at ingestion, not rescued at query time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Lesson 3 — Systems Break at Boundaries, Not Within Components
&lt;/h2&gt;

&lt;p&gt;Each component in a semantic search system can behave correctly in isolation. The database stores documents, the embedding layer generates vectors, and the search query returns results.&lt;/p&gt;

&lt;p&gt;Failures appear at the boundaries when those components operate as a single system.&lt;/p&gt;

&lt;p&gt;The initial version of the write path was straightforward:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;generate the embedding → persist the document&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the embedding call failed — due to a timeout, rate limit, or unexpected response — the document was never written.&lt;/p&gt;

&lt;p&gt;The system did not crash. It lost information.&lt;/p&gt;

&lt;p&gt;The current implementation reverses that flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Override&lt;/span&gt;
&lt;span class="nd"&gt;@Transactional&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CreateDocumentResponse&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CreateDocumentRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nc"&gt;Document&lt;/span&gt; &lt;span class="n"&gt;saved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;saveAsPending&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;embedAndPersist&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTitle&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContent&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;CreateDocumentResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="nc"&gt;DocumentStatus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;READY&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



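&lt;p&gt;The document is written first, with an explicit lifecycle state. Embedding becomes a second step, not a prerequisite for persistence. Because the method is &lt;code&gt;@Transactional&lt;/code&gt;, this only holds if a failed embedding call is caught and recorded rather than allowed to propagate: by default, an unchecked exception escaping the method rolls back the pending row as well.&lt;/p&gt;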

&lt;p&gt;If embedding fails, the system records that failure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'FAILED'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Failures become part of the system’s state. This is the difference between a pipeline that appears to work and one that can be understood under failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd do differently:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would design the document lifecycle as a first-class system model before implementing any write path. A document should move through explicit, enforceable states, not implicit transitions tied to method execution.&lt;/p&gt;

&lt;p&gt;In this system, those states are minimal but sufficient:&lt;/p&gt;

&lt;p&gt;PENDING → READY (embedding succeeded)&lt;br&gt;&lt;br&gt;
PENDING → FAILED (embedding failed, error stored)&lt;br&gt;&lt;br&gt;
FAILED → PENDING (manual retry or background job)&lt;/p&gt;
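
&lt;p&gt;One way to make those transitions enforceable rather than implicit is to encode them in the status type itself. The sketch below uses a hypothetical &lt;code&gt;LifecycleStatus&lt;/code&gt; enum, not the repository’s &lt;code&gt;DocumentStatus&lt;/code&gt;, and assumes &lt;code&gt;READY&lt;/code&gt; is terminal because re-embedding on update is outside the three transitions listed above.&lt;/p&gt;

```java
/** Hypothetical model of the three lifecycle states and their legal transitions. */
enum LifecycleStatus {
    PENDING, READY, FAILED;

    /** Returns true only for the three transitions listed above. */
    boolean canTransitionTo(LifecycleStatus next) {
        return switch (this) {
            case PENDING -> next == READY || next == FAILED;
            case FAILED -> next == PENDING; // manual retry or background job
            case READY -> false;            // assumption: terminal in this sketch
        };
    }

    /** Enforces the transition instead of trusting the caller. */
    LifecycleStatus transitionTo(LifecycleStatus next) {
        if (!canTransitionTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```

&lt;p&gt;A write path that goes through &lt;code&gt;transitionTo&lt;/code&gt; cannot mark a &lt;code&gt;FAILED&lt;/code&gt; document as &lt;code&gt;READY&lt;/code&gt; without first passing through &lt;code&gt;PENDING&lt;/code&gt; again.&lt;/p&gt;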

&lt;p&gt;What matters is not the number of states, but what they guarantee.&lt;/p&gt;

&lt;p&gt;A document should not be considered searchable unless its embedding has been written successfully. That constraint must be enforced at the query layer, not assumed at write time. Without it, partially processed data leaks into retrieval and produces inconsistent results or runtime failures.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Failure must be inspectable. Recording the error alongside the document makes it possible to answer what failed and why.&lt;/p&gt;

&lt;p&gt;The key mistake in the initial design was treating embedding as part of persistence. In practice, it is a separate stage with its own failure modes, latency characteristics, and operational risks. Collapsing those concerns into a single step removes the ability to reason about them.&lt;/p&gt;

&lt;p&gt;A system that models its intermediate states explicitly can tolerate failure without losing visibility. One that doesn’t will appear correct until it isn’t — and offer no explanation when it fails.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Systems fail at stage boundaries, so lifecycle state has to be modeled explicitly rather than inferred from method flow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Lesson 4 — Retrieval Is Easy. Ranking Is Where Systems Fail.
&lt;/h2&gt;

&lt;p&gt;Once embeddings are generated and queries return results, it’s tempting to consider the system complete.&lt;/p&gt;

&lt;p&gt;The retrieval layer works. A query is converted into a vector and similar documents are returned.&lt;/p&gt;

&lt;p&gt;Vector similarity measures proximity in a vector space. It does not measure whether a result is useful to the person searching. A system can return results that are semantically close and still fail the only test that matters: relevance.&lt;/p&gt;

&lt;p&gt;In this implementation, ranking does not exist as a separate component. It is embedded directly in the result-mapping path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;SearchResultItem&lt;/span&gt; &lt;span class="nf"&gt;mapToSearchResultItem&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ResultSet&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;rowNum&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;SQLException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getDouble&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"cosine_distance"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;cosSim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cosSim&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SearchResultItem&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getLong&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"title"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;readMetadataAsJsonNode&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt;&lt;span class="o"&gt;)),&lt;/span&gt;
            &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;cosSim&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The raw &lt;code&gt;cosine_distance&lt;/code&gt; returned by pgvector’s &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator ranges from 0 to 2. It is converted into &lt;code&gt;cosine_similarity&lt;/code&gt; (from -1 to 1) and then normalized into a &lt;code&gt;score&lt;/code&gt; bounded between 0 and 1 before the result leaves the system.&lt;/p&gt;

&lt;p&gt;This makes the scoring model explicit and traceable. Every result carries the values that determined its position.&lt;/p&gt;

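&lt;p&gt;But these values do not define relevance. They describe closeness in vector space, not whether a result is worth showing. The request does expose a cutoff, &lt;code&gt;minScore&lt;/code&gt;, validated to the same 0-to-1 range:&lt;br&gt;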
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@DecimalMin&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0.0"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@DecimalMax&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Double&lt;/span&gt; &lt;span class="n"&gt;minScore&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Set it too low, and results are technically related but not useful. Set it too high, and relevant results are excluded.&lt;/p&gt;
&lt;/blockquote&gt;
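
&lt;p&gt;It helps to see what a given threshold means in the underlying metric. The sketch below repeats the score arithmetic from the mapper and inverts it; the &lt;code&gt;ScoreMath&lt;/code&gt; class and its method names are illustrative, not part of the repository.&lt;/p&gt;

```java
/** Sketch of the score arithmetic; class and method names are illustrative. */
final class ScoreMath {

    private ScoreMath() {}

    /** Maps a cosine distance in [0, 2] to a score in [0, 1]. */
    static double scoreFromDistance(double cosineDistance) {
        double cosineSimilarity = 1.0 - cosineDistance;
        return (cosineSimilarity + 1.0) / 2.0;
    }

    /** Largest cosine distance that still satisfies score >= minScore. */
    static double maxDistanceFor(double minScore) {
        // score = 1 - distance / 2, so distance = 2 - 2 * score.
        return 2.0 - 2.0 * minScore;
    }
}
```

&lt;p&gt;Under this mapping, a &lt;code&gt;minScore&lt;/code&gt; of 0.80 admits cosine distances up to about 0.4 (cosine similarity of at least 0.6), while 0.65 admits distances up to about 0.7 (cosine similarity of at least 0.3).&lt;/p&gt;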

&lt;p&gt;At this point, semantic search stops being a retrieval problem and becomes a decision problem. The system is no longer asking what is similar — it is deciding what deserves to be shown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd do differently:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would define a relevance benchmark before implementing any ranking logic.&lt;/p&gt;

&lt;p&gt;Without a benchmark, ranking becomes reactive. Thresholds are adjusted based on observed outputs, but there is no stable way to determine whether those adjustments improve the system or simply change it.&lt;/p&gt;

&lt;p&gt;A benchmark makes that distinction measurable. Each change to the ranking logic can be evaluated against the same queries, making it possible to tune the system deliberately rather than iteratively guessing.&lt;/p&gt;

&lt;p&gt;A threshold of &lt;code&gt;0.65&lt;/code&gt; might return all twenty expected results but include fifteen irrelevant ones. A threshold of &lt;code&gt;0.80&lt;/code&gt; might eliminate the noise but miss six expected results.&lt;/p&gt;

&lt;p&gt;Without a benchmark, you cannot measure that tradeoff — only observe that results feel better or worse.&lt;/p&gt;
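
&lt;p&gt;With a labeled set of expected results per query, the tradeoff reduces to two numbers. A minimal sketch, assuming a hypothetical &lt;code&gt;ThresholdResult&lt;/code&gt; record that holds the counts for one query at one threshold:&lt;/p&gt;

```java
/** Hypothetical benchmark outcome for one query at one threshold. */
record ThresholdResult(int expectedReturned, int irrelevantReturned, int expectedTotal) {

    /** Share of returned results that were expected. */
    double precision() {
        int returned = expectedReturned + irrelevantReturned;
        return returned == 0 ? 0.0 : (double) expectedReturned / returned;
    }

    /** Share of expected results that were actually returned. */
    double recall() {
        return expectedTotal == 0 ? 0.0 : (double) expectedReturned / expectedTotal;
    }
}
```

&lt;p&gt;Plugging in the example above, and assuming the 0.80 case returns no irrelevant results: &lt;code&gt;new ThresholdResult(20, 15, 20)&lt;/code&gt; gives a recall of 1.0 at a precision of 20/35, roughly 0.57, while &lt;code&gt;new ThresholdResult(14, 0, 20)&lt;/code&gt; gives a precision of 1.0 at a recall of 0.7.&lt;/p&gt;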

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Similarity scores describe closeness; a benchmark is what makes ranking measurable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Lesson 5 — The ORM Stops Helping Earlier Than You Expect in Vector Search.
&lt;/h2&gt;

&lt;p&gt;The early version of the search layer expressed everything through Spring Data JPA. That worked for standard CRUD operations, and it felt natural to extend the same abstraction to search.&lt;/p&gt;

&lt;p&gt;JPA is built for entity lifecycle operations and relational query patterns that map cleanly to objects. Vector search does not fit that model — it depends on database-specific operators, explicit casting, and query shapes that change at runtime.&lt;/p&gt;

&lt;p&gt;In this system, JPA still has a clear role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;DocumentRepository&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;JpaRepository&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The search path is different. It uses raw SQL because the query depends on pgvector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;SQL_SEARCH_INNER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""
        SELECT
            id,
            title,
            content,
            metadata,
            (embedding &amp;lt;=&amp;gt; ?::vector) AS cosine_distance
        FROM documents
        WHERE status = 'READY'
          AND embedding IS NOT NULL
        """&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query changes shape depending on filters, score thresholds, and pagination. That logic is handled directly in the service layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;applyFilters&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Entry&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;entrySet&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getKey&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"^[a-zA-Z0-9_-]{1,64}$"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;IllegalArgumentException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"Invalid metadata filter key: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  AND (metadata-&amp;gt;&amp;gt;'"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"') = ?\n"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The filter key is appended directly into the SQL string rather than bound as a parameter. SQL does not allow parameterizing identifiers such as column names, and this query also inlines the JSON key as a quoted literal, so the regex is the only thing standing between user input and the database.&lt;/p&gt;
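
&lt;p&gt;As a rough illustration of that guard, the check and the clause construction can be pulled into one small helper. This is a minimal sketch, not the article's code: the class name &lt;code&gt;MetadataFilterKeys&lt;/code&gt; and its two methods are hypothetical, and only the regex and the clause shape come from the loop above.&lt;/p&gt;

```java
import java.util.regex.Pattern;

// Hypothetical helper: the name and methods are illustrative assumptions.
final class MetadataFilterKeys {

    // Same allowlist as the regex in the loop above, compiled once.
    private static final Pattern SAFE_KEY = Pattern.compile("^[a-zA-Z0-9_-]{1,64}$");

    private MetadataFilterKeys() {
    }

    // True only for keys made of letters, digits, '_' or '-', 1 to 64 chars long.
    static boolean isSafe(String key) {
        if (key == null) {
            return false;
        }
        return SAFE_KEY.matcher(key).matches();
    }

    // Builds the filter clause; the value itself is still bound through '?'.
    static String toClause(String key) {
        if (!isSafe(key)) {
            throw new IllegalArgumentException("Invalid metadata filter key: " + key);
        }
        return "  AND (metadata->>'" + key + "') = ?\n";
    }
}
```

&lt;p&gt;Compiling the pattern once avoids the recompilation that &lt;code&gt;String.matches&lt;/code&gt; performs on every call, and keeping the check next to the string concatenation makes it harder to add a new clause that skips validation.&lt;/p&gt;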

&lt;p&gt;The subquery structure is necessary — not stylistic. PostgreSQL cannot reference a &lt;code&gt;SELECT&lt;/code&gt; alias in the &lt;code&gt;WHERE&lt;/code&gt; clause of the same query level. &lt;code&gt;cosine_distance&lt;/code&gt; has to be resolved in a subquery before the score threshold can filter on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
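
&lt;p&gt;The outer &lt;code&gt;WHERE&lt;/code&gt; clause rescales the cosine distance returned by pgvector's &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator (which ranges from 0 to 2) into a score between 0 and 1. A small sketch of that arithmetic may make the threshold easier to reason about. The class and method names below are hypothetical; only the formula is taken from the query above.&lt;/p&gt;

```java
// Hypothetical helper mirroring the score expression in the outer WHERE clause.
final class SimilarityScore {

    private SimilarityScore() {
    }

    // ((1 - distance) + 1) / 2: distance 0 gives 1.0, 1 gives 0.5, 2 gives 0.0.
    static double fromCosineDistance(double cosineDistance) {
        double cosineSimilarity = 1.0 - cosineDistance;
        return (cosineSimilarity + 1.0) / 2.0;
    }

    // Inverse view: a minimum score s is the same constraint as
    // "distance is at most 2 - 2s", obtained by rearranging the formula.
    static double toMaxCosineDistance(double minScore) {
        return 2.0 - 2.0 * minScore;
    }
}
```

&lt;p&gt;For example, a minimum score of 0.75 corresponds to a cosine distance of at most 0.5.&lt;/p&gt;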



&lt;p&gt;&lt;strong&gt;What I'd do differently:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would define the persistence boundary earlier.&lt;/p&gt;

&lt;p&gt;The temptation is to keep everything inside one abstraction for consistency. &lt;/p&gt;

&lt;p&gt;In practice, that usually produces a worse result: repository interfaces with native queries, partial use of the ORM, and search logic split awkwardly across layers that were never designed to express it.&lt;/p&gt;

&lt;p&gt;The common workaround — &lt;code&gt;@NativeQuery&lt;/code&gt; annotations in the repository — produces the worst of both approaches: SQL strings embedded in JPA annotations, losing JPQL readability without gaining the flexibility of &lt;code&gt;JdbcTemplate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The better approach is to make the boundary explicit from the start. In this system, JPA owns entity lifecycle operations. The moment a query depends on vector operators, JSONB path access, runtime query construction, or database-specific casting, it moves into JDBC-backed SQL.&lt;/p&gt;

&lt;p&gt;This is not a compromise — it is a cleaner design.&lt;/p&gt;

&lt;p&gt;The ORM does not fail here. It reaches the edge of what it was built to model.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; ORMs are useful until query behavior becomes database-specific; after that, forcing consistency across abstractions usually makes the design worse.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What This System Actually Is
&lt;/h2&gt;

&lt;p&gt;At the beginning, I thought I was building a semantic search API.&lt;/p&gt;

&lt;p&gt;As the system took shape, it became clear that result quality had little to do with how embeddings were generated. It depended on how data was structured before embedding and how results were filtered and ranked afterward. The system was deciding what counted as a useful result.&lt;/p&gt;

&lt;p&gt;What I built wasn’t a feature behind an endpoint. It was a pipeline — one that transformed input, enforced constraints, and produced outcomes shaped by decisions at each stage.&lt;/p&gt;

&lt;p&gt;How documents were structured at ingestion determined what the embedding layer could represent. How failures were captured determined what could be diagnosed. The lifecycle model defined what was eligible for retrieval. The ranking logic determined what users actually saw.&lt;/p&gt;

&lt;p&gt;Search was the surface. Beneath it was a set of interdependent decisions that had to remain consistent.&lt;/p&gt;

&lt;p&gt;That’s what made it hard. Not the vectors. Not the SQL. The design decisions that had to hold together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/oozioma/series/37061"&gt;Semantic Search Systems Series&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>systemdesign</category>
      <category>machinelearning</category>
      <category>java</category>
      <category>backend</category>
    </item>
    <item>
      <title>The Service Layer: Where Separate Components Become a System</title>
      <dc:creator>Ozioma Ochin</dc:creator>
      <pubDate>Sun, 05 Apr 2026 20:24:51 +0000</pubDate>
      <link>https://dev.to/oozioma/the-service-layer-where-separate-components-become-a-system-4oeh</link>
      <guid>https://dev.to/oozioma/the-service-layer-where-separate-components-become-a-system-4oeh</guid>
      <description>&lt;p&gt;This is Part 4 of a series building a production-ready semantic search API with Java, Spring Boot, and pgvector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-1-architecture-58b9"&gt;Part 1&lt;/a&gt; covered the architecture. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-2-designing-the-postgresql-2jlb"&gt;Part 2&lt;/a&gt; defined the schema. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-3-the-embedding-layer-1pj0"&gt;Part 3&lt;/a&gt; handled the embeddings — how text becomes vectors.&lt;/p&gt;

&lt;p&gt;Each piece worked in isolation. &lt;/p&gt;

&lt;p&gt;But systems don't fail in isolation — they fail at the boundaries.&lt;/p&gt;

&lt;p&gt;If you've ever built a feature that worked perfectly on its own but broke the moment you connected it to everything else — this article is about preventing that.&lt;/p&gt;

&lt;p&gt;At this point, we have a schema that can store documents and an embedding layer that can generate vectors. &lt;/p&gt;

&lt;p&gt;But nothing connects them. A document has nowhere to go. A query has no pipeline.&lt;/p&gt;

&lt;p&gt;This is where the service layer comes in.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;This is a production-style implementation — not a demo. The full project structure, tests, and configuration are available on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  What Does the Service Layer Actually Do?
&lt;/h3&gt;

&lt;p&gt;The database stores state, but it doesn't understand it. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;PENDING&lt;/code&gt;, &lt;code&gt;READY&lt;/code&gt;, and &lt;code&gt;FAILED&lt;/code&gt; only become meaningful once the service layer defines when those transitions happen and what triggers them.&lt;/p&gt;

&lt;p&gt;When a document arrives, the service decides the order of operations — save first, embed second, update on success, record failure explicitly if something goes wrong.&lt;/p&gt;

&lt;p&gt;Search follows the same pattern. A query doesn't go straight to the database. It's first converted into an embedding, then passed through a query that applies lifecycle constraints, metadata filters, and scoring thresholds. &lt;/p&gt;

&lt;p&gt;The service layer controls that entire pipeline.&lt;/p&gt;

&lt;p&gt;The service layer owns one thing: &lt;em&gt;&lt;strong&gt;the rules that make the system predictable.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Without it, the system is just a collection of correct but disconnected components.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP Request
     │
     ▼
Controller Layer       ← validates input, delegates to service
     │
     ▼
Service Layer          ← all decisions happen here
     │                    │
     ▼                    ▼
Repository Layer      Embedding Layer
(JPA + JdbcTemplate)  (EmbeddingClient interface)
     │                    │
     ▼                    ▼
PostgreSQL + pgvector  OpenAI API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Interface That Keeps Everything Clean
&lt;/h3&gt;

&lt;p&gt;The service layer exposes one interface to the rest of the application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;DocumentService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;CreateDocumentResponse&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CreateDocumentRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;DocumentResponse&lt;/span&gt; &lt;span class="nf"&gt;getById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;SearchResponse&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SearchRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Controllers depend on the interface, not the implementation. &lt;/p&gt;

&lt;p&gt;Defining the contract as an interface and hiding the implementation behind it is what makes the system testable and changeable without cascading updates across the codebase.&lt;/p&gt;

&lt;p&gt;The more important detail is what does not cross this boundary.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Document&lt;/code&gt; entity never crosses it, by design. Controllers receive DTOs, not persistence objects. &lt;/p&gt;

&lt;p&gt;That separation means the database schema and the API contract can evolve independently. The schema can change without breaking clients. The API can change without rewriting persistence logic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Why this matters to you:&lt;/strong&gt; If you've ever had a database change break your API — or an API change force a database rewrite — this boundary is what prevents that. Define it early and hold it firmly.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
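
&lt;p&gt;One way to picture that boundary is a single mapping point between the entity and the response type. The sketch below is an assumption-heavy illustration: the fields on &lt;code&gt;Document&lt;/code&gt; and &lt;code&gt;DocumentResponse&lt;/code&gt; and the &lt;code&gt;from&lt;/code&gt; factory are hypothetical, since neither class is shown in this article.&lt;/p&gt;

```java
// Hypothetical stand-in for the persistence entity; the real fields may differ.
final class Document {
    final Long id;
    final String title;
    final String content;
    final float[] embedding; // persistence detail that should stay internal

    Document(Long id, String title, String content, float[] embedding) {
        this.id = id;
        this.title = title;
        this.content = content;
        this.embedding = embedding;
    }
}

// Hypothetical DTO shape: it exposes only what the API contract promises.
final class DocumentResponse {
    final Long id;
    final String title;
    final String content;

    DocumentResponse(Long id, String title, String content) {
        this.id = id;
        this.title = title;
        this.content = content;
    }

    // The one place where an entity is translated into the API contract.
    static DocumentResponse from(Document document) {
        return new DocumentResponse(document.id, document.title, document.content);
    }
}
```

&lt;p&gt;With a mapping like this, adding a column to the entity changes nothing for clients until the DTO is deliberately extended.&lt;/p&gt;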

&lt;h3&gt;
  
  
  What Happens When Embedding Fails?
&lt;/h3&gt;

&lt;p&gt;From the outside, creating a document looks simple. Send a document, get an ID back.&lt;/p&gt;

&lt;p&gt;Inside the service, everything is built around one assumption: the second step might fail.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Override&lt;/span&gt;
&lt;span class="nd"&gt;@Transactional&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CreateDocumentResponse&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CreateDocumentRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nc"&gt;Document&lt;/span&gt; &lt;span class="n"&gt;saved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;saveAsPending&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;embedAndPersist&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTitle&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContent&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;CreateDocumentResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="nc"&gt;DocumentStatus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;READY&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two calls, two distinct operations.&lt;/p&gt;

&lt;p&gt;The first saves the document immediately with a status of &lt;code&gt;PENDING&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;The document exists in the database before any embedding call is made. &lt;/p&gt;

&lt;p&gt;If the application crashes at this point, the document is already there with a recoverable state.&lt;/p&gt;

&lt;p&gt;The second calls the OpenAI API, generates the embedding, and updates the document to &lt;code&gt;READY&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;If this step fails, the document moves to &lt;code&gt;FAILED&lt;/code&gt; instead, and the error is stored directly in the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /documents
      │
      ▼
saveAsPending()
status = PENDING ← document is safe in the database
      │
      ▼
embedAndPersist()
      │
   ┌──┴──────────────┐
   │                 │
   ▼                 ▼
status = READY   status = FAILED
searchable       error stored in DB
                 excluded from search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
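
&lt;p&gt;The flow in the diagram can be sketched as a small in-memory state machine. This is a simplified illustration under stated assumptions: &lt;code&gt;StoredDocument&lt;/code&gt; and &lt;code&gt;IngestionSketch&lt;/code&gt; are hypothetical names, there is no database, and the sketch returns the document so its final state can be inspected rather than re-throwing as the real service does.&lt;/p&gt;

```java
enum DocumentStatus { PENDING, READY, FAILED }

// Mirrors the embedding interface named in the article; one method is assumed.
interface EmbeddingClient {
    float[] embed(String text);
}

// Hypothetical in-memory stand-in for a row in the documents table.
final class StoredDocument {
    final String title;
    final String content;
    DocumentStatus status = DocumentStatus.PENDING; // every document starts here
    float[] embedding;
    String errorMessage;

    StoredDocument(String title, String content) {
        this.title = title;
        this.content = content;
    }
}

final class IngestionSketch {

    private IngestionSketch() {
    }

    // Save first, embed second: the record exists before the risky call runs.
    static StoredDocument ingest(String title, String content, EmbeddingClient client) {
        StoredDocument doc = new StoredDocument(title, content);
        try {
            doc.embedding = client.embed(title + "\n\n" + content);
            doc.status = DocumentStatus.READY;
        } catch (RuntimeException e) {
            // The failure becomes data instead of disappearing.
            doc.status = DocumentStatus.FAILED;
            doc.errorMessage = e.getMessage();
        }
        return doc;
    }
}
```

&lt;p&gt;Either branch leaves a record behind: a &lt;code&gt;READY&lt;/code&gt; document with an embedding, or a &lt;code&gt;FAILED&lt;/code&gt; one with the error message attached.&lt;/p&gt;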



&lt;p&gt;There's an alternative that looks simpler — embed first, then save. &lt;/p&gt;

&lt;p&gt;It removes a step, but it also removes visibility. If embedding fails in that model, the document never exists. There's no record, no state, nothing to debug. &lt;/p&gt;

&lt;p&gt;By saving first, every attempt leaves a trace. &lt;/p&gt;

&lt;p&gt;Failures don't disappear. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They become data.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This pattern — save first, embed second — is the difference between a failure you can debug and one that just disappears.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's how the failure handling actually works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;embedAndPersist&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddingClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n\n"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;updated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;update&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;SQL_UPDATE_EMBEDDING&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;toPgVectorLiteral&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;IllegalStateException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="s"&gt;"Unexpected row count updating embedding for document id="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;IllegalStateException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;markFailed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RuntimeException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Embedding failed for document id="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three decisions here are worth understanding:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Title and content are concatenated for embedding. &lt;code&gt;title + "\n\n" + content&lt;/code&gt; gives the model full context. A document titled "Payment Failure Handling Policy" with content about retry logic produces a richer embedding than the content alone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;IllegalStateException&lt;/code&gt; is re-thrown unchanged. If the update affects zero or more than one row, something is wrong with the database state — not the embedding call. That error should propagate as-is rather than being wrapped as an embedding failure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Everything else triggers &lt;code&gt;markFailed&lt;/code&gt;. Network timeouts, rate limits, malformed responses — any exception that isn't an &lt;code&gt;IllegalStateException&lt;/code&gt; records the failure and re-throws. The caller sees the failure. The database gets a record of what went wrong.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most API integration failures are silent. This makes them loud.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Search — The Pipeline That Ties Everything Together
&lt;/h3&gt;

&lt;p&gt;Search is the most complex operation in the service. It touches the embedding layer, the repository, and the database — and it has to coordinate all three correctly.&lt;/p&gt;

&lt;p&gt;What makes it manageable is not reducing that complexity, but containing it deliberately.&lt;/p&gt;

&lt;p&gt;The orchestration method is kept small:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Override&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;SearchResponse&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SearchRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;qVector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getQuery&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SearchResultItem&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fetchResults&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;qVector&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;countResults&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;qVector&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getFilters&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMinScore&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SearchResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getPage&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSize&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;items&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four statements. The first three delegate to private methods with clear names. &lt;/p&gt;

&lt;p&gt;The method reads like a description of the search process — embed the query, fetch the results, count the total, return the response. &lt;/p&gt;

&lt;p&gt;The how is pushed down into methods that can be reasoned about in isolation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;embedQuery&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;toPgVectorLiteral&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddingClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query goes through the same embedding client used for documents. &lt;/p&gt;

&lt;p&gt;That symmetry matters — the query and the stored documents exist in the same vector space. Without it, similarity search would be meaningless.&lt;/p&gt;
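
&lt;p&gt;The &lt;code&gt;toPgVectorLiteral&lt;/code&gt; helper is not shown in this article. A plausible sketch, assuming it simply renders the array in pgvector's text format (comma-separated values in square brackets) so the result can be bound and cast with &lt;code&gt;?::vector&lt;/code&gt;, might look like this. The class name &lt;code&gt;PgVectorLiterals&lt;/code&gt; is hypothetical.&lt;/p&gt;

```java
// Assumed implementation; the real helper in the project may differ.
final class PgVectorLiterals {

    private PgVectorLiterals() {
    }

    // Renders {0.5f, -2.0f} as "[0.5,-2.0]", the text form pgvector parses.
    static String toPgVectorLiteral(float[] embedding) {
        StringBuilder literal = new StringBuilder("[");
        String separator = "";
        for (float value : embedding) {
            literal.append(separator).append(value);
            separator = ",";
        }
        return literal.append(']').toString();
    }
}
```

&lt;p&gt;Because both documents and queries would pass through the same conversion, any formatting quirk applies equally to both sides of the comparison.&lt;/p&gt;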

&lt;p&gt;The SQL is constructed in two layers: the inner query selects candidates and computes similarity, while the outer query applies score thresholds and pagination.&lt;/p&gt;

&lt;p&gt;The split isn't stylistic. PostgreSQL cannot reference a &lt;code&gt;SELECT&lt;/code&gt; alias in a &lt;code&gt;WHERE&lt;/code&gt; clause at the same query level — which is why &lt;code&gt;cosine_distance&lt;/code&gt; must be resolved in a subquery before the score threshold can filter on it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'category'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you've ever wondered why your JPA queries feel limiting for complex use cases — this is where you cross that line deliberately.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
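
&lt;p&gt;The &lt;code&gt;LIMIT ? OFFSET ?&lt;/code&gt; pair at the end of the query has to be derived from the request's page and size. A minimal sketch of that arithmetic follows, assuming zero-based page numbers, which this article does not state. The class name &lt;code&gt;PageWindow&lt;/code&gt; is hypothetical.&lt;/p&gt;

```java
// Hypothetical helper; assumes page numbering starts at 0.
final class PageWindow {

    private PageWindow() {
    }

    // Number of rows to skip before the requested page begins.
    static long offset(int page, int size) {
        if (0 > page || 1 > size) {
            throw new IllegalArgumentException(
                    "page must be 0 or greater and size at least 1");
        }
        // Widen to long before multiplying so large pages cannot overflow int.
        return (long) page * size;
    }
}
```

&lt;p&gt;Under that assumption, page 3 with a size of 20 skips the first 60 rows, and the size itself is passed as the &lt;code&gt;LIMIT&lt;/code&gt;.&lt;/p&gt;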

&lt;h3&gt;
  
  
  Why JPA Isn’t Enough for Vector Search
&lt;/h3&gt;

&lt;p&gt;The search query isn't static. &lt;/p&gt;

&lt;p&gt;Metadata filters, score thresholds, and pagination all change the SQL at runtime. &lt;/p&gt;

&lt;p&gt;At that point the abstraction provided by JPA starts to break down — you're no longer mapping objects, you're constructing a query.&lt;/p&gt;

&lt;p&gt;That's where QueryBuilder comes in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QueryBuilder&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

   &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;StringBuilder&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
   &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;

   &lt;span class="nc"&gt;QueryBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;baseSql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;firstParam&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StringBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;baseSql&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
       &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;firstParam&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
   &lt;span class="o"&gt;}&lt;/span&gt;

   &lt;span class="nc"&gt;QueryBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;baseSql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;QueryBuilder&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StringBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;baseSql&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
       &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addAll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
   &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two constructors mirror the structure of the query – inner and outer. &lt;/p&gt;

&lt;p&gt;The first builds the inner query. &lt;/p&gt;

&lt;p&gt;The second builds the outer query, inheriting parameters from the inner one without tracking them manually. &lt;/p&gt;
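
&lt;p&gt;To make the parameter inheritance concrete, here is a minimal, self-contained sketch. The two constructors are the ones from the class above; the &lt;code&gt;add&lt;/code&gt; helper, the &lt;code&gt;sql()&lt;/code&gt; and &lt;code&gt;params()&lt;/code&gt; accessors, and the simplified SQL fragments are assumptions made for this example and may not match the real class.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.ArrayList;
import java.util.List;

public class QueryBuilderSketch {

    // Miniature stand-in for the article's QueryBuilder (illustrative only).
    static class QueryBuilder {
        private final StringBuilder sql;
        private final List&amp;lt;Object&amp;gt; params = new ArrayList&amp;lt;&amp;gt;();

        // Inner query: starts the parameter list with the first bound value.
        QueryBuilder(String baseSql, String firstParam) {
            this.sql = new StringBuilder(baseSql);
            this.params.add(firstParam);
        }

        // Outer query: copies the inner parameters so positions stay aligned.
        QueryBuilder(String baseSql, QueryBuilder source) {
            this.sql = new StringBuilder(baseSql);
            this.params.addAll(source.params);
        }

        // Hypothetical helper: append a fragment and its bound value together.
        QueryBuilder add(String fragment, Object value) {
            sql.append(fragment);
            params.add(value);
            return this;
        }

        String sql() { return sql.toString(); }
        List&amp;lt;Object&amp;gt; params() { return params; }
    }

    public static void main(String[] args) {
        QueryBuilder inner = new QueryBuilder(
                "SELECT id, embedding &amp;lt;=&amp;gt; ?::vector AS cosine_distance FROM documents WHERE status = 'READY'",
                "[0.1,0.2]")
                .add(" AND (metadata-&amp;gt;&amp;gt;'category') = ?", "billing");

        QueryBuilder outer = new QueryBuilder("SELECT * FROM (" + inner.sql() + ") AS sub", inner)
                .add(" WHERE (((1.0 - cosine_distance) + 1.0) / 2.0) &amp;gt;= ?", 0.7)
                .add(" LIMIT ?", 10);

        // The inner values come first, followed by the outer ones.
        System.out.println(outer.params()); // [[0.1,0.2], billing, 0.7, 10]
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the outer builder copies the inner list before anything else is appended, the order of the values keeps matching the order of the &lt;code&gt;?&lt;/code&gt; placeholders in the final SQL.&lt;/p&gt;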

&lt;p&gt;Where injection risk actually lives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;applyFilters&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Entry&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;entrySet&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getKey&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"^[a-zA-Z0-9_-]{1,64}$"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
           &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;IllegalArgumentException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Invalid metadata filter key: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
       &lt;span class="o"&gt;}&lt;/span&gt;

       &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"  AND (metadata-&amp;gt;&amp;gt;'"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;append&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"') = ?\n"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
       &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
   &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The filter key is appended directly into the SQL string. Bind parameters can't stand in for identifiers such as column names, and here the JSON key is written into the query text as a quoted literal rather than bound, which means this is where injection risk enters the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The regex is not a convenience. It is the only control point between user input and the database.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;^[a-zA-Z0-9_-]{1,64}$&lt;/code&gt; allows only ASCII letters, digits, underscores, and hyphens, and caps the key at 64 characters. &lt;/p&gt;

&lt;p&gt;Anything else is rejected before it reaches the database. Filter values, on the other hand, always go through JDBC parameters and are safe regardless of input. &lt;/p&gt;

&lt;p&gt;This split — validated keys, parameterised values — is what makes the query both flexible and secure.&lt;/p&gt;
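
&lt;p&gt;A quick way to see what the pattern accepts and rejects is to run it against a few keys. The sketch below uses the same regex; the &lt;code&gt;FilterKeyCheck&lt;/code&gt; class and &lt;code&gt;isSafeKey&lt;/code&gt; method are hypothetical names introduced for the example, and precompiling the pattern is an optional choice, not something the original code does.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.regex.Pattern;

public class FilterKeyCheck {

    // Same pattern the article applies to metadata filter keys.
    private static final Pattern SAFE_KEY = Pattern.compile("^[a-zA-Z0-9_-]{1,64}$");

    // Returns true only when the key is safe to splice into the SQL text.
    static boolean isSafeKey(String key) {
        if (key == null) {
            return false;
        }
        return SAFE_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafeKey("category"));            // true
        System.out.println(isSafeKey("doc-type_2"));          // true
        System.out.println(isSafeKey("x') = '' OR ('1'='1")); // false: quotes, spaces, parentheses
        System.out.println(isSafeKey(""));                    // false: shorter than 1 character
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The third key is the kind of input that would close the quoted literal and rewrite the condition if it ever reached the string concatenation.&lt;/p&gt;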

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This is one of those cases where the 'boring' regex is doing serious security work. Don't skip it.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Key validation handles injection risk. The other challenge in query construction is where to apply the score threshold.&lt;/p&gt;

&lt;p&gt;Score filtering is applied on the outer query — not the inner one. &lt;code&gt;cosine_distance&lt;/code&gt; is defined in the inner query's &lt;code&gt;SELECT&lt;/code&gt; clause. &lt;/p&gt;

&lt;p&gt;PostgreSQL cannot reference that alias in a &lt;code&gt;WHERE&lt;/code&gt; clause at the same level. Wrapping it as a subquery makes it a real column in the outer scope — which is what allows &lt;code&gt;minScore&lt;/code&gt; to work at all.&lt;/p&gt;
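
&lt;p&gt;The arithmetic in that outer &lt;code&gt;WHERE&lt;/code&gt; clause is easier to check in isolation. The sketch below reproduces it in plain Java; the class and method names are hypothetical, and it relies on pgvector's cosine distance being 1 minus cosine similarity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;public class ScoreMath {

    // Cosine distance is defined as 1 - cosine similarity, so invert it.
    static double similarityFromDistance(double cosineDistance) {
        return 1.0 - cosineDistance;
    }

    // Same arithmetic as the outer WHERE clause: map similarity from [-1, 1] onto [0, 1].
    static double scoreFromDistance(double cosineDistance) {
        return (similarityFromDistance(cosineDistance) + 1.0) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(scoreFromDistance(0.0)); // 1.0 (same direction)
        System.out.println(scoreFromDistance(1.0)); // 0.5 (orthogonal)
        System.out.println(scoreFromDistance(2.0)); // 0.0 (opposite direction)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Read backwards, a &lt;code&gt;minScore&lt;/code&gt; of 0.75 keeps only rows whose cosine similarity is at least 0.5, which is a cosine distance of at most 0.5.&lt;/p&gt;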

&lt;p&gt;This is the point where you stop “using an ORM” and start designing queries deliberately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Updating a Document Means Updating Its Embedding Too
&lt;/h3&gt;

&lt;p&gt;Updating a document is not the same as updating a database row.&lt;/p&gt;

&lt;p&gt;When content changes, the stored embedding becomes stale. A document about "payment retry logic" gets updated to "refund processing." &lt;/p&gt;

&lt;p&gt;But the embedding still points toward payment retries. Searches for "refund policy" would miss it. Searches for "payment retries" would still find it — incorrectly.&lt;/p&gt;

&lt;p&gt;The update operation handles this explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;applyUpdates&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;UpdateDocumentRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setTitle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTitle&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setContent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContent&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setMetadata&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMetadata&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setStatus&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DocumentStatus&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;PENDING&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setEmbeddingError&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;documentRepository&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;save&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The moment content changes, the embedding becomes invalid. &lt;/p&gt;

&lt;p&gt;The system makes that explicit by resetting the document to &lt;code&gt;PENDING&lt;/code&gt;, removing it from search until a new embedding is generated.&lt;/p&gt;

&lt;p&gt;This trades availability for correctness — a document disappearing briefly is preferable to returning incorrect results.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;findOrThrow&lt;/code&gt; is called again after &lt;code&gt;embedAndPersist&lt;/code&gt; so the response reflects the document's final state — including the updated status and &lt;code&gt;embeddingUpdatedAt&lt;/code&gt; timestamp — not the state before the embedding ran.&lt;/p&gt;
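
&lt;p&gt;The status transitions can be sketched without Spring or a database. In the example below, &lt;code&gt;Document&lt;/code&gt; is an in-memory stand-in, the &lt;code&gt;embedder&lt;/code&gt; function is a hypothetical replacement for the embedding client, and the body of &lt;code&gt;embedAndPersist&lt;/code&gt; is an assumption inferred from the lifecycle described in this series, not the actual implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.function.Function;

public class ReEmbedSketch {

    enum DocumentStatus { PENDING, READY, FAILED }

    // Hypothetical in-memory stand-in for the Document entity.
    static class Document {
        String content;
        float[] embedding;
        DocumentStatus status = DocumentStatus.PENDING;
        String embeddingError;
    }

    // Mirrors applyUpdates: new content invalidates the old embedding.
    static void applyUpdates(Document doc, String newContent) {
        doc.content = newContent;
        doc.status = DocumentStatus.PENDING;
        doc.embeddingError = null;
    }

    // Assumed behaviour: success marks the document READY, failure marks it FAILED and keeps the error.
    static void embedAndPersist(Document doc, Function&amp;lt;String, float[]&amp;gt; embedder) {
        try {
            doc.embedding = embedder.apply(doc.content);
            doc.status = DocumentStatus.READY;
        } catch (RuntimeException ex) {
            doc.status = DocumentStatus.FAILED;
            doc.embeddingError = ex.getMessage();
        }
    }

    public static void main(String[] args) {
        Document doc = new Document();
        applyUpdates(doc, "refund processing");
        System.out.println(doc.status); // PENDING

        embedAndPersist(doc, text -&amp;gt; new float[] {0.1f, 0.2f});
        System.out.println(doc.status); // READY
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reading the status only after &lt;code&gt;embedAndPersist&lt;/code&gt; returns is the in-memory equivalent of calling &lt;code&gt;findOrThrow&lt;/code&gt; a second time: the caller sees &lt;code&gt;READY&lt;/code&gt; or &lt;code&gt;FAILED&lt;/code&gt;, never the intermediate &lt;code&gt;PENDING&lt;/code&gt;.&lt;/p&gt;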

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This is easy to miss when you first build it. If a document update doesn't trigger a re-embed, your search results will silently drift out of sync with your content.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  One Place for All Your Errors
&lt;/h3&gt;

&lt;p&gt;Errors in this system fall into two categories — errors the caller caused and errors the system encountered. &lt;/p&gt;

&lt;p&gt;Those two cases should not look the same.&lt;/p&gt;

&lt;p&gt;A missing document returns a &lt;code&gt;404&lt;/code&gt;. Invalid input returns a &lt;code&gt;400&lt;/code&gt;. An embedding failure returns a &lt;code&gt;500&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What matters more than the distinction is consistency — every error, regardless of where it originates, returns the same shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NOT_FOUND"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Document not found: 42"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That consistency is enforced in one place — &lt;code&gt;GlobalExceptionHandler&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestControllerAdvice&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GlobalExceptionHandler&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@ExceptionHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ResourceNotFoundException&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ErrorResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;handleNotFound&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;ResourceNotFoundException&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ErrorResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="s"&gt;"NOT_FOUND"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@ExceptionHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MethodArgumentNotValidException&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ErrorResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;handleValidation&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;MethodArgumentNotValidException&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getBindingResult&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getFieldErrors&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getField&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;": "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getDefaultMessage&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;collect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Collectors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;joining&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;", "&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ErrorResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="s"&gt;"VALIDATION_ERROR"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;message&lt;/span&gt;
                &lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@ExceptionHandler&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ErrorResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;handleGeneral&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ErrorResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="s"&gt;"INTERNAL_ERROR"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="s"&gt;"An unexpected error occurred"&lt;/span&gt;
                &lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@RestControllerAdvice&lt;/code&gt; annotation makes it active across all controllers without being wired into any of them. &lt;/p&gt;

&lt;p&gt;The service layer throws exceptions. The handler translates them. The controllers never see error handling code.&lt;/p&gt;

&lt;p&gt;A client that always receives &lt;code&gt;code&lt;/code&gt; and &lt;code&gt;message&lt;/code&gt; can handle all errors with one piece of logic. &lt;/p&gt;

&lt;p&gt;A client that receives different shapes from different endpoints has to handle each one separately.&lt;/p&gt;
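
&lt;p&gt;The &lt;code&gt;ErrorResponse&lt;/code&gt; type itself isn't shown in this article. A minimal sketch, assuming it carries only the two fields from the JSON example and that the project runs on Java 16 or later (for records), could look like this; the &lt;code&gt;describe&lt;/code&gt; method is a hypothetical stand-in for client-side handling.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;public class ErrorShapeSketch {

    // Assumed shape of ErrorResponse: two fields, matching the JSON example.
    record ErrorResponse(String code, String message) {}

    // One routine covers every failure because the shape never varies.
    static String describe(ErrorResponse error) {
        return error.code() + ": " + error.message();
    }

    public static void main(String[] args) {
        System.out.println(describe(new ErrorResponse("NOT_FOUND", "Document not found: 42")));
        System.out.println(describe(new ErrorResponse("INTERNAL_ERROR", "An unexpected error occurred")));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;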

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;One handler, consistent responses everywhere — your frontend team will thank you.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  How the Lifecycle Keeps Bad Data Out of Search
&lt;/h3&gt;

&lt;p&gt;The document lifecycle isn't just about tracking failures. It's what keeps invalid data out of search results entirely.&lt;/p&gt;

&lt;p&gt;Every search query filters on two conditions before any similarity calculation runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;PENDING&lt;/code&gt; document is excluded. A &lt;code&gt;FAILED&lt;/code&gt; document is excluded.&lt;/p&gt;

&lt;p&gt;This is where the schema design from Part 2 pays off — the composite index on &lt;code&gt;(status, created_at DESC)&lt;/code&gt; exists specifically to support this filtering pattern. &lt;/p&gt;

&lt;p&gt;Without it, every search scans the full table and discards non-ready documents. With it, PostgreSQL jumps directly to the relevant subset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PENDING ──────────────────────────────┐
   │                                  │
   ▼                                  │
embedAndPersist()                     │
   │                                  │
┌──┴──────────────┐                   │
│                 │                   │
▼                 ▼                   ▼
READY          FAILED            not searchable
searchable     error in DB
               not searchable

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lifecycle isn't just about correctness. It's also a performance optimization.&lt;/p&gt;

&lt;p&gt;If you've ever had stale or incomplete data show up in search results with no explanation — a lifecycle model like this is what prevents it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The System Now Works
&lt;/h3&gt;

&lt;p&gt;With the service layer in place, the system finally behaves like a system.&lt;/p&gt;

&lt;p&gt;A document arrives at &lt;code&gt;POST /documents&lt;/code&gt;. The controller validates the request and delegates to the service. &lt;/p&gt;

&lt;p&gt;The service saves the document as &lt;code&gt;PENDING&lt;/code&gt;, calls the embedding client, and updates the status to &lt;code&gt;READY&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;The document is now stored with a valid embedding and visible to search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmo1qdgrtffcgltmbhjky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmo1qdgrtffcgltmbhjky.png" alt="Search and Post"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A search query arrives at &lt;code&gt;POST /search&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;The service embeds the query, builds the SQL dynamically through &lt;code&gt;QueryBuilder&lt;/code&gt;, applies filters and score thresholds, and returns ranked results with three score fields — &lt;code&gt;cosineDistance&lt;/code&gt;, &lt;code&gt;cosineSimilarity&lt;/code&gt;, and &lt;code&gt;score&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Every layer has exactly one job. Every failure is visible. Every response has a consistent shape.&lt;/p&gt;

&lt;p&gt;The system that started as a schema and an embedding client in &lt;a href="https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-1-architecture-58b9"&gt;Part 1&lt;/a&gt; is now a complete, working API.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Next
&lt;/h3&gt;

&lt;p&gt;The service layer completes the system. Everything now works end to end.&lt;/p&gt;

&lt;p&gt;But working systems still have flaws.&lt;/p&gt;

&lt;p&gt;In the next article, I’ll step back from the implementation and break down what this system gets right, what it gets wrong, and what I would change if I were to build it again.&lt;/p&gt;

&lt;p&gt;See you there.&lt;/p&gt;

</description>
      <category>java</category>
      <category>vectordatabase</category>
      <category>springboot</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why Most Developers Reach for a Vector Database Too Soon.</title>
      <dc:creator>Ozioma Ochin</dc:creator>
      <pubDate>Sat, 28 Mar 2026 23:41:26 +0000</pubDate>
      <link>https://dev.to/oozioma/why-most-developers-reach-for-a-vector-database-too-soon-5bpg</link>
      <guid>https://dev.to/oozioma/why-most-developers-reach-for-a-vector-database-too-soon-5bpg</guid>
      <description>&lt;p&gt;Most semantic search tutorials start the same way: add a vector database.&lt;/p&gt;

&lt;p&gt;The feature request sounded simple: type a question, get the right internal doc back.&lt;/p&gt;

&lt;p&gt;A few hundred documents. Support notes and wiki pages.&lt;/p&gt;

&lt;p&gt;Nothing exotic. The kind of thing that should take a week, maybe less.&lt;/p&gt;

&lt;p&gt;The team behind it did what most of us would do today.&lt;/p&gt;

&lt;p&gt;They watched a couple of &lt;em&gt;LangChain&lt;/em&gt; tutorials, skimmed the OpenAI docs, and followed the same architecture every example seemed to use. &lt;/p&gt;

&lt;p&gt;Documents were chunked, embeddings generated, and everything went into a hosted vector database. &lt;/p&gt;

&lt;p&gt;An ingestion pipeline kept the index in sync. &lt;/p&gt;

&lt;p&gt;Queries hit the vector store first, then the app database. &lt;/p&gt;

&lt;p&gt;It looked like the modern, correct way to build search.&lt;/p&gt;

&lt;p&gt;Three weeks later, the feature worked, at least technically. &lt;/p&gt;

&lt;p&gt;But updating a single document meant re-running the embedding pipeline. &lt;/p&gt;

&lt;p&gt;The vector index and the app database could drift out of sync silently. &lt;/p&gt;

&lt;p&gt;Running it locally required API keys. &lt;/p&gt;

&lt;p&gt;Every deployment waited on background indexing to finish before results were reliable. &lt;/p&gt;

&lt;p&gt;The system was fragile in ways that would keep compounding.&lt;/p&gt;

&lt;p&gt;A Postgres full-text search would have solved the original problem in an afternoon.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The vector database wasn't wrong. It was just answering a question nobody had asked yet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;This article is about how to ask the right question before you start building, and what the answer looks like in practice.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What a Vector Database Is Actually For&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most developers working with embeddings already know what a vector database does. &lt;/p&gt;

&lt;p&gt;Fewer stop to ask whether that specific capability is what their problem actually requires. &lt;/p&gt;

&lt;p&gt;Before arguing when not to reach for one, it's worth being precise about what the tool is actually built for.&lt;/p&gt;

&lt;p&gt;When you generate embeddings for text, images, or other data, you end up with arrays of floating-point numbers. &lt;/p&gt;

&lt;p&gt;Finding the most similar item means comparing one vector against many others. &lt;/p&gt;

&lt;p&gt;For small datasets, you can do this with a simple scan. &lt;/p&gt;
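
&lt;p&gt;As a rough picture of what that simple scan involves, the sketch below compares a query against every stored vector and keeps the closest few. It is plain Java with made-up two-dimensional vectors; the class and method names are hypothetical, and it assumes no vector is all zeros (which would make the cosine undefined).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

public class BruteForceScan {

    // Cosine distance = 1 - cosine similarity.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i &amp;lt; a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Exact search: score every stored vector, sort by distance, keep the k closest indices.
    static int[] nearest(double[][] vectors, double[] query, int k) {
        return IntStream.range(0, vectors.length)
                .boxed()
                .sorted(Comparator.comparingDouble(i -&amp;gt; cosineDistance(vectors[i], query)))
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        double[][] vectors = { {1, 0}, {0, 1}, {1, 1} };
        double[] query = {1, 0.1};
        System.out.println(Arrays.toString(nearest(vectors, query, 2))); // [0, 2]
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every query touches every vector, so the cost grows linearly with the size of the collection.&lt;/p&gt;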

&lt;p&gt;As the number of vectors grows, brute-force comparison becomes too slow, and you need specialized indexes designed for approximate nearest neighbor search.&lt;/p&gt;

&lt;p&gt;That’s the problem vector databases are optimized to solve.&lt;/p&gt;

&lt;p&gt;Under the hood, most of them rely on approximate nearest neighbor algorithms.&lt;/p&gt;

&lt;p&gt;HNSW for graph-based search.&lt;/p&gt;

&lt;p&gt;IVF for cluster-based partitioning.&lt;/p&gt;

&lt;p&gt;They trade a small amount of recall accuracy for dramatically faster queries.&lt;/p&gt;

&lt;p&gt;For semantic search, that trade-off is almost always acceptable — you don't need the single most similar document, you need several good ones, fast.&lt;/p&gt;

&lt;p&gt;pgvector exposes this same choice directly in SQL — the query is identical with or without the index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Without IVFFlat - PostgreSQL scans every row&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query_vector&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- with IVFFlat — PostgreSQL searches only relevant clusters&lt;/span&gt;
&lt;span class="c1"&gt;-- same query, dramatically different performance at scale&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query_vector&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query looks identical — the difference is entirely in the index. This is the performance decision pgvector hands back to you.&lt;/p&gt;

&lt;p&gt;Everything else people associate with vector databases — metadata filtering, hybrid search, multi-tenant indexes, reranking — sits on top of that core capability.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Here's where the confusion starts. &lt;/p&gt;

&lt;p&gt;The term "&lt;strong&gt;vector database&lt;/strong&gt;" bundles several distinct concerns (storing embeddings, searching them, filtering results, and running the infrastructure) into what looks like a single decision. &lt;/p&gt;

&lt;p&gt;The tooling reinforces it. &lt;/p&gt;

&lt;p&gt;When every tutorial wires all four together in the same five lines of code, it stops looking like a choice and starts looking like a requirement. &lt;/p&gt;

&lt;p&gt;As soon as a project involves embeddings, it can seem like a dedicated vector database is the only correct design.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It isn’t.&lt;/p&gt;

&lt;p&gt;Embeddings are just data. &lt;/p&gt;

&lt;p&gt;They can live in &lt;strong&gt;Postgres&lt;/strong&gt;, &lt;strong&gt;SQLite&lt;/strong&gt;, or even &lt;strong&gt;in memory&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;A vector database becomes the right tool when approximate nearest neighbor search is the bottleneck — not when embeddings first appear in the architecture.&lt;/p&gt;

&lt;p&gt;Until that point, it’s often extra complexity you don’t need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Developers Reach for It Too Early
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tutorial monoculture
&lt;/h3&gt;

&lt;p&gt;Most examples of semantic search, RAG, or LLM-powered features follow the same pattern: chunk documents, generate embeddings, store them in a vector database, query with similarity search. &lt;/p&gt;

&lt;p&gt;LangChain demos do it. LlamaIndex demos do it. OpenAI examples do it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The scalability trap
&lt;/h3&gt;

&lt;p&gt;Once embeddings enter the design, it’s easy to assume the system will eventually need fast similarity search at scale, so the vector database gets added early to avoid rewriting things later. &lt;/p&gt;

&lt;p&gt;This is the same instinct that leads teams to introduce Kafka for a service that sends ten emails a day. The future problem might be real. &lt;/p&gt;

&lt;p&gt;But solving it before it exists adds complexity immediately, with no corresponding benefit.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Tooling and marketing
&lt;/h3&gt;

&lt;p&gt;Modern vector databases have excellent documentation, polished SDKs, and tutorials that get you from zero to similarity search in under an hour. &lt;/p&gt;

&lt;p&gt;That ease of setup is genuinely impressive — and it's also exactly what makes the tool feel mandatory before you've decided whether you need it. &lt;/p&gt;

&lt;p&gt;Great onboarding has a way of skipping the step where you ask whether you should be onboarding at all.&lt;/p&gt;

&lt;p&gt;Developers don't reach for vector databases too early because they don't understand the technology. &lt;/p&gt;

&lt;p&gt;They do it because the ecosystem makes it look like the obvious first step.&lt;/p&gt;

&lt;p&gt;At some point, the vector database became the new Redis of the AI stack: added by default, before anyone confirmed it was actually needed.&lt;/p&gt;

&lt;p&gt;The result isn't broken systems. It's systems that are harder to run, slower to change, and more expensive to maintain than the problem ever required. The complexity arrives on day one. The scale that would justify it may never come.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Simpler Stack You’re Ignoring&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Two tools get overlooked almost every time: &lt;strong&gt;&lt;em&gt;pgvector&lt;/em&gt;&lt;/strong&gt;, which runs inside the Postgres instance you already have, and plain &lt;strong&gt;&lt;em&gt;keyword search&lt;/em&gt;&lt;/strong&gt;, which still solves more problems than people want to admit.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. pgvector — When Your Database Is Already Postgres
&lt;/h3&gt;

&lt;p&gt;If your application already runs on &lt;code&gt;PostgreSQL&lt;/code&gt;, as many do, adding pgvector gives you similarity search without introducing a new service.&lt;/p&gt;

&lt;p&gt;No new deployment. No additional failure mode.&lt;/p&gt;

&lt;p&gt;pgvector adds a &lt;code&gt;VECTOR&lt;/code&gt; column type and similarity operators directly to &lt;code&gt;PostgreSQL&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Embeddings live alongside the rest of your data, queryable with &lt;code&gt;SQL&lt;/code&gt;, inside the same transactional system your application already depends on. &lt;/p&gt;

&lt;p&gt;No new monitoring, no separate backups, no second system to explain to the next engineer on the team.&lt;/p&gt;

&lt;p&gt;The setup starts with enabling the extension and creating the table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;                   &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;                &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;              &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;             &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;            &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;               &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_error&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_updated_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;           &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt;           &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The status column is worth noting. &lt;/p&gt;

&lt;p&gt;Because embedding is an external API call that can fail, documents move through a lifecycle — &lt;code&gt;PENDING&lt;/code&gt; when first saved, &lt;code&gt;READY&lt;/code&gt; once the embedding succeeds, &lt;code&gt;FAILED&lt;/code&gt; if the API returns an error. &lt;/p&gt;

&lt;p&gt;This means a failed embedding never silently corrupts search results. The status is always visible in the database.&lt;/p&gt;

&lt;p&gt;For similarity search to scale beyond a few thousand documents, pgvector needs an index. The IVFFlat index is what makes this production-ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;documents_embedding_ivfflat_idx&lt;/span&gt; 
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the index in place, a single SQL query can combine vector similarity search with lifecycle filtering (&lt;code&gt;status = 'READY'&lt;/code&gt;) and metadata filtering on the &lt;code&gt;JSONB&lt;/code&gt; column. &lt;/p&gt;

&lt;p&gt;With a dedicated vector database, the relational data usually lives in a separate system, so combining those concerns often requires application-level joins or multiple round trips. &lt;/p&gt;

&lt;p&gt;Here, everything runs inside one query, in one database, with full ACID guarantees.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator is pgvector's cosine distance operator. It returns a value between 0 and 2 — lower means more similar. Results are ordered ascending so the closest matches come first.&lt;/p&gt;

&lt;p&gt;For most products, this goes further than people expect. With an index in place, pgvector can handle millions of vectors without meaningful performance degradation for typical query patterns. &lt;/p&gt;

&lt;p&gt;If you're building an internal tool, a document search API, or a RAG feature for a product that isn't at serious scale yet, you're almost certainly in that range.&lt;/p&gt;

&lt;p&gt;There are real limits. pgvector won't give you distributed indexing, automatic sharding, or sub-10ms latency under very high query volume. &lt;/p&gt;

&lt;p&gt;If you're storing tens of millions of vectors and serving high-QPS queries, a dedicated vector database will outperform it. &lt;/p&gt;

&lt;p&gt;But by the time you reach that point, you'll know exactly why you're making the switch — because you'll have measured the problem, not imagined it.&lt;/p&gt;

&lt;p&gt;The full implementation — including all three Flyway migrations, the IVFFlat index configuration, lifecycle tracking, and metadata filtering — is available on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You don't need a new database to add semantic search. You need pgvector and a migration file.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. BM25 and Keyword Search — The Tool Nobody Wants to Admit Still Works
&lt;/h3&gt;

&lt;p&gt;Before you generate a single embedding, it's worth asking whether your users actually need semantic search — or whether they just need search that works.&lt;/p&gt;

&lt;p&gt;A lot of features labeled “&lt;strong&gt;&lt;em&gt;AI search&lt;/em&gt;&lt;/strong&gt;” are really just keyword lookup with better marketing. &lt;/p&gt;

&lt;p&gt;If your users know the words they’re looking for, traditional full-text search is often faster, simpler, and more predictable than embeddings.&lt;/p&gt;

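&lt;p&gt;BM25, the ranking function behind dedicated full-text engines such as Lucene and Elasticsearch, is extremely good at matching short, precise queries. PostgreSQL's built-in full-text search ranks with &lt;code&gt;ts_rank&lt;/code&gt; rather than BM25, but it covers the same keyword-matching ground:&lt;br&gt;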
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- standard PostgreSQL full-text search — no new infrastructure needed&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
               &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'reset password'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'reset password'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs inside the same &lt;code&gt;PostgreSQL&lt;/code&gt; instance as your pgvector queries — no new service, no new failure mode.&lt;/p&gt;

&lt;p&gt;Searches like “reset password”, “invoice template”, or a specific error message often perform better with keyword scoring than with vector similarity. &lt;/p&gt;

&lt;p&gt;In domains with strict terminology — legal references, product codes, medical terms — exact matches matter more than semantic closeness.&lt;/p&gt;

&lt;p&gt;Embeddings shine when meaning matters more than wording. If users are asking “&lt;em&gt;show me something like this&lt;/em&gt;” or “&lt;em&gt;what document explains this idea&lt;/em&gt;”, vector search makes sense. &lt;/p&gt;

&lt;p&gt;If they’re typing the name of the thing they want, it usually doesn’t.&lt;/p&gt;

&lt;p&gt;You also don’t have to choose one or the other. Postgres supports full-text search, pgvector supports similarity search, and combining the two often gives better results than either alone.&lt;/p&gt;

&lt;p&gt;A hybrid query looks like this — no new infrastructure, no new service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
            &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
            &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;text_rank&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt; 
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; 
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt; 
    &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
        &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
        &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; 
&lt;span class="k"&gt;ASC&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



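&lt;p&gt;This query ranks by a weighted blend of the two signals: cosine distance counts for 0.7 and keyword rank for 0.3, and the rank is subtracted because a higher &lt;code&gt;ts_rank&lt;/code&gt; means a better match. Because the &lt;code&gt;ORDER BY&lt;/code&gt; is no longer a bare distance expression, PostgreSQL cannot use the vector index for the sort, so this form suits small tables or an already narrowed candidate set. Either way, it adds no new infrastructure.&lt;/p&gt;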
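&lt;p&gt;To make the arithmetic in that &lt;code&gt;ORDER BY&lt;/code&gt; concrete, the sketch below mirrors the expression in Java. The class name and the sample values in the note that follows are made up for illustration; only the 0.7 and 0.3 weights come from the query.&lt;/p&gt;

```java
/** Hypothetical sketch of the hybrid ORDER BY expression; lower scores rank first. */
final class HybridScore {

    // Weights taken from the SQL query above.
    static final double DISTANCE_WEIGHT = 0.7;
    static final double TEXT_WEIGHT = 0.3;

    /**
     * Combines cosine distance (lower is better) with a keyword rank (higher is better).
     * The rank is subtracted so that a strong keyword match lowers the final score.
     */
    static double combined(double cosineDistance, double textRank) {
        return cosineDistance * DISTANCE_WEIGHT - textRank * TEXT_WEIGHT;
    }
}
```

&lt;p&gt;With these hypothetical inputs, a document at distance 0.25 with a text rank of 0.4 scores about 0.055 and sorts ahead of a document at distance 0.20 with no keyword match, which scores about 0.14.&lt;/p&gt;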

&lt;p&gt;Before adding a vector database, answer a simpler question first: can keyword search solve 80% of this? &lt;/p&gt;

&lt;p&gt;If the answer is yes, start there. You can always add embeddings later. You can't easily remove infrastructure you didn't need.&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Actually Need a Vector Database
&lt;/h2&gt;

&lt;p&gt;Vector databases aren't the villain here. They're the right tool when similarity search becomes a real, measured performance problem — not a projected one. The question is how to recognize that moment before you've already over-built.&lt;/p&gt;

&lt;p&gt;These are the thresholds where teams consistently start feeling the limits of a general-purpose setup:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Typical threshold&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vector count&lt;/td&gt;
&lt;td&gt;Tens of millions of embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query latency&lt;/td&gt;
&lt;td&gt;Sub-50 ms p99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filtering complexity&lt;/td&gt;
&lt;td&gt;Multi-tenant filters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query volume&lt;/td&gt;
&lt;td&gt;High QPS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure maturity&lt;/td&gt;
&lt;td&gt;Dedicated team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use case&lt;/td&gt;
&lt;td&gt;Recommendation, RAG, personalization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The numbers aren’t rules. They’re patterns. The real signal is when the simple approach stops being simple.&lt;/p&gt;

&lt;p&gt;If your search feature is core to the product and needs predictable latency under load, the tradeoffs of a dedicated vector database start to make sense.&lt;/p&gt;

&lt;p&gt;You know you need a vector database when brute-force similarity becomes your actual bottleneck — not your imagined one.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple Decision Flowchart
&lt;/h2&gt;

&lt;p&gt;Most projects don't need a new datastore. They need a clear decision process. &lt;/p&gt;

&lt;p&gt;If you're starting a new feature and asking whether a vector database belongs in the design, the flowchart below maps the decision from '&lt;strong&gt;&lt;em&gt;I need search&lt;/em&gt;&lt;/strong&gt;' to the right tool for your current scale.&lt;/p&gt;

&lt;p&gt;A simple way to think about the decision:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1y8603b3g8pd9r17kk46.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1y8603b3g8pd9r17kk46.png" alt="Decision flowchart for choosing between keyword search, pgvector, and a vector database based on scale and requirements" width="800" height="898"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The flowchart won't cover every edge case — no decision tool does.&lt;/p&gt;

&lt;p&gt;But if your design jumps straight to a managed vector database before working through these questions, you're probably solving a scaling problem you don't have yet.&lt;/p&gt;

&lt;p&gt;The cost of that mistake shows up slowly, in complexity that compounds before the scale ever arrives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Should Follow the Problem
&lt;/h2&gt;

&lt;p&gt;The best architecture isn't the most modern one. &lt;/p&gt;

&lt;p&gt;It's the one that matches the problem you actually have, at the scale you're actually at — maintained by the team you actually have, not the one you might hire later.&lt;/p&gt;

&lt;p&gt;Vector databases are powerful tools, but they come with real operational cost — another service to run, another datastore to keep in sync, another place where performance and correctness can drift apart. &lt;/p&gt;

&lt;p&gt;That cost only makes sense when the problem demands it. Before that point, simpler designs are usually easier to build, easier to reason about, and easier to change when requirements shift.&lt;/p&gt;

&lt;p&gt;Starting with pgvector or full-text search doesn’t lock you in. If you outgrow it, the path to a dedicated vector database is well understood. The reverse is harder. &lt;/p&gt;

&lt;p&gt;Removing infrastructure you didn’t need is almost always more work than adding it later.&lt;/p&gt;

&lt;p&gt;The full pgvector implementation, including schemas, index configuration, and the search query shown above, is available on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Most systems don’t fail because they chose the wrong tool. They fail because they chose the right tool too early.&lt;/p&gt;

&lt;p&gt;The real skill isn’t knowing how to use a vector database. It’s knowing when not to.&lt;/p&gt;

</description>
      <category>backend</category>
      <category>webdev</category>
      <category>vectordatabase</category>
      <category>semanticsearch</category>
    </item>
    <item>
      <title>Building a Semantic Search API with Spring Boot and pgvector - Part 3: The Embedding Layer.</title>
      <dc:creator>Ozioma Ochin</dc:creator>
      <pubDate>Fri, 20 Mar 2026 19:24:28 +0000</pubDate>
      <link>https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-3-the-embedding-layer-1pj0</link>
      <guid>https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-3-the-embedding-layer-1pj0</guid>
      <description>&lt;p&gt;Most semantic search tutorials treat embeddings as a single line of code — call the API, get a vector, store it.&lt;/p&gt;

&lt;p&gt;In practice, this is the part of the system where the most subtle bugs live. Not the kind that throw exceptions, but the kind that silently produce wrong similarity scores, wrong rankings, and search results that look correct but feel off.&lt;/p&gt;

&lt;p&gt;When I first built this service, I expected the difficult parts to be the database schema and the search query. Instead, most of the time went into the embedding layer. Small mistakes here don’t crash the application. They just make search behave strangely.&lt;/p&gt;

&lt;p&gt;Three things make this layer trickier than it looks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;First&lt;/em&gt;&lt;/strong&gt;, the API call is external. It can fail because of network issues, rate limits, or invalid requests, and the failure is not always obvious from the client side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Second&lt;/em&gt;&lt;/strong&gt;, the response parsing has silent failure modes. A wrong field name, a missing element, or a partially parsed response can still produce a vector — just not the right one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Third&lt;/em&gt;&lt;/strong&gt;, the normalization step is easy to get wrong, skip entirely, or apply twice. When that happens, similarity scores change even though the text hasn’t.&lt;/p&gt;
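&lt;p&gt;For reference, a normalization step can be as small as the sketch below. It assumes L2 (unit-length) normalization, which is an assumption rather than something stated here, and the class and method names are hypothetical.&lt;/p&gt;

```java
/** Hypothetical sketch: L2 normalization scales a vector to Euclidean length 1. */
final class VectorNormalizer {

    static float[] l2Normalize(float[] vector) {
        double sumOfSquares = 0;
        for (float component : vector) {
            sumOfSquares += component * component;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0) {
            // A zero vector has no direction, so fail loudly instead of returning NaN values.
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] normalized = new float[vector.length];
        for (int i = 0; i != vector.length; i++) {
            normalized[i] = (float) (vector[i] / norm);
        }
        return normalized;
    }
}
```

&lt;p&gt;For example, &lt;code&gt;l2Normalize(new float[] {3f, 4f})&lt;/code&gt; yields approximately &lt;code&gt;{0.6, 0.8}&lt;/code&gt;.&lt;/p&gt;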

&lt;p&gt;In &lt;a href="https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-2-designing-the-postgresql-2jlb"&gt;Part 2&lt;/a&gt;, the schema was designed to store embeddings safely, track their lifecycle, and support retries when something goes wrong. Now we need to generate those embeddings correctly.&lt;/p&gt;

&lt;p&gt;That responsibility lives entirely inside the embedding layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What the embedding layer is responsible for&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before looking at any implementation, it helps to define what the embedding layer is supposed to do — and just as importantly, what it is not supposed to do.&lt;/p&gt;

&lt;p&gt;At a high level, the layer has one job: &lt;strong&gt;convert text into a vector that can be stored&lt;/strong&gt; and &lt;strong&gt;compared in the database&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That sounds simple, but several steps are involved: sending the text to the API, validating the response, parsing the JSON, converting to a float array, and normalizing the vector before returning it.&lt;/p&gt;

&lt;p&gt;Everything else belongs somewhere else.&lt;/p&gt;

&lt;p&gt;The embedding layer does not know about the database.&lt;/p&gt;

&lt;p&gt;It does not know about documents, metadata, or search queries.&lt;/p&gt;

&lt;p&gt;Its only responsibility is converting text into a vector and returning it to whoever asked.&lt;/p&gt;

&lt;p&gt;That boundary is what makes this layer testable, replaceable, and easy to reason about in isolation.&lt;/p&gt;

&lt;p&gt;The service layer can call it without knowing what happens inside. The tests can mock it without spinning up an &lt;code&gt;HTTP&lt;/code&gt; client. A different provider can be swapped in without touching anything outside this layer.&lt;/p&gt;

&lt;p&gt;That boundary is captured by a single interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The EmbeddingClient interface&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before looking at the OpenAI implementation, the most important design decision in this layer is the interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingClient&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This interface is intentionally small, but it defines the boundary for the entire embedding layer.&lt;/p&gt;

&lt;p&gt;The service layer depends on this contract, not on any specific provider. As far as the rest of the application is concerned, embedding is simply a function that takes text and returns a vector.&lt;/p&gt;

&lt;p&gt;How that vector is produced is an implementation detail.&lt;/p&gt;

&lt;p&gt;One method, one responsibility.&lt;/p&gt;

&lt;p&gt;The embedding layer should not expose &lt;code&gt;HTTP&lt;/code&gt; details, &lt;code&gt;JSON&lt;/code&gt; parsing, or model configuration.&lt;/p&gt;

&lt;p&gt;All of that stays behind the implementation.&lt;/p&gt;

&lt;p&gt;The return type is also a deliberate choice. The method returns a &lt;code&gt;float[]&lt;/code&gt;, not a List and not a custom wrapper type.&lt;/p&gt;

&lt;p&gt;The database layer ultimately writes this value into a &lt;code&gt;VECTOR&lt;/code&gt; column, and pgvector expects a primitive float array. Returning anything else would only introduce unnecessary conversion code between layers.&lt;/p&gt;

&lt;p&gt;Depending on the interface rather than the implementation means the provider is swappable.&lt;/p&gt;

&lt;p&gt;The class that implements this interface today is called &lt;code&gt;OpenAiEmbeddingClient&lt;/code&gt;, but nothing in the service layer depends on that fact.&lt;/p&gt;

&lt;p&gt;The same interface could later be backed by a local model, a different provider, or even a mock implementation for tests.&lt;/p&gt;
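&lt;p&gt;As an illustration of that last option, a test double might look like the sketch below. &lt;code&gt;FakeEmbeddingClient&lt;/code&gt; is a hypothetical name, not a class from the repository, and the interface is repeated without &lt;code&gt;public&lt;/code&gt; only so the snippet compiles as a single file.&lt;/p&gt;

```java
/** The contract from the article, repeated so the sketch compiles on its own. */
interface EmbeddingClient {
    float[] embed(String text);
}

/** Hypothetical test double: returns a deterministic vector without any HTTP call. */
final class FakeEmbeddingClient implements EmbeddingClient {

    private final int dimensions;

    FakeEmbeddingClient(int dimensions) {
        if (dimensions == 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.dimensions = dimensions;
    }

    @Override
    public float[] embed(String text) {
        float[] vector = new float[dimensions];
        // The same text always lights up the same slot, so assertions stay repeatable.
        // The result is already unit length because exactly one component is 1.
        vector[Math.floorMod(text.hashCode(), dimensions)] = 1.0f;
        return vector;
    }
}
```

&lt;p&gt;A service test could then pass in an instance sized to the embedding column, for example &lt;code&gt;new FakeEmbeddingClient(1536)&lt;/code&gt;, and run without any network access.&lt;/p&gt;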

&lt;h3&gt;
  
  
  &lt;strong&gt;Wiring the client with Spring&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The client is registered as a Spring component and configured through constructor injection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;OpenAiEmbeddingClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ObjectMapper&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nd"&gt;@Value&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${openai.apiKey}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nd"&gt;@Value&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${openai.embeddingModel}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mapper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The values for the API key and model come from application configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openai.apiKey=${OPENAI_API_KEY}
openai.embeddingModel=${OPENAI_EMBEDDING_MODEL:text-embedding-3-small}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reading the API key from an environment variable is not just a convention; it is a requirement for any service that runs outside a local machine.&lt;/p&gt;

&lt;p&gt;Hardcoding credentials in source code makes rotation difficult and leaks secrets into version control. Using &lt;code&gt;${OPENAI_API_KEY}&lt;/code&gt; allows the same code to run locally, in CI, and in production without changes.&lt;/p&gt;

&lt;p&gt;The model name is also injected rather than hardcoded, but with a default value. In the syntax &lt;code&gt;${OPENAI_EMBEDDING_MODEL:text-embedding-3-small}&lt;/code&gt;, the text after the colon is the default, so the environment variable is optional.&lt;/p&gt;

&lt;p&gt;If no environment variable is provided, the client falls back to &lt;code&gt;text-embedding-3-small&lt;/code&gt;. This makes local setup easier while still allowing the model to be changed without recompiling the application.&lt;/p&gt;

&lt;p&gt;Constructor injection is used instead of field injection for a reason. All dependencies are provided when the object is created, and the fields can remain final.&lt;/p&gt;

&lt;p&gt;This makes the class easier to test and prevents partially constructed instances. It also keeps the configuration visible at the entry point of the class instead of scattered across annotations.&lt;/p&gt;
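
&lt;p&gt;As a rough illustration of the testability point, here is a minimal sketch of a constructor-injected class being built by hand, with no Spring context involved. The class name &lt;code&gt;ConfiguredClient&lt;/code&gt; and the &lt;code&gt;Objects.requireNonNull&lt;/code&gt; checks are assumptions made for this example; they are not part of the original client.&lt;/p&gt;

```java
import java.util.Objects;

// Hypothetical stand-in for a constructor-injected client (not the article's class).
final class ConfiguredClient {
    private final String apiKey;
    private final String model;

    // Every dependency arrives through the constructor, so the fields can be
    // final and an instance can never exist in a half-configured state.
    ConfiguredClient(String apiKey, String model) {
        this.apiKey = Objects.requireNonNull(apiKey, "apiKey");
        this.model = Objects.requireNonNull(model, "model");
    }

    String model() {
        return model;
    }

    boolean hasApiKey() {
        return !apiKey.isEmpty();
    }

    public static void main(String[] args) {
        // A test can build the object directly from plain values.
        ConfiguredClient client = new ConfiguredClient("test-key", "text-embedding-3-small");
        System.out.println(client.model());
    }
}
```

&lt;p&gt;With field injection the same test would need reflection or a running container to populate the fields.&lt;/p&gt;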

&lt;p&gt;At this point the embedding layer has a clear boundary and a concrete implementation. The remaining work is inside the client itself: building the HTTP request, validating the response, and turning the result into a normalized vector.&lt;/p&gt;

&lt;p&gt;The full source code — including &lt;code&gt;OpenAiEmbeddingClient&lt;/code&gt;, &lt;code&gt;EmbeddingUtils&lt;/code&gt;, and all three Flyway migrations — is available on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The embed() orchestration method&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Override&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;HttpResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sendRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;validateResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;parseEmbedding&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RuntimeException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RuntimeException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to get embedding from OpenAI"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method is intentionally small. It does not contain the implementation details of the &lt;code&gt;HTTP&lt;/code&gt; call, response parsing, or normalization. Instead, it orchestrates the process by delegating each step to a private method.&lt;/p&gt;

&lt;p&gt;Keeping the public method short makes the flow easy to read. The code describes what happens without showing how it happens: send the request, validate the response, parse the embedding.&lt;/p&gt;

&lt;p&gt;The null guard at the entry point is intentional:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



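&lt;p&gt;The embedding call should never fail with a &lt;code&gt;NullPointerException&lt;/code&gt; because the caller passed a null value. &lt;code&gt;Map.of&lt;/code&gt;, which builds the request body, rejects null values, so converting null to an empty string lets the request be built and sent. An empty input may still be rejected by the API, but that failure then surfaces through response validation with a clear message.&lt;/p&gt;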

&lt;p&gt;Handling this at the boundary keeps the rest of the code simpler because the private methods never need to check for null.&lt;/p&gt;
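
&lt;p&gt;A small sketch of what the guard protects against, assuming the request body is built with &lt;code&gt;Map.of&lt;/code&gt; as in the &lt;code&gt;sendRequest&lt;/code&gt; method shown later. The class and method names here are hypothetical and exist only for the illustration.&lt;/p&gt;

```java
import java.util.Map;

// Hypothetical demo class; only Map.of and the ternary come from the article.
final class NullGuardDemo {

    // Same expression as the guard in embed().
    static String orEmpty(String text) {
        return text != null ? text : "";
    }

    // Returns true when the body map can be built, false when Map.of rejects the input.
    static boolean buildsBody(String text) {
        try {
            Map.of("model", "text-embedding-3-small", "input", text);
            return true;
        } catch (NullPointerException e) {
            // Map.of does not accept null keys or values.
            return false;
        }
    }
}
```

&lt;p&gt;Here &lt;code&gt;buildsBody(null)&lt;/code&gt; reports a failure, while &lt;code&gt;buildsBody(orEmpty(null))&lt;/code&gt; succeeds, which is why the guard sits at the boundary.&lt;/p&gt;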

&lt;p&gt;The exception handling follows the same idea of keeping the boundary clean. Runtime exceptions are rethrown unchanged, while checked exceptions are wrapped in a &lt;code&gt;RuntimeException&lt;/code&gt; that keeps the original exception as its cause.&lt;/p&gt;

&lt;p&gt;The caller never has to deal with checked exceptions coming from the embedding layer, and the service layer can treat embedding failures like any other runtime error.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Building the HTTP request&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;HttpResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;sendRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeValueAsString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"input"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;HttpRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HttpRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;OPENAI_EMBEDDINGS_URL&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;header&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Authorization"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Bearer "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;header&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;POST&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HttpRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BodyPublishers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;HttpResponse&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BodyHandlers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Several small decisions in this method prevent bugs that are difficult to trace later.&lt;/p&gt;

&lt;p&gt;The API URL is stored in a constant at the top of the class instead of being written inline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;OPENAI_EMBEDDINGS_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="s"&gt;"https://api.openai.com/v1/embeddings"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Defining the URL once makes it visible and easy to verify. A single missing character (&lt;code&gt;embedding&lt;/code&gt; instead of &lt;code&gt;embeddings&lt;/code&gt;) produces a &lt;code&gt;404&lt;/code&gt; whose response body does not obviously point to a wrong URL.&lt;/p&gt;

&lt;p&gt;The request body is built using Jackson instead of concatenating strings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeValueAsString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"input"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Manually building &lt;code&gt;JSON&lt;/code&gt; is fragile. A missing quote, an extra comma, or an unescaped character in the input text can produce a request that looks correct in code but fails at runtime.&lt;/p&gt;

&lt;p&gt;Using the &lt;code&gt;ObjectMapper&lt;/code&gt; means quotes, backslashes, and control characters in the input text are escaped automatically, so the &lt;code&gt;JSON&lt;/code&gt; is valid every time.&lt;/p&gt;
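
&lt;p&gt;To make the fragility concrete, here is a sketch of the string-concatenation approach the article avoids. The helper &lt;code&gt;naiveBody&lt;/code&gt; is hypothetical and deliberately wrong: it does no escaping, so a quote inside the input text ends the &lt;code&gt;JSON&lt;/code&gt; string early.&lt;/p&gt;

```java
// Hypothetical counter-example: building the request body by hand.
final class NaiveJsonDemo {

    // No escaping is applied, which is exactly the problem being illustrated.
    static String naiveBody(String model, String text) {
        return "{\"model\":\"" + model + "\",\"input\":\"" + text + "\"}";
    }

    public static void main(String[] args) {
        // The inner quotes are copied verbatim, so the output is not valid JSON:
        // {"model":"text-embedding-3-small","input":"He said "hi""}
        System.out.println(naiveBody("text-embedding-3-small", "He said \"hi\""));
    }
}
```

&lt;p&gt;The code compiles and looks reasonable, yet any document containing a quote produces a request the API cannot parse.&lt;/p&gt;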

&lt;p&gt;The request explicitly uses &lt;code&gt;UTF-8&lt;/code&gt; when writing the body.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;HttpRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BodyPublishers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofString&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



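&lt;p&gt;&lt;code&gt;BodyPublishers.ofString(String)&lt;/code&gt; already encodes as &lt;code&gt;UTF-8&lt;/code&gt; when no charset is given, so the explicit argument does not change what is sent. It is there because other Java APIs, such as &lt;code&gt;String.getBytes()&lt;/code&gt; on JDKs before 18, fall back to the platform default charset, and relying on that can lead to different behaviour between local development and production.&lt;/p&gt;

&lt;p&gt;Specifying &lt;code&gt;UTF-8&lt;/code&gt; in the code makes it obvious that the request is encoded the same way in every environment.&lt;/p&gt;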
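
&lt;p&gt;A minimal sketch of why the charset is worth pinning: the same string turns into different bytes depending on the charset used to encode it. The class and method names are made up for the illustration.&lt;/p&gt;

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: one string, three encodings.
final class CharsetDemo {

    // Number of bytes the string occupies once encoded with the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    // True when the string survives an encode/decode round trip unchanged.
    static boolean roundTrips(String s, Charset charset) {
        return new String(s.getBytes(charset), charset).equals(s);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to keep the source ASCII
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 5
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(roundTrips(text, StandardCharsets.US_ASCII));      // false
    }
}
```

&lt;p&gt;If the sender and the receiver disagree about which of these encodings is in use, non-ASCII input text is corrupted before it ever reaches the model.&lt;/p&gt;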

&lt;p&gt;The method returns the raw &lt;code&gt;HTTP&lt;/code&gt; response instead of parsing it immediately. This keeps responsibilities separate. The request method only sends the request. Validation and parsing happen in the next steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Validating and parsing the response&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Not every API response is a success. The response status needs to be verified before anything is parsed, and the parsing itself has subtle failure modes worth understanding.&lt;/p&gt;

&lt;p&gt;A typical response from the embeddings API looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;
    &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s"&gt;"embedding"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.023&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.181&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.442&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...],&lt;/span&gt;
      &lt;span class="s"&gt;"index"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;],&lt;/span&gt;
  &lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"text-embedding-3-small"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"usage"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"total_tokens"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first step is to verify that the request actually succeeded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;validateResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HttpResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RuntimeException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"OpenAI embeddings failed: HTTP "&lt;/span&gt;
                        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" body="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of checking for a single status code, the method verifies that the response is in the 2xx range.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integer division keeps only the hundreds digit, so this condition catches any non-2xx response with one comparison.&lt;/p&gt;

&lt;p&gt;This includes rate limits, server errors, and invalid requests, all of which should stop the embedding process immediately.&lt;/p&gt;
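
&lt;p&gt;The integer-division check can be tried in isolation. This is a small sketch; the class &lt;code&gt;StatusCheck&lt;/code&gt; and the method &lt;code&gt;isSuccess&lt;/code&gt; are hypothetical names, and only the expression itself comes from &lt;code&gt;validateResponse&lt;/code&gt;.&lt;/p&gt;

```java
// Hypothetical helper wrapping the same arithmetic used in validateResponse.
final class StatusCheck {

    // Integer division drops the last two digits: 204 / 100 == 2, 429 / 100 == 4.
    static boolean isSuccess(int statusCode) {
        return statusCode / 100 == 2;
    }

    public static void main(String[] args) {
        int[] codes = {200, 204, 400, 429, 500};
        for (int code : codes) {
            System.out.println(code + " success=" + isSuccess(code));
        }
    }
}
```

&lt;p&gt;Only 200 and 204 pass in this sample; 400, 429, and 500 are all rejected by the same comparison.&lt;/p&gt;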

&lt;p&gt;Once the response is known to be valid, the next step is to extract the vector.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="nf"&gt;parseEmbedding&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;responseBody&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;JsonNode&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readTree&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responseBody&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"data"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"embedding"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;()];&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;asDouble&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingUtils&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;l2Normalized&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The parsing code uses &lt;code&gt;path()&lt;/code&gt; instead of &lt;code&gt;get()&lt;/code&gt; for most lookups — and the difference matters.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;path()&lt;/code&gt; returns a &lt;code&gt;MissingNode&lt;/code&gt; if the field does not exist, while &lt;code&gt;get()&lt;/code&gt; would return &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

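&lt;p&gt;This avoids null pointer exceptions for missing fields and makes the parsing code more predictable when the response structure changes. The one exception is &lt;code&gt;get(0)&lt;/code&gt;: if &lt;code&gt;data&lt;/code&gt; is missing or empty it returns &lt;code&gt;null&lt;/code&gt;, so the chained &lt;code&gt;path("embedding")&lt;/code&gt; call would still throw. Using &lt;code&gt;path(0)&lt;/code&gt; would close that gap.&lt;/p&gt;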

&lt;p&gt;The values are read as doubles and then cast to float.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;asDouble&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Jackson parses &lt;code&gt;JSON&lt;/code&gt; floating-point numbers as double by default. Reading each value with &lt;code&gt;asDouble()&lt;/code&gt; keeps that full precision until the single explicit cast to float, which is the type expected by pgvector.&lt;/p&gt;
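
&lt;p&gt;The cast from double to float is a narrowing conversion, so a little precision is lost at that one step. The sketch below measures how much, using hypothetical helper names; it is an illustration of the cast, not part of the client.&lt;/p&gt;

```java
// Hypothetical demo of the double-to-float narrowing used in parseEmbedding.
final class FloatCastDemo {

    // The same explicit cast applied to each embedding component.
    static float toFloat(double value) {
        return (float) value;
    }

    // Absolute difference between the original double and its float version.
    static double roundTripError(double value) {
        return Math.abs((double) toFloat(value) - value);
    }

    public static void main(String[] args) {
        // 0.5 is exactly representable as a float, so nothing is lost.
        System.out.println(roundTripError(0.5));
        // A value with many decimal digits loses a small amount of precision.
        System.out.println(roundTripError(0.123456789012345));
    }
}
```

&lt;p&gt;A float keeps roughly seven significant decimal digits, which is why the error for the second value is tiny but not zero.&lt;/p&gt;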

&lt;p&gt;The vector is not returned directly after parsing — it passes through one more step first.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;L2 normalization: what it is and why it matters&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Normalization is the final step before the vector is returned.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="nf"&gt;l2Normalized&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;sumOfSquares&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sumOfSquares&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;norm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sqrt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sumOfSquares&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norm&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;clone&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;norm&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In geometric terms, this moves every vector onto the surface of a unit sphere.&lt;/p&gt;

&lt;p&gt;After normalization every vector has a magnitude of 1, so a comparison between two vectors depends only on their direction in the embedding space.&lt;/p&gt;

&lt;p&gt;This matters because similarity search uses cosine distance, and for unit vectors cosine similarity and the dot product are the same number.&lt;/p&gt;

&lt;p&gt;Cosine similarity compares the angle between two vectors, not their length. A raw dot product does not make that distinction: if vectors are not normalized, vectors with a larger magnitude can produce larger dot products even when the meaning is not closer.&lt;/p&gt;

&lt;p&gt;Without normalization, two documents about the same topic whose vectors differ in magnitude can score differently against the same query under a dot product. Not because one is more relevant, but because its vector is longer.&lt;/p&gt;

&lt;p&gt;Normalization removes this length bias and makes similarity depend only on semantic direction.&lt;/p&gt;
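
&lt;p&gt;A worked example may help here. The sketch below uses two made-up two-dimensional vectors that point in the same direction but differ in magnitude, and compares their dot products against a query before and after normalization. The class name, the &lt;code&gt;dot&lt;/code&gt; helper, and the sample vectors are assumptions for the illustration; &lt;code&gt;l2Normalized&lt;/code&gt; repeats the algorithm shown above so the example is self-contained.&lt;/p&gt;

```java
// Hypothetical demo: the effect of L2 normalization on dot-product scores.
final class NormalizationDemo {

    // Same algorithm as the l2Normalized method shown earlier in this section.
    static float[] l2Normalized(float[] v) {
        double sumOfSquares = 0.0;
        for (float f : v) sumOfSquares += (double) f * f;

        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) return v.clone();

        float[] out = new float[v.length];
        for (int i = 0; i != v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    // Plain dot product, accumulated in double.
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i != a.length; i++) sum += (double) a[i] * b[i];
        return sum;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] smallDoc = {3f, 4f}; // magnitude 5
        float[] largeDoc = {6f, 8f}; // same direction, magnitude 10

        // Raw dot products differ only because of magnitude.
        System.out.println(dot(query, smallDoc)); // 3.0
        System.out.println(dot(query, largeDoc)); // 6.0

        // After normalization both documents get the same score.
        float[] q = l2Normalized(query);
        System.out.println(dot(q, l2Normalized(smallDoc)) == dot(q, l2Normalized(largeDoc))); // true
    }
}
```

&lt;p&gt;The larger vector scores twice as high before normalization even though both point the same way; afterwards the two scores are identical.&lt;/p&gt;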

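&lt;p&gt;The method also handles the edge case where the vector's norm is zero (an all-zero vector), which would otherwise fill the result with &lt;code&gt;NaN&lt;/code&gt; values.&lt;br&gt;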
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norm&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;clone&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Returning a &lt;code&gt;clone&lt;/code&gt; instead of the original array prevents the caller from accidentally mutating the input.&lt;/p&gt;

&lt;p&gt;OpenAI's recent embedding models, including &lt;code&gt;text-embedding-3-small&lt;/code&gt;, already return normalized vectors. The explicit normalization here is defensive.&lt;/p&gt;

&lt;p&gt;It guarantees correct behaviour even if the model changes later, and it documents the assumption directly in code instead of relying on external behaviour.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why the embedding layer is behind an interface&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When the interface was introduced in an earlier section, the implementation behind it was simple. Now that the full implementation is visible (&lt;code&gt;HTTP&lt;/code&gt; requests, response validation, parsing, and normalization), the value of keeping all of that behind a single method becomes clearer.&lt;/p&gt;

&lt;p&gt;A mock implementation can return a fixed vector without making an &lt;code&gt;HTTP&lt;/code&gt; call, which allows the service layer to be tested without depending on the external API.&lt;/p&gt;

&lt;p&gt;This separation may look unnecessary when the system is small, but it becomes important as soon as the embedding logic grows.&lt;/p&gt;

&lt;p&gt;Keeping the &lt;code&gt;HTTP&lt;/code&gt; handling, response validation, parsing, and normalization behind a single method prevents those details from leaking into the rest of the application.&lt;/p&gt;
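
&lt;p&gt;A minimal sketch of that idea, assuming an interface with a single &lt;code&gt;embed&lt;/code&gt; method. The names &lt;code&gt;EmbeddingClient&lt;/code&gt;, &lt;code&gt;FixedEmbeddingClient&lt;/code&gt;, and &lt;code&gt;IndexingService&lt;/code&gt; are hypothetical and may not match the ones used in the repository, and joining title and content with a newline is only an assumption about what gets embedded.&lt;/p&gt;

```java
// Hypothetical names: EmbeddingClient, FixedEmbeddingClient and IndexingService
// are illustrative and may differ from the interface used in the series.
interface EmbeddingClient {
    float[] embed(String text);
}

// Test double: returns a fixed vector and never makes an HTTP call.
class FixedEmbeddingClient implements EmbeddingClient {
    private final float[] vector;
    int calls = 0;

    FixedEmbeddingClient(float[] vector) {
        this.vector = vector.clone();
    }

    @Override
    public float[] embed(String text) {
        calls++;
        return vector.clone();
    }
}

// A service that depends only on the interface, not on the HTTP client.
class IndexingService {
    private final EmbeddingClient client;

    IndexingService(EmbeddingClient client) {
        this.client = client;
    }

    // Assumption: title and content are joined before embedding.
    float[] embedDocument(String title, String content) {
        return client.embed(title + "\n" + content);
    }
}

class MockDemo {
    public static void main(String[] args) {
        FixedEmbeddingClient mock = new FixedEmbeddingClient(new float[] {0.6f, 0.8f});
        IndexingService service = new IndexingService(mock);
        float[] result = service.embedDocument("Refund policy", "Refunds take five days.");
        System.out.println(result.length + " dimensions, " + mock.calls + " call(s)");
    }
}
```

&lt;p&gt;The service never learns whether the vector came from a network call or from a fixture, which is the point of the boundary.&lt;/p&gt;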

&lt;h3&gt;
  
  
  &lt;strong&gt;What's Next&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Part 4 moves up one level to the service layer — where everything built so far is orchestrated into a complete API.&lt;/p&gt;

&lt;p&gt;See you in Part 4!&lt;/p&gt;

</description>
      <category>java</category>
      <category>openai</category>
      <category>postgressql</category>
      <category>springboot</category>
    </item>
    <item>
      <title>Building a Semantic Search API with Spring Boot and pgvector - Part 2: Designing the PostgreSQL Schema</title>
      <dc:creator>Ozioma Ochin</dc:creator>
      <pubDate>Sun, 15 Mar 2026 19:41:25 +0000</pubDate>
      <link>https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-2-designing-the-postgresql-2jlb</link>
      <guid>https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-2-designing-the-postgresql-2jlb</guid>
      <description>&lt;h2&gt;
  
  
  Why the database layer matters
&lt;/h2&gt;

&lt;p&gt;In a semantic search system, the database schema isn’t just storage.&lt;/p&gt;

&lt;p&gt;It defines how embeddings are stored, indexed, and queried.&lt;/p&gt;

&lt;p&gt;Many tutorials treat the database as a detail: create a table, add a vector column, and move on.&lt;/p&gt;

&lt;p&gt;But when search quality depends on how vectors are stored and compared, the schema becomes a core architectural decision.&lt;/p&gt;

&lt;p&gt;The schema determines what the system can do and what it cannot.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A missing index means slow queries at scale.&lt;/li&gt;
&lt;li&gt;A missing status column means no visibility into embedding failures.&lt;/li&gt;
&lt;li&gt;A poorly typed metadata column means filters that silently break.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every column and every index in this schema exists because a specific part of the system depends on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running pgvector locally
&lt;/h2&gt;

&lt;p&gt;Before any migrations run, the database needs to support vector operations. That means PostgreSQL with the pgvector extension installed.&lt;/p&gt;

&lt;p&gt;Using pgvector lets us keep embeddings in the same database as the documents.&lt;/p&gt;

&lt;p&gt;This avoids the complexity of running a separate vector store.&lt;/p&gt;

&lt;p&gt;For this project, the goal is simplicity and consistency, not maximum scale.&lt;/p&gt;

&lt;p&gt;Keeping everything in PostgreSQL makes the system easier to reason about.&lt;/p&gt;

&lt;p&gt;There’s no Pinecone account to manage, no separate service to keep in sync, and no eventual consistency between documents and embeddings.&lt;/p&gt;

&lt;p&gt;Everything lives in one place and can be written in a single transaction.&lt;/p&gt;

&lt;p&gt;The local setup uses Docker with the official pgvector image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgvector/pgvector:pg16&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;semantic_search_postgres&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;semantic_search&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;semantic&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;semantic&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5432:5432"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
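      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pgdata:/var/lib/postgresql/data&lt;/span&gt;

&lt;span class="c1"&gt;# Named volumes must be declared at the top level&lt;/span&gt;
&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pgdata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;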
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The important line is &lt;code&gt;pgvector/pgvector:pg16&lt;/code&gt; instead of the standard &lt;code&gt;postgres:16&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This image ships with the pgvector extension pre-installed.&lt;/p&gt;

&lt;p&gt;No manual compilation, no OS-specific setup step.&lt;/p&gt;

&lt;p&gt;Pull the image and the extension is ready.&lt;/p&gt;

&lt;p&gt;The project includes two compose files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker-compose_dev.yml&lt;/code&gt; runs only the database, which is useful when running the Spring Boot app from IntelliJ.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker-compose.yml&lt;/code&gt; runs the full stack.&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;The full source code including all the migrations is available on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Flyway works in this project
&lt;/h2&gt;

&lt;p&gt;The schema in this project didn't arrive fully formed.&lt;/p&gt;

&lt;p&gt;It evolved over time, and the migrations show exactly how.&lt;/p&gt;

&lt;p&gt;Instead of writing the final schema all at once, the project builds it step by step through Flyway migrations, just like a real system would.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;V1 creates the foundation.&lt;/li&gt;
&lt;li&gt;V2 adds document lifecycle tracking.&lt;/li&gt;
&lt;li&gt;V3 fixes a data quality problem that V1 didn't anticipate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each migration represents a decision made at a specific point in the project's history.&lt;/p&gt;

&lt;p&gt;This approach matters for two reasons.&lt;/p&gt;

&lt;p&gt;First, reproducibility. Any developer cloning the repository gets the exact same schema by running the application. Flyway applies the migrations in order, tracks which ones have already run in its &lt;code&gt;flyway_schema_history&lt;/code&gt; table, and skips anything that's already been applied.&lt;/p&gt;

&lt;p&gt;Second, safety. Because &lt;code&gt;spring.jpa.hibernate.ddl-auto&lt;/code&gt; is set to &lt;code&gt;validate&lt;/code&gt;, Hibernate will refuse to start if the schema doesn't match the entity definitions.&lt;/p&gt;

&lt;p&gt;Flyway owns the schema.&lt;/p&gt;

&lt;p&gt;Hibernate only validates it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;spring.flyway.enabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;spring.flyway.locations&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;classpath:db/migration&lt;/span&gt;
&lt;span class="py"&gt;spring.jpa.hibernate.ddl-auto&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;validate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The naming convention for migrations also matters.&lt;/p&gt;

&lt;p&gt;Every file follows the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;V{number}__{description}.sql 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two underscores between the version number and the description.&lt;/p&gt;

&lt;p&gt;Flyway uses the version number to determine execution order, and a checksum of each file to detect changes.&lt;/p&gt;

&lt;p&gt;If a migration file is modified after it has already been applied, Flyway refuses to start.&lt;/p&gt;

&lt;p&gt;That constraint is intentional.&lt;/p&gt;

&lt;p&gt;It forces schema changes to go through new migrations instead of editing old ones.&lt;/p&gt;
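
&lt;p&gt;To make the naming rule concrete, here is a rough sketch of how file names of this shape can be parsed and ordered. It is not Flyway's implementation: &lt;code&gt;MigrationNames&lt;/code&gt; is a made-up helper, the file names are hypothetical, and only whole-number versions are handled.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: this is not Flyway's implementation, and it handles
// plain integer versions, not dotted ones.
class MigrationNames {
    // "V", digits, two underscores, a description, ".sql".
    private static final Pattern NAME = Pattern.compile("V(\\d+)__(.+)\\.sql");

    // Returns the numeric version, or -1 when the name does not follow the pattern.
    static int versionOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Sorts file names by numeric version, so V10 comes after V2.
    static String[] inExecutionOrder(String[] fileNames) {
        String[] sorted = fileNames.clone();
        Arrays.sort(sorted, Comparator.comparingInt(MigrationNames::versionOf));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical file names; the real descriptions live in the repository.
        String[] files = {"V10__later_change.sql", "V2__add_status.sql", "V1__init.sql"};
        System.out.println(Arrays.toString(inExecutionOrder(files)));
        System.out.println(versionOf("V3_single_underscore.sql"));
    }
}
```

&lt;p&gt;Sorting by the parsed number rather than by file name is what puts V10 after V2, and a name with a single underscore is not matched by this pattern at all.&lt;/p&gt;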

&lt;p&gt;This project ends up with &lt;strong&gt;three&lt;/strong&gt; migrations.&lt;/p&gt;

&lt;p&gt;The first migration builds the entire foundation (table, indexes, and trigger) in a single SQL file.&lt;/p&gt;

&lt;h2&gt;
  
  
  V1: Building the foundation
&lt;/h2&gt;

&lt;p&gt;The schema is designed around how the search queries will run, not just how the data is stored.&lt;/p&gt;

&lt;p&gt;Every column in V1 exists because a specific part of the system depends on it.&lt;/p&gt;

&lt;p&gt;Rather than showing the full migration at once, each part is broken down and explained in the order it appears in the file: extension first, table second, indexes third, and trigger last.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line has to come first. PostgreSQL does not support the VECTOR type by default, so the rest of the migration would fail without it.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;IF NOT EXISTS&lt;/code&gt; clause also makes the migration safer. If the extension is already installed in a local environment, CI database, or shared dev database, Flyway can still run the migration without error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;       &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;    &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;   &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each column exists for a reason:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;id BIGSERIAL PRIMARY KEY&lt;/code&gt;&lt;br&gt;
A standard auto-incrementing identifier. BIGSERIAL is used instead of SERIAL to avoid running out of IDs in larger datasets.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;title TEXT NOT NULL&lt;/code&gt; and &lt;code&gt;content TEXT NOT NULL&lt;/code&gt;&lt;br&gt;
These are the fields that get embedded. &lt;code&gt;TEXT&lt;/code&gt; is used instead of &lt;code&gt;VARCHAR&lt;/code&gt; because PostgreSQL handles &lt;code&gt;TEXT&lt;/code&gt; efficiently, and a hard length limit would be artificial here.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;metadata JSONB&lt;/code&gt;&lt;br&gt;
Optional metadata for filtering. &lt;code&gt;JSONB&lt;/code&gt; is used instead of &lt;code&gt;JSON&lt;/code&gt; because it is faster to query and supports &lt;code&gt;GIN&lt;/code&gt; indexing.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;embedding VECTOR(1536)&lt;/code&gt;&lt;br&gt;
The vector representation of the document. 1536 matches the output size of OpenAI’s text-embedding-3-small model. If the model changes, this column definition would also need to change.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;created_at TIMESTAMPTZ NOT NULL DEFAULT now()&lt;/code&gt;&lt;br&gt;
Stores when the row was created. The database sets it automatically.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;updated_at TIMESTAMPTZ NOT NULL DEFAULT now()&lt;/code&gt;&lt;br&gt;
Stores when the row was last updated. This value is maintained by a trigger so it stays correct no matter how the row is modified.&lt;/p&gt;
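
&lt;p&gt;Because the &lt;code&gt;embedding&lt;/code&gt; column is fixed at 1536 dimensions, a vector of any other size is rejected by the database at insert time. One way to fail earlier is to check the size in application code before building the parameter. The sketch below assumes the embedding is passed in pgvector's text form and cast to &lt;code&gt;vector&lt;/code&gt; in SQL; &lt;code&gt;VectorLiterals&lt;/code&gt; and &lt;code&gt;EXPECTED_DIMENSIONS&lt;/code&gt; are illustrative names, not part of the project.&lt;/p&gt;

```java
// Illustrative sketch: VectorLiterals and EXPECTED_DIMENSIONS are assumed names.
class VectorLiterals {
    // Must match the VECTOR(1536) column definition.
    static final int EXPECTED_DIMENSIONS = 1536;

    // Formats an embedding in pgvector's text form, for example [0.1,0.2,0.3].
    static String toLiteral(float[] embedding) {
        if (embedding.length != EXPECTED_DIMENSIONS) {
            throw new IllegalArgumentException(
                "expected " + EXPECTED_DIMENSIONS + " dimensions but got " + embedding.length);
        }
        StringBuilder sb = new StringBuilder("[");
        String separator = "";
        for (float value : embedding) {
            sb.append(separator).append(value);
            separator = ",";
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        float[] embedding = new float[EXPECTED_DIMENSIONS];
        embedding[0] = 0.5f;
        String literal = toLiteral(embedding);
        System.out.println(literal.substring(0, 12) + "...");

        // A vector of the wrong size is rejected before it reaches the database.
        try {
            toLiteral(new float[3]);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

&lt;p&gt;If the embedding model changes, the constant and the column definition have to change together.&lt;/p&gt;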

&lt;p&gt;&lt;strong&gt;The indexes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;V1 adds three indexes, each supporting a different access pattern.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;created_at index&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_documents_created_at&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a standard B-tree index for queries that sort documents by creation time.&lt;/p&gt;

&lt;p&gt;That is useful for admin pages, auditing, and any endpoint that lists recently created documents.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;DESC&lt;/code&gt; ordering matches the most common query pattern, so PostgreSQL does not need to sort the results separately.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Metadata GIN index&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_documents_metadata_gin&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This index supports metadata filtering.&lt;/p&gt;

&lt;p&gt;Because metadata is stored as &lt;code&gt;JSONB&lt;/code&gt;, PostgreSQL can use a &lt;code&gt;GIN&lt;/code&gt; index to search inside the &lt;code&gt;JSON&lt;/code&gt; structure efficiently. Without this index, metadata filters would require a full table scan.&lt;/p&gt;
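
&lt;p&gt;As an illustration of the kind of filter this index can serve, the sketch below builds a containment query with the &lt;code&gt;@&amp;gt;&lt;/code&gt; operator, which the default GIN operator class for &lt;code&gt;JSONB&lt;/code&gt; supports. It assumes a plain JDBC-style placeholder; &lt;code&gt;MetadataFilters&lt;/code&gt; is a hypothetical helper, the &lt;code&gt;category&lt;/code&gt; key is only an example, and the quoting covers backslashes and double quotes but not control characters.&lt;/p&gt;

```java
// Illustrative sketch: MetadataFilters is an assumed helper, and the query text is
// an example, not the article's repository code.
class MetadataFilters {
    // The containment operator @> can be served by a default GIN index on JSONB.
    static final String SQL = "SELECT id, title FROM documents WHERE metadata @> ?::jsonb";

    // Builds a one-key JSON object such as {"category":"billing"} for the parameter.
    static String containment(String key, String value) {
        return "{" + quote(key) + ":" + quote(value) + "}";
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(SQL);
        System.out.println(containment("category", "billing"));
    }
}
```

&lt;p&gt;Containment only matches when the stored value is a JSON object with that key, so the shape of the stored metadata matters as much as the index.&lt;/p&gt;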

&lt;p&gt;&lt;code&gt;embedding IVFFlat index&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;documents_embedding_ivfflat_idx&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the most important index in the migration.&lt;/p&gt;

&lt;p&gt;Without it, vector similarity search would require comparing the query embedding against every stored embedding in the table.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ivfflat&lt;/code&gt; is one of pgvector’s approximate nearest-neighbour index types. It improves speed by grouping vectors into clusters and searching only the clusters closest to the query vector.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That comes with a tradeoff: slightly lower recall in exchange for much faster queries.&lt;/em&gt;&lt;/p&gt;

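&lt;p&gt;The &lt;code&gt;lists = 100&lt;/code&gt; setting controls how many clusters are created. For a small dataset, 100 is a reasonable starting point. IVFFlat computes its clusters from the rows that exist when the index is built, so an index created on the empty table in V1 should be rebuilt once representative data has been loaded, and the value of &lt;code&gt;lists&lt;/code&gt; revisited as the dataset grows.&lt;/p&gt;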
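
&lt;p&gt;For sizing, pgvector's README suggests starting points rather than fixed rules: about rows / 1000 lists for tables of up to one million rows, the square root of the row count above that, and roughly the square root of &lt;code&gt;lists&lt;/code&gt; for the &lt;code&gt;ivfflat.probes&lt;/code&gt; query setting. The helper below simply encodes those suggestions; &lt;code&gt;IvfflatTuning&lt;/code&gt; is an assumed name, and the results should be checked against real recall and latency measurements.&lt;/p&gt;

```java
// Illustrative sketch: IvfflatTuning is an assumed helper name. The formulas follow the
// starting points suggested in pgvector's README and should be tuned against real queries.
class IvfflatTuning {
    // rows / 1000 for up to one million rows, sqrt(rows) above that, never below 1.
    static int suggestedLists(long rows) {
        double lists = rows > 1_000_000L ? Math.sqrt(rows) : rows / 1000.0;
        return (int) Math.max(1, Math.round(lists));
    }

    // sqrt(lists) as a starting value for the ivfflat.probes setting.
    static int suggestedProbes(int lists) {
        return (int) Math.max(1, Math.round(Math.sqrt(lists)));
    }

    public static void main(String[] args) {
        long[] rowCounts = {5_000L, 100_000L, 4_000_000L};
        for (long rows : rowCounts) {
            int lists = suggestedLists(rows);
            System.out.println(rows + " rows: lists=" + lists + ", probes=" + suggestedProbes(lists));
        }
    }
}
```

&lt;p&gt;By this rule of thumb, &lt;code&gt;lists = 100&lt;/code&gt; corresponds to a table of roughly 100,000 rows.&lt;/p&gt;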

&lt;p&gt;The &lt;code&gt;vector_cosine_ops&lt;/code&gt; operator class is also important. It tells PostgreSQL to optimize the index for cosine distance, which must match the operator used in the query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Search Request
     ↓
status = 'READY'
     ↓
metadata filters
     ↓
embedding &amp;lt;=&amp;gt; query_vector
     ↓
ranked results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The schema and indexes are designed around the path a search query will take through the table.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trigger&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;set_updated_at&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;trg_documents_updated_at&lt;/span&gt;
&lt;span class="k"&gt;BEFORE&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;EACH&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;set_updated_at&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that updated_at is refreshed automatically every time a document row is updated.&lt;/p&gt;

&lt;p&gt;Handling this in the database is more reliable than doing it in the service layer. Even if someone updates the row through raw SQL, the timestamp remains correct.&lt;/p&gt;

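&lt;p&gt;That matters because a change to a document's content may mean the stored embedding is now stale. An accurate &lt;code&gt;updated_at&lt;/code&gt; shows when the row last changed, which is the starting point for deciding whether the embedding needs to be regenerated.&lt;/p&gt;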

&lt;p&gt;V1 is the foundation.&lt;/p&gt;

&lt;p&gt;Everything in V2 and V3 builds on top of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  V2: Adding the document lifecycle
&lt;/h2&gt;

&lt;p&gt;Once embeddings are generated, the database needs to track the state of each document.&lt;/p&gt;

&lt;p&gt;Embedding is not instant. It depends on an external API call, which can &lt;strong&gt;fail&lt;/strong&gt;, &lt;strong&gt;time out&lt;/strong&gt;, or &lt;strong&gt;hit rate limits&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Without a way to track embedding state, a document could exist in the database but never appear in search results, with no clear explanation why.&lt;/p&gt;

&lt;p&gt;V2 introduces a simple lifecycle model so the system always knows whether a document is searchable.&lt;/p&gt;

&lt;p&gt;This migration adds three columns and one index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;embedding_error&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;embedding_updated_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_documents_status_created&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why lifecycle tracking is needed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Embedding is performed after the document is stored, not at the same time.&lt;/p&gt;

&lt;p&gt;That means a document can exist in several states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;saved but not embedded yet.&lt;/li&gt;
&lt;li&gt;successfully embedded.&lt;/li&gt;
&lt;li&gt;failed to embed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a status column, the system cannot tell these cases apart.&lt;/p&gt;

&lt;p&gt;A failed embedding would simply result in a document that never shows up in search, which makes debugging difficult.&lt;/p&gt;

&lt;p&gt;V2 makes document state explicit in the schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The new columns&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;status&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This column tracks where a document is in its lifecycle.&lt;/p&gt;

&lt;p&gt;The system uses three values:&lt;/p&gt;

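&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PENDING&lt;/code&gt;: document saved, embedding not generated yet.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;READY&lt;/code&gt;: embedding generated successfully.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FAILED&lt;/code&gt;: embedding request failed.&lt;/li&gt;
&lt;/ul&gt;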

&lt;p&gt;&lt;em&gt;The default value is READY.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This might look strange at first, but it keeps the migration safe for existing rows. When the migration runs on a database that already has documents, those rows need a valid status value.&lt;/p&gt;

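&lt;p&gt;Using &lt;code&gt;READY&lt;/code&gt; assumes existing rows already have embeddings. That keeps documents stored before V2 searchable, but any older row without an embedding would be labelled &lt;code&gt;READY&lt;/code&gt; as well.&lt;/p&gt;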

&lt;p&gt;New documents created after this migration are explicitly set to PENDING by the application before embedding runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PENDING
   ↓
 READY

PENDING
   ↓
FAILED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Once embedding becomes a separate step, document state must become part of the schema.&lt;/em&gt;&lt;/p&gt;
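
&lt;p&gt;On the application side, the same lifecycle can be sketched as an enum that only allows the two transitions shown above. The type and method names are assumptions for illustration, and a retry path out of &lt;code&gt;FAILED&lt;/code&gt; is deliberately left out because it is not described here.&lt;/p&gt;

```java
// Illustrative sketch: the enum and method names are assumptions. Only the two
// transitions described in the text are modelled; retries are left out.
enum DocumentStatus {
    PENDING, READY, FAILED;

    // A document may only leave PENDING, and only towards READY or FAILED.
    boolean canTransitionTo(DocumentStatus next) {
        if (this != PENDING) {
            return false;
        }
        return next == READY || next == FAILED;
    }
}

class LifecycleDemo {
    // Applies a transition or rejects it with a descriptive error.
    static DocumentStatus transition(DocumentStatus current, DocumentStatus next) {
        if (!current.canTransitionTo(next)) {
            throw new IllegalStateException("cannot move from " + current + " to " + next);
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(transition(DocumentStatus.PENDING, DocumentStatus.READY));
        System.out.println(transition(DocumentStatus.PENDING, DocumentStatus.FAILED));
        try {
            transition(DocumentStatus.READY, DocumentStatus.PENDING);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

&lt;p&gt;Keeping the rule in one place means an invalid status change fails loudly in the application instead of silently writing an unexpected value to the &lt;code&gt;status&lt;/code&gt; column.&lt;/p&gt;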

&lt;p&gt;&lt;strong&gt;embedding_error&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;embedding_error TEXT&lt;/code&gt; stores the error message when embedding fails.&lt;/p&gt;

&lt;p&gt;Most documents will never use this column, so it is nullable.&lt;/p&gt;

&lt;p&gt;When a document is in &lt;code&gt;FAILED&lt;/code&gt; state, this field makes debugging much easier. Instead of searching through logs, the failure reason is visible directly in the database.&lt;/p&gt;

&lt;p&gt;The error stored here might be a network timeout, a rate-limited response, or an unexpected payload: whatever the API returned at the time of failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;embedding_updated_at&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;embedding_updated_at TIMESTAMPTZ&lt;/code&gt; stores the last time the embedding was generated.&lt;/p&gt;

&lt;p&gt;This is different from &lt;code&gt;updated_at&lt;/code&gt;, which tracks when the document row changes.&lt;/p&gt;

&lt;p&gt;This column makes it possible to implement retry logic later.&lt;/p&gt;

&lt;p&gt;For example, a background job could look for documents where:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'FAILED'&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding_updated_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'1 hour'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



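&lt;p&gt;and retry the embedding only for older failures. For this to work, the application also has to set &lt;code&gt;embedding_updated_at&lt;/code&gt; when an attempt fails, because a row where the column is still &lt;code&gt;NULL&lt;/code&gt; never satisfies the comparison.&lt;/p&gt;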

&lt;p&gt;This avoids retrying the same document repeatedly in a tight loop.&lt;/p&gt;
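
&lt;p&gt;The same rule can be expressed in application code, for example to unit test the retry window without a database. This is a sketch under assumptions: &lt;code&gt;RetryPolicy&lt;/code&gt; and &lt;code&gt;isRetryDue&lt;/code&gt; are hypothetical names, the one-hour window is taken from the query above, and the timestamp is treated as the time of the last attempt.&lt;/p&gt;

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch: RetryPolicy and its parameters are assumed names that mirror
// the SQL condition in Java.
class RetryPolicy {
    static final Duration RETRY_AFTER = Duration.ofHours(1);

    // True when a FAILED document's last attempt is older than the retry window.
    // A null timestamp never matches, just like a NULL comparison in SQL.
    static boolean isRetryDue(String status, Instant embeddingUpdatedAt, Instant now) {
        if (!"FAILED".equals(status) || embeddingUpdatedAt == null) {
            return false;
        }
        return embeddingUpdatedAt.isBefore(now.minus(RETRY_AFTER));
    }

    public static void main(String[] args) {
        // A fixed clock keeps the example deterministic.
        Instant now = Instant.parse("2026-03-15T12:00:00Z");
        System.out.println(isRetryDue("FAILED", now.minus(Duration.ofHours(2)), now));
        System.out.println(isRetryDue("FAILED", now.minus(Duration.ofMinutes(5)), now));
        System.out.println(isRetryDue("READY", now.minus(Duration.ofHours(2)), now));
        System.out.println(isRetryDue("FAILED", null, now));
    }
}
```

&lt;p&gt;With the fixed clock in &lt;code&gt;main&lt;/code&gt;, only the first call prints &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;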

&lt;p&gt;&lt;strong&gt;The composite index&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_documents_status_created&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This index supports two important query patterns.&lt;/p&gt;

&lt;p&gt;First, search queries filter on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;before performing vector similarity search.&lt;/p&gt;

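&lt;p&gt;Without an index on &lt;code&gt;status&lt;/code&gt;, PostgreSQL has no cheap way to narrow the table down to searchable rows. How the planner combines this filter with the vector index depends on how selective the filter is, so it is worth checking the plan with &lt;code&gt;EXPLAIN&lt;/code&gt;.&lt;/p&gt;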

&lt;p&gt;Second, admin queries often need to list documents by status, ordered by creation time.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;newest failed documents.&lt;/li&gt;
&lt;li&gt;newest pending documents.&lt;/li&gt;
&lt;li&gt;newest ready documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The column order in the index matters.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;status&lt;/code&gt; comes first because it is used for filtering.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;created_at DESC&lt;/code&gt; comes second because it is used for sorting.&lt;/p&gt;

&lt;p&gt;With this order, PostgreSQL can use the same index for both filtering and ordering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why V2 changes the schema&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once embedding becomes a separate step, document state becomes part of the database model.&lt;/p&gt;

&lt;p&gt;This is a good example of why schemas evolve.&lt;/p&gt;

&lt;p&gt;The first version only needed to store documents.&lt;/p&gt;

&lt;p&gt;The second version needs to describe their lifecycle.&lt;/p&gt;

&lt;p&gt;And that change belongs in the database, not just in the application.&lt;/p&gt;

&lt;h2&gt;
  
  
  V3: Fixing bad data
&lt;/h2&gt;

&lt;p&gt;Real systems rarely get the schema right the first time.&lt;/p&gt;

&lt;p&gt;V3 is different from previous migrations. It doesn't add columns or create indexes.&lt;/p&gt;

&lt;p&gt;Instead, it fixes data that was stored incorrectly in an earlier version of the service.&lt;/p&gt;

&lt;p&gt;Before V3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"category=billing"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After V3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;metadata&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"raw"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"category=billing"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This migration exists because the application originally allowed metadata to be saved as a &lt;code&gt;JSON&lt;/code&gt; string instead of a &lt;code&gt;JSON&lt;/code&gt; object.&lt;/p&gt;

&lt;p&gt;That turned out to be a problem later when filtering was added.&lt;/p&gt;

&lt;p&gt;This is a data migration, not a schema migration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'raw'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="err"&gt;  &lt;/span&gt;&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;jsonb_typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What went wrong&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The metadata column is defined as &lt;code&gt;JSONB&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That means PostgreSQL will accept any valid &lt;code&gt;JSON&lt;/code&gt; value: an object, array, string, number, boolean, or null.&lt;/p&gt;

&lt;p&gt;The application’s filter logic assumes metadata is always a &lt;code&gt;JSON&lt;/code&gt; object, so it can query values using operators like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'category'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But earlier versions of the service allowed values like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"category=billing"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is valid &lt;code&gt;JSON&lt;/code&gt;, but it is not an object.&lt;/p&gt;

&lt;p&gt;Once filters were added, these rows stopped working correctly.&lt;/p&gt;

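&lt;p&gt;Queries expecting key-value pairs would fail or return incorrect results. For example, &lt;code&gt;metadata -&amp;gt;&amp;gt; 'category'&lt;/code&gt; evaluates to &lt;code&gt;NULL&lt;/code&gt; when &lt;code&gt;metadata&lt;/code&gt; is a JSON string, so such a row can never match a category filter.&lt;/p&gt;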

&lt;p&gt;Later versions of the API added validation to prevent this, but by then some bad data already existed in the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the migration does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The migration finds rows where metadata is stored as a string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;jsonb_typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and converts them into a valid &lt;code&gt;JSON&lt;/code&gt; object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'raw'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="s2"&gt;"category=billing"&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"raw"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"category=billing"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach keeps the original value instead of deleting it.&lt;/p&gt;

&lt;p&gt;The document stays intact, and the metadata becomes valid &lt;code&gt;JSON&lt;/code&gt; that the filter system can handle.&lt;/p&gt;

&lt;p&gt;If needed, a developer can still inspect the original value inside the &lt;code&gt;raw&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this migration matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This migration highlights something that happens in almost every real system:&lt;/p&gt;

&lt;p&gt;Migrations do not only add features, they also repair data.&lt;/p&gt;

&lt;p&gt;Nothing new was added to the schema, but the data became consistent again.&lt;/p&gt;

&lt;p&gt;This kind of migration is common in production systems. The important part is not avoiding mistakes, it is fixing them safely without losing information.&lt;/p&gt;

&lt;p&gt;V3 is a record of a real problem that existed, and the decision made to correct it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Complete Schema
&lt;/h2&gt;

&lt;p&gt;After all three migrations, the schema is now complete.&lt;/p&gt;

&lt;p&gt;Here is the final table definition after V1, V2, and V3 have all been applied.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;embedding_error&lt;/span&gt;&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;embedding_updated_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="err"&gt;    &lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing in this table is accidental.&lt;/p&gt;

&lt;p&gt;Every column exists because a specific part of the system needs it.&lt;/p&gt;

&lt;p&gt;The schema now describes not just the data, but the behavior of the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Indexes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each index exists for a specific query pattern — remove any one of them and something in the system either breaks or slows down significantly at scale.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;idx_documents_created_at&lt;/td&gt;
&lt;td&gt;B-tree&lt;/td&gt;
&lt;td&gt;Ordering by creation time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;idx_documents_metadata_gin&lt;/td&gt;
&lt;td&gt;GIN&lt;/td&gt;
&lt;td&gt;Metadata filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;documents_embedding_ivfflat_idx&lt;/td&gt;
&lt;td&gt;IVFFlat&lt;/td&gt;
&lt;td&gt;Vector similarity search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;idx_documents_status_created&lt;/td&gt;
&lt;td&gt;B-tree (composite)&lt;/td&gt;
&lt;td&gt;Status filtering + ordering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;The schema can now store embeddings, but generating them is where things get interesting.&lt;/p&gt;

&lt;p&gt;Part 3 covers the full implementation of the embedding client using Java's built-in &lt;code&gt;HttpClient&lt;/code&gt;, with no third-party SDK.&lt;/p&gt;

&lt;p&gt;It also covers the bugs that are hardest to catch: the ones that don't throw exceptions but silently produce wrong similarity scores.&lt;/p&gt;

&lt;p&gt;See you in &lt;a href="https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-3-the-embedding-layer-1pj0"&gt;Part 3&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>java</category>
      <category>springboot</category>
      <category>postgressql</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Building a Semantic Search API with Spring Boot and pgvector - Part 1: Architecture</title>
      <dc:creator>Ozioma Ochin</dc:creator>
      <pubDate>Sun, 08 Mar 2026 17:52:49 +0000</pubDate>
      <link>https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-1-architecture-58b9</link>
      <guid>https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-1-architecture-58b9</guid>
      <description>&lt;h2&gt;
  
  
  The Problem with Keyword Search
&lt;/h2&gt;

&lt;p&gt;Keyword search breaks more often than most engineers realize.&lt;/p&gt;

&lt;p&gt;A few months ago, I was building an internal document management tool. Users could upload policy documents, product guides, and support articles — and search through them.&lt;/p&gt;

&lt;p&gt;I implemented a simple keyword search, deployed it, and assumed I was done.&lt;/p&gt;

&lt;p&gt;Then the complaints started.&lt;/p&gt;

&lt;p&gt;One support engineer searched for "billing retries" and got zero results. The document absolutely existed. It was titled "Payment Failure Handling Policy" and covered exactly what they were looking for.&lt;/p&gt;

&lt;p&gt;The problem wasn’t the content.&lt;/p&gt;

&lt;p&gt;The problem was the search engine.&lt;/p&gt;

&lt;p&gt;It was doing exactly what keyword search is designed to do: scanning documents for the exact words “billing” and “retries.”&lt;/p&gt;

&lt;p&gt;Those words weren't in the document. So the system concluded there was no match.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Query:    "billing retries"&lt;br&gt;
Document: "Payment Failure Handling Policy"&lt;br&gt;
Keyword search:  ❌ No match   — strings don't overlap&lt;br&gt;
Semantic search: ✅ Strong match — meaning is the same&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is the fundamental limitation of keyword search: it compares strings, not meaning.&lt;/p&gt;
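
&lt;p&gt;To make that failure concrete, here is a minimal Java sketch of naive keyword matching. The class and method names (&lt;code&gt;KeywordMatchSketch&lt;/code&gt;, &lt;code&gt;matchesAllKeywords&lt;/code&gt;) are hypothetical and not taken from the original tool; the sketch only illustrates why a literal word comparison cannot connect the query to the document title.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Locale;

final class KeywordMatchSketch {

    // Hypothetical helper: true only if every query word appears literally in the text.
    static boolean matchesAllKeywords(String query, String text) {
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String word : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!haystack.contains(word)) {
                return false; // one missing word is enough to drop the document
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String title = "Payment Failure Handling Policy";
        System.out.println(matchesAllKeywords("billing retries", title)); // no shared words
        System.out.println(matchesAllKeywords("payment policy", title));  // both words present
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;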

&lt;p&gt;It treats "&lt;strong&gt;car&lt;/strong&gt;" and "&lt;strong&gt;automobile&lt;/strong&gt;" as completely unrelated.&lt;/p&gt;

&lt;p&gt;It sees "help me fix this bug" and "debugging assistance" as different queries. But that’s not how people search in the real world.&lt;/p&gt;

&lt;p&gt;People search using intent, and they rarely phrase a query the same way a document is written.&lt;/p&gt;

&lt;p&gt;Semantic search approaches the problem differently.&lt;/p&gt;

&lt;p&gt;Instead of matching text directly, it attempts to capture the meaning behind the words. To do that, it converts text into numerical representations called embeddings. Before we build the search system itself, we first need to understand what embeddings are and why they work.&lt;/p&gt;

&lt;p&gt;Keyword Search vs Semantic Search:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ofyhde00d38u5ywgmys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ofyhde00d38u5ywgmys.png" alt="Keyword Search vs Semantic Search" width="800" height="1296"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What Are Embeddings?
&lt;/h2&gt;

&lt;p&gt;Embeddings are the core idea behind semantic search. At a high level, an embedding is a numerical representation of text.&lt;/p&gt;

&lt;p&gt;Instead of storing meaning as words, machine learning models convert text into vectors: lists of numbers that capture semantic relationships between pieces of text.&lt;/p&gt;

&lt;p&gt;For example, a sentence like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How do I retry a failed payment?" =&amp;gt; [0.023, -0.181, 0.442, ..., 0.091]
                                        1,536 dimensions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These numbers by themselves don’t mean much to us.&lt;/p&gt;

&lt;p&gt;What matters is how close two vectors are in this space.&lt;/p&gt;

&lt;p&gt;If two pieces of text express similar ideas, their vectors will appear close together in this space.&lt;/p&gt;

&lt;p&gt;If they describe completely different concepts, their vectors will be far apart.&lt;/p&gt;

&lt;p&gt;For example, embeddings for the words: "&lt;strong&gt;car&lt;/strong&gt;", "&lt;strong&gt;automobile&lt;/strong&gt;" and "&lt;strong&gt;vehicle&lt;/strong&gt;" will appear very close together.&lt;/p&gt;

&lt;p&gt;Meanwhile, something unrelated like: "&lt;strong&gt;Banana&lt;/strong&gt;" will be far away from them.&lt;/p&gt;

&lt;p&gt;This is what allows semantic search to work.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;p&gt;Do these documents contain the same words?&lt;/p&gt;

&lt;p&gt;The system asks:&lt;/p&gt;

&lt;p&gt;Are these documents about the same idea?&lt;/p&gt;

&lt;p&gt;That small shift fundamentally changes how search works.&lt;/p&gt;

&lt;p&gt;It allows search engines to retrieve relevant documents even when the wording is completely different.&lt;/p&gt;

&lt;p&gt;Embedding Space:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rx4cuooqujbiiui7dee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rx4cuooqujbiiui7dee.png" alt="Embedding Space" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In practice, modern embedding models produce vectors with hundreds or thousands of dimensions.&lt;/p&gt;

&lt;p&gt;The model used in this project generates vectors with 1536 dimensions, which means every piece of text becomes a point in a 1536-dimensional space.&lt;/p&gt;

&lt;p&gt;While we can't visualize that space directly, distance between vectors can still be measured mathematically.&lt;/p&gt;

&lt;p&gt;That measurement is what allows us to rank documents by semantic similarity.&lt;/p&gt;
&lt;h2&gt;
  
  
  Measuring Semantic Similarity
&lt;/h2&gt;

&lt;p&gt;Once both the query and documents are converted into embeddings, the next question becomes:&lt;/p&gt;

&lt;p&gt;How do we compare them?&lt;/p&gt;

&lt;p&gt;This is where vector similarity comes in.&lt;/p&gt;

&lt;p&gt;A semantic search system measures how close two vectors are to each other in the embedding space.&lt;/p&gt;

&lt;p&gt;If two vectors point in nearly the same direction, the underlying text likely expresses the same idea.&lt;/p&gt;

&lt;p&gt;If the vectors point in very different directions, the concepts are probably unrelated.&lt;/p&gt;

&lt;p&gt;One of the most common ways to measure this similarity is cosine similarity.&lt;/p&gt;

&lt;p&gt;Cosine similarity measures the angle between two vectors.&lt;/p&gt;

&lt;p&gt;Vectors pointing in the same direction have a similarity close to 1.&lt;/p&gt;

&lt;p&gt;Vectors pointing in unrelated (orthogonal) directions have a similarity close to 0, and vectors pointing in opposite directions approach -1.&lt;/p&gt;

&lt;p&gt;In practice, this allows the search system to rank documents by semantic relevance.&lt;/p&gt;

&lt;p&gt;Instead of returning documents that simply contain the same words, the system returns documents whose meaning is closest to the user’s query.&lt;/p&gt;

&lt;p&gt;This is what makes semantic search so powerful.&lt;/p&gt;

&lt;p&gt;Even if the wording is different, the search engine can still retrieve the right documents.&lt;/p&gt;
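
&lt;p&gt;As a rough illustration, cosine similarity can be computed by hand in a few lines of Java. This is a sketch on toy 2-dimensional vectors, not the code the service runs (pgvector does this work inside the database), and the class name &lt;code&gt;CosineSimilaritySketch&lt;/code&gt; is made up for the example. The result ranges from -1 to 1.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;final class CosineSimilaritySketch {

    // cos(theta) = (a . b) / (|a| * |b|); assumes equal-length, non-zero vectors.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i != a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.0};
        System.out.println(cosineSimilarity(x, new double[] {2.0, 0.0}));  // same direction
        System.out.println(cosineSimilarity(x, new double[] {0.0, 3.0}));  // orthogonal
        System.out.println(cosineSimilarity(x, new double[] {-1.0, 0.0})); // opposite direction
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that the length of each vector cancels out: only the direction affects the result, which is why a short query and a long document can still score as similar.&lt;/p&gt;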

&lt;p&gt;Cosine Similarity:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2m6t2rw7lij8vyj00nqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2m6t2rw7lij8vyj00nqp.png" alt="Cosine Similarity" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's look at how this works in the system we're building.&lt;/p&gt;
&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;The service exposes six endpoints:&lt;/p&gt;

&lt;p&gt;POST   /documents        — store a document and compute its embedding&lt;/p&gt;

&lt;p&gt;GET    /documents/{id}   — retrieve a document by ID&lt;/p&gt;

&lt;p&gt;PUT    /documents/{id}   — update a document and re-compute its embedding&lt;/p&gt;

&lt;p&gt;DELETE /documents/{id}   — remove a document&lt;/p&gt;

&lt;p&gt;POST   /search           — semantic search with filters and pagination&lt;/p&gt;

&lt;p&gt;GET    /ping             — health check&lt;/p&gt;

&lt;p&gt;When a search request arrives, the system does five things in sequence.&lt;/p&gt;

&lt;p&gt;The client sends a query to the API.&lt;/p&gt;

&lt;p&gt;The API converts that query into an embedding using the same model that was used to embed the stored documents.&lt;/p&gt;

&lt;p&gt;PostgreSQL then performs a vector similarity search, comparing the query vector against the stored document embeddings using pgvector's vector index.&lt;/p&gt;

&lt;p&gt;The database returns the documents whose vectors are closest to the query.&lt;/p&gt;

&lt;p&gt;The API ranks them by similarity score and returns the results.&lt;/p&gt;

&lt;p&gt;The key detail is step two. The query and the documents are embedded using the same model, which means they live in the same vector space.&lt;/p&gt;

&lt;p&gt;That shared space is what makes comparison possible. A query about "billing retries" and a document about "payment failure handling" end up close together in that space, and pgvector finds that closeness in milliseconds even across thousands of documents.&lt;/p&gt;

&lt;p&gt;Search Execution Flow:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa7tmbk73d71ombzoxef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa7tmbk73d71ombzoxef.png" alt="Search Execution Flow" width="800" height="2092"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because pgvector runs inside PostgreSQL, the similarity search can be combined with standard database features — filtering by metadata, pagination, and indexing — all inside a single query.&lt;/p&gt;

&lt;p&gt;No separate vector database is required.&lt;/p&gt;

&lt;p&gt;Here's what a search request and response look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "query": "billing retries",
  "page": 0,
  "size": 10,
  "minScore": 0.6,
  "filters": { "category": "billing" }
} 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "page": 0,
  "size": 10,
  "totalElements": 3,
  "items": [
    {
      "id": 1,
      "title": "Payment Failure Handling Policy",
      "cosineDistance": 0.12,
      "cosineSimilarity": 0.88,
      "score": 0.94
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three score fields appear in every result.&lt;/p&gt;

&lt;p&gt;cosineDistance is the raw output from pgvector — lower means more similar.&lt;/p&gt;

&lt;p&gt;cosineSimilarity is 1 minus that distance, so higher means more similar.&lt;/p&gt;

&lt;p&gt;score normalises the result to a clean [0, 1] range and is the value your application should actually use.&lt;/p&gt;

&lt;p&gt;Set minScore: 0.7 in the request and only results with a score of 0.7 or above come back.&lt;/p&gt;
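
&lt;p&gt;The three score fields can be related with a small Java sketch. The formulas are an assumption inferred from the sample response (0.12, 0.88, 0.94), not code copied from the service: similarity is taken as 1 minus the distance, and the score is that similarity rescaled from [-1, 1] to [0, 1]. The class name &lt;code&gt;ScoreSketch&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;final class ScoreSketch {

    // pgvector's cosine distance is 1 - cosine similarity, so this undoes it.
    static double cosineSimilarity(double cosineDistance) {
        return 1.0 - cosineDistance;
    }

    // Assumed normalisation: map similarity from [-1, 1] onto [0, 1].
    static double score(double cosineDistance) {
        return (1.0 + cosineSimilarity(cosineDistance)) / 2.0;
    }

    public static void main(String[] args) {
        double distance = 0.12; // value from the sample response
        System.out.println("cosineSimilarity = " + cosineSimilarity(distance));
        System.out.println("score = " + score(distance));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;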

&lt;p&gt;The filters field narrows results to documents whose metadata matches specific values. In the example above, only documents tagged category: billing are searched. The filter keys are validated at the API boundary — malformed keys are rejected before they reach the database.&lt;/p&gt;
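
&lt;p&gt;The article does not show the validation rule itself, so the following Java sketch is only one plausible version of it. The allowed pattern (1 to 64 letters, digits, or underscores) and the names &lt;code&gt;FilterKeySketch&lt;/code&gt; and &lt;code&gt;isValidKey&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.regex.Pattern;

final class FilterKeySketch {

    // Assumed rule: 1 to 64 letters, digits, or underscores. The real service may differ.
    private static final Pattern SAFE_KEY = Pattern.compile("[A-Za-z0-9_]{1,64}");

    static boolean isValidKey(String key) {
        if (key == null) {
            return false;
        }
        // matches() requires the whole key to fit the pattern, not just a prefix.
        return SAFE_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("category"));            // plain key
        System.out.println(isValidKey("category' OR '1'='1")); // injection-shaped key
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Rejecting a key at the boundary keeps the check in one place, so the repository layer only ever sees keys that already passed it.&lt;/p&gt;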

&lt;p&gt;The full source code is on GitHub — linked at the end of this article.&lt;/p&gt;

&lt;p&gt;System Architecture:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfjy4crpi1xs2yy3zp4x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfjy4crpi1xs2yy3zp4x.png" alt="System Architecture" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Tech Stack and Why
&lt;/h2&gt;

&lt;p&gt;The goal of this project wasn’t just to build semantic search — it was to build it using tools that many backend engineers already use in production.&lt;/p&gt;

&lt;p&gt;Instead of introducing a completely new ecosystem, the idea was to see how far we could push a familiar stack.&lt;/p&gt;

&lt;p&gt;Here’s what that stack looks like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spring Boot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Spring Boot handles the infrastructure (dependency injection, validation, exception handling, configuration management), leaving the focus on business logic. Spring Boot 3.2+ running on Java 21 also supports virtual threads from Project Loom, which is relevant for a service making frequent I/O calls to OpenAI.&lt;/p&gt;

&lt;p&gt;The honest reason for this choice over Quarkus or Micronaut is that Spring Boot is widely used in enterprise Java, and this service needs to be readable and maintainable by other Java developers. Familiarity is a legitimate engineering consideration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL stores the documents, metadata and timestamps. The vector storage is handled by the pgvector extension, covered next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pgvector&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The question worth addressing directly is why not a dedicated vector database like Pinecone, Weaviate, or Qdrant?&lt;br&gt;
For most production use cases, you don't need one.&lt;/p&gt;

&lt;p&gt;pgvector is a PostgreSQL extension that adds a VECTOR column type and a cosine distance operator &amp;lt;=&amp;gt;.&lt;/p&gt;

&lt;p&gt;It stores embeddings directly alongside relational data, in the same database, with the same ACID guarantees.&lt;/p&gt;

&lt;p&gt;A document and its embedding are written in a single transaction — no synchronisation between two systems, no eventual consistency to reason about.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE documents (
    id                   BIGSERIAL PRIMARY KEY,
    title                TEXT NOT NULL,
    content              TEXT NOT NULL,
    metadata             JSONB,
    embedding            VECTOR(1536),
    status               TEXT NOT NULL,
    embedding_error      TEXT,
    embedding_updated_at TIMESTAMPTZ
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One honest caveat is that pgvector works well at moderate scale, up to millions of documents. For billions of vectors with sub-millisecond latency requirements, a dedicated vector database makes more sense. But for the vast majority of production use cases, pgvector is the right starting point.&lt;/p&gt;

&lt;p&gt;Reach for a specialist tool only when you've proven you need it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Embeddings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;text-embedding-3-small converts text to 1,536-dimensional vectors. It was chosen over text-embedding-ada-002 and text-embedding-3-large for its balance of quality, speed, and cost. It produces embeddings that are more than good enough for document search at a fraction of the cost of the larger model.&lt;/p&gt;

&lt;p&gt;More importantly, the OpenAI client is never imported directly into the service layer.&lt;/p&gt;

&lt;p&gt;The OpenAI client sits behind an EmbeddingClient interface — the provider can be swapped without touching the service layer. More on this in Part 6.&lt;/p&gt;
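
&lt;p&gt;As a rough sketch of that idea, the interface could look something like the Java below. The method name &lt;code&gt;embed&lt;/code&gt;, the &lt;code&gt;float[]&lt;/code&gt; return type, and the &lt;code&gt;FakeEmbeddingClient&lt;/code&gt; stand-in are assumptions for illustration; the real interface in the project may be shaped differently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;interface EmbeddingClient {
    // Hypothetical signature: turn text into a fixed-length embedding vector.
    float[] embed(String text);
}

// Stand-in provider for tests; a real implementation would call the embedding API.
final class FakeEmbeddingClient implements EmbeddingClient {
    private final int dimensions;

    FakeEmbeddingClient(int dimensions) {
        if (dimensions == 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.dimensions = dimensions;
    }

    @Override
    public float[] embed(String text) {
        float[] vector = new float[dimensions];
        // Deterministic placeholder value, not a real embedding.
        vector[0] = text.length();
        return vector;
    }
}

final class EmbeddingClientDemo {
    public static void main(String[] args) {
        // The caller depends only on the interface, so the provider can be swapped.
        EmbeddingClient client = new FakeEmbeddingClient(1536);
        System.out.println(client.embed("billing retries").length);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the service layer would only see &lt;code&gt;EmbeddingClient&lt;/code&gt;, a fake like this can also replace the real provider in unit tests without any network calls.&lt;/p&gt;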

&lt;p&gt;&lt;strong&gt;Flyway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, Flyway is used to manage database migrations. As the schema evolves, for example when introducing document status fields or metadata changes, Flyway ensures that database changes are applied consistently across environments.&lt;/p&gt;

&lt;p&gt;Using migrations also makes it easier for readers of this series to reproduce the database setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;

&lt;p&gt;The service is organised into four layers. Each layer has one job and communicates only with the layer directly below it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Controller&lt;/strong&gt; — the HTTP boundary.&lt;/p&gt;

&lt;p&gt;Receives requests, validates them with @Valid, delegates to the service, and returns the correct status code. No business logic lives here. A GlobalExceptionHandler sits across all controllers and ensures every error response, whether a 400, 404, or 500, returns the same structured JSON shape.&lt;/p&gt;
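
&lt;p&gt;To make the "same structured JSON shape" idea concrete, here is a minimal plain-Java sketch of the mapping such a handler performs. The names ErrorBody, NotFoundException, and ErrorMappingSketch are hypothetical and are not taken from the project; in a Spring application this logic would typically live in a @RestControllerAdvice class with @ExceptionHandler methods.&lt;/p&gt;

```java
// Illustrative only: these class names are assumptions, not the project's code.
final class ErrorBody {
    final int status;
    final String error;
    final String message;

    ErrorBody(int status, String error, String message) {
        this.status = status;
        this.error = error;
        this.message = message;
    }
}

final class NotFoundException extends RuntimeException {
    NotFoundException(String message) {
        super(message);
    }
}

final class ErrorMappingSketch {
    // Every failure is translated into the same three-field body.
    static ErrorBody toErrorBody(RuntimeException e) {
        if (e instanceof NotFoundException) {
            return new ErrorBody(404, "Not Found", e.getMessage());
        }
        if (e instanceof IllegalArgumentException) {
            return new ErrorBody(400, "Bad Request", e.getMessage());
        }
        // Anything unexpected: hide internals behind a generic message.
        return new ErrorBody(500, "Internal Server Error", "Unexpected error");
    }
}
```

&lt;p&gt;Checking the specific exception types before the generic fallback is what keeps a missing document from surfacing as a 500.&lt;/p&gt;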

&lt;p&gt;&lt;strong&gt;Service&lt;/strong&gt; — where all decisions happen.&lt;/p&gt;

&lt;p&gt;DocumentServiceImpl orchestrates the repository and the embedding client. It controls the document lifecycle: every document is saved immediately with a PENDING status, then moves to READY once the embedding succeeds, or FAILED if OpenAI returns an error.&lt;/p&gt;

&lt;p&gt;A failed embedding is never silent — the error message is stored in the database and the document is excluded from all search results until it's resolved.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public CreateDocumentResponse create(CreateDocumentRequest request) {
    Document saved = saveAsPending(request);     // status = PENDING
    embedAndPersist(saved.getId(), ...);          // status → READY or FAILED
    return new CreateDocumentResponse(saved.getId(), DocumentStatus.READY);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
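
&lt;p&gt;The lifecycle itself can be sketched without Spring or a database. The version below is an assumption-heavy illustration: DocumentRecord, Status, and LifecycleSketch are hypothetical names, the document lives in memory, and the real embedAndPersist in the project may have a different signature.&lt;/p&gt;

```java
// Sketch only: in-memory stand-ins, not the article's actual classes.
enum Status { PENDING, READY, FAILED }

interface EmbeddingClient {
    float[] embed(String text);
}

final class DocumentRecord {
    final String content;
    Status status = Status.PENDING;   // every document starts as PENDING
    float[] embedding;                // set only when embedding succeeds
    String embeddingError;            // set only when embedding fails

    DocumentRecord(String content) {
        this.content = content;
    }
}

final class LifecycleSketch {
    // Moves a PENDING document to READY or FAILED and returns the final status.
    static Status embedAndPersist(DocumentRecord doc, EmbeddingClient client) {
        try {
            doc.embedding = client.embed(doc.content);
            doc.status = Status.READY;
        } catch (RuntimeException e) {
            // Keep the failure visible: store the message instead of swallowing it.
            doc.embeddingError = e.getMessage();
            doc.status = Status.FAILED;
        }
        return doc.status;
    }
}
```

&lt;p&gt;Because the status and the error message are written together, a search query that filters on READY never sees a half-processed document.&lt;/p&gt;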



&lt;p&gt;&lt;strong&gt;Repository&lt;/strong&gt; — Spring Data JPA handles standard CRUD.&lt;/p&gt;

&lt;p&gt;JdbcTemplate handles vector operations. pgvector's &amp;lt;=&amp;gt; cosine distance operator and ::vector casting don't map to JPQL, so those queries are written in SQL directly. Two tools, two clearly defined responsibilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT id, title,
       (embedding &amp;lt;=&amp;gt; ?::vector) AS cosine_distance
FROM documents
WHERE status = 'READY'
ORDER BY cosine_distance ASC
LIMIT ?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
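
&lt;p&gt;For intuition about what the &amp;lt;=&amp;gt; operator returns, cosine distance is one minus cosine similarity, so smaller values mean closer matches and ORDER BY ... ASC puts the best results first. The following plain-Java sketch computes the same quantity in application code; the class name CosineDistanceSketch is hypothetical and the database does this work in the real service.&lt;/p&gt;

```java
final class CosineDistanceSketch {
    // Cosine distance = 1 - cosine similarity.
    // 0 means same direction, 1 means orthogonal, 2 means opposite.
    // A zero vector yields NaN here, so callers should guard against it.
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i != a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

&lt;p&gt;Only the direction of the vectors matters, not their length, which is why scaling an embedding does not change its ranking.&lt;/p&gt;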



&lt;p&gt;&lt;strong&gt;Embedding&lt;/strong&gt; — OpenAiEmbeddingClient sits behind an EmbeddingClient interface.&lt;/p&gt;

&lt;p&gt;Nothing else in the application imports the implementation directly. Swapping OpenAI for a local model means writing one new class — the service layer is untouched.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public interface EmbeddingClient {
    float[] embed(String text);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
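
&lt;p&gt;As an illustration of the "one new class" claim, here is a hypothetical second implementation of the interface. FakeEmbeddingClient is not part of the project; it is a deterministic stand-in of the kind that could be useful in tests or local runs, and its vectors carry no semantic meaning.&lt;/p&gt;

```java
interface EmbeddingClient {
    float[] embed(String text);
}

// Hypothetical stand-in; not part of the article's codebase.
final class FakeEmbeddingClient implements EmbeddingClient {
    private final int dimensions;

    FakeEmbeddingClient(int dimensions) {
        if (Integer.signum(dimensions) != 1) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.dimensions = dimensions;
    }

    @Override
    public float[] embed(String text) {
        // Deterministic: the same text always maps to the same vector.
        float[] vector = new float[dimensions];
        for (int i = 0; i != text.length(); i++) {
            vector[i % dimensions] += text.charAt(i);
        }
        return vector;
    }
}
```

&lt;p&gt;A caller that holds only the interface type, for example EmbeddingClient client = new FakeEmbeddingClient(1536), does not need to change when the implementation does.&lt;/p&gt;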



&lt;p&gt;Architecture:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9djl8w69p5f6fwy4k3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9djl8w69p5f6fwy4k3r.png" alt="Architecture" width="800" height="818"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The full source code, including all migrations and tests, is available on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Series Roadmap
&lt;/h2&gt;

&lt;p&gt;This article covered the foundation — the problem semantic search solves, how embeddings work, and how the system is structured. The rest of the series builds out each layer in full.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 2: The Database Layer.&lt;/strong&gt; All three Flyway migrations in detail: the documents table structure, the IVFFlat index configuration, the JSONB metadata design, and how the schema supports the document lifecycle from day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3: Calling the OpenAI Embeddings API in Java Without an SDK.&lt;/strong&gt; Building the HTTP request with plain java.net.http.HttpClient, parsing the response with Jackson, L2 normalisation, and the bugs worth knowing about before you write a single line.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 4: The Full CRUD API and Service Layer.&lt;/strong&gt; The complete DocumentServiceImpl: create, read, update, delete, and search. The QueryBuilder inner class for safe dynamic SQL. The GlobalExceptionHandler for consistent error responses across the entire API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 5: Testing Without a Real Database or API Key.&lt;/strong&gt; Mockito, MockMvc, H2 test profiles, and the specific JdbcTemplate varargs trap that catches most developers the first time, with the exact fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 6: Lessons Learned (6 Bugs Found Before the Service Could Run).&lt;/strong&gt; Six real bugs from this codebase: wrong API URL, missing annotation bracket, trailing comma in a JSON string, broken SQL subquery, silent double normalisation, and a RuntimeException returning 500 instead of 404. What each one taught me and how to avoid them.&lt;/p&gt;

&lt;p&gt;If you found this useful, the full source code is available on &lt;a href="https://github.com/buenas/-semantic-search-service" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and the next article in the series dives into the database layer and pgvector indexing.&lt;/p&gt;

&lt;p&gt;See you in &lt;a href="https://dev.to/oozioma/building-a-semantic-search-api-with-spring-boot-and-pgvector-part-2-designing-the-postgresql-2jlb"&gt;Part 2&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>java</category>
      <category>springboot</category>
      <category>semanticsearch</category>
      <category>postgressql</category>
    </item>
  </channel>
</rss>
