Jared Lewis

One vector space for photos and words: Bedrock Titan multimodal on Aurora

In my last post I described OpinLog, a cross-user review graph where your "burger" and my "burger" resolve to the same canonical item via pgvector on Amazon Aurora PostgreSQL. This post is about the piece that makes the matching feel like magic: multimodal embeddings from Amazon Bedrock feeding the matcher inside Aurora, with the app itself deployed on Vercel.

A photo of a burger should match the word "burger"

Users log items two ways: by typing a name, or by snapping a photo. If text and images lived in different vector spaces, I'd need two matchers and a fusion step. Instead I used Amazon Titan Multimodal Embeddings G1 (amazon.titan-embed-image-v1), which maps both text and images into the same 1024-dimensional space. One model, one index, one query.

The entire embedding contract is tiny:

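import { InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime"

// `getClient()` and `env` are app-level helpers defined elsewhere in the project.
// Input shape as implied by the destructuring below: either field may be omitted.
type EmbedInput = { text?: string; imageBase64?: string }

export async function embed({ text, imageBase64 }: EmbedInput): Promise<number[]> {
  // Ask Titan for the full 1024-dimensional vector.
  const body: Record<string, unknown> = {
    embeddingConfig: { outputEmbeddingLength: 1024 },
  }
  // Attach whichever inputs were provided; text is truncated defensively.
  if (text) body.inputText = text.slice(0, 8000)
  if (imageBase64) body.inputImage = imageBase64

  const res = await getClient().send(new InvokeModelCommand({
    modelId: env.BEDROCK_EMBEDDING_MODEL_ID,
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify(body),
  }))

  // The response body arrives as bytes: decode, parse, and return the vector.
  const parsed = JSON.parse(new TextDecoder().decode(res.body))
  return parsed.embedding
}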

Pass text, imageBase64, or both. Because a photo of an In-N-Out burger lands near the text "In-N-Out Double-Double," a user who uploads a picture matches a user who typed the name — with zero special-casing. That single design choice is what makes the "add by photo → it already knows what this is" demo moment work.

A couple of production notes:

  • The Bedrock client uses the default AWS credential chain when explicit keys are absent, so the same code runs locally (SSO) and on Vercel (IAM user keys in env).
  • I store the returned array straight into a pgvector column via a literal helper — `[${vec.join(",")}]` — cast to ::vector inside a raw sql template.
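The literal helper mentioned above could look roughly like the sketch below. The name toVectorLiteral, the dimension check, and the finite-number check are my additions for illustration, not the app's actual code; only the `[${vec.join(",")}]` format comes from the post.

```typescript
// Hypothetical helper: serialise an embedding into pgvector's text format,
// e.g. [0.1,0.2,0.3], so it can be bound as a parameter and cast to ::vector.
function toVectorLiteral(vec: number[], expectedDim: number = 1024): string {
  // Guard against truncated or mismatched embeddings before they reach the DB.
  if (vec.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${vec.length}`)
  }
  // NaN or Infinity would produce a literal that should not be stored.
  if (!vec.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value")
  }
  return `[${vec.join(",")}]`
}
```

Passing the resulting string as a bound parameter and casting it in the query (rather than interpolating it into the SQL text) keeps the statement parameterised.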

From embedding to match

On every add, I embed the new item and run an HNSW cosine ANN search over the canonical catalog:

SELECT id, name, photo_url, rating_avg, rating_count,
       1 - (embedding <=> :q) AS similarity
FROM canonical_items
WHERE embedding IS NOT NULL
ORDER BY embedding <=> :q
LIMIT 10;

<=> is pgvector's cosine distance operator, so 1 - (embedding <=> :q) is the cosine similarity. A high-confidence hit auto-suggests as the default ("we think this is it"); otherwise the user sees the top candidates plus a "None of these — create new" escape hatch. The match is non-blocking: the item saves instantly with canonical_item_id = NULL, and the link is filled in afterwards, once the match is resolved.
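As a rough sketch of that decision step: take the rows returned by the query, and either pre-select the top hit or hand the whole list to the user. The names decideMatch, Candidate, and MatchDecision are hypothetical, and the 0.85 threshold is a placeholder I picked for illustration, since the post does not state the real value.

```typescript
// Hypothetical shape of a row returned by the similarity query.
type Candidate = { id: string; name: string; similarity: number }

type MatchDecision =
  | { kind: "auto-suggest"; suggestion: Candidate; others: Candidate[] }
  | { kind: "choose"; candidates: Candidate[] }

// Assumption: placeholder value, the actual auto-suggest bar is not given in the post.
const AUTO_SUGGEST_THRESHOLD = 0.85

function decideMatch(
  candidates: Candidate[],
  threshold: number = AUTO_SUGGEST_THRESHOLD,
): MatchDecision {
  // Sort a copy (highest similarity first) so the caller's array is untouched.
  const ranked = [...candidates].sort((a, b) => b.similarity - a.similarity)
  const [top, ...rest] = ranked
  // Pre-select only when the best hit clears the confidence bar.
  if (top !== undefined && top.similarity >= threshold) {
    return { kind: "auto-suggest", suggestion: top, others: rest }
  }
  // Otherwise the user picks from the list (or creates a new canonical item).
  return { kind: "choose", candidates: ranked }
}
```

An empty candidate list falls through to the "choose" branch, which in the UI corresponds to showing only the "create new" option.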

Tuning the floor: where "search" beats "vibes"

Raw vector search will happily return your entire catalog, ranked by ever-fainter similarity. I measured where Titan text embeddings actually sit:

  • ~0.3 cosine for unrelated text
  • ~0.5–0.6 for genuinely on-topic text

…and gated results at 1 - distance > 0.45. Below the floor, results are dropped. It's a single constant, but it's the line between "found what you meant" and "here's everything, sorted by a coin flip." For the matcher's auto-suggest threshold I set the bar much higher, so we only pre-select a match when we're confident, and otherwise let the human choose.
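A minimal sketch of the floor, assuming the rows come back with the raw cosine distance from the <=> operator. The names Scored and aboveFloor are hypothetical; the 0.45 constant is the one quoted above.

```typescript
// Hypothetical row shape: `distance` is the value of `embedding <=> query`.
type Scored = { id: string; distance: number }

// The floor quoted in the post: keep a row only if 1 - distance > 0.45.
const SIMILARITY_FLOOR = 0.45

function aboveFloor(
  rows: Scored[],
  floor: number = SIMILARITY_FLOOR,
): (Scored & { similarity: number })[] {
  return rows
    // Convert cosine distance to cosine similarity.
    .map((row) => ({ ...row, similarity: 1 - row.distance }))
    // Drop everything at or below the floor.
    .filter((row) => row.similarity > floor)
}
```

With the measurements above, an on-topic hit at distance 0.4 (similarity 0.6) survives, while an unrelated one at distance 0.7 (similarity 0.3) is dropped.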

The catalog sharpens itself

Each canonical item's embedding is the running centroid of its linked members. When your burger log links to mine, the canonical vector becomes the average of both. The more people log the same thing, the more representative that vector gets — and the more reliably the next person's log (text or photo) snaps to it. The embedding model does the understanding; Aurora does the remembering and the averaging, in the same transaction that updates the denormalized rating aggregates.
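The arithmetic behind a running centroid is small enough to sketch. This version assumes a member count is stored alongside each canonical vector, which the post does not spell out; updateCentroid and its parameters are illustrative names, and in the app the averaging happens inside Aurora rather than in application code.

```typescript
// Hypothetical helper: fold one new member embedding into an existing centroid.
// `memberCount` is the number of members already averaged into `centroid`.
function updateCentroid(
  centroid: number[],
  memberCount: number,
  member: number[],
): number[] {
  if (centroid.length !== member.length) {
    throw new Error("dimension mismatch")
  }
  if (!Number.isInteger(memberCount) || memberCount < 0) {
    throw new Error("memberCount must be a non-negative integer")
  }
  // Incremental mean: c' = c + (x - c) / (n + 1).
  // With memberCount = 0 this returns the member itself.
  const n = memberCount + 1
  return centroid.map((c, i) => c + (member[i] - c) / n)
}
```

The averaged vector is generally not unit length, but cosine distance ignores magnitude, so the centroid can be compared against fresh embeddings without renormalising.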

Why this lives on Aurora

The qualifying AWS databases were DynamoDB, Aurora DSQL, and Aurora PostgreSQL. Only Aurora PostgreSQL ships pgvector, so the embedding produced by Bedrock can be indexed (HNSW) and queried (<=>) right next to the relational data — the ratings, the contributors, the lists. Embedding meaning (Bedrock) and storing/serving it transactionally (Aurora pgvector) is the whole engine of the app, and keeping them one JOIN apart made it a weekend project instead of a distributed-systems one.

Stack: Next.js 16 on Vercel → Amazon Aurora PostgreSQL Serverless v2 (pgvector) → Amazon Bedrock (Titan multimodal). Photos live in S3; the matcher is one line of SQL.


Built for the H0 Hackathon ("Hack the Zero Stack with Vercel and AWS Databases"). I created this content for the purposes of entering this hackathon. #H0Hackathon
