Introducing ShapedQL: The SQL Engine for Search, Feeds, and AI Agents

#vectordatabase #ai #database #machinelearning

At Shaped, we believe that while retrieval (finding 1,000 items) is largely a solved problem, relevance (finding the best 10) is still an infrastructure nightmare.

Today, we are officially launching ShapedQL, a declarative SQL language and real-time engine designed to collapse the entire ranking and retrieval stack into a single query.

The Problem: The "Frankenstein" Stack

Most engineering teams today are forced to maintain what we call a "Frankenstein stack."

To build a high-quality "For You" feed, a personalized search bar, or an AI agent with long-term memory, you typically have to glue together several fragmented tools:

A Vector Database (like Pinecone) for semantic retrieval.
A Search Engine (like Elasticsearch) for keyword matching.
A Feature Store (like Redis) to hold user session data.
Thousands of lines of Python "spaghetti code" to handle business logic, filtering, and re-ranking.

The result is a "house of cards." It’s stateless, slow to iterate on, and impossible to debug. When a user asks, "Why was this item ranked first?" engineers usually don’t have an answer.

The Solution: From Documents to Decisions

ShapedQL was built to move the industry from document retrieval to real-time decisions.

Unlike traditional search engines that are stateless by design, ShapedQL treats "User Context" as a first-class citizen. It doesn’t just look for items that are similar to a query; it finds items that a specific user is most likely to engage with right now.

We’ve collapsed the entire relevance lifecycle into a 4-stage pipeline that you can define in a single SQL query:

Retrieve: Fetch candidates from multiple sources (Hybrid Search, Social Graphs, or Trending lists).
Filter: Apply hard business constraints (e.g., "only show items in stock and under $200").
Score: Rank results using real-time machine learning models optimized for your business goals (Clicks, Conversions, or Watch Time).
Reorder: Optimize the final list for Diversity and Exploration, ensuring the user experience stays fresh and avoids repetition.
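To make the four stages concrete, here is a minimal Python sketch of the same flow on in-memory data. It is an illustration only, not how Shaped implements the engine: the `Item` type, `run_pipeline`, the per-category cap used for the reorder step, and the toy scores standing in for a trained model are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    # Hypothetical item shape, chosen only for this example.
    item_id: str
    category: str
    price: float
    in_stock: bool


def run_pipeline(
    sources: Iterable[Iterable[Item]],
    keep: Callable[[Item], bool],
    score: Callable[[Item], float],
    max_per_category: int,
    limit: int,
) -> List[Item]:
    # 1. Retrieve: merge candidates from every source, de-duplicating by id.
    candidates: Dict[str, Item] = {}
    for source in sources:
        for item in source:
            candidates.setdefault(item.item_id, item)

    # 2. Filter: drop anything that violates a hard business constraint.
    allowed = [item for item in candidates.values() if keep(item)]

    # 3. Score: order by a relevance score, highest first.
    ranked = sorted(allowed, key=score, reverse=True)

    # 4. Reorder: cap how many items one category may contribute up front;
    #    items over the cap are pushed behind the rest instead of dropped.
    counts: Dict[str, int] = {}
    head: List[Item] = []
    tail: List[Item] = []
    for item in ranked:
        seen = counts.get(item.category, 0)
        if seen < max_per_category:
            head.append(item)
            counts[item.category] = seen + 1
        else:
            tail.append(item)
    return (head + tail)[:limit]


# Two toy candidate sources with one overlapping item ("b").
trending = [
    Item("a", "shoes", 120.0, True),
    Item("b", "shoes", 90.0, True),
    Item("c", "hats", 250.0, True),
]
similar = [
    Item("b", "shoes", 90.0, True),
    Item("d", "hats", 40.0, True),
    Item("e", "shoes", 60.0, False),
    Item("f", "shoes", 150.0, True),
]

# Toy scores standing in for a real-time model's predictions.
toy_scores = {"a": 0.9, "b": 0.8, "f": 0.7, "d": 0.4}

result = run_pipeline(
    sources=[trending, similar],
    keep=lambda item: item.in_stock and item.price < 200,
    score=lambda item: toy_scores.get(item.item_id, 0.0),
    max_per_category=2,
    limit=3,
)
print([item.item_id for item in result])
```

Even in this toy version, the order of the stages matters: filtering before scoring keeps the model from spending time on items that can never be shown, and reordering last means diversity is applied to the list the user actually sees.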

ShapedQL in Action

Here is what a modern discovery feed looks like in ShapedQL. This replaces roughly 2,000 lines of traditional backend code:

SELECT video_id, creator_name
FROM 
  trending_videos(),                   -- Retrieve: global popularity
  following_network("$user_id")        -- Retrieve: social graph
WHERE 
  NOT previously_watched("$user_id")   -- Filter: stateful filtering
ORDER BY 
  p_watch_time(user, item)             -- Score: ML-powered scoring
REORDER BY 
  diversity(creator_name)              -- Reorder: list-wise optimization
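
For intuition on what a list-wise step like the final clause of the query might do, here is a small Python sketch of one common approach: a greedy re-rank that discounts an item's score each time its creator has already appeared. This is an assumption for illustration, not the actual implementation behind `diversity(...)`; the `diversify` function, the `penalty` parameter, and the sample scores are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Video = Dict[str, str]


def diversify(
    scored: List[Tuple[Video, float]],
    key: Callable[[Video], str],
    penalty: float = 0.5,
) -> List[Video]:
    # Greedy selection: at every position, pick the remaining item with the
    # highest score after multiplying by `penalty` once for each item with
    # the same key (here, the same creator) that has already been placed.
    remaining = list(scored)
    placed: List[Video] = []
    seen: Counter = Counter()
    while remaining:
        best = max(remaining, key=lambda pair: pair[1] * penalty ** seen[key(pair[0])])
        remaining.remove(best)
        placed.append(best[0])
        seen[key(best[0])] += 1
    return placed


# Hypothetical model scores, e.g. predicted watch time normalized to [0, 1].
scored_videos = [
    ({"video_id": "v1", "creator_name": "ana"}, 0.9),
    ({"video_id": "v2", "creator_name": "ana"}, 0.8),
    ({"video_id": "v3", "creator_name": "ben"}, 0.6),
    ({"video_id": "v4", "creator_name": "ana"}, 0.5),
    ({"video_id": "v5", "creator_name": "cy"}, 0.3),
]

feed = diversify(scored_videos, key=lambda video: video["creator_name"])
print([video["video_id"] for video in feed])
```

A pure score sort would open the feed with two videos from the same creator. The discount moves the second one down a slot without removing it, which is the kind of trade-off a list-wise step makes that a per-item score cannot.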

More Than Just a Query Language

ShapedQL isn't just a syntax; it’s an end-to-end platform that automates the heavy lifting of data engineering and MLOps:

Real-Time Connectors: Sync data from Snowflake, BigQuery, Kafka, or Segment in milliseconds.
Generative Enrichment: Use Gemini-powered LLMs to automatically tag images, clean messy product descriptions, and normalize data on the fly.
Automated MLOps: Shaped continuously trains and fine-tunes your ranking models based on live user behavior, so you never have to manage a training pipeline again.

Real-World Impact

We’ve already seen the power of this approach with our customers. By migrating its legacy search infrastructure to Shaped, one customer replaced a massive, 3,000-line Elasticsearch rules codebase with a 30-line ShapedQL query.

The result? An 11% lift in search conversions and a 10x increase in experimentation velocity. They can now test new ranking theories in minutes, not weeks.