Yash Dubey

Building Semantic Search That Actually Works: Beyond Basic Vector Similarity

Most semantic search implementations are just fancy keyword matching. Here's how to build search that actually understands meaning and context.


The $10K Mistake Everyone Makes

I spent $10,000 and three months building what I thought was "semantic search." Users typed queries, the system embedded them, pulled the most similar vectors, and returned the results. Technically correct, practically useless.

The problem? Semantic similarity ≠ Search relevance.

When users searched for "Tesla stock analysis," my system returned articles about:

  • Tesla car reviews (similar topic)
  • General stock market trends (similar words)
  • Elon Musk interviews (related entity)

But it completely missed the actual Tesla financial analysis articles because they used different vocabulary.

This is the semantic search trap that costs companies millions in wasted development time.
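
You can reproduce the trap in a few lines. The sketch below ranks candidate headlines purely by cosine similarity of generic embeddings; sentence-transformers and all-MiniLM-L6-v2 are here just for illustration, not our production stack. Nothing about intent, entities, or vocabulary mismatch enters the score, which is exactly the problem.

```python
# A minimal repro of "similarity != relevance": rank candidates purely by
# cosine similarity of generic embeddings. Model choice is illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "Tesla stock analysis"
candidates = [
    "Tesla Model 3 long-term review: is it worth it?",
    "Global equity markets end the week higher",
    "Elon Musk sits down for a wide-ranging interview",
    "Quarterly deep-dive: margins, deliveries and free cash flow at the EV maker",
]

query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_emb, cand_embs)[0]

# Candidates ordered by raw similarity -- no intent, no entity matching,
# no temporal context.
for score, text in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:.3f}  {text}")
```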

What Real Semantic Search Looks Like

After rebuilding our system from scratch at RapierCraft for UltraNews (now processing 15,000+ articles daily with 94% search satisfaction), here's what I learned:

Real semantic search needs three layers:

  1. Intent Understanding - What is the user actually trying to find?
  2. Context Awareness - What domain/timeframe/perspective matters?
  3. Relevance Scoring - How well does each result match the complete query context?

Layer 1: Intent Understanding

Traditional approach:

```python
# Basic embedding similarity - what everyone does wrong
def search(query: str):
    query_embedding = embed(query)
    results = vector_db.similarity_search(query_embedding, top_k=10)
    return results
```

Smart approach:

```python
class IntentAnalyzer:
    def __init__(self):
        self.intent_classifier = self.load_intent_model()
        self.entity_extractor = EntityExtractor()
        self.temporal_analyzer = TemporalAnalyzer()

    async def analyze_query(self, query: str) -> QueryIntent:
        # Extract structured intent
        intent_type = await self.intent_classifier.classify(query)
        entities = await self.entity_extractor.extract(query)
        temporal_context = await self.temporal_analyzer.analyze(query)

        return QueryIntent(
            type=intent_type,  # analysis, news, comparison, summary
            entities=entities,  # companies, people, locations
            temporal_context=temporal_context,  # recent, historical, trending
            specificity_score=self.calculate_specificity(query),
            domain_hints=self.extract_domain_hints(query)
        )
```
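
QueryIntent itself isn't shown above. Here's a rough sketch of what that container might look like; the field names mirror the constructor call above, and the primary_domain property is my own assumption, added because the contextual embedding layer below reads it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class QueryIntent:
    type: str                             # analysis, news, comparison, summary
    entities: List["Entity"]              # companies, people, locations
    temporal_context: "TemporalContext"   # recent, historical, trending
    specificity_score: float
    domain_hints: List[str] = field(default_factory=list)

    @property
    def primary_domain(self) -> Optional[str]:
        # Used by the contextual embedding layer to pick a domain model;
        # here we simply take the strongest hint, if any.
        return self.domain_hints[0] if self.domain_hints else None
```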

Layer 2: Context-Aware Embedding

Instead of generic embeddings, we create context-specific ones:

```python
import numpy as np

class ContextualEmbedding:
    def __init__(self):
        self.domain_models = {
            'finance': FinanceDomainModel(),
            'technology': TechDomainModel(),
            'politics': PoliticsDomainModel()
        }
        self.default_model = GeneralDomainModel()  # assumed general-purpose fallback, referenced below
        self.temporal_weights = TemporalWeighting()

    async def embed_with_context(self, text: str, context: QueryIntent) -> np.ndarray:
        # Select domain-specific model
        domain_model = self.domain_models.get(
            context.primary_domain, 
            self.default_model
        )

        # Generate base embedding
        base_embedding = await domain_model.embed(text)

        # Apply temporal weighting
        if context.temporal_context.is_time_sensitive:
            temporal_weight = self.temporal_weights.calculate_weight(
                text_timestamp=self.extract_timestamp(text),
                query_time_preference=context.temporal_context.preference
            )
            base_embedding = base_embedding * temporal_weight

        # Entity boost - increase relevance for matching entities
        entity_boost = self.calculate_entity_boost(text, context.entities)

        return base_embedding + entity_boost
```
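
The TemporalWeighting helper is also left out above. One simple way to implement calculate_weight, assuming timestamped articles, is exponential decay with a half-life keyed to the query's time preference; treat the numbers as placeholders to tune against your own engagement data.

```python
import math
from datetime import datetime, timezone

class TemporalWeighting:
    # Half-lives in days are illustrative, not tuned values.
    HALF_LIVES_DAYS = {"recent": 3.0, "trending": 1.0, "historical": 365.0}

    def calculate_weight(self, text_timestamp: datetime, query_time_preference: str) -> float:
        # Expects a timezone-aware publication timestamp.
        half_life = self.HALF_LIVES_DAYS.get(query_time_preference, 30.0)
        age_days = (datetime.now(timezone.utc) - text_timestamp).total_seconds() / 86400
        # Exponential decay: the weight halves every `half_life` days.
        return 0.5 ** (max(age_days, 0.0) / half_life)
```

One caveat: multiplying the embedding by a scalar only changes rankings if your vector store scores by inner product; with cosine similarity the scale cancels out, so you'd fold the weight into the relevance score instead.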

Layer 3: Multi-Signal Relevance Scoring

This is where most systems fail. They rely purely on vector similarity:

```python
from typing import Dict

import numpy as np

class RelevanceScorer:
    def __init__(self):
        self.signals = [
            SemanticSimilaritySignal(),
            EntityMatchingSignal(), 
            TemporalRelevanceSignal(),
            PopularitySignal(),
            QualitySignal(),
            UserContextSignal()
        ]

    async def score_result(self, document: Document, query_intent: QueryIntent, user_context: UserContext) -> float:
        signal_scores = {}

        for signal in self.signals:
            try:
                score = await signal.calculate_score(document, query_intent, user_context)
                weight = self.get_signal_weight(signal.name, query_intent.type)
                signal_scores[signal.name] = score * weight
            except Exception as e:
                # Graceful degradation - skip failed signals
                signal_scores[signal.name] = 0.0

        # Combine signals intelligently
        final_score = self.combine_signals(signal_scores, query_intent)

        return final_score

    def combine_signals(self, scores: Dict[str, float], intent: QueryIntent) -> float:
        # Dynamic weighting based on query type
        if intent.type == "factual_lookup":
            return (scores['entity_matching'] * 0.4 + 
                   scores['semantic_similarity'] * 0.3 + 
                   scores['quality'] * 0.3)

        elif intent.type == "trend_analysis":
            return (scores['temporal_relevance'] * 0.4 + 
                   scores['popularity'] * 0.3 + 
                   scores['semantic_similarity'] * 0.3)

        else:
            # Balanced approach for general queries
            return np.average(list(scores.values()))
```
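
Each signal is just a small scorer with a name and a calculate_score coroutine. As an example of the shape, here's a minimal entity-matching signal; it assumes documents carry a pre-extracted entity list with a canonical_name attribute, which is my assumption rather than the actual UltraNews schema.

```python
class EntityMatchingSignal:
    """Scores a document by how many of the query's entities it mentions."""
    name = "entity_matching"

    async def calculate_score(self, document, query_intent, user_context) -> float:
        query_entities = {e.canonical_name.lower() for e in query_intent.entities}
        if not query_entities:
            return 0.0  # nothing to match against; let the other signals decide

        # Assumes documents carry entities extracted at ingestion time.
        doc_entities = {e.canonical_name.lower() for e in document.entities}

        overlap = query_entities & doc_entities
        return len(overlap) / len(query_entities)  # fraction of query entities covered
```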

The Real-World Implementation

Here's our complete search pipeline:

```python
class SemanticSearchEngine:
    def __init__(self):
        self.intent_analyzer = IntentAnalyzer()
        self.contextual_embedding = ContextualEmbedding()
        self.relevance_scorer = RelevanceScorer()
        self.result_diversifier = ResultDiversifier()
        self.vector_db = VectorDatabase()  # vector store used by search(); concrete class assumed
        self.timer = SearchTimer()         # request timing helper; concrete class assumed

    async def search(self, query: str, user_context: UserContext) -> SearchResults:
        # Phase 1: Understand what user wants
        query_intent = await self.intent_analyzer.analyze_query(query)

        # Phase 2: Get contextually relevant candidates  
        query_embedding = await self.contextual_embedding.embed_with_context(
            query, query_intent
        )

        # Cast a wide net initially
        candidates = await self.vector_db.similarity_search(
            query_embedding, 
            top_k=100  # Get more candidates than needed
        )

        # Phase 3: Score each candidate with multiple signals
        scored_results = []
        for candidate in candidates:
            relevance_score = await self.relevance_scorer.score_result(
                candidate, query_intent, user_context
            )
            scored_results.append((candidate, relevance_score))

        # Phase 4: Re-rank and diversify
        ranked_results = sorted(scored_results, key=lambda x: x[1], reverse=True)
        diversified_results = await self.result_diversifier.diversify(
            ranked_results, query_intent
        )

        return SearchResults(
            results=diversified_results[:10],
            total_found=len(candidates),
            query_understanding=query_intent,
            search_time_ms=self.timer.elapsed()
        )
```
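
A call site then looks something like the sketch below. The UserContext fields and the document title attribute are placeholders, not the real schema.

```python
import asyncio

async def main():
    engine = SemanticSearchEngine()
    # Placeholder user context; the real fields depend on your application.
    user_context = UserContext(user_id="demo", locale="en")

    results = await engine.search("Tesla stock analysis", user_context)

    print(f"Scored {results.total_found} candidates in {results.search_time_ms}ms")
    print(f"Detected intent: {results.query_understanding.type}")
    for document, score in results.results:
        print(f"{score:.2f}  {document.title}")

if __name__ == "__main__":
    asyncio.run(main())
```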

Performance Results

Before (basic vector similarity):

  • User satisfaction: 67%
  • Click-through rate: 34%
  • Average session length: 2.3 minutes
  • Zero-result queries: 18%

After (multi-layer semantic search):

  • User satisfaction: 94%
  • Click-through rate: 78%
  • Average session length: 7.8 minutes
  • Zero-result queries: 3%

The Hidden Complexity: Entity Relationships

One breakthrough was understanding that search isn't just about documents—it's about entity relationships:

```python
from typing import List

import networkx as nx

class EntityRelationshipGraph:
    def __init__(self):
        self.graph = nx.DiGraph()
        self.relationship_types = [
            'works_for', 'competes_with', 'supplies_to', 
            'influences', 'reports_on', 'similar_to'
        ]

    async def expand_query_entities(self, entities: List[Entity]) -> List[Entity]:
        expanded_entities = entities.copy()

        for entity in entities:
            # Find related entities within 2 hops. networkx's neighbors() only
            # looks one hop out, so use a bounded shortest-path expansion.
            related = nx.single_source_shortest_path_length(
                self.graph, entity.id, cutoff=2
            )

            for related_entity, distance in related.items():
                if distance == 0:
                    continue  # skip the query entity itself

                relationship_strength = self.calculate_relationship_strength(
                    entity, related_entity
                )

                if relationship_strength > 0.7:  # Strong relationship
                    expanded_entities.append(related_entity)

        return expanded_entities
```

When someone searches for "Apple earnings," our system automatically includes results about:

  • Tim Cook (CEO relationship)
  • iPhone sales (product relationship)
  • Samsung (competitor relationship)
  • AAPL stock (ticker relationship)
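
To make the mechanics concrete, here's a standalone sketch of that expansion over a hand-built toy graph. Entity IDs, relationship types, and strengths are made up; in practice they'd come from your ingestion pipeline and co-occurrence statistics, and an undirected graph keeps the demo simple (the class above uses a DiGraph).

```python
import networkx as nx

# Toy slice of the entity graph around an "Apple earnings" query.
g = nx.Graph()
g.add_edge("apple_inc", "tim_cook", type="works_for", strength=0.9)
g.add_edge("apple_inc", "iphone", type="supplies_to", strength=0.85)
g.add_edge("apple_inc", "samsung", type="competes_with", strength=0.8)
g.add_edge("apple_inc", "aapl", type="similar_to", strength=0.95)
g.add_edge("samsung", "galaxy_s25", type="supplies_to", strength=0.75)  # two hops from Apple

def expand(entity_id: str, min_strength: float = 0.7, max_hops: int = 2) -> set:
    """Collect entities reachable within max_hops over sufficiently strong edges."""
    expanded, frontier = {entity_id}, {entity_id}
    for _ in range(max_hops):
        next_frontier = set()
        for node in frontier:
            for neighbor in g.neighbors(node):
                if neighbor not in expanded and g[node][neighbor]["strength"] > min_strength:
                    expanded.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return expanded

print(expand("apple_inc"))  # Apple plus its strongly related neighbours
```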

Lessons Learned: What Not To Do

1. Don't Use Generic Embeddings for Everything

OpenAI's text-embedding-ada-002 is great, but it's not optimized for your specific domain. Train or fine-tune embeddings on your data.
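
If you're using open models, a low-effort way to do that is to fine-tune a sentence-transformers model on (query, relevant document) pairs mined from your own click logs. This is a generic recipe, not the exact UltraNews setup.

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Pairs of (query, document the user actually clicked) from your search logs.
train_examples = [
    InputExample(texts=["Tesla stock analysis",
                        "Quarterly deep-dive: margins, deliveries and free cash flow"]),
    InputExample(texts=["Fed rate decision impact",
                        "How the latest FOMC statement moves bond yields"]),
    # ... thousands more mined from click logs
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any base model you already use
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
model.save("finance-tuned-embeddings")
```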

2. Don't Ignore Temporal Context

News from 2020 about "remote work trends" has different relevance than news from 2024. Build time-awareness into your scoring.

3. Don't Forget About User Intent

"Python" could mean the programming language or the snake. Context matters more than similarity.

4. Don't Skip Result Diversification

Returning 10 very similar articles about the same event isn't helpful. Diversify by perspective, source, and sub-topic.
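
A common baseline here is maximal marginal relevance (MMR): greedily pick the next result that trades off relevance against similarity to what's already been selected, then layer source and perspective constraints on top. A minimal numpy sketch over unit-normalized document vectors (standalone, not the ResultDiversifier used above):

```python
import numpy as np

def mmr(doc_vecs: np.ndarray, relevance: np.ndarray, k: int = 10, lambda_: float = 0.7) -> list:
    """Greedy maximal marginal relevance over unit-normalized document vectors."""
    selected = []
    candidates = list(range(len(doc_vecs)))

    while candidates and len(selected) < k:
        if not selected:
            # First pick: just take the most relevant document.
            best = int(np.argmax(relevance[candidates]))
        else:
            # Penalize candidates that are too similar to anything already picked.
            sim_to_selected = doc_vecs[candidates] @ doc_vecs[selected].T  # cosine, unit vectors
            mmr_scores = (lambda_ * relevance[candidates]
                          - (1 - lambda_) * sim_to_selected.max(axis=1))
            best = int(np.argmax(mmr_scores))
        selected.append(candidates.pop(best))

    return selected  # indices into doc_vecs, in display order
```

A lambda_ near 1.0 favors raw relevance; lower values trade relevance for variety.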

The Business Impact

This approach enabled UltraNews to:

  • Process 15,000+ articles daily with intelligent categorization
  • Maintain 94% user satisfaction with search results
  • Reduce support tickets about "can't find relevant content" by 89%
  • Enable complex queries like "show me contrarian views on Tesla's recent earnings"

What's Next: The AI Search Evolution

We're experimenting with:

  • Conversational search: Multi-turn queries that build context
  • Proactive suggestions: Anticipating what users want to search next
  • Cross-domain reasoning: Connecting insights across different topics
  • Real-time intent adaptation: Learning from user behavior within the session

Your Turn

How are you implementing semantic search? Are you stuck in the basic vector similarity trap? Share your challenges in the comments—I'd love to help troubleshoot.


Building intelligent search systems at scale is just one of many challenges we've tackled at RapierCraft. If you're working on similar data-intensive problems, I'd love to connect and share more war stories.

Follow for more deep dives into building AI-powered systems that actually work in production.


Tags: #semanticsearch #ai #machinelearning #embeddings #search #nlp #python #vectordatabase #informationretrieval #datascience
