Etop - Essien Emmanuella Ubokabasi

Posted on Dec 24

How I Built a Search Engine for my YouTube Channel using Elasticsearch Serverless

#elasticsearch #tutorial #devrel #nextjs

As my YouTube channel, Ubcodes, continues to grow, I’ve realized that finding specific technical tutorials in a sea of videos can be a challenge for my subscribers. Whether they are looking for a deep dive into GraphQL or a React Native crash course, the standard search experience doesn't always cut it.

I decided to build a solution: a dedicated YouTube Search Library powered by Elasticsearch Serverless.

View the Live Demo here | Explore the GitHub Repo

The Problem: Why Basic Search Wasn't Enough

When you have a growing library of technical content, simple client-side filtering falls short. Users might:

Misspell technical terms ("Nxtjs" instead of "Next.js")
Search for concepts that appear in descriptions, not just titles
Need to find videos by tags or related topics

I wanted to move beyond basic filtering and implement a professional Search AI experience that rivals what you'd find on major platforms.

The Goal: Three Core Requirements

My requirements were simple but high-impact:

Fuzzy Search: If a user types "Nxtjs" instead of "Next.js," they should still find the right video.
Hit Highlighting: Search terms should "glow" in the results so users immediately see why a video was recommended.
Speed: Results should appear almost instantly as the user types (search-as-you-type with debouncing).

The Tech Stack

Frontend: Next.js 14 (App Router) with TypeScript
Styling: Tailwind CSS for a modern, responsive UI
Search Engine: Elasticsearch Serverless (Elastic Cloud)
Client Library: @elastic/elasticsearch for Node.js
Icons: Lucide React
Deployment: Vercel

Architecture Overview

The application follows a clean separation of concerns:

┌─────────────────┐
│   Next.js UI    │  (React Components + Tailwind)
│  (app/page.tsx) │
└────────┬────────┘
         │ HTTP GET /api/search?q=...
         ▼
┌─────────────────┐
│  API Route      │  (app/api/search/route.ts)
│  Elasticsearch  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Elasticsearch   │
│   Serverless    │
│   (Cloud)       │
└─────────────────┘
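The diagram shows the UI calling GET /api/search?q=.... Before querying Elasticsearch, the route has to read q from the URL. Below is a minimal sketch of that step. It assumes a hypothetical parseSearchQuery helper and an arbitrary 100-character limit, and neither comes from the repository:

```typescript
// Hypothetical helper: extract and sanitize `q` for GET /api/search?q=...
const MAX_QUERY_LENGTH = 100; // assumed limit, not from the original project

export function parseSearchQuery(url: string): string | null {
  // URL parses the query string and decodes escapes such as %20 and '+'
  const raw = new URL(url).searchParams.get('q');
  if (raw === null) return null;

  const trimmed = raw.trim();
  // Treat empty or whitespace-only input as "no search"
  if (trimmed.length === 0) return null;

  // Cap the length so very long inputs never reach the cluster
  return trimmed.slice(0, MAX_QUERY_LENGTH);
}
```

In an App Router handler (export async function GET(request: Request)) this could be called as parseSearchQuery(request.url). When it returns null, the route can respond with an empty result list instead of sending a blank query to the cluster.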

Step-by-Step Implementation

1. Data Structure & Schema Design

I started by structuring my video metadata into a clean JSON format. Each entry includes all the fields needed for rich search:

{
  "id": "1",
  "title": "Build and Deploy a blog with Subscription & Comments",
  "description": "In this tutorial, we'll build and deploy...",
  "thumbnail_url": "https://...",
  "video_url": "https://youtu.be/...",
  "tags": ["nextjs", "react", "tutorial"],
  "published_date": "2024-03-31"
}
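To get type safety for these records, the JSON shape above can be described with a TypeScript interface. The Video interface and the isVideo runtime guard below are illustrative names and may not match what the repository uses. The guard is one way to catch malformed entries before they reach the indexing step:

```typescript
// Shape of one record in the video JSON file (fields from the example above)
export interface Video {
  id: string;
  title: string;
  description: string;
  thumbnail_url: string;
  video_url: string;
  tags: string[];
  published_date: string; // e.g. "2024-03-31"
}

// Hypothetical runtime check for data loaded with JSON.parse
export function isVideo(value: unknown): value is Video {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;

  const stringFields = ['id', 'title', 'description', 'thumbnail_url', 'video_url', 'published_date'];
  const tags = record.tags;

  return (
    stringFields.every((key) => typeof record[key] === 'string') &&
    Array.isArray(tags) &&
    tags.every((tag) => typeof tag === 'string')
  );
}
```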

2. Elasticsearch Index Mapping

The key to powerful search is proper field mapping. I created an index with:

Text fields (title, description, tags) with the standard analyzer for full-text search
Keyword fields for exact matches (useful for filtering)
Date field for temporal queries
Multi-field mapping on title and tags to support both text search and exact matching

Here's the mapping configuration:

mappings: {
  properties: {
    id: { type: 'keyword' },
    title: {
      type: 'text',
      analyzer: 'standard',
      fields: {
        keyword: { type: 'keyword' }  // For exact matches
      }
    },
    description: { type: 'text', analyzer: 'standard' },
    tags: {
      type: 'text',
      analyzer: 'standard',
      fields: { keyword: { type: 'keyword' } }
    },
    published_date: { type: 'date' }
  }
}
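The .keyword sub-fields pay off when full-text relevance is combined with exact filtering. The following sketch assumes a hypothetical buildFilteredQuery helper that narrows results to a single tag. The fuzzy multi_match runs in must, where it contributes to the score. The term query on tags.keyword runs in filter, where it only includes or excludes documents:

```typescript
// Hypothetical helper: full-text search on the analyzed fields, plus an
// exact tag filter on the `tags.keyword` sub-field from the mapping above.
export function buildFilteredQuery(searchTerm: string, exactTag?: string) {
  return {
    bool: {
      must: [
        {
          multi_match: {
            query: searchTerm,
            fields: ['title^3', 'description^2', 'tags^2'],
            fuzziness: 'AUTO',
          },
        },
      ],
      // `filter` clauses do not affect scoring; a term query on a keyword
      // field only matches the exact, unanalyzed value (case-sensitive)
      filter: exactTag ? [{ term: { 'tags.keyword': exactTag } }] : [],
    },
  };
}
```

The returned object could be passed as the query option of client.search. Keyword fields are not analyzed, so the tag must match exactly, including case. For example, "nextjs" matches the stored tag but "NextJS" does not.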

3. The Search API: Where the Magic Happens

The heart of the application is the API route (app/api/search/route.ts). Instead of a simple term match, I used a multi_match query that searches the title, description, and tags at once, with field boosting:

// With @elastic/elasticsearch v8, request options go at the top level;
// the older `body` wrapper still works in v8 but is deprecated
const response = await client.search({
  index: 'youtube-videos',
  query: {
    multi_match: {
      query: searchTerm,
      fields: ['title^3', 'description^2', 'tags^2'],
      fuzziness: 'AUTO',
      type: 'best_fields'
    }
  },
  highlight: {
    fields: {
      title: {},
      description: {},
      tags: {}
    },
    fragment_size: 150,
    number_of_fragments: 2
  },
  size: 20
});

Key features:

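Field Boosting: title^3 boosts title matches 3x and description^2/tags^2 boost those fields 2x, so a title match carries 1.5 times the weight of an equivalent description or tag match
Fuzziness: 'AUTO': Tolerates typos by allowing edits based on term length (exact match for 1-2 characters, 1 edit for 3-5, 2 edits for longer terms)
Highlighting: Returns fragments with <em> tags around matching terms
Best Fields: Scores each video by its single best-matching field rather than summing scores across fields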
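To see what fuzziness: 'AUTO' actually tolerates, here is a simplified model. AUTO picks the allowed edit distance from the length of the query term, and a term matches an indexed token if it falls within that distance. This sketch uses plain Levenshtein distance, whereas Elasticsearch also counts a swap of two adjacent characters as one edit by default. The helper names are hypothetical:

```typescript
// Simplified model of `fuzziness: 'AUTO'` (AUTO:3,6 by default):
// 0-2 chars must match exactly, 3-5 chars allow 1 edit, longer allow 2.
export function autoFuzziness(term: string): number {
  if (term.length <= 2) return 0;
  if (term.length <= 5) return 1;
  return 2;
}

// Classic Levenshtein distance (insertions, deletions, substitutions),
// computed row by row to keep memory at O(b.length).
export function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Would the query term fuzzily match an indexed (already lowercased) token?
export function fuzzyMatches(queryTerm: string, indexedToken: string): boolean {
  const q = queryTerm.toLowerCase(); // the standard analyzer lowercases tokens
  return levenshtein(q, indexedToken) <= autoFuzziness(q);
}
```

Under this model, "Nxtjs" (5 characters, 1 edit allowed) reaches the tag nextjs with a single insertion. The standard analyzer should keep "Next.js" in a title as the single token next.js, which is two edits away. In practice, the typo from the introduction is therefore likely caught through the tags rather than the title.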

4. Frontend: Search-as-You-Type Experience

The frontend implements several UX best practices:

Debouncing for Performance

To avoid hitting the API on every keystroke, I implemented a 300ms debounce:

useEffect(() => {
  const timer = setTimeout(() => {
    setDebouncedQuery(query);
  }, 300);
  return () => clearTimeout(timer);
}, [query]);
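The same debounce pattern also works outside React. In the framework-agnostic sketch below, a hypothetical debounce helper cancels the previous pending timer on each call. As a result, only the last query typed within the wait window is forwarded:

```typescript
// Hypothetical standalone debounce: forwards only the last call made
// within `waitMs` of the previous one.
export function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  waitMs: number,
): (...args: Args) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Args) => {
    // Each new call cancels the pending one, like the effect cleanup above
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

Inside a component, the useEffect version is usually preferable because React runs the cleanup automatically on unmount. A standalone helper like this suits plain scripts or non-React UIs.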

Highlight Rendering

Elasticsearch returns highlights wrapped in <em> tags. I created a custom function to convert these to styled <mark> elements:

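const highlightText = (text: string, highlights?: string[]): React.ReactNode => {
  // No highlight for this field: render the original text unchanged
  if (!highlights || highlights.length === 0) {
    return text;
  }

  // Only the first returned fragment is rendered
  const highlighted = highlights[0];
  // The capture group keeps the <em>...</em> chunks in the split output
  const parts = highlighted.split(/(<em>.*?<\/em>)/g);

  return parts.map((part, index) => {
    if (part.startsWith('<em>') && part.endsWith('</em>')) {
      // Strip the tags; React renders the remaining text safely escaped
      const matchedText = part.replace(/<\/?em>/g, '');
      return (
        <mark key={index} className="bg-yellow-200 dark:bg-yellow-800 px-1 rounded">
          {matchedText}
        </mark>
      );
    }
    return part;
  });
};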
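Parsing and rendering can also be kept separate, which makes the parsing testable without JSX. The sketch below is a hypothetical splitHighlight helper. It uses the same regular expression as highlightText but returns plain segments, and a component could then map each highlighted segment to a <mark>:

```typescript
// Hypothetical framework-agnostic parser for Elasticsearch highlight fragments
export interface Segment {
  text: string;
  highlighted: boolean;
}

export function splitHighlight(fragment: string): Segment[] {
  return fragment
    .split(/(<em>.*?<\/em>)/g) // the capture group keeps the <em> parts
    .filter((part) => part.length > 0) // split() yields '' at the edges
    .map((part) =>
      part.startsWith('<em>') && part.endsWith('</em>')
        ? { text: part.slice('<em>'.length, -'</em>'.length), highlighted: true }
        : { text: part, highlighted: false },
    );
}
```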

Loading & Empty States

I added skeleton loaders and empty state messages to provide visual feedback:

Loading: Animated skeleton cards while fetching
Empty Results: Friendly message with suggestions
Initial State: Invitation to start searching
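These states are mutually exclusive, which maps well onto a TypeScript discriminated union. The sketch below uses hypothetical names (SearchView, deriveView). It assumes the component tracks the current query, a loading flag, and the last result list:

```typescript
// Hypothetical state model for the UI states listed above, plus loaded results
export type SearchView<T> =
  | { kind: 'initial' }
  | { kind: 'loading' }
  | { kind: 'empty'; query: string }
  | { kind: 'results'; items: T[] };

export function deriveView<T>(query: string, isLoading: boolean, items: T[]): SearchView<T> {
  if (query.trim() === '') return { kind: 'initial' }; // invitation to search
  if (isLoading) return { kind: 'loading' }; // skeleton cards
  if (items.length === 0) return { kind: 'empty', query }; // friendly message
  return { kind: 'results', items };
}
```

A switch on view.kind then lets the compiler check that every state has a matching piece of UI.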

5. Data Indexing Script

I created a standalone Node.js script (index-data.ts) that:

Validates environment variables before connecting
Deletes existing index for clean re-indexing
Creates index with mappings (serverless-compatible)
Bulk indexes videos with error handling
Normalizes dates automatically (handles formats like "2023-10-7" → "2023-10-07")
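For the bulk-indexing step, the Elasticsearch bulk API expects an array that alternates between an action object and the document itself. The sketch below builds that array and collects per-document failures. The helper names and the trimmed VideoDoc type are assumptions for illustration:

```typescript
// Minimal record shape for this sketch; the real script indexes full video objects
export interface VideoDoc {
  id: string;
  title: string;
  published_date: string;
}

// Alternating lines: { index: { _index, _id } }, { ...document }, ...
export function toBulkOperations(index: string, videos: VideoDoc[]): object[] {
  return videos.flatMap((video) => [{ index: { _index: index, _id: video.id } }, video]);
}

// Only the fields of a bulk response item that this sketch reads
export interface BulkItem {
  index?: { _id?: string; status: number; error?: { type: string; reason?: string } };
}

// A bulk request can succeed overall while single documents fail,
// so the ids of failed items are collected for logging.
export function failedIds(items: BulkItem[]): string[] {
  return items
    .filter((item) => item.index?.error !== undefined)
    .map((item) => item.index?._id ?? '(unknown id)');
}
```

With the v8 client, these helpers might be used as client.bulk({ operations: toBulkOperations('youtube-videos', videos) }), with failedIds(response.items) checked whenever response.errors is true. Because the video id serves as _id, indexing the same video twice replaces the document instead of creating a duplicate.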

The script handles Elasticsearch Serverless constraints (no manual shard/replica configuration) and provides clear console feedback.

Challenges & Solutions

Challenge 1: Serverless Configuration

Problem: Elasticsearch Serverless doesn't allow manual shard/replica settings.

Solution: Conditional configuration based on connection method:

// Only add settings for non-serverless deployments
if (process.env.ELASTIC_CLOUD_ID && !process.env.ELASTIC_ENDPOINT) {
  indexBody.settings = {
    number_of_shards: 1,
    number_of_replicas: 0
  };
}
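The check above infers the deployment type from whichever environment variables are set, and the same idea can drive the client connection itself. The sketch below assumes a hypothetical buildConnectionOptions helper and an ELASTIC_API_KEY variable, since the original only names ELASTIC_CLOUD_ID and ELASTIC_ENDPOINT. The node, cloud.id, and auth.apiKey fields correspond to @elastic/elasticsearch client options:

```typescript
// Hypothetical: derive connection options from environment variables.
// ELASTIC_API_KEY is an assumed variable name.
export type Env = Record<string, string | undefined>;

export interface ConnectionOptions {
  node?: string;
  cloud?: { id: string };
  auth: { apiKey: string };
}

export function buildConnectionOptions(env: Env): { options: ConnectionOptions; serverless: boolean } {
  const apiKey = env.ELASTIC_API_KEY;
  if (!apiKey) throw new Error('ELASTIC_API_KEY is not set');

  if (env.ELASTIC_ENDPOINT) {
    // Endpoint URL treated as Serverless: no shard/replica settings allowed
    return { options: { node: env.ELASTIC_ENDPOINT, auth: { apiKey } }, serverless: true };
  }
  if (env.ELASTIC_CLOUD_ID) {
    // Cloud ID without an endpoint: non-serverless, settings may be added
    return { options: { cloud: { id: env.ELASTIC_CLOUD_ID }, auth: { apiKey } }, serverless: false };
  }
  throw new Error('Set ELASTIC_ENDPOINT or ELASTIC_CLOUD_ID');
}
```

The serverless flag can then decide whether the settings block is added, and new Client(buildConnectionOptions(process.env).options) would create the client.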

Challenge 2: Date Format Validation

Problem: Inconsistent date formats in JSON data caused indexing errors.

Solution: Automatic date normalization in the indexing script:

function normalizeDate(dateString: string): string {
  const datePattern = /^(\d{4})-(\d{1,2})-(\d{1,2})$/;
  const match = dateString.match(datePattern);
  if (match) {
    const [, year, month, day] = match;
    return `${year}-${month.padStart(2, '0')}-${day.padStart(2, '0')}`;
  }
  return dateString;
}
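normalizeDate pads single digits but passes every other value through unchanged. An impossible value such as "2023-13-40" would therefore still be sent to Elasticsearch and fail there. A hypothetical isValidIsoDate check, run after normalization, could flag such records before the bulk request:

```typescript
// Hypothetical validation after normalizeDate: strict YYYY-MM-DD and a real calendar date
export function isValidIsoDate(dateString: string): boolean {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(dateString);
  if (!match) return false;

  const [, year, month, day] = match.map(Number);
  // Date.UTC rolls invalid days over (Feb 30 becomes Mar 2), so round-trip and compare
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}
```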

Challenge 3: Environment Variable Loading

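Problem: Next.js loads .env.local automatically for the app, but the standalone indexing script runs outside Next.js, and dotenv reads only .env by default. The script therefore had to load .env.local itself.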

Solution: Explicit path resolution for environment files:

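import * as path from 'path';
import * as dotenv from 'dotenv';

// Load .env.local first; dotenv does not overwrite variables that are already
// set, so the second call only fills in values missing from .env.local
const envPath = path.join(process.cwd(), '.env.local');
dotenv.config({ path: envPath });
dotenv.config(); // Fallback to .env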
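After loading these files, the indexing script validates the environment before connecting. The sketch below shows that check with hypothetical missingEnvVars and assertEnv helpers. The variable names in the usage note are assumptions:

```typescript
// Hypothetical helper: names of required variables that are unset or blank
export function missingEnvVars(env: Record<string, string | undefined>, required: string[]): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}

// Fail fast with one message that lists every missing variable
export function assertEnv(env: Record<string, string | undefined>, required: string[]): void {
  const missing = missingEnvVars(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

For example, calling assertEnv(process.env, ['ELASTIC_ENDPOINT', 'ELASTIC_API_KEY']) right after the dotenv.config() calls reports all missing names at once, instead of surfacing later as a connection error.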

Results & Performance

The implementation delivers:

Sub-100ms search latency (after debounce)
Fuzzy matching handles common typos automatically
Visual highlighting makes results immediately clear
Responsive design works seamlessly on mobile and desktop
Scalable architecture ready to handle thousands of videos

What I Learned

Building this project end-to-end reinforced how much Search AI can improve the Developer Experience (DX). It's not just about finding data; it's about the speed and relevance of the discovery process.

Key takeaways:

Elasticsearch Serverless removes infrastructure complexity while providing enterprise-grade search
Field boosting is crucial for relevance—titles should matter more than descriptions
Highlighting transforms search from functional to delightful
Debouncing is essential for search-as-you-type without overwhelming the API
Type safety with TypeScript caught many potential runtime errors early

Even with a small initial dataset, the scalability of Elasticsearch Serverless means this library can grow alongside my channel.

Conclusion

This project demonstrates that you don't need a massive engineering team to build professional search experiences. With modern tools like Next.js 14, Elasticsearch Serverless, and Vercel, a single developer can create search functionality that rivals major platforms.

Check out the GitHub repository to see the full implementation.

Want to build something similar? The repository includes:

Complete TypeScript implementation
Environment variable setup guide
Indexing scripts with error handling
Responsive UI components
Deployment configuration

Feel free to fork or star!
