
Ayi NEDJIMI


I built a search engine over 1,600+ cybersecurity articles — here's what I actually learned

A year ago I had a problem: 1,600+ cybersecurity articles served by a Go backend, and a search bar that returned garbage.

The standard MySQL LIKE '%keyword%' approach was embarrassing. Searching "pentest Active Directory" returned articles that happened to contain "pentest" in one place and "Directory" somewhere else, so totally unrelated content ranked first.

So I rebuilt it from scratch. Here's the honest version of what happened.


The stack I chose (and why)

My backend is Go Fiber. I needed something that:

  • Handled typos (users search "kerberosting" not "kerberoasting")
  • Returned results in < 50ms
  • Could be self-hosted (no SaaS dependency for a small site)
  • Had a decent Go client

I went with Meilisearch. Not because it's technically the best for every use case, but because it hit every point above and took 20 minutes to set up.

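// Requires: import "github.com/meilisearch/meilisearch-go"

// SyncMeilisearch pushes every article's metadata to the "articles" index on startup.
func SyncMeilisearch(client *meilisearch.Client, articles []Article) error {
    index := client.Index("articles")

    // One flat document per article (metadata only, no article body).
    docs := make([]map[string]interface{}, len(articles))
    for i, a := range articles {
        docs[i] = map[string]interface{}{
            "id":           a.ID,
            "title":        a.Title,
            "slug":         a.Slug,
            "excerpt":      a.Excerpt,
            "category":     a.Category,
            "tags":         a.Tags,
            "published_at": a.PublishedAt,
        }
    }

    // AddDocuments only enqueues an indexing task, so err reports a failed
    // request, not a failed indexing run.
    _, err := index.AddDocuments(docs)
    return err
}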

The index auto-syncs on startup and updates via CRUD hooks — so every time an article is created, updated or deleted, Meilisearch stays in sync.
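
A minimal sketch of what such CRUD hooks could look like. The `Indexer` interface, `ArticleStore` and the in-memory index are hypothetical names for illustration, not the post's actual code; in a real setup `Indexer` would be backed by the Meilisearch client.

```go
package main

import "errors"

// Article is a trimmed-down, hypothetical version of the article model.
type Article struct {
	ID    int
	Title string
}

// Indexer abstracts the search index (assumed interface, not a library API).
type Indexer interface {
	Upsert(a Article) error
	Delete(id int) error
}

// ArticleStore persists articles and mirrors each write to the index.
type ArticleStore struct {
	rows  map[int]Article
	index Indexer
}

func NewArticleStore(idx Indexer) *ArticleStore {
	return &ArticleStore{rows: make(map[int]Article), index: idx}
}

// Save creates or updates an article, then runs the index hook.
func (s *ArticleStore) Save(a Article) error {
	if a.ID <= 0 {
		return errors.New("article needs a positive ID")
	}
	s.rows[a.ID] = a
	return s.index.Upsert(a)
}

// Remove deletes an article, then runs the index hook.
func (s *ArticleStore) Remove(id int) error {
	delete(s.rows, id)
	return s.index.Delete(id)
}

// memoryIndex is an in-memory stand-in used to exercise the hooks.
type memoryIndex struct{ docs map[int]Article }

func (m *memoryIndex) Upsert(a Article) error { m.docs[a.ID] = a; return nil }
func (m *memoryIndex) Delete(id int) error    { delete(m.docs, id); return nil }
```

Routing every write through one store type means no code path can change an article without also touching the index.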


What surprised me: the content problem

After the first week, I had a painful realization: search quality is mostly a content problem, not a tooling problem.

Meilisearch was doing its job. But my articles had inconsistent metadata. Some had rich excerpts, others had none. Tags were applied loosely. Category assignments were sometimes wrong.

Three things I fixed that made the biggest difference:

1. Enforce excerpt quality at write time

I added validation that rejects articles without a proper excerpt (minimum length, no boilerplate phrases). This is boring to implement and nobody wants to do it. Do it anyway.
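
As a rough illustration, write-time validation could look like the sketch below. The minimum length and the boilerplate phrases are made-up values, not the rules used on the site.

```go
package main

import (
	"errors"
	"strings"
	"unicode/utf8"
)

// minExcerptLen and boilerplatePhrases are illustrative assumptions.
const minExcerptLen = 80

var boilerplatePhrases = []string{
	"in this article",
	"read more",
	"lorem ipsum",
}

var (
	ErrExcerptTooShort    = errors.New("excerpt too short")
	ErrExcerptBoilerplate = errors.New("excerpt contains a boilerplate phrase")
)

// ValidateExcerpt rejects excerpts that are too short or contain boilerplate.
func ValidateExcerpt(excerpt string) error {
	trimmed := strings.TrimSpace(excerpt)
	// Count runes, not bytes, so accented text is measured fairly.
	if utf8.RuneCountInString(trimmed) < minExcerptLen {
		return ErrExcerptTooShort
	}
	lower := strings.ToLower(trimmed)
	for _, phrase := range boilerplatePhrases {
		if strings.Contains(lower, phrase) {
			return ErrExcerptBoilerplate
		}
	}
	return nil
}
```

Returning distinct sentinel errors lets the editing UI tell the author exactly which rule the excerpt failed.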

2. Category filtering beats keyword search

For a domain-specific corpus, letting users pre-filter by category (news / guide / analysis / checklist) reduces the search space dramatically. Precision goes up even when relevance ranking isn't perfect.

GET /api/search?q=kerberoasting&cat=guide&limit=10

3. Fallback matters

Meilisearch goes down. Rarely, but it does. I added a MySQL LIKE fallback that kicks in automatically:

results, err := SearchMeilisearch(query, filters)
if err != nil || len(results) == 0 {
    results, err = SearchMySQL(query, filters) // fallback
}
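
One way to make the same fallback observable is to wrap both backends and report which one served the query. This is a sketch under assumptions: `Result`, `SearchFunc` and `WithFallback` are hypothetical names, and filters are left out for brevity.

```go
package main

import "errors"

// Result is a minimal search hit; the field set is an assumption.
type Result struct {
	Slug  string
	Title string
}

// SearchFunc is the shape both backends are assumed to share.
type SearchFunc func(query string) ([]Result, error)

// errIndexDown is a sentinel a primary backend could return when unreachable.
var errIndexDown = errors.New("search index unreachable")

// WithFallback tries primary first and uses secondary when primary fails
// or returns nothing. The usedFallback flag lets the caller log degradations.
func WithFallback(primary, secondary SearchFunc) func(query string) (results []Result, usedFallback bool, err error) {
	return func(query string) ([]Result, bool, error) {
		results, err := primary(query)
		if err == nil && len(results) > 0 {
			return results, false, nil
		}
		results, err = secondary(query)
		return results, true, err
	}
}
```

Because the fallback also fires on empty results, every zero-result query pays for a second, slower lookup. Counting the `usedFallback` flag shows how often that happens.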

Users never noticed the degradation. That's the goal.


The retrieval part: what "RAG" actually means at this scale

I see a lot of articles about building RAG systems with vector embeddings, chunking strategies, cosine similarity, etc. That's the right approach when your questions are complex and open-ended.

For a domain-specific article corpus with structured metadata, it's overkill. What I actually needed was:

  • Fast keyword retrieval that feels semantic-ish (Meilisearch gets there with ranking rules for typos, word proximity and attribute weight, not with embeddings)
  • A way to surface the right article given a user query
  • Context injection into LLM prompts when generating summaries or related content

The architecture ended up being:

User query
    → Meilisearch (retrieval, ~10-30ms)
    → Top 3-5 articles (slug + title + excerpt)
    → LLM prompt context
    → Generated response / enriched content

No vector DB. No embeddings pipeline. No chunking headaches. For 1,600 articles averaging 2,000 words each, this works well.


Honest numbers

| Metric | Before | After |
|---|---|---|
| Avg search latency | 340ms (MySQL LIKE) | 28ms (Meilisearch) |
| Typo tolerance | None | Handles 1-2 char errors |
| Multi-word queries | Poor | Good |
| Index size | N/A | ~12MB |
| Setup time | | ~2 hours total |

The ~12MB index for 1,600+ articles is worth emphasizing: Meilisearch is lean, although this index holds only metadata and excerpts, not full article bodies.


What I'd do differently

1. Index full content, not just excerpts

I indexed metadata (titles, slugs, excerpts, categories and tags) but not the full article body. As a result, searching for a technical term that appears only deep in an article's body returns nothing. I'm fixing this progressively.

2. Add synonyms from day one

Meilisearch has a synonyms API. I should have built a synonyms list for cybersecurity terminology immediately:

{
  "AD": ["Active Directory"],
  "pentest": ["penetration test", "intrusion test"],
  "MFA": ["multi-factor authentication", "2FA"]
}

I added these late, after noticing obvious query misses.
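
One detail to watch: Meilisearch synonyms are one-way unless each term is declared as a synonym of the other, so a map like the one above expands "AD" but not "Active Directory". A small helper can generate the two-way map from groups of equivalent terms. This is a sketch, and `MutualSynonyms` is a hypothetical name.

```go
package main

import "sort"

// MutualSynonyms expands groups of equivalent terms into a map where every
// term lists all the others, so the association works in both directions.
func MutualSynonyms(groups [][]string) map[string][]string {
	out := make(map[string][]string)
	for _, group := range groups {
		for i, term := range group {
			for j, other := range group {
				if i != j {
					out[term] = append(out[term], other)
				}
			}
		}
	}
	for term := range out {
		sort.Strings(out[term]) // sorted output keeps diffs and tests predictable
	}
	return out
}
```

The returned map has the same shape as the JSON above and can be sent to the synonyms setting as is.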

3. Log every failed search

The most valuable dataset I have is the list of searches that returned zero results. It tells you exactly what content you're missing and what synonyms to add. I started logging these to a search_misses table — should have done it from the start.


The takeaway

If you're building a content-heavy site and want good search without a massive infrastructure investment:

  1. Meilisearch is genuinely good and genuinely easy
  2. Content quality beats algorithmic cleverness every time
  3. For domain-specific retrieval, you don't need vector embeddings unless your queries are conversational/open-ended
  4. Log your zero-result searches — it's free product research

The full search endpoint with category/difficulty/type filters, pagination and Meilisearch/MySQL fallback is about 80 lines of Go. Happy to share if useful.


I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. The corpus covers pentesting, Active Directory, cloud security and compliance — including 17 free security hardening checklists (PDF + Excel).
