Jozef Môstka

Posted on Apr 7

How I turned hundreds of thousands of "dumb" SVG icons into a semantic search engine in 7 languages under 20ms (using an LLM and Meilisearch)

#webdev #symfony #react #ux

Every frontend and full-stack developer knows this pain: You're building a UI, you need an icon for "settings", and you type settings into the library's search bar. The result? 0 results. Why? Because the library author named that icon heroicons-outline-cog.

Searching for icons without semantics is like looking for a life partner and being sent to an e-shop that sells refrigerators with a lifetime warranty.

It frustrated me so much that I decided to build ycon.cc, a tool that aggregates hundreds of open-source libraries and actually understands what you're looking for. In this article, I'll show you the technical background of how I enriched a massive icon dataset with semantics using AI and how I got the whole thing to respond in under 20 milliseconds thanks to Meilisearch.

1. The Problem: Great data, zero context

When designing the architecture, I didn't want to reinvent the wheel and write my own scrapers for every icon library (Tabler, Heroicons, Material Design, etc.). Instead, I took advantage of the amazing open-source project iconify/json.

If you don't know it, it's a gigantic collection of validated, cleaned, and unified open-source icons in a single standardized JSON format. Suddenly, I had nearly 327,000 icons at my disposal without the effort of parsing SVG files.

The structure Iconify provided was clean and functional:

{
  "prefix": "tabler",
  "icons": {
    "car": {
      "body": "<g fill=\"none\" ... />",
      "width": 24,
      "height": 24
    }
  }
}

However, there was one huge catch. This data is built for rendering, not for searching. For a classic search engine, it's a nightmare: the only reliably searchable text is the icon's name. If a user looking for a car icon types "vozidlo" or "auto" in Slovak (or Spanish), the system finds nothing. Tagging such a large number of icons manually would take me about three lifetimes.

This is where AI enters the scene.

2. AI Magic: Semantic enrichment

I decided to use GPT-5-nano to breathe semantic life into each icon. The task was clear: look at the icon's name and generate the most accurate synonyms in English.

Here is the prompt that, after many iterations, worked best:

**System / Context:**
    You are an expert UX Copywriter and Linguist specializing in search engine optimization for UI icons. Your goal is to generate highly relevant search keywords (synonyms, related actions, and concepts) for a given list of UI icons.
**Instructions:**
    1. I will provide you with a JSON array of icons. Each icon has an `id`, a `clean_name`, and a `category`.
    2. For each icon, generate a maximum of 6 highly relevant English keywords (`en`).
    3. Think about **what users would type into a search bar** to find this icon. Include both the physical object (e.g., "magnifying glass") and the associated action/concept (e.g., "search", "find", "zoom").
    4. Do NOT include the original `clean_name` in the tags (I already have it).
    5. Keep keywords short (1-2 words max per keyword). All lowercase.

To save time and money, I didn't send icons to the API one by one, but in batches of 25. This entire process for the full dataset cost me roughly $10 and ran in the background for about 6 hours.
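The batching step can be sketched as follows. This is a minimal illustration, not the code behind ycon.cc: the function name `buildBatchPayloads` and the sample icons are assumptions, and only the field names (`id`, `clean_name`, `category`) and the batch size of 25 come from the text above. The actual API call is left out on purpose.

```php
<?php
declare(strict_types=1);

/**
 * Split icons into fixed-size batches and build one JSON payload per batch.
 * Each payload would be sent as the user message of a single API request.
 *
 * @param list<array{id: string, clean_name: string, category: string}> $icons
 * @return list<string> one JSON-encoded array per request
 */
function buildBatchPayloads(array $icons, int $batchSize = 25): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $payloads = [];
    foreach (array_chunk($icons, $batchSize) as $batch) {
        $payloads[] = json_encode($batch, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
    }

    return $payloads;
}

// Hypothetical sample data: 60 fake icons produce 3 batches (25 + 25 + 10).
$icons = [];
for ($i = 1; $i <= 60; $i++) {
    $icons[] = ['id' => "demo:icon-$i", 'clean_name' => "icon $i", 'category' => 'demo'];
}

$payloads = buildBatchPayloads($icons);
printf("%d icons -> %d requests\n", count($icons), count($payloads));

// Rough request count for a dataset of about 327,000 icons.
printf("327000 icons -> %d requests\n", (int) ceil(327000 / 25));
```

At 25 icons per request, roughly 327,000 icons come to about 13,080 requests instead of 327,000, which is where most of the time and cost savings come from.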

The resulting JSON document, which I saved and prepared for indexing, suddenly looked like this:

{
  "id": "tabler:car",
  "name": "car",
  "keywords": ["vehicle", "auto", "transport", "drive", "machine"],
  "svg_code": "<svg ... />"
}
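For illustration, here is one way the Iconify structure and the generated keywords could be merged into such documents. It is a sketch under assumptions: `buildSearchDocuments` and the `$keywordsById` map are hypothetical names, and the SVG field is called `body` here to match the search parameters used later in the article.

```php
<?php
declare(strict_types=1);

/**
 * Flatten one Iconify icon set into per-icon search documents.
 *
 * @param array{prefix: string, icons: array<string, array<string, mixed>>} $iconSet
 * @param array<string, list<string>> $keywordsById keywords keyed by "prefix:name"
 * @return list<array<string, mixed>>
 */
function buildSearchDocuments(array $iconSet, array $keywordsById): array
{
    $documents = [];
    foreach ($iconSet['icons'] as $name => $icon) {
        $name = (string) $name; // PHP turns numeric array keys into integers
        $id = $iconSet['prefix'] . ':' . $name;

        $documents[] = [
            'id'       => $id,
            'name'     => $name,
            'keywords' => $keywordsById[$id] ?? [], // icons without keywords stay searchable by name
            'body'     => $icon['body'],
            // Prefer the icon's own size, then a set-level value if the set defines one.
            'width'    => $icon['width'] ?? $iconSet['width'] ?? null,
            'height'   => $icon['height'] ?? $iconSet['height'] ?? null,
        ];
    }

    return $documents;
}

$iconSet = [
    'prefix' => 'tabler',
    'icons'  => [
        'car' => ['body' => '<g fill="none" />', 'width' => 24, 'height' => 24],
    ],
];
$keywordsById = ['tabler:car' => ['vehicle', 'auto', 'transport', 'drive', 'machine']];

echo json_encode(buildSearchDocuments($iconSet, $keywordsById), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```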

We had the data. Now it needed to be searchable quickly.

3. Speed: Meilisearch and why I stored SVG in it

As the search engine, I chose Meilisearch (written in Rust). It's built exactly for "typo-tolerance" and lightning-fast responses.

Originally, I pulled the SVG codes of the icons directly from the SQL database when rendering the grid. However, this turned out to be a bottleneck: with 100 icons per page, it meant either 100 small SELECTs or one large join, which took hundreds of milliseconds with hundreds of thousands of records.

I therefore decided on a radical step: Store the SVG code (in the body attribute) directly in Meilisearch.

While this eliminated the SQL database from the search process, I ran into a new problem: over-fetching. By default, Meilisearch returns all attributes in the response. With 100 icons per page, Meilisearch was sending me not only 100 SVG strings but also thousands of generated synonyms (6 words × 7 languages × 100 icons, so up to 4,200 keywords). PHP had to download this gigantic JSON over the network and deserialize it, which again drove latency up.

The solution? A surgical cut via attributesToRetrieve.

Synonyms (keywords) only serve for Meilisearch to find the icon. The frontend doesn't need to see them! In IconSearchService, I modified the search parameters as follows:

$searchParams = [
    'hitsPerPage'          => $query->limit,
    'page'                 => $page,
    'attributesToSearchOn' => ['name', 'clean_name', "keywords.$locale"],
    'attributesToRetrieve' => ['id', 'name', 'body', 'width', 'height'], // We pull only what we need!
];

$result = $index->search($query->query ?: null, $searchParams);

Because I'm pulling the body (SVG code) directly from Meilisearch, I don't need any database query for each icon. At the same time, I excluded the keywords fields, which would unnecessarily bloat the transferred data.

The result? Response time dropped to a stable 15-20 milliseconds.
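To get a feel for how much the keywords inflate a page of results, the effect can be simulated locally. The sketch below does not talk to Meilisearch: `project()` is a hypothetical helper that mimics what `attributesToRetrieve` does to each hit, and the locale codes and documents are made-up sample data.

```php
<?php
declare(strict_types=1);

/** Keep only the listed attributes of a document (a local stand-in for attributesToRetrieve). */
function project(array $document, array $attributes): array
{
    return array_intersect_key($document, array_flip($attributes));
}

// Assumed locale codes, for illustration only.
$locales = ['en', 'sk', 'cs', 'de', 'es', 'fr', 'pl'];

// One page of 100 fake hits, each carrying 6 keywords per locale.
$hits = [];
for ($i = 0; $i < 100; $i++) {
    $keywords = [];
    foreach ($locales as $locale) {
        $keywords[$locale] = array_fill(0, 6, "keyword-$locale");
    }
    $hits[] = [
        'id'         => "demo:icon-$i",
        'name'       => "icon-$i",
        'clean_name' => "icon $i",
        'body'       => '<path d="M0 0h24v24H0z"/>',
        'width'      => 24,
        'height'     => 24,
        'keywords'   => $keywords,
    ];
}

$retrieve = ['id', 'name', 'body', 'width', 'height'];
$slimHits = array_map(static fn (array $hit): array => project($hit, $retrieve), $hits);

$fullBytes = strlen(json_encode($hits, JSON_THROW_ON_ERROR));
$slimBytes = strlen(json_encode($slimHits, JSON_THROW_ON_ERROR));
$keywordCount = 6 * count($locales) * count($hits);

printf("keywords per page: %d\n", $keywordCount);
printf("full payload: %d bytes, trimmed payload: %d bytes\n", $fullBytes, $slimBytes);
```

The exact byte counts depend on the sample data, but the trimmed payload no longer grows with the number of languages, which is the point of the cut.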

4. Multilingualism for free: When even "cheap" AI is expensive

Once I had the basic English dataset ready, I thought of another improvement: Why limit it to English? I wanted ycon.cc to be a global tool and support the most well-known world languages.

When calculating the costs for the OpenAI API, I realized that translating hundreds of thousands of icons into 6 more languages would burn money unnecessarily given the massive number of requests. So I started looking for a way to solve it locally, "in my living room".

I chose LibreTranslate, an open-source translation engine that I ran in Docker directly on my computer. No API keys, no monthly limits, no fees for every token.

To make the translations as accurate as possible, LibreTranslateService doesn't translate isolated words. Instead, it joins each icon's keywords into one short unit using implode(', ', $keywords). This gives the translator the necessary context, and the results are much more natural than if I translated each word individually.
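The join-translate-split idea can be sketched like this. The signature differs from the real `translateContextual` in the service, which the article does not show: here the translator is injected as a callable so the example runs without a LibreTranslate instance, and the tiny dictionary is a stand-in for the real engine. The snippet assumes the mbstring extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Translate a keyword list as one comma-separated string, then split it again.
 *
 * @param list<string> $keywords
 * @param callable(string, string): string $translate receives (text, target locale)
 * @return list<string>
 */
function translateContextual(array $keywords, string $target, callable $translate): array
{
    if ($keywords === []) {
        return [];
    }

    // One request per icon: the neighbouring words give the translator context.
    $translated = $translate(implode(', ', $keywords), $target);

    // Translators may change casing or spacing, so normalise every part.
    $parts = array_map(
        static fn (string $part): string => mb_strtolower(trim($part)),
        explode(',', $translated)
    );

    // Drop empty parts and duplicates, then reindex.
    $parts = array_filter($parts, static fn (string $part): bool => $part !== '');

    return array_values(array_unique($parts));
}

// Hypothetical stand-in for the HTTP call to the local translation engine.
$fakeTranslate = static function (string $text, string $target): string {
    $dictionary = ['es' => ['vehicle' => 'Vehículo', 'auto' => 'auto', 'transport' => 'transporte']];
    $words = array_map('trim', explode(',', $text));

    return implode(', ', array_map(
        static fn (string $word): string => $dictionary[$target][$word] ?? $word,
        $words
    ));
};

print_r(translateContextual(['vehicle', 'auto', 'transport'], 'es', $fakeTranslate));
```

One caveat of this approach: the translator is free to merge or reorder items, so the output list should not be assumed to line up one-to-one with the input.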

I built the entire process on asynchronous processing via Symfony Messenger. The TranslateIconGroupHandler handler gradually took batches of icons and translated them into all activated languages.

// TranslateIconGroupHandler.php - The heart of translations
public function __invoke(TranslateIconGroupMessage $message): void
{
    // ... loading locales from DB ...
    foreach ($localeMap as $localeId => $localeCode) {
        // Idempotency: if we already have the translation, we skip it,
        // so a retried message never creates duplicates
        if ($this->translationExists($message->iconGroupId, $localeId)) {
            continue;
        }

        // Contextual keyword translation (joined by comma)
        $translated = $this->libreTranslate->translateContextual($message->keywords, $localeCode);

        $translation = new IconGroupTranslation();
        // ... linking the translation to its icon group and locale ...
        $translation->setTranslatedKeywords($translated);
        $this->entityManager->persist($translation);
    }

    // A single flush writes all new translations of this icon group at once
    $this->entityManager->flush();
}

As a result, I had support for 7 languages in the system completely for free. The search engine thus understands not only the term "car", but also "auto", "vehicle", or "coche".
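The search parameters shown earlier use "keywords.$locale", which suggests that each indexed document stores its keywords keyed by locale. Assuming that layout, the translated keywords could be assembled as below. Both function names and the hand-written sample translations are illustrative, not taken from the real codebase.

```php
<?php
declare(strict_types=1);

/**
 * Combine the English keywords with the translated ones into one map keyed by locale.
 *
 * @param list<string> $english
 * @param array<string, list<string>> $translationsByLocale
 * @return array<string, list<string>>
 */
function buildLocalizedKeywords(array $english, array $translationsByLocale): array
{
    $keywords = ['en' => array_values($english)];
    foreach ($translationsByLocale as $locale => $words) {
        if ($words !== []) { // skip locales that have no translation yet
            $keywords[$locale] = array_values($words);
        }
    }

    return $keywords;
}

/** Restrict a search to the icon names plus the keywords of one locale. */
function searchableAttributesFor(string $locale): array
{
    return ['name', 'clean_name', "keywords.$locale"];
}

$document = [
    'id'       => 'tabler:car',
    'name'     => 'car',
    'keywords' => buildLocalizedKeywords(
        ['vehicle', 'auto', 'transport'],
        [
            'es' => ['vehículo', 'coche', 'transporte'],
            'sk' => ['vozidlo', 'auto', 'doprava'],
        ]
    ),
];

echo json_encode($document, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), "\n";
print_r(searchableAttributesFor('es'));
```

Searching only the active locale's keywords keeps a Spanish query from matching, say, a Slovak keyword that happens to be spelled the same way.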

5. Developer Experience: Copy-pasting is gone too

With fast, accurate search running on top of the Iconify data, the next step was obvious. Developers hate manually converting SVG files into components.

In ycon.cc, I therefore implemented the Strategy design pattern, which immediately transforms SVG code (with support for Iconify standards) into the format the developer currently needs (Tailwind classes, Vue/React components, or Symfony UX).


#[AutoconfigureTag('app.icon_code_generator')]
final readonly class SymfonyUxGenerator implements IconCodeGeneratorInterface
{
    public function generate(Icon $icon, ?string $alias = null): iterable
    {
        $prefix = $icon->getIconSet()?->getPrefix() ?? 'unknown';
        $originalName = $icon->getName();
        $iconName = sprintf('%s:%s', $prefix, $originalName);

        $renderName = $alias ?? $iconName;

        // Symfony UX Icons registers its console commands under "ux:icons:" (plural)
        if ($alias !== null) {
            yield "Import Icon" => sprintf('<code>php bin/console ux:icons:import %s --as=%s</code>', $iconName, $alias);
        } else {
            yield "Import Icon" => sprintf('<code>php bin/console ux:icons:import %s</code>', $iconName);
        }

        yield "Twig Component" => "<code>".htmlspecialchars(sprintf('<twig:ux:icon name="%s" />', $renderName))."</code>";
        yield "Render Icon" => sprintf('<code>{{ ux_icon(\'%s\') }}</code>', $renderName);
    }
}

Just pick a framework and with one click, you have the code ready in your clipboard.
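Stripped of the Symfony wiring, the Strategy pattern behind this boils down to a registry that picks a generator by format name. The sketch below is a framework-free illustration with made-up names (`SnippetGenerator`, `GeneratorRegistry`) and two neutral output formats. In the real app, the tagged services would presumably be collected by the container instead of being passed in by hand.

```php
<?php
declare(strict_types=1);

interface SnippetGenerator
{
    /** Format name the user picks in the UI. */
    public function format(): string;

    public function generate(string $svgBody, int $width, int $height): string;
}

final class InlineSvgGenerator implements SnippetGenerator
{
    public function format(): string
    {
        return 'svg';
    }

    public function generate(string $svgBody, int $width, int $height): string
    {
        // Wrap the Iconify-style body in a complete <svg> element.
        return sprintf(
            '<svg xmlns="http://www.w3.org/2000/svg" width="%1$d" height="%2$d" viewBox="0 0 %1$d %2$d">%3$s</svg>',
            $width,
            $height,
            $svgBody
        );
    }
}

final class DataUriGenerator implements SnippetGenerator
{
    public function format(): string
    {
        return 'data-uri';
    }

    public function generate(string $svgBody, int $width, int $height): string
    {
        $svg = (new InlineSvgGenerator())->generate($svgBody, $width, $height);

        return sprintf(
            '<img src="data:image/svg+xml;base64,%s" width="%d" height="%d" alt="">',
            base64_encode($svg),
            $width,
            $height
        );
    }
}

final class GeneratorRegistry
{
    /** @var array<string, SnippetGenerator> */
    private array $generators = [];

    public function __construct(SnippetGenerator ...$generators)
    {
        foreach ($generators as $generator) {
            $this->generators[$generator->format()] = $generator;
        }
    }

    public function generate(string $format, string $svgBody, int $width, int $height): string
    {
        $generator = $this->generators[$format]
            ?? throw new InvalidArgumentException("Unknown format: $format");

        return $generator->generate($svgBody, $width, $height);
    }
}

$registry = new GeneratorRegistry(new InlineSvgGenerator(), new DataUriGenerator());
$body = '<path d="M0 0h24v24H0z"/>';

echo $registry->generate('svg', $body, 24, 24), "\n";
echo $registry->generate('data-uri', $body, 24, 24), "\n";
```

Adding a new output format then means adding one class, without touching the registry or the existing generators.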

Conclusion

Transforming a gigantic repository like iconify/json into a fully semantic search tool was exactly the kind of technical adventure that makes me love programming. An LLM for data enrichment plus Meilisearch for lightning-fast querying is a combination I can only recommend.

If you are currently building a website or application and are tired of remembering exact technical names for icons, I have launched a beta version at ycon.cc.

Try entering a context into the search (e.g., "mute sound" or "add to cart") and let me know in the comments if it found what you expected. I greatly appreciate every piece of feedback, even the critical kind.

Top comments (1)

Svg/icons • May 22

Really interesting approach. Icon search is a great use case for semantic search because developers often know the meaning they want to express, but not the exact keyword to type.

For example, an icon for "secure access", "empty state", or "team management" may not match a single obvious term. Intent-based search can make icon discovery feel much more natural than browsing categories or guessing keywords.

This kind of workflow could become especially useful for UI builders, design systems, and developer tools.