Evan Lin for Google Developer Experts

Posted on Jun 22 • Originally published at evanlin.com on Jun 22

[Gemini API in Action] Building MemeFinder: A Native Mac Menu Bar Widget for Finding Memes via Text Using Gemini Vision & Semantic Embeddings

#ai #api #gemini #python

The Origin: Mid-Conversation, Where on Earth Is That Meme?

Anyone who chats a lot has a folder full of memes on their phone and computer, but the moment you actually need one — the conversation is rolling, you want to drop a "thanks but no thanks" or an "I'm trash" reaction — you can't find it. The filename is IMG_4821.jpg, the photo library has no categories, and search is a non-starter.

I first came across a wonderful open-source project, ShiQu1218/MemeTalk. It builds a local meme semantic-search system with Python + Streamlit + SQLite: it scans your local meme folder, indexes images with OCR and vector embeddings, then does multi-route retrieval. It is feature-complete, but it is research-oriented and you have to open a browser to use the Streamlit UI.

What I wanted was something closer to an "everyday handy tool":

A native Mac app, one search box. I type what I'm looking for and the relevant meme pops up. Click it and it's copied straight to the clipboard.

So MemeFinder was born. This post records its journey from zero to "menu-bar resident + global hotkey," and several representative pitfalls along the way.

System Design and Architecture

The core concept is simple: point at a local meme folder → have Gemini build an index for each image → type to do a semantic search → click to copy.

I made three key technical decisions:

Native SwiftUI app, not Electron. Copying images to the clipboard, global hotkeys, menu-bar residency — with AppKit these are all first-class citizens.
Gemini does two things: the vision model gemini-3-flash-preview reads the text in each image and generates a Traditional Chinese description plus emotion tags; gemini-embedding-2 turns those semantics into a 768-dimensional vector.
Hybrid semantic-vector + keyword search. Pure keyword recall for Chinese is too poor; only semantic vectors achieve "type a related description and find the image."

System Architecture Flow

The project is deliberately split into two Swift Package targets:

Target	Type	Contents
`MemeFinder`	library	Logic, models, services, ViewModels (all unit-tested)
`MemeFinderApp`	executable	SwiftUI views + menu-bar shell (thin layer, depends on the library)

This split isn't decorative — it directly determines whether the tests can run smoothly, as "Pitfall #2" will explain.
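As a rough illustration of the two-target layout in the table, a Package.swift could look like the sketch below. The package name, the test target name, the tools version, and the minimum macOS version are assumptions; only the two target names and their roles come from the table.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MemeFinder",                 // assumed package name
    platforms: [.macOS(.v13)],          // assumed minimum OS, not stated in the post
    targets: [
        // Library: logic, models, services, ViewModels. No entry point.
        .target(name: "MemeFinder"),
        // Executable: SwiftUI views and the @main App only.
        .executableTarget(name: "MemeFinderApp", dependencies: ["MemeFinder"]),
        // Tests depend on the library alone, so they never link the app shell.
        .testTarget(name: "MemeFinderTests", dependencies: ["MemeFinder"]),
    ]
)
```

The point of the layout is the last line: the test target names only the library as a dependency.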

Core Implementation

1. Auto-tagging memes with the Gemini vision model

During indexing, each image is sent to the vision model with a request to output only JSON: the text in the image, a Traditional Chinese description, tags, and emotion. responseMimeType is set to application/json to keep the output format stable:

public static func annotateRequest(apiKey: String, imageData: Data, mimeType: String) -> URLRequest {
    let prompt = """
    你是迷因圖標註助手。請閱讀這張圖，輸出 JSON，欄位：
    ocr_text(圖中所有文字), description(用繁體中文描述畫面與梗),
    tags(3-8 個繁體中文關鍵字陣列), emotion(單一情緒詞)。只輸出 JSON。
    """
    let body: [String: Any] = [
        "contents": [[
            "parts": [
                ["text": prompt],
                ["inline_data": ["mime_type": mimeType, "data": imageData.base64EncodedString()]]
            ]
        ]],
        "generationConfig": ["responseMimeType": "application/json"]
    ]
    // ... set URL, x-goog-api-key header, POST body
}
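On the receiving side, the JSON the prompt asks for can be decoded with a small Codable type. This is a minimal sketch: the field names come from the prompt above, while MemeAnnotation, decodeAnnotation, and the sample payload are hypothetical names and data made up for illustration.

```swift
import Foundation

/// Hypothetical model for the JSON the prompt requests.
struct MemeAnnotation: Codable, Equatable {
    let ocrText: String
    let description: String
    let tags: [String]
    let emotion: String

    // Map the snake_case field from the prompt onto a Swift-style name.
    enum CodingKeys: String, CodingKey {
        case ocrText = "ocr_text"
        case description, tags, emotion
    }
}

/// Decodes the model's JSON text. Throws DecodingError if a field is missing
/// or has the wrong type.
func decodeAnnotation(from jsonText: String) throws -> MemeAnnotation {
    try JSONDecoder().decode(MemeAnnotation.self, from: Data(jsonText.utf8))
}

// Made-up sample payload in the shape the prompt describes.
let sample = #"{"ocr_text":"謝謝不用","description":"一隻貓舉手婉拒","tags":["拒絕","貓"],"emotion":"無奈"}"#
let annotation = try decodeAnnotation(from: sample)
print(annotation.tags)
```

Decoding strictly means a malformed model reply surfaces as an error at index time instead of as an empty record in the index.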

2. Hybrid semantic + keyword ranking

After the query string is embedded into a vector, we compute its cosine similarity against every image's embedding, add a small boost for each query token that appears in the OCR text or tags, and sort by the combined score:

public func search(queryEmbedding: [Float], queryText: String,
                   in images: [IndexedImage], limit: Int) -> [SearchResult] {
    let tokens = queryText.lowercased().split(whereSeparator: { $0.isWhitespace }).map(String.init)
    let results: [SearchResult] = images.compactMap { image in
        let cos = cosineSimilarity(queryEmbedding, image.embedding)
        let haystack = (image.ocrText + " " + image.tags.joined(separator: " ")).lowercased()
        let matches = tokens.filter { !$0.isEmpty && haystack.contains($0) }.count
        let boost = 0.1 * Float(min(matches, 3))   // keyword boost capped at 0.3
        let score = cos + boost
        return score > 0 ? SearchResult(image: image, score: score) : nil
    }
    return Array(results.sorted { $0.score > $1.score }.prefix(limit))
}

The whole search engine is a pure function, with Gemini hidden behind a protocol, so this logic can be fully unit-tested offline without hitting the real API.
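The search function calls cosineSimilarity without showing it. A minimal sketch of such a helper follows; returning 0 for mismatched lengths, empty input, or a zero vector is a choice made for this sketch, not necessarily what the project does.

```swift
/// Cosine similarity of two vectors. Returns 0 for mismatched lengths,
/// empty input, or a zero-length vector (assumptions of this sketch).
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = (normA * normB).squareRoot()
    return denom > 0 ? dot / denom : 0
}
```

Because the keyword boost is capped at 0.3, an image with a modest cosine score can still outrank a semantically closer one when several query tokens hit its OCR text or tags.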

Major Pitfalls and Solutions

The real time sink in this project was never the happy path — it was the pitfalls below.

Pitfall #1: The mysterious `GeminiError error 0` — indexing and search both fail

App packaged, key set, folder chosen, hit search — and nothing shows below, just GeminiError error 0.

Rather than guessing, I hit the embedding endpoint once with a real key and printed the response:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-2:embedContent" \
  -H "x-goog-api-key: $KEY" \
  -d '{"content":{"parts":[{"text":"貓"}]},"output_dimensionality":768}'

The evidence was unmistakable:

{ "embedding": { "values": [ -0.0063, -0.0200, ... ] } }

The problem: my parser was reading the plural embeddings[0].values (that's the batchEmbedContents batch-endpoint format), but the single embedContent call returns the singular embedding.values. So every embed call failed — indexing each image failed, embedding the query string failed — all throwing badResponse (shown in the UI as GeminiError error 0).

[Solution]
Fix the parser to read the singular embedding.values, keeping the plural format as a fallback; I also hardened the annotation parser (a thinking model sometimes returns a textless "thought" part first, so skip to the first part that actually has text):

public static func embedding(fromEmbedContent data: Data) throws -> [Float] {
    guard let root = try? JSONSerialization.jsonObject(with: data) as? [String: Any] else {
        throw GeminiError.badResponse("cannot parse embedContent payload")
    }
    // A single embedContent returns {"embedding":{"values":[...]}}
    if let embedding = root["embedding"] as? [String: Any],
       let values = embedding["values"] as? [Double] {
        return values.map(Float.init)
    }
    // batchEmbedContents is {"embeddings":[{"values":[...]}]} — tolerate it too
    if let embeddings = root["embeddings"] as? [[String: Any]],
       let values = embeddings.first?["values"] as? [Double] {
        return values.map(Float.init)
    }
    throw GeminiError.badResponse("cannot parse embedContent payload")
}
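The annotation-parser hardening mentioned above (skip to the first part that actually has text) can be sketched the same way. This assumes the generateContent response is shaped like {"candidates":[{"content":{"parts":[...]}}]}; the function name and the sample payload are hypothetical.

```swift
import Foundation

/// Returns the first non-empty "text" among the parts of the first candidate,
/// or nil if there is none.
func firstText(inGenerateContent data: Data) -> String? {
    guard let root = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
          let candidates = root["candidates"] as? [[String: Any]],
          let content = candidates.first?["content"] as? [String: Any],
          let parts = content["parts"] as? [[String: Any]] else { return nil }
    // Skip parts that carry no text (for example a leading thought part).
    for part in parts {
        if let text = part["text"] as? String, !text.isEmpty { return text }
    }
    return nil
}
```

A caller would treat nil as a bad response and pass a non-nil string on to the JSON decoder for the annotation fields.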

Lesson: trust the actual API response over your memory or secondhand docs. A single line of curl saved countless guesses.

Pitfall #2: SwiftPM's `main` entry-point conflict and the SwiftUICore linking error

I initially made the whole project a single executableTarget with the tests depending on it directly. The result: tests failed to link no matter what. An executable target needs a main entry point, but the only entry point was the @main App, which would not exist until the UI step. Casually adding a placeholder main.swift then conflicts with @main once it arrives (Swift doesn't allow two entry points in one target). Worse, SwiftUI in an executable target spews "SwiftUICore.tbd ... not an allowed client" linker warnings.

[Root cause analysis and solution]
This is actually an architecture problem, not a compilation problem. The right approach is to split the project into two layers:

MemeFinder (library target): all logic, models, services, ViewModels — the tests depend only on this layer, it has no entry point, and it links cleanly as a library. ViewModels import Combine (not SwiftUI) to get ObservableObject.
MemeFinderApp (executable target): only SwiftUI views and @main, with import MemeFinder to use the public types above.

After the split, the library and tests don't touch SwiftUI at all, the linker warnings disappear, and the @main conflict no longer exists. "What the tests need to depend on" often forces out clean module boundaries.

Pitfall #3: Parallel indexing's rate limit and "I want to stop indexing halfway"

The first version indexed one image at a time, serially calling Gemini (annotate then embed). For hundreds of images this was painfully slow. So I switched to bounded parallelism with withTaskGroup (at most 4 at once), which brought three new problems:

The Gemini free tier has a rate limit — too much concurrency triggers 429.
The user wants to cancel halfway through a large folder.
Parallel completion order is chaotic, but the results need stable sorting.

[Solution]
Handle the three problems separately, all converging in the same buildIndex:

429 backoff retry: retry only GeminiError.rateLimited with exponential backoff (max 3 attempts); other errors are recorded without retry.
Cooperative cancellation: honor Task.isCancelled; on cancel, stop scheduling new work and keep the completed portion. Even the backoff Task.sleep lets CancellationError propagate normally instead of swallowing it and firing one more API call.
Stable sorting: collect results into a [path: image] dictionary, then reassemble the output in the order of the pre-sorted file list, decoupled from completion order.

// Seed maxConcurrent tasks first, then refill one per completion — strictly cap concurrency
for _ in 0..<maxConcurrent { if !scheduleNext() { break } }
while let res = await group.next() {
    if let img = res.image { resultsByPath[res.path] = img }
    if let err = res.error { errors.append(err) }
    done += 1
    progress(done, total)
    _ = scheduleNext()
}
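The fragment above depends on state from the surrounding buildIndex. As a self-contained sketch of the same seed-then-refill pattern plus the stable reordering, here is a generic helper; boundedMap and its signature are hypothetical, and it assumes the input paths are unique.

```swift
/// Runs `work` over `paths` with at most `maxConcurrent` tasks in flight and
/// returns the non-nil results in the order of `paths`, not completion order.
func boundedMap<T: Sendable>(
    _ paths: [String],
    maxConcurrent: Int,
    work: @escaping @Sendable (String) async -> T?
) async -> [T] {
    let resultsByPath = await withTaskGroup(of: (String, T?).self,
                                            returning: [String: T].self) { group in
        var results: [String: T] = [:]
        var iterator = paths.makeIterator()
        var seeded = 0
        // Seed up to maxConcurrent tasks.
        while seeded < maxConcurrent, let path = iterator.next() {
            group.addTask {
                let value = await work(path)
                return (path, value)
            }
            seeded += 1
        }
        // Refill one task per completion; stop scheduling once cancelled.
        while let finished = await group.next() {
            let (path, value) = finished
            if let value = value { results[path] = value }
            if !Task.isCancelled, let next = iterator.next() {
                group.addTask {
                    let value = await work(next)
                    return (next, value)
                }
            }
        }
        return results
    }
    // Reassemble in input order, decoupled from completion order.
    return paths.compactMap { resultsByPath[$0] }
}
```

On cancellation the helper stops adding tasks, drains the ones already running, and returns whatever completed, which matches the "keep the completed portion" behavior described above.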

Incidentally, HTTP status handling was also extracted into a pure function mapResponse(data:statusCode:): 429 → rateLimited, other non-2xx → httpError(code), 2xx → return the data. That gives the retry logic a specific error to match on, and the mapping is easy to test too.
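A minimal sketch of that mapping together with a retry wrapper is below. The GeminiError cases follow the names used in this post, but the enum definition, withRateLimitRetry, and the 0.5-second base delay are assumptions for illustration.

```swift
import Foundation

/// Hypothetical error type using the case names mentioned in the post.
enum GeminiError: Error, Equatable {
    case badResponse(String)
    case rateLimited
    case httpError(Int)
}

/// Pure mapping from an HTTP status code to data or a typed error.
func mapResponse(data: Data, statusCode: Int) throws -> Data {
    switch statusCode {
    case 200..<300: return data
    case 429: throw GeminiError.rateLimited
    default: throw GeminiError.httpError(statusCode)
    }
}

/// Retries only on rateLimited, doubling the delay each time. Other errors
/// propagate immediately. Task.sleep throws CancellationError when the task
/// is cancelled, and `try` lets that propagate instead of retrying.
func withRateLimitRetry<T>(
    maxAttempts: Int = 3,
    baseDelayNanos: UInt64 = 500_000_000,
    operation: () async throws -> T
) async throws -> T {
    for attempt in 1..<max(1, maxAttempts) {
        do {
            return try await operation()
        } catch GeminiError.rateLimited {
            try await Task.sleep(nanoseconds: baseDelayNanos << UInt64(attempt - 1))
        }
    }
    // Final attempt: any error, including rateLimited, goes to the caller.
    return try await operation()
}
```

Keeping mapResponse free of networking means the 429 path can be exercised in a unit test with nothing more than an integer.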

Pitfall #4: Evolving from a "windowed app" into "menu-bar resident + global hotkey"

Whether a tool is pleasant to use comes down to "how many steps to summon it." I wanted to hit ⌃⌘M mid-conversation to bring up the search popover, with the app tucked into the menu bar, not occupying the Dock. This step hit two classic macOS pitfalls:

(a) Does a global hotkey need accessibility permission? No. Use Carbon's RegisterEventHotKey to register a fixed hotkey, which doesn't need Accessibility permission (unlike monitoring the whole keyboard). But under Swift 6 strict concurrency, the C event callback has to dispatch through a static id → instance registry, requiring nonisolated(unsafe) and relying on the invariant that "Carbon events are delivered on the main thread" for safety. If ⌃⌘M is already taken, RegisterEventHotKey returns failure — in which case we silently degrade, log a line, and the menu-bar icon still works.

(b) The timing race in the menu-bar right-click menu. The initial approach was "set statusItem.menu → performClick → immediately clear menu," but clearing synchronously fights AppKit's menu-tracking loop, and the menu flashes and disappears.

[Solution]
Pop the menu up directly, fully bypassing the assign-and-clear of statusItem.menu:

@objc private func statusButtonClicked() {
    guard let event = NSApp.currentEvent else { togglePopover(); return }
    if event.type == .rightMouseUp {
        // Pop up directly; don't assign then synchronously clear statusItem.menu
        // (it races AppKit's menu-tracking loop)
        if let button = statusItem?.button {
            NSMenu.popUpContextMenu(makeMenu(), with: event, for: button)
        }
    } else {
        togglePopover()
    }
}

Finally, adding LSUIElement = true to the Info.plist produced by build-app.sh makes the Dock icon disappear, and MemeFinder officially becomes a pure menu-bar tool.
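For the hotkey half in (a), here is a rough sketch of the id → handler registry with a Carbon registration on top. All type and function names are hypothetical, the 'MFND' signature is arbitrary, and the global registry assumes Swift 5 language mode and main-thread access only (under Swift 6 strict concurrency it would need the nonisolated(unsafe) treatment described above). The Carbon part compiles only where Carbon is available.

```swift
/// Maps hotkey ids to handlers so a C callback, which cannot capture Swift
/// context, can look up the right closure. Not thread-safe by itself.
final class HotKeyRegistry {
    private var handlers: [UInt32: () -> Void] = [:]
    private var nextID: UInt32 = 1

    func register(_ handler: @escaping () -> Void) -> UInt32 {
        let id = nextID
        nextID += 1
        handlers[id] = handler
        return id
    }

    func unregister(_ id: UInt32) { handlers[id] = nil }

    /// Returns true if a handler was found and run.
    func fire(_ id: UInt32) -> Bool {
        guard let handler = handlers[id] else { return false }
        handler()
        return true
    }
}

#if canImport(Carbon)
import Carbon.HIToolbox

enum HotKeys {
    static let registry = HotKeyRegistry()   // main thread only (assumption)
}

/// Registers ⌃⌘M. Returns false if registration fails, for example because
/// the combination is already taken.
func registerControlCommandM(_ handler: @escaping () -> Void) -> Bool {
    var spec = EventTypeSpec(eventClass: OSType(kEventClassKeyboard),
                             eventKind: UInt32(kEventHotKeyPressed))
    let installed = InstallEventHandler(GetApplicationEventTarget(), { _, event, _ in
        var pressed = EventHotKeyID()
        let status = GetEventParameter(event, EventParamName(kEventParamDirectObject),
                                       EventParamType(typeEventHotKeyID), nil,
                                       MemoryLayout<EventHotKeyID>.size, nil, &pressed)
        guard status == noErr else { return status }
        return HotKeys.registry.fire(pressed.id) ? noErr : OSStatus(eventNotHandledErr)
    }, 1, &spec, nil, nil)
    guard installed == noErr else { return false }

    let id = HotKeys.registry.register(handler)
    var ref: EventHotKeyRef?
    let hotKeyID = EventHotKeyID(signature: OSType(0x4D46_4E44), id: id) // 'MFND'
    let status = RegisterEventHotKey(UInt32(kVK_ANSI_M), UInt32(controlKey | cmdKey),
                                     hotKeyID, GetApplicationEventTarget(), 0, &ref)
    if status != noErr { HotKeys.registry.unregister(id) }
    return status == noErr
}
#endif
```

A false return is the "silently degrade" path: log a line and leave the menu-bar icon as the only way in.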

Pitfall #5: The settings form is blank — one symptom, three layers of cause

After moving to the menu-bar version, a user reported "the settings window is completely blank." This seemingly simple bug, peeled apart, actually had three layers, each highly representative.

Layer 1: a Form collapses to zero height inside a hand-rolled NSWindow.
Originally the settings screen lived in SwiftUI's native Settings { } scene, which sizes it sensibly. After the refactor it was hosted in a hand-rolled NSWindow(contentViewController: NSHostingController(rootView: SettingsView())), and SettingsView ended with only .frame(width: 460): width only, no height. NSWindow(contentViewController:) sizes the window from the content's natural size, but a SwiftUI Form is vertically flexible and takes whatever height it is offered rather than demanding one; with no constraint, its natural height resolves to nearly 0, so the window opens as a 460-wide, near-zero-height blank strip. The fix is just to add a height:

.padding(20)
// When hosted in a hand-rolled NSWindow (not a SwiftUI Settings scene), a Form
// with no height constraint collapses to ~0, turning the window into a blank strip.
.frame(width: 460, height: 320)

Layer 2: ⌘, and the menu-bar "Settings…" go down two different paths.
After adding the height, the user said "still blank." On follow-up I found out he was summoning settings with ⌘,, while the menu-bar right-click "Settings…" went down a different path. The reason: ⌘, in a SwiftUI app triggers the Settings { } scene, and to dodge a state-sharing problem during the refactor, I had set that to Settings { EmptyView() }:

// During the refactor, the Settings scene was left empty to avoid state-sharing
// — so ⌘, opens a blank window
var body: some Scene {
    Settings { EmptyView() }
}

In other words, settings had two entry points pointing at different things: ⌘, pointed at the empty scene, the menu-bar "Settings…" pointed at the real window. The fix unifies the two paths — let the Settings scene host the real SettingsView (so ⌘, works directly), and make the menu-bar "Settings…" open the same native settings window too:

Settings {
    SettingsView(vm: appDelegate.settings, indexing: appDelegate.indexing,
                 onReindex: { appDelegate.reindexNow() },
                 onCancel: { appDelegate.cancelReindex() })
}

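// The menu-bar "Settings…" now opens the same Settings scene.
// Note: showSettingsWindow: is not a documented public API, so verify it on
// every macOS version you target.
@objc private func openSettings() {
    NSApp.activate(ignoringOtherApps: true)
    NSApp.sendAction(Selector(("showSettingsWindow:")), to: nil, from: nil)
}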

This also leverages the fact that a SwiftUI App body is @MainActor-isolated — so reading the @MainActor appDelegate.settings directly from the body is legal, with no extra bridging needed.

Layer 3 (the most insidious): open doesn't reload a menu-bar app at all.
The biggest time-waster in the process was that after recompiling, I'd ask the user to open MemeFinder.app, yet he kept seeing the old behavior. Because MemeFinder is an LSUIElement menu-bar-resident app — when an instance is already running, open only wakes the existing old process instead of relaunching with the new binary. So we were actually testing the same old build the whole time. The correct dev loop is to truly kill it first, then run from source:

killall MemeFinderApp 2>/dev/null; swift run MemeFinderApp

This layer reminds me: when debugging, first confirm "what you're testing really is the version you changed" — otherwise all your reasoning is built on faulty observations.

On the "Development Process" Itself

This project was driven almost entirely by an AI agent workflow: spec → plan → subagent task-by-task implementation → two-stage review. Each feature started with a design spec and was broken into small, independently testable tasks. Every task wrote a failing test first (TDD) before implementing. After completion, an independent review agent checked spec compliance and code quality, followed by one final whole-branch review.

Several of the pitfalls (GeminiError error 0, the library/executable split, swallowing CancellationError during backoff, the menu timing race) were not written correctly on the first pass; about half of them were in fact caught during the review stage. This echoes an old principle: having tests as armor, and someone (or an agent) seriously reading the diff, matters far more than writing fast. The final project maintains 47 unit tests and a zero-warning release build.

Results and Benefits

Type to find, click to paste: type a Chinese description in the menu-bar popover, semantic search instantly lists relevant memes, click one to copy it to the clipboard and paste straight into LINE / Slack / Messages.
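Privacy-friendly, local index: images and the index live locally (~/Library/Application Support/MemeFinder/index.json). Gemini is called when building the index and, as the search flow above shows, to embed the query string at search time; the ranking itself runs locally.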
A truly handy tool: ⌃⌘M is available anytime, menu-bar resident, no Dock footprint; incremental indexing only processes new/changed images, and indexing can show progress and be canceled.
A clean, maintainable architecture: a two-layer library/executable design, Gemini hidden behind a protocol, pure logic fully covered by tests.
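The incremental indexing mentioned in the list can be pictured as a diff between what the index already holds and what is in the folder now. The sketch below is a guess at the shape of that step, not the project's code: IndexedFile, planIncrementalIndex, and change detection by modification date are all assumptions.

```swift
import Foundation

/// Hypothetical record of a file as seen at index time or in the folder now.
struct IndexedFile: Equatable {
    let path: String
    let modifiedAt: Date
}

/// Splits the current folder listing into files that need (re)indexing and
/// index entries whose files are gone. A content hash could replace the
/// modification date without changing the structure.
func planIncrementalIndex(existing: [IndexedFile], current: [IndexedFile])
    -> (toIndex: [String], toRemove: [String]) {
    let known = Dictionary(existing.map { ($0.path, $0.modifiedAt) },
                           uniquingKeysWith: { _, newer in newer })
    let currentPaths = Set(current.map { $0.path })
    // New files have no known date; changed files have a different one.
    let toIndex = current.filter { known[$0.path] != $0.modifiedAt }
        .map { $0.path }.sorted()
    let toRemove = existing.map { $0.path }
        .filter { !currentPaths.contains($0) }.sorted()
    return (toIndex, toRemove)
}
```

Only the paths in toIndex would be sent to Gemini, which keeps a re-scan of a mostly unchanged folder cheap.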

All the development code for this project is open-sourced on GitHub: kkdai/meme-finder-app. Feel free to clone it, point it at your own meme-collection folder, and experience the joy of "type to find your meme"!
