I kept running into the same annoyance whenever I needed embeddings. The retrieval part of a RAG pipeline was hard enough already. Generating vectors should've been the easy part.
But every time I needed to embed something, the workflow looked like this:
- Open a notebook or write a throwaway script
- Import the SDK, set up the client
- Figure out the right model name (was it text-embedding-004 or text-embedding-3-small?)
- Write the call, handle the response format
- Copy the vector out of the output
For text that's annoying. For images or audio it's worse. Different SDKs, different input formats, different response shapes.
I kept thinking: I can curl an API in seconds. I can jq a JSON response without writing a script. Why can't I just embed something from the terminal?
The tool I wanted
Something like httpie but for embeddings. Type a command, get a vector back.
vemb text "hello world"
# {"model": "gemini-embedding-2-preview", "dimensions": 3072, "values": [0.0123, -0.0456, ...]}
vemb text "hello world" --compact
# [0.0123, -0.0456, 0.0789, ...]
Embed an image:
vemb image photo.jpg
Embed a PDF:
vemb pdf report.pdf
Compare two files:
vemb similar photo1.jpg photo2.jpg
# 0.8734
No notebooks, no scripts, no boilerplate. Just the vector.
Why Gemini Embedding 2
I looked at OpenAI's embedding models first. Their embeddings endpoint is text-only. If you want to embed images, you're stitching together separate models and separate vector spaces. No clean way to compare text against images with a single embedding call.
Google released Gemini Embedding 2 (public preview, March 2026). One model that handles text, images, audio, video, and PDFs natively. Same vector space for everything. You can embed a photo and a text description and compare them directly with cosine similarity.
That's what made the CLI possible. One model, one API, all input types.
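Comparing a photo's embedding against a text description's boils down to cosine similarity between two vectors in that shared space. A minimal pure-Python sketch (the vectors below are toy stand-ins, not real API output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of an image and a caption.
image_vec = [0.1, 0.3, 0.5]
text_vec = [0.2, 0.25, 0.55]
print(round(cosine_similarity(image_vec, text_vec), 4))
```

Because both inputs land in the same vector space, this one function is all `vemb similar` needs regardless of modality.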
Building it
The whole thing is ~400 lines of Python. Two files: embed.py (core logic) and cli.py (Click commands).
The interesting parts:
Auto-detection: vemb embed guesses the file type from the extension. JPEGs, PNGs, MP3s, WAVs, MP4s, PDFs all work with the same command.
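Extension-based detection needs nothing more than a suffix lookup. A sketch of how that mapping might work (the map and function name are my illustration, not vemb's actual code):

```python
from pathlib import Path

# Hypothetical suffix-to-modality map; the real tool may cover more types.
MODALITIES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video",
    ".pdf": "pdf",
}

def detect_modality(path: str) -> str:
    """Guess the input modality from the file extension; default to text."""
    return MODALITIES.get(Path(path).suffix.lower(), "text")

print(detect_modality("photo.JPG"))  # image
print(detect_modality("notes.txt"))  # text
```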
Batch mode: vemb embed *.jpg --jsonl embeds every file and outputs one JSON object per line.
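JSONL is just one JSON object per line, which makes batch output trivially pipeable into jq or a vector-store loader. A hedged sketch of that output step (`embed_file` here is a fake stand-in for the real API call):

```python
import json

def embed_file(path: str) -> list[float]:
    # Stand-in for the real embedding call; returns a fake vector.
    return [0.0, 0.1, 0.2]

def to_jsonl(paths: list[str]) -> str:
    """Emit one JSON object per input file, one per line."""
    lines = []
    for p in paths:
        record = {"file": p, "values": embed_file(p)}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_jsonl(["a.jpg", "b.jpg"]))
```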
Directory search: vemb search ./photos/ "dark moody sunset" embeds the query, embeds every file in the directory (with caching), and ranks by cosine similarity.
# search a folder of images by text description
vemb search ./photos/ "dark moody sunset" --top 5
0.8234 ./photos/sunset-beach.png
0.7891 ./photos/evening-skyline.png
0.7654 ./photos/golden-hour.png
0.6123 ./photos/cloudy-morning.png
0.5987 ./photos/overcast-street.png
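The ranking step is the same cosine-similarity math applied across a directory. A minimal sketch of the score-and-sort logic, with fake vectors standing in for cached file embeddings (function names are mine, not vemb's):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def rank(query_vec: list[float], file_vecs: dict[str, list[float]], top: int = 5):
    """Score every file against the query and return the top matches."""
    scored = [(cosine(query_vec, v), name) for name, v in file_vecs.items()]
    return sorted(scored, reverse=True)[:top]

# Fake vectors standing in for embedded files in a directory.
files = {
    "sunset-beach.png": [0.9, 0.1],
    "cloudy-morning.png": [0.2, 0.8],
}
for score, name in rank([1.0, 0.0], files):
    print(f"{score:.4f} {name}")
```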
Example use cases
Retrieval experiments: embed a few chunks, check similarity scores, tune the chunking. No notebook needed.
Image search: I keep a folder of reference mockups. vemb search ./mockups/ "login screen" finds the right ones instantly.
Checking if two files are semantically close: vemb similar draft-v1.pdf draft-v2.pdf tells me how much the content actually changed between versions.
Cross-modal search: embed a text query against a folder of images and get ranked results. One model, one vector space means text and images are directly comparable.
Try it
pipx install vemb
export GEMINI_API_KEY=your_key # free at aistudio.google.com/apikey
vemb text "hello world"
Source and docs: github.com/yuvrajangadsingh/vemb
The API key is free and the model has a free tier. If you're building anything with embeddings and you're tired of opening notebooks for a one-line operation, try it.