Wanda

Posted on • Originally published at apidog.com
What is Gemini Embedding 2?

Google’s Gemini Embedding 2 is a powerful tool for developers working with text, images, video, audio, and documents. It unifies all these content types in a single embedding space, streamlining the process of building multimodal AI applications. Released in March 2026, Gemini Embedding 2 is Google’s first model to natively process multiple media types without separate pipelines.


If you're implementing semantic search, RAG systems, or testing APIs that handle different types of media, this model can simplify your stack and boost both coverage and accuracy.

What Makes Gemini Embedding 2 Different?

Traditional embedding models are siloed: one model for text, another for images, and so on. Gemini Embedding 2 breaks this pattern by mapping every supported format into a single shared embedding space.

[Image: the Gemini Embedding 2 unified multimodal embedding space]

Supported input types per request:

  • Text: Up to 8,192 tokens
  • Images: Up to 6 images
  • Video: Up to 128 seconds
  • Audio: Up to 80 seconds
  • PDF Documents: Up to 6 pages

This means you can search across all these formats with a single query—ask a question in text and retrieve the most relevant videos, images, or documents.
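Because every media type lands in the same vector space, cross-format search reduces to one similarity pass over one index. A minimal sketch of that idea, with hand-made stand-in vectors in place of real API output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stand-in vectors (assumption for illustration); in practice each would
# come from the embedding API. Because all media types share one space,
# a video, an image, and a PDF are directly comparable.
index = {
    "faucet_repair.mp4":  [0.9, 0.1, 0.0],  # video
    "faucet_diagram.png": [0.7, 0.3, 0.1],  # image
    "tax_guide.pdf":      [0.0, 0.2, 0.9],  # unrelated document
}

query_vec = [0.8, 0.2, 0.0]  # embedding of a text query

# Rank every item, regardless of format, against the text query.
ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]),
                reverse=True)
```

With real embeddings the index would hold thousands of vectors in a vector database rather than a dict, but the ranking logic is the same.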

Key Features You Need to Know

1. Interleaved Multimodal Input

Mix content types in a single request. For example, send an image and text together, or combine video and audio. The model understands how these elements relate, so if your data is inherently multimodal (like product listings with images, descriptions, and demos), you get a unified embedding that captures all relationships.

2. Matryoshka Representation Learning (MRL)

Gemini Embedding 2 outputs 3,072-dimensional vectors by default, but you can truncate down to as low as 768 dimensions with minimal accuracy loss. This is efficient for storage and retrieval:

  • Full (3,072): Maximum quality
  • Medium (1,536): Balance
  • Compact (768): ~75% less storage, near-peak quality

Use higher dimensions during development or for critical tasks, then drop to 768 for production to optimize storage and costs.
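MRL truncation is simple on the client side: keep the leading dimensions and re-normalize. A sketch, using a dummy full-size vector in place of real model output:

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` MRL dimensions and re-normalize to unit
    length so cosine similarity still behaves as expected."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.01] * 3072                  # stand-in for a full 3,072-dim embedding
compact = truncate_embedding(full, 768)
```

If the API's output-dimension parameter is available, prefer requesting 768 directly and skip client-side truncation.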

3. Custom Task Instructions

Specify your task with parameters:

  • RETRIEVAL_QUERY – for search queries
  • RETRIEVAL_DOCUMENT – for indexing documents
  • SEMANTIC_SIMILARITY – compare content
  • CLASSIFICATION – for categorization

The model adjusts embeddings based on your use case, improving results without retraining.
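In retrieval, the point of task instructions is asymmetry: documents are indexed with RETRIEVAL_DOCUMENT while queries are embedded with RETRIEVAL_QUERY. A sketch of that pattern, where `embed_fn` is a stand-in for the real API call (here a toy letter-counting function so the example runs offline):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class AsymmetricIndex:
    """Documents embedded with RETRIEVAL_DOCUMENT, queries with
    RETRIEVAL_QUERY, so the model can shape each side for its role.
    embed_fn(content, task_type) stands in for the real API call."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.docs = []

    def add(self, doc):
        self.docs.append((doc, self.embed_fn(doc, "RETRIEVAL_DOCUMENT")))

    def search(self, query):
        q = self.embed_fn(query, "RETRIEVAL_QUERY")
        return max(self.docs, key=lambda item: dot(q, item[1]))[0]

# Toy embed_fn: counts of the letters a/b/c, ignoring task_type.
fake_embed = lambda text, task: [text.count(ch) for ch in "abc"]

idx = AsymmetricIndex(fake_embed)
idx.add("abba")
idx.add("ccc")
```

Swapping `fake_embed` for a real embedding call is the only change needed to make this production-shaped.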

4. Native Audio Processing

Gemini Embedding 2 processes audio directly, capturing tone and context that transcription-based models miss.

Technical Specifications

Text

  • 8,192 tokens per request
  • 100+ languages
  • Handles code and long documents

Images

  • Up to 6 per request
  • PNG, JPEG formats

Video

  • Up to 128 seconds
  • MP4, MOV (H264, H265, AV1, VP9)

Audio

  • Up to 80 seconds
  • MP3, WAV
  • No transcription required

PDF Documents

  • Up to 6 pages per request
  • Handles both text and visuals
  • Built-in OCR

Real-World Use Cases

Semantic Search Across Media Types

Build search engines that return relevant content in any format. Example: A query for “how to fix a leaky faucet” returns:

  • Tutorial videos
  • Step-by-step text articles
  • Diagram images
  • Audio instructions

All ranked for relevance in a single query.

RAG Systems with Multimodal Context

Augment your LLM with context from diverse sources:

  • Product descriptions (text)
  • User manual pages (PDF)
  • Demo videos
  • Customer review audio

Embeddings enable retrieval of the most relevant context, regardless of format.
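The retrieval step of such a RAG pipeline can be sketched in a few lines: score the mixed-format corpus against the query vector, take the top results, and assemble the prompt. The embeddings below are stand-ins for real API output:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, corpus, k=2):
    """corpus: list of (snippet, embedding) pairs covering any media type
    (PDF page, video transcript, audio clip...); returns the k best."""
    ranked = sorted(corpus, key=lambda item: dot(query_vec, item[1]),
                    reverse=True)
    return [snippet for snippet, _ in ranked[:k]]

def build_prompt(question, context_snippets):
    return ("Context:\n" + "\n".join(context_snippets)
            + "\n\nQuestion: " + question)

# Stand-in embeddings; real vectors would come from the embedding API.
corpus = [
    ("Manual p.3: reset procedure", [0.9, 0.1]),
    ("Demo video: unboxing",        [0.1, 0.9]),
    ("Review audio: battery life",  [0.5, 0.5]),
]
context = top_k([1.0, 0.0], corpus, k=2)
prompt = build_prompt("How do I reset the device?", context)
```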

API Testing with Semantic Similarity

With Apidog, use Gemini embeddings to semantically test API responses. Instead of string matching, compare embeddings for meaning—catching cases where the response wording changes but the intent is preserved. Useful for LLM-powered or natural language APIs.

[Image: semantic API testing workflow]

You can also enhance API documentation search—let developers find endpoints by describing what they want, not by memorizing parameter names.
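The core of a semantic test assertion is a similarity check with a tolerance, rather than exact string equality. A minimal sketch; the 0.90 threshold is an assumption to tune per endpoint, not an official recommendation, and the vectors would come from embedding the expected and actual responses:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assert_semantically_similar(expected_vec, actual_vec, threshold=0.90):
    """Passes when two response embeddings are close enough in meaning,
    even if the wording differs; raises AssertionError otherwise."""
    score = cosine(expected_vec, actual_vec)
    if score < threshold:
        raise AssertionError(f"responses diverge semantically ({score:.2f})")
    return score
```

This catches the case where an LLM-powered endpoint rephrases its answer without changing the intent, which plain string matching would flag as a failure.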

Content Clustering and Organization

Automatically group related content across formats. Product photos, descriptions, and videos cluster by category.
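A simple way to sketch this is nearest-centroid assignment over the shared space, with stand-in 2-D embeddings in place of real API output:

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_clusters(items, centroids):
    """items / centroids: dicts of name -> embedding. Each item (photo,
    description, video clip...) goes to the nearest centroid, so related
    content groups together regardless of media type."""
    groups = {label: [] for label in centroids}
    for name, vec in items.items():
        label = min(centroids, key=lambda c: distance(vec, centroids[c]))
        groups[label].append(name)
    return groups

# Stand-in embeddings; real vectors would come from the embedding API.
items = {
    "mug_photo.png": [0.9, 0.1],
    "mug_blurb.txt": [0.8, 0.2],
    "lamp_demo.mp4": [0.1, 0.9],
}
centroids = {"mugs": [1.0, 0.0], "lamps": [0.0, 1.0]}
groups = assign_clusters(items, centroids)
```

In practice you would learn the centroids with a clustering algorithm such as k-means rather than hand-pick them.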

Sentiment Analysis Across Channels

Aggregate feedback from:

  • Text reviews
  • Video testimonials
  • Audio support calls
  • Social media images

Get unified sentiment insights across all formats.

Performance and Benchmarks

Gemini Embedding 2 outperforms leading models in text, image, and video benchmarks, with strong speech capabilities and advanced multimodal relationship handling. It sets a new standard for depth and flexibility in embedding use cases.

Pricing

  • Text embeddings: $0.20 per million tokens (50% off with batch API)
  • Image, audio, video: Standard Gemini API media token rates

For most RAG or search systems, embedding thousands of documents costs just a few dollars.

Gemini Embedding 2 vs. Competitors

| Feature | Gemini Embedding 2 | OpenAI text-embedding-3 | Cohere Embed v3 |
| --- | --- | --- | --- |
| Modalities | Text, image, video, audio, PDF | Text only | Text only |
| Max input | 8,192 tokens (text) | 8,191 tokens | 512 tokens |
| Dimensions | 128–3,072 (flexible) | 256–3,072 | 1,024 |
| Languages | 100+ | 100+ | 100+ |
| Task instructions | Yes | No | Yes |
| Pricing | $0.20/M tokens | $0.13/M tokens | $0.10/M tokens |
| Best for | Multimodal apps | Text-only apps | Text classification |

The main differentiator is multimodal support. If you need embeddings for more than text, Gemini is the only unified solution.

Integration and Availability

Gemini Embedding 2 (gemini-embedding-2-preview) is available via:

  • Gemini API
  • Vertex AI
  • LangChain
  • LlamaIndex
  • Haystack
  • Weaviate
  • Qdrant
  • ChromaDB
  • Vector Search

Most vector DBs and AI frameworks already support it. Note: The API is in public preview—expect possible changes before general release.

Important Migration Note

The embedding spaces of gemini-embedding-001 and Gemini Embedding 2 are incompatible. Mixing old and new embeddings in the same database won’t work. If you migrate, re-embed your entire dataset.
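A migration is mechanically simple: walk the dataset in batches, generate a fresh vector for every record, and write to a new collection. A sketch, where `embed_fn` stands in for a call to the new model (here a toy function so the example runs offline):

```python
def reembed_dataset(records, embed_fn, batch_size=100):
    """Rebuild every vector with the new model. Old gemini-embedding-001
    vectors are discarded, not mixed in, because the two embedding
    spaces are incompatible."""
    migrated = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        new_vecs = [embed_fn(rec["content"]) for rec in batch]
        for rec, vec in zip(batch, new_vecs):
            migrated.append({"id": rec["id"], "content": rec["content"],
                             "embedding": vec})
    return migrated

# Toy data and embed_fn (assumptions for illustration).
records = [{"id": i, "content": f"doc {i}", "embedding": [0.0]}
           for i in range(5)]
migrated = reembed_dataset(records,
                           embed_fn=lambda text: [float(len(text))],
                           batch_size=2)
```

Write the new vectors to a separate index or collection and cut traffic over atomically, so queries never hit a mix of old and new spaces.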

Output Dimensions: What to Choose

  • 3,072: Highest quality, largest storage
  • 1,536: Good balance
  • 768: Production sweet spot (near-peak quality, 75% smaller)

Most apps should use 768 dimensions to balance quality and storage.

When to Use Gemini Embedding 2

Choose Gemini Embedding 2 if:

  • You have multimodal data (text, images, video, audio)
  • You need semantic search across formats
  • Building RAG with diverse sources
  • Clustering or classifying mixed-media content
  • You want embeddings that capture relationships between modalities

Stick with text-only models if:

  • Your data is only text
  • You need the absolute best text-only performance
  • You can’t re-generate existing embeddings

What This Means for Developers

Gemini Embedding 2 makes multimodal AI simpler:

  • One model for all content types
  • One embedding space, one vector DB
  • Streamlined search and retrieval logic

Matryoshka means you can tune embedding size to your needs. Task instructions let you adapt embeddings without custom training.

Getting Started

  1. Get a Gemini API key from Google AI Studio.
  2. Install the Google Generative AI SDK.
  3. Call the embedding endpoint with your content.
  4. Store embeddings in your vector database.
  5. Use for search, RAG, or classification.

Example (Python). This is a hedged sketch: the call shape follows the Google Generative AI SDK's embed_content function, the preview model name comes from this article, and the exact multimodal input format may change before general release:

# Sketch only: model name is the article's preview identifier and the
# multimodal content format may differ in the final API.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

result = genai.embed_content(
    model="models/gemini-embedding-2-preview",
    content="Here is my text.",
    task_type="RETRIEVAL_QUERY",
    output_dimensionality=768,
)
embedding = result["embedding"]  # 768 floats

Adjust task_type and output_dimensionality as needed.

The Bottom Line

Gemini Embedding 2 unifies multimodal AI development—covering text, images, video, audio, and documents in one space. Matryoshka dimensions give you flexibility; task instructions increase task-specific accuracy. Native audio support preserves context that other models lose.

If you're building apps that span multiple content types, this model is worth testing. Public preview is live via Gemini API and Vertex AI.

For semantic search, RAG, or content understanding, Gemini Embedding 2 reduces code complexity and increases coverage. And if you're testing APIs with Apidog, use these embeddings for validating semantic similarity—especially for LLM-powered endpoints.
