GPU-Bridge + LlamaIndex: Embeddings and Reranking in One Line
Most RAG pipelines end up spread across 3-4 separate billing accounts: one each for embeddings, reranking, LLM inference, and document parsing. GPU-Bridge collapses all of that into one endpoint.
We just shipped two LlamaIndex integrations:
- llama-index-embeddings-gpubridge: high-throughput text embeddings
- llama-index-postprocessor-gpubridge-rerank: semantic reranking
Here's how to use them.
Install
pip install llama-index-embeddings-gpubridge
pip install llama-index-postprocessor-gpubridge-rerank
Get an API key at gpubridge.xyz (free to start).
Embeddings
from llama_index.embeddings.gpubridge import GPUBridgeEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Set your embed model
embed_model = GPUBridgeEmbedding(api_key="gpub_...")
# Build an index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
)
# Query
query_engine = index.as_query_engine(embed_model=embed_model)
response = query_engine.query("What is GPU-Bridge?")
print(response)
Cost: ~$0.00002/call. At one call per chunk, 100k document chunks cost about $2. Compare to OpenAI at $0.0001/1k tokens: for chunks of around 1k tokens, GPU-Bridge embeddings are roughly 5x cheaper.
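The arithmetic behind these figures can be checked with a short sketch. It assumes one call per chunk and, for the OpenAI comparison, a fixed number of tokens per chunk; both are illustrative assumptions, not figures published by GPU-Bridge.

```python
# Back-of-the-envelope cost check. Prices are the ones quoted above;
# chunk count and tokens-per-chunk are assumptions for illustration.
GPUBRIDGE_PRICE_PER_CALL = 0.00002    # USD, approximate
OPENAI_PRICE_PER_1K_TOKENS = 0.0001   # USD


def gpubridge_cost(num_chunks: int) -> float:
    """Cost when each chunk is embedded with one call."""
    return num_chunks * GPUBRIDGE_PRICE_PER_CALL


def openai_cost(num_chunks: int, tokens_per_chunk: int) -> float:
    """Cost when billing is per 1k tokens."""
    return num_chunks * tokens_per_chunk / 1000 * OPENAI_PRICE_PER_1K_TOKENS


chunks = 100_000
gb = gpubridge_cost(chunks)
oa = openai_cost(chunks, tokens_per_chunk=1000)
print(f"GPU-Bridge: ${gb:.2f}, OpenAI: ${oa:.2f}, ratio: {oa / gb:.1f}x")
```

The ratio depends on chunk size: under these prices the two come out equal at about 200 tokens per chunk, and the gap widens as chunks get longer.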
Reranking
After vector search, reranking significantly improves result quality by reordering candidates by semantic relevance:
from llama_index.embeddings.gpubridge import GPUBridgeEmbedding
from llama_index.postprocessor.gpubridge_rerank import GPUBridgeRerank
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.query_engine import RetrieverQueryEngine
embed_model = GPUBridgeEmbedding(api_key="gpub_...")
reranker = GPUBridgeRerank(api_key="gpub_...", top_n=3)
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
retriever = index.as_retriever(similarity_top_k=10) # retrieve 10, rerank to 3
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[reranker],
)
response = query_engine.query("Your question here")
print(response)
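Reranking typically improves retrieval quality by 15-30% on standard benchmarks by promoting semantically relevant results that vector similarity ranked too low. It can only reorder the candidates the retriever returned, so keep similarity_top_k well above top_n.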
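To make the two-stage idea concrete, here is a toy sketch of retrieve-then-rerank in plain Python. The `rerank` and `overlap_score` helpers are hypothetical, and the word-overlap score is only a stand-in for a real relevance model; it is not what GPU-Bridge runs.

```python
# Toy illustration of retrieve-then-rerank. The scoring function is a
# stand-in (word overlap), NOT the model GPU-Bridge runs.
from typing import Callable


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int) -> list[str]:
    """Reorder candidates by score(query, text) and keep the best top_n."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Hypothetical relevance score: fraction of query words found in text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


# Pretend these came back from vector search, in similarity order.
retrieved = [
    "GPU pricing overview for cloud providers",
    "How reranking reorders retrieved chunks",
    "Reranking in RAG pipelines",
    "Unrelated note about office hours",
]
top = rerank("reranking retrieved chunks", retrieved, overlap_score, top_n=2)
print(top)
```

The first retrieved item drops out even though vector search ranked it highest, which is the effect `similarity_top_k=10` followed by `top_n=3` relies on.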
Full RAG Pipeline (Embed + Rerank + LLM)
from llama_index.embeddings.gpubridge import GPUBridgeEmbedding
from llama_index.postprocessor.gpubridge_rerank import GPUBridgeRerank
from llama_index.llms.openai import OpenAI # or any LLM
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.query_engine import RetrieverQueryEngine
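# Configure GPU-Bridge for embed + rerank, plus an LLM for answer synthesis
Settings.embed_model = GPUBridgeEmbedding(api_key="gpub_...")
Settings.llm = OpenAI()  # or any LLM; reads OPENAI_API_KEY from the environment
reranker = GPUBridgeRerank(api_key="gpub_...", top_n=5)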
# Load and index
docs = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(docs)
# Query with reranking
engine = RetrieverQueryEngine(
    retriever=index.as_retriever(similarity_top_k=15),
    node_postprocessors=[reranker],
)
print(engine.query("Explain the architecture"))
x402 Autonomous Payments
For agents that need to pay for their own compute without a pre-registered API key:
# No API key needed — agent pays autonomously via x402
embed_model = GPUBridgeEmbedding() # will use x402 if wallet configured
The agent receives an HTTP 402 response, pays with USDC on Base L2, and the embedding executes. No human in the approval loop.
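The request, pay, retry loop described above can be sketched as a small simulation. Everything here is a stand-in: `FakeServer`, `FakeWallet`, `call_with_payment`, the receipt strings, and the `X-PAYMENT` header name are assumptions for illustration. Nothing in it talks to GPU-Bridge or to a real chain, and the actual integration may work differently.

```python
# Simulated x402-style flow: request -> 402 -> pay -> retry.
# The server, wallet, and header name are stand-ins for illustration;
# nothing here talks to GPU-Bridge or a real chain.
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: dict


@dataclass
class FakeServer:
    """Hypothetical server that demands payment before serving."""
    price_usdc: float = 0.00002
    paid_receipts: set = field(default_factory=set)

    def handle(self, headers: dict) -> Response:
        receipt = headers.get("X-PAYMENT")  # header name is an assumption
        if receipt in self.paid_receipts:
            return Response(200, {"embedding": [0.1, 0.2, 0.3]})
        return Response(402, {"amount": self.price_usdc, "asset": "USDC"})


@dataclass
class FakeWallet:
    """Hypothetical wallet; a real one would sign an on-chain transfer."""
    balance_usdc: float

    def pay(self, server: FakeServer, amount: float) -> str:
        if amount > self.balance_usdc:
            raise RuntimeError("insufficient funds")
        self.balance_usdc -= amount
        receipt = f"receipt-{len(server.paid_receipts) + 1}"
        server.paid_receipts.add(receipt)
        return receipt


def call_with_payment(server: FakeServer, wallet: FakeWallet) -> Response:
    """Try once; on 402, pay the quoted amount and retry with the receipt."""
    first = server.handle({})
    if first.status != 402:
        return first
    receipt = wallet.pay(server, first.body["amount"])
    return server.handle({"X-PAYMENT": receipt})


server, wallet = FakeServer(), FakeWallet(balance_usdc=1.0)
result = call_with_payment(server, wallet)
print(result.status, wallet.balance_usdc)
```

The point of the pattern is that the price is quoted in the 402 response itself, so the agent needs only a funded wallet and a spending limit rather than an account set up in advance.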
Available Services
GPU-Bridge exposes 30 services via POST https://api.gpubridge.xyz/run:
| Service | Use case | Price |
|---|---|---|
| embedding-l4 | Text embeddings | ~$0.00002/call |
| rerank | Semantic reranking | per call |
| llm-4090 | LLM inference (Llama 70B) | ~$0.06/call |
| pdf-parse | PDF extraction | per call |
| whisper-l4 | Audio transcription | per minute |
Full catalog: curl https://api.gpubridge.xyz/catalog
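For scripts that would rather not shell out to curl, a minimal Python equivalent might look like the sketch below. The `fetch_catalog` helper and its `opener` parameter are hypothetical names introduced here. The response format is not described in this post, so the body is returned as raw text instead of being parsed.

```python
# Python equivalent of the curl command above. The response format is not
# documented here, so the body is returned as raw text rather than parsed.
from urllib.request import urlopen

CATALOG_URL = "https://api.gpubridge.xyz/catalog"


def fetch_catalog(url: str = CATALOG_URL, opener=urlopen,
                  timeout: float = 10.0) -> str:
    """GET the catalog and return the response body as text.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib's urlopen.
    """
    with opener(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_catalog()` with no arguments hits the live endpoint; pass a different `opener` to stub it out in tests.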
GitHub
- Embeddings: github.com/fjnunezp75/llama-index-embeddings-gpubridge
- Reranker: github.com/fjnunezp75/llama-index-postprocessor-gpubridge-rerank
- PR to LlamaIndex: #21014
GPU-Bridge is in early access. Sign up at gpubridge.xyz to get an API key.