
GPU-Bridge
GPU-Bridge + LlamaIndex: Embeddings and Reranking in One Line

Most RAG pipelines manage 3-4 separate billing accounts for embeddings, reranking, LLM, and document parsing. GPU-Bridge collapses all of that into one endpoint.

We just shipped two LlamaIndex integrations:

  • llama-index-embeddings-gpubridge — high-throughput text embeddings
  • llama-index-postprocessor-gpubridge-rerank — semantic reranking

Here's how to use them.

Install

pip install llama-index-embeddings-gpubridge
pip install llama-index-postprocessor-gpubridge-rerank

Get an API key at gpubridge.xyz (free to start).

Embeddings

from llama_index.embeddings.gpubridge import GPUBridgeEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Set your embed model
embed_model = GPUBridgeEmbedding(api_key="gpub_...")

# Build an index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
)

# Query
query_engine = index.as_query_engine(embed_model=embed_model)
response = query_engine.query("What is GPU-Bridge?")
print(response)

Cost: ~$0.00002/call. For 100k document chunks, that's $2. Compare to OpenAI at $0.0001/1k tokens: for chunks of roughly 1k tokens each, GPU-Bridge embeddings work out about 5x cheaper.
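As a sanity check on those numbers, here is the arithmetic as a tiny script (prices taken from above; one API call per chunk is an assumption):

```python
EMBED_PRICE_PER_CALL = 0.00002  # ~$ per embedding call, per the pricing above

def embedding_cost(num_chunks: int) -> float:
    """Total cost in dollars, assuming one embedding call per chunk."""
    return num_chunks * EMBED_PRICE_PER_CALL

print(f"100k chunks: ${embedding_cost(100_000):.2f}")  # roughly $2
```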

Reranking

After vector search, reranking significantly improves result quality by reordering candidates by semantic relevance:

from llama_index.embeddings.gpubridge import GPUBridgeEmbedding
from llama_index.postprocessor.gpubridge_rerank import GPUBridgeRerank
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.query_engine import RetrieverQueryEngine

embed_model = GPUBridgeEmbedding(api_key="gpub_...")
reranker = GPUBridgeRerank(api_key="gpub_...", top_n=3)

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
retriever = index.as_retriever(similarity_top_k=10)  # retrieve 10, rerank to 3

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[reranker],
)

response = query_engine.query("Your question here")
print(response)

Reranking typically improves retrieval quality by 15-30% on standard benchmarks by catching semantically relevant results that vector similarity missed.
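Mechanically, a rerank postprocessor just rescores each (query, candidate) pair and keeps the best `top_n`. A toy stand-in for illustration (the scoring function here is a word-overlap placeholder, not the GPU-Bridge model, which uses a learned cross-encoder):

```python
def rerank(query: str, candidates: list[str], score, top_n: int = 3) -> list[str]:
    """Re-order candidates by score(query, candidate) and keep the top_n."""
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_n]

# Placeholder scorer: counts shared words. A real reranker reads the
# query and candidate together and outputs a relevance score.
def overlap(query: str, text: str) -> float:
    return len(set(query.lower().split()) & set(text.lower().split()))

docs = [
    "GPUs accelerate matrix math",
    "GPU pricing is per call",
    "Cats sleep a lot",
]
print(rerank("GPU pricing", docs, overlap, top_n=2))
```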

Full RAG Pipeline (Embed + Rerank + LLM)

from llama_index.embeddings.gpubridge import GPUBridgeEmbedding
from llama_index.postprocessor.gpubridge_rerank import GPUBridgeRerank
from llama_index.llms.openai import OpenAI  # or any LLM
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.query_engine import RetrieverQueryEngine

# Configure GPU-Bridge for embed + rerank, and an LLM for generation
Settings.embed_model = GPUBridgeEmbedding(api_key="gpub_...")
Settings.llm = OpenAI()
reranker = GPUBridgeRerank(api_key="gpub_...", top_n=5)

# Load and index
docs = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(docs)

# Query with reranking
engine = RetrieverQueryEngine(
    retriever=index.as_retriever(similarity_top_k=15),
    node_postprocessors=[reranker],
)

print(engine.query("Explain the architecture"))

x402 Autonomous Payments

For agents that need to pay for their own compute without a pre-registered API key:

# No API key needed — agent pays autonomously via x402
embed_model = GPUBridgeEmbedding()  # will use x402 if wallet configured

The agent receives an HTTP 402 response, pays with USDC on Base L2, and the embedding executes. No human in the approval loop.
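The handshake above can be sketched as a retry loop. Names like `pay` and the `X-Payment` header are illustrative, not the real wire format, which is defined by the x402 protocol:

```python
def call_with_x402(send, pay):
    """send(headers) -> (status, body); pay(invoice) -> payment proof string.

    First attempt goes out unauthenticated. On HTTP 402, the body carries
    payment details; settle them (e.g. USDC on Base), then retry the same
    request with proof of payment attached.
    """
    status, body = send(headers={})
    if status == 402:
        proof = pay(body)
        status, body = send(headers={"X-Payment": proof})
    return status, body
```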

Available Services

GPU-Bridge exposes 30 services via POST https://api.gpubridge.xyz/run:

| Service | Use case | Price |
| --- | --- | --- |
| embedding-l4 | Text embeddings | ~$0.00002/call |
| rerank | Semantic reranking | per call |
| llm-4090 | LLM inference (Llama 70B) | ~$0.06/call |
| pdf-parse | PDF extraction | per call |
| whisper-l4 | Audio transcription | per minute |

Full catalog: curl https://api.gpubridge.xyz/catalog
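Outside LlamaIndex, any service can be hit directly over HTTP. A minimal sketch that assembles the request pieces, assuming a `{"service": ..., "input": ...}` body shape (an assumption for illustration; check the catalog for each service's real schema):

```python
import json

API_URL = "https://api.gpubridge.xyz/run"

def build_run_request(service: str, payload: dict, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a POST /run call."""
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"service": service, "input": payload}),
    }

req = build_run_request("embedding-l4", {"text": "hello world"}, "gpub_...")
# then e.g.: requests.post(req["url"], headers=req["headers"], data=req["body"])
```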

GPU-Bridge is in early access. Sign up at gpubridge.xyz to get an API key.
