Originally published at kalyna.pro
Pinecone is a managed vector database built for production-scale similarity search. Unlike self-hosted alternatives, it handles infrastructure, replication, and scaling automatically. In this tutorial you will create a Pinecone index from scratch, generate embeddings with Sentence Transformers, upsert vectors with metadata, run semantic queries, and wire up a full RAG pipeline with the Claude API.
Common use cases: semantic search, RAG, recommendation systems, and duplicate detection.
Prerequisites
- Python 3.8+ installed
- A free Pinecone account at pinecone.io — free tier includes 2 GB storage and one serverless index
- Your Pinecone API key from the Pinecone console
Install the required packages:
pip install pinecone sentence-transformers
Create Your First Index
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
# Create a serverless index (AWS us-east-1 is available on the free tier)
pc.create_index(
    name="demo",
    dimension=384,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
# Connect to the index
index = pc.Index("demo")
print(index.describe_index_stats())
Guard against re-creating an existing index:
existing = [i.name for i in pc.list_indexes()]
if "demo" not in existing:
    pc.create_index(
        name="demo",
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("demo")
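A newly created index may need a moment before it accepts reads and writes. The following is a minimal sketch of a readiness poll, assuming the client exposes describe_index(name).status["ready"] as a boolean, as in Pinecone's quickstart examples. The names wait_until_ready, timeout_s, poll_s, and the injectable sleep parameter are hypothetical and introduced here only for illustration.

```python
import time


def wait_until_ready(pc, name, timeout_s=60.0, poll_s=1.0, sleep=time.sleep):
    """Poll the index description until it reports ready or the timeout passes.

    Assumes pc.describe_index(name).status["ready"] is a boolean.
    Returns True if the index became ready, False on timeout.
    """
    waited = 0.0
    while waited <= timeout_s:
        if pc.describe_index(name).status["ready"]:
            return True
        sleep(poll_s)  # injectable so the loop can be exercised without waiting
        waited += poll_s
    return False
```

With a real client this could be called as wait_until_ready(pc, "demo") right after pc.create_index(...), before the first upsert.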
Generate Embeddings
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = [
    "Python is a high-level programming language.",
    "Machine learning models require training data.",
    "Vector databases store embeddings for similarity search.",
    "LLMs generate text by predicting the next token.",
    "Pinecone is a managed vector database for production use.",
]
embeddings = model.encode(texts)
print(f"Shape: {embeddings.shape}") # (5, 384)
model.encode() returns a NumPy array of shape (n_texts, 384). Call .tolist() on each row before upserting into Pinecone.
Upsert Vectors
vectors = [
    {
        "id": f"doc{i}",
        "values": embeddings[i].tolist(),
        "metadata": {"text": texts[i], "source": "docs"},
    }
    for i in range(len(texts))
]
index.upsert(vectors=vectors)
print(index.describe_index_stats())
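Because the index was created with dimension=384, every vector sent to it has to carry exactly 384 values. One way to catch a mismatch before it reaches the API is to validate records while building them. This is a sketch only: build_records and its dim, source, and prefix parameters are hypothetical helpers, not part of the Pinecone SDK.

```python
def build_records(texts, embeddings, dim=384, source="docs", prefix="doc"):
    """Pair each text with its embedding and validate the vector length."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    records = []
    for i, (text, emb) in enumerate(zip(texts, embeddings)):
        # Converting element by element works for NumPy rows and plain lists.
        values = [float(x) for x in emb]
        if len(values) != dim:
            raise ValueError(f"{prefix}{i}: expected {dim} values, got {len(values)}")
        records.append(
            {
                "id": f"{prefix}{i}",
                "values": values,
                "metadata": {"text": text, "source": source},
            }
        )
    return records
```

The result has the same shape as the vectors list above, so it could be passed to index.upsert(vectors=build_records(texts, embeddings)).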
For large datasets, batch in chunks of 100:
def upsert_in_batches(index, vectors, batch_size=100):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i : i + batch_size]
        index.upsert(vectors=batch)
        print(f" Upserted {min(i + batch_size, len(vectors))}/{len(vectors)}")
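The slicing approach needs the whole list in memory. When records come from a generator (for example, documents embedded on the fly), a lazy chunker may be a better fit. A minimal sketch using only the standard library; chunked is a hypothetical helper name:

```python
from itertools import islice


def chunked(iterable, size=100):
    """Yield successive lists of up to `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return  # source exhausted
        yield batch
```

Each yielded batch could then be sent with index.upsert(vectors=batch), so only one batch of records is held in memory at a time.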
Query Vectors
query_text = "What is a vector database?"
query_emb = model.encode([query_text])[0].tolist()
results = index.query(
    vector=query_emb,
    top_k=3,
    include_metadata=True,
)
for match in results["matches"]:
    print(f"Score: {match['score']:.4f} | {match['metadata']['text']}")
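To build intuition for what the scores mean, here is a local brute-force version of a cosine top-k search in plain Python. It is a conceptual model only and not how Pinecone is implemented; cosine and brute_force_top_k are hypothetical names, and the records are assumed to have the same id/values/metadata shape used for upserts.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, records, top_k=3):
    """Score every record against the query and keep the best top_k."""
    scored = [
        {"id": r["id"], "score": cosine(query, r["values"]), "metadata": r["metadata"]}
        for r in records
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:top_k]
```

Scaling a vector does not change its cosine score, which is why two texts of very different length can still rank as near neighbours.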
Filter by metadata at query time:
results = index.query(
    vector=query_emb,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "docs"}},
)
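Filters can combine several conditions with operators such as $and, $or, $in, and $gte. The sketch below evaluates a filter dict against a metadata dict locally, purely to illustrate how such conditions compose. It is a simplification and not Pinecone's implementation: records missing a filtered field are treated as non-matching here, and the year field is a hypothetical metadata key that the examples above do not store.

```python
import operator

# Comparison operators from the filter language, mapped to Python
# equivalents for local illustration only.
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, flt):
    """Return True if a metadata dict satisfies a filter dict (local sketch)."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            if key not in metadata:
                return False  # simplification: a missing field never matches
            value = metadata[key]
            if not all(_OPS[op](value, arg) for op, arg in cond.items()):
                return False
    return True


# Hypothetical filter: records from the "docs" source dated 2024 or later.
flt = {"$and": [{"source": {"$eq": "docs"}}, {"year": {"$gte": 2024}}]}
```

The same dict could be passed as filter=flt to index.query(), provided the year field was stored as numeric metadata at upsert time.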
Full Semantic Search Example
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer
PINECONE_API_KEY = "YOUR_PINECONE_API_KEY"
INDEX_NAME = "semantic-search-demo"
articles = [
    "Pinecone is a managed vector database optimised for high-speed similarity search.",
    "ChromaDB is an open-source vector store that runs locally without any API key.",
    "FAISS is a Facebook AI library for efficient exact and approximate nearest neighbour search.",
    "Sentence Transformers convert text to dense embedding vectors for semantic similarity.",
    "RAG combines retrieval with generation to ground LLM answers in real documents.",
    "Python is the dominant language for machine learning and AI development in 2026.",
    "Claude is Anthropic's family of AI assistants based on Constitutional AI training.",
    "LangChain provides tools for composing LLM pipelines using a pipe-operator syntax.",
    "LlamaIndex specialises in document ingestion and advanced retrieval for RAG systems.",
    "Cosine similarity measures the angle between two vectors, ignoring their magnitude.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(articles)
pc = Pinecone(api_key=PINECONE_API_KEY)
existing = [i.name for i in pc.list_indexes()]
if INDEX_NAME not in existing:
    pc.create_index(
        name=INDEX_NAME,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(INDEX_NAME)
vectors = [
    {"id": f"article{i}", "values": embeddings[i].tolist(), "metadata": {"text": articles[i]}}
    for i in range(len(articles))
]
index.upsert(vectors=vectors)
print(f"Indexed {len(articles)} articles\n")
queries = [
    "Which vector database works offline?",
    "How does semantic search differ from keyword search?",
    "What tool helps build RAG pipelines?",
]
for query in queries:
    qvec = model.encode([query])[0].tolist()
    results = index.query(vector=qvec, top_k=2, include_metadata=True)
    print(f"Query: {query}")
    for r in results["matches"]:
        print(f" [{r['score']:.3f}] {r['metadata']['text']}")
    print()
Pinecone + RAG with Claude
import anthropic
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("semantic-search-demo")
model = SentenceTransformer("all-MiniLM-L6-v2")
claude = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from env
def rag_query(question: str, top_k: int = 3) -> str:
    qvec = model.encode([question])[0].tolist()
    results = index.query(vector=qvec, top_k=top_k, include_metadata=True)
    context = "\n".join(r["metadata"]["text"] for r in results["matches"])
    prompt = (
        f"Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    message = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
print(rag_query("What is the difference between Pinecone and ChromaDB?"))
Install the Claude SDK: pip install anthropic. For a deeper RAG walkthrough, see the RAG Tutorial with Python.
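The pipeline above passes every retrieved match to the model, even weak ones. One possible refinement is to drop low-scoring matches and number the remaining passages so the answer can cite them. The sketch below is a pure function over the matches list; build_prompt is a hypothetical helper, and the min_score default of 0.3 is an arbitrary placeholder that would need tuning for a given embedding model and dataset.

```python
def build_prompt(question, matches, min_score=0.3):
    """Build a grounded prompt from query matches above a score threshold.

    Returns None when no match clears the threshold, so the caller can
    skip the LLM call and report that nothing relevant was found.
    """
    kept = [m for m in matches if m["score"] >= min_score]
    if not kept:
        return None
    context = "\n".join(
        f"[{n}] {m['metadata']['text']}" for n, m in enumerate(kept, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Inside rag_query, the inline prompt construction could be swapped for build_prompt(question, results["matches"]), returning a fallback message when the result is None.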
Pinecone vs ChromaDB
| | Pinecone | ChromaDB |
|---|---|---|
| Type | Managed cloud service | Open source, self-hosted |
| Scale | Billions of vectors | Tens of millions (single node) |
| Cost | Free tier + paid plans from $70/mo | Free to self-host |
| Setup | API key, no infra | pip install, runs in-process |
| Best for | Production SaaS, large scale | Local dev, prototypes, privacy |
- Pinecone — managed, scalable, SLA-backed. No ops required.
- ChromaDB — open source, local, zero cost. Full data control.
- Both support metadata filtering and cosine/dot/euclidean metrics.
See the ChromaDB Tutorial and Vector Databases Comparison (2026) for a full breakdown.
Pinecone Pricing
- Free (Starter) — 2 GB storage, 1 serverless index, unlimited queries within the free allocation. No credit card required.
- Standard — from $70/month — multiple indexes, higher storage, dedicated support SLA.
- Enterprise — $100+/month (custom) — private clusters, VPC peering, SSO, custom SLAs.
Costs are measured in read units (RU) and write units (WU). Check the Pinecone pricing page for current numbers.
Summary
- Create a Pinecone index with pc.create_index(), specifying a dimension and metric that match your embedding model.
- Use all-MiniLM-L6-v2 from Sentence Transformers for free, accurate local embeddings (384 dimensions).
- Upsert vectors with metadata using index.upsert(vectors=[...]) and batch in chunks of 100.
- Query by natural language with index.query(vector=..., top_k=5, include_metadata=True).
- Add a filter dict to restrict results by metadata fields at query time.
- Combine Pinecone retrieval with claude-sonnet-4-6 for a production RAG pipeline.
- Choose Pinecone for managed, scalable production; choose ChromaDB for local dev and cost-sensitive projects.