Serhii Kalyna

Posted on May 25 • Originally published at kalyna.pro

Pinecone Tutorial: Getting Started with Vector Search in Python (2026)

#python #ai #vectordatabase #tutorial


Pinecone is a managed vector database built for production-scale similarity search. Unlike self-hosted alternatives, it handles infrastructure, replication, and scaling automatically. In this tutorial you will create a Pinecone index from scratch, generate embeddings with Sentence Transformers, upsert vectors with metadata, run semantic queries, and wire up a full RAG pipeline with the Claude API.

Common use cases: semantic search, RAG, recommendation systems, and duplicate detection.

Prerequisites

Python 3.9+ installed (recent releases of the pinecone and sentence-transformers packages no longer support 3.8)
A free Pinecone account at pinecone.io — free tier includes 2 GB storage and one serverless index
Your Pinecone API key from the Pinecone console

Install the required packages:

pip install pinecone sentence-transformers
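The snippets in this tutorial paste the API key straight into the source, which is easy to leak through version control. As a minimal sketch, assuming you export the key in an environment variable named PINECONE_API_KEY (the helper name load_api_key is hypothetical, not part of the Pinecone SDK), you could read it like this:

```python
import os


def load_api_key(var_name: str = "PINECONE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name your deployment already exports.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a readable message instead of a later auth error
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

You would then write Pinecone(api_key=load_api_key()) wherever the tutorial shows the placeholder string.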

Create Your First Index

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create a serverless index (AWS us-east-1 is available on the free tier)
pc.create_index(
    name="demo",
    dimension=384,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Connect to the index
index = pc.Index("demo")
print(index.describe_index_stats())

create_index() raises an error if an index with that name already exists, so guard the call in any script that may run more than once:

existing = [i.name for i in pc.list_indexes()]
if "demo" not in existing:
    pc.create_index(
        name="demo",
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("demo")
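The guard above only checks the name. If an older index named "demo" was created with a different dimension, upserts of 384-dimensional vectors would fail later. The sketch below wraps the guard in a function and adds that check. ensure_index is a hypothetical helper, and it assumes the object returned by pc.describe_index() exposes a dimension attribute:

```python
def ensure_index(pc, name, dimension, spec, metric="cosine"):
    """Create the index if it is missing, otherwise verify its dimension.

    `pc` is a Pinecone client and `spec` a ServerlessSpec, both built by
    the caller exactly as in the snippets above.
    """
    existing = [i.name for i in pc.list_indexes()]
    if name not in existing:
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    else:
        # Assumption: describe_index() returns an object with `.dimension`
        found = pc.describe_index(name).dimension
        if found != dimension:
            raise ValueError(
                f"Index {name!r} has dimension {found}, expected {dimension}"
            )
    return pc.Index(name)
```

A call might look like index = ensure_index(pc, "demo", 384, ServerlessSpec(cloud="aws", region="us-east-1")).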

Generate Embeddings

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "Python is a high-level programming language.",
    "Machine learning models require training data.",
    "Vector databases store embeddings for similarity search.",
    "LLMs generate text by predicting the next token.",
    "Pinecone is a managed vector database for production use.",
]

embeddings = model.encode(texts)
print(f"Shape: {embeddings.shape}")  # (5, 384)

model.encode() returns a NumPy array of shape (n_texts, 384). Call .tolist() on each row before upserting into Pinecone.
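The index was created with metric="cosine", so it helps to see what that metric computes. The following is a plain-Python sketch for illustration only (Pinecone computes scores server-side, and cosine_similarity is a name chosen here, not a library function):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Dividing by both norms removes the effect of vector length
    return dot / (norm_a * norm_b)
```

Calling cosine_similarity(embeddings[0].tolist(), embeddings[2].tolist()) on the arrays above should give a value close to the score Pinecone reports for the same pair, assuming both sides use the same vectors.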

Upsert Vectors

vectors = [
    {
        "id": f"doc{i}",
        "values": embeddings[i].tolist(),
        "metadata": {"text": texts[i], "source": "docs"},
    }
    for i in range(len(texts))
]

index.upsert(vectors=vectors)
# Stats can lag for a few seconds after a write: serverless indexes are eventually consistent
print(index.describe_index_stats())
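Upserting a vector whose ID already exists overwrites the stored record, so positional IDs such as doc0 and doc1 can silently replace unrelated documents if the list order changes between runs. One possible alternative, sketched here with hypothetical helpers stable_id and build_records, is to derive the ID from the text itself:

```python
import hashlib


def stable_id(text: str, prefix: str = "doc") -> str:
    """Derive a deterministic ID from the text content."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}"


def build_records(texts, vectors, source="docs"):
    """Pair each text with its vector in the dict shape used for upserts.

    `vectors` should be plain lists of floats, e.g. embeddings.tolist().
    """
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {
            "id": stable_id(text),
            "values": list(vector),
            "metadata": {"text": text, "source": source},
        }
        for text, vector in zip(texts, vectors)
    ]
```

With this approach, index.upsert(vectors=build_records(texts, embeddings.tolist())) is safe to re-run: identical text maps to the same ID and simply refreshes its own record.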

For large datasets, split the upsert into batches of 100 vectors to keep each request small:

def upsert_in_batches(index, vectors, batch_size=100):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i : i + batch_size]
        index.upsert(vectors=batch)
        print(f"  Upserted {min(i + batch_size, len(vectors))}/{len(vectors)}")
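The function above slices a list, which requires every vector to be in memory first. If the records come from a generator (for example, while embedding a large file piece by piece), a lazy variant may fit better. This is a sketch using only the standard library; chunks is a name chosen for this example:

```python
import itertools


def chunks(iterable, batch_size=100):
    """Yield successive lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(iterable)
    while True:
        # islice pulls at most batch_size items without loading the rest
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Usage would be a loop such as: for batch in chunks(vectors, 100): index.upsert(vectors=batch).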

Query Vectors

query_text = "What is a vector database?"
query_emb = model.encode([query_text])[0].tolist()

results = index.query(
    vector=query_emb,
    top_k=3,
    include_metadata=True,
)

for match in results["matches"]:
    print(f"Score: {match['score']:.4f}  |  {match['metadata']['text']}")
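A query returns up to top_k matches even when none of them is a good fit, so weak results can slip through. A simple way to handle this is a score cutoff. The sketch below assumes matches support dict-style access as in the loop above; the 0.5 default is an arbitrary starting point for illustration, not a recommended value:

```python
def filter_matches(matches, min_score=0.5):
    """Return (score, text) pairs for matches at or above min_score."""
    kept = []
    for match in matches:
        if match["score"] >= min_score:
            kept.append((match["score"], match["metadata"]["text"]))
    return kept
```

For example, filter_matches(results["matches"], min_score=0.4) keeps only the stronger hits. A useful threshold depends on your embedding model and data, so tune it against real queries.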

Filter by metadata at query time:

results = index.query(
    vector=query_emb,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "docs"}},
)
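Filters can also combine several conditions with operators such as $and, $gte, and $in. The sketch below builds such a filter from optional arguments. The year and tags metadata fields are hypothetical (the tutorial's records only carry text and source), and build_filter is a helper invented for this example:

```python
def build_filter(source=None, min_year=None, tags=None):
    """Build a Pinecone-style metadata filter dict from optional conditions.

    Returns None when no condition is given, which means "no filter".
    """
    clauses = []
    if source is not None:
        clauses.append({"source": {"$eq": source}})
    if min_year is not None:
        # Hypothetical numeric metadata field
        clauses.append({"year": {"$gte": min_year}})
    if tags:
        # Hypothetical metadata field; matches any of the listed values
        clauses.append({"tags": {"$in": list(tags)}})
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}
```

The result is passed as filter=build_filter(source="docs", min_year=2024) in index.query().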

Full Semantic Search Example

from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

PINECONE_API_KEY = "YOUR_PINECONE_API_KEY"
INDEX_NAME = "semantic-search-demo"

articles = [
    "Pinecone is a managed vector database optimised for high-speed similarity search.",
    "ChromaDB is an open-source vector store that runs locally without any API key.",
    "FAISS is a Facebook AI library for efficient exact and approximate nearest neighbour search.",
    "Sentence Transformers convert text to dense embedding vectors for semantic similarity.",
    "RAG combines retrieval with generation to ground LLM answers in real documents.",
    "Python is the dominant language for machine learning and AI development in 2026.",
    "Claude is Anthropic's family of AI assistants based on Constitutional AI training.",
    "LangChain provides tools for composing LLM pipelines using a pipe-operator syntax.",
    "LlamaIndex specialises in document ingestion and advanced retrieval for RAG systems.",
    "Cosine similarity measures the angle between two vectors, ignoring their magnitude.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(articles)

pc = Pinecone(api_key=PINECONE_API_KEY)
existing = [i.name for i in pc.list_indexes()]
if INDEX_NAME not in existing:
    pc.create_index(
        name=INDEX_NAME,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(INDEX_NAME)

vectors = [
    {"id": f"article{i}", "values": embeddings[i].tolist(), "metadata": {"text": articles[i]}}
    for i in range(len(articles))
]
index.upsert(vectors=vectors)
print(f"Indexed {len(articles)} articles\n")

queries = [
    "Which vector database works offline?",
    "How does semantic search differ from keyword search?",
    "What tool helps build RAG pipelines?",
]

for query in queries:
    qvec = model.encode([query])[0].tolist()
    results = index.query(vector=qvec, top_k=2, include_metadata=True)
    print(f"Query: {query}")
    for r in results["matches"]:
        print(f"  [{r['score']:.3f}] {r['metadata']['text']}")
    print()
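Because writes become visible with a short delay, the queries in the script above can come back empty on the very first run. One possible workaround is to poll the index stats until the expected count appears. This sketch assumes the stats object exposes total_vector_count as an attribute; wait_for_vectors is a hypothetical helper:

```python
import time


def wait_for_vectors(index, expected, timeout=30.0, interval=1.0):
    """Poll index stats until at least `expected` vectors are visible.

    Returns True once the count is reached, False if `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: the stats response has a total_vector_count attribute
        count = index.describe_index_stats().total_vector_count
        if count >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_vectors(index, len(articles)) right after the upsert, and before the query loop, should make the first run behave like later ones.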

Pinecone + RAG with Claude

import anthropic
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("semantic-search-demo")
model = SentenceTransformer("all-MiniLM-L6-v2")
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env


def rag_query(question: str, top_k: int = 3) -> str:
    qvec = model.encode([question])[0].tolist()
    results = index.query(vector=qvec, top_k=top_k, include_metadata=True)
    context = "\n".join(r["metadata"]["text"] for r in results["matches"])

    prompt = (
        f"Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    message = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text


print(rag_query("What is the difference between Pinecone and ChromaDB?"))

This example needs the Claude SDK (pip install anthropic) and an ANTHROPIC_API_KEY environment variable. For a deeper RAG walkthrough, see the RAG Tutorial with Python.
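The rag_query function joins the retrieved passages into one unlabelled block and still calls the model when retrieval returns nothing. A slightly more defensive prompt builder might number the passages and cap the context size. This is a sketch, not the author's implementation; build_prompt and the 4000-character cap are assumptions:

```python
def build_prompt(question, matches, max_chars=4000):
    """Build a grounded prompt from retrieved matches, numbering each passage."""
    if not matches:
        raise ValueError("no context retrieved; refusing to build a prompt")
    lines = []
    used = 0
    for n, match in enumerate(matches, start=1):
        entry = f"[{n}] {match['metadata']['text']}"
        # Always keep the first passage; stop once the cap would be exceeded
        if lines and used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Inside rag_query, prompt = build_prompt(question, results["matches"]) would replace the inline string. Numbered passages also make it possible to ask the model to cite which passage it used.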

Pinecone vs ChromaDB

Feature	Pinecone	ChromaDB
Type	Managed cloud service	Open source, self-hosted
Scale	Billions of vectors	Tens of millions (single node)
Cost	Free tier + paid plans from $70/mo	Free to self-host
Setup	API key, no infra	pip install, runs in-process
Best for	Production SaaS, large scale	Local dev, prototypes, privacy

Pinecone — managed, scalable, SLA-backed. No ops required.
ChromaDB — open source, local, zero cost. Full data control.
Both support metadata filtering and cosine/dot/euclidean metrics.

See the ChromaDB Tutorial and Vector Databases Comparison (2026) for a full breakdown.

Pinecone Pricing

Free (Starter): 2 GB storage, 1 serverless index, and queries limited only by the free read-unit allocation. No credit card required.
Standard: from $70/month. Multiple indexes, higher storage, dedicated support SLA.
Enterprise: $100+/month (custom). Private clusters, VPC peering, SSO, custom SLAs.

Costs are measured in read units (RU) and write units (WU). Check the Pinecone pricing page for current numbers.

Summary

Create a Pinecone index with pc.create_index() specifying dimension and metric to match your embedding model.
Use all-MiniLM-L6-v2 from Sentence Transformers for free, accurate local embeddings (384 dimensions).
Upsert vectors with metadata using index.upsert(vectors=[...]) and batch in chunks of 100.
Query by natural language with index.query(vector=..., top_k=5, include_metadata=True).
Add a filter dict to restrict results by metadata fields at query time.
Combine Pinecone retrieval with claude-sonnet-4-6 for a production RAG pipeline.
Choose Pinecone for managed, scalable production; choose ChromaDB for local dev and cost-sensitive projects.
