Serhii Kalyna

Originally published at kalyna.pro

Pinecone Tutorial: Getting Started with Vector Search in Python (2026)

Pinecone is a managed vector database built for production-scale similarity search. Unlike self-hosted alternatives, it handles infrastructure, replication, and scaling automatically. In this tutorial you will create a Pinecone index from scratch, generate embeddings with Sentence Transformers, upsert vectors with metadata, run semantic queries, and wire up a full RAG pipeline with the Claude API.

Common use cases: semantic search, RAG, recommendation systems, and duplicate detection.

Prerequisites

  • Python 3.8+ installed
  • A free Pinecone account at pinecone.io — free tier includes 2 GB storage and one serverless index
  • Your Pinecone API key from the Pinecone console

Install the required packages:

```bash
pip install pinecone sentence-transformers
```

Create Your First Index

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create a serverless index (AWS us-east-1 is available on the free tier)
pc.create_index(
    name="demo",
    dimension=384,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Connect to the index
index = pc.Index("demo")
print(index.describe_index_stats())
```

Guard against re-creating an existing index:

```python
existing = [i.name for i in pc.list_indexes()]
if "demo" not in existing:
    pc.create_index(
        name="demo",
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("demo")
```

Generate Embeddings

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "Python is a high-level programming language.",
    "Machine learning models require training data.",
    "Vector databases store embeddings for similarity search.",
    "LLMs generate text by predicting the next token.",
    "Pinecone is a managed vector database for production use.",
]

embeddings = model.encode(texts)
print(f"Shape: {embeddings.shape}")  # (5, 384)
```

model.encode() returns a NumPy array of shape (n_texts, 384). The records sent to Pinecone should hold plain Python lists of floats, so call .tolist() on each row before upserting.
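The index was created with metric="cosine", so the scores Pinecone returns are cosine similarities: higher means closer, and vector length plays no part. As a rough illustration of what that metric computes, here is a plain-Python sketch; `cosine_similarity` is a hypothetical helper written for this example, not a Pinecone or Sentence Transformers function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
print(cosine_similarity(v, v))                     # same vector: ~1.0
print(cosine_similarity(v, [2.0, 4.0, 6.0]))       # same direction, twice as long: ~1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite: -1.0
```

Scaling a vector leaves the score unchanged, which is why cosine is a common choice for sentence embeddings, where direction carries the meaning.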

Upsert Vectors

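```python
vectors = [
    {
        "id": f"doc{i}",
        "values": embeddings[i].tolist(),  # NumPy row -> list of Python floats
        "metadata": {"text": texts[i], "source": "docs"},
    }
    for i in range(len(texts))
]

index.upsert(vectors=vectors)
# Serverless indexes are eventually consistent, so the stats (and queries)
# may take a few seconds to reflect the new records.
print(index.describe_index_stats())
```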
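A likely source of upsert errors is a record whose shape does not match the index, such as a dimension mismatch or a NumPy row left unconverted. The sketch below checks the three fields used in this tutorial before anything is sent. `check_record` and `EXPECTED_DIM` are hypothetical names, and the check is deliberately stricter than Pinecone (it only accepts Python floats in `values`). The metadata rule follows Pinecone's documented value types: strings, numbers, booleans, and lists of strings.

```python
from typing import Any, Dict

EXPECTED_DIM = 384  # hypothetical constant: must equal the index's `dimension`


def check_record(record: Dict[str, Any], dim: int = EXPECTED_DIM) -> None:
    """Raise ValueError if `record` does not fit the {id, values, metadata} shape used here."""
    record_id = record.get("id")
    if not isinstance(record_id, str) or not record_id:
        raise ValueError("id must be a non-empty string")

    values = record.get("values")
    if not isinstance(values, list) or len(values) != dim:
        raise ValueError(f"values must be a list of length {dim}")
    if not all(isinstance(x, float) for x in values):
        raise ValueError("values must be Python floats (call .tolist() on NumPy rows)")

    for key, val in record.get("metadata", {}).items():
        scalar = isinstance(val, (str, int, float, bool))
        str_list = isinstance(val, list) and all(isinstance(s, str) for s in val)
        if not (scalar or str_list):
            raise ValueError(f"metadata {key!r} has unsupported type {type(val).__name__}")


record = {"id": "doc0", "values": [0.01] * 384, "metadata": {"text": "hello", "source": "docs"}}
check_record(record)  # returns None when the record looks valid
# In the tutorial you could run `for r in vectors: check_record(r)` before index.upsert().
```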

For large datasets, upsert in batches of 100 records instead of one large request:

```python
def upsert_in_batches(index, vectors, batch_size=100):
    """Upsert `vectors` into `index` in slices of `batch_size` records."""
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i : i + batch_size]
        index.upsert(vectors=batch)
        print(f"  Upserted {min(i + batch_size, len(vectors))}/{len(vectors)}")


upsert_in_batches(index, vectors)
```
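upsert_in_batches slices a list, so every record has to be in memory first. When records are produced lazily (for example, read from a file or embedded chunk by chunk), a generator-based variant works on any iterable. `chunked` below is a hypothetical standard-library helper, not part of the Pinecone SDK.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items from any iterable, without materialising it."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# With Pinecone this would look like (record_stream() is a hypothetical generator):
#     for batch in chunked(record_stream(), 100):
#         index.upsert(vectors=batch)
print([len(batch) for batch in chunked(range(250), 100)])  # [100, 100, 50]
```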

Query Vectors

```python
query_text = "What is a vector database?"
query_emb = model.encode([query_text])[0].tolist()

results = index.query(
    vector=query_emb,
    top_k=3,
    include_metadata=True,
)

for match in results["matches"]:
    print(f"Score: {match['score']:.4f}  |  {match['metadata']['text']}")
```
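Matches come back sorted by score, highest first, and `top_k` caps how many are returned. To make that behaviour concrete, the sketch below runs an exact brute-force cosine search over a small in-memory dict and returns Pinecone-shaped `{"id", "score"}` results. `local_query` is hypothetical and is not how Pinecone works internally: a managed index relies on approximate nearest-neighbour search so it does not have to score every stored vector.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes equal lengths and non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def local_query(store: Dict[str, Sequence[float]], vector: Sequence[float], top_k: int) -> List[dict]:
    """Exact top-k search: score every stored vector, keep the best `top_k`, highest first."""
    scored = ((cosine(vector, vec), vec_id) for vec_id, vec in store.items())
    best = heapq.nlargest(top_k, scored)
    return [{"id": vec_id, "score": score} for score, vec_id in best]


store = {
    "doc0": [1.0, 0.0, 0.0],
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}
for match in local_query(store, [1.0, 0.05, 0.0], top_k=2):
    print(f"Score: {match['score']:.4f}  |  {match['id']}")
```

Brute force is fine for a handful of vectors; its cost grows linearly with the collection, which is the problem a vector database is built to avoid.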

Filter by metadata at query time:

```python
results = index.query(
    vector=query_emb,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "docs"}},
)
```
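The `filter` argument uses a MongoDB-style operator syntax. To show how such a dict is evaluated against each record's metadata, here is a local sketch covering `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$and`, and `$or`, plus the `{"source": "docs"}` shorthand for `$eq`. `matches_filter` is a hypothetical helper; Pinecone applies filters server-side, and edge cases such as missing keys and list-valued metadata are not modelled here and may behave differently.

```python
from typing import Any, Callable, Dict


def _is_num(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


# Comparison operators supported by this sketch (a subset of Pinecone's filter language).
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$gt": lambda v, arg: _is_num(v) and v > arg,
    "$gte": lambda v, arg: _is_num(v) and v >= arg,
    "$lt": lambda v, arg: _is_num(v) and v < arg,
    "$lte": lambda v, arg: _is_num(v) and v <= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
}


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every condition in `flt` (conditions are ANDed)."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches_filter(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches_filter(metadata, sub) for sub in cond):
                return False
        else:
            if not isinstance(cond, dict):
                cond = {"$eq": cond}  # shorthand: {"source": "docs"}
            value = metadata.get(key)
            for op, arg in cond.items():
                if op not in _OPS:
                    raise ValueError(f"operator not handled by this sketch: {op}")
                if not _OPS[op](value, arg):
                    return False
    return True


docs = [
    {"text": "Python intro", "source": "docs", "year": 2024},
    {"text": "Release notes", "source": "blog", "year": 2025},
    {"text": "RAG guide", "source": "docs", "year": 2026},
]
flt = {"$and": [{"source": {"$eq": "docs"}}, {"year": {"$gte": 2025}}]}
print([d["text"] for d in docs if matches_filter(d, flt)])  # ['RAG guide']
```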

Full Semantic Search Example

```python
import time

from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

PINECONE_API_KEY = "YOUR_PINECONE_API_KEY"
INDEX_NAME = "semantic-search-demo"

articles = [
    "Pinecone is a managed vector database optimised for high-speed similarity search.",
    "ChromaDB is an open-source vector store that runs locally without any API key.",
    "FAISS is a Facebook AI library for efficient exact and approximate nearest neighbour search.",
    "Sentence Transformers convert text to dense embedding vectors for semantic similarity.",
    "RAG combines retrieval with generation to ground LLM answers in real documents.",
    "Python is the dominant language for machine learning and AI development in 2026.",
    "Claude is Anthropic's family of AI assistants based on Constitutional AI training.",
    "LangChain provides tools for composing LLM pipelines using a pipe-operator syntax.",
    "LlamaIndex specialises in document ingestion and advanced retrieval for RAG systems.",
    "Cosine similarity measures the angle between two vectors, ignoring their magnitude.",
]

# Embed every article once (384-dimensional vectors)
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(articles)

# Create the index only if it does not exist yet
pc = Pinecone(api_key=PINECONE_API_KEY)
existing = [i.name for i in pc.list_indexes()]
if INDEX_NAME not in existing:
    pc.create_index(
        name=INDEX_NAME,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(INDEX_NAME)

vectors = [
    {"id": f"article{i}", "values": embeddings[i].tolist(), "metadata": {"text": articles[i]}}
    for i in range(len(articles))
]
index.upsert(vectors=vectors)

# Serverless indexes are eventually consistent: wait (up to ~30 s) until the
# new records are visible, otherwise the first queries may miss them.
for _ in range(30):
    if index.describe_index_stats().total_vector_count >= len(articles):
        break
    time.sleep(1)
print(f"Indexed {len(articles)} articles\n")

queries = [
    "Which vector database works offline?",
    "How does semantic search differ from keyword search?",
    "What tool helps build RAG pipelines?",
]

for query in queries:
    qvec = model.encode([query])[0].tolist()
    results = index.query(vector=qvec, top_k=2, include_metadata=True)
    print(f"Query: {query}")
    for r in results["matches"]:
        print(f"  [{r['score']:.3f}] {r['metadata']['text']}")
    print()
```

Pinecone + RAG with Claude

```python
import anthropic
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("semantic-search-demo")  # the index populated in the previous example
model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the model used at indexing time
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env


def rag_query(question: str, top_k: int = 3) -> str:
    # 1. Retrieve: embed the question and fetch the closest stored texts
    qvec = model.encode([question])[0].tolist()
    results = index.query(vector=qvec, top_k=top_k, include_metadata=True)
    context = "\n".join(r["metadata"]["text"] for r in results["matches"])

    # 2. Augment: place the retrieved text in the prompt
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # 3. Generate: ask Claude for an answer grounded in that context
    message = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text


print(rag_query("What is the difference between Pinecone and ChromaDB?"))
```

Install the Claude SDK: pip install anthropic. For a deeper RAG walkthrough, see the RAG Tutorial with Python.
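rag_query joins every retrieved text into the prompt as-is. A variation, sketched below under our own assumptions, orders matches by score, numbers them so the answer can cite sources, skips matches without a `text` field, and stops adding context once a character budget is reached. `build_prompt` and `max_context_chars` are hypothetical names, and the 4,000-character default is an arbitrary placeholder, not a Claude limit. Each match only needs to support `match["score"]` and `match["metadata"]`, the same access pattern the tutorial uses on `results["matches"]`.

```python
from typing import Any, List, Mapping


def build_prompt(question: str, matches: List[Mapping[str, Any]], max_context_chars: int = 4000) -> str:
    """Build a grounded prompt from scored matches, best first, within a character budget."""
    lines: List[str] = []
    used = 0
    for match in sorted(matches, key=lambda m: m["score"], reverse=True):
        text = ((match["metadata"] or {}).get("text") or "").strip()
        if not text:
            continue  # skip matches stored without a text field
        line = f"[{len(lines) + 1}] {text}"
        if used + len(line) > max_context_chars:
            break  # adding this chunk would exceed the budget
        lines.append(line)
        used += len(line) + 1  # +1 for the newline that joins the lines
    context = "\n".join(lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number]. If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


matches = [
    {"id": "a", "score": 0.62, "metadata": {"text": "ChromaDB runs locally without an API key."}},
    {"id": "b", "score": 0.81, "metadata": {"text": "Pinecone is a managed vector database."}},
    {"id": "c", "score": 0.40, "metadata": {}},
]
print(build_prompt("How do Pinecone and ChromaDB differ?", matches))
# Inside rag_query, the equivalent call would be: prompt = build_prompt(question, results["matches"])
```

Capping the context by characters is a crude stand-in for token counting, but it keeps the prompt size predictable as `top_k` grows.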

Pinecone vs ChromaDB

|          | Pinecone                           | ChromaDB                       |
|----------|------------------------------------|--------------------------------|
| Type     | Managed cloud service              | Open source, self-hosted       |
| Scale    | Billions of vectors                | Tens of millions (single node) |
| Cost     | Free tier + paid plans from $70/mo | Free to self-host              |
| Setup    | API key, no infra                  | pip install, runs in-process   |
| Best for | Production SaaS, large scale       | Local dev, prototypes, privacy |

  • Pinecone — managed, scalable, SLA-backed. No ops required.
  • ChromaDB — open source, local, zero cost. Full data control.
  • Both support metadata filtering and cosine/dot/euclidean metrics.

See the ChromaDB Tutorial and Vector Databases Comparison (2026) for a full breakdown.

Pinecone Pricing

  • Free (Starter) — 2 GB storage, 1 serverless index, unlimited queries within the free allocation. No credit card required.
  • Standard — from $70/month — multiple indexes, higher storage, dedicated support SLA.
  • Enterprise — $100+/month (custom) — private clusters, VPC peering, SSO, custom SLAs.

On serverless plans, usage is metered as storage plus read units (RU) and write units (WU). Check the Pinecone pricing page for current rates.

Summary

  • Create a Pinecone index with pc.create_index() specifying dimension and metric to match your embedding model.
  • Use all-MiniLM-L6-v2 from Sentence Transformers for free, accurate local embeddings (384 dimensions).
  • Upsert vectors with metadata using index.upsert(vectors=[...]) and batch in chunks of 100.
  • Query by natural language with index.query(vector=..., top_k=5, include_metadata=True).
  • Add a filter dict to restrict results by metadata fields at query time.
  • Combine Pinecone retrieval with claude-sonnet-4-6 for a production RAG pipeline.
  • Choose Pinecone for managed, scalable production; choose ChromaDB for local dev and cost-sensitive projects.
