Why Weaviate?
Weaviate is an open-source vector database for AI applications. It stores data as vectors (embeddings) and enables semantic search — find results by meaning, not just keywords.
Weaviate Cloud free tier (sandbox): 500K objects, 14-day persistence.
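To make "search by meaning" concrete, here is a minimal sketch of what a vector search does conceptually: rank stored vectors by how close they are to the query vector. The 3-dimensional embeddings and the search helper are made up for illustration. Real embeddings come from a model and have hundreds of dimensions, and Weaviate uses an approximate index rather than a full scan like this one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for this example.
documents = {
    "Neural Networks Explained": [0.9, 0.1, 0.0],
    "Building REST APIs with FastAPI": [0.1, 0.9, 0.2],
    "Introduction to Machine Learning": [0.8, 0.2, 0.1],
}

def search(query_vector, docs, limit=2):
    """Return the titles whose vectors are most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda title: cosine_similarity(query_vector, docs[title]),
        reverse=True,
    )
    return ranked[:limit]

# Pretend this is the embedding of the query "deep learning".
query = [1.0, 0.0, 0.0]
print(search(query, documents))
```

Neither of the two top results has to share a word with the query. They rank first only because their vectors point in a similar direction.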
Getting Started
Option 1: Weaviate Cloud (Free Sandbox)
Sign up at weaviate.io and create a free sandbox cluster, then copy its URL and API key for the client examples.
Option 2: Docker
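# text2vec-transformers needs a separate inference container, so this enables
# the OpenAI modules that the client examples use instead.
docker run -d -p 8080:8080 -p 50051:50051 \
  -e QUERY_DEFAULTS_LIMIT=25 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e DEFAULT_VECTORIZER_MODULE=text2vec-openai \
  -e ENABLE_MODULES=text2vec-openai,generative-openai \
  semitechnologies/weaviate:latest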
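Once the container is running, it can take a few seconds before it accepts requests. A minimal sketch of a readiness check using only the standard library is shown here. It assumes the port mapping from the command above (localhost:8080) and polls Weaviate's `/v1/.well-known/ready` endpoint. The helper names `ready_url` and `is_ready` are made up for this example.

```python
import urllib.error
import urllib.request

def ready_url(base_url):
    """Build the readiness-probe URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"

def is_ready(base_url, timeout=2.0):
    """Return True if the instance answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all mean "not ready".
        return False

if __name__ == "__main__":
    # Assumes the container is mapped to localhost:8080.
    print("ready" if is_ready("http://localhost:8080") else "not ready yet")
```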
Python Client
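import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-sandbox.weaviate.network",
    auth_credentials=weaviate.auth.AuthApiKey("YOUR_API_KEY"),
    # text2vec-openai calls OpenAI on your behalf, so it needs your OpenAI key
    headers={"X-OpenAI-Api-Key": "YOUR_OPENAI_API_KEY"},
)
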
# Create collection with auto-vectorization
client.collections.create(
    name="Article",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
    # Needed only for the generate (RAG) queries
    generative_config=wvc.config.Configure.Generative.openai(),
    properties=[
        wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="category", data_type=wvc.config.DataType.TEXT),
    ],
)

articles = client.collections.get("Article")

# Add data (vectorized automatically on insert)
articles.data.insert_many([
    {"title": "Introduction to Machine Learning", "content": "ML is a subset of AI that learns from data...", "category": "AI"},
    {"title": "Web Scraping Best Practices", "content": "Learn how to scrape websites ethically...", "category": "Dev"},
    {"title": "Building REST APIs with FastAPI", "content": "FastAPI is a modern Python framework...", "category": "Dev"},
    {"title": "Neural Networks Explained", "content": "Deep learning uses layers of neurons...", "category": "AI"},
])

# Semantic search: matches by meaning, not keywords
results = articles.query.near_text(
    query="artificial intelligence and deep learning",
    limit=3,
)
for obj in results.objects:
    print(f"{obj.properties['title']} [{obj.properties['category']}]")
# Expect the ML and Neural Networks articles to rank highest, even without an exact keyword match

# Hybrid search (vector + keyword)
results = articles.query.hybrid(
    query="Python web framework",
    limit=3,
    alpha=0.5,  # 0=keyword only, 1=vector only, 0.5=balanced
)

# Filtered vector search
results = articles.query.near_text(
    query="learning algorithms",
    filters=wvc.query.Filter.by_property("category").equal("AI"),
    limit=5,
)

# Release the connection when done
client.close()
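To show what the alpha parameter in the hybrid query is doing, here is a simplified sketch of score blending. It normalizes keyword and vector scores to [0, 1] and mixes them as `alpha * vector + (1 - alpha) * keyword`. This is an illustration of the idea only, not Weaviate's actual fusion code, and the scores are invented.

```python
def normalize(scores):
    """Min-max scale a {doc: score} dict into the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}

def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Rank docs by a weighted blend of keyword and vector scores."""
    keyword = normalize(keyword_scores)
    vector = normalize(vector_scores)
    combined = {
        doc: alpha * vector.get(doc, 0.0) + (1 - alpha) * keyword.get(doc, 0.0)
        for doc in set(keyword) | set(vector)
    }
    return sorted(combined, key=combined.get, reverse=True)

# Made-up scores for the query "Python web framework".
keyword_scores = {
    "Web Scraping Best Practices": 3.0,      # literal match on "web"
    "Building REST APIs with FastAPI": 2.0,
    "Neural Networks Explained": 0.0,
}
vector_scores = {
    "Web Scraping Best Practices": 0.5,
    "Building REST APIs with FastAPI": 0.9,  # closest in meaning
    "Neural Networks Explained": 0.1,
}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, hybrid_rank(keyword_scores, vector_scores, alpha)[0])
```

With these numbers the keyword-only ranking puts the scraping article first, while the balanced and vector-only rankings put the FastAPI article first. That is the trade-off alpha controls.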
JavaScript Client
import weaviate from "weaviate-client";

const client = await weaviate.connectToWeaviateCloud(
  "https://your-sandbox.weaviate.network",
  {
    authCredentials: new weaviate.ApiKey("YOUR_KEY"),
    // Needed because the collection uses OpenAI modules
    headers: { "X-OpenAI-Api-Key": "YOUR_OPENAI_API_KEY" },
  }
);

const articles = client.collections.get("Article");

// Semantic search
const result = await articles.query.nearText("machine learning basics", { limit: 5 });
result.objects.forEach(obj => console.log(obj.properties.title));

// Generate (RAG: Retrieval Augmented Generation)
// Requires a generative module (e.g. generative-openai) configured on the collection
const ragResult = await articles.generate.nearText(
  "AI concepts",
  { singlePrompt: "Summarize this article in one sentence: {title} - {content}" },
  { limit: 3 }
);
ragResult.objects.forEach(obj => {
  console.log(`${obj.properties.title}`);
  console.log(`Summary: ${obj.generated}`);
});
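The singlePrompt above is a template: for each retrieved object, the `{title}` and `{content}` placeholders are filled with that object's properties before the prompt goes to the LLM. A rough sketch of that substitution step, written in Python to match the earlier examples, could look like this. The `fill_prompt` helper is hypothetical and only illustrates the idea; it is not how Weaviate implements it.

```python
import re

def fill_prompt(template, properties):
    """Replace each {name} placeholder with the matching property value."""
    def substitute(match):
        name = match.group(1)
        if name not in properties:
            raise KeyError(f"prompt references unknown property: {name}")
        return str(properties[name])
    return re.sub(r"\{(\w+)\}", substitute, template)

template = "Summarize this article in one sentence: {title} - {content}"
article = {
    "title": "Neural Networks Explained",
    "content": "Deep learning uses layers of neurons...",
    "category": "AI",
}

print(fill_prompt(template, article))
```

Because the prompt is built per object, a search with a limit of 3 produces three separate generations, one for each result.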
Use Cases
- Semantic search — find products/docs by meaning
- RAG — feed context to LLMs from your data
- Recommendation — "similar items" based on embeddings
- Image search — search images by description
- Anomaly detection — find outliers in vector space
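As one example from the list, anomaly detection in vector space can be as simple as flagging items that sit far from the rest. The sketch below measures each embedding's distance from the centroid of all embeddings. The 2-dimensional vectors, the item names, and the threshold are all invented for illustration. In practice you would fetch real vectors from your collection and tune the threshold on your data.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_outliers(embeddings, threshold):
    """Return names whose vectors lie farther than threshold from the centroid."""
    center = centroid(list(embeddings.values()))
    return [
        name for name, vector in embeddings.items()
        if euclidean(vector, center) > threshold
    ]

# Hypothetical embeddings: three similar items and one that is far away.
embeddings = {
    "item-a": [1.0, 1.0],
    "item-b": [1.2, 0.9],
    "item-c": [0.9, 1.1],
    "item-d": [5.0, 5.0],
}

print(find_outliers(embeddings, threshold=3.0))
```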