Weaviate is a vector database that can vectorize your data automatically through built-in integrations with embedding models. You don't need to manage embeddings separately: send text, and Weaviate generates and stores the vectors.
Why Weaviate?
- Auto-vectorization: built-in OpenAI, Cohere, and Hugging Face integrations
- GraphQL + REST: query through either API
- Hybrid search: combine BM25 keyword scoring with vector similarity
- Generative search: RAG built into the database
- Free to start: 14-day cloud sandbox, or self-host with no usage limits
Quick Start
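# Docker (8080 = REST/GraphQL, 50051 = gRPC)
# ENABLE_MODULES turns on the OpenAI vectorizer and generative modules used in the Python examples
docker run -d -p 8080:8080 -p 50051:50051 \
  -e QUERY_DEFAULTS_LIMIT=20 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e PERSISTENCE_DATA_PATH='/var/lib/weaviate' \
  -e CLUSTER_HOSTNAME='node1' \
  -e ENABLE_MODULES='text2vec-openai,generative-openai' \
  semitechnologies/weaviate:latest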
# Python client
pip install weaviate-client
Python SDK
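import os

import weaviate
import weaviate.classes as wvc

# Connects to localhost:8080 (REST) and localhost:50051 (gRPC).
# The OpenAI modules need an API key; here it is read from the OPENAI_API_KEY environment variable.
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)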
# Create collection with auto-vectorization
articles = client.collections.create(
    name="Article",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
    generative_config=wvc.config.Configure.Generative.openai(),
    properties=[
        wvc.config.Property(name="title", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="category", data_type=wvc.config.DataType.TEXT),
    ],
)
# Insert — Weaviate auto-vectorizes the text!
articles.data.insert_many([
    {"title": "Intro to RAG", "content": "RAG combines retrieval with generation...", "category": "AI"},
    {"title": "Vector DBs", "content": "Vector databases store embeddings...", "category": "Database"},
])
# Semantic search (just send text, not vectors!)
results = articles.query.near_text(
    query="how to build AI search",
    limit=5,
    return_properties=["title", "content"],
)
for obj in results.objects:
    print(obj.properties["title"])
# Hybrid search (keyword + semantic)
results = articles.query.hybrid(
    query="vector database performance",
    alpha=0.5,  # 0=keyword, 1=vector
    limit=5,
)
# Generative search (RAG in one query!)
results = articles.generate.near_text(
    query="explain RAG",
    grouped_task="Summarize these articles in 2 sentences",
    limit=3,
)
print(results.generated)  # AI-generated summary using retrieved context

client.close()  # release the connection when you are done
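To make the `alpha` parameter of the hybrid query more concrete, here is a minimal sketch of weighted score fusion in plain Python. It is an illustration only, not Weaviate's actual implementation: the function names `normalize` and `fuse_scores` and the sample scores are hypothetical, and the sketch assumes each score set is min-max normalized before blending.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores into [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    keyword: dict[str, float],
    vector: dict[str, float],
    alpha: float,
) -> list[tuple[str, float]]:
    """Blend the two score sets: alpha=0 is keyword only, alpha=1 is vector only."""
    kw, vec = normalize(keyword), normalize(vector)
    doc_ids = set(kw) | set(vec)
    fused = {
        d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
        for d in doc_ids
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for three documents (not real Weaviate output).
bm25_scores = {"a": 2.0, "b": 6.0, "c": 4.0}
vector_scores = {"a": 0.9, "b": 0.1, "c": 0.5}

print(fuse_scores(bm25_scores, vector_scores, alpha=0.0)[0][0])  # best keyword match
print(fuse_scores(bm25_scores, vector_scores, alpha=1.0)[0][0])  # best vector match
```

Moving `alpha` between the two extremes shifts the ranking from the document that best matches the keywords toward the one that is closest in vector space.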
REST API
BASE="http://localhost:8080/v1"
# Schema
curl $BASE/schema
# Get objects
curl "$BASE/objects?class=Article&limit=10"   # quoted so the shell does not treat & as "run in background"
# Create object
curl -X POST $BASE/objects \
  -H 'Content-Type: application/json' \
  -d '{
    "class": "Article",
    "properties": {
      "title": "New Article",
      "content": "Article content here"
    }
  }'
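The same two REST calls can be prepared from Python with only the standard library. This is a rough sketch: the helper names `list_objects_url` and `create_object_request` are made up for illustration, and nothing is sent until you pass the request to `urllib.request.urlopen` against a running instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8080/v1"


def list_objects_url(collection: str, limit: int) -> str:
    """Build the GET URL for listing objects, with query parameters safely encoded."""
    return f"{BASE}/objects?{urlencode({'class': collection, 'limit': limit})}"


def create_object_request(collection: str, properties: dict) -> Request:
    """Build (but do not send) the POST request that creates one object."""
    body = json.dumps({"class": collection, "properties": properties}).encode("utf-8")
    return Request(
        f"{BASE}/objects",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


print(list_objects_url("Article", 10))

req = create_object_request(
    "Article",
    {"title": "New Article", "content": "Article content here"},
)
print(req.get_method(), req.full_url)
# To send it against a running instance: urllib.request.urlopen(req)
```

Building the query string with `urlencode` avoids the quoting problems that a raw `&` causes in a shell.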
GraphQL API
{
  Get {
    Article(
      nearText: { concepts: ["machine learning"] }
      limit: 5
    ) {
      title
      content
      _additional {
        distance
        certainty
      }
    }
  }
}
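GraphQL queries are sent as a JSON body with a `query` field, posted to the `/v1/graphql` endpoint. The sketch below shows one way to assemble such a payload in Python; the helper `near_text_query` is hypothetical, and it assumes simple collection and field names that need no escaping.

```python
import json


def near_text_query(collection: str, concepts: list[str], limit: int, fields: list[str]) -> str:
    """Assemble a single-line nearText query for the given collection."""
    # For plain strings, a JSON array is also a valid GraphQL list literal.
    concepts_literal = json.dumps(concepts)
    field_block = " ".join(fields)
    return (
        f"{{ Get {{ {collection}(nearText: {{ concepts: {concepts_literal} }} limit: {limit}) "
        f"{{ {field_block} _additional {{ distance }} }} }} }}"
    )


query = near_text_query("Article", ["machine learning"], 5, ["title", "content"])
payload = json.dumps({"query": query})  # body for POST http://localhost:8080/v1/graphql
print(payload)
```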
Key Features
| Feature | Details |
|---|---|
| Vectorizers | OpenAI, Cohere, HuggingFace, Ollama |
| Search | Semantic, keyword, hybrid, generative |
| API | REST, GraphQL, gRPC |
| Filtering | Metadata + vector combined |
| Multi-tenancy | Built-in tenant isolation |
| Replication | Built-in for HA |
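To illustrate what "metadata + vector combined" means in the Filtering row, here is a toy in-memory sketch: filter on a metadata field first, then rank the remaining items by cosine similarity. It is a conceptual illustration with hypothetical data and function names, not how Weaviate implements filtered search internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(items: list[dict], query_vector: list[float], category: str, limit: int = 5) -> list[str]:
    """Keep items whose category matches, then rank the survivors by similarity."""
    candidates = [item for item in items if item["category"] == category]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(item["vector"], query_vector),
        reverse=True,
    )
    return [item["title"] for item in ranked[:limit]]


# Hypothetical toy data: 2-dimensional vectors stand in for real embeddings.
items = [
    {"title": "Intro to RAG", "category": "AI", "vector": [0.9, 0.1]},
    {"title": "Vector DBs", "category": "Database", "vector": [1.0, 0.0]},
    {"title": "Prompt basics", "category": "AI", "vector": [0.2, 0.8]},
]

print(filtered_search(items, query_vector=[1.0, 0.0], category="AI"))
```

Note that "Vector DBs" is the closest vector to the query but is excluded, because the metadata filter is applied before ranking.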
Resources
Building AI-powered search? Check my Apify actors or email spinov001@gmail.com.