Weaviate is an AI-native vector database with built-in vectorization, hybrid search, and generative modules. Store objects, auto-embed them, and search semantically — all in one platform.
Why Weaviate?
- Auto-vectorization: Built-in OpenAI, Cohere, and Hugging Face modules
- Hybrid search: Combine vector + keyword (BM25) in one query
- Generative search: RAG built into the database
- GraphQL API: Rich query language
- Multi-tenancy: Isolated data per tenant
- Self-hosted or cloud: Both options
Docker Setup
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    ports:
      - '8080:8080'
      - '50051:50051'
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: 'sk-your-key'
    volumes:
      - weaviate-data:/var/lib/weaviate
volumes:
  weaviate-data:
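Before sending the first request, it can help to wait until the container reports that it is ready. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) using only the standard library. The function names `is_ready` and `wait_until_ready` are made up for this example, and the base URL assumes the port mapping from the compose file above.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        url = f"{base_url}/v1/.well-known/ready"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe=is_ready, attempts=30, delay=1.0):
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` after `docker compose up -d` returns `True` once the server responds, or `False` if it never does within the allotted attempts. The `probe` parameter is injectable so the retry loop can be exercised without a running server.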
Create Schema
curl -X POST http://localhost:8080/v1/schema \
  -H 'Content-Type: application/json' \
  -d '{
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
      "text2vec-openai": {"model": "text-embedding-3-small"}
    },
    "properties": [
      {"name": "title", "dataType": ["text"]},
      {"name": "content", "dataType": ["text"]},
      {"name": "category", "dataType": ["text"]}
    ]
  }'
Add Objects (Auto-Embedded)
curl -X POST http://localhost:8080/v1/objects \
  -H 'Content-Type: application/json' \
  -d '{
    "class": "Article",
    "properties": {
      "title": "Building RAG Applications",
      "content": "Retrieval-Augmented Generation combines search with LLMs...",
      "category": "AI"
    }
  }'
Because the Article class is configured with text2vec-openai, Weaviate generates the embedding when the object is stored. You don't send a vector yourself.
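For more than a handful of objects, one request per object gets slow. A minimal sketch of the same idea in Python, assuming the REST batch endpoint (`/v1/batch/objects`) and the `Article` class from the schema step, might look like the following. The helper name `build_batch_request` is hypothetical, and the function only builds the request so you can inspect it before sending.

```python
import json
import urllib.request


def build_batch_request(articles, base_url="http://localhost:8080"):
    """Build a POST request that adds several Article objects in one call.

    `articles` is a list of dicts with the properties defined in the schema
    (title, content, category). No vectors are included; the vectorizer
    module is expected to create them on the server.
    """
    payload = {
        "objects": [{"class": "Article", "properties": a} for a in articles]
    }
    return urllib.request.Request(
        f"{base_url}/v1/batch/objects",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_request([
    {"title": "Building RAG Applications",
     "content": "Retrieval-Augmented Generation combines search with LLMs...",
     "category": "AI"},
])
# To actually send it against a running instance:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

The send step is left commented out so the snippet runs without a server. Uncomment it once the container is up.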
Semantic Search (GraphQL)
{
  Get {
    Article(
      nearText: { concepts: ["machine learning tutorials"] }
      limit: 5
    ) {
      title
      content
      category
      _additional { certainty distance }
    }
  }
}
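The query above asks for both `certainty` and `distance`. With cosine distance (Weaviate's default metric, as far as I know), distance is 1 minus the cosine similarity, and certainty is derived from it as `1 - distance / 2`. Here is a small pure-Python sketch of that relationship. The function names are illustrative and not part of any Weaviate API.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors: 0 = same direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def certainty_from_distance(distance):
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


for other in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    d = cosine_distance([1.0, 0.0], other)
    print(other, "distance:", d, "certainty:", certainty_from_distance(d))
```

So a lower distance and a higher certainty both mean a closer match. They are two views of the same number.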
Hybrid Search
{
  Get {
    Article(
      hybrid: {
        query: "RAG applications"
        alpha: 0.75
      }
      limit: 5
    ) {
      title
      content
    }
  }
}
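alpha controls the blend: alpha: 0 is pure keyword (BM25) search, alpha: 1 is pure vector search, and the 0.75 used here weights the vector score more heavily than the keyword score.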
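To get a feel for what alpha does, here is a simplified illustration of score fusion: each result list is min-max normalized to the 0..1 range, then combined as `alpha * vector + (1 - alpha) * keyword`. This is a sketch of the general idea, not Weaviate's actual implementation. The function names and the sample scores are invented, and both score sets are assumed to be "higher is better".

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} dict to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(vector_scores, keyword_scores, alpha):
    """Blend normalized vector and keyword scores; return (doc_id, score) best-first."""
    v = normalize(vector_scores)
    k = normalize(keyword_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: "a" wins on vector similarity, "c" wins on BM25.
vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword_scores = {"a": 1.0, "b": 3.0, "c": 7.0}

for alpha in (0.0, 0.75, 1.0):
    ranking = [doc_id for doc_id, _ in fuse(vector_scores, keyword_scores, alpha)]
    print(alpha, ranking)
```

With these sample scores, the keyword favorite ranks first at alpha 0 and the vector favorite ranks first at alpha 1, which is the behavior the alpha setting is meant to control.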
Generative Search (RAG)
{
  Get {
    Article(
      nearText: { concepts: ["building APIs"] }
      limit: 3
    ) {
      title
      content
      _additional {
        generate(
          groupedResult: {
            task: "Summarize these articles into a beginner guide"
          }
        ) { groupedResult }
      }
    }
  }
}
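If you want to run a query like this from a script rather than a GraphQL console, one option is to POST it to the `/v1/graphql` endpoint. The sketch below builds that request and pulls the summary out of a response. The helper names are hypothetical, and the response shape (`data.Get.Article[...]._additional.generate.groupedResult`) is my assumption based on the query structure, so the extractor scans every returned object instead of relying on a fixed position.

```python
import json
import urllib.request

GRAPHQL_QUERY = """
{
  Get {
    Article(nearText: {concepts: ["building APIs"]}, limit: 3) {
      title
      _additional {
        generate(groupedResult: {task: "Summarize these articles into a beginner guide"}) {
          groupedResult
        }
      }
    }
  }
}
"""


def build_graphql_request(query, base_url="http://localhost:8080"):
    """Wrap a GraphQL query string in a JSON POST request."""
    return urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_grouped_result(response, class_name="Article"):
    """Return the first non-empty groupedResult in a parsed response, or None."""
    objects = (response.get("data") or {}).get("Get", {}).get(class_name) or []
    for obj in objects:
        generate = (obj.get("_additional") or {}).get("generate") or {}
        if generate.get("groupedResult"):
            return generate["groupedResult"]
    return None


request = build_graphql_request(GRAPHQL_QUERY)
# Against a running instance:
# with urllib.request.urlopen(request) as resp:
#     print(extract_grouped_result(json.load(resp)))
```

As with the earlier snippet, the network call is commented out so the code runs on its own.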
Python Client
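import weaviate

# Connects to the local instance (HTTP on 8080, gRPC on 50051)
client = weaviate.connect_to_local()
try:
    articles = client.collections.get('Article')

    # Semantic search
    results = articles.query.near_text(query='how to build APIs', limit=5)
    for obj in results.objects:
        print(obj.properties['title'])

    # Hybrid search
    results = articles.query.hybrid(query='REST API tutorial', alpha=0.7, limit=5)
    for obj in results.objects:
        print(obj.properties['title'])
finally:
    # Always release the connection, even if a query fails
    client.close()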
Real-World Use Case
A legal tech company indexed 100,000 contracts in Weaviate. Lawyers search by concept ('non-compete clause with 2-year restriction') instead of exact keywords. Search relevance improved 5x compared to Elasticsearch, and the generative module auto-summarizes matching clauses.