Weaviate is an open-source vector database with built-in modules for automatic vectorization. Upload text or images, and Weaviate generates the embeddings using OpenAI, Cohere, Hugging Face, or local transformer models.
It is free and open source, with a free cloud sandbox, and you do not need to generate embeddings yourself.
Why Use Weaviate?
- Auto-vectorization — built-in modules generate embeddings for you
- GraphQL API — powerful query language for vector + scalar search
- Multi-modal — search text, images, and more in one database
- Generative search — RAG built into the query language
- Free sandbox — 14-day free cloud instance, no credit card
Quick Setup
1. Install
# Docker Compose
wget https://configuration.weaviate.io/v2/docker-compose/docker-compose.yml
docker compose up -d
# With vectorizer modules (assumes a compose file with text2vec-openai enabled, saved as docker-compose-openai.yml)
docker compose -f docker-compose-openai.yml up -d
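The container can take a few seconds to accept requests after `docker compose up -d` returns. A minimal sketch of a readiness poll against the `/v1/.well-known/ready` endpoint listed later in this post, assuming the default `http://localhost:8080` address used throughout; the helper names `is_ready` and `wait_until_ready` are made up for illustration:

```python
import time
import urllib.request


def is_ready(base_url: str = "http://localhost:8080") -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all OSError subclasses.
        return False


def wait_until_ready(check=is_ready, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `check` until it returns True or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` before the schema step avoids a confusing "connection refused" from the first curl command. The `check` parameter is injectable so the loop can be exercised without a running server.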
2. Create a Class
curl -s -X POST http://localhost:8080/v1/schema \
  -H "Content-Type: application/json" \
  -d '{
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
      {"name": "title", "dataType": ["text"]},
      {"name": "content", "dataType": ["text"]},
      {"name": "category", "dataType": ["text"]},
      {"name": "views", "dataType": ["int"]}
    ]
  }' | jq
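If you prefer to generate the schema body rather than hand-write JSON, the same payload can be built from a plain mapping. This is only a sketch: `build_class_definition` is a hypothetical helper, and it covers just the three keys used in the curl example above.

```python
import json


def build_class_definition(name, vectorizer, properties):
    """Build the JSON body for POST /v1/schema.

    `properties` maps a property name to a Weaviate data type, e.g. {"title": "text"}.
    """
    return {
        "class": name,
        "vectorizer": vectorizer,
        "properties": [
            {"name": prop, "dataType": [data_type]}
            for prop, data_type in properties.items()
        ],
    }


article = build_class_definition(
    "Article",
    "text2vec-openai",
    {"title": "text", "content": "text", "category": "text", "views": "int"},
)
print(json.dumps(article, indent=2))
```

The printed JSON can be passed to curl with `-d @file.json` or sent with any HTTP client.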
3. Add Objects
curl -s -X POST http://localhost:8080/v1/objects \
  -H "Content-Type: application/json" \
  -d '{
    "class": "Article",
    "properties": {
      "title": "Complete Guide to Web Scraping in Python",
      "content": "Web scraping is the process of extracting data from websites...",
      "category": "tutorial",
      "views": 5000
    }
  }' | jq '{id: .id, class: .class}'
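Posting objects one at a time gets slow for larger datasets; the `/v1/batch/objects` endpoint listed in the table below accepts many objects per request. A rough sketch of preparing those request bodies, assuming the same `Article` class; the rows are placeholder data and `build_batch_body` and `chunked` are illustrative helpers, not part of any Weaviate library:

```python
import json


def build_batch_body(class_name, rows):
    """Wrap plain property dicts in the body shape used by POST /v1/batch/objects."""
    return {"objects": [{"class": class_name, "properties": dict(row)} for row in rows]}


def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Illustrative rows, not real data.
rows = [
    {"title": f"Example article {i}", "content": "placeholder text",
     "category": "tutorial", "views": i * 100}
    for i in range(5)
]
bodies = [build_batch_body("Article", chunk) for chunk in chunked(rows, 2)]
print(len(bodies), "request bodies")
print(json.dumps(bodies[0], indent=2))
```

A chunk size of 2 is used only to keep the printed output short; in practice you would pick a much larger size and send each body as its own POST.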
4. Semantic Search (GraphQL)
# Search by natural language
curl -s -X POST http://localhost:8080/v1/graphql \
-H "Content-Type: application/json" \
-d '{
"query": "{ Get { Article(nearText: {concepts: [\"data extraction from websites\"]}, limit: 5) { title content category views _additional { certainty distance } } } }"
}' | jq '.data.Get.Article[] | {title: .title, certainty: ._additional.certainty}'
# With filters
curl -s -X POST http://localhost:8080/v1/graphql \
-H "Content-Type: application/json" \
-d '{
"query": "{ Get { Article(nearText: {concepts: [\"scraping\"]}, where: {path: [\"views\"], operator: GreaterThan, valueInt: 1000}, limit: 5) { title views } } }"
}' | jq
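Escaping quotes inside a GraphQL string inside a JSON string inside a shell string is easy to get wrong. One way around it is to let `json.dumps` do the escaping. The sketch below rebuilds the filtered query from above; `near_text_query` is a hypothetical helper that only supports the `views` filter shown in this post:

```python
import json


def near_text_query(collection, concepts, fields, limit=5, min_views=None):
    """Build the JSON body for POST /v1/graphql, with an optional `views` filter."""
    # json.dumps produces a double-quoted, escaped list that GraphQL accepts for plain strings.
    args = [f"nearText: {{concepts: {json.dumps(concepts)}}}"]
    if min_views is not None:
        args.append(
            f'where: {{path: ["views"], operator: GreaterThan, valueInt: {int(min_views)}}}'
        )
    args.append(f"limit: {int(limit)}")
    query = f"{{ Get {{ {collection}({', '.join(args)}) {{ {' '.join(fields)} }} }} }}"
    return json.dumps({"query": query})


body = near_text_query("Article", ["scraping"], ["title", "views"], limit=5, min_views=1000)
print(body)
```

The printed body is the same JSON document as the `-d` argument of the "With filters" command, so it can be sent to `/v1/graphql` unchanged.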
5. Generative Search (RAG)
# Requires a generative module (for example generative-openai) enabled on the server
curl -s -X POST http://localhost:8080/v1/graphql \
-H "Content-Type: application/json" \
-d '{
"query": "{ Get { Article(nearText: {concepts: [\"web scraping\"]}, limit: 3) { title content _additional { generate(singleResult: {prompt: \"Summarize this article in one sentence: {content}\"}) { singleResult } } } } }"
}' | jq
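The generated text comes back nested under `_additional.generate.singleResult` on each object, mirroring the shape of the query. A small sketch of pulling it out in Python; the `sample` response is hand-written to match that shape and is not real model output, and `extract_summaries` is an illustrative name:

```python
def extract_summaries(response, collection="Article"):
    """Return (title, generated_text) pairs from a generative-search GraphQL response."""
    if response.get("errors"):
        messages = "; ".join(e.get("message", "unknown error") for e in response["errors"])
        raise RuntimeError(f"GraphQL error: {messages}")
    pairs = []
    for obj in response["data"]["Get"][collection]:
        generated = (obj.get("_additional") or {}).get("generate") or {}
        pairs.append((obj.get("title"), generated.get("singleResult")))
    return pairs


# Hand-written sample in the shape the query above asks for; not real output.
sample = {
    "data": {
        "Get": {
            "Article": [
                {
                    "title": "Web Scraping Guide",
                    "content": "Learn how to extract data...",
                    "_additional": {
                        "generate": {"singleResult": "A one-sentence summary placeholder."}
                    },
                }
            ]
        }
    }
}
for title, summary in extract_summaries(sample):
    print(f"{title}: {summary}")
```

Checking the top-level `errors` key first matters here, because a missing generative module or API key is reported there rather than as an HTTP error status.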
Python Example
import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import MetadataQuery

# If the server has no OpenAI key configured, pass one per request:
# weaviate.connect_to_local(headers={"X-OpenAI-Api-Key": "<your key>"})
client = weaviate.connect_to_local()
try:
    # Create collection
    articles = client.collections.create(
        name="Article",
        vectorizer_config=Configure.Vectorizer.text2vec_openai(),
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="content", data_type=DataType.TEXT),
        ],
    )
    # Add objects (auto-vectorized!)
    articles.data.insert({"title": "Web Scraping Guide", "content": "Learn how to extract data..."})
    # Semantic search; distance is only returned when requested via return_metadata
    results = articles.query.near_text(
        query="data extraction",
        limit=5,
        return_metadata=MetadataQuery(distance=True),
    )
    for o in results.objects:
        print(f"{o.properties['title']} | Distance: {o.metadata.distance:.4f}")
finally:
    # Always release the connection, even if a call above fails
    client.close()
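The curl example prints `certainty` while the Python example prints `distance`. For the cosine metric the two are related by certainty = 1 - distance / 2, so a distance of 0 means identical direction and a distance of 2 means opposite. A tiny sketch of the conversion, assuming the collection uses cosine distance; the function name is made up:

```python
def cosine_distance_to_certainty(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


for d in (0.0, 0.5, 2.0):
    print(f"distance={d:.1f} -> certainty={cosine_distance_to_certainty(d):.2f}")
```

This mapping does not apply to other distance metrics, so compare raw distances if the collection is configured differently.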
Key REST Endpoints
| Endpoint | Description |
|---|---|
| /v1/schema | Manage classes/collections |
| /v1/objects | CRUD for data objects |
| /v1/graphql | GraphQL queries |
| /v1/batch/objects | Batch import |
| /v1/.well-known/ready | Health check |
| /v1/meta | Server metadata |
| /v1/nodes | Cluster node info |
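The `/v1/meta` response is handy for checking which modules a server has enabled before you create a class that depends on one. A minimal sketch, assuming the response carries a `modules` object keyed by module name; the `sample_meta` values are placeholders and the helper names are illustrative:

```python
def enabled_modules(meta):
    """Return the sorted module names from a /v1/meta response dict."""
    return sorted((meta.get("modules") or {}).keys())


def missing_modules(meta, required):
    """Return the entries of `required` that the server does not report as enabled."""
    enabled = set(enabled_modules(meta))
    return [name for name in required if name not in enabled]


# Placeholder response; a real server reports its own hostname, version and modules.
sample_meta = {
    "hostname": "http://[::]:8080",
    "version": "0.0.0",
    "modules": {"text2vec-openai": {}},
}
print(missing_modules(sample_meta, ["text2vec-openai", "generative-openai"]))
```

With the placeholder above, `generative-openai` is reported as missing, which is the situation where the generative search query would fail.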
Need a custom data extraction or scraping solution? I build production-grade scrapers for any website. Email: Spinov001@gmail.com | My Apify Actors