Weaviate Has a Free API — Vector Database with Built-In AI Modules

#ai #tutorial #database #api

Weaviate is an open-source vector database with built-in AI modules for automatic vectorization. Upload text or images and Weaviate generates the embeddings for you through vectorizer modules backed by OpenAI, Cohere, Hugging Face, or locally hosted transformer models.

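Weaviate itself is free and open source, and there is a free cloud sandbox. With a vectorizer module enabled you don't need to generate embeddings yourself, although hosted providers such as OpenAI still require their own API key.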

Why Use Weaviate?

Auto-vectorization — built-in modules generate embeddings for you
GraphQL API — powerful query language for vector + scalar search
Multi-modal — search text, images, and more in one database
Generative search — RAG built into the query language
Free sandbox — 14-day free cloud instance, no credit card

Quick Setup

1. Install

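# Docker Compose: download a compose file from the Weaviate configurator
wget https://configuration.weaviate.io/v2/docker-compose/docker-compose.yml
docker compose up -d

# With vectorizer modules: use a compose file whose ENABLE_MODULES includes
# text2vec-openai (plus generative-openai for step 5) and that passes your
# key as OPENAI_APIKEY; the filename below is just an example
docker compose -f docker-compose-openai.yml up -d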
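The container needs a few seconds before it accepts requests. Here is a minimal sketch that waits for it using only the Python standard library. It assumes Weaviate listens on localhost:8080 as in the examples below and polls the `/v1/.well-known/ready` health-check endpoint. `wait_until_ready` is a hypothetical helper, not part of any Weaviate client.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str = "http://localhost:8080/v1/.well-known/ready",
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll Weaviate's readiness endpoint until it answers 200 or time runs out.

    wait_until_ready is a hypothetical helper for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Not up yet: connection refused/reset, socket timeout,
            # or an HTTP error status (HTTPError is a URLError subclass)
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` right after `docker compose up -d` returns `True` once the node reports ready, or `False` after the timeout.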

2. Create a Class

curl -s -X POST http://localhost:8080/v1/schema \
  -H "Content-Type: application/json" \
  -d '{
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
      {"name": "title", "dataType": ["text"]},
      {"name": "content", "dataType": ["text"]},
      {"name": "category", "dataType": ["text"]},
      {"name": "views", "dataType": ["int"]}
    ]
  }' | jq
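The same class definition can also be sent from Python with only the standard library. This sketch assumes a local instance at http://localhost:8080. `article_class` and `post_json` are hypothetical helper names used for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: the local Docker instance from step 1


def article_class(vectorizer: str = "text2vec-openai") -> dict:
    """Build the same Article class definition as the curl example."""
    return {
        "class": "Article",
        "vectorizer": vectorizer,
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "content", "dataType": ["text"]},
            {"name": "category", "dataType": ["text"]},
            {"name": "views", "dataType": ["int"]},
        ],
    }


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to a Weaviate REST path and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_json("/v1/schema", article_class())` sends the same request as the curl command.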

3. Add Objects

curl -s -X POST http://localhost:8080/v1/objects \
  -H "Content-Type: application/json" \
  -d '{
    "class": "Article",
    "properties": {
      "title": "Complete Guide to Web Scraping in Python",
      "content": "Web scraping is the process of extracting data from websites...",
      "category": "tutorial",
      "views": 5000
    }
  }' | jq '{id: .id, class: .class}'
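For more than a handful of articles, the `/v1/batch/objects` endpoint accepts an `objects` array in a single request. The sketch below builds that body. Deriving each object's UUID from its title with UUIDv5 is an assumption of this sketch, chosen so that re-running the import targets the same object IDs instead of creating duplicates. `batch_payload` is a hypothetical helper.

```python
import json
import uuid


def batch_payload(articles: list[dict], class_name: str = "Article") -> dict:
    """Build a /v1/batch/objects request body.

    Each object gets a deterministic UUIDv5 derived from its title
    (an assumption for this sketch), so repeated imports reuse the same IDs.
    """
    return {
        "objects": [
            {
                "class": class_name,
                "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{class_name}/{a['title']}")),
                "properties": a,
            }
            for a in articles
        ]
    }


if __name__ == "__main__":
    body = batch_payload([
        {"title": "Complete Guide to Web Scraping in Python", "category": "tutorial", "views": 5000},
    ])
    print(json.dumps(body, indent=2))
```

The printed JSON can be piped into `curl -s -X POST http://localhost:8080/v1/batch/objects -H "Content-Type: application/json" -d @-`.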

4. Semantic Search (GraphQL)

# Search by natural language
curl -s -X POST http://localhost:8080/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ Get { Article(nearText: {concepts: [\"data extraction from websites\"]}, limit: 5) { title content category views _additional { certainty distance } } } }"
  }' | jq '.data.Get.Article[] | {title: .title, certainty: ._additional.certainty}'

# With filters
curl -s -X POST http://localhost:8080/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ Get { Article(nearText: {concepts: [\"scraping\"]}, where: {path: [\"views\"], operator: GreaterThan, valueInt: 1000}, limit: 5) { title views } } }"
  }' | jq
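Hand-escaping quotes inside a GraphQL string that is itself inside a JSON body is easy to get wrong. This sketch assembles the same filtered nearText query in Python and lets `json.dumps` escape both the GraphQL string literals and the outer JSON. It also mirrors the jq filter from the first query. `near_text_query`, `graphql_body`, and `top_hits` are hypothetical helpers, and the filter support covers a single `where` condition only. For the default cosine metric, Weaviate defines certainty as 1 - distance/2, so the two `_additional` fields carry the same information.

```python
import json


def near_text_query(concepts, fields, limit=5, where=None, class_name="Article"):
    """Build a Get query with nearText and an optional single where condition.

    where is a (path, operator, value_key, value) tuple, e.g.
    (["views"], "GreaterThan", "valueInt", 1000). Operators are GraphQL
    enums, so they are emitted unquoted; strings are escaped by json.dumps.
    """
    args = [f"nearText: {{concepts: {json.dumps(concepts)}}}"]
    if where is not None:
        path, operator, value_key, value = where
        args.append(
            f"where: {{path: {json.dumps(path)}, operator: {operator}, "
            f"{value_key}: {json.dumps(value)}}}"
        )
    args.append(f"limit: {limit}")
    return f"{{ Get {{ {class_name}({', '.join(args)}) {{ {' '.join(fields)} }} }} }}"


def graphql_body(query: str) -> str:
    """Wrap a GraphQL query in the JSON body that /v1/graphql expects."""
    return json.dumps({"query": query})


def top_hits(response: dict, class_name: str = "Article") -> list:
    """Mirror the jq filter: (title, certainty) pairs from a Get response."""
    return [(h["title"], h["_additional"]["certainty"])
            for h in response["data"]["Get"][class_name]]


def certainty_from_cosine_distance(distance: float) -> float:
    """Cosine distance lies in [0, 2]; certainty = 1 - distance / 2."""
    return 1.0 - distance / 2.0
```

`print(graphql_body(near_text_query(["scraping"], ["title", "views"], where=(["views"], "GreaterThan", "valueInt", 1000))))` produces a body that can be piped to the `/v1/graphql` curl call with `-d @-`.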

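5. Generative Search (RAG)

This requires a generative module (for example generative-openai) to be enabled on the server alongside the vectorizer. Each matching object is sent to the model with the prompt, and {content} is replaced by that object's content property.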

curl -s -X POST http://localhost:8080/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ Get { Article(nearText: {concepts: [\"web scraping\"]}, limit: 3) { title content _additional { generate(singleResult: {prompt: \"Summarize this article in one sentence: {content}\"}) { singleResult } } } } }"
  }' | jq
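If the generative module is missing or the model call fails, the response can still come back with `singleResult` set to null. The sketch below collects the summaries and fails loudly instead. It assumes the standard GraphQL `errors` array and, if you add `error` next to `singleResult` in the selection set, a per-object `error` field. `summaries` is a hypothetical helper.

```python
from typing import Optional


def summaries(response: dict, class_name: str = "Article") -> list[tuple[str, Optional[str]]]:
    """Collect (title, generated summary) pairs from a generative Get response.

    Raises RuntimeError on a GraphQL-level error or a per-object generation
    error rather than silently returning None summaries.
    """
    if response.get("errors"):
        raise RuntimeError(response["errors"][0].get("message", "GraphQL error"))
    out = []
    for hit in response["data"]["Get"][class_name]:
        gen = (hit.get("_additional") or {}).get("generate") or {}
        if gen.get("error"):
            raise RuntimeError(f"{hit.get('title')}: {gen['error']}")
        out.append((hit.get("title"), gen.get("singleResult")))
    return out
```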

Python Example

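# Requires the v4 Python client (pip install weaviate-client)
import os

import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import MetadataQuery

# text2vec-openai needs an OpenAI key; send it as a header unless the
# server already has OPENAI_APIKEY configured
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]}
)

try:
    # Create collection (reuse it if the REST steps above already created it)
    if client.collections.exists("Article"):
        articles = client.collections.get("Article")
    else:
        articles = client.collections.create(
            name="Article",
            vectorizer_config=Configure.Vectorizer.text2vec_openai(),
            properties=[
                Property(name="title", data_type=DataType.TEXT),
                Property(name="content", data_type=DataType.TEXT),
            ],
        )

    # Add objects (auto-vectorized!)
    articles.data.insert({"title": "Web Scraping Guide", "content": "Learn how to extract data..."})

    # Semantic search; distance is only returned when requested via return_metadata
    results = articles.query.near_text(
        query="data extraction",
        limit=5,
        return_metadata=MetadataQuery(distance=True),
    )
    for o in results.objects:
        print(f"{o.properties['title']} | Distance: {o.metadata.distance:.4f}")
finally:
    client.close()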
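The printed distance is easier to interpret with a concrete example. With the default cosine metric, distance is 1 minus the cosine similarity between the query vector and an object's vector: 0 means the same direction and 2 means opposite. The brute-force sketch below uses made-up 3-dimensional vectors. Weaviate itself searches an approximate HNSW index over real embeddings from the vectorizer, and `cosine_distance` and `rank` are hypothetical helpers for illustration only.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity (range 0..2); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank(query_vec, docs, limit=5):
    """Exact nearest neighbours: sort (title, vector) pairs by distance."""
    scored = [(title, cosine_distance(query_vec, vec)) for title, vec in docs]
    return sorted(scored, key=lambda t: t[1])[:limit]


# Toy "embeddings" (invented values; real ones come from the vectorizer)
docs = [
    ("Web Scraping Guide", [0.9, 0.1, 0.0]),
    ("Baking Bread", [0.0, 0.2, 0.9]),
    ("Parsing HTML", [0.8, 0.3, 0.1]),
]
```

Ranking a query vector such as `[1.0, 0.0, 0.0]` puts the two scraping-related entries first, which is the behaviour `near_text` approximates at scale.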

Key REST Endpoints

Endpoint	Description
/v1/schema	Manage classes/collections
/v1/objects	CRUD for data objects
/v1/graphql	GraphQL queries
/v1/batch/objects	Batch import
/v1/.well-known/ready	Health check
/v1/meta	Server metadata
/v1/nodes	Cluster node info

Need a custom data extraction or scraping solution? I build production-grade scrapers for any website. Email: Spinov001@gmail.com | My Apify Actors

DEV Community