Weaviate Has a Free API: Here's How to Build AI-Native Search Applications

#ai #tutorial #opensource #database

Weaviate is an AI-native vector database with built-in vectorization, hybrid search, and generative modules. Store objects, auto-embed them, and search semantically — all in one platform.

Why Weaviate?

Auto-vectorization: Built-in OpenAI, Cohere, HuggingFace modules
Hybrid search: Combine vector + keyword (BM25) in one query
Generative search: RAG built into the database
GraphQL API: Rich query language
Multi-tenancy: Isolated data per tenant
Self-hosted or cloud: Run it yourself with Docker or use the managed Weaviate Cloud service

Docker Setup

services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    ports:
      - '8080:8080'
      - '50051:50051'
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: 'sk-your-key'
    volumes:
      - weaviate-data:/var/lib/weaviate

volumes:
  weaviate-data:
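
Once the stack is started (for example with docker compose up -d), the container can take a few seconds before it accepts requests. The following is a minimal sketch, using only the Python standard library, that polls Weaviate's readiness endpoint at /v1/.well-known/ready. The helper names is_ready and wait_until_ready, the retry count, and the delay are illustrative assumptions, not part of Weaviate.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str = "http://localhost:8080") -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready yet.
        return False


def wait_until_ready(check=is_ready, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_until_ready() before creating the schema avoids connection errors while the container is still starting. The check argument is injectable so the retry loop can be exercised without a running server.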

Create Schema

curl -X POST http://localhost:8080/v1/schema \
  -H 'Content-Type: application/json' \
  -d '{
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
      "text2vec-openai": {"model": "text-embedding-3-small"}
    },
    "properties": [
      {"name": "title", "dataType": ["text"]},
      {"name": "content", "dataType": ["text"]},
      {"name": "category", "dataType": ["text"]}
    ]
  }'
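
The same request can be sent from Python without extra dependencies. This is a sketch under the assumption that Weaviate is reachable at http://localhost:8080 as in the Docker setup; build_article_schema and create_schema are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request


def build_article_schema() -> dict:
    """Return the same class definition as the curl example, as a Python dict."""
    return {
        "class": "Article",
        "vectorizer": "text2vec-openai",
        "moduleConfig": {"text2vec-openai": {"model": "text-embedding-3-small"}},
        "properties": [
            {"name": name, "dataType": ["text"]}
            for name in ("title", "content", "category")
        ],
    }


def create_schema(base_url: str = "http://localhost:8080") -> dict:
    """POST the class definition to /v1/schema and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/schema",
        data=json.dumps(build_article_schema()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

Keeping the payload in its own function makes it easy to inspect or version the schema before sending it.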

Add Objects (Auto-Embedded)

curl -X POST http://localhost:8080/v1/objects \
  -H 'Content-Type: application/json' \
  -d '{
    "class": "Article",
    "properties": {
      "title": "Building RAG Applications",
      "content": "Retrieval-Augmented Generation combines search with LLMs...",
      "category": "AI"
    }
  }'

Weaviate calls the configured text2vec-openai module to generate the embedding at import time, so no manual vectorization is needed.
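
Sending one request per object gets slow once there are more than a handful of articles. The sketch below assumes the REST batch endpoint at /v1/batch/objects, which accepts a list of objects in a single request. The helper names and the sample data in the usage note are made up for illustration.

```python
import json
import urllib.request


def build_batch_payload(articles: list[dict], class_name: str = "Article") -> dict:
    """Wrap plain property dicts in the structure the batch endpoint expects."""
    return {
        "objects": [{"class": class_name, "properties": props} for props in articles]
    }


def import_articles(articles: list[dict], base_url: str = "http://localhost:8080"):
    """POST all articles in one request and return the per-object results."""
    request = urllib.request.Request(
        f"{base_url}/v1/batch/objects",
        data=json.dumps(build_batch_payload(articles)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

For example, import_articles([{"title": "Intro to Vector Search", "content": "...", "category": "AI"}]) would import a single hypothetical article. Each object is embedded on import, just like in the single-object call.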

Semantic Search (GraphQL)

{
  Get {
    Article(
      nearText: { concepts: ["machine learning tutorials"] }
      limit: 5
    ) {
      title
      content
      category
      _additional { certainty distance }
    }
  }
}
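
GraphQL queries are sent as JSON to the /v1/graphql endpoint. The following sketch renders the nearText query above for arbitrary concepts and posts it with the standard library. The function names are hypothetical, and the base URL assumes the local Docker setup.

```python
import json
import urllib.request


def build_near_text_query(concepts: list[str], limit: int = 5) -> str:
    """Render the nearText query for the Article class as a GraphQL string."""
    # json.dumps yields a double-quoted, escaped list, which is also valid GraphQL.
    return (
        "{ Get { Article("
        f"nearText: {{ concepts: {json.dumps(concepts)} }} limit: {int(limit)}"
        ") { title content category _additional { certainty distance } } } }"
    )


def run_graphql(query: str, base_url: str = "http://localhost:8080") -> dict:
    """POST a GraphQL query to /v1/graphql and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

With a running instance, run_graphql(build_near_text_query(["machine learning tutorials"])) should return the matching articles under data.Get.Article.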

Hybrid Search

{
  Get {
    Article(
      hybrid: {
        query: "RAG applications"
        alpha: 0.75
      }
      limit: 5
    ) {
      title
      content
    }
  }
}

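The alpha parameter sets the balance between the two rankings: alpha: 0 is pure keyword (BM25) search, alpha: 1 is pure vector search, and the 0.75 used here weights vector results more heavily.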
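
To build intuition for alpha, here is a simplified illustration of blending normalized keyword and vector scores. It is loosely modeled on score-based fusion and is not Weaviate's actual implementation, whose fusion algorithms differ in detail. All names and sample scores are hypothetical.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores into [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def hybrid_scores(
    keyword: dict[str, float], vector: dict[str, float], alpha: float
) -> dict[str, float]:
    """Blend the two score sets: alpha weights vector, (1 - alpha) weights keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    kw, vec = normalize(keyword), normalize(vector)
    # A document missing from one result set contributes 0 for that side.
    return {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }


# Made-up scores: "a" wins on keywords, "b" wins on vector similarity.
keyword_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
vector_scores = {"a": 0.2, "b": 0.9, "c": 0.6}
blended = hybrid_scores(keyword_scores, vector_scores, alpha=0.75)
best = max(blended, key=blended.get)
```

With these sample scores, alpha=0.75 ranks "b" first because the vector side dominates, while alpha=0 would rank "a" first.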

Generative Search (RAG)

{
  Get {
    Article(
      nearText: { concepts: ["building APIs"] }
      limit: 3
    ) {
      title
      content
      _additional {
        generate(
          groupedResult: {
            task: "Summarize these articles into a beginner guide"
          }
        ) { groupedResult }
      }
    }
  }
}

Python Client

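import weaviate

# Uses the v4 Python client (pip install -U weaviate-client)
client = weaviate.connect_to_local()

try:
    articles = client.collections.get('Article')

    # Semantic search
    results = articles.query.near_text(query='how to build APIs', limit=5)
    for obj in results.objects:
        print(obj.properties['title'])

    # Hybrid search
    results = articles.query.hybrid(query='REST API tutorial', alpha=0.7, limit=5)
    for obj in results.objects:
        print(obj.properties['title'])
finally:
    # Always release the connection, even if a query fails
    client.close()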

Real-World Use Case

A legal tech company indexed 100,000 contracts in Weaviate. Lawyers search by concept ('non-compete clause with 2-year restriction') instead of exact keywords. Search relevance improved 5x compared to Elasticsearch, and the generative module auto-summarizes matching clauses.

Need to automate data collection? Check out my Apify actors for ready-made scrapers, or email spinov001@gmail.com for custom solutions.
