DEV Community: Sohan

Build a Multi-Tenant RAG with Fine-Grain Authorization using Motia and SpiceDB

Sohan — Fri, 21 Nov 2025 11:51:34 +0000

This post was inspired by Stardew Valley 😎

If I were hard-pressed to pick my favourite computer game of all time, I'd go with Stardew Valley (sorry, Dangerous Dave). The stats from my Nintendo Profile are all the proof you need:

Stardew Valley sits atop with 430 hours played and in second place is Mario Kart (not pictured) with ~45 hours played. That's a significant difference, and should indicate how much I adore this game.

I've been talking about the importance of Fine-Grained Authorization and RAG recently, so when I sat down to build a sample use case for a production-grade RAG with Fine-Grained Permissions, my immediate thought went to Stardew Valley.

For those not familiar, Stardew Valley is a farm life simulation game where players manage a farm by clearing land, growing seasonal crops, and raising animals. So I thought I could build a logbook for a large farm that one could query using natural language. This use case is ideal for a RAG pipeline (RAG is a technique that uses external data to improve the accuracy, relevancy, and usefulness of an LLM’s output).

I focused on building something as close to production-grade as possible (and perhaps strayed from the original intent of a single farm), where an organization (not Joja Corporation though!) can own farms and the data from those farms. The farms contain harvest data, and users can log and query data for the farms they're part of. This creates a sticky situation for the authorization model: how does an LLM know who has access to what data?

Here's where SpiceDB and ReBAC (relationship-based access control) were vital. By using metadata to indicate where the relevant embeddings came from, the RAG system returned harvest data to the user based only on what data they had access to. In fact, OpenAI uses SpiceDB for their fine-grained authorization in ChatGPT Connectors.

While I know my way around SpiceDB and authorization, I needed help to build out the other components for a production-grade harvest logbook. So I reached out to my friend Rohit Ghumare from Motia for his expertise. Motia.dev is a backend framework that unifies APIs, background jobs, workflows, and AI Agents into a single core primitive with built-in observability and state management.

Here's a photo of Rohit and me at KubeCon Europe in 2025.

What follows below is a tutorial-style post on building a Retrieval Augmented Generation system with fine-grained authorization using the Motia framework and SpiceDB. We'll use Pinecone as our vector database, and OpenAI as our LLM.

What You'll Build

In this tutorial, you'll create a complete RAG system with authorization that:

Stores harvest data and automatically generates embeddings for semantic search
Splits text into optimized chunks with overlap for better retrieval accuracy
Implements fine-grained authorization using SpiceDB's relationship-based access control
Queries harvest history using natural language with AI-powered responses
Returns contextually relevant answers with source citations from vector search
Supports multi-tenant access where users only see data they have permission to access
Logs all queries and responses for audit trails in CSV or Google Sheets
Runs as an event-driven workflow orchestrated through Motia's framework

By the end of the tutorial, you'll have a complete system that combines semantic search with multi-tenant authorization.

Prerequisites

Before starting the tutorial, ensure you have:

OpenAI API key for embeddings and chat
Pinecone account with an index created (1536 dimensions, cosine metric)
Docker installed for running SpiceDB locally

Getting Started

1. Create Your Motia Project

Create a new Motia project using the CLI:

npx motia@latest create

The installer will prompt you:

Template: Select Base (TypeScript)
Project name: Enter harvest-logbook-rag
Proceed? Type Yes

Navigate into your project:

cd harvest-logbook-rag

Your initial project structure:

harvest-logbook-rag/
├── src/
│   └── services/
│       └── pet-store/
├── steps/
│   └── petstore/
├── .env
└── package.json

The default template includes a pet store example. We'll replace this with our harvest logbook system. For more on Motia basics, see the Quick Start guide.

2. Install Dependencies

Install the SpiceDB client for authorization:

npm install @authzed/authzed-node

This is the only additional package needed.

3. Setup Pinecone

Pinecone will store the vector embeddings for semantic search.

Create a Pinecone Account

Go to app.pinecone.io and sign up
Create a new project

Create an Index

Click Create Index
Configure:

Name: harvest-logbook (or your preference)
Dimensions: 1536 (for OpenAI embeddings)
Metric: cosine
Click Create Index

Get Your Credentials

Go to API Keys in the sidebar
Copy your API Key
Go back to your index
Click the Connect tab
Copy the Host (looks like: your-index-abc123.svc.us-east-1.pinecone.io)

Save these for the next step.

4. Setup SpiceDB

SpiceDB handles authorization and access control for the system.

Start SpiceDB with Docker

Run this command to start SpiceDB locally:

docker run -d \
  --name spicedb \
  -p 50051:50051 \
  authzed/spicedb serve \
  --grpc-preshared-key "sometoken"

Verify SpiceDB is Running

Check that the container is running:

docker ps | grep spicedb

You should see output similar to:

6316f6cb50b4   authzed/spicedb   "spicedb serve --grp…"   31 seconds ago   Up 31 seconds   0.0.0.0:50051->50051/tcp   spicedb

SpiceDB is now running on localhost:50051 and ready to handle authorization checks.

5. Configure Environment Variables

Create a .env file in the project root:

# OpenAI (Required for embeddings and chat)
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx


# Pinecone (Required for vector storage)
PINECONE_API_KEY=pcsk_xxxxxxxxxxxxx
PINECONE_INDEX_HOST=your-index-abc123.svc.us-east-1.pinecone.io


# SpiceDB (Required for authorization)
SPICEDB_ENDPOINT=localhost:50051
SPICEDB_TOKEN=sometoken


# LLM Configuration (OpenAI is default)
USE_OPENAI_CHAT=true


# Logging Configuration (CSV is default)
USE_CSV_LOGGER=true

Replace the placeholder values with your actual credentials from the previous steps.
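A missing key tends to show up later as a confusing runtime error, so it can help to validate these variables once at startup. Here's a minimal sketch of that idea. `loadConfig` and the `AppConfig` shape are hypothetical names for illustration, not part of Motia or the example repository, and treating unset flags as "use the default" is an assumption.

```typescript
// Hypothetical startup check for the variables listed in the .env file above.
interface AppConfig {
  openaiApiKey: string;
  pineconeApiKey: string;
  pineconeIndexHost: string;
  spicedbEndpoint: string;
  spicedbToken: string;
  useOpenAIChat: boolean;
  useCsvLogger: boolean;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  const required = [
    'OPENAI_API_KEY',
    'PINECONE_API_KEY',
    'PINECONE_INDEX_HOST',
    'SPICEDB_ENDPOINT',
    'SPICEDB_TOKEN',
  ];
  // Fail fast with one message that names every missing variable.
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    openaiApiKey: env['OPENAI_API_KEY'] as string,
    pineconeApiKey: env['PINECONE_API_KEY'] as string,
    pineconeIndexHost: env['PINECONE_INDEX_HOST'] as string,
    spicedbEndpoint: env['SPICEDB_ENDPOINT'] as string,
    spicedbToken: env['SPICEDB_TOKEN'] as string,
    // Assumption: an unset flag falls back to the default (OpenAI chat, CSV logging).
    useOpenAIChat: env['USE_OPENAI_CHAT'] !== 'false',
    useCsvLogger: env['USE_CSV_LOGGER'] !== 'false',
  };
}
```

In a real project you would call `loadConfig(process.env)` once when the app boots and pass the result to your services.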

6. Initialize SpiceDB Schema

SpiceDB needs a schema that defines the authorization model for organizations, farms, and users.

Create the Schema File

Create src/services/harvest-logbook/spicedb.schema with the authorization model. A SpiceDB schema defines the types of objects found in your application, how those objects can relate to one another, and the permissions that can be computed from those relations.

Here's a snippet of the schema that defines user, organization and farm and the relations and permissions between them.

definition user {}


definition organization {
    relation admin: user
    relation member: user

    permission view = admin + member
    permission edit = admin + member
    permission query = admin + member
    permission manage = admin
}


definition farm {
    relation organization: organization
    relation owner: user
    relation editor: user
    relation viewer: user

    permission view = viewer + editor + owner + organization->view
    permission edit = editor + owner + organization->edit
    permission query = viewer + editor + owner + organization->query
    permission manage = owner + organization->admin
}

View the complete schema on GitHub

The schema establishes:

Organizations with admins and members
Farms with owners, editors, and viewers
Harvest entries linked to farms
Permission inheritance (org members can access farms in their org)
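
To make the inheritance concrete, here's a small in-memory model of how the view permission on a farm resolves under the schema snippet above. It only illustrates the schema's semantics and is not how SpiceDB evaluates checks internally. The `Relationship` type, the `canViewFarm` helper, and the names `org_1`, `user_bob`, and `user_carol` are all made up for this sketch.

```typescript
// A relationship tuple, read as resource#relation@subject.
interface Relationship {
  resource: string; // e.g. "farm:farm_1"
  relation: string; // e.g. "owner"
  subject: string;  // e.g. "user:user_alice" or "organization:org_1"
}

function hasRelation(
  rels: Relationship[],
  resource: string,
  relations: string[],
  subject: string,
): boolean {
  return rels.some(
    (r) => r.resource === resource && relations.includes(r.relation) && r.subject === subject,
  );
}

// Mirrors: permission view = viewer + editor + owner + organization->view
function canViewFarm(rels: Relationship[], farm: string, user: string): boolean {
  // Direct roles on the farm itself.
  if (hasRelation(rels, farm, ['viewer', 'editor', 'owner'], user)) return true;
  // organization->view: follow the farm's organization relation, then apply
  // the organization's own rule (view = admin + member).
  return rels
    .filter((r) => r.resource === farm && r.relation === 'organization')
    .some((r) => hasRelation(rels, r.subject, ['admin', 'member'], user));
}

const relationships: Relationship[] = [
  { resource: 'farm:farm_1', relation: 'owner', subject: 'user:user_alice' },
  { resource: 'farm:farm_1', relation: 'organization', subject: 'organization:org_1' },
  { resource: 'organization:org_1', relation: 'member', subject: 'user:user_bob' },
];

const aliceCanView = canViewFarm(relationships, 'farm:farm_1', 'user:user_alice'); // direct owner
const bobCanView = canViewFarm(relationships, 'farm:farm_1', 'user:user_bob');     // via org membership
const carolCanView = canViewFarm(relationships, 'farm:farm_1', 'user:user_carol'); // no path to the farm
```

Alice gets access through her direct owner relation, Bob through his organization, and Carol has no relationship that leads to the farm, so she gets nothing.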

Create Setup Scripts

Create a scripts/ folder and add three files:

scripts/setup-spicedb-schema.ts - Reads the schema file and writes it to SpiceDB

View on GitHub

scripts/verify-spicedb-schema.ts - Verifies the schema was written correctly

View on GitHub

scripts/create-sample-permissions.ts - Creates sample users and permissions for testing

View on GitHub

Install Script Runner

npm install -D tsx

Add Scripts to package.json

"scripts": {
  "spicedb:setup": "tsx scripts/setup-spicedb-schema.ts",
  "spicedb:verify": "tsx scripts/verify-spicedb-schema.ts",
  "spicedb:sample": "tsx scripts/create-sample-permissions.ts"
}

Run the Setup

# Write schema to SpiceDB
npm run spicedb:setup

You should see output confirming the schema was written successfully.

Verify it was written correctly:

npm run spicedb:verify

This displays the complete authorization schema showing all definitions and permissions.

The output shows:

farm definition with owner/editor/viewer roles
harvest_entry definition linked to farms
organization definition with admin/member roles
query_session definition for RAG queries
Permission rules for each resource type

Create a sample user:

npm run spicedb:sample

This creates user_alice as owner of farm_1, ready for testing.

Your authorization system is now ready.

7. Start Development Server

Start the Motia development server:

npm run dev

The server starts at http://localhost:3000. Open this URL in your browser to see the Motia Workbench.

You'll see the default pet store example. We'll replace this with our harvest logbook system in the next sections.

Your development environment is now ready. All services are connected:

Motia running on localhost:3000
Pinecone index created and connected
SpiceDB running with schema loaded
Sample permissions created (user_alice owns farm_1)

Exploring the Project

Before we start building, let's understand the architecture we're creating.

System Architecture

┌─────────────────────────────────────────────────────────────┐
│  POST /harvest_logbook                                      │
│  (Store harvest data + optional query)                      │
└─────────┬───────────────────────────────────────────────────┘
          │
          ├─→ Authorization Middleware (SpiceDB)
          │   - Check user has 'edit' permission on farm
          │
          ├─→ ReceiveHarvestData Step (API)
          │   - Validate input
          │   - Emit events
          │
          ├─→ ProcessEmbeddings Step (Event)
          │   - Split text into chunks (400 chars, 40 overlap)
          │   - Generate embeddings (OpenAI)
          │   - Store vectors (Pinecone)
          │
          └─→ QueryAgent Step (Event) [if query provided]
              - Retrieve similar content (Pinecone)
              - Generate response (OpenAI/HuggingFace)
              - Emit logging event
              │
              └─→ LogToSheets Step (Event)
                  - Log query & response (CSV/Sheets)

The RAG Pipeline

Our system processes harvest data through these stages:

API Entry - Receive harvest data via REST endpoint
Text Chunking - Split content into overlapping chunks (400 chars, 40 overlap)
Embedding Generation - Convert chunks to vectors using OpenAI
Vector Storage - Store embeddings in Pinecone for semantic search
Query Processing - Search vectors and generate AI responses
Audit Logging - Log all queries and responses

Event-Driven Architecture

The system uses Motia's event-driven model:

API Steps handle HTTP requests
Event Steps process background tasks
Steps communicate by emitting and subscribing to events
Each step is independent and can be tested separately
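
If the emit/subscribe idea is new to you, a tiny in-memory event bus shows the shape of it. This is a toy model, not Motia's implementation (Motia wires steps together from their emits and subscribes config and runs handlers asynchronously). `EventBus` is a name invented for this sketch.

```typescript
type Handler = (data: unknown) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();

  // Register a handler for a topic.
  subscribe(topic: string, handler: Handler): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  // Deliver data to every handler subscribed to the topic.
  emit(topic: string, data: unknown): void {
    for (const handler of this.handlers.get(topic) ?? []) {
      handler(data);
    }
  }
}

const bus = new EventBus();
const trace: string[] = [];

// A QueryAgent-like step: handles 'query-agent', then emits 'log-to-sheets'.
bus.subscribe('query-agent', (data) => {
  trace.push('QueryAgent');
  bus.emit('log-to-sheets', data);
});

// A LogToSheets-like step: the end of the chain.
bus.subscribe('log-to-sheets', () => {
  trace.push('LogToSheets');
});

// An API step would emit this after validating the request.
bus.emit('query-agent', { entryId: 'query-1', query: 'What crops did we harvest?' });
```

After the last line runs, `trace` records that the QueryAgent-like handler ran first and the LogToSheets-like handler ran second, without either one calling the other directly.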

Authorization Layer

Every API request passes through SpiceDB authorization:

Users have relationships with resources (owner, editor, viewer)
Permissions are checked before processing requests
Multi-tenant by design (users only access their farms)

What We'll Build

We'll create five main steps:

ReceiveHarvestData - API endpoint to store harvest entries
ProcessEmbeddings - Event handler for generating and storing embeddings
QueryAgent - Event handler for AI-powered queries
QueryOnly - Separate API endpoint for querying without storing data
LogToSheets - Event handler for audit logging

Each component is a single file in the steps/ directory. Motia automatically discovers and connects them based on the events they emit and subscribe to.

Step 1: Create the Harvest Entry API

What We're Building

In this step, we'll create an API endpoint that receives harvest log data and triggers the processing pipeline. This is the entry point that starts the entire RAG workflow.

Why This Step Matters

Every workflow needs an entry point. In Motia, API steps serve as the gateway between external requests and your event-driven system. By using Motia's api step type, you get automatic HTTP routing, request validation, and event emission, all without writing boilerplate server code. When a farmer calls this endpoint with their harvest data, it validates the input, checks authorization, stores the entry, and emits events that trigger the embedding generation and optional query processing.

Create the Step File

Create a new file at steps/harvest-logbook/receive-harvest-data.step.ts.

The complete source code for all steps is available on GitHub. You can reference the working implementation at any time.

View the complete Step 1 code on GitHub →

Now let's understand the key parts you'll be implementing:

Input Validation

const bodySchema = z.object({
  content: z.string().min(1, 'Content cannot be empty'),
  farmId: z.string().min(1, 'Farm ID is required for authorization'),
  metadata: z.record(z.any()).optional(),
  query: z.string().optional()
});

Zod validates that requests include the harvest content and farm ID. The query field is optional - if provided, the system will also answer a natural language question about the data after storing it.

Step Configuration

export const config: ApiRouteConfig = {
  type: 'api',
  name: 'ReceiveHarvestData',
  path: '/harvest_logbook',
  method: 'POST',
  middleware: [errorHandlerMiddleware, harvestEntryEditMiddleware],
  emits: ['process-embeddings', 'query-agent'],
  bodySchema
};

type: 'api' makes this an HTTP endpoint
middleware runs authorization checks before the handler
emits declares this step triggers embedding processing and optional query events
Motia handles all the routing automatically

Authorization Check

middleware: [errorHandlerMiddleware, harvestEntryEditMiddleware]

The harvestEntryEditMiddleware checks SpiceDB to ensure the user has edit permission on the specified farm. If authorization fails, the request is rejected before reaching the handler. Authorization info is added to the request for use in the handler.

View authorization middleware →
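
For a feel of what such a middleware does, here's a simplified sketch. The request, response, and `next` shapes are local stand-ins (Motia's real middleware types differ), `requireFarmPermission` is a hypothetical name, and the permission check is injected as a function so the example stays self-contained. In the real project that function would wrap a SpiceDB permission check. Returning 401 and 400 for missing inputs is my assumption; the 403 matches the behaviour shown later in this post.

```typescript
// Local stand-ins for the request/response shapes used in this sketch.
interface Req {
  headers: Record<string, string | undefined>;
  body: { farmId?: string };
}
interface Res {
  status: number;
  body: unknown;
}
type Next = () => Promise<Res>;

// Injected check, e.g. a wrapper around a SpiceDB permission check.
type CheckFn = (userId: string, permission: string, farmId: string) => Promise<boolean>;

function requireFarmPermission(permission: string, check: CheckFn) {
  return async (req: Req, next: Next): Promise<Res> => {
    const userId = req.headers['x-user-id'];
    const farmId = req.body.farmId;
    if (!userId) return { status: 401, body: { error: 'Missing x-user-id header' } };
    if (!farmId) return { status: 400, body: { error: 'Missing farmId' } };

    // Reject before the handler runs if the user lacks the permission.
    const allowed = await check(userId, permission, farmId);
    if (!allowed) return { status: 403, body: { error: 'Forbidden' } };

    return next();
  };
}
```

With this shape, an edit middleware and a query middleware differ only in the permission name they pass in.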

Handler Logic

export const handler: Handlers['ReceiveHarvestData'] = async (req, { emit, logger, state }) => {
  const { content, farmId, metadata, query } = bodySchema.parse(req.body);
  const entryId = `harvest-${Date.now()}`;

  // Store entry data in state
  await state.set('harvest-entries', entryId, {
    content, farmId, metadata, timestamp: new Date().toISOString()
  });

  // Emit event to process embeddings (abbreviated here; the Event Emission
  // section shows the full payload and the optional query-agent event)
  await emit({
    topic: 'process-embeddings',
    data: { entryId, content, metadata }
  });

  // Return right away; processing continues in the background
  return {
    status: 200,
    body: { success: true, entryId }
  };
};

The handler generates a unique entry ID, stores the data in Motia's state management, and emits an event to trigger embedding processing. If a query was provided, it also emits a query-agent event.

Event Emission

// userId comes from the authorization info the middleware added to the request
await emit({
  topic: 'process-embeddings',
  data: { entryId, content, metadata: { ...metadata, farmId, userId } }
});


if (query) {
  await emit({
    topic: 'query-agent',
    data: { entryId, query }
  });
}

Events are how Motia steps communicate. The process-embeddings event triggers the next step to chunk the text and generate embeddings. If a query was provided, the query-agent event runs in parallel to answer the question using RAG.

This keeps the API response fast as it returns immediately while processing happens in the background.

Test the Step

Open the Motia Workbench and test this endpoint:

Click on the harvest-logbook flow
Find POST /harvest_logbook in the sidebar
Click on it to open the request panel
Switch to the Headers tab and add:

   {
     "x-user-id": "user_alice"
   }

Switch to the Body tab and add:

   {
     "content": "Harvested 500kg of tomatoes from field A. Weather was sunny.",
     "farmId": "farm_1",
     "metadata": {
       "field": "A",
       "crop": "tomatoes"
     }
   }

Click the Send button.

You should see a success response with the entry ID. The Workbench will show the workflow executing in real-time, with events flowing to the next steps.

Step 2: Process Embeddings

What We're Building

This event handler takes the harvest data from Step 1, splits it into chunks, generates vector embeddings, and stores them in Pinecone for semantic search.

Why This Step Matters

RAG systems need to break down large text into smaller chunks for better retrieval accuracy. By chunking text with overlap and generating embeddings for each piece, we enable semantic search that finds relevant context even when queries don't match exact keywords.

This step runs in the background after the API returns, so the slower work of embedding generation and vector storage never delays the response.

Create the Step File

Create a new file at steps/harvest-logbook/process-embeddings.step.ts.

View the complete Step 2 code on GitHub →

Now let's understand the key parts you'll be implementing:

Input Schema

const inputSchema = z.object({
  entryId: z.string(),
  content: z.string(),
  metadata: z.record(z.any()).optional()
});

This step receives the entry ID, content, and metadata from the previous step's event emission.

Step Configuration

export const config: EventConfig = {
  type: 'event',
  name: 'ProcessEmbeddings',
  subscribes: ['process-embeddings'],
  emits: [],
  input: inputSchema
};

type: 'event' makes this a background event handler
subscribes: ['process-embeddings'] listens for events from Step 1
No emits - this is the end of the embedding pipeline

Text Chunking

const vectorIds = await HarvestLogbookService.storeEntry({
  id: entryId,
  content,
  metadata,
  timestamp: new Date().toISOString()
});

The service handles text splitting (400 character chunks with 40 character overlap), embedding generation via OpenAI, and storage in Pinecone. This chunking strategy ensures semantic continuity across chunks.

View text splitter service →
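
The chunking itself is simple enough to sketch in a few lines. This is a naive character-based splitter using the 400/40 numbers from above. It's my own illustration, so the splitter in the repository may well behave differently (for example, by preferring to break on separators). `splitText` is a hypothetical name.

```typescript
// Split text into fixed-size chunks where each chunk starts with the last
// `overlap` characters of the previous one.
function splitText(text: string, chunkSize = 400, overlap = 40): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('chunkSize must be positive and overlap must be smaller than chunkSize');
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Small numbers make the overlap easy to see:
const demoChunks = splitText('abcdefghij', 4, 1);
// demoChunks is ['abcd', 'defg', 'ghij']
```

Each chunk begins with the tail of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.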

Embedding Generation

The OpenAI service generates 1536-dimension embeddings for each text chunk using the text-embedding-ada-002 model.

View OpenAI service →

Vector Storage

await state.set('harvest-vectors', entryId, {
  vectorIds,
  processedAt: new Date().toISOString(),
  chunkCount: vectorIds.length
});

After storing vectors in Pinecone, the step updates Motia's state with the vector IDs for tracking. Each chunk gets a unique ID like harvest-123-chunk-0, harvest-123-chunk-1, etc.

View Pinecone service →

The embeddings are now stored and ready for semantic search when users query the system.
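
Here's roughly how the chunk IDs and metadata might be assembled before the upsert. The `id` / `values` / `metadata` record shape follows Pinecone's vector record format, but `buildVectorRecords` and the exact metadata fields (`entryId`, `chunkIndex`, `text`) are assumptions for illustration. Check the Pinecone service in the repository for what is actually stored.

```typescript
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string | number>;
}

function buildVectorRecords(
  entryId: string,
  chunks: string[],
  embeddings: number[][],
  metadata: Record<string, string | number>,
): VectorRecord[] {
  if (chunks.length !== embeddings.length) {
    throw new Error('Expected exactly one embedding per chunk');
  }
  return chunks.map((text, i) => ({
    // IDs follow the pattern harvest-123-chunk-0, harvest-123-chunk-1, ...
    id: `${entryId}-chunk-${i}`,
    values: embeddings[i],
    // Carrying farmId on every chunk is what lets results be scoped per farm later.
    metadata: { ...metadata, entryId, chunkIndex: i, text },
  }));
}

const records = buildVectorRecords(
  'harvest-123',
  ['Harvested 500kg of tomatoes', 'Weather was sunny'],
  [[0.1, 0.2], [0.3, 0.4]], // toy 2-dimensional vectors; real ones have 1536 dimensions
  { farmId: 'farm_1', crop: 'tomatoes' },
);
```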

Test the Step

Step 2 runs automatically when Step 1 emits the process-embeddings event. To test it:

Send a request to the POST /harvest_logbook endpoint (from Step 1)
In the Workbench, watch the workflow visualization
You'll see the ProcessEmbeddings step activate automatically
Check the Logs tab at the bottom to see:

Text chunking progress
Embedding generation
Vector storage confirmation

The step completes when you see "Successfully stored embeddings" in the logs. The vectors are now in Pinecone and ready for semantic search.

Step 3: Query Agent

What We're Building

This event handler performs the RAG query: it searches Pinecone for relevant content, retrieves matching chunks, and uses an LLM to generate natural language responses based on the retrieved context.

Why This Step Matters

This is where retrieval-augmented generation happens. Instead of the LLM generating responses from its training data alone, it uses actual harvest data from Pinecone as context. This ensures accurate, source-backed answers specific to the user's farm data.

The step supports both OpenAI and HuggingFace LLMs, giving you flexibility in choosing your AI provider based on cost and performance needs.

Create the Step File

Create a new file at steps/harvest-logbook/query-agent.step.ts.

View the complete Step 3 code on GitHub →

Now let's understand the key parts you'll be implementing:

Input Schema

const inputSchema = z.object({
  entryId: z.string(),
  query: z.string(),
  conversationHistory: z.array(z.object({
    role: z.enum(['user', 'assistant', 'system']),
    content: z.string()
  })).optional()
});

The step receives the query text and optional conversation history for multi-turn conversations.

Step Configuration

export const config: EventConfig = {
  type: 'event',
  name: 'QueryAgent',
  subscribes: ['query-agent'],
  emits: ['log-to-sheets'],
  input: inputSchema
};

subscribes: ['query-agent'] listens for query events from Step 1
emits: ['log-to-sheets'] triggers logging after generating response

RAG Query Process

const agentResponse = await HarvestLogbookService.queryWithAgent({
  query,
  conversationHistory
});

The service orchestrates the RAG pipeline: embedding the query, searching Pinecone for similar vectors, extracting context from top matches, and generating a response using the LLM.

View RAG orchestration service →

Vector Search

The query is embedded using OpenAI and searched against Pinecone to find the top 5 most similar chunks. Each result includes a similarity score and the original text.

View Pinecone query implementation →
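
Pinecone does the similarity search server-side, but the idea is easy to show in memory. The sketch below ranks stored chunks by cosine similarity and keeps the top 5, after first dropping chunks from farms the user isn't allowed to see. That `allowedFarmIds` filter is my assumption about how farm scoping could be applied, and `topK`, `StoredChunk`, and `Match` are hypothetical names.

```typescript
interface StoredChunk {
  id: string;
  farmId: string;
  text: string;
  values: number[];
}
interface Match {
  id: string;
  text: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(
  queryVector: number[],
  chunkStore: StoredChunk[],
  allowedFarmIds: string[],
  k = 5,
): Match[] {
  return chunkStore
    .filter((c) => allowedFarmIds.includes(c.farmId)) // authorization scoping
    .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(queryVector, c.values) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

const chunkStore: StoredChunk[] = [
  { id: 'a', farmId: 'farm_1', text: 'Harvested tomatoes', values: [1, 0] },
  { id: 'b', farmId: 'farm_1', text: 'Repaired the fence', values: [0, 1] },
  { id: 'c', farmId: 'farm_2', text: 'Harvested tomatoes', values: [1, 0] },
];

// A user who can only see farm_1 never gets chunk 'c', however similar it is.
const matches = topK([1, 0], chunkStore, ['farm_1']);
```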

LLM Response Generation

await state.set('agent-responses', entryId, {
  query,
  response: agentResponse.response,
  sources: agentResponse.sources,
  timestamp: agentResponse.timestamp
});

The LLM generates a response using the retrieved context. The system supports both OpenAI (default) and HuggingFace, controlled by the USE_OPENAI_CHAT environment variable. The response includes source citations showing which harvest entries informed the answer.

View OpenAI chat service →
View HuggingFace service →
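
The "uses the retrieved context" part usually comes down to building the right list of chat messages. Here's one way that could look. The prompt wording, the numbered citation format, and the `buildMessages` helper are all assumptions for illustration rather than what the repository's services do.

```typescript
interface RetrievedChunk {
  id: string;
  text: string;
  score: number;
}
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildMessages(
  query: string,
  chunks: RetrievedChunk[],
  conversationHistory: ChatMessage[] = [],
): ChatMessage[] {
  // Number each excerpt so the model can cite its sources.
  const context = chunks.map((c, i) => `[${i + 1}] (${c.id}) ${c.text}`).join('\n');
  const system: ChatMessage = {
    role: 'system',
    content:
      'Answer using only the harvest log excerpts below and cite them by number. ' +
      'If the excerpts do not contain the answer, say so.\n\n' +
      context,
  };
  // Order: instructions and context, then earlier turns, then the new question.
  return [system, ...conversationHistory, { role: 'user', content: query }];
}

const messages = buildMessages('What crops did we harvest?', [
  { id: 'harvest-123-chunk-0', text: 'Harvested 500kg of tomatoes from field A.', score: 0.92 },
]);
```

The resulting array is what you would hand to a chat completion call, and the chunk IDs double as the source citations.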

Event Emission

await emit({
  topic: 'log-to-sheets',
  data: {
    entryId,
    query,
    response: agentResponse.response,
    sources: agentResponse.sources
  }
});

After generating the response, the step emits a logging event to create an audit trail of all queries and responses.

Test the Step

Step 3 runs automatically when you include a query field in the Step 1 request. To test it:

Send a request to POST /harvest_logbook with a query:

   {
     "content": "Harvested 500kg of tomatoes from field A. Weather was sunny.",
     "farmId": "farm_1",
     "query": "What crops did we harvest?"
   }

In the Workbench, watch the QueryAgent step activate
Check the Logs tab to see:

Query embedding generation
Vector search in Pinecone
LLM response generation
Source citations

The step completes when you see the AI-generated response in the logs. The query and response are automatically logged by Step 5.

Step 4: Query-Only Endpoint

What We're Building

This API endpoint allows users to query their existing harvest data without storing new entries. It's a separate endpoint dedicated purely to RAG queries.

Why This Step Matters

While Step 1 handles both storing and optionally querying data, users often need to just ask questions about their existing harvest logs. This dedicated endpoint keeps the API clean and focused - one endpoint for data entry, another for pure queries.

This separation also makes it easier to apply different rate limits or permissions between data modification and read-only operations.

Create the Step File

Create a new file at steps/harvest-logbook/query-only.step.ts.

View the complete Step 4 code on GitHub →

Now let's understand the key parts you'll be implementing:

Input Validation

const bodySchema = z.object({
  query: z.string().min(1, 'Query cannot be empty'),
  farmId: z.string().min(1, 'Farm ID is required for authorization'),
  conversationHistory: z.array(z.object({
    role: z.enum(['user', 'assistant', 'system']),
    content: z.string()
  })).optional()
});

The request requires a query and farm ID. Conversation history is optional for multi-turn conversations.

Step Configuration

export const config: ApiRouteConfig = {
  type: 'api',
  name: 'QueryHarvestLogbook',
  path: '/harvest_logbook/query',
  method: 'POST',
  middleware: [errorHandlerMiddleware, harvestQueryMiddleware],
  emits: ['query-agent']
};

path: '/harvest_logbook/query' creates a dedicated query endpoint
harvestQueryMiddleware checks for query permission (not edit)
emits: ['query-agent'] triggers the same RAG query handler as Step 3

Authorization Middleware

middleware: [errorHandlerMiddleware, harvestQueryMiddleware]

The harvestQueryMiddleware checks SpiceDB for query permission. This is less restrictive than edit - viewers can query but cannot modify data.

View authorization middleware →

Handler Logic

export const handler: Handlers['QueryHarvestLogbook'] = async (req, { emit, logger }) => {
  const { query, farmId } = bodySchema.parse(req.body);
  const queryId = `query-${Date.now()}`;

  await emit({
    topic: 'query-agent',
    data: { entryId: queryId, query }
  });

  return {
    status: 200,
    body: { success: true, queryId }
  };
};

The handler generates a unique query ID and emits the same query-agent event used in Step 1. This reuses the RAG pipeline from Step 3 without duplicating code.

The API returns immediately with the query ID. The actual processing happens in the background, and results are logged by Step 5.

Test the Step

This is the dedicated query endpoint. Test it directly:

Click on POST /harvest_logbook/query in the Workbench
Add the header:

   {
     "x-user-id": "user_alice"
   }

Add the body:

   {
     "query": "What crops did we harvest?",
     "farmId": "farm_1"
   }

Click Send

You'll see a 200 OK response with the query ID. In the Logs tab, watch for:

QueryHarvestLogbook - Authorization and query received
QueryAgent - Querying AI agent
QueryAgent - Agent query completed

The query runs in the background and results are logged by Step 5. This endpoint is perfect for read-only query operations without storing new data.

Step 5: Log to Sheets

What We're Building

This event handler creates an audit trail by logging every query and its AI-generated response. It supports both local CSV files (for development) and Google Sheets (for production).

Why This Step Matters

Audit logs are essential for understanding how users interact with your system. They help with debugging, monitoring usage patterns, and maintaining compliance. By logging queries and responses, you can track what questions users ask, identify common patterns, and improve the system over time.

The dual logging strategy (CSV/Google Sheets) gives you flexibility: use CSV locally for quick testing, then switch to Google Sheets for production without changing code.

Create the Step File

Create a new file at steps/harvest-logbook/log-to-sheets.step.ts.

View the complete Step 5 code on GitHub →

Now let's understand the key parts you'll be implementing:

Input Schema

const inputSchema = z.object({
  entryId: z.string(),
  query: z.string(),
  response: z.string(),
  sources: z.array(z.string()).optional()
});

The step receives the query, AI response, and optional source citations from Step 3.

Step Configuration

export const config: EventConfig = {
  type: 'event',
  name: 'LogToSheets',
  subscribes: ['log-to-sheets'],
  emits: [],
  input: inputSchema
};

subscribes: ['log-to-sheets'] listens for logging events from Step 3
No emits - this is the end of the workflow

Logging Service Selection

const useCSV = process.env.USE_CSV_LOGGER === 'true' || !process.env.GOOGLE_SHEETS_ID;


await HarvestLogbookService.logToSheets(query, response, sources);

The service automatically chooses between CSV and Google Sheets based on environment variables. This keeps the step code simple while supporting different deployment scenarios.

View CSV logger →
View Google Sheets service →

Error Handling

try {
  await HarvestLogbookService.logToSheets(query, response, sources);
  logger.info(`Successfully logged to ${destination}`);
} catch (error) {
  // Record why logging failed, but don't throw:
  // logging failures shouldn't break the main flow
  logger.error('Failed to log query response', {
    error: error instanceof Error ? error.message : String(error)
  });
}

The step catches logging errors without throwing. This ensures that even if logging fails, the main workflow completes successfully. Users get their query results even if the audit log has issues.

CSV Output Format

The CSV logger saves entries to logs/harvest_logbook.csv with these columns:

Timestamp
Query
Response
Sources (comma-separated)

Each entry is automatically escaped to handle quotes and commas in the content.
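
If you're curious what that escaping involves, here's a minimal sketch in the RFC 4180 style: a field is wrapped in double quotes when it contains a comma, a quote, or a line break, and embedded quotes are doubled. `escapeCsvField` and `toCsvRow` are hypothetical helpers, not the repository's CSV logger.

```typescript
// Quote a field only when it needs it, doubling any embedded quotes.
function escapeCsvField(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

function toCsvRow(fields: string[]): string {
  return fields.map(escapeCsvField).join(',');
}

const row = toCsvRow([
  '2025-11-21T11:51:34.000Z',
  'What crops did we harvest?',
  'You harvested 500kg of tomatoes, from "field A".',
  'harvest-123-chunk-0,harvest-123-chunk-1',
]);
```

The timestamp and query pass through untouched, while the response and the comma-separated sources each end up wrapped in quotes so a spreadsheet reads them as single cells.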

Test the Step

Step 5 runs automatically after Step 3 completes. To verify it's working:

Run a query using POST /harvest_logbook/query
Check the Logs tab for LogToSheets entries
Verify the CSV file was created:

   cat logs/harvest_logbook.csv

You should see your query and response logged with a timestamp. Each subsequent query appends a new row to the CSV file.

Testing the System

Now that all steps are built, let's test the complete workflow using the Motia Workbench.

Start the Server

npm run dev

Open http://localhost:3000 in your browser to access the Workbench.

Test 1: Store Harvest Data

Select the harvest-logbook flow from the dropdown
Find the POST /harvest_logbook endpoint in the workflow
Click on it to open the request panel
Add the authorization header:

   {
     "x-user-id": "user_alice"
   }

Set the request body:

   {
     "content": "Harvested 500kg of tomatoes from field A. Weather was sunny, no pest damage observed.",
     "farmId": "farm_1",
     "metadata": {
       "field": "A",
       "crop": "tomatoes",
       "weight_kg": 500
     }
   }

Click the Play button

Watch the workflow execute in real-time. You'll see:

Authorization check passes (user_alice has edit permission)
Text split into chunks and embedded
Vectors stored in Pinecone
Success response returned

Test 2: Query the Data

Find the POST /harvest_logbook/query endpoint
Add the authorization header:

   {
     "x-user-id": "user_alice"
   }

Set the request body:

   {
     "farmId": "farm_1",
     "query": "What crops did we harvest recently?"
   }

Click Send

Watch the RAG pipeline execute:

Query embedded via OpenAI
Similar vectors retrieved from Pinecone
AI generates response with context
Query and response logged to CSV

Test 3: Verify Authorization

Try querying as a user without permission:

Use the same query endpoint
Change the header:

   {
     "x-user-id": "user_unauthorized"
   }

Click Send

You'll see a 403 Forbidden response - authorization works correctly.

View the Logs

Check the audit trail:

cat logs/harvest_logbook.csv

You'll see all queries and responses logged with timestamps.

The Workbench also provides trace visualization showing exactly how data flows through each step, making debugging straightforward.

Conclusion

You've built a complete RAG system with multi-tenant authorization using Motia's event-driven framework. You learned how to:

Build event-driven workflows with Motia steps
Implement RAG with text chunking, embeddings, and vector search
Add fine-grained authorization using SpiceDB's relationship model
Handle async operations with event emission
Integrate multiple services (OpenAI, Pinecone, SpiceDB)

Your system now handles:

Semantic search over harvest data with AI-powered embeddings
Natural language querying with contextually relevant answers
Multi-tenant access control with relationship-based permissions
Event-driven processing for fast API responses
Audit logging for compliance and debugging
Flexible LLM options (OpenAI or HuggingFace)

Your RAG system is ready to help farmers query their harvest data naturally while keeping data secure with proper authorization.

Final Thoughts

This was a fun exercise in tackling a complex authorization problem and also building something production-grade. I also got to play out some of my Stardew Valley fancies IRL. Maybe it's time I actually move to a cozy farm and grow my own crops (so long as the farm has a good Internet connection!).

The repository can be found on the Motia GitHub.

Feel free to reach out to us on LinkedIn or jump into the SpiceDB Discord if you have any questions. Happy farming!

Friends Don't Let Friends Write Custom Authorization Code

Sohan — Thu, 17 Jul 2025 09:56:58 +0000

Let’s get right to it:

🗣️ Never write your own authorization code. Just don’t

You’ve probably done it. We all have. That innocent-looking if statement checking user roles:

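func myApi(request Request) {
  roles := fetchRolesFor(request.User)
  // slices.Contains comes from Go's standard "slices" package (Go 1.21+)
  if slices.Contains(roles, "admin") || slices.Contains(roles, "editor") {
      approve()
  }
}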

It feels fine… until six months later, you’re drowning in permission bugs, scrambling to debug why Bob from Finance can edit your production database. Sound familiar?

Let’s unpack why this happens—and why you should stop before writing another line of custom AuthZ code.

🧠 Code Is Debt, Especially AuthZ Code

Every line of authorization code you write is debt:

It must be tested.
It must be maintained.
It must be reviewed and secured.

AuthZ bugs are security bugs. One wrong if statement, and your customer’s sensitive data leaks. Good luck explaining that breach to your CISO or customers. Just look at the number of recent data breaches that have occurred thanks to broken access control.

🔄 Hard to Evolve

Here’s a scenario:
Your simple role checks work until your CTO suddenly says,

"We need fine-grained access control for every document by next quarter."

Now your tidy if role == 'admin' logic needs to handle document-level permissions, delegation, and auditing. You’re staring at your old code thinking: “How did we get here?”

Even worse if you’re a polyglot shop: you now need to replicate your homegrown logic across Go, Node.js, Python, and Java… or expose it as an RPC service and accidentally create a distributed dependency nightmare.

📉 Role-Based Agony (RBAC Explosion)

Role-Based Access Control (RBAC) always starts simple:

Admin
Editor
Viewer

Then marketing wants a custom dashboard.
Finance wants read-only access to sales data.
Support needs limited write access to customer records.

Before long you’ve got 50+ roles hardcoded across endpoints and environments. Want to change one? Enjoy your week-long deploy cycle and pray nothing breaks.
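To see how quickly this grows, here is a small sketch that counts the roles you end up with when every team gets its own access level on every resource. The team, resource, and level names are made up for illustration.

```python
from itertools import product

teams = ["marketing", "finance", "support", "engineering", "sales"]
resources = ["dashboard", "sales_data", "customer_records", "billing"]
levels = ["read", "limited_write", "admin"]

# One hardcoded role per (team, resource, level) combination.
roles = [
    f"{team}:{resource}:{level}"
    for team, resource, level in product(teams, resources, levels)
]

print(len(roles))  # 5 * 4 * 3 = 60
```

Adding one more team or one more resource multiplies the total rather than adding to it, which is why the list gets out of hand so fast.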

Can I offer you a nice egg in this trying time?

🧍 Authorization Is a Human Problem, Too

Here’s the thing:

Who decides access?

Well, typically that is decided on the business side of things: HR, InfoSec, people managers and the like.

Who implements it?

You, the engineer.

When these two worlds drift apart, your authorization logic becomes a graveyard of outdated roles, guesswork, and subtle bugs nobody understands.

Your codebase reflects services, modules, endpoints.
Your access model reflects teams, departments, projects. Trying to map one to the other is a recipe for friction—and failure.

🚨 Distributed Mess

In a distributed system, your goal is simple: make many computers behave like one computer. But when every service embeds its own ad-hoc AuthZ logic, subtle differences creep in:

Marketing thinks they have access.
Engineering says otherwise.

Your access control looks like spaghetti… but distributed spaghetti.

🔬 You’re Not an Authorization Expert

Let’s be blunt:
You wouldn’t write your own database, would you? (Please say no.)

Authorization research is as old as database research. Why reinvent the wheel on one of your application’s most critical paths?

⚙️ AuthZ Must Perform at Scale

Authorization isn’t just about correctness, it’s also about performance. After all, permission checks are on the critical path.

Fast checks at low latency.
Always available.
Resilient under load.

This isn’t something you hack together in a sprint. It’s infrastructure-grade engineering that involves thinking about caching, distributed consistency, failover, and observability.

Why This Matters Now?

Sure, maybe a few years ago you could “get away with it.” But today’s requirements demand:

✅ Fine-grained permissions
✅ Microservices architectures
✅ Global scale
✅ Collaborative apps with dynamic access policies

That dials the complexity all the way up to 11.

And the stakes are higher:
OWASP’s current Top 10 list of security risks for web apps puts Broken Access Control at #1, but it doesn't have to be that way.

image source: owasp.org

Don't believe me? Here's an example from the industry:

In the 2010s, Broken Authentication was consistently in the Top 3 in OWASP's lists¹. So as an industry we:

✅ Stopped writing our own authentication
✅ Adopted off-the-shelf identity providers

Result?
The latest OWASP list puts 'Identification and Authentication Failures' all the way down at #7

It’s time we do the same for authorization.

The Better Way: Centralized Authorization

The good news? You have options:

OWASP themselves recommend adopting modern models like ABAC (Attribute-Based Access Control) or ReBAC (Relationship-Based Access Control) to fix broken access control.

Centralized, open source AuthZ systems (such as SpiceDB, inspired by Google Zanzibar) offer:

Fine-grained checks
Fast, low-latency queries
Clear separation of business logic and access policy

In short: let experts design, build, and maintain the thing that keeps your user data secure—so you can focus on building your product.

🔔 Final word

Before I get flamed in the comments, here's a disclaimer:

If you’re writing a weekend hobby app, maybe a simple if role == admin is fine.
If you’re scaling a business or product, you can’t afford to wing it anymore.

Remember:
“Just because you can write your own AuthZ doesn’t mean you should.”

Got a custom AuthZ horror story? Share it in the comments 👉

[1] Broken Authentication was #2 in 2017, #2 in 2013, and #3 in 2010 (source)

Cover Photo by Diego PH on Unsplash

Beware of the New Enemy Problem ⚠️

Sohan — Thu, 06 Mar 2025 12:36:29 +0000

Google Zanzibar is a globally distributed authorization system capable of processing "more than 10 million client queries per second," and powers all of Google's services including YouTube, Docs and Cloud IAM. Zanzibar was first described in a paper presented in 2019 and can be read here.

This paper contains the first mention of the New Enemy Problem, described as: “[a failure] to respect the ordering between ACL updates or when we apply old ACLs to new content.”

Essentially, this is the name Google gave to two undesirable properties we wish to prevent in a distributed authorization system:

Neglecting an Access Control List (ACL) update order
Misapplying old ACLs to new content

In other words, the New Enemy Problem is a scenario where unauthorized access can occur when changes to permissions and the resources they protect are not updated together consistently.

A New Enemy Example

Here's an example featuring Lex (a bad actor) and Kara

Essentially, Lex had permissions to view SeCrEt PlaNs but his access was revoked. But due to a stale ACL check, he is now able to read this document. In this example, Lex is the New Enemy.
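The same sequence can be reproduced in a few lines of Python. This is a toy model, not how any real system stores ACLs: `AclStore` and `AclReplica` are hypothetical classes, and the replica simply keeps serving whatever it last synced.

```python
class AclStore:
    """Primary copy of the ACL: document -> set of users who may view it."""

    def __init__(self):
        self.viewers = {"secret_plans": {"kara", "lex"}}


class AclReplica:
    """A replica that only sees changes when sync() is called."""

    def __init__(self, primary: AclStore):
        self.primary = primary
        self.sync()

    def sync(self):
        self.viewers = {doc: set(users) for doc, users in self.primary.viewers.items()}

    def can_view(self, user: str, doc: str) -> bool:
        return user in self.viewers.get(doc, set())


primary = AclStore()
replica = AclReplica(primary)

# Lex's access is revoked on the primary...
primary.viewers["secret_plans"].discard("lex")

# ...but the check is answered by a replica that has not caught up yet.
stale_answer = replica.can_view("lex", "secret_plans")   # True: Lex is the new enemy

replica.sync()
fresh_answer = replica.can_view("lex", "secret_plans")   # False once the replica is current
```

The bug is not in the revocation itself. It is in answering a check from state that is older than the revocation.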

Where New Enemy is Present

There are different ways Authorization is implemented in the industry today. The most common pattern is for organizations to implement their own custom Authorization. Some of these approaches are prone to the New Enemy Problem.

For example: I'd earlier discussed why using JSON Web Tokens (JWTs) for Authorization isn't a good idea and one of the reasons is because of the New Enemy Problem. You could revoke someone's access on your server, but they'd still have access if they're holding a valid JWT from before.

How Zanzibar solves it

The Zanzibar paper describes an opaque token that refers to a snapshot of the permissions as evaluated at a single point in time. This token is called a Zookie (possibly a portmanteau of Zanzibar and cookie). By storing the content together with a token that represents the exact permissions used to protect that specific version of the content, we can make sure that the permissions we use to check access to that content in the future are at least as fresh as the permissions when the content was created.

Here's an example of how it works in SpiceDB - a database inspired by Google Zanzibar. A zookie is passed when making a permissions check request, and guarantees that the policy and individual relationships used to compute the answer will be at least as fresh as the Zookie presented requires.

def write_content(user, content_id, new_content):
    is_allowed, zookie = authzed.content_change_check(content_id, user)
    if is_allowed:
        storage.write_content(content_id, new_content, zookie)
        return success
    return forbidden

And when accessing the data, we use the following pseudocode:

def read_content(user, content_id):
    content, zookie = storage.get_content(content_id)
    is_allowed = authzed.check(content_id, user, zookie)
    if is_allowed:
        return content
    return forbidden

This is a mechanism for enforcing that we will never give access to a version of the content to which the user has had their access revoked! New Enemy problem solved.
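To make the "at least as fresh" guarantee concrete, here is a self-contained toy simulation of the pseudocode above. It is a sketch under simplifying assumptions: `PermissionStore` is a hypothetical class, the zookie is just a revision number, and `replica_revision` models a node that has fallen behind. (In SpiceDB's API the equivalent token is called a ZedToken.)

```python
class PermissionStore:
    """Toy permission system that keeps every revision of the viewer sets."""

    def __init__(self):
        self.history = [{}]          # history[r] = {doc: frozenset(users)} at revision r
        self.replica_revision = 0    # revision a lagging replica would answer from

    def _commit(self, doc, users):
        snapshot = dict(self.history[-1])
        snapshot[doc] = frozenset(users)
        self.history.append(snapshot)

    def grant(self, doc, user):
        self._commit(doc, self.history[-1].get(doc, frozenset()) | {user})

    def revoke(self, doc, user):
        self._commit(doc, self.history[-1].get(doc, frozenset()) - {user})

    def latest_revision(self):
        return len(self.history) - 1   # the revision number doubles as the zookie

    def check(self, doc, user, zookie=0):
        # Answer from a snapshot at least as fresh as the zookie requires.
        revision = max(self.replica_revision, zookie)
        return user in self.history[revision].get(doc, frozenset())


perms = PermissionStore()
perms.grant("plans", "kara")                 # revision 1
perms.grant("plans", "lex")                  # revision 2
perms.replica_revision = 2                   # the replica stops catching up here

perms.revoke("plans", "lex")                 # revision 3
content_zookie = perms.latest_revision()     # captured when the new content is written

without_zookie = perms.check("plans", "lex")                  # stale replica says True
with_zookie = perms.check("plans", "lex", content_zookie)     # forced to revision 3: False
```

Storing the zookie next to the content is what lets the read path demand a snapshot that already includes the revocation.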

Here's our previous example, but this time with Zookies.

I did it all for the Zookie

You may have picked up on the fact that there are probably some inconsistencies that can be introduced by using a version of the permissions that are at least as fresh, but not always the exact current permissions. So what are they?

Let’s say at some point a user is granted access to a document in a way that doesn’t cause the document to store a new zookie. It may take some time for that access grant to propagate everywhere, and we may issue some false negative responses to check requests. This is an explicit choice to improve the performance of the system, while always guaranteeing that no false positives are ever issued.

Permissions mutations also return a Zookie so if you can easily identify the content to which permission is being granted, you can optionally update the zookie on the content when the new permissions are granted. This will enforce a causal ordering between the permissions change and the next access request! This can alleviate the problem with false negatives, by trading off higher load to the datastore which stores the content. This will make sense for some use cases, but not for others.

New Enemies New Friends

Open Worldwide Application Security Project (OWASP) publishes a "Top 10 Security Risks for Web Apps" list and currently Broken Access Control sits at the top of their list. Authorization systems based on Google Zanzibar such as SpiceDB solve for the New Enemy Problem, and in turn improve Access Control mechanisms in your software.

secretplans.doc can't be found

Let me know in the comments how you've solved for the New Enemy problem in your Authorization code.

Safeguarding Your Data When Using DeepSeek R1 In RAG Pipelines - Part II

Sohan — Fri, 31 Jan 2025 20:04:15 +0000

In Part I we learnt why we should secure our RAG pipelines with Fine Grained Authorization, and which methods we can use to do so.

Let's now get our hands dirty and write code to actually do so.

We'll be authorizing access to view blog articles and get information from them. We'll see what happens when a request is authorized and when it isn't. Here's our RAG pipeline with the software we're using.

1. Let's Talk Schema!

Let's set up our permissions system. Once you've installed SpiceDB, create a schema about two objects: users and articles. The setup is simple - users can be "viewers" of articles, and if you're tagged as a viewer, you get the all-access pass to view that article.

from authzed.api.v1 import (
    Client,
    WriteSchemaRequest,
)

import os

#change to bearer_token_credentials if you are using tls
from grpcutil import insecure_bearer_token_credentials

SCHEMA = """definition user {}

definition article {
    relation viewer: user

    permission view = viewer
}"""

client = Client(os.getenv('SPICEDB_ADDR'), insecure_bearer_token_credentials(os.getenv('SPICEDB_API_KEY')))

try:
    resp = await(client.WriteSchema(WriteSchemaRequest(schema=SCHEMA)))
except Exception as e:
    print(f"Write schema error: {type(e).__name__}: {e}")

2. Write a Relationship

Alright, first things first - we're gonna tell SpiceDB that Tim should be able to peek at document 123 and document 456. Think of it like giving Tim a special pass to view these specific files.

This is how we write a Relationship in SpiceDB. Once we've done this, SpiceDB will know exactly what Tim can and can't see.

from authzed.api.v1 import (
    ObjectReference,
    Relationship,
    RelationshipUpdate,
    SubjectReference,
    WriteRelationshipsRequest,
)

try:
    resp = await (client.WriteRelationships(
        WriteRelationshipsRequest(
            updates=[
                RelationshipUpdate(
                    operation=RelationshipUpdate.Operation.OPERATION_TOUCH,
                    relationship=Relationship(
                        resource=ObjectReference(object_type="article", object_id="123"),
                        relation="viewer",
                        subject=SubjectReference(
                            object=ObjectReference(
                                object_type="user",
                                object_id="tim",
                            )
                        ),
                    ),
                ),
                RelationshipUpdate(
                    operation=RelationshipUpdate.Operation.OPERATION_TOUCH,
                    relationship=Relationship(
                        resource=ObjectReference(object_type="article", object_id="456"),
                        relation="viewer",
                        subject=SubjectReference(
                            object=ObjectReference(
                                object_type="user",
                                object_id="tim",
                            )
                        ),
                    ),
                ),
            ]
        )
    ))
except Exception as e:
    print(f"Write relationships error: {type(e).__name__}: {e}")

3. Writing to our Vector DB

Pinecone is a vector database where we store our embeddings. Let's set up our Pinecone serverless index - don't worry, it's not as complicated as it sounds!

#from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
from pinecone import Pinecone
import os

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

index_name = "oscars"

pc.create_index(
    name=index_name,
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

Here's where it gets fun - we're going to create a totally made-up fact: "Bill Gates won the 2025 Oscar for best football movie." (I know, wild right? 😄). We're using this made-up fact to show how RAG handles information that LLMs don't already know about.

We'll also add a little tag (article_id) to keep track of where this info came from. This is super important because it helps us link everything back to our permission system.

from langchain_pinecone import PineconeEmbeddings
from langchain_pinecone import PineconeVectorStore

from langchain.schema import Document
import os

# Create a Document object for our made-up article and store the article_id as metadata.
text = "Bill Gates won the 2025 Oscar for best football movie"
metadata = {
    "article_id": "123"
}
document = Document(page_content=text,metadata=metadata)


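# Initialize a LangChain embedding object. Use the same OpenAI model and
# dimensions as the query step so stored and query vectors are comparable.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    openai_api_key=os.environ.get("OPENAI_API_KEY"),
    dimensions=1024,
    model="text-embedding-3-large"
)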

namespace_name = "oscar"

# Upsert the embedding into your Pinecone index.
docsearch = PineconeVectorStore.from_documents(
    documents=[document],
    index_name=index_name,
    embedding=embeddings,
    namespace=namespace_name
)

4. Checking Tim's VIP Permissions

Now comes the cool part! We'll ask SpiceDB what documents Tim can actually see. This is how you can check for permissions and look up resources in SpiceDB. Here we're using the LookupResources API to get a list of articles that Tim has permission to view.

from authzed.api.v1 import (
    LookupResourcesRequest,
    ObjectReference,
    SubjectReference,
)

subject = SubjectReference(
    object=ObjectReference(
        object_type="user",
        object_id="tim",
    )
)

def lookupArticles():
    return client.LookupResources(
        LookupResourcesRequest(
            subject=subject,
            permission="view",
            resource_object_type="article",
        )
    )

# Start from an empty list so a failed lookup leaves Tim with no access.
authorized_articles = []

try:
    resp = lookupArticles()

    async for response in resp:
        authorized_articles.append(response.resource_object_id)
except Exception as e:
    print(f"Lookup error: {type(e).__name__}: {e}")

print("Article IDs that Tim is authorized to view:")
print(authorized_articles)

Output:

Article IDs that Tim is authorized to view:
['123', '456']

With that sorted, we can chat with our DeepSeek R1 model, but only about stuff Tim's allowed to see. It's like having a really smart assistant who's also great at keeping secrets!

Quick side notes:

We're using OpenRouter to access the DeepSeek R1 LLM
We're sticking with OpenAI for the embeddings part because they're pretty much the gold standard for this kind of thing.

from langchain_community.chat_models import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import (
    RunnableParallel,
    RunnablePassthrough
)
import os

# Custom wrapper for OpenRouter
class ChatOpenRouter(ChatOpenAI):
    openai_api_base: str
    openai_api_key: str
    model_name: str

    def __init__(self,
                 model_name: str,
                 openai_api_base: str = "https://openrouter.ai/api/v1",
                 **kwargs):
        openai_api_key = os.environ.get("OPENROUTER_API_KEY") 
        super().__init__(openai_api_base=openai_api_base,
                         openai_api_key=openai_api_key,
                         model_name=model_name, **kwargs)

# Define the ask function
def ask():

    # Initialize a LangChain object for DeepSeek via OpenRouter.
    llm = ChatOpenRouter(
      model_name="deepseek/deepseek-r1-distill-llama-70b",
      max_tokens=None,
      max_retries=2,
    )

    # Initialize a LangChain object for a Pinecone index with OpenAI embeddings model.
    knowledge = PineconeVectorStore.from_existing_index(
        index_name=index_name,
        namespace=namespace_name,
        embedding=OpenAIEmbeddings(
            openai_api_key=os.environ.get("OPENAI_API_KEY"),
            dimensions=1024,
            model="text-embedding-3-large"
        )
    )

    # Initialize a retriever with a filter that restricts the search to authorized documents.
    retriever = knowledge.as_retriever(
        search_kwargs={
            "filter": {
                "article_id":
                    {"$in": authorized_articles},
            },
        }
    )

    # Initialize a string prompt template for context and question.
    prompt = ChatPromptTemplate.from_template(
        "Answer the question below using the context:\n\nContext: {context}\nQuestion: {question}\nAnswer:"
    )

    # Combine retrieval and prompt to pass through DeepSeek LLM via OpenRouter
    retrieval = RunnableParallel(
        {"context": retriever, "question": RunnablePassthrough()}
    )
    chain = retrieval | prompt | llm | StrOutputParser()

    # Example question
    question = "Who won the 2025 Oscar for best football movie?"

    print("Prompt: \n")
    print(question)
    result = chain.invoke(question)
    print(result)


# Invoke the ask function
ask()

Output:

Prompt: 

Who won the 2025 Oscar for best football movie?
Bill Gates won the 2025 Oscar for best football movie.

Answer: Bill Gates

There you go! Our RAG pipeline supplied information that the LLM didn't already know about.

5. What Happens When Tim's Pass Expires?

Let's shake things up and see what happens when Tim loses access to some docs.

First step: we're gonna revoke Tim's viewing privileges for a document. This code snippet deletes the relationship between Tim and document 123.

try: 
    resp = await client.WriteRelationships(
        WriteRelationshipsRequest(
            updates=[
                RelationshipUpdate(
                    operation=RelationshipUpdate.Operation.OPERATION_DELETE,
                    relationship=Relationship(
                        resource=ObjectReference(object_type="article", object_id="123"),
                        relation="viewer",
                        subject=SubjectReference(
                            object=ObjectReference(
                                object_type="user",
                                object_id="tim",
                            )
                        ),
                    ),
                ),
            ]
        )
    )
except Exception as e:
    print(f"Write relationships error: {type(e).__name__}: {e}")

Then we'll double-check what Tim can still see.

# lookupArticles() was defined above
# Start from an empty list so a failed lookup leaves Tim with no access.
authorized_articles = []

try:
    resp = lookupArticles()

    async for response in resp:
        authorized_articles.append(response.resource_object_id)
except Exception as e:
    print(f"Lookup error: {type(e).__name__}: {e}")

print("Documents that Tim can view:")
print(authorized_articles)

Output:

Documents that Tim can view:
['456']

Tim's lost access to document_123 which had the vital piece of info about the "2025 Oscar for Best Football Movie".

Time to try our query again!

#this function was defined above
ask()

Output:

Prompt: 

Who won the 2025 Oscar for best football movie?
The 2025 Oscars, which honored films released in 2024, did not include a category for "best football movie." The Academy Awards do not have a specific category dedicated to sports films or football-themed movies. Therefore, no award was given in that non-existent category. It's possible there might be confusion with another award ceremony that recognizes sports-related films. 

Answer: No one won an Oscar for best football movie in 2025 because the Academy Awards do not have such a category.

And... plot twist! The system won't spill the beans anymore because Tim's not authorized to see that document. It's like trying to read a book that's been checked out of the library.

Conclusion

This was a step-by-step guide on how you can have fine grained authorization for your RAG pipelines. Do you have other ways of writing authorization logic for your LLMs and RAGs? Let me know in the comments!

As for the image: well, this is what DALL-E thinks "Bill Gates won the 2025 Oscar for best football movie" looks like!

As promised, here is a link to the working Jupyter Notebook. Have fun!

Safeguarding Your Data When Using DeepSeek R1 In RAG Pipelines - Part 1

Sohan — Fri, 31 Jan 2025 19:38:58 +0000

DeepSeek is the talk of the tech world right now, and rightfully so!

If you're implementing the DeepSeek Large Language Model (or any LLM for that matter) in your Retrieval-Augmented Generation (RAG) Pipeline, you have to ensure that the LLM accesses only the data it's authorized to access.

This guide will walk you through the nuts and bolts of securing your RAG pipelines with Fine Grained Authorization while also making your queries secure and super efficient! There's also a notebook linked at the end if you want to look at some code.

Note: This example uses DeepSeek R1 but works with any LLM. Using Authorization for RAG Pipelines is a best practice regardless of which LLM and Embedding model you are using.

How is this image relevant? It's relevant to our RAG Pipeline and you'll find out how at the end of this guide. 🤭

Software used in this guide:

DeepSeek R1 LLM (through OpenRouter)
OpenAI for Embeddings
SpiceDB for permissions
Pinecone as our Vector Database
Langchain for language model integration

Why is this important?

Because we now need to think of Day2 AI Ops.

Enterprises are working extra hard to keep sensitive info (like personal details and company secrets) from leaking out. The go-to solution? Setting up some solid guardrails around RAG to keep data safe while making sure everything runs smoothly and efficiently.

To get these guardrails just right, you need to set up some smart permission systems that can keep track of who can see what and which resources they can access.

How It Works

Let me break down how a typical RAG pipeline works - it's pretty straightforward with two main parts:

1. Ingestion

Think of this as preparing your knowledge base. We grab all sorts of data, clean it up a bit, turn it into embeddings (vectors that represent real-world objects), and store them in a vector database. It's like organizing your digital library, where each book (or document) gets a special tag - like "document123" - so we can keep track of where everything came from.
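As a rough illustration of ingestion, the sketch below chunks a piece of text, turns each chunk into a vector, and tags every record with the document it came from. Everything here is a stand-in: `toy_embed` is a placeholder for a real embedding model, the returned list plays the role of a vector database, and the sample sentence is invented.

```python
def chunk_text(text: str, max_words: int = 8) -> list:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str, dims: int = 8) -> list:
    """Placeholder embedding: bucket character codes into a fixed-size vector.
    A real pipeline would call an embedding model here."""
    vector = [0.0] * dims
    for position, char in enumerate(text.lower()):
        vector[position % dims] += ord(char) / 1000.0
    return vector


def ingest(document_id: str, text: str) -> list:
    """Return one record per chunk, each tagged with its source document."""
    return [
        {"vector": toy_embed(chunk), "text": chunk, "metadata": {"document_id": document_id}}
        for chunk in chunk_text(text)
    ]


records = ingest(
    "document123",
    "Q4 revenue grew because the new product line shipped early in October",
)
```

The `document_id` in the metadata is the piece that matters for authorization later: it is the link from an anonymous vector back to something a permission system knows about.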

2. Query & Response

Here's where it gets fun! When someone asks the chatbot a question, it transforms their question into the same kind of embedding format and goes hunting through the vector database for relevant matches. It's like having a super-smart librarian who knows exactly where to look! Once it finds the answer, the chatbot feeds this information to the LLM, which crafts a nice, helpful response based on what it found.
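The retrieval step boils down to a similarity search. Here is a minimal sketch using cosine similarity over hand-written three-dimensional vectors; the vectors, texts, and document IDs are all invented for illustration, and a real vector database would do this search for you.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, store, k=2):
    """Return the k stored records most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda record: cosine_similarity(query_vector, record["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Hand-written vectors stand in for real embeddings.
store = [
    {"vector": [0.9, 0.1, 0.0], "text": "Q4 revenue was up", "metadata": {"document_id": "finance_report"}},
    {"vector": [0.1, 0.9, 0.0], "text": "The office closes at 6pm", "metadata": {"document_id": "handbook"}},
    {"vector": [0.8, 0.2, 0.1], "text": "Q3 revenue was flat", "metadata": {"document_id": "finance_report"}},
]

query_vector = [1.0, 0.0, 0.0]   # pretend this is the embedded question
matches = top_k(query_vector, store)
context = "\n".join(match["text"] for match in matches)   # what gets handed to the LLM
```

Notice that `top_k` never asks who is making the request. It ranks purely on similarity.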

But here's the catch - and it's a big one - this setup is missing something crucial: authorization checks! 🚨

For example, if someone who shouldn't have access to sensitive financial data asks "What was our Q4 revenue?", they might get an answer they're not supposed to see. Not ideal, right?

Authorization, ReBAC & SpiceDB

In case you're new to the world of AuthZ, here's a quick primer:

Authorization determines whether you have permission to access a resource. Traditional models like Role-Based Access Control (RBAC) work well for simple setups, but as systems grow more complex, defining permissions based on roles alone can get messy. That’s where Relationship-Based Access Control (ReBAC) comes in. Instead of just assigning roles, ReBAC uses relationships—like “Alice is a manager of Project X” or “Bob is a friend of Charlie”—to determine access dynamically. This makes it ideal when it comes to securing your RAG pipelines.

This guide uses SpiceDB, a powerful, open-source database designed to handle ReBAC at scale. Inspired by Google’s Zanzibar (which powers Google's Authorization systems across Docs, YouTube and more), SpiceDB lets you define and enforce complex access rules efficiently. With it, you can model relationships between users and resources, then perform lightning-fast permission checks.

Three things about SpiceDB

Here's a quick TL;DR of how SpiceDB works:

Schema: This defines the types of objects found, how those objects relate to one another, and the permissions that can be computed off of those relations. Developers can read and write a schema based on their use-case and then store & query data.
Relationships: Relationships are what bind together a Subject and a Resource via a Relation. A functioning Permissions System that uses ReBAC is the combination of Schema and Relationships.
Checks & Lookups: Now that we have a schema and relationships in the database, we can issue checks on whether a subject has a permission on a specific resource, or what resources a subject can access whether via a computed permission or relation membership.
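Those three pieces can be mimicked in plain Python to get a feel for how they fit together. This is only a mental model, not SpiceDB's implementation or API: `SCHEMA`, `RELATIONSHIPS`, `check`, and `lookup_resources` are hypothetical names, and real schemas support far richer permission expressions than a single relation.

```python
# Schema: for each object type, which relation grants which permission.
SCHEMA = {"article": {"view": "viewer"}}

# Relationships: (resource_type, resource_id, relation, subject_id) tuples.
RELATIONSHIPS = {
    ("article", "123", "viewer", "tim"),
    ("article", "456", "viewer", "tim"),
    ("article", "789", "viewer", "ana"),
}


def check(resource_type, resource_id, permission, subject_id) -> bool:
    """Check: does the subject have the permission on this one resource?"""
    relation = SCHEMA[resource_type][permission]
    return (resource_type, resource_id, relation, subject_id) in RELATIONSHIPS


def lookup_resources(resource_type, permission, subject_id) -> list:
    """Lookup: which resources of this type can the subject access?"""
    relation = SCHEMA[resource_type][permission]
    return sorted(
        rid
        for (rtype, rid, rel, subj) in RELATIONSHIPS
        if rtype == resource_type and rel == relation and subj == subject_id
    )
```

A check answers a yes/no question about one resource, while a lookup returns the whole set of resources a subject can reach.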

Adding Authorization to your RAG Pipeline

Now there are two approaches to adding AuthZ to your RAG Pipeline.

Post-filter Authorization

So here's the deal: each embedding can have metadata showing which document it came from (like document123). We use this to check if you're actually allowed to see that content.

The process? We check each relevant embedding to see whether the user has permission to view the document that the embedding originated from. You can specify how much context you require, e.g. “I need 5 pieces of additional context before I make the prompt to the LLM” or “exhaust all the embeddings returned”.
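A post-filter can be sketched as a loop over the ranked matches. The code below assumes a hypothetical `can_view` function backed by an in-memory set; in a real pipeline that call would go to the permission system, once per candidate.

```python
ALLOWED = {("tim", "document123"), ("tim", "document456")}


def can_view(user: str, document_id: str) -> bool:
    """Hypothetical permission check; a real system would ask the permission service."""
    return (user, document_id) in ALLOWED


def post_filter(user: str, ranked_matches: list, needed: int = 5) -> list:
    """Walk matches in relevance order, keep only those the user may view,
    and stop once enough context has been collected."""
    kept = []
    for match in ranked_matches:
        if can_view(user, match["metadata"]["document_id"]):
            kept.append(match)
            if len(kept) == needed:
                break
    return kept


ranked_matches = [
    {"text": "chunk A", "metadata": {"document_id": "document123"}},
    {"text": "chunk B", "metadata": {"document_id": "document999"}},
    {"text": "chunk C", "metadata": {"document_id": "document456"}},
    {"text": "chunk D", "metadata": {"document_id": "document123"}},
]
```

The `needed` parameter covers the "I need 5 pieces of context" case, and passing a large value effectively exhausts all returned embeddings.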

Pre-filter Authorization

Here we make a query and embed it. But before diving in, we check with our permissions system to see what stuff we're actually allowed to peek at. It gives us back a list of all the documents we can access.

Then we just use that list as our filter, grab all the relevant embeddings we're allowed to see, and boom - we're good to go! That's what we'll be playing with in this guide.
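In code, the pre-filter idea looks roughly like this. The sketch assumes the list of authorized IDs has already come back from the permission system, and `apply_filter` is a toy evaluator for a single-field `$in` filter; a vector database would apply the same kind of metadata filter on its side.

```python
def build_filter(authorized_ids: list) -> dict:
    """Build a metadata filter that only admits the authorized documents."""
    return {"document_id": {"$in": list(authorized_ids)}}


def apply_filter(records: list, metadata_filter: dict) -> list:
    """Toy evaluator for a single-field $in filter."""
    field, condition = next(iter(metadata_filter.items()))
    allowed = set(condition["$in"])
    return [record for record in records if record["metadata"].get(field) in allowed]


records = [
    {"text": "chunk A", "metadata": {"document_id": "document123"}},
    {"text": "chunk B", "metadata": {"document_id": "document999"}},
    {"text": "chunk C", "metadata": {"document_id": "document456"}},
]

authorized_ids = ["document123", "document456"]   # what the permission system returned
visible = apply_filter(records, build_filter(authorized_ids))
```

One nice property of this shape is that an empty list of authorized IDs matches nothing, so the pipeline fails closed.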

Step-by-step guide

Where's the code, you ask? Well, that's in Part II of this guide. Now that you've understood the concepts, here's the step-by-step guide to securing your RAG Pipelines.

Don't use JWT for Authorization!

Sohan — Tue, 14 Jan 2025 14:30:00 +0000

What's with the shouty title? Well, I wanted to grab your attention and get straight to the point:

🗣️🗣️ Don't use JWT for your backend authorization

Look, there's a time and place for every piece of technology and the tricky part is determining if your use case actually is the time and place. Hopefully this post will walk you through why JWTs might not be your best friend, and the rare cases where they actually make sense.

🔄 Quick Crash Course: What's a JWT?

So, JWT (pronounced "jot") stands for JSON Web Token. It's part of this whole family of specs called JOSE (no way!) that deal with encrypting and signing JSON. JWT is the cool kid of the family - it's defined in RFC 7519 and gets all the attention. Why? Because while its siblings (JWA, JWE, JWK, JWS) handle the nitty-gritty encryption stuff, JWT is the one carrying the actual payload.

Think of a JWT as a JSON object wearing a fancy coat (some headers) and carrying an ID card (a signature) to prove it's legit. It's got these things called "claims" - like when it expires (exp), who created it (iss), who it's for (aud), and so on. The most popular claim for authorization is called "scope", which, fun fact, isn't even from JOSE - it's borrowed from OAuth2. Most developers end up mixing and matching these pieces like an authorization puzzle until something works.
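If you've never looked inside one, here's a minimal sketch that builds an HS256-signed token with only the Python standard library and reads the claims back out. The issuer, audience, secret, and expiry values are made up, and in practice you'd use a maintained JWT library rather than hand-rolling this.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def decode_claims(token: str) -> dict:
    """Read the claims WITHOUT verifying the signature; anyone holding the token can do this."""
    return json.loads(b64url_decode(token.split(".")[1]))


token = sign_jwt(
    {"iss": "example-issuer", "aud": "example-api", "exp": 1767225600, "scope": "profile email"},
    b"demo-secret",
)
```

The three dot-separated parts are the header, the claims, and the signature. The first two are only encoded, not encrypted, so the signature is what makes the token trustworthy, not secrecy.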

⚔️ The New Enemy Problem: JWT's Achilles' Heel

Here's the thing: JWTs have a major weakness - once they're out there, you can't take them back (except waiting for them to expire). It's like giving someone an all-access pass and not being able to revoke it if they go rogue. This becomes super awkward with web sessions - ever tried implementing a proper "logout" with JWTs? Good luck with that! You're basically crossing your fingers hoping users will play nice and throw away their old tokens.

But wait, it gets worse for backend services. Imagine this: you revoke someone's access on your server, but they're still holding a valid JWT from before. They can keep accessing stuff they shouldn't - this is what the smart folks call the "New Enemy Problem" (first spotted in Google's Zanzibar paper). It's like changing the locks but forgetting about all the spare keys you handed out. Centralized authorization systems fix this by having a central service (think of it as a bouncer at a bar) checking everyone's credentials in real-time. The New Enemy problem is a really hard and interesting distributed systems problem (and perhaps a future post here).
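Here's a small simulation of that gap. To keep it short it uses a JWT-like signed token (a JSON payload plus an HMAC) rather than a full JWT, and every name in it (`issue`, `verify_stateless`, `current_access`, the secret) is hypothetical.

```python
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical shared signing key


def issue(user: str, scope: str, ttl_seconds: int, now: float) -> str:
    payload = json.dumps({"sub": user, "scope": scope, "exp": now + ttl_seconds})
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def verify_stateless(token: str, now: float) -> bool:
    """What a downstream service can check on its own: signature and expiry."""
    payload, signature = token.rsplit("|", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and json.loads(payload)["exp"] > now


# Central record of who currently has access.
current_access = {"bob": {"docs:read"}}

issued_at = time.time()
bobs_token = issue("bob", "docs:read", ttl_seconds=3600, now=issued_at)

# Access is revoked centrally one minute later...
current_access["bob"].discard("docs:read")
later = issued_at + 60

token_still_valid = verify_stateless(bobs_token, now=later)         # True until exp passes
central_answer = "docs:read" in current_access.get("bob", set())    # False immediately
expired = verify_stateless(bobs_token, now=issued_at + 3601)        # False after expiry
```

Between the revocation and the expiry, the token and the central record disagree, and a service that trusts only the token sides with the stale answer.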

An example of the New Enemy problem:

Alice removes Bob from the ACL of a document;
Alice then asks Charlie to add new contents to the document;
Bob should not be able to see the new contents, but may do so if the ACL check is evaluated with a stale ACL from before Bob's removal

📏 JWT Scopes: Not as Fine-Grained as You'd Think

While JWTs look good on paper, things get messy in practice. Remember that scope claim I mentioned? It's... kinda vague. The spec basically just says "here's what characters you can use" and calls it a day. You'll see examples such as 'email profile phone address' floating around, and developers often try to get fancy with stuff like 'profile:admin'. But here's the million-dollar question: what does that actually mean? The whole site? Just one user's profile? Even GitHub's REST API has been wrestling with this for ages!

Modern apps need super specific permissions - we're talking granular stuff like 'issue/authzed/spicedb/52:author' instead of just 'issue:author'. When your users might need access to billions of things, you can't stuff all that into a token that's bouncing between services.
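To get a feel for the size problem, here's a rough Python sketch. It builds a scope claim with one fine-grained entry per issue and compares the payload size against an assumed 8 KB header budget (a common default on web servers, but treat the exact figure as an assumption). The user name and issue count are invented for illustration.

```python
import json

# One fine-grained scope per issue the user authored (10,000 is an arbitrary count).
scopes = [f"issue/authzed/spicedb/{n}:author" for n in range(10_000)]

# The JSON payload a token would carry, before signing and base64 encoding.
payload = json.dumps({"sub": "user:alice", "scope": " ".join(scopes)})
payload_bytes = len(payload.encode("utf-8"))

# Assumption: roughly 8 KB is available for request headers.
ASSUMED_HEADER_LIMIT = 8 * 1024
fits_in_header = payload_bytes <= ASSUMED_HEADER_LIMIT
```

Even this modest example overshoots the budget many times over, and base64 encoding only makes the token bigger.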

Centralized authorization is like having a smart assistant who keeps track of everything in one place. Need to check something? Just ask! For example: SpiceDB does this using something called ReBAC (Relationship-Based Access Control) - it's like a Swiss Army knife that can handle super detailed permissions while still playing nice with other permissions systems such as Role Based Access Control (RBAC), Attribute Based Access Control (ABAC), and other fancy patterns. Google also uses ReBAC for authorization across their services such as YouTube, Docs, and more.

🔮 The Crystal Ball Problem with JWT Authorization

Let's play pretend and say you're cool with using just a few JWT scopes. Even then, you've got a problem: how do you know what permissions you'll need? When your JWT gets created at the front door (like in an API gateway), it needs to predict what every downstream service might want. For anything beyond a super simple setup, that's like trying to predict next week's lottery numbers!

Plus, if you send a token with too many permissions to the next service, you're basically giving attackers a bigger target to hit. This headache led to the creation of Macaroons. These tokens can actually be trimmed down before being passed along - cool idea, right? But in reality, they're so complicated that most folks who tried them ended up saying "thanks, but no thanks."

Centralized authorization systems take a different approach. They're like "Hey, we know we can't predict the future, so just ask us when you need something!" Sure, you have to make an extra call, but systems like SpiceDB are optimized to keep data in-memory - so latency looks similar to reaching out to any other cache like Redis or Memcached.
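The difference between "predict up front" and "ask when needed" can be sketched in a few lines of Python. Everything here is hypothetical: check_permission stands in for a network call to a central service and is not the SpiceDB client API, and the resources and scopes are invented.

```python
# Hypothetical relationship data held by a central authorization service.
RELATIONSHIPS = {
    ("document:roadmap", "viewer", "user:alice"),
    ("invoice:2024-07", "approver", "user:alice"),
}


def check_permission(resource: str, permission: str, subject: str) -> bool:
    """Stand-in for a call to a central service (not a real client API)."""
    return (resource, permission, subject) in RELATIONSHIPS


# Scopes the gateway guessed when it minted the token at the front door.
token_scopes = {"document:viewer"}

# A downstream billing service needs something the gateway did not predict.
needed_scope = "invoice:approver"
allowed_by_token = needed_scope in token_scopes

# Asking at the point of use needs no prediction at all.
allowed_by_check = check_permission("invoice:2024-07", "approver", "user:alice")
```

Alice really is allowed to approve the invoice, but the token can't say so because nobody knew to ask for that scope when it was created.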

🤔 So... Are JWTs Ever the Right Choice?

After all this JWT-bashing, you might think they're completely useless. But there is one scenario where they shine: one-time grants where access cannot be revoked. Though honestly, that's about as rare as finding a unicorn in your backyard!

What system do you use for your authorization needs? Let me know in the comments below.

UPDATE 1:
There's a nice discussion in the comments about either adding state to the JWT, or a system of using a denylist for token revocation. Both these approaches have their downsides and can be fraught with errors. Check the comments below for more info.
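For the curious, here's a minimal sketch of the denylist idea in Python, assuming tokens carry the standard jti (token ID) and exp (expiry) claims and that signature verification has already happened elsewhere. The in-memory dict is a stand-in for whatever shared store a real system would need.

```python
import time

# Hypothetical in-memory denylist: token id (the "jti" claim) -> token expiry.
denylist = {}


def revoke(claims: dict) -> None:
    """Remember a revoked token until it would have expired anyway."""
    denylist[claims["jti"]] = claims["exp"]


def is_accepted(claims: dict, now=None) -> bool:
    """Signature verification is assumed to have happened already."""
    now = time.time() if now is None else now
    if claims["exp"] <= now:
        return False
    # This lookup is the shared state that plain JWTs were supposed to avoid.
    return claims["jti"] not in denylist


claims = {"sub": "bob", "jti": "token-123", "exp": 2_000.0}
before = is_accepted(claims, now=1_000.0)
revoke(claims)
after = is_accepted(claims, now=1_000.0)
```

Notice that every service now has to consult the denylist on every request, which is pretty much the centralized lookup we were trying to dodge in the first place.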

How I'm Learning SpiceDB

Sohan — Tue, 12 Nov 2024 16:30:00 +0000

(Cover pic by Kelly Sikkema on Unsplash)

A Life Update

I recently joined AuthZed as a Developer Advocate, and I want to document my learning journey for those going through a similar process.

Here are the 4 steps that helped me ramp up my knowledge of SpiceDB. I hope you'll find these helpful on your own learning journey!

1. Start with the Basics

It's always beneficial to have strong foundational knowledge. In the past, my eagerness to code got the better of me, and I dove headfirst into building something only to backtrack to actually understand how it works. This time, I didn't want to repeat that mistake, so I started with a refresher on Authorization, and ABAC, RBAC & ReBAC. If these acronyms are new to you, I'd suggest starting here.

I then read the Google Zanzibar paper that inspired SpiceDB, and re-read it - this time with annotations. I have to admit - I find it hard to parse academic papers (who doesn't wish for a TikTok-style summary sometimes?)

That's where this presentation by Jake Moshenko came in really handy. His explanation brings to life all the concepts listed in the paper and reinforces understanding of how Zanzibar works.

Although SpiceDB is inspired by Zanzibar, there are some key differences. Here are some differences in a Q&A format that helped clarify the concepts. If the number of new concepts and terminologies seems overwhelming, that's okay! You don't have to understand all of it from the start, and hopefully, the rest of this article will help with your learning journey.

2. Get the Hang of Schema Design

Schema design is central to SpiceDB and was a new concept for me. A schema essentially defines the types of objects in your system, how those objects relate to one another, and the permissions that can be computed from those relations. I started by watching this video on modeling the GitHub permissions system using Schema.
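To make the three pieces concrete (object types, relations, and permissions computed from relations), here's a toy model in plain Python. To be clear, this is not SpiceDB's schema language or API, just a sketch of the idea with made-up names.

```python
# Toy model: relations are stored facts, permissions are computed from them.
SCHEMA = {
    "document": {
        "relations": {"owner", "editor", "viewer"},
        "permissions": {
            "view": {"viewer", "editor", "owner"},
            "edit": {"editor", "owner"},
        },
    },
}

# Stored relationships: (object type, object id, relation, user).
RELATIONSHIPS = {
    ("document", "readme", "owner", "alice"),
    ("document", "readme", "viewer", "bob"),
}


def has_permission(obj_type: str, obj_id: str, permission: str, user: str) -> bool:
    """A permission holds if the user has any relation that grants it."""
    granting_relations = SCHEMA[obj_type]["permissions"][permission]
    return any(
        (obj_type, obj_id, relation, user) in RELATIONSHIPS
        for relation in granting_relations
    )


alice_can_edit = has_permission("document", "readme", "edit", "alice")
bob_can_edit = has_permission("document", "readme", "edit", "bob")
bob_can_view = has_permission("document", "readme", "view", "bob")
```

Sketching a model this way on paper (or in code) before opening the playground is a handy way to check that your relations and permissions say what you think they say.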

For practice, I used real-life examples (such as Google Groups or a banking system) and sketched out the different users, objects, and relationships between them. Progressing from a basic user-document schema to a complex real-life example provides valuable practice in designing schemas for SpiceDB.

You can experiment with modeling these in the SpiceDB playground. I encourage you to try it out.

(My niece calls GitHub "Gibbut", so that's what I call it now 😎)

3. Build Something Starting from a Point of Familiarity

Having worked at companies like Amazon Web Services (AWS) and Fermyon, I have background knowledge in Cloud, Compute, and Serverless technologies. I looked through the documentation for familiar territory and found Deploying SpiceDB on Elastic Kubernetes Service. My experience with Amazon EKS helped me understand how SpiceDB integrates into that system.

If you come from an application development background, you might prefer starting with one of our client libraries to build a simple app that communicates with a local SpiceDB instance. Our getting started guide Protecting A Blog Application can be particularly helpful. For those with authorization experience, we offer guides on how SpiceDB compares with Open Policy Agent (OPA) or a comparison with Ruby on Rails CanCanCan. Both show different approaches but share some common ground.

Good time to shout-out that SpiceDB is completely open-source, and we welcome community contributions! Whether you'd like to suggest improvements, fix documentation typos, or contribute to the community, please feel free to do so. Check out our Good First Issues and join our Discord community.

4. Use AI Strategically

While learning to deploy SpiceDB on Amazon EKS, I encountered some challenges (a natural part of learning) and consulted ChatGPT about these errors. Here's a debugging step that I received:

(For context: zed is the AuthZed CLI tool)

Pretty straightforward, right?

Well, except that config is not a zed CLI command. LLMs can hallucinate and often do so with a lot of confidence. Watch out for inconsistencies like these that could trip you up when copying code from an LLM.

This highlights an important distinction between "learning something" and "building something". Asking ChatGPT "How do I install SpiceDB on EKS" and then just spamming the copy-paste keys is not the best way to learn something. I can attest to this because it's exactly what I did at the start! Only partway through did I realize that I hadn't achieved what I set out to do and had to backtrack. On the other hand, asking an LLM about how I could start debugging certain errors gave me a good understanding of what's under the hood. Use these tools thoughtfully and purposefully.

One Final Thought

I'm on a roll with the advice, so here's one more thing (yes, that's a Stevenote reference). This has stood me in good stead over the years when learning anything new: enjoy the process, the results will follow.

Happy Learning!

P.S. Here's a webinar I recorded for CNCF about Deploying SpiceDB in EKS. There's nothing quite like learning in public! 😎

The Complete Guide to Serverless Apps II - Functions and Apps

Sohan — Wed, 03 Jul 2024 15:00:00 +0000

In Part I we took a close look at the term “serverless” as it is used in cloud computing. We spoke about a serverless application - where you do not have to write a software server. Instead, you focus only on writing a request handler. Let’s now spend some time on this programming model and how it makes it easy to create serverless functions and serverless applications.

Your program is started when a request is received. The request object is passed into a function in your code. That function is expected to run to completion, possibly handing back a response. Once the function has been completed, the program exits.

There are three characteristics of this sort of program:

It is short running, often running for only milliseconds.
It is triggered by an event or a request.
It is responsible merely for dealing with that request (often returning a response).

Hello World 👋

For the sake of clarity, let’s look at a simple example of this kind of program. We will use the world’s most popular programming language, JavaScript, for this example. But the pattern is similar across languages. Also, we will write an example of a serverless function that handles an HTTP request.

const encoder = new TextEncoder()

// Declare a function that handles a request (in this case, an HTTP request)
export async function handleRequest(request) {

    // Send back an object that describes a response (in this case, an HTTP response)
    return {
        status: 200,
        body: encoder.encode("I'm a Serverless Function").buffer
    }
}

There are three things to note about the example above:

We do not set up a server of any sort (we don't even import any libraries).
There is a function called handleRequest() that takes a request object. This function is called when an inbound HTTP request occurs.
The function returns a response. In this case, it's an HTTP response with a 200 response code (which means no error occurred) and the content that will be displayed in the web browser.

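Here is the same example in Python:

# Imports assume the Spin Python SDK (spin_sdk), the platform mentioned below
from spin_sdk import http
from spin_sdk.http import Request, Response

class IncomingHandler(http.IncomingHandler):
    # Called by the platform when an inbound HTTP request occurs
    def handle_request(self, request: Request) -> Response:
        # Status code, headers, then the body as bytes
        return Response(
            200,
            {"content-type": "text/plain"},
            bytes("I'm a Serverless Function written in Python", "utf-8")
        )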

We don't start a server, map ports, handle interrupts, declare SSL/TLS certificates, or anything like that. The serverless app platform does all that stuff on our behalf outside of our code. When a request comes in, this app is started, the handleRequest function is called, and then the app exits.

And how fast is this? Different Serverless platforms have different speeds. With Spin, the handler can be started in under a millisecond. That is why there is no reason to run a server. If we can start this fast, it's much more efficient (and much cheaper) to not be running idle servers.

The above is an example of a serverless function. And when we package that up and send it off to a server, we have built a simple serverless app.

More Definitions 😅

"Wait, I'm confused! If this is a Serverless function, what are Functions as a Service? How does it differ from an Edge Function?"

These are valid questions so let's clarify the two:

Functions as a Service (FaaS)

When AWS Lambda first hit the scene, cloud mavens were keen on collapsing all cloud service names into “as-a-Service”-isms. For example:

core infrastructure services like compute and networking became “Infrastructure-as-a-Service (IaaS)”.
serverless databases were called “DB-as-a-Service (DBaaS)” and so on.

In such an environment, it is no surprise that the first wave of serverless app platforms was given the unattractive monicker “Function-as-a-Service”.

Personally, I prefer using "Serverless functions" over FaaS and here's why:

The term FaaS is opaque. If you don’t know what it means, there are not many clues embedded in the term itself. As with all the “aaS”es, one finds oneself mentally listing words that start with F for clarification.
The term itself refers to the service that runs. So what do you call an application that runs in a FaaS? A Function-as-a-Service Function? A Function-as-a-Service App? That just sounds confusing.
Lastly, in English "FaaS" can be verbally hard to distinguish from "PaaS".

In contrast, an app run inside of a PaaS is usually called a server or a microservice. Thus, most people in the field refer to apps that run in a FaaS as serverless apps or serverless functions.

The most famous PaaS, Heroku, does not refer to itself as a PaaS, and for the same reason, we don’t use FaaS. Much of their documentation uses the term “cloud application platform.”

Cloud Functions and Edge Functions

The terms cloud functions and edge functions occasionally arise when talking about serverless applications. For example, Netlify uses these terms in its documentation. The distinction between these "cloud" and "edge" serverless functions does not concern the functions themselves but rather where the specific function is being executed.

A cloud function executes "in the cloud," which usually means at one of the main hyperscalers such as AWS, Azure, or Google Cloud.
An edge function executes on the edge, a concept we will cover more later. In a nutshell, "edge" refers to the proximity between the end user and the function which they are calling. The term edge also refers to the proximity of the function being executed and the data being processed. The ultimate goal is to obtain the most efficient round-trip between the user, the function and any data being processed.

Providers like Vercel and Netlify must make this distinction because the APIs they provide for the functions that run in the cloud are different from the APIs they provide for the functions that run on edge providers like Cloudflare. This is an implementation-specific API difference that bubbles up to the developer.

Our view is that “edge functions” and “cloud functions” are varieties of serverless apps. Keep in mind, when we talk about cloud and edge computing later on, that the term edge is niche and only relates to a subset of service providers.

Conclusion 😌

Thanks for staying with us thus far! In this post, we saw what the code for a Serverless function looks like and what happens when it is triggered by an event. In the upcoming posts we'll deep-dive into the characteristics of a Serverless App. We'll look at execution time, CPU & Memory, Statelessness, and more.

Let us know if you have come across any other terminology around Serverless Apps, and we'll try and compare and contrast it for you.

The Complete Guide to Serverless Apps I - Introduction

Sohan — Wed, 26 Jun 2024 15:00:00 +0000

'Serverless' as a term first appeared around 2012 but it was only in 2014 when it piqued interest after the launch of AWS Lambda. The term spiked in usage in 2020 but went into decline soon after. Early in 2022, interest began to climb again. And right now, the term is as popular as it's ever been.

There are a few reasons for this increase in popularity (which we will delve into shortly) but I thought I'd take this opportunity to draw on my experience working in the cloud, and write a primer for Serverless Applications. This is intended to be read like an e-book and will cover all the facets of Serverless applications.

Here's the Complete Developer’s Guide to Serverless Apps which will be serialized over the course of the next few weeks.

Defining Serverless 💫
- Definition 1: Serverless as SaaS
- Definition 2: Serverless as Hosted Application
- Definition 3: Serverless as a Software Concept
What is a Serverless App 🤓
Comparing Serverless Apps and PaaS 🗒️
Conclusion 😊

Defining Serverless 💫

A friend of mine once argued that there is no such thing as serverless because, of course, there are servers underneath every one of the services mentioned above.

While he is not wrong in his statement that the things described as serverless do indeed have servers running somewhere, he missed the main point: Serverless is a statement about what resources you must be concerned with and not so much about the actual presence of physical server hardware.

Let’s dive into three definitions of Serverless, and you’ll soon see what I mean. We’ll start with the most generic definition and work toward the most specific.

Definition 1: Serverless as SaaS

The most generic term “serverless” indicates that some offering is run using the Software-as-a-Service (SaaS) model. In SaaS, an entire application is owned, built, and operated by one organization, and other individuals or organizations create accounts to use that application, typically via the web.

Some SaaS providers prefer to use the term “serverless” to describe the SaaS model, particularly when they want to emphasize that you, as the user, don’t have to do any management of the cloud resources required to run the SaaS.

If this is server-less, then what does “server” mean here? The server that this serverless offering is hiding from you is what used to be called the application server (back in the pre-cloud days). In other words, the machine or machines whose job it was to execute a specific piece of software.

Err..not that Sass

Definition 1: ”Serverless” merely means no management of cloud resources

You can spot this usage of the term easily. For example, when serverless is being used as a synonym for SaaS it signals that:

the offering is targeted toward non-engineers,
there is no REST API, library, SDK, or CLI to use, and
the web interface is the only way to interact with the application.

Definition 2: Serverless as Hosted Application

Consider the following cloud services, each of which claims to be serverless:

Serverless database hosting.
Serverless logging and monitoring.
Serverless messaging framework.

What do these three have in common? A serverless database is one in which someone else manages the database software (starting and stopping, upgrading, patching security issues) and you simply use the database (creating tables, running queries, inserting data). Similarly, “serverless logging and monitoring” suggests the same: Someone else manages a bunch of servers that do log processing and monitoring, and you merely use the APIs provided to attach to your application.

This definition treats the word “server” as referring to the hardware or Virtual Machine (VM) instance. You (the user) may be required to say what kind of Operating System (OS) or architecture you prefer the cloud service to run on, and you perhaps also choose your memory and/or storage requirements, but you are not responsible for the day-to-day operations relating to any of this infrastructure.

Definition 2: ”Serverless” means you (the user) do not have to manage the infrastructure that your cloud services run on.

You can spot the meaning of “serverless” in this context when the offering:

is engineering-oriented,
has a REST API, libraries, SDKs, or CLI client,
allows you to create instances of this offering for your usage (but you don’t have to manage the day-to-day operations like upgrading, patching, or monitoring), and
typically does not allow you to gain access directly to the operating system (such as a shell prompt or system administrator account).

Definition 3: Serverless as a Software Concept

Consider the process of building a Ruby on Rails, Python Django, or Node.js Express application. One of the first things you must do is write the code that starts a server to listen on a particular port. Or, if you go a level lower than these common frameworks, you might even have to create a socket server, attach a thread pool, and map incoming requests to an HTTP (or other protocol) handler. When you are doing any of these things, you are writing a software server.

In the software world, a server is a long-running process that listens for incoming requests (usually on a network connection) and then handles those requests. A server typically handles hundreds to millions of individual requests over its lifetime, which may span hours, days, months, or even years before the server is restarted.

Contrast this with a program where, instead of standing up an entire server, you merely write a function that starts up (receives a single request), handles that request, and then optionally returns a response before it shuts down. That is, each request executes the program from start to finish. Such a program may run for milliseconds, seconds, or perhaps several minutes. But rarely does it run longer.
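The contrast is easier to see side by side. Here's a minimal Python sketch: the first half is a small software server built on the standard library's http.server, and the second half is a handler-only function. The request and response dictionaries in the second half are a hypothetical shape for illustration, not any particular platform's API.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


# Style 1: a software server. You own the listening socket and the process lifetime.
class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from a long-running server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run_server(port: int = 8080) -> None:
    # Blocks and keeps serving requests until the process is stopped.
    HTTPServer(("127.0.0.1", port), HelloHandler).serve_forever()


# Style 2: a serverless-style handler. The platform owns the socket and the lifetime;
# you only describe what to do with one request. (Hypothetical request/response shape.)
def handle_request(request: dict) -> dict:
    return {"status": 200, "body": f"Hello from a handler for {request['path']}"}


# What a platform would do on your behalf for a single inbound request.
response = handle_request({"method": "GET", "path": "/"})
```

In the first style, calling run_server() is your job, and so is keeping it alive. In the second, there is nothing to start: the function runs once per request and is done.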

This is the serverless app model. And this is our third usage of the term “serverless”.

This model gained the name “serverless function” when Amazon released its Lambda offering.

Definition 3: “Serverless” means you do not have to write the software server that will listen for requests, nor do you need to manage the hardware or operating system.

You can spot the meaning of the term “serverless” in this context when:

you, the developer, write request (event) handlers instead of software servers,
a variety of SDKs, APIs, or tooling are provided to make it easier for you to write programs,
you do not need to manage server hardware or virtual machines, and
you also typically do not have administrator access or shell access to the environment executing your code.

What is a Serverless App 🤓

A serverless app is a thing that runs inside of an environment that manages all of the protocol-level and process-level aspects of serving content. Additionally, that serverless environment provides a layer of secure isolation from other serverless apps. In the strongest cases, this allows multitenant hosting, where two different customers or users can run their apps on the same serverless app platform without fear that the other users or customers can tamper with the app.

A serverless app is the piece of software you, as the developer, write and upload to a serverless application platform. And the code you write is started when a new request is received. Your code is expected to handle that request and perhaps return a response, at which point your code is shut down again (ready to be started again in the future).

For the most part, it is acceptable to use the terms “serverless app” and “serverless function” interchangeably. In this guide, we tend to use serverless app because it is more generic (and also shorter to type). Where it is necessary to distinguish between the two, we do so carefully.

Comparing Serverless Apps and PaaS 🗒️

It is useful to contrast a serverless app with an app written for a Platform-as-a-Service (PaaS). Heroku and the open-source Cloud Foundry are examples of PaaS. In contrast, Fermyon Cloud is an example of a serverless app platform, and Spin is a developer tool for building serverless apps.

For starters, Serverless apps are more cost-effective than long-running servers in a PaaS environment. A serverless app is simpler, faster, and more resource-efficient than a PaaS. But let's dive into the conceptual details to see why this is the case.

In a PaaS, you (the developer) write an application in any language the PaaS supports. And then you deploy that application to the PaaS service, where it is hosted on your behalf. Developer self-service is a core feature of a PaaS. That means developers can deploy their applications without relying upon an operations (DevOps or platform engineering) team.

Most serverless app platforms, including Fermyon Cloud, also provide the same kind of developer self-service, including a web dashboard, metrics and monitoring, and a host of tools to assist in developing and debugging.

Where a PaaS differs from serverless apps is, again, in the programming model. A PaaS follows definition 2 of serverless where the hardware is managed, but the developer must still write a software server.

Meanwhile, serverless apps follow the 3rd definition of serverless, whereby a developer writes only the small program that handles a request and does not have to worry about writing the software server.

If you are looking for a developer self-service platform to run long-running services that are always on, a PaaS is probably the solution you are after. If you are looking for a developer self-service platform where you can quickly write highly efficient apps that need to execute instantly and scale rapidly, you may prefer to take a look at serverless apps.

Conclusion 😊

That was a comprehensive look at defining Serverless and Serverless apps. In the rest of the series we will look at the characteristics of a serverless function, the use cases and of course some code.

Let us know in the comments if you are using Serverless already, and what your use cases are!