This is Part 2 of a 6-part series on building your own AI-powered internal assistant. Part 1 covers the why and architecture overview.
Harper Setup, Schema & Your First Endpoint
In Part 1, you saw what Harper Eye can do: structured incident analysis in Slack, expert routing, and a self-improving knowledge base. Now we build the foundation. In this part, you'll go from zero to a running application with a database, vector-indexed tables, and a deployed, live REST API. It takes about 30 minutes.
By the end of this article, you'll have the infrastructure behind everything in the Harper Eye dashboard.
Step 1: Install Harper & Create Your Project
If you want to develop locally first (recommended), install Harper:
```shell
npm install -g harperdb
```
Now scaffold your project:
```shell
mkdir harper-eye && cd harper-eye
npm init -y
```
Update your package.json with the dependencies we'll need:
```json
{
  "name": "harper-eye",
  "version": "1.0.0",
  "description": "AI-powered ops assistant — searches internal tools, synthesizes with Claude, learns from feedback",
  "type": "module",
  "engines": {
    "harperdb": "^4.4"
  },
  "dependencies": {
    "@anthropic-ai/sdk": "^0.39.0",
    "@slack/web-api": "^7.8.0"
  },
  "scripts": {
    "start": "harperdb run .",
    "dev": "harperdb dev .",
    "deploy": "npx -y dotenv-cli -- harperdb deploy . restart=rolling replicated=true"
  }
}
```
```shell
npm install
```
Notice what's not there: no Express, no database driver, no vector DB client, no Redis. Harper handles HTTP routing, data storage, and vector search natively. Your node_modules stays small.
Step 2: Define Your Database Schema
This is where Harper shines. Instead of writing migrations or setting up a separate database, you define your schema in a single GraphQL file and Harper creates everything automatically.
Create schema.graphql in your project root:
```graphql
type IncidentQuery @table @export {
  id: ID @primaryKey
  slackUserId: String
  slackChannelId: String
  conversationId: String
  query: String
  response: String
  sourcesUsed: [String]
  createdAt: String
  resolvedAt: String
}

type KnowledgeEntry @table @export {
  id: ID @primaryKey
  query: String
  queryEmbedding: [Float] @indexed(type: "HNSW", distance: "cosine")
  answer: String
  sources: [String]
  originalIncidentId: String
  approvedByUserId: String
  approvedAt: String
  channelId: String
  useCount: Int
  lastUsedAt: String
  negativeCount: Int
}

type NegativeFeedback @table @export {
  id: ID @primaryKey
  queryId: String
  originalQuery: String
  queryEmbedding: [Float] @indexed(type: "HNSW", distance: "cosine")
  category: String
  details: String
  userId: String
  channelId: String
  knowledgeEntryId: String
  createdAt: String
}

type AuditLog @table @export {
  id: ID @primaryKey
  action: String
  userEmail: String
  query: String
  details: String
  source: String
  queryId: String
  entryId: String
  createdAt: String
}

type SlackExpertise @table @export {
  id: ID @primaryKey
  userId: String
  userName: String
  topic: String
  topicEmbedding: [Float] @indexed(type: "HNSW", distance: "cosine")
  channelId: String
  channelName: String
  answerCount: Int
  questionCount: Int
  lastActiveAt: String
  indexedAt: String
}

type SourceRelevance @table @export {
  id: ID @primaryKey
  query: String
  queryEmbedding: [Float] @indexed(type: "HNSW", distance: "cosine")
  mode: String
  sourceSignals: String
  createdAt: String
}

type CodeKnowledge @table @export {
  id: ID @primaryKey
  repo: String
  filePath: String
  fileName: String
  fileType: String
  content: String
  contentEmbedding: [Float] @indexed(type: "HNSW", distance: "cosine")
  description: String
  tier: Int
  lastIndexedAt: String
  commitSha: String
  url: String
}
```
Let me break down what's happening here, because there's a lot of power in these few lines:
- `@table` tells Harper to create a database table. `@export` exposes it as a REST endpoint automatically, so you get CRUD over HTTP (GET, PUT, POST, PATCH, DELETE) for free.
- `@primaryKey` designates the unique identifier.
- `@indexed(type: "HNSW", distance: "cosine")` is the magic line. It tells Harper to build an HNSW (Hierarchical Navigable Small World) vector index on that field, using cosine distance for similarity. HNSW is the same graph-based approximate nearest-neighbor algorithm that many dedicated vector databases rely on, but here it's built into Harper natively: no separate service, no extra cost.
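To make `distance: "cosine"` concrete, here is a minimal sketch of the measure in plain JavaScript. It assumes the common definition of cosine distance as 1 minus cosine similarity, which ranges from 0 (same direction) to 2 (opposite directions). `cosineDistance` is a hypothetical helper for illustration only; Harper computes distances internally and you never call anything like it yourself.

```javascript
// Cosine distance = 1 - (a . b) / (|a| * |b|).
// 0 = same direction, 1 = orthogonal, 2 = opposite. Vector length does not matter.
function cosineDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine distance is undefined for zero vectors');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [1, 0])); // 0
console.log(cosineDistance([1, 0], [0, 1])); // 1
```

Because only direction counts, an embedding and a scaled copy of it are treated as identical, which is usually what you want for text embeddings.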
We have five vector-indexed fields across our tables:
| Table | Vector Field | Purpose |
|---|---|---|
| KnowledgeEntry | queryEmbedding | Find similar past questions to return cached answers |
| NegativeFeedback | queryEmbedding | Find similar past complaints to avoid repeating mistakes |
| SlackExpertise | topicEmbedding | Find engineers who are experts on similar topics |
| CodeKnowledge | contentEmbedding | Find relevant source code by semantic meaning |
| SourceRelevance | queryEmbedding | Learn which data sources matter for which query types |
That's five HNSW indexes running natively inside Harper, with zero additional infrastructure.
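Conceptually, each of these indexes answers one question: which stored vectors are closest to this query vector? The sketch below answers it by brute force over an in-memory array, which is the exact result an HNSW index approximates without scanning every record. The record shape loosely mirrors `KnowledgeEntry`, but `nearestEntries`, the 3-dimensional toy embeddings, and the `maxDistance` cutoff are illustrative assumptions; real embeddings have hundreds of dimensions.

```javascript
// Exact (brute-force) k-nearest-neighbor search by cosine distance.
// An HNSW index returns approximately the same ranking without a full scan.
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Up to k entries whose queryEmbedding lies within maxDistance of the query, closest first.
function nearestEntries(entries, queryVector, k = 3, maxDistance = 0.2) {
  return entries
    .map((entry) => ({ ...entry, distance: cosineDistance(entry.queryEmbedding, queryVector) }))
    .filter((entry) => entry.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Toy "cached answers" with made-up 3-D embeddings.
const entries = [
  { id: 'kb-1', query: 'How does replication work?', queryEmbedding: [0.9, 0.1, 0.0] },
  { id: 'kb-2', query: 'Why is my disk full?', queryEmbedding: [0.0, 0.2, 0.9] },
  { id: 'kb-3', query: 'Replication lag between nodes', queryEmbedding: [0.6, 0.5, 0.2] },
];

// kb-1 and kb-3 match (kb-1 is closer); kb-2 is filtered out by maxDistance.
const hits = nearestEntries(entries, [0.85, 0.2, 0.05]);
console.log(hits.map((h) => `${h.id} (${h.distance.toFixed(3)})`));
```

The `maxDistance` cutoff matters as much as the ranking: a nearest neighbor always exists, so without a threshold a cache lookup would "find" an answer even for an unrelated question.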
Step 3: Configure Your Application
Create config.yaml in your project root. This is Harper's application configuration; it tells Harper what to load and how to serve it:
```yaml
# Load environment variables from CONFIG.env
loadEnv:
  files: 'CONFIG.env'

# Enable the REST API for all exported resources
rest: true

# Read GraphQL schemas to define database tables
graphqlSchema:
  files: 'schema.graphql'

# Load JavaScript modules as HTTP endpoints
jsResource:
  files: 'resources/*.js'

# Serve static web UI files at /app/
static:
  files: 'site/**'
  urlPath: 'app'
  index: true
  extensions: ['html']
```
Five top-level YAML entries give you environment variable loading, REST API generation, schema-driven database creation, custom HTTP endpoints, and static file serving. No boilerplate. No middleware chain. No app.use() rituals.
Now create your CONFIG.env file for secrets (add it to .gitignore and never commit it):
```shell
# Claude (Anthropic)
CLAUDE_API_KEY=sk-ant-your-key-here
CLAUDE_MODEL=claude-sonnet-4-20250514

# Gemini (Embeddings)
GEMINI_API_KEY=your-gemini-key-here
GEMINI_EMBEDDING_MODEL=text-embedding-004

# Slack
SLACK_BOT_TOKEN=xoxb-your-bot-token
SLACK_VERIFICATION_TOKEN=your-verification-token
SLACK_SIGNING_SECRET=your-signing-secret

# Confluence (optional: add your data sources as you have them)
CONFLUENCE_BASE_URL=https://yourcompany.atlassian.net
CONFLUENCE_EMAIL=your-email@company.com
CONFLUENCE_API_TOKEN=your-confluence-token

# Zendesk (optional)
ZENDESK_SUBDOMAIN=yourcompany
ZENDESK_EMAIL=your-email@company.com
ZENDESK_API_TOKEN=your-zendesk-token

# Datadog (optional)
DATADOG_API_KEY=your-datadog-api-key
DATADOG_APP_KEY=your-datadog-app-key

# GitHub (optional)
GITHUB_TOKEN=ghp_your-github-token
GITHUB_ORG=your-org-name
```
Don't worry if you don't have all of these yet. Harper Eye is designed to gracefully skip any source that isn't configured, so you can start with just Claude and one data source, then add more over time.
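Harper's `loadEnv` reads CONFIG.env for you. To show what the format amounts to, here is a tiny parser under conventional dotenv-style assumptions: one `KEY=VALUE` per line, `#` for comment lines, no quoting or multi-line values. `parseEnvFile` is a hypothetical illustration, not Harper's loader. It also makes one practical point visible: with a loader like this, a key left blank arrives as an empty string, and since `''` is falsy, the config getters treat that source as unconfigured exactly as if the key were absent.

```javascript
// Minimal KEY=VALUE parser for CONFIG.env-style files (illustration only).
// Assumptions: one assignment per line, '#' starts a full-line comment,
// no quoting or multi-line values. Harper's loadEnv does the real loading.
function parseEnvFile(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip blanks and comments
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not an assignment
    // Split on the first '=' only, so values may themselves contain '='.
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

const sample = `
# Claude (Anthropic)
CLAUDE_API_KEY=sk-ant-your-key-here
CLAUDE_MODEL=claude-sonnet-4-20250514
# Datadog (optional, left blank for now)
DATADOG_API_KEY=
`;

const vars = parseEnvFile(sample);
// DATADOG_API_KEY is '' here, which the env() getters treat as missing.
console.log(Object.keys(vars), Boolean(vars.DATADOG_API_KEY));
```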
Create a config loader at lib/config.js:
```javascript
/**
 * Config loader — reads values from environment variables (loaded from CONFIG.env).
 * Each getter throws if the value is missing, so unconfigured sources
 * are caught gracefully by the orchestrator's try/catch blocks.
 */
function env(key) {
  return () => {
    const val = process.env[key];
    if (!val) throw new Error(`Missing env: ${key}`);
    return val;
  };
}

export const config = {
  claude: {
    apiKey: env('CLAUDE_API_KEY'),
    model: () => process.env.CLAUDE_MODEL || 'claude-sonnet-4-20250514',
  },
  gemini: {
    apiKey: env('GEMINI_API_KEY'),
    embeddingModel: () => process.env.GEMINI_EMBEDDING_MODEL || 'text-embedding-004',
  },
  slack: {
    botToken: env('SLACK_BOT_TOKEN'),
    verificationToken: env('SLACK_VERIFICATION_TOKEN'),
    signingSecret: env('SLACK_SIGNING_SECRET'),
  },
  confluence: {
    baseUrl: env('CONFLUENCE_BASE_URL'),
    email: env('CONFLUENCE_EMAIL'),
    apiToken: env('CONFLUENCE_API_TOKEN'),
  },
  zendesk: {
    subdomain: env('ZENDESK_SUBDOMAIN'),
    email: env('ZENDESK_EMAIL'),
    apiToken: env('ZENDESK_API_TOKEN'),
  },
  datadog: {
    apiKey: env('DATADOG_API_KEY'),
    appKey: env('DATADOG_APP_KEY'),
  },
  github: {
    token: env('GITHUB_TOKEN'),
    org: () => process.env.GITHUB_ORG || '',
  },
};
```
This pattern is intentional. Each config getter is a function that throws if the value is missing. That means the orchestrator can wrap source calls in try/catch and gracefully skip any source that isn't configured. You never have to comment out code or manage feature flags; just add or remove environment variables.
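The skip-if-unconfigured pattern can be sketched end to end without Harper. Only `env()` comes from the config loader; the two stub sources, `searchConfiguredSources`, and the result shape are hypothetical stand-ins for the orchestrator built in Part 3. The sketch calls every source in parallel and catches per source, so a missing variable removes that source from the answer instead of failing the whole request.

```javascript
// Sketch of the "skip unconfigured sources" pattern.
// env() mirrors lib/config.js; the sources below are hypothetical stubs.
function env(key) {
  return () => {
    const val = process.env[key];
    if (!val) throw new Error(`Missing env: ${key}`);
    return val;
  };
}

// Each stub reads its credential through a throwing getter, then "searches".
const sources = {
  confluence: {
    token: env('CONFLUENCE_API_TOKEN'),
    search: async (query, _token) => [`Confluence result for "${query}"`],
  },
  zendesk: {
    token: env('ZENDESK_API_TOKEN'),
    search: async (query, _token) => [`Zendesk result for "${query}"`],
  },
};

// Query all sources in parallel; any failure (including missing config)
// lands in `skipped` instead of rejecting the whole request.
async function searchConfiguredSources(query) {
  const outcomes = await Promise.all(
    Object.entries(sources).map(async ([name, source]) => {
      try {
        const token = source.token(); // throws if the env var is missing
        return { name, hits: await source.search(query, token) };
      } catch (err) {
        return { name, error: err.message };
      }
    }),
  );
  const results = {};
  const skipped = {};
  for (const { name, hits, error } of outcomes) {
    if (error) skipped[name] = error;
    else results[name] = hits;
  }
  return { results, skipped };
}

// Usage: pretend only Confluence is configured.
process.env.CONFLUENCE_API_TOKEN = 'demo-token';
delete process.env.ZENDESK_API_TOKEN;
searchConfiguredSources('replication lag').then((out) => console.log(out));
```

Catching every error, not just missing-config errors, also keeps one flaky source from sinking the rest. The trade-off is that real failures show up in `skipped` next to unconfigured sources, so you would probably want to log those messages.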
Step 4: Build Your First Resource Class Endpoint
Resource Classes are how you define HTTP endpoints in Harper. If you've used Express, think of them as route handlers, but with a few important differences.
Create the remaining project directories and add a health check endpoint (lib/ already exists from the config step, and mkdir -p simply skips it):
```shell
mkdir -p resources lib mcp site
```
Create resources/HealthCheck.js:
```javascript
import { Resource } from 'harperdb';

export class HealthCheck extends Resource {
  static loadAsInstance = false;

  async get() {
    return {
      status: 'ok',
      service: 'harper-eye',
      timestamp: new Date().toISOString(),
    };
  }
}
```
That's it. That's a complete HTTP endpoint. The class name HealthCheck becomes the URL path /HealthCheck. The get() method handles GET requests. The return value is automatically serialized to JSON.
Critical rules for Resource Classes (these will save you hours of debugging):

- Always use `static loadAsInstance = false`. This opts into Harper's newer Resource API, in which Harper does not pre-load a record into the instance and instead passes the request target to your methods as their first argument. Without it, the argument conventions below don't hold and you'll get confusing behavior.
- Use instance methods, not static methods: write `async get()`, not `static async get()`. The static methods are Harper's internal entry points.
- The first argument to `post()` is NOT the request body; it's the target (what the request URL addresses, such as a record ID). The body is the second argument:

```javascript
// ✅ CORRECT
async post(target, data) {
  const body = data; // This is your POST body
}

// ❌ WRONG: target is not the body
async post(request) {
  const body = request.body; // This won't work
}
```

- Headers come from `this.getContext()`, not from a request parameter:

```javascript
async post(target, data) {
  const context = this.getContext();
  const authHeader = context.headers.get('authorization');
}
```

- To return errors, throw an error with a `statusCode` property; don't construct a `Response` object:

```javascript
// ✅ CORRECT
const err = new Error('Unauthorized');
err.statusCode = 401;
throw err;

// ❌ WRONG
return new Response('Unauthorized', { status: 401 });
```
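Because the throw-with-`statusCode` pattern repeats in every endpoint, a tiny helper keeps it readable. `httpError` and `requireAuthHeader` are hypothetical names, not Harper exports; the only thing this sketch relies on is the convention above that Harper reads `statusCode` from a thrown error.

```javascript
// Hypothetical helper (not a Harper export): an Error that carries the HTTP status.
// Harper uses the statusCode property of a thrown error as the response status.
function httpError(statusCode, message) {
  const err = new Error(message);
  err.statusCode = statusCode;
  return err;
}

// Example guard in the style of Api.post(). Inside a Resource method you would
// pass it this.getContext().headers.get('authorization').
function requireAuthHeader(authHeader) {
  if (!authHeader) throw httpError(401, 'Authorization required');
  return authHeader;
}

// Usage inside a Resource method:
//   const authHeader = requireAuthHeader(this.getContext().headers.get('authorization'));
```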
Step 5: Build the REST API Endpoint
Now let's build the endpoint that the web dashboard will use. It accepts questions via HTTP POST and, once the orchestrator is wired in (Part 3), returns AI-generated answers. A GET returns the 20 most recent queries.
Create resources/Api.js:
```javascript
import { Resource, tables } from 'harperdb';
import crypto from 'crypto';
import { config } from '../lib/config.js';

export class Api extends Resource {
  static loadAsInstance = false;

  async post(target, data) {
    const context = this.getContext();

    // Basic auth check (we'll build proper auth later)
    const authHeader = context.headers.get('authorization');
    if (!authHeader) {
      const err = new Error('Authorization required');
      err.statusCode = 401;
      throw err;
    }

    // `mode` is accepted but not used by the placeholder response yet
    const { query, mode = 'ask' } = data ?? {};
    // Reject missing, blank, or non-string queries (trim() on a number would throw a TypeError)
    if (typeof query !== 'string' || !query.trim()) {
      const err = new Error('Missing "query" in request body');
      err.statusCode = 400;
      throw err;
    }

    // For now, return a placeholder; we'll wire in the orchestrator in Part 3
    const queryId = crypto.randomUUID();
    const result = {
      queryId,
      summary: `Received your question: "${query}" — orchestrator coming in Part 3!`,
      sources: [],
      steps: [],
    };

    // Save the query to the audit trail
    await tables.IncidentQuery.put({
      id: queryId,
      query: query.trim(),
      response: JSON.stringify(result),
      sourcesUsed: [],
      createdAt: new Date().toISOString(),
    });

    return result;
  }

  async get() {
    // Return recent queries for the dashboard, newest first
    const queries = [];
    for await (const entry of tables.IncidentQuery.search({
      sort: { attribute: 'createdAt', descending: true },
      limit: 20,
    })) {
      queries.push({
        id: entry.id,
        query: entry.query,
        createdAt: entry.createdAt,
      });
    }
    return { queries };
  }
}
```
Notice how we're using tables.IncidentQuery directly: no connection strings, no query builders, no ORM. Harper tables are imported from the harperdb module and are available anywhere in your application code. You put() to write, get() to read by ID, and search() to query.
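Storing `createdAt` as a `String` still sorts correctly here because `toISOString()` always produces fixed-width UTC timestamps (for four-digit years), so comparing them as strings gives chronological order. The sketch below applies the same newest-first-then-limit logic to a plain array. `recentQueries` is a hypothetical helper, and it assumes every timestamp came from `toISOString()`; mixing in other date formats would break the ordering.

```javascript
// Newest-first ordering plus a limit, mirroring the search() call in Api.get().
function recentQueries(rows, limit = 20) {
  return [...rows]
    // Fixed-width ISO 8601 strings compare lexicographically in time order.
    .sort((a, b) => (a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0))
    .slice(0, limit)
    // Return only the fields the dashboard needs.
    .map(({ id, query, createdAt }) => ({ id, query, createdAt }));
}

const rows = [
  { id: 'q1', query: 'How does replication work?', createdAt: new Date('2025-02-24T09:00:00Z').toISOString() },
  { id: 'q2', query: 'Why is node-3 lagging?', createdAt: new Date('2025-02-24T10:30:00Z').toISOString() },
  { id: 'q3', query: 'Disk alerts on prod', createdAt: new Date('2025-02-23T23:59:59Z').toISOString() },
];

console.log(recentQueries(rows, 2).map((r) => r.id)); // [ 'q2', 'q1' ]
```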
Step 6: Run It Locally
Start the Harper dev server:
```shell
npm run dev
```
Harper will read your config.yaml, create the tables defined in schema.graphql, and start serving your Resource Classes. You should see output like:
```text
Harper is running
Application loaded from /path/to/harper-eye
```
Test your health check:
```shell
curl http://localhost:9926/HealthCheck
```

```json
{
  "status": "ok",
  "service": "harper-eye",
  "timestamp": "2025-02-24T10:30:00.000Z"
}
```
Test the API endpoint:
```shell
curl -X POST http://localhost:9926/Api \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{"query": "How does replication work?", "mode": "ask"}'
```

```json
{
  "queryId": "a1b2c3d4-...",
  "summary": "Received your question: \"How does replication work?\" — orchestrator coming in Part 3!",
  "sources": [],
  "steps": []
}
```
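If you'd rather call the endpoint from JavaScript than from curl, the request can be assembled with standard APIs only. `buildApiRequest` is a hypothetical helper, and `admin`/`password` are the same placeholder credentials as in the curl example. The sketch only builds the URL, headers, and JSON body; sending is left to `fetch`, which is built into Node 18+.

```javascript
// Hypothetical helper: builds the same request as the curl example above.
function buildApiRequest(baseUrl, username, password, query, mode = 'ask') {
  // Same value the shell produces with: echo -n 'user:pass' | base64
  const credentials = Buffer.from(`${username}:${password}`).toString('base64');
  return {
    url: new URL('/Api', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${credentials}`,
      },
      body: JSON.stringify({ query, mode }),
    },
  };
}

const { url, init } = buildApiRequest('http://localhost:9926', 'admin', 'password', 'How does replication work?');
console.log(url); // http://localhost:9926/Api
```

To send it from an async function or an ES module: `const res = await fetch(url, init); console.log(await res.json());`.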
Your data is already being stored. Check it:
```shell
curl http://localhost:9926/IncidentQuery \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)"
```
You'll see the query you just sent, stored in the database with a UUID, timestamp, and the response. This is the audit trail — every question your team ever asks Harper Eye gets recorded here.
Step 7: Deploy to Harper Fabric
Local development is great, but let's get this live. Sign up for Harper Fabric at fabric.harper.fast if you haven't already, then create a cluster.
Create a .env file (add it to .gitignore as well) with your deploy credentials:
```shell
CLI_TARGET=your-cluster.your-org.harperfabric.com:9925
CLI_USERNAME=your-username
CLI_PASSWORD=your-password
```
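Before deploying, a quick check that those three variables are actually set can save a confusing failed deploy. The sketch below is a hypothetical preflight helper, not part of Harper or dotenv-cli; it treats blank values as missing, the same way the config getters do.

```javascript
// Hypothetical preflight helper: which deploy settings are missing or blank?
const REQUIRED_DEPLOY_VARS = ['CLI_TARGET', 'CLI_USERNAME', 'CLI_PASSWORD'];

function missingDeployVars(env) {
  return REQUIRED_DEPLOY_VARS.filter((name) => !env[name]);
}

// Report on the current environment without changing the exit code.
const missing = missingDeployVars(process.env);
console.log(
  missing.length === 0
    ? 'Deploy settings found.'
    : `Missing deploy settings: ${missing.join(', ')}`,
);
```

Saved as, say, scripts/check-deploy-env.js (a hypothetical path), it would need to run through the same loader as the deploy script, for example `npx -y dotenv-cli -- node scripts/check-deploy-env.js`, because plain `node` doesn't load .env by itself. In a real preflight you'd also set `process.exitCode = 1` when something is missing so a CI job stops there.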
Deploy:
```shell
npm run deploy
```
That's it. Harper deploys your application code, creates the tables from your schema, loads your environment variables, and starts serving. The deploy takes about 5-8 seconds for a restart.
Your application is now live at:
https://your-cluster.your-org.harperfabric.com/
Test it:
```shell
curl https://your-cluster.your-org.harperfabric.com/HealthCheck
```
What You Have So Far
Let's take stock. In about 30 minutes, you've built and deployed:
- A database with 7 tables, including 5 HNSW vector indexes for semantic search
- A REST API with automatic CRUD for all tables, plus two custom endpoints
- An audit trail that records every query
- A production deployment on Harper Fabric with a public HTTPS endpoint
Your project structure looks like this:
```text
harper-eye/
├── config.yaml          # Harper app config
├── schema.graphql       # Database tables + vector indexes
├── CONFIG.env           # Secrets (gitignored)
├── .env                 # Deploy credentials (gitignored)
├── package.json
├── lib/
│   └── config.js        # Config loader
├── resources/
│   ├── HealthCheck.js   # GET /HealthCheck
│   └── Api.js           # POST /Api (query endpoint)
├── mcp/                 # (empty — data source wrappers in Part 3)
└── site/                # (empty — web UI in Part 6)
```
Total lines of code: about 120. Total external services: zero (Harper handles everything). Total monthly cost so far: whatever your Harper Fabric plan costs, which can be nothing at all on a free plan.
What's Next
In Part 3, we build the AI brain: the orchestrator that searches all your data sources in parallel, feeds the results to Claude, and returns structured, cited responses. That's where Harper Eye goes from "a database with an API" to "the most useful tool your engineering team has ever had."
We'll build:
- The MCP tool wrappers for Confluence, Zendesk, Datadog, and GitHub
- The embedding generation pipeline using Gemini
- The parallel source fetcher
- The Claude orchestration loop with structured JSON output
- The response parser with fallback strategies
It's about 400 lines of code, and it's the most satisfying code you'll write all year.
