This is Part 6 of a 6-part series. Part 5 covers the knowledge loop.
# Deploy to Production & What to Build Next
You've built an AI assistant that searches your internal tools in parallel, synthesizes answers with Claude, learns from feedback, and lives in Slack. Now let's deploy it to production, look at real costs, and talk about the extensions that take Harper Eye from good to indispensable.
## Step 1: Deploy to Harper Fabric

If you've been developing locally, deploying to production is a single command. Make sure your `.env` has your Fabric credentials:

```
CLI_TARGET=your-cluster.your-org.harperfabric.com:9925
CLI_USERNAME=your-username
CLI_PASSWORD=your-password
```

And your `CONFIG.env` has all your API keys and tokens.

Deploy:

```bash
npm run deploy
```

This runs:

```bash
npx -y dotenv-cli -- harperdb deploy . restart=rolling replicated=true
```
What happens behind the scenes:

- Your application code gets pushed to the Fabric cluster
- Harper reads `schema.graphql` and creates/updates all tables and indexes
- Environment variables from `CONFIG.env` are loaded
- The cluster does a rolling restart with zero downtime
- Your Resource Classes become live HTTPS endpoints
The whole process takes about 5-8 seconds. Your application is now live at:

```
https://your-cluster.your-org.harperfabric.com/
```

Update your Slack app's event URLs to point at this endpoint, and you're done.
## Step 2: Verify Everything Works

Run through the checklist:

```bash
# Health check
curl https://your-cluster.your-org.harperfabric.com/HealthCheck

# API query
curl -X POST https://your-cluster.your-org.harperfabric.com/Api \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'user:pass' | base64)" \
  -d '{"query": "How does replication work?", "mode": "ask"}'
```

Then in Slack:

```
/harper-ask how does sharding work?
```
If you get a structured response with sources from your actual internal systems, you're live. You've just shipped a tool that would cost thousands per month from a vendor.
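If you'd rather script these checks than paste curl commands, the same smoke test can be sketched in Node. This is a minimal sketch: `basicAuthHeader` and `buildApiRequest` are illustrative helper names, not part of the codebase, and the cluster URL is a placeholder.

```javascript
// Build the Authorization header the /Api endpoint expects.
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64');
}

// Describe the /Api request without sending it, so the same object
// can feed fetch() in a script or assertions in a test.
function buildApiRequest(baseUrl, username, password, query) {
  return {
    url: `${baseUrl}/Api`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: basicAuthHeader(username, password),
    },
    body: JSON.stringify({ query, mode: 'ask' }),
  };
}

// Usage against your real cluster:
// const req = buildApiRequest('https://your-cluster.your-org.harperfabric.com',
//                             'user', 'pass', 'How does replication work?');
// const res = await fetch(req.url, req);
```

Because the request is built as plain data, you can assert on the headers and body before ever hitting the network.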
## The Real Cost Breakdown
I've been running Harper Eye in production for our team. Here are the actual numbers:
### Monthly Costs
| Resource | Cost | Notes |
|---|---|---|
| Harper Fabric | ~$25 | Single instance handles everything: HTTP, database, vector search |
| Claude API (Anthropic) | ~$30-50 | ~200-400 queries/month at ~$0.10-0.15/query (Sonnet) |
| Gemini Embeddings | $0 | Free tier: 1,500 requests/min. We use maybe 1,000/month |
| Confluence API | $0 | Included in existing Atlassian subscription |
| Zendesk API | $0 | Included in existing subscription |
| Datadog API | $0 | Included in existing subscription |
| GitHub API | $0 | Included (authenticated requests: 5,000/hr) |
| Slack API | $0 | Free for workspace apps |
| Total | ~$55-75/mo | For the entire organization |
### Cost Per Query
With the knowledge base doing its job, many queries hit the cache and never touch Claude:
| Query Type | Cost | Percentage of Queries |
|---|---|---|
| KB exact hit | ~$0.001 (just the embedding call) | ~30% after 2 months |
| Full orchestration | ~$0.10-0.15 (embedding + Claude) | ~70% initially, dropping over time |
| Blended average | ~$0.07/query | Getting cheaper every month |
The knowledge loop is doing real work. Every verified answer that gets cached means one fewer Claude API call in the future. After two months, about 30% of our queries return instant cached results. That percentage keeps climbing.
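The blended average is just a weighted sum of the two paths in the table. A quick sketch with the numbers above; `blendedCost` is an illustrative helper, not part of the codebase.

```javascript
// Average cost per query, given a KB cache-hit rate and the
// per-path costs from the table above.
function blendedCost(hitRate, kbHitCost, fullOrchestrationCost) {
  return hitRate * kbHitCost + (1 - hitRate) * fullOrchestrationCost;
}

// At 30% cache hits: 0.3 * $0.001 + 0.7 * $0.10 ≈ $0.07/query
console.log(blendedCost(0.3, 0.001, 0.10).toFixed(3)); // "0.070"
```

As the hit rate climbs, the blended cost falls toward the near-zero KB cost, which is exactly the trend in the table.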
### What This Replaces
| SaaS Alternative | Monthly Cost |
|---|---|
| Glean or Dashworks (AI search) | $800-2,000 |
| PagerDuty AIOps add-on | $500-1,000 |
| Incident.io or similar | $400-800 |
| Pinecone (vector DB) | $70-200 |
| Total replaced | $1,770-4,000/mo |
Savings: $1,700-3,900/month. That's $20,000-47,000/year.
And your custom system is better — it knows your terminology, your architecture, your people. It learns from your team's feedback. It doesn't forget when you cancel a subscription.
## Extensions Worth Building
Once you have the core running, these extensions are each a day or less of work:
### PagerDuty Webhooks

Create `resources/PagerDutyWebhook.js` to receive PagerDuty incident webhooks. When a new incident fires, Harper Eye automatically runs the full orchestration and posts the analysis to your Slack incident channel before any human even looks at it.
```javascript
// resources/PagerDutyWebhook.js
// `Resource` is a global provided by the Harper runtime. The helpers
// below come from the lib/ modules built in earlier parts of this
// series; adjust the import paths and names to match your project.
import { orchestrate } from '../lib/orchestrator.js';
import { config } from '../lib/config.js';
import {
  getSlackClient,
  formatPagerDutyHeader,
  formatIncidentResponse,
} from '../lib/slack-formatter.js';

export class PagerDutyWebhook extends Resource {
  static loadAsInstance = false;

  async post(target, data) {
    const event = data?.event;
    if (event?.event_type === 'incident.triggered') {
      const incident = event.data;

      // Auto-analyze the incident
      const result = await orchestrate(
        `PagerDuty incident: ${incident.title}. Service: ${incident.service?.summary}`,
        { mode: 'incident' }
      );

      // Post to your incident channel
      const slack = getSlackClient();
      await slack.chat.postMessage({
        channel: config.slack.incidentChannel(),
        blocks: [
          ...formatPagerDutyHeader(incident),
          ...formatIncidentResponse(result),
        ],
      });
    }
    return { ok: true };
  }
}
```
The result: PagerDuty fires at 2 am, and by the time your on-call engineer opens Slack, there's already an AI analysis with root cause candidates, relevant runbooks, customer impact assessment, and which colleague to escalate to.
### Web Dashboard

The `site/` directory serves static HTML at `/app/`. Build a simple dashboard for:
- Knowledge base management: view, edit, and delete KB entries
- Query analytics: most asked questions, resolution times, source utilization
- Feedback trends: which topics get the most negative feedback (signals for documentation gaps)
- Expert map: who on your team is the go-to person for which topics
No React. No build step. Vanilla HTML + CSS + JS that calls your `/Api` endpoint. Harper serves it as static files.
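For example, the dashboard's KB list can be a pure rendering function that the page wires to a `fetch` of your `/Api` endpoint. A sketch: the entry field names (`question`, `score`) and the function names are assumptions, so match them to your schema.

```javascript
// Escape user-authored text before injecting it into the page.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Turn KB entries from the API into list items.
function renderKbEntries(entries) {
  return entries
    .map((e) => `<li><strong>${escapeHtml(e.question)}</strong> (score: ${Number(e.score)})</li>`)
    .join('\n');
}

// In the page (endpoint path is hypothetical):
// const entries = await (await fetch('/Api', { /* ... */ })).json();
// document.querySelector('#kb-list').innerHTML = renderKbEntries(entries);
```

Keeping the rendering pure (data in, string out) is what makes a no-build-step UI easy to test.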
### Knowledge Capture from Slack

Add intent classification to @-mentions so engineers can say `@harper-eye save this` in a Slack thread, and Harper Eye automatically extracts the key Q&A from the thread discussion and saves it to the knowledge base. The tribal knowledge from a debugging session is preserved forever without anyone having to write a wiki page.
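A minimal sketch of the intent check, assuming Slack's `<@U…>` mention syntax in the message text; the phrase list and the `isSaveIntent` name are illustrative.

```javascript
// Detect a "save this" intent in an @-mention, after stripping the
// <@U...> mention token Slack prepends to the message text.
function isSaveIntent(mentionText) {
  const text = mentionText.replace(/<@[^>]+>/g, '').trim().toLowerCase();
  return /\bsave (this|that|thread)\b/.test(text);
}
```

Anything that doesn't match falls through to the normal question-answering path, so the check is safe to add in front of the existing @-mention handler.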
### Source Relevance Learning
Track which data sources actually contribute to useful answers. If Zendesk results never get cited for architecture questions, stop searching Zendesk for those queries. This saves API calls and reduces noise in Claude's context.
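One way to sketch this: count, per topic and source, how often a source's results were actually cited, and skip sources that consistently go unused. `SourceRelevanceTracker` and its thresholds are illustrative, not part of the codebase.

```javascript
// Tracks, per (topic, source) pair, how often searched results
// were actually cited in the final answer.
class SourceRelevanceTracker {
  constructor({ minSamples = 10, minCiteRate = 0.05 } = {}) {
    this.stats = new Map(); // "topic:source" -> { searched, cited }
    this.minSamples = minSamples;
    this.minCiteRate = minCiteRate;
  }

  record(topic, source, wasCited) {
    const key = `${topic}:${source}`;
    const s = this.stats.get(key) ?? { searched: 0, cited: 0 };
    s.searched += 1;
    if (wasCited) s.cited += 1;
    this.stats.set(key, s);
  }

  // Keep searching until we have enough samples; after that, drop
  // sources whose citation rate stays below the threshold.
  shouldSearch(topic, source) {
    const s = this.stats.get(`${topic}:${source}`);
    if (!s || s.searched < this.minSamples) return true;
    return s.cited / s.searched >= this.minCiteRate;
  }
}
```

In production these counters would live in a Harper table rather than in memory, but the decision logic is the same.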
## The Project Structure (Final)
```
harper-eye/
├── config.yaml               # Harper app config
├── schema.graphql            # 7 tables, 5 vector indexes
├── CONFIG.env                # Secrets (gitignored)
├── .env                      # Deploy creds (gitignored)
├── package.json
│
├── resources/                # HTTP endpoints (Resource Classes)
│   ├── SlackEvents.js        # Slack commands + @mentions
│   ├── SlackInteractivity.js # Feedback button handlers
│   ├── PagerDutyWebhook.js   # PagerDuty auto-analysis
│   ├── Api.js                # REST API for web dashboard
│   ├── HealthCheck.js        # GET /HealthCheck
│   └── Debug.js              # Debug endpoint
│
├── lib/                      # Core business logic
│   ├── orchestrator.js       # AI orchestration (the brain)
│   ├── knowledge-base.js     # KB CRUD + vector search + feedback
│   ├── embeddings.js         # Gemini embedding generation
│   ├── config.js             # Config loader
│   ├── slack-formatter.js    # Slack Block Kit formatting
│   └── slack-mentions.js     # @-mention expert suggestions
│
├── mcp/                      # Data source wrappers
│   ├── confluence.js         # Confluence REST API
│   ├── zendesk.js            # Zendesk REST API
│   ├── datadog.js            # Datadog REST API
│   ├── github.js             # GitHub REST API
│   └── harper-docs.js        # Documentation site search
│
└── site/                     # Static web UI
    ├── index.html            # Knowledge base dashboard
    ├── dashboard.html        # Analytics
    └── help.html             # Help page
```
Total lines of core logic: about 700. Total external infrastructure: zero (beyond Harper itself). Total time to build: 3 days.
## What You've Built
Let's take a final inventory. Harper Eye is not a mockup: it's a production system that 35 engineers use every day, running for under $100/month. Here's what you now have:
An AI assistant that:
- Searches 6+ internal data sources in parallel (2-4 seconds)
- Synthesizes results with Claude into structured, cited responses
- Lives in Slack with slash commands, @mentions, and threaded follow-ups
- Returns verified cached answers instantly (sub-second)
- Learns from team feedback without manual curation
- Automatically degrades and purges bad answers
- Knows which engineers are experts on which topics
- Costs under $100/month for your entire organization
Running on a stack of:
- Harper: one platform for HTTP, database, and vector search
- Claude: AI synthesis with structured JSON output
- Gemini: embedding generation (free tier)
- Vanilla JS: served as static files by Harper, no framework, no build step
The compounding effects over time:
- Month 1: ~0% KB cache hits. Every query goes through full orchestration.
- Month 2: ~15-20% cache hits. Common questions return instantly.
- Month 3: ~25-35% cache hits. Negative feedback has pruned bad answers.
- Month 6: ~40-50% cache hits. Your AI costs are dropping while quality improves.
- Month 12+: The knowledge base becomes your most valuable internal asset. It's the institutional memory that survives employee turnover.
## The Bigger Picture
Here's what I've learned from building and running Harper Eye:
**Custom beats generic, always.** No vendor product will ever understand your architecture, your terminology, your people, or your incident history the way a custom tool does. The gap isn't about AI capability; Claude is the same Claude whether you use it through a vendor or directly. The gap is about context. Your context.

**The feedback loop is everything.** An AI assistant without a learning mechanism is a party trick. One that gets smarter every time someone uses it is an institution. The knowledge loop (verified answers, negative feedback, automatic degradation) is what makes Harper Eye more valuable the longer you run it.

**Infrastructure should disappear.** I didn't want to manage Postgres, Pinecone, Redis, Express, and a deployment pipeline. Harper let me put all of that in a single `schema.graphql` and `config.yaml`. The less time you spend on infrastructure, the more time you spend on the logic that actually makes your tool useful.

**Build it yourself, but build it with the right tools.** I built Harper Eye in 3 days, not because I'm fast, but because Claude wrote most of the code and Harper eliminated most of the infrastructure. The combination of AI-assisted development and a unified platform is what makes this feasible for a single engineer.
If you've followed along this far, you have everything you need. The code is real. The architecture is battle-tested. The costs are provably lower than the alternatives.
Build it. Run it. Let your team use it. Watch it get smarter.
And stop paying thousands a month for SaaS that doesn't know your name.
*If you build your own version, I'd love to hear about it. Reach out on the Harper Discord.*



