Paste your LinkedIn, GitHub, Twitter, and resume — get a brutally honest AI investigation of your entire internet personality.
Introduction
Internet Detective AI is a Next.js 15 application that accepts a person's digital footprint (LinkedIn URL, GitHub URL, Twitter handle, resume text) and runs it through a 7-agent AI pipeline to produce a detailed, entertaining investigation report. The output includes evidence-based facts, behavioral signals, hidden obsessions, a career prediction, a startup parody pitch, coworker quotes, a brutal roast, internet personality scores, and a "cooked level" meter.
It serves two purposes simultaneously:
- A viral consumer app — Shareable, funny, and surprisingly insightful. Users compete for the best roasts and personality scores.
- A reference architecture for production AI engineering — Every layer is designed to be educational, extensible, and production-ready.
The entire system was built in 24 hours and spans 14 areas of modern AI engineering, each documented independently in the docs/ folder.
The Vision
Most AI demo projects are either fun but shallow (a single API call wrapped in a nice UI) or educational but boring (a Jupyter notebook with no frontend). Internet Detective AI bridges this gap:
A single codebase that serves as both a genuinely fun consumer app AND a comprehensive reference architecture.
Every engineering decision was made with two audiences in mind: end-users who just want to see their roast, and developers who want to understand how production AI systems work. The result is a project that scales from "clone and run in 5 minutes" to "study every production pattern used by AI engineering teams."
Architecture Overview
The system follows a layered architecture where each layer has a single responsibility:
Next.js 15 App
  Pages: Landing Page │ Investigation Report Page │ Developer Dashboard
      ↓
  API Routes
      ↓
Multi-Agent Orchestrator
  Profile Analyst → Signal Detector → Career Predictor → Startup Generator → Roast Agent
      ↓
  Governance Agent
      ↓
  Synthesis Agent
      ↓
AI Service Layer
      ↓
Provider Abstraction Layer: Zen │ OpenAI │ Anthropic │ Gemini │ OpenRouter
The request flow is: HTTP Request → API Route → Context Builder → Orchestrator → 5 analysis agents (Profile Analyst through Roast Agent) → Governance loop → Final Synthesis → Response.
Provider Abstraction
The most fundamental architectural decision was decoupling every agent from any specific AI provider. We use the Adapter pattern with a ProviderAdapter interface:
// src/lib/providers/types.ts
export interface ProviderAdapter {
name: ProviderType;
chat(request: ChatCompletionRequest): Promise<ChatCompletionResponse>;
getModels(): Promise<string[]>;
isAvailable(): Promise<boolean>;
}
Every provider implements this interface. The factory handles instantiation and configuration:
// src/lib/providers/factory.ts
export class ProviderFactory {
private static registry = loadConfig();
static createProvider(type: ProviderType): ProviderAdapter {
const registration = ProviderFactory.registry[type];
if (!registration) {
throw new Error(`Unknown provider type: ${type}`);
}
if (!registration.config) {
throw new Error(
`Provider "${type}" is not configured. Set the required environment variable.`,
);
}
return new registration.constructor(registration.config);
}
static getDefaultProvider(): ProviderAdapter {
const providerType = (process.env.AI_PROVIDER || "zen") as ProviderType;
return ProviderFactory.createProvider(providerType);
}
}
The registry supports 7 providers out of the box: Zen, OpenAI, Anthropic, Gemini, OpenRouter, Featherless, and Ollama. Each provider is gated by environment variables — if the key is missing, it's simply excluded from getAllAvailableProviders().
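The environment-variable gating can be sketched in a few lines. This is a minimal illustration under assumptions: the provider subset, the `envKey` field, and the variable names (`ZEN_API_KEY`, `OLLAMA_BASE_URL`, etc.) are hypothetical, and this `getAllAvailableProviders()` is not the project's actual implementation.

```typescript
// Hypothetical subset of providers and env var names; the real registry may differ.
type ProviderType = "zen" | "openai" | "anthropic" | "ollama";

interface ProviderRegistration {
  envKey: string; // env var that must be set for the provider to be usable
}

const REGISTRY: Record<ProviderType, ProviderRegistration> = {
  zen: { envKey: "ZEN_API_KEY" },
  openai: { envKey: "OPENAI_API_KEY" },
  anthropic: { envKey: "ANTHROPIC_API_KEY" },
  ollama: { envKey: "OLLAMA_BASE_URL" },
};

// A provider counts as available only when its env var is present and non-empty.
function getAllAvailableProviders(
  env: Record<string, string | undefined>,
): ProviderType[] {
  return (Object.keys(REGISTRY) as ProviderType[]).filter((type) => {
    const value = env[REGISTRY[type].envKey];
    return value !== undefined && value.trim() !== "";
  });
}
```

In a Node runtime you would call it as `getAllAvailableProviders(process.env)`; taking the environment as a parameter keeps the function easy to test.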
The BaseProvider abstract class adds shared behavior like error handling with retry logic, latency measurement, and cost calculation:
// src/lib/providers/base.ts
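export abstract class BaseProvider implements ProviderAdapter {
abstract name: ProviderType;
protected config: ProviderConfig;
constructor(config: ProviderConfig) {
this.config = config;
}
// Concrete providers implement the ProviderAdapter methods
abstract chat(request: ChatCompletionRequest): Promise<ChatCompletionResponse>;
abstract getModels(): Promise<string[]>;
abstract isAvailable(): Promise<boolean>;
// Delegates to retryWithBackoff with a limit of 3
protected async executeWithRetry(
request: ChatCompletionRequest,
): Promise<ChatCompletionResponse> {
return retryWithBackoff(() => this.chat(request), 3);
}
// Normalizes any thrown value into an AIProviderError; 5xx and 429 are marked retryable
protected handleError(error: unknown, context: string): never {
if (error instanceof AIProviderError) throw error;
if (error instanceof Error) {
const statusCode = this.extractStatusCode(error);
const retryable = statusCode >= 500 || statusCode === 429;
throw new AIProviderError(`${this.name}: ${context} - ${error.message}`, {
code: this.errorCodeFromStatus(statusCode),
statusCode,
provider: this.name,
retryable,
cause: error,
});
}
throw new AIProviderError(`${this.name}: ${context} - Unknown error`, {
code: "UNKNOWN_ERROR",
provider: this.name,
retryable: false,
cause: error,
});
}
// extractStatusCode() and errorCodeFromStatus() helpers omitted for brevity
}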
The result: Any agent can use any provider. Switch the entire app from GPT-4o to Claude to Gemini with one environment variable. Adding a new provider means writing exactly one adapter file.
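`executeWithRetry` delegates to a `retryWithBackoff` helper that the post doesn't show. Here is a minimal sketch, assuming exponential backoff with full jitter and that errors explicitly flagged `retryable: false` (as `handleError` sets them) are not retried; the signature and default delays are assumptions.

```typescript
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls fn, retrying up to maxRetries extra times.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Respect an explicit `retryable: false` flag on the thrown error.
      const retryable =
        (error as { retryable?: boolean } | null)?.retryable !== false;
      if (!retryable || attempt >= maxRetries) throw error;
      // Full jitter: wait a random time in [0, baseDelayMs * 2^attempt).
      await sleep(Math.random() * baseDelayMs * 2 ** attempt);
    }
  }
}
```

With `maxRetries = 3` this makes at most four calls in total. Jitter spreads out retries from concurrent agents so they don't hit a rate-limited provider in lockstep.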
Multi-Agent Architecture
The orchestrator runs 7 agents in a directed pipeline. Each agent has a single responsibility:
| Agent | Input | Output | Responsibility |
|---|---|---|---|
| Profile Analyst | ContextPack | Facts, digital profile summary | Extract directly observable facts |
| Signal Detector | Context + Facts | Strong signals, hidden obsessions | Find behavioral patterns |
| Career Predictor | Context + Signals | Career prediction | Predict future trajectory |
| Startup Generator | Context + Signals | Startup parody | Create VC-pitch satire |
| Roast Agent | Context + Signals | Roasts, coworker quotes, verdict | Generate playful humor |
| Governance Agent | All outputs | Governance check | Validate ethical compliance |
| Final Synthesis | Everything | InvestigationReport | Assemble final report |
Each agent extends BaseAgent, which provides shared infrastructure:
// src/lib/agents/base.ts
export abstract class BaseAgent {
protected config: AgentConfig;
protected ai: AIService;
abstract process(input: any): Promise<{ output: any; trace: AgentTrace }>;
protected async callAIJSON<T>(
userPrompt: string,
): Promise<{ parsed: T; trace: AgentTrace }> {
return this.ai.chatJSON<T>({
systemPrompt: this.config.systemPrompt,
userPrompt,
model: this.config.model,
temperature: this.config.temperature,
responseFormat: "json_object",
agentName: this.config.name,
});
}
protected async safeProcess<T>(
processFn: () => Promise<{ output: T; trace: AgentTrace }>,
fallbackOutput: T,
): Promise<{ output: T; trace: AgentTrace }> {
try {
return await processFn();
} catch (error) {
return {
output: fallbackOutput,
trace: { /* error trace with success: false */ },
};
}
}
}
Every agent has safe defaults and fallback outputs. If an agent call fails (timeout, parse error, provider outage), the orchestrator catches it and continues with degraded data rather than crashing the entire pipeline.
The orchestrator runs agents sequentially, passing their outputs forward:
// src/lib/agents/orchestrator.ts
export class InvestigationOrchestrator {
async investigate(context: ContextPack) {
// Step 1: Profile Analyst
const { output: profileOutput, trace: profileTrace } =
await this.runAgent(this.profileAnalyst, { context }, { facts: [] });
// Step 2: Signal Detector (receives analyst facts)
const { output: signalOutput, trace: signalTrace } =
await this.runAgent(this.signalDetector,
{ context, facts: profileOutput.facts },
{ strongSignals: [], hiddenObsessions: [] });
// ... Steps 3-5: Career Predictor, Startup Generator, Roast Agent
// Step 6: Governance Check (with retries)
// Keep the latest result so it can be returned even if every attempt fails
let governanceCheck = fallback;
for (let attempt = 0; attempt <= MAX_GOVERNANCE_RETRIES; attempt++) {
const { output: govOutput, trace: govTrace } =
await this.runAgent(this.governanceAgent, governanceInput, fallback);
governanceCheck = govOutput;
if (govOutput.passed) break;
// Sanitize inputs and retry
}
// Step 7: Final Synthesis
const { output: report, trace: synthesisTrace } =
await this.runAgent(this.finalSynthesis, synthesisInput, fallbackReport);
return { report, traces, governanceCheck };
}
}
The governance loop is key: if the governance agent finds violations, the orchestrator sanitizes inputs and retries up to 2 times before accepting the best available result. This creates a self-correcting pipeline.
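The validate, sanitize, retry loop can be expressed as a small generic helper. This is a sketch under assumptions: `validate` and `sanitize` stand in for the governance agent and the orchestrator's input sanitization, `maxRetries` plays the role of `MAX_GOVERNANCE_RETRIES`, and `governWithRetries` is a hypothetical name, not project code.

```typescript
interface Check {
  passed: boolean;
  violations: string[];
}

// Hypothetical loop: validate once, then sanitize and re-validate up to maxRetries times.
function governWithRetries<T>(
  input: T,
  validate: (value: T) => Check,
  sanitize: (value: T, violations: string[]) => T,
  maxRetries: number,
): { value: T; check: Check; attempts: number } {
  let value = input;
  let check = validate(value);
  let attempts = 1;
  while (!check.passed && attempts <= maxRetries) {
    value = sanitize(value, check.violations);
    check = validate(value);
    attempts++;
  }
  // If the check still fails, the caller accepts the last (best available) value.
  return { value, check, attempts };
}
```

With `maxRetries = 2` this performs at most three validations: the initial check plus two retries.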
Context Engineering
Raw profile input comes in many shapes: LinkedIn URLs, GitHub usernames, free-text resumes, Twitter bios. The ContextBuilder class normalizes all of this into a structured ContextPack.
The context pipeline has three phases:
- Extraction — Parse each input source independently, extracting education, work experience, skills, repos, and stats from LinkedIn, GitHub, Twitter, and resume text
- Deduplication — Merge across sources, removing duplicates using composite keys:
private deduplicateEducation(items: Education[]): Education[] {
const seen = new Set<string>();
return items.filter((item) => {
const key = `${item.institution}|${item.degree}|${item.field}`;
if (seen.has(key)) return false;
seen.add(key);
return true;
});
}
- Compression — Remove duplicate lines, normalize whitespace, and calculate compression ratio:
private compressContent(text: string): string {
if (!text || text.length < 500) return text;
const lines = text.split("\n");
const compressed: string[] = [];
const seen = new Set<string>();
for (const line of lines) {
const trimmed = line.trim();
if (!trimmed) continue;
const normalized = trimmed.toLowerCase().replace(/\s+/g, " ");
if (seen.has(normalized)) continue;
seen.add(normalized);
compressed.push(trimmed);
}
return compressed.join("\n");
}
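The compression step mentions a compression ratio, which the excerpt doesn't compute. A minimal sketch, assuming the ratio is compressed length divided by original length (1 means nothing was removed); `compressWithRatio` is a hypothetical name, and the dedup logic mirrors `compressContent` above.

```typescript
// Same line-level dedup as compressContent, plus a compression ratio.
// Assumption: ratio = compressed length / original length.
function compressWithRatio(text: string): { text: string; ratio: number } {
  if (!text || text.length < 500) return { text, ratio: 1 };
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue; // drop blank lines
    const key = trimmed.toLowerCase().replace(/\s+/g, " ");
    if (seen.has(key)) continue; // drop case- and whitespace-insensitive duplicates
    seen.add(key);
    kept.push(trimmed);
  }
  const compressed = kept.join("\n");
  return { text: compressed, ratio: compressed.length / text.length };
}
```

Tracking the ratio per source makes it easy to spot inputs (like a pasted resume with repeated boilerplate) where deduplication saves the most tokens.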
The builder also extracts key signals early — things like years of experience, leadership roles, top companies, open-source popularity, trending focus areas — which get passed to every agent as context. This early signal extraction reduces the burden on each agent to rediscover obvious patterns, saving tokens and improving accuracy.
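To make early signal extraction concrete, here is a minimal sketch that derives two such signals, years of experience and leadership roles, from normalized work history. The `WorkExperience` shape, the leadership keyword list, and the year arithmetic are all assumptions; the project's ContextBuilder may compute these differently.

```typescript
// Hypothetical normalized work entry; endYear undefined means "present".
interface WorkExperience {
  title: string;
  company: string;
  startYear: number;
  endYear?: number;
}

interface KeySignals {
  yearsOfExperience: number;
  leadershipRoles: string[];
}

// Illustrative keyword list, not the project's.
const LEADERSHIP_KEYWORDS = /\b(lead|head|director|manager|vp|cto|ceo|founder)\b/i;

function extractKeySignals(work: WorkExperience[], currentYear: number): KeySignals {
  if (work.length === 0) return { yearsOfExperience: 0, leadershipRoles: [] };
  // Span from earliest start to latest end (gaps included, overlaps not double-counted).
  const start = Math.min(...work.map((w) => w.startYear));
  const end = Math.max(...work.map((w) => w.endYear ?? currentYear));
  const leadershipRoles = work
    .filter((w) => LEADERSHIP_KEYWORDS.test(w.title))
    .map((w) => `${w.title} @ ${w.company}`);
  return { yearsOfExperience: Math.max(0, end - start), leadershipRoles };
}
```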
Structured Outputs
Every agent returns typed JSON. The entire report schema is defined in TypeScript:
// src/lib/types.ts
export interface InvestigationReport {
id: string;
profileHash: string;
digitalProfileSummary: string;
facts: Fact[];
strongSignals: StrongSignal[];
hiddenObsessions: HiddenObsession[];
coworkerQuotes: CoworkerQuote[];
startupParody: StartupParody;
careerPrediction: CareerPrediction;
brutalRoast: Roast[];
wildGuesses: WildGuess[];
finalVerdict: string;
personalityScores: InternetPersonalityScores;
cookedLevel: CookedLevel;
metadata: ReportMetadata;
}
Agents use JSON mode (response_format: { type: "json_object" }) and a chatJSON<T> helper that handles parsing and error recovery:
// src/lib/ai.ts
async chatJSON<T>(
options: AIRequestOptions,
): Promise<{ parsed: T; trace: AgentTrace }> {
const jsonOptions = { ...options, responseFormat: "json_object" as const };
const response = await this.chat(jsonOptions);
if (!response.trace.success) {
throw new Error(`AI chat failed: ${response.trace.error}`);
}
// Clean markdown code fences from JSON output
const cleaned = response.content
.replace(/```json\s*/gi, "")
.replace(/```\s*$/g, "")
.trim();
let parsed: T;
try {
parsed = JSON.parse(cleaned) as T;
} catch (parseError) {
// Catch variables are `unknown` under strict mode, so narrow before reading .message
const message = parseError instanceof Error ? parseError.message : String(parseError);
throw new Error(
`Failed to parse JSON response: ${message}\nRaw: ${response.content}`
);
}
return { parsed, trace: response.trace };
}
Every JSON response goes through parser recovery — the system strips markdown code fences, trims whitespace, and falls back to partial matching if the output is malformed. Combined with the safeProcess pattern, this means a single agent failure never crashes the entire investigation.
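The partial-matching fallback isn't shown in the `chatJSON` excerpt, which throws on malformed output. One common approach, assumed here rather than taken from the project, is to extract the first balanced `{...}` span from the raw text and parse only that; `extractFirstJSONObject` is a hypothetical name.

```typescript
// Hypothetical fallback: find the first balanced {...} object in raw model output,
// ignoring braces that appear inside JSON string literals.
function extractFirstJSONObject(raw: string): unknown {
  const start = raw.indexOf("{");
  if (start === -1) throw new Error("No JSON object found");
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) {
      return JSON.parse(raw.slice(start, i + 1));
    }
  }
  throw new Error("Unbalanced JSON object");
}
```

This recovers output like `Sure! Here is the report: {...} Hope this helps`, but it will still fail if stray prose braces appear before the real object, so the `safeProcess` fallback remains the last line of defense.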
Prompt Engineering
Prompts are stored as separate markdown files in prompts/system/ and loaded by a PromptRegistry:
// src/lib/prompts/index.ts
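import * as fs from "node:fs";
import * as path from "node:path";
// Prompt directory (assumed to be resolved from the project root)
const PROMPTS_DIR = path.join(process.cwd(), "prompts", "system");
export class PromptRegistry {
private prompts: Map<AgentType, string> = new Map();
async load(): Promise<void> {
const entries = await fs.promises.readdir(PROMPTS_DIR, { withFileTypes: true });
// Accept both .md and .txt prompt files
const promptFiles = entries.filter((f) => f.isFile() && /\.(md|txt)$/.test(f.name));
for (const entry of promptFiles) {
// resolveAgentType() maps a filename to its AgentType (omitted here)
const agentType = this.resolveAgentType(entry.name);
if (!agentType) continue;
const content = await fs.promises.readFile(
path.join(PROMPTS_DIR, entry.name), "utf-8"
);
this.prompts.set(agentType, content.trim());
}
}
}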
This means prompts can be edited without touching any code — useful for iterating with non-technical stakeholders or A/B testing prompt variants.
Every prompt file follows a consistent structure: Purpose → Version → Expected Inputs → JSON Schema → Step-by-Step Instructions → Failure Modes → Guardrails → Example Outputs → "Why This Matters".
The "Why This Matters" section is the key educational feature. For example, the Roast Agent prompt explains why its design choices are effective:
🎭 Persona-Driven Prompting: The "roast comedian" persona is a deliberate choice — it constrains the model to a specific tone, vocabulary, and ethical framework. Persona prompts are one of the most effective prompt engineering techniques because they activate the model's understanding of social roles.
📎 Specificity = Funniest: The instruction to always reference real profile data is grounded in comedy theory: specific humor outperforms generic humor. "Their GitHub has 47 repos with 0 stars each" is funnier than "they code a lot."
⚖️ The Kindness Gate: Adding a "kindness check" step is an ethical prompt engineering pattern. Rather than a blunt guardrail ("don't be mean"), it frames the check as a social simulation leveraging the model's theory of mind capabilities.
This documentation-first approach turns the prompts directory into an educational resource.
Governance & Safety
The system has two independent safety layers running at different points in the pipeline:
Input Safety (SafetyChecker)
Runs before any agent processes data. Detects four threat categories:
// src/lib/safety/index.ts
export class SafetyChecker {
checkPrompt(input: string): SafetyCheck {
const threats: SafetyThreat[] = [
...this.detectPromptInjection(input),
...this.detectJailbreak(input),
...this.detectPII(input),
];
return { passed: threats.length === 0, threats };
}
}
Prompt injection detection uses 19 regex patterns targeting common escape techniques ("ignore all previous instructions", "you are now...", "DAN", "developer mode"). PII detection catches emails, phone numbers, SSNs, credit cards, addresses, and passport numbers.
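A minimal sketch of regex-based input screening in the spirit of SafetyChecker. It uses only the four example phrases quoted above plus two simple PII patterns (email and US-style SSN); the real checker's 19 patterns and its threat types are not reproduced here, and the `Threat` shape and `screenInput` name are assumptions.

```typescript
type ThreatType = "prompt_injection" | "pii";

interface Threat {
  type: ThreatType;
  match: string;
}

// Example patterns only, drawn from the phrases quoted in the text.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /\byou are now\b/i,
  /\bDAN\b/, // case-sensitive so the name "Dan" does not match
  /developer mode/i,
];

const PII_PATTERNS: RegExp[] = [
  /[\w.+-]+@[\w-]+\.[\w.-]+/, // email address
  /\b\d{3}-\d{2}-\d{4}\b/, // US SSN format
];

function screenInput(input: string): { passed: boolean; threats: Threat[] } {
  const threats: Threat[] = [];
  for (const [type, patterns] of [
    ["prompt_injection", INJECTION_PATTERNS],
    ["pii", PII_PATTERNS],
  ] as const) {
    for (const pattern of patterns) {
      const m = input.match(pattern);
      if (m) threats.push({ type, match: m[0] });
    }
  }
  return { passed: threats.length === 0, threats };
}
```

Regex screening is cheap and deterministic, which is why it runs before any tokens are spent; it will not catch paraphrased attacks, which is what the output-side governance layer is for.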
Output Governance (GovernanceValidator)
Runs after agents generate content. Checks the output against these prohibited attributes: race, ethnicity, religion, sexual orientation, mental health, medical diagnosis, political affiliation, and criminal activity.
// src/lib/governance/index.ts
export class GovernanceValidator {
validate(report: Partial<InvestigationReport>): GovernanceCheck {
const violations: GovernanceViolation[] = [];
for (const [fieldName, text] of this.extractTextFields(report)) {
violations.push(...this.checkText(text, fieldName));
}
// Also check facts, strong signals, roasts, etc.
return {
passed: violations.length === 0,
violations: this.deduplicateViolations(violations),
checkedAt: new Date().toISOString(),
};
}
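sanitize(
report: Partial<InvestigationReport>,
violations: GovernanceViolation[],
): Partial<InvestigationReport> {
// Redact any text matching violation patterns; the other text fields are redacted the same way
const redacted = this.redactText(report.digitalProfileSummary ?? "", violations);
return { ...report, digitalProfileSummary: redacted };
}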
}
The governance agent operates in a validate → retry → sanitize loop within the orchestrator. If violations are found, the orchestrator strips offending facts and retries the governance check. If violations persist after MAX_GOVERNANCE_RETRIES, they're sanitized via redaction and recorded as metadata.
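The redaction step can be sketched as replacing every flagged span with a placeholder. The `Violation` shape (an attribute plus the exact matched text), the `[REDACTED]` marker, and this `redactText` are assumptions; the post doesn't show the project's version.

```typescript
interface Violation {
  attribute: string; // e.g. "political affiliation"
  matchedText: string; // exact text flagged by the validator
}

// Escape regex metacharacters so flagged text is matched literally.
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical redactText: case-insensitively replace every occurrence of each flagged span.
function redactText(text: string, violations: Violation[]): string {
  return violations
    .filter((v) => v.matchedText.length > 0) // an empty pattern would match everywhere
    .reduce(
      (acc, v) => acc.replace(new RegExp(escapeRegExp(v.matchedText), "gi"), "[REDACTED]"),
      text,
    );
}
```

Escaping matters because flagged text comes from model output and may contain characters like `.` or `(` that would otherwise change the regex's meaning.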
Evaluations
The evaluation framework measures every generated report across 5 metrics:
// src/lib/eval/index.ts
computeMetrics(report: InvestigationReport, context: ContextPack) {
// Serialized report for the JSON compliance check
const reportJson = JSON.stringify(report);
return {
json_compliance: this.measureJSONCompliance(reportJson),
consistency: this.measureConsistency(report),
hallucination_rate: this.measureHallucinationRate(report, context),
humor_score: this.measureHumorScore(report),
accuracy: this.measureAccuracy(report, context),
};
}
- JSON Compliance — Checks all required fields exist and match expected types
- Consistency — Measures how well facts align with the summary, career prediction, and timeline
- Hallucination Rate — Compares fact text against source context using word-overlap analysis
- Humor Score — Evaluates variety, intensity distribution, and quality signals in roasts
- Accuracy — Inverse of hallucination rate
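Word-overlap hallucination scoring can be sketched as follows. Assumptions: a fact counts as grounded when at least half of its content words (lowercased alphanumeric tokens longer than 3 characters) appear in the source context, the 0.5 threshold is arbitrary, and accuracy is taken as 1 − hallucination rate. None of this is the project's exact formula.

```typescript
// Content words: lowercase alphanumeric tokens longer than 3 characters (assumption).
function contentWords(text: string): string[] {
  return (text.toLowerCase().match(/[a-z0-9]+/g) ?? []).filter((w) => w.length > 3);
}

// Fraction of facts whose content words are mostly absent from the source context.
function hallucinationRate(facts: string[], context: string, threshold = 0.5): number {
  if (facts.length === 0) return 0;
  const contextWords = new Set(contentWords(context));
  const ungrounded = facts.filter((fact) => {
    const words = contentWords(fact);
    if (words.length === 0) return false; // nothing checkable: treat as grounded
    const overlap = words.filter((w) => contextWords.has(w)).length / words.length;
    return overlap < threshold;
  });
  return ungrounded.length / facts.length;
}

const accuracy = (rate: number) => 1 - rate;
```

Word overlap is a cheap proxy: it flags invented specifics well but will miss a fact that reuses source words in a false combination.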
The evaluation datasets include 20 profiles across 4 categories (developers, designers, founders, creators), each with expected score ranges and minimum fact counts:
{
"id": "dev-1",
"name": "Senior Full-Stack Developer",
"expectedOutputs": {
"minFacts": 14,
"keySignals": ["strong technical background", "open source contributor"],
"expectedScores": {
"builderScore": { "min": 75, "max": 100 },
"operatorScore": { "min": 60, "max": 90 }
}
}
}
The compareModelsWithConfigs method runs the same dataset against multiple model/provider pairs, generating side-by-side comparisons of latency, cost, and quality scores — enabling data-driven model selection.
Production Features
Cost Tracking
Every agent call records its model, token usage, and estimated cost, computed from model-specific pricing tables:
export class CostTracker {
private records: CostRecord[] = [];
async recordCost(trace: AgentTrace, investigationId: string): Promise<CostRecord> {
const record: CostRecord = {
investigationId,
provider: trace.provider,
model: trace.model,
promptTokens: trace.tokenUsage.promptTokens,
completionTokens: trace.tokenUsage.completionTokens,
estimatedCost: estimateCost(trace.model,
trace.tokenUsage.promptTokens,
trace.tokenUsage.completionTokens),
timestamp: new Date().toISOString(),
};
// Store the record so the query methods below can aggregate it
this.records.push(record);
return record;
}
// Query methods: getCostByProvider(), getCostByModel(), getCostByAgent()
}
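`estimateCost` isn't shown. A minimal sketch of a per-model pricing lookup, assuming prices are stored per million tokens with separate input and output rates. The model names and prices in `PRICING` are placeholders, not real rates; substitute your provider's published pricing.

```typescript
// Placeholder pricing table (USD per 1M tokens); NOT real provider prices.
const PRICING: Record<string, { inputPerMillion: number; outputPerMillion: number }> = {
  "example-small": { inputPerMillion: 0.5, outputPerMillion: 1.5 },
  "example-large": { inputPerMillion: 5, outputPerMillion: 15 },
};

function estimateCost(model: string, promptTokens: number, completionTokens: number): number {
  const price = PRICING[model];
  if (!price) return 0; // unknown model: report zero rather than guess
  return (
    (promptTokens / 1_000_000) * price.inputPerMillion +
    (completionTokens / 1_000_000) * price.outputPerMillion
  );
}
```

Keeping input and output rates separate matters because completion tokens are usually priced higher than prompt tokens.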
Observability
The ObservabilityTracker stores all agent traces and investigations in-memory (with configurable size limits) and optionally forwards them to LangSmith for production monitoring:
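export class ObservabilityTracker {
// Aggregates over all stored traces (method bodies omitted)
async getTraceStats(): Promise<{
total: number; successful: number; failed: number;
avgLatency: number; totalCost: number;
}> { /* ... */ }
// Per-agent breakdown, keyed by agent name
async getAgentPerformance(): Promise<Record<string, {
totalCalls: number; avgLatency: number; totalCost: number; successRate: number;
}>> { /* ... */ }
}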
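An in-memory store with a configurable size limit can be as simple as a capped array that evicts the oldest entries. This `BoundedStore` is a hypothetical sketch; the project's eviction policy and default limit aren't stated.

```typescript
// Keeps at most `maxSize` items, dropping the oldest first (FIFO eviction).
class BoundedStore<T> {
  private items: T[] = [];
  private readonly maxSize: number;

  constructor(maxSize: number) {
    if (maxSize < 1) throw new Error("maxSize must be at least 1");
    this.maxSize = maxSize;
  }

  add(item: T): void {
    this.items.push(item);
    if (this.items.length > this.maxSize) {
      this.items.splice(0, this.items.length - this.maxSize);
    }
  }

  all(): readonly T[] {
    return this.items;
  }
}
```

`splice` is O(n), which is fine for a few thousand traces; a ring buffer would avoid the copy if the limit is large.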
Developer Dashboard
A full dashboard at /dashboard exposes real-time stats: per-agent latency and cost breakdowns, success rates, investigation history, and trace export (JSON/CSV). This turns the production data into actionable insights.
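CSV trace export needs proper escaping, because fields such as error messages can contain commas, quotes, or newlines. A minimal sketch following RFC 4180 quoting (wrap the field in double quotes and double any embedded quotes), assuming a flat `TraceRow` whose fields are hypothetical; the project's `AgentTrace` fields may differ.

```typescript
// Hypothetical flattened trace row for export.
interface TraceRow {
  agentName: string;
  model: string;
  latencyMs: number;
  cost: number;
  success: boolean;
  error?: string;
}

const COLUMNS: (keyof TraceRow)[] = ["agentName", "model", "latencyMs", "cost", "success", "error"];

// Quote a field only when it contains a comma, quote, or line break.
function csvField(value: unknown): string {
  const s = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row plus one row per trace, CRLF-separated as RFC 4180 specifies.
function tracesToCSV(rows: TraceRow[]): string {
  const lines = [COLUMNS.join(",")];
  for (const row of rows) {
    lines.push(COLUMNS.map((col) => csvField(row[col])).join(","));
  }
  return lines.join("\r\n");
}
```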
What's Next
The architecture is designed for extension. Here's what's on the roadmap:
- Real API integrations — GitHub GraphQL, LinkedIn API, Twitter API for live data instead of pasted text
- Browser extension — One-click investigation from any LinkedIn/GitHub profile page
- New agents — Writing style analyzer, network graph agent, tech stack deep dive
- New providers — Together AI, Groq, Replicate (each is one file)
- OG image generation — Shareable report cards for social media
- Mobile app — React Native wrapper for native sharing
Conclusion
Internet Detective AI demonstrates that a production-grade multi-agent AI system doesn't require a massive team or budget. The key patterns are:
- Provider abstraction so you're never locked into one model
- Safe defaults with graceful degradation so partial failures produce useful results
- Structured outputs with typed schemas so agents produce machine-parseable results
- Governance and safety as architectural layers, not afterthoughts
- Documentation-first prompts that double as educational resources
- Evaluation as a first-class feature for objective quality measurement
The entire codebase is open source at github.com/harishkotra/internet-detective-ai. Each of the 14 engineering topics covered by this project has its own documentation file in docs/.
Code & more: https://www.dailybuild.xyz/project/170-internet-detective-ai