A beginner-to-advanced guide to building real-world AI applications using modern APIs and tools.
Introduction
A year ago, "AI" felt like something reserved for researchers with massive GPU clusters and decades of experience. Today, you can build a production-ready AI-powered app in an afternoon.
This tutorial walks you through building a real AI application from scratch — a smart document summarizer — using the Anthropic Claude API. No machine learning theory required. Whether you're a curious beginner or a seasoned backend dev who's never touched AI, this guide is for you.
By the end, you'll understand:
- How to call an AI API from your own app
- How to structure prompts for reliable, high-quality results
- How to handle streaming responses for a better UX
- How to think about AI integration at different levels of complexity
What We're Building
A Document Summarizer web app that:
- Accepts any text input (paste an article, paste a legal doc, paste anything)
- Sends it to Claude via the Anthropic API
- Returns a structured summary with key points, tone, and a one-line TLDR
- Streams the response token-by-token (like ChatGPT does)
Let's go.
Prerequisites
- Node.js 18+ installed
- An Anthropic API key → console.anthropic.com
- Basic familiarity with JavaScript / TypeScript
That's it. No GPU. No Python. No ML frameworks.
Step 1: Set Up Your Project
mkdir ai-summarizer && cd ai-summarizer
npm init -y
npm install @anthropic-ai/sdk express dotenv
Create a .env file:
ANTHROPIC_API_KEY=your_api_key_here
And a basic server.js (add `"type": "module"` to your package.json so the import syntax below works):
import Anthropic from "@anthropic-ai/sdk";
import express from "express";
import dotenv from "dotenv";
dotenv.config();
const app = express();
app.use(express.json());
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from process.env
app.listen(3000, () => console.log("Server running on port 3000"));
Step 2: Understand the Anatomy of a Prompt
This is where most tutorials gloss over the most important part. Prompt engineering is the skill that separates a flaky AI feature from a reliable one.
A prompt has three parts:
| Part | Purpose | Example |
|---|---|---|
| System prompt | Sets the AI's role and rules | "You are a document analyst. Always respond in JSON." |
| User message | The actual input | The text to summarize |
| Constraints | Format, length, tone guardrails | "Respond with 3 bullet points max." |
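Concretely, here's how those three parts map onto a single request payload. This is a sketch: `buildRequest` is our own helper (not part of the Anthropic SDK), and the system string and constraints are illustrative.

```javascript
// Sketch: the three prompt parts assembled into one request payload.
// buildRequest is our own helper, not an SDK function.
function buildRequest(documentText) {
  return {
    model: "claude-opus-4-5",
    max_tokens: 1024,
    // System prompt: sets the role and rules
    system: "You are a document analyst. Always respond in JSON.",
    messages: [
      {
        role: "user",
        // User message plus constraints (format and length guardrails)
        content: `Summarize this text. Respond with 3 bullet points max.\n\n${documentText}`,
      },
    ],
  };
}
```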
The Golden Rules of Prompting
1. Be specific, not vague
❌ "Summarize this text"
✅ "Summarize this text into: (1) a one-sentence TLDR, (2) 3-5 bullet point key insights, (3) the overall tone (formal/casual/technical)."
2. Tell it what format to return
If you want JSON back, say so explicitly. Claude will comply.
3. Give it a persona
"You are a senior editor at The Economist" produces very different results than just asking for a summary.
4. Use XML tags for complex inputs
Summarize the following document:
<document>
{{USER_TEXT}}
</document>
Respond only with valid JSON.
This helps the model clearly separate instructions from content — especially important with long documents.
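A tiny helper makes this pattern reusable. `wrapDocument` is our own utility, assumed here for illustration, not an SDK function:

```javascript
// Wrap untrusted input in XML tags so instructions and content
// stay clearly separated in the prompt.
function wrapDocument(text) {
  return [
    "Summarize the following document:",
    "",
    "<document>",
    text,
    "</document>",
    "",
    "Respond only with valid JSON.",
  ].join("\n");
}
```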
Step 3: Build the Summarize Endpoint
app.post("/summarize", async (req, res) => {
const { text } = req.body;
if (!text || text.trim().length === 0) {
return res.status(400).json({ error: "No text provided" });
}
const systemPrompt = `You are a world-class editor and analyst.
Your job is to produce concise, accurate document summaries.
Always respond with valid JSON in this exact shape:
{
"tldr": "one sentence summary",
"key_points": ["point 1", "point 2", "point 3"],
"tone": "formal | casual | technical | emotional",
"word_count_estimate": 123
}`;
try {
const message = await client.messages.create({
model: "claude-opus-4-5",
max_tokens: 1024,
system: systemPrompt,
messages: [
{
role: "user",
content: `Summarize the following document:\n\n<document>\n${text}\n</document>`,
},
],
});
const raw = message.content[0].text;
const parsed = JSON.parse(raw);
res.json(parsed);
} catch (err) {
console.error(err);
res.status(500).json({ error: "Something went wrong" });
}
});
Test it:
curl -X POST http://localhost:3000/summarize \
-H "Content-Type: application/json" \
-d '{"text": "Artificial intelligence is transforming every industry..."}'
You'll get back structured JSON almost every time. That's the power of a well-crafted prompt (and why the endpoint still wraps JSON.parse in a try/catch).
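One caveat: models occasionally wrap the JSON in a markdown code fence. A small defensive parser (our own helper, sketched under that assumption) strips the fence before parsing:

```javascript
// Strip an optional markdown code fence before parsing model output.
// A defensive fallback; not an official SDK utility.
function parseModelJson(raw) {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "");
  return JSON.parse(cleaned);
}
```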
Step 4: Add Streaming for Better UX
Nobody wants to stare at a spinner for 10 seconds. Streaming lets you show the response as it's being generated — token by token.
app.post("/summarize-stream", async (req, res) => {
const { text } = req.body;
res.setHeader("Content-Type", "text/event-stream");
res.setHeader("Cache-Control", "no-cache");
res.setHeader("Connection", "keep-alive");
const stream = await client.messages.stream({
model: "claude-opus-4-5",
max_tokens: 1024,
system: "You are a concise document summarizer. Write clearly and directly.",
messages: [
{
role: "user",
content: `Summarize this:\n\n${text}`,
},
],
});
for await (const chunk of stream) {
if (
chunk.type === "content_block_delta" &&
chunk.delta.type === "text_delta"
) {
res.write(`data: ${chunk.delta.text}\n\n`);
}
}
res.write("data: [DONE]\n\n");
res.end();
});
On the frontend, consume it with fetch and a readable stream (EventSource only supports GET requests, so it won't work with this POST endpoint). The UX difference is dramatic.
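A minimal fetch-based consumer might look like this. `parseSseData` is our own helper for pulling the text out of `data:` lines; the endpoint path matches the server above:

```javascript
// Extract the text payload from one or more SSE "data: ..." lines.
function parseSseData(chunk) {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length))
    .filter((data) => data !== "[DONE]")
    .join("");
}

// Browser-side consumer: POST the text, then read the body stream
// chunk by chunk and hand each piece of text to the UI.
async function consumeSummary(text, onToken) {
  const res = await fetch("/summarize-stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onToken(parseSseData(decoder.decode(value, { stream: true })));
  }
}
```

Note that this sketch assumes each read delivers whole `data:` lines; production code should buffer partial lines across reads.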
Step 5: Level Up — Context Windows and Long Documents
Here's where it gets interesting for intermediate/senior devs.
Claude has a 200,000 token context window. That's roughly 150,000 words — longer than most novels. But you still need to be smart about what you send.
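A rough rule of thumb (an assumption, not an official tokenizer: English prose averages about 4 characters per token) lets you sanity-check input size before sending. For exact counts, use the API's token counting endpoint instead.

```javascript
// Heuristic token estimate: ~4 characters per token for English prose.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const CONTEXT_BUDGET = 200000;

// Leave room for the model's output inside the context window.
function fitsInContext(text, reservedForOutput = 1024) {
  return estimateTokens(text) + reservedForOutput <= CONTEXT_BUDGET;
}
```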
Chunking Strategy for Very Long Docs
If a document exceeds your comfortable token budget, chunk it:
function chunkText(text, maxChars = 50000) {
const chunks = [];
let start = 0;
while (start < text.length) {
let end = Math.min(start + maxChars, text.length);
// Try to break at a paragraph boundary
const lastParagraph = text.lastIndexOf("\n\n", end);
if (lastParagraph > start) end = lastParagraph;
chunks.push(text.slice(start, end));
start = end;
}
return chunks;
}
Then summarize each chunk, and do a final "summary of summaries" pass. This is called the map-reduce prompting pattern, and it's used in production by many serious AI apps.
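Sketched as code: `mapReduceSummarize` is our own orchestrator. It takes the chunks plus any async summarize function (e.g. a wrapper around `client.messages.create`), so the control flow stays easy to test without network calls:

```javascript
// Map-reduce summarization: summarize chunks independently (map),
// then summarize the combined partial summaries (reduce).
async function mapReduceSummarize(chunks, summarize) {
  const partials = await Promise.all(chunks.map((c) => summarize(c)));
  if (partials.length === 1) return partials[0];
  return summarize(partials.join("\n\n"));
}
```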
Step 6: Error Handling and Rate Limits
Production AI apps fail gracefully. Here's what to handle:
import Anthropic from "@anthropic-ai/sdk";
async function callWithRetry(fn, retries = 3) {
for (let i = 0; i < retries; i++) {
try {
return await fn();
} catch (err) {
if (err instanceof Anthropic.RateLimitError) {
const delay = Math.pow(2, i) * 1000; // Exponential backoff
console.log(`Rate limited. Waiting ${delay}ms...`);
await new Promise((r) => setTimeout(r, delay));
} else if (err instanceof Anthropic.APIError) {
console.error("API error:", err.status, err.message);
throw err; // Don't retry on non-rate-limit errors
} else {
throw err;
}
}
}
throw new Error("Max retries exceeded");
}
Key error types to handle:
- RateLimitError → Retry with backoff
- APIError (4xx) → Likely a bad prompt or input; don't retry
- APIConnectionError → Network issue; retry
- JSON parse failures → Your prompt didn't enforce the format well enough; refine it
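One refinement worth knowing: a fixed 2^i backoff can make many clients retry in lockstep. Adding jitter spreads the retries out. `backoffDelay` is our own helper, and the full-jitter variant here is a common pattern, not something the SDK provides:

```javascript
// Exponential backoff with full jitter: pick a random delay in
// [0, min(cap, 2^attempt * base)). The RNG is injectable for testing.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000, random = Math.random) {
  const exp = Math.min(capMs, Math.pow(2, attempt) * baseMs);
  return Math.floor(random() * exp);
}
```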
What's Next?
You've built a working AI app. Here's where to go from here:
🔵 Beginner next steps
- Add a simple HTML frontend with a textarea + button
- Try different system prompts and see how the output changes
- Experiment with the temperature parameter to trade off consistency against creativity
🟡 Intermediate next steps
- Add conversation history (multi-turn chat)
- Use tool use / function calling to let Claude trigger real actions in your app
- Implement semantic caching to reduce API costs
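For the caching idea, an exact-match cache is the easy first step. This is a sketch with our own `cacheKey` helper and an in-memory Map; true semantic caching would compare embeddings instead of hashes, and a production app would use Redis or similar:

```javascript
import { createHash } from "node:crypto";

const cache = new Map();

// Key on everything that affects the output: model, system prompt, input.
function cacheKey(model, system, text) {
  return createHash("sha256").update(`${model}|${system}|${text}`).digest("hex");
}

// Wrap any async summarize function with a cache lookup.
async function cachedSummarize(text, summarize, model = "claude-opus-4-5", system = "") {
  const key = cacheKey(model, system, text);
  if (cache.has(key)) return cache.get(key);
  const result = await summarize(text);
  cache.set(key, result);
  return result;
}
```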
🔴 Advanced next steps
- Build an agent loop where Claude can take actions, observe results, and retry
- Add RAG (Retrieval Augmented Generation) with a vector database
- Fine-tune prompts using systematic evaluation pipelines
Key Takeaways
- Prompts are code. Treat them with the same rigor as your application logic. Version-control them.
- Structured output is your friend. Always ask for JSON when you need to parse the result.
- Streaming dramatically improves perceived performance. Use it for anything that takes more than 2 seconds.
- Start simple, then layer complexity. A direct API call beats a complicated agent system until you actually need the agent.
- Error handling matters. Rate limits and timeouts will happen in production.
Written by a developer, for developers. Drop any questions in the comments — I read them all.