TL;DR: I built a 100% private AI tool that learns your writing style and generates content that sounds like you wrote it - all running locally on your machine with no data sent to the cloud.
The Problem I Wanted to Solve
As someone who writes a lot - tweets, emails, blog posts, documentation - I noticed I was spending hours crafting messages in my personal voice. AI tools like ChatGPT can write, sure, but they sound... robotic. Generic. Not me.
I wanted something that could:
- Learn my unique writing style
- Generate content that sounds indistinguishable from what I'd write
- Run 100% locally for complete privacy
- Work with any type of writing samples (tweets, emails, blog posts)
So I built CloneWriter - an AI voice cloning tool powered by local LLMs via Ollama.
Tech Stack
Here's what I used to build this:
- Next.js 14 (App Router) - For the full-stack web app
- TypeScript - Type-safe development
- Ollama - Running LLMs locally (llama3.2, llama3.1, etc.)
- File-based Vector Store - Simple keyword-based retrieval for RAG
- Tailwind CSS + Framer Motion - Beautiful, animated UI
- PapaParse - CSV/JSON parsing
Architecture Overview
The system follows a RAG (Retrieval-Augmented Generation) pattern:
┌─────────────────────────────────────────────┐
│ User Uploads Writing Samples │
│ (CSV, JSON, TXT files) │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Parse & Store Documents │
│ - Extract text from various formats │
│ - Store in file-based vector store │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ User Writes a Prompt │
│ (e.g., "Write a tweet about AI") │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Query Vector Store │
│ - Find relevant writing samples │
│ - Simple keyword matching │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Send to Ollama LLM │
│ - Include context + prompt │
│ - Generate in user's style │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Return Generated Content │
│ - Shows similarity scores │
│ - Displays context used │
└─────────────────────────────────────────────┘
Implementation Deep Dive
1. File Upload & Processing
The first challenge was accepting various file formats. I built a drag-and-drop component that accepts CSV, JSON, and TXT files.
// app/api/upload/route.ts
import Papa from "papaparse";

async function parseCSV(content: string): Promise<string[]> {
return new Promise((resolve, reject) => {
Papa.parse(content, {
header: true,
complete: (results) => {
const texts: string[] = [];
results.data.forEach((row: any) => {
// Look for common text fields
const text = row.Text || row.text || row.Content ||
row.content || row.Message || row.tweet;
// Only include substantial text (>50 chars)
if (text && typeof text === "string" && text.length > 50) {
texts.push(text.trim());
}
});
resolve(texts);
},
error: (error: any) => reject(error),
});
});
}
This handles Twitter exports, LinkedIn data dumps, or any CSV with text columns. The >50 character filter ensures we skip metadata like names or job titles.
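The repo also accepts JSON and TXT uploads. Those parsers aren't shown above, but here's a minimal sketch of how they can work with the same >50 character filter (field names here are assumptions, not the exact code from the repo):

// Sketch only: JSON arrays of strings/objects, and raw TXT split on blank lines
function parseJSON(content: string): string[] {
  const data = JSON.parse(content);
  const items: any[] = Array.isArray(data) ? data : [data];
  return items
    .map((item) =>
      typeof item === "string"
        ? item
        : item.text || item.Text || item.content || item.Content || ""
    )
    .filter((text) => typeof text === "string" && text.length > 50)
    .map((text) => text.trim());
}

function parseTXT(content: string): string[] {
  // Treat blank-line-separated paragraphs as individual writing samples
  return content
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .filter((block) => block.length > 50);
}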
2. Vector Store Implementation
Initially, I planned to use ChromaDB, but I simplified to a file-based system with keyword matching. For most use cases, this works surprisingly well:
// lib/vector-store.server.ts
export async function queryDocuments(
query: string,
nResults: number = 4
): Promise<{ documents: string[]; metadatas: any[]; distances: number[] }> {
const documents = await loadDocuments();
// Simple keyword matching
const queryLower = query.toLowerCase();
const queryWords = queryLower.split(/\s+/);
const results = documents
.map((doc) => {
const textLower = doc.text.toLowerCase();
// Score based on how many query words appear
const score = queryWords.filter(word =>
textLower.includes(word)
).length / queryWords.length;
return { doc, score };
})
.sort((a, b) => b.score - a.score)
.slice(0, nResults);
// Always return at least some documents
const finalResults = results.length > 0
? results
: documents.slice(0, nResults).map(doc => ({ doc, score: 0.3 }));
return {
documents: finalResults.map((r) => r.doc.text),
metadatas: finalResults.map((r) => r.doc.metadata || {}),
distances: finalResults.map((r) => 1 - r.score),
};
}
This approach:
- Splits the query into words
- Scores documents based on keyword overlap
- Returns top N matches
- Falls back to the first few stored samples (with a nominal score) if nothing matches
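Using it is a one-liner. Here's a rough usage sketch based on the signature above (the prompt is just an example):

// Retrieve the 4 closest samples for a prompt and build the LLM context
const { documents, distances } = await queryDocuments("Write a tweet about AI", 4);

documents.forEach((text, i) => {
  console.log(`match ${i + 1} (similarity ${(1 - distances[i]).toFixed(2)}):`, text.slice(0, 80));
});

const context = documents.join("\n\n---\n\n");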
3. Ollama Integration
The magic happens when we send the retrieved context to Ollama. The system prompt is crucial here:
// lib/ollama.ts
import { Ollama } from "ollama";

const MODEL_NAME = "llama3.2:3b"; // or whichever model you've pulled with `ollama pull`

interface GenerationOptions {
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
}

export async function generateWithOllama(
  prompt: string,
  context: string,
  options: GenerationOptions = {}
): Promise<string> {
  // Fall back to the "balanced" preset when options are omitted
  const { temperature = 0.7, max_tokens = 500, top_p = 0.9 } = options;
  const ollama = new Ollama({
    host: process.env.OLLAMA_HOST || "http://localhost:11434",
  });
const systemPrompt = `You are an AI assistant that writes in the EXACT style and voice of your user.
Your task is to generate new content that sounds indistinguishable from how the user would write it.
STUDY THESE WRITING SAMPLES FROM THE USER:
---
${context}
---
CRITICAL INSTRUCTIONS:
- Match the writing style, tone, vocabulary, and sentence structure EXACTLY
- Use the same level of formality/informality as the user
- Adopt their typical sentence length and punctuation style
- Use their characteristic phrases and expressions
- Write as if the user themselves wrote this - no AI disclaimers or meta-commentary
- Be natural, authentic, and consistent with their voice`;
const fullPrompt = `${systemPrompt}\n\nTASK TO WRITE IN USER'S VOICE:\n"${prompt}"\n\nNow write exactly as the user would:`;
const response = await ollama.generate({
model: MODEL_NAME,
prompt: fullPrompt,
options: {
temperature,
num_predict: max_tokens,
top_p,
},
stream: false,
});
return response.response;
}
The key is providing:
- Rich context from similar writing samples
- Clear instructions to mimic style exactly
- No meta-commentary - just write as the user would
4. Generation Settings
I exposed three main controls to fine-tune generation:
// Temperature (0.0 - 1.0)
// - Low (0.3): More focused, consistent
// - Medium (0.7): Balanced
// - High (0.9): Creative, diverse
// Max Tokens (100 - 2000)
// - Controls length of output
// Top P (0.0 - 1.0)
// - Controls diversity of word choices
I also created preset modes:
// lib/utils.ts
export const GENERATION_MODES = {
creative: {
label: "Creative",
temperature: 0.9,
top_p: 0.95,
description: "More diverse and creative outputs"
},
balanced: {
label: "Balanced",
temperature: 0.7,
top_p: 0.9,
description: "Good all-around performance"
},
precise: {
label: "Precise",
temperature: 0.3,
top_p: 0.7,
description: "More focused and consistent"
}
};
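Picking a preset simply feeds those values into the generation options from earlier; roughly like this (the max_tokens value is whatever the user set on the slider):

// Map a preset onto the options passed to generateWithOllama
const mode = GENERATION_MODES.balanced;

const output = await generateWithOllama(prompt, context, {
  temperature: mode.temperature,
  top_p: mode.top_p,
  max_tokens: 500, // still controlled separately via the Max Tokens setting
});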
5. The Main UI
The frontend is built as a single-page Next.js app. Here's a simplified version of the main component (the upload and settings handlers are trimmed for brevity):
// app/page.tsx
"use client";

import { useState } from "react";

export default function Home() {
  const [prompt, setPrompt] = useState("");
  const [temperature, setTemperature] = useState(0.7);
  const [maxTokens, setMaxTokens] = useState(500);
  const [topP, setTopP] = useState(0.9);
  const [generating, setGenerating] = useState(false);
  const [generatedText, setGeneratedText] = useState("");
  const [context, setContext] = useState<any>(null);

  const handleGenerate = async () => {
    setGenerating(true);
    try {
      const response = await fetch("/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, temperature, max_tokens: maxTokens, top_p: topP }),
      });
      const data = await response.json();
      setGeneratedText(data.response);
      setContext(data.context);
    } finally {
      setGenerating(false); // re-enable the button even if the request fails
    }
  };
return (
<div className="container">
<FileUpload onUploadComplete={onComplete} />
<PromptLibrary onSelectPrompt={setPrompt} />
<textarea value={prompt} onChange={e => setPrompt(e.target.value)} />
<GenerationSettings {...settings} />
<button onClick={handleGenerate}>Generate</button>
{generatedText && <OutputDisplay text={generatedText} />}
</div>
);
}
6. API Route - Generation
The generation API route ties everything together:
// app/api/generate/route.ts
import { NextRequest, NextResponse } from "next/server";
import { queryDocuments } from "@/lib/vector-store.server"; // adjust the import alias to your setup
import { generateWithOllama } from "@/lib/ollama";

export async function POST(request: NextRequest) {
  const { prompt, temperature, max_tokens, top_p } = await request.json();
// 1. Query vector store for context
const { documents, metadatas, distances } = await queryDocuments(prompt, 4);
if (documents.length === 0) {
return NextResponse.json({
error: "No documents found. Please upload writing samples first."
}, { status: 400 });
}
// 2. Combine context
const context = documents.join("\n\n---\n\n");
// 3. Generate with Ollama
const response = await generateWithOllama(prompt, context, {
temperature,
max_tokens,
top_p,
});
// 4. Return results
return NextResponse.json({
success: true,
response,
context: {
chunks: documents,
metadatas,
distances,
},
});
}
Features I'm Proud Of
1. Prompt Library
Instead of making users start from scratch, I built a categorized prompt library:
export const DEFAULT_PROMPTS = {
"Social Media": [
"Write a tweet about AI and privacy",
"Create a LinkedIn post about remote work",
"Draft an Instagram caption for a travel photo",
],
"Professional": [
"Write an email to my team about project updates",
"Draft a professional introduction for a networking event",
"Create a project proposal summary",
],
"Creative": [
"Write a short story opening",
"Create a blog post introduction",
"Write a product description",
],
"Personal": [
"Write a thank you note",
"Draft a birthday message",
"Create a personal bio",
],
};
2. Session History
Every generation is saved locally (in localStorage) so users can revisit, copy, or regenerate previous outputs:
// lib/history-store.ts
export function saveToHistory(entry: GenerationEntry): GenerationEntry {
const history = loadHistory();
const newEntry = {
...entry,
id: crypto.randomUUID(),
timestamp: Date.now(),
};
history.unshift(newEntry);
localStorage.setItem('generation_history', JSON.stringify(history.slice(0, 50)));
return newEntry;
}
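The loadHistory helper it relies on is just the read side of the same localStorage key; a minimal sketch:

// lib/history-store.ts (read side, sketch)
export function loadHistory(): GenerationEntry[] {
  if (typeof window === "undefined") return []; // localStorage only exists in the browser
  try {
    const raw = localStorage.getItem("generation_history");
    return raw ? (JSON.parse(raw) as GenerationEntry[]) : [];
  } catch {
    return []; // missing or corrupted history falls back to an empty list
  }
}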
3. Context Visualization
Users can see exactly which writing samples were used to generate their content:
{context && (
<div>
<button onClick={() => setShowContext(!showContext)}>
Show Retrieved Context ({context.chunks.length} chunks)
</button>
{showContext && (
<div>
{context.chunks.map((chunk: string, i: number) => (
<div key={i}>
<span>Context {i + 1}</span>
<span>Similarity: {((1 - context.distances[i]) * 100).toFixed(1)}%</span>
<p>{chunk}</p>
</div>
))}
</div>
)}
</div>
)}
4. Theme Support
Built-in dark/light mode that respects system preferences:
// components/ThemeProvider.tsx
export function ThemeProvider({ children }: { children: React.ReactNode }) {
const [theme, setTheme] = useState<'light' | 'dark'>('dark');
useEffect(() => {
const stored = localStorage.getItem('theme') as 'light' | 'dark';
const system = window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
setTheme(stored || system);
}, []);
useEffect(() => {
document.documentElement.classList.toggle('dark', theme === 'dark');
localStorage.setItem('theme', theme);
}, [theme]);
return (
<ThemeContext.Provider value={{ theme, setTheme }}>
{children}
</ThemeContext.Provider>
);
}
Performance Considerations
Model Choice
I recommend llama3.2:3b for fast responses (1-3 seconds on an M1 Mac):
ollama pull llama3.2:3b
For better quality, use llama3.1:8b (5-10 seconds):
ollama pull llama3.1:8b
Timeout Handling
Generation has a 120-second timeout to handle larger models:
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 120000);
const response = await fetch("/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt, temperature, max_tokens: maxTokens, top_p: topP }),
  signal: controller.signal,
});
clearTimeout(timeoutId);
Batch Processing
Files are processed in batches to avoid memory issues:
const batchSize = 100;
for (let i = 0; i < allDocuments.length; i += batchSize) {
const batch = allDocuments.slice(i, i + batchSize);
await addDocuments(batch);
}
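addDocuments is the write side of the file-based store. Its exact implementation isn't shown here, but assuming the store is a single JSON file under ./vector_store, it can be as simple as:

// Sketch: append a batch of documents to the JSON file the store reads from
import { promises as fs } from "fs";
import path from "path";

const STORE_PATH = path.join(process.cwd(), "vector_store", "documents.json"); // assumed filename

export async function addDocuments(
  docs: { text: string; metadata?: Record<string, unknown> }[]
): Promise<void> {
  let existing: typeof docs = [];
  try {
    existing = JSON.parse(await fs.readFile(STORE_PATH, "utf-8"));
  } catch {
    // first batch: the store file doesn't exist yet
  }
  await fs.mkdir(path.dirname(STORE_PATH), { recursive: true });
  await fs.writeFile(STORE_PATH, JSON.stringify([...existing, ...docs], null, 2));
}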
Privacy First
Everything runs locally:
- No external APIs - All processing happens on your machine
- No telemetry - No tracking, analytics, or data collection
- Local storage - Files stored in ./data/uploads and ./vector_store
- No accounts - No sign-up, no authentication
Your writing samples never leave your computer.
Deployment Options
Local Development
# Install dependencies
npm install --legacy-peer-deps
# Start Ollama
ollama serve
ollama pull llama3.2:3b
# Run the app
npm run dev
Docker (Recommended)
I created a Dockerfile that bundles everything:
FROM node:20-alpine
# Install Ollama
RUN apk add --no-cache curl
RUN curl -fsSL https://ollama.com/install.sh | sh
# Copy app files
COPY . /app
WORKDIR /app
# Install dependencies & build
RUN npm install --legacy-peer-deps
RUN npm run build
# Expose ports
EXPOSE 3000 11434
# Start both Ollama and Next.js
CMD ollama serve & npm start
Cloud Deployment
For hosting on Fly.io, DigitalOcean, or AWS:
fly launch
fly deploy
Note: Choose instances with at least 8GB RAM for decent model performance.
What I Learned
- RAG is powerful - Even simple keyword matching works surprisingly well when you have good context
- Prompt engineering matters - The system prompt makes or breaks the output quality
- Local LLMs are viable - Ollama makes it trivial to run models locally
- Privacy sells - Users love that their data never leaves their machine
- UX is everything - A beautiful UI makes the difference between a demo and a product
Future Improvements
Things I want to add:
- Better embeddings - Use Xenova/transformers.js for semantic search (rough sketch after this list)
- Fine-tuning - Allow users to fine-tune models on their writing
- Multi-user support - Separate vector stores for different writing personas
- Export options - Download generated content as markdown, PDF, etc.
- Analytics dashboard - Show writing style metrics (avg sentence length, vocabulary richness, etc.)
- Voice comparison - Side-by-side comparison of your writing vs generated content
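For the embeddings item specifically, transformers.js keeps everything local. Here's a rough sketch of what swapping keyword scoring for cosine similarity over real embeddings could look like (this uses @xenova/transformers with a small MiniLM model; it's not code from the repo):

import { pipeline } from "@xenova/transformers";

// Load a small local embedding model once (weights download on first run, then cached)
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embed(text: string): Promise<number[]> {
  const output = await embedder(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}

// With normalized vectors, cosine similarity is just the dot product
function similarity(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Rank stored documents against a query (documents come from loadDocuments(), as in queryDocuments)
const queryVec = await embed("Write a tweet about AI");
const scored = await Promise.all(
  documents.map(async (doc) => ({ doc, score: similarity(queryVec, await embed(doc.text)) }))
);
scored.sort((a, b) => b.score - a.score);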
Try It Yourself
The entire project is open source:
GitHub: harishkotra/clonewriter
To get started:
git clone https://github.com/harishkotra/clonewriter
cd clonewriter
npm install --legacy-peer-deps
# Start Ollama
ollama serve
ollama pull llama3.2:3b
# Run the app
npm run dev
Visit http://localhost:3000 and upload some of your writing samples!
The key insight: you don't need fancy vector databases or cloud APIs to build something useful. A file-based system with keyword matching, combined with a good LLM and smart prompting, gets you 90% of the way there.
If you're interested in AI, privacy, or just want to clone your writing voice, give it a try!
Questions? Feedback? Open an issue on GitHub or reach out on Twitter @harishkotra
Built with: Next.js, TypeScript, Ollama, Tailwind CSS
License: MIT