TL;DR: I built a 100% private AI tool that learns your writing style and generates content that sounds like you wrote it - all running locally on your machine with no data sent to the cloud.
The Problem I Wanted to Solve
As someone who writes a lot - tweets, emails, blog posts, documentation - I noticed I was spending hours crafting messages in my personal voice. AI tools like ChatGPT can write, sure, but they sound... robotic. Generic. Not me.
I wanted something that could:
- Learn my unique writing style
- Generate content that sounds indistinguishable from what I'd write
- Run 100% locally for complete privacy
- Work with any type of writing samples (tweets, emails, blog posts)
So I built CloneWriter - an AI voice cloning tool powered by local LLMs via Ollama.
Tech Stack
Here's what I used to build this:
- Next.js 14 (App Router) - For the full-stack web app
- TypeScript - Type-safe development
- Ollama - Running LLMs locally (llama3.2, llama3.1, etc.)
- File-based Vector Store - Simple keyword-based retrieval for RAG
- Tailwind CSS + Framer Motion - Beautiful, animated UI
- PapaParse - CSV/JSON parsing
Architecture Overview
The system follows a RAG (Retrieval-Augmented Generation) pattern:
┌─────────────────────────────────────────────┐
│ User Uploads Writing Samples │
│ (CSV, JSON, TXT files) │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Parse & Store Documents │
│ - Extract text from various formats │
│ - Store in file-based vector store │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ User Writes a Prompt │
│ (e.g., "Write a tweet about AI") │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Query Vector Store │
│ - Find relevant writing samples │
│ - Simple keyword matching │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Send to Ollama LLM │
│ - Include context + prompt │
│ - Generate in user's style │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Return Generated Content │
│ - Shows similarity scores │
│ - Displays context used │
└─────────────────────────────────────────────┘
Implementation Deep Dive
1. File Upload & Processing
The first challenge was accepting various file formats. I built a drag-and-drop component that accepts CSV, JSON, and TXT files.
// app/api/upload/route.ts
import Papa from "papaparse";

async function parseCSV(content: string): Promise<string[]> {
return new Promise((resolve, reject) => {
Papa.parse(content, {
header: true,
complete: (results) => {
const texts: string[] = [];
results.data.forEach((row: any) => {
// Look for common text fields
const text = row.Text || row.text || row.Content ||
row.content || row.Message || row.tweet;
// Only include substantial text (>50 chars)
if (text && typeof text === "string" && text.length > 50) {
texts.push(text.trim());
}
});
resolve(texts);
},
error: (error: any) => reject(error),
});
});
}
This handles Twitter exports, LinkedIn data dumps, or any CSV with text columns. The >50 character filter ensures we skip metadata like names or job titles.
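The repo also accepts JSON and TXT uploads. Those parsers aren't shown above, but here's a minimal sketch of how they can work with the same >50 character filter (field names here are assumptions, not the exact code from the repo):

// Sketch only: JSON arrays of strings/objects, and raw TXT split on blank lines
function parseJSON(content: string): string[] {
  const data = JSON.parse(content);
  const items: any[] = Array.isArray(data) ? data : [data];
  return items
    .map((item) =>
      typeof item === "string"
        ? item
        : item.text || item.Text || item.content || item.Content || ""
    )
    .filter((text) => typeof text === "string" && text.length > 50)
    .map((text) => text.trim());
}

function parseTXT(content: string): string[] {
  // Treat blank-line-separated paragraphs as individual writing samples
  return content
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .filter((block) => block.length > 50);
}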
2. Vector Store Implementation
Initially, I planned to use ChromaDB, but I simplified to a file-based system with keyword matching. For most use cases, this works surprisingly well:
// lib/vector-store.server.ts
export async function queryDocuments(
query: string,
nResults: number = 4
): Promise<{ documents: string[]; metadatas: any[]; distances: number[] }> {
const documents = await loadDocuments();
// Simple keyword matching
const queryLower = query.toLowerCase();
const queryWords = queryLower.split(/\s+/);
const results = documents
.map((doc) => {
const textLower = doc.text.toLowerCase();
// Score based on how many query words appear
const score = queryWords.filter(word =>
textLower.includes(word)
).length / queryWords.length;
return { doc, score };
})
.sort((a, b) => b.score - a.score)
.slice(0, nResults);
// Always return at least some documents
const finalResults = results.length > 0
? results
: documents.slice(0, nResults).map(doc => ({ doc, score: 0.3 }));
return {
documents: finalResults.map((r) => r.doc.text),
metadatas: finalResults.map((r) => r.doc.metadata || {}),
distances: finalResults.map((r) => 1 - r.score),
};
}
This approach:
- Splits the query into words
- Scores documents based on keyword overlap
- Returns top N matches
- Falls back to the first few stored samples (with a nominal score) if nothing matches
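Using it is a one-liner. Here's a rough usage sketch based on the signature above (the prompt is just an example):

// Retrieve the 4 closest samples for a prompt and build the LLM context
const { documents, distances } = await queryDocuments("Write a tweet about AI", 4);

documents.forEach((text, i) => {
  console.log(`match ${i + 1} (similarity ${(1 - distances[i]).toFixed(2)}):`, text.slice(0, 80));
});

const context = documents.join("\n\n---\n\n");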
3. Ollama Integration
The magic happens when we send the retrieved context to Ollama. The system prompt is crucial here:
// lib/ollama.ts
import { Ollama } from "ollama";

const MODEL_NAME = "llama3.2:3b"; // or whichever model you've pulled with `ollama pull`

interface GenerationOptions {
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
}

export async function generateWithOllama(
  prompt: string,
  context: string,
  options: GenerationOptions = {}
): Promise<string> {
  // Fall back to the "balanced" preset when options are omitted
  const { temperature = 0.7, max_tokens = 500, top_p = 0.9 } = options;
  const ollama = new Ollama({
    host: process.env.OLLAMA_HOST || "http://localhost:11434",
  });
const systemPrompt = `You are an AI assistant that writes in the EXACT style and voice of your user.
Your task is to generate new content that sounds indistinguishable from how the user would write it.
STUDY THESE WRITING SAMPLES FROM THE USER:
---
${context}
---
CRITICAL INSTRUCTIONS:
- Match the writing style, tone, vocabulary, and sentence structure EXACTLY
- Use the same level of formality/informality as the user
- Adopt their typical sentence length and punctuation style
- Use their characteristic phrases and expressions
- Write as if the user themselves wrote this - no AI disclaimers or meta-commentary
- Be natural, authentic, and consistent with their voice`;
const fullPrompt = `${systemPrompt}\n\nTASK TO WRITE IN USER'S VOICE:\n"${prompt}"\n\nNow write exactly as the user would:`;
const response = await ollama.generate({
model: MODEL_NAME,
prompt: fullPrompt,
options: {
temperature,
num_predict: max_tokens,
top_p,
},
stream: false,
});
return response.response;
}
The key is providing:
- Rich context from similar writing samples
- Clear instructions to mimic style exactly
- No meta-commentary - just write as the user would
4. Generation Settings
I exposed three main controls to fine-tune generation:
// Temperature (0.0 - 1.0)
// - Low (0.3): More focused, consistent
// - Medium (0.7): Balanced
// - High (0.9): Creative, diverse
// Max Tokens (100 - 2000)
// - Controls length of output
// Top P (0.0 - 1.0)
// - Controls diversity of word choices
I also created preset modes:
// lib/utils.ts
export const GENERATION_MODES = {
creative: {
label: "Creative",
temperature: 0.9,
top_p: 0.95,
description: "More diverse and creative outputs"
},
balanced: {
label: "Balanced",
temperature: 0.7,
top_p: 0.9,
description: "Good all-around performance"
},
precise: {
label: "Precise",
temperature: 0.3,
top_p: 0.7,
description: "More focused and consistent"
}
};
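Picking a preset simply feeds those values into the generation options from earlier; roughly like this (the max_tokens value is whatever the user set on the slider):

// Map a preset onto the options passed to generateWithOllama
const mode = GENERATION_MODES.balanced;

const output = await generateWithOllama(prompt, context, {
  temperature: mode.temperature,
  top_p: mode.top_p,
  max_tokens: 500, // still controlled separately via the Max Tokens setting
});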
5. The Main UI
The frontend is built as a single-page Next.js app. Here's a simplified version of the main component (the upload and settings handlers are trimmed for brevity):
// app/page.tsx
"use client";

import { useState } from "react";

export default function Home() {
  const [prompt, setPrompt] = useState("");
  const [temperature, setTemperature] = useState(0.7);
  const [maxTokens, setMaxTokens] = useState(500);
  const [topP, setTopP] = useState(0.9);
  const [generating, setGenerating] = useState(false);
  const [generatedText, setGeneratedText] = useState("");
  const [context, setContext] = useState<any>(null);

  const handleGenerate = async () => {
    setGenerating(true);
    try {
      const response = await fetch("/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, temperature, max_tokens: maxTokens, top_p: topP }),
      });
      const data = await response.json();
      setGeneratedText(data.response);
      setContext(data.context);
    } finally {
      setGenerating(false); // re-enable the button even if the request fails
    }
  };
return (
<div className="container">
<FileUpload onUploadComplete={onComplete} />
<PromptLibrary onSelectPrompt={setPrompt} />
<textarea value={prompt} onChange={e => setPrompt(e.target.value)} />
<GenerationSettings {...settings} />
<button onClick={handleGenerate}>Generate</button>
{generatedText && <OutputDisplay text={generatedText} />}
</div>
);
}
6. API Route - Generation
The generation API route ties everything together:
// app/api/generate/route.ts
import { NextRequest, NextResponse } from "next/server";
import { queryDocuments } from "@/lib/vector-store.server"; // adjust the import alias to your setup
import { generateWithOllama } from "@/lib/ollama";

export async function POST(request: NextRequest) {
  const { prompt, temperature, max_tokens, top_p } = await request.json();
// 1. Query vector store for context
const { documents, metadatas, distances } = await queryDocuments(prompt, 4);
if (documents.length === 0) {
return NextResponse.json({
error: "No documents found. Please upload writing samples first."
}, { status: 400 });
}
// 2. Combine context
const context = documents.join("\n\n---\n\n");
// 3. Generate with Ollama
const response = await generateWithOllama(prompt, context, {
temperature,
max_tokens,
top_p,
});
// 4. Return results
return NextResponse.json({
success: true,
response,
context: {
chunks: documents,
metadatas,
distances,
},
});
}
Features I'm Proud Of
1. Prompt Library
Instead of making users start from scratch, I built a categorized prompt library:
export const DEFAULT_PROMPTS = {
"Social Media": [
"Write a tweet about AI and privacy",
"Create a LinkedIn post about remote work",
"Draft an Instagram caption for a travel photo",
],
"Professional": [
"Write an email to my team about project updates",
"Draft a professional introduction for a networking event",
"Create a project proposal summary",
],
"Creative": [
"Write a short story opening",
"Create a blog post introduction",
"Write a product description",
],
"Personal": [
"Write a thank you note",
"Draft a birthday message",
"Create a personal bio",
],
};
2. Session History
Every generation is saved locally (in localStorage) so users can revisit, copy, or regenerate previous outputs:
// lib/history-store.ts
export function saveToHistory(entry: GenerationEntry): GenerationEntry {
const history = loadHistory();
const newEntry = {
...entry,
id: crypto.randomUUID(),
timestamp: Date.now(),
};
history.unshift(newEntry);
localStorage.setItem('generation_history', JSON.stringify(history.slice(0, 50)));
return newEntry;
}
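The loadHistory helper it relies on is just the read side of the same localStorage key; a minimal sketch:

// lib/history-store.ts (read side, sketch)
export function loadHistory(): GenerationEntry[] {
  if (typeof window === "undefined") return []; // localStorage only exists in the browser
  try {
    const raw = localStorage.getItem("generation_history");
    return raw ? (JSON.parse(raw) as GenerationEntry[]) : [];
  } catch {
    return []; // missing or corrupted history falls back to an empty list
  }
}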
3. Context Visualization
Users can see exactly which writing samples were used to generate their content:
{context && (
<div>
<button onClick={() => setShowContext(!showContext)}>
Show Retrieved Context ({context.chunks.length} chunks)
</button>
{showContext && (
<div>
{context.chunks.map((chunk: string, i: number) => (
<div key={i}>
<span>Context {i + 1}</span>
<span>Similarity: {((1 - context.distances[i]) * 100).toFixed(1)}%</span>
<p>{chunk}</p>
</div>
))}
</div>
)}
</div>
)}
4. Theme Support
Built-in dark/light mode that respects system preferences:
// components/ThemeProvider.tsx
export function ThemeProvider({ children }: { children: React.ReactNode }) {
const [theme, setTheme] = useState<'light' | 'dark'>('dark');
useEffect(() => {
const stored = localStorage.getItem('theme') as 'light' | 'dark';
const system = window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
setTheme(stored || system);
}, []);
useEffect(() => {
document.documentElement.classList.toggle('dark', theme === 'dark');
localStorage.setItem('theme', theme);
}, [theme]);
return (
<ThemeContext.Provider value={{ theme, setTheme }}>
{children}
</ThemeContext.Provider>
);
}
Performance Considerations
Model Choice
I recommend llama3.2:3b for fast responses (1-3 seconds on an M1 Mac):
ollama pull llama3.2:3b
For better quality, use llama3.1:8b (5-10 seconds):
ollama pull llama3.1:8b
Timeout Handling
Generation has a 120-second timeout to handle larger models:
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 120000);
const response = await fetch("/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt, temperature, max_tokens: maxTokens, top_p: topP }),
  signal: controller.signal,
});
clearTimeout(timeoutId);
Batch Processing
Files are processed in batches to avoid memory issues:
const batchSize = 100;
for (let i = 0; i < allDocuments.length; i += batchSize) {
const batch = allDocuments.slice(i, i + batchSize);
await addDocuments(batch);
}
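addDocuments is the write side of the file-based store. Its exact implementation isn't shown here, but assuming the store is a single JSON file under ./vector_store, it can be as simple as:

// Sketch: append a batch of documents to the JSON file the store reads from
import { promises as fs } from "fs";
import path from "path";

const STORE_PATH = path.join(process.cwd(), "vector_store", "documents.json"); // assumed filename

export async function addDocuments(
  docs: { text: string; metadata?: Record<string, unknown> }[]
): Promise<void> {
  let existing: typeof docs = [];
  try {
    existing = JSON.parse(await fs.readFile(STORE_PATH, "utf-8"));
  } catch {
    // first batch: the store file doesn't exist yet
  }
  await fs.mkdir(path.dirname(STORE_PATH), { recursive: true });
  await fs.writeFile(STORE_PATH, JSON.stringify([...existing, ...docs], null, 2));
}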
Privacy First
Everything runs locally:
- No external APIs - All processing happens on your machine
- No telemetry - No tracking, analytics, or data collection
- Local storage - Files stored in ./data/uploads and ./vector_store
- No accounts - No sign-up, no authentication
Your writing samples never leave your computer.
Deployment Options
Local Development
# Install dependencies
npm install --legacy-peer-deps
# Start Ollama
ollama serve
ollama pull llama3.2:3b
# Run the app
npm run dev
Docker (Recommended)
I created a Dockerfile that bundles everything:
FROM node:20-alpine
# Install Ollama
RUN apk add --no-cache curl
RUN curl -fsSL https://ollama.com/install.sh | sh
# Copy app files
COPY . /app
WORKDIR /app
# Install dependencies & build
RUN npm install --legacy-peer-deps
RUN npm run build
# Expose ports
EXPOSE 3000 11434
# Start both Ollama and Next.js
CMD ollama serve & npm start
Cloud Deployment
For hosting on Fly.io, DigitalOcean, or AWS:
fly launch
fly deploy
Note: Choose instances with at least 8GB RAM for decent model performance.
What I Learned
- RAG is powerful - Even simple keyword matching works surprisingly well when you have good context
- Prompt engineering matters - The system prompt makes or breaks the output quality
- Local LLMs are viable - Ollama makes it trivial to run models locally
- Privacy sells - Users love that their data never leaves their machine
- UX is everything - A beautiful UI makes the difference between a demo and a product
Future Improvements
Things I want to add:
- Better embeddings - Use Xenova/transformers.js for semantic search (rough sketch after this list)
- Fine-tuning - Allow users to fine-tune models on their writing
- Multi-user support - Separate vector stores for different writing personas
- Export options - Download generated content as markdown, PDF, etc.
- Analytics dashboard - Show writing style metrics (avg sentence length, vocabulary richness, etc.)
- Voice comparison - Side-by-side comparison of your writing vs generated content
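For the embeddings item specifically, transformers.js keeps everything local. Here's a rough sketch of what swapping keyword scoring for cosine similarity over real embeddings could look like (this uses @xenova/transformers with a small MiniLM model; it's not code from the repo):

import { pipeline } from "@xenova/transformers";

// Load a small local embedding model once (weights download on first run, then cached)
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embed(text: string): Promise<number[]> {
  const output = await embedder(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}

// With normalized vectors, cosine similarity is just the dot product
function similarity(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Rank stored documents against a query (documents come from loadDocuments(), as in queryDocuments)
const queryVec = await embed("Write a tweet about AI");
const scored = await Promise.all(
  documents.map(async (doc) => ({ doc, score: similarity(queryVec, await embed(doc.text)) }))
);
scored.sort((a, b) => b.score - a.score);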
Try It Yourself
The entire project is open source:
GitHub: harishkotra/clonewriter
To get started:
git clone https://github.com/harishkotra/clonewriter
cd clonewriter
npm install --legacy-peer-deps
# Start Ollama
ollama serve
ollama pull llama3.2:3b
# Run the app
npm run dev
Visit http://localhost:3000 and upload some of your writing samples!
The key insight: you don't need fancy vector databases or cloud APIs to build something useful. A file-based system with keyword matching, combined with a good LLM and smart prompting, gets you 90% of the way there.
If you're interested in AI, privacy, or just want to clone your writing voice, give it a try!
Questions? Feedback? Open an issue on GitHub or reach out on Twitter @harishkotra
Built with: Next.js, TypeScript, Ollama, Tailwind CSS
License: MIT