Victory Lucky
TiDB for AI Memory: Vector Search, HTAP, and Horizontal Scaling in One Database

Most AI applications today run on a fragmented stack. Pinecone stores your vectors. PostgreSQL holds your user data. Redis caches frequently accessed information. Snowflake runs your analytics. You write synchronization logic, debug consistency issues, and pay for data transfer between systems.

There is a different approach. TiDB consolidates this entire stack into one database that scales horizontally. This is not theoretical. This article shows you exactly how to build production-grade AI memory systems using TiDB, with complete code examples and real architecture patterns.

If you are new to AI memory systems in general, read AI Memory Systems: Everything You Need to Know first. This article assumes you understand the fundamentals (episodic, semantic, and working memory) and focuses on TiDB-specific implementation.

Why TiDB for AI Memory?

Before we get into code, let me explain why TiDB's architecture solves problems that other databases cannot.

The Typical AI Memory Stack Problem

Most teams building AI applications end up with something like this:

Vector Storage: Pinecone or Weaviate for embeddings

Relational Data: PostgreSQL for user profiles and structured data

Analytics: Snowflake or ClickHouse for usage patterns

Caching: Redis for frequently accessed memories

Synchronization: Custom ETL pipelines to keep everything consistent

This architecture has serious problems:

Consistency nightmares: Your vector database has version 1 of a memory. Your PostgreSQL has version 2. Which one is correct?

Latency issues: Retrieving memories means querying Pinecone, then PostgreSQL, then joining the results in your application. Each hop adds 20-50ms.

Scaling headaches: When you add users, you need to scale each database independently and rewrite your sync logic.

Cost explosion: You pay for three or four databases, plus data transfer fees between them.

TiDB's Unified Approach

TiDB gives you one database that handles everything:

Native vector search: Store embeddings with up to 16,383 dimensions using the VECTOR data type. No separate vector database needed.

HTAP architecture: Run transactional queries (storing memories) and analytical queries (analyzing memory patterns) on the same data without ETL pipelines. TiKV handles row-based storage for fast transactional writes. TiFlash provides columnar storage for analytical queries.

Horizontal scaling: Add nodes when your memory table grows. TiDB automatically redistributes data across the cluster.

MySQL compatibility: Your existing ORMs, tools, and team knowledge transfer directly. No new query language to learn.

This is the consolidation bet. One database that does multiple jobs well beats maintaining multiple specialized databases.

TiDB Architecture for AI Memory

Let me explain how TiDB's components work together for AI memory workloads.

TiDB Server: The SQL Layer

TiDB Server is stateless. It parses SQL, optimizes queries, and coordinates execution across TiKV and TiFlash. For AI memory, this means:

  • You write a query that joins semantic memories with user profiles
  • TiDB's optimizer decides whether to read from TiKV (row storage) or TiFlash (columnar storage)
  • The result comes back as standard MySQL result sets

You can scale TiDB servers horizontally to handle more concurrent queries without touching your data storage layer.

TiKV: Transactional Row Storage

TiKV stores data in row format using RocksDB as its storage engine. It handles all writes and provides strong consistency through the Raft consensus algorithm.

For AI memory, TiKV is where your memories live:

  • New memories get written to TiKV
  • Point queries (get memory by ID) read from TiKV
  • Vector search queries scan TiKV when retrieving recent memories

TiKV automatically shards data into Regions (96MB by default) and distributes them across nodes. When your memory table grows to 100GB or 1TB, TiKV handles the scaling transparently.

TiFlash: Analytical Columnar Storage

TiFlash replicates data from TiKV in columnar format. It synchronizes changes in real time using Raft Learner consensus.

For AI memory, TiFlash enables analytics without impacting your transactional workload:

  • Which memories are retrieved most often?
  • How does memory usage grow over time?
  • What percentage of memories are episodic vs semantic?
  • Which users have the most stored context?

These analytical queries run on TiFlash while your application continues writing memories to TiKV, with minimal impact on transactional performance.

Placement Driver (PD): The Orchestrator

PD manages cluster metadata and coordinates data placement. For AI memory systems, PD's placement rules let you do advanced things like:

  • Store recent memories (hot data) on fast NVMe storage
  • Move old memories (cold data) to cheaper SSD storage
  • Replicate critical memories across multiple availability zones

Setting Up TiDB for AI Memory

Let me walk you through the complete setup, from cluster creation to schema design.

Option 1: TiDB Cloud (Recommended for Getting Started)

The fastest way to get started is TiDB Cloud:

  1. Sign up at tidbcloud.com
  2. Create a Serverless cluster (free tier available)
  3. Note your connection string
# Connection string format
mysql -h gateway01.us-west-2.prod.aws.tidbcloud.com -P 4000 \
  -u '<your-username>' -p'<your-password>' --ssl-mode=VERIFY_IDENTITY \
  --ssl-ca=/path/to/ca.pem

TiDB Cloud automatically provisions TiDB, TiKV, and TiFlash. You do not need to manage infrastructure.

Option 2: Local Development with TiUP

For local testing, use TiUP Playground:

# Install TiUP
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh

# Start a local cluster with TiFlash
tiup playground --db 1 --pd 1 --kv 1 --tiflash 1

# Connect using MySQL client
mysql --host 127.0.0.1 --port 4000 -u root

This gives you a fully functional TiDB cluster on your laptop for development.

Database Schema for AI Memory

Here is a production-ready schema for AI memory in TiDB. This schema takes advantage of TiDB-specific features.

-- Create database
CREATE DATABASE ai_memory;
USE ai_memory;

-- Main memory table
CREATE TABLE memories (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    session_id VARCHAR(255),

    -- Content and embedding
    content TEXT NOT NULL,
    embedding VECTOR(1536) COMMENT "hnsw(distance=cosine)",

    -- Classification
    memory_type ENUM('episodic', 'semantic', 'working') NOT NULL,
    importance FLOAT DEFAULT 0.5,
    confidence FLOAT DEFAULT 1.0,

    -- Temporal tracking
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP NULL,
    access_count INT DEFAULT 0,
    expires_at TIMESTAMP NULL,

    -- Metadata (JSON for flexibility)
    metadata JSON,

    -- Memory relationships
    source_memory_ids JSON COMMENT 'IDs of memories that support this one',

    -- Soft deletion
    is_deleted BOOLEAN DEFAULT FALSE,

    -- Indexes
    INDEX idx_user_type (user_id, memory_type),
    INDEX idx_user_created (user_id, created_at),
    INDEX idx_expires (expires_at),
    INDEX idx_session (session_id),

    -- Vector index with HNSW for fast approximate search
    VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

-- Memory access log for analytics
CREATE TABLE memory_access_log (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    memory_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    retrieval_score FLOAT COMMENT 'Cosine similarity score',
    query_context TEXT COMMENT 'What query retrieved this memory',

    INDEX idx_memory (memory_id),
    INDEX idx_user_time (user_id, accessed_at),
    INDEX idx_accessed (accessed_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Key TiDB-specific features used:

  1. VECTOR data type: Native support for embeddings up to 16,383 dimensions
  2. HNSW vector index: Fast approximate nearest neighbor search using Hierarchical Navigable Small World algorithm
  3. JSON metadata: Flexible storage for additional context without schema changes
  4. AUTO_INCREMENT: Works across distributed nodes (though IDs may have gaps)
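VEC_COSINE_DISTANCE appears in every retrieval query in this article. For intuition, here is a minimal JavaScript sketch of the computation TiDB performs server-side (illustrative only; the real query is evaluated inside TiDB and accelerated by the HNSW index):

```javascript
// Cosine distance as computed by TiDB's VEC_COSINE_DISTANCE:
// 1 - (a . b) / (|a| * |b|)
function cosineDistance(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [1, 0])); // 0 (identical direction)
console.log(cosineDistance([1, 0], [0, 1])); // 1 (orthogonal)
```

A distance of 0 means the vectors point the same way; values near 1 mean the embeddings are unrelated. The duplicate-detection threshold of 0.15 used later corresponds to roughly 85% similarity.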

Creating TiFlash Replicas for Analytics

TiFlash does not replicate data automatically. You tell TiDB which tables to replicate:

-- Enable TiFlash replica for memory analytics
ALTER TABLE memories SET TIFLASH REPLICA 1;

-- Check replication progress
SELECT 
    table_schema, 
    table_name, 
    replica_count, 
    available, 
    progress 
FROM information_schema.tiflash_replica 
WHERE table_schema = 'ai_memory';

Once progress = 1 and available = 1, your memory data exists in both TiKV (row format) and TiFlash (columnar format). TiDB's optimizer automatically chooses which storage engine to use for each query.

Storing Memories in TiDB

Let me show you how to store memories with proper embedding generation and duplicate detection.

Complete Node.js Implementation

import mysql from 'mysql2/promise';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

// Create TiDB connection pool
const pool = mysql.createPool({
  host: process.env.TIDB_HOST,
  port: parseInt(process.env.TIDB_PORT || '4000'),
  user: process.env.TIDB_USER,
  password: process.env.TIDB_PASSWORD,
  database: 'ai_memory',
  ssl: {
    minVersion: 'TLSv1.2',
    rejectUnauthorized: true
  },
  connectionLimit: 10
});

/**
 * Generate embedding for text using OpenAI
 */
async function generateEmbedding(text) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small', // 1536 dimensions
    input: text
  });

  return response.data[0].embedding;
}

/**
 * Extract memorable facts from a message
 */
async function extractMemorableFacts(message, userId) {
  const extraction = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Extract important, long-term facts from the user's message.

        Rules:
        - Ignore pleasantries, confirmations, and generic responses
        - Extract specific preferences, attributes, or requirements
        - Each fact should be self-contained and meaningful
        - Classify each as episodic, semantic, or working memory
        - Assign importance score (0.0 to 1.0)

        Return JSON array:
        {
          "facts": [
            {
              "fact": "User prefers dark mode interfaces",
              "type": "semantic",
              "importance": 0.85
            }
          ]
        }`
      },
      { role: 'user', content: message }
    ],
    response_format: { type: 'json_object' }
  });

  const result = JSON.parse(extraction.choices[0].message.content);
  return result.facts || [];
}

/**
 * Store a memory in TiDB with duplicate detection
 */
async function storeMemory(userId, fact, sessionId = null) {
  const connection = await pool.getConnection();

  try {
    // Generate embedding
    const embedding = await generateEmbedding(fact.fact);
    const embeddingJson = JSON.stringify(embedding);

    // Check for similar existing memories (within 0.15 cosine distance)
    const [existingMemories] = await connection.query(`
      SELECT 
        id, 
        content, 
        confidence,
        VEC_COSINE_DISTANCE(embedding, ?) AS distance
      FROM memories
      WHERE user_id = ?
        AND memory_type = ?
        AND is_deleted = FALSE
        AND VEC_COSINE_DISTANCE(embedding, ?) < 0.15
      ORDER BY distance ASC
      LIMIT 1
    `, [embeddingJson, userId, fact.type, embeddingJson]);

    if (existingMemories.length > 0) {
      // Similar memory exists, reinforce it
      const existing = existingMemories[0];

      await connection.query(`
        UPDATE memories
        SET 
          last_accessed = CURRENT_TIMESTAMP,
          access_count = access_count + 1,
          confidence = LEAST(confidence + 0.1, 1.0),
          metadata = JSON_SET(
            COALESCE(metadata, '{}'),
            '$.last_reinforced',
            CURRENT_TIMESTAMP()
          )
        WHERE id = ?
      `, [existing.id]);

      return { 
        action: 'reinforced', 
        memoryId: existing.id,
        content: existing.content
      };
    }

    // No similar memory, create new one
    const [result] = await connection.query(`
      INSERT INTO memories (
        user_id,
        session_id,
        content,
        embedding,
        memory_type,
        importance,
        confidence,
        created_at,
        last_accessed
      ) VALUES (?, ?, ?, ?, ?, ?, 1.0, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
    `, [
      userId,
      sessionId,
      fact.fact,
      embeddingJson,
      fact.type,
      fact.importance
    ]);

    return {
      action: 'created',
      memoryId: result.insertId,
      content: fact.fact
    };

  } finally {
    connection.release();
  }
}

/**
 * Process a user message and store extracted memories
 */
async function processMessage(userId, message, sessionId = null) {
  // Extract facts worth remembering
  const facts = await extractMemorableFacts(message, userId);

  // Filter by importance threshold
  const importantFacts = facts.filter(f => f.importance > 0.6);

  // Store each fact
  const results = [];
  for (const fact of importantFacts) {
    const result = await storeMemory(userId, fact, sessionId);
    results.push(result);
  }

  return results;
}

// Example usage
const userId = 123;
const sessionId = 'sess_abc123';
const message = "I really prefer dark mode for all interfaces. I am currently working on a healthcare AI project focused on cardiology research.";

const stored = await processMessage(userId, message, sessionId);
console.log('Stored memories:', stored);

What this code does:

  1. Uses GPT-4o-mini to extract meaningful facts from user messages
  2. Generates 1536-dimensional embeddings using OpenAI's text-embedding-3-small model
  3. Checks for duplicates using vector similarity search
  4. Either reinforces existing memories or creates new ones
  5. Handles TiDB connection pooling properly

Retrieving Memories: Smart Ranking

Retrieval is where TiDB's vector search shines. Here is how to build a production-quality retrieval system.

Retrieval with Scoring Formula

/**
 * Retrieve relevant memories with multi-factor scoring
 */
async function retrieveMemories(userId, queryText, options = {}) {
  const {
    memoryTypes = ['semantic', 'episodic'],
    limit = 10,
    recencyHalfLifeDays = 7, // Memories lose half their recency score every 7 days
    minImportance = 0.0
  } = options;

  const connection = await pool.getConnection();

  try {
    // Generate query embedding
    const queryEmbedding = await generateEmbedding(queryText);
    const embeddingJson = JSON.stringify(queryEmbedding);

    // Retrieve and score memories
    // Score formula:
    // - 60% semantic relevance (1 - cosine distance)
    // - 25% recency (exponential decay)
    // - 15% importance
    const [memories] = await connection.query(`
      SELECT 
        id,
        content,
        memory_type,
        importance,
        confidence,
        created_at,
        last_accessed,
        access_count,
        metadata,

        -- Calculate semantic similarity (1 - distance = similarity)
        (1 - VEC_COSINE_DISTANCE(embedding, ?)) AS semantic_similarity,

        -- Calculate recency score with exponential half-life decay
        POWER(0.5, TIMESTAMPDIFF(DAY, created_at, CURRENT_TIMESTAMP) / ?) AS recency_score,

        -- Combined weighted score
        (
          (1 - VEC_COSINE_DISTANCE(embedding, ?)) * 0.60 +
          POWER(0.5, TIMESTAMPDIFF(DAY, created_at, CURRENT_TIMESTAMP) / ?) * 0.25 +
          importance * 0.15
        ) AS final_score

      FROM memories
      WHERE user_id = ?
        AND memory_type IN (?)
        AND is_deleted = FALSE
        AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP)
        AND importance >= ?
      ORDER BY final_score DESC
      LIMIT ?
    `, [
      embeddingJson,
      recencyHalfLifeDays,
      embeddingJson,
      recencyHalfLifeDays,
      userId,
      memoryTypes,
      minImportance,
      limit * 2 // Get extra for filtering
    ]);

    // Update access tracking for retrieved memories
    if (memories.length > 0) {
      const memoryIds = memories.map(m => m.id);

      // Batch update access tracking
      await connection.query(`
        UPDATE memories
        SET 
          last_accessed = CURRENT_TIMESTAMP,
          access_count = access_count + 1
        WHERE id IN (?)
      `, [memoryIds]);

      // Log access for analytics (optional)
      const accessLogs = memories.map(m => [
        m.id,
        userId,
        m.final_score,
        queryText.substring(0, 500) // Limit query context length
      ]);

      await connection.query(`
        INSERT INTO memory_access_log (memory_id, user_id, retrieval_score, query_context)
        VALUES ?
      `, [accessLogs]);
    }

    return memories.slice(0, limit);

  } finally {
    connection.release();
  }
}

// Example usage
const userId = 123;
const query = "What are my interface preferences?";

const relevantMemories = await retrieveMemories(userId, query, {
  memoryTypes: ['semantic'],
  limit: 5,
  recencyHalfLifeDays: 30
});

console.log('Retrieved memories:');
relevantMemories.forEach(m => {
  console.log(`- ${m.content} (score: ${m.final_score.toFixed(3)})`);
});

Understanding the scoring formula:

The final_score combines three factors:

  1. Semantic similarity (60%): How related is this memory to the current query? Calculated as 1 - cosine_distance. A distance of 0.1 means 90% similarity.

  2. Recency (25%): How recent is this memory? Uses exponential decay with configurable half-life. Default is 7 days, meaning a 7-day-old memory has 50% recency score.

  3. Importance (15%): How important was this memory when created? Set during memory creation based on content analysis.

You can adjust these weights based on your application. A chatbot might increase recency weight (40%). A knowledge base might increase importance weight (30%).
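The same formula is easy to sketch in plain JavaScript, which makes weight experiments unit-testable before you change any SQL (the function name and input shape here are illustrative, not part of the retrieval code above):

```javascript
// Multi-factor memory score mirroring the SQL scoring query:
// 60% semantic similarity + 25% recency (half-life decay) + 15% importance.
function scoreMemory({ similarity, ageDays, importance }, halfLifeDays = 7) {
  const recency = Math.pow(0.5, ageDays / halfLifeDays); // 50% per half-life
  return similarity * 0.60 + recency * 0.25 + importance * 0.15;
}

// Fresh memory: full recency contribution
console.log(scoreMemory({ similarity: 0.9, ageDays: 0, importance: 0.8 }));
// At exactly one half-life (7 days), the recency term is halved
console.log(scoreMemory({ similarity: 0.9, ageDays: 7, importance: 0.8 }));
```

Tuning the weights here and re-running your test fixtures is much faster than iterating against the database.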

Memory Analytics with TiFlash

This is where TiDB's HTAP architecture provides unique value. You can run heavy analytical queries on your memory data without impacting your application's performance.

Analyzing Memory Patterns

-- Force query to use TiFlash for analytics
-- TiDB usually auto-selects, but you can specify explicitly
SET SESSION tidb_isolation_read_engines = 'tiflash';

-- Memory growth over time
SELECT 
    DATE(created_at) AS date,
    memory_type,
    COUNT(*) AS memories_created,
    AVG(importance) AS avg_importance,
    AVG(confidence) AS avg_confidence
FROM memories
WHERE created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 30 DAY)
    AND is_deleted = FALSE
GROUP BY DATE(created_at), memory_type
ORDER BY date DESC, memory_type;

-- Most accessed memories (cache candidate analysis)
SELECT 
    id,
    content,
    memory_type,
    access_count,
    DATEDIFF(CURRENT_TIMESTAMP, created_at) AS age_days,
    access_count / GREATEST(DATEDIFF(CURRENT_TIMESTAMP, created_at), 1) AS access_rate_per_day
FROM memories
WHERE user_id = 123
    AND is_deleted = FALSE
ORDER BY access_rate_per_day DESC
LIMIT 20;

-- Memory distribution by user
SELECT 
    user_id,
    COUNT(*) AS total_memories,
    SUM(CASE WHEN memory_type = 'episodic' THEN 1 ELSE 0 END) AS episodic_count,
    SUM(CASE WHEN memory_type = 'semantic' THEN 1 ELSE 0 END) AS semantic_count,
    SUM(CASE WHEN memory_type = 'working' THEN 1 ELSE 0 END) AS working_count,
    AVG(importance) AS avg_importance,
    MAX(created_at) AS last_memory_created
FROM memories
WHERE is_deleted = FALSE
GROUP BY user_id
ORDER BY total_memories DESC
LIMIT 100;

-- Memory consolidation candidates
-- Find clusters of similar episodic memories that should be consolidated into semantic facts
SELECT 
    m1.id AS memory_id_1,
    m2.id AS memory_id_2,
    m1.content AS content_1,
    m2.content AS content_2,
    VEC_COSINE_DISTANCE(m1.embedding, m2.embedding) AS distance,
    DATEDIFF(m2.created_at, m1.created_at) AS days_apart
FROM memories m1
JOIN memories m2 ON m1.user_id = m2.user_id AND m1.id < m2.id
WHERE m1.user_id = 123
    AND m1.memory_type = 'episodic'
    AND m2.memory_type = 'episodic'
    AND m1.is_deleted = FALSE
    AND m2.is_deleted = FALSE
    AND VEC_COSINE_DISTANCE(m1.embedding, m2.embedding) < 0.20
ORDER BY distance ASC
LIMIT 50;

-- Reset to auto-selection
SET SESSION tidb_isolation_read_engines = 'tikv,tidb,tiflash';

These queries run on TiFlash (columnar storage) while your application continues writing and reading memories from TiKV (row storage), with minimal impact on transactional performance. This is TiDB's HTAP capability in action.

Memory Consolidation: Episodic to Semantic

Over time, you accumulate many episodic memories. Consolidation transforms repeated patterns into semantic knowledge.

Automated Consolidation Pipeline

/**
 * Find clusters of similar episodic memories for consolidation
 */
async function findConsolidationCandidates(userId, similarityThreshold = 0.20) {
  const connection = await pool.getConnection();

  try {
    const [clusters] = await connection.query(`
      SELECT 
        m1.id AS id1,
        m2.id AS id2,
        m1.content AS content1,
        m2.content AS content2,
        m1.importance AS importance1,
        m2.importance AS importance2,
        VEC_COSINE_DISTANCE(m1.embedding, m2.embedding) AS distance
      FROM memories m1
      JOIN memories m2 ON m1.user_id = m2.user_id AND m1.id < m2.id
      WHERE m1.user_id = ?
        AND m1.memory_type = 'episodic'
        AND m2.memory_type = 'episodic'
        AND m1.is_deleted = FALSE
        AND m2.is_deleted = FALSE
        AND m1.created_at > DATE_SUB(CURRENT_TIMESTAMP, INTERVAL 90 DAY)
        AND VEC_COSINE_DISTANCE(m1.embedding, m2.embedding) < ?
      ORDER BY distance ASC
      LIMIT 100
    `, [userId, similarityThreshold]);

    // Group into consolidation clusters
    const processed = new Set();
    const consolidationGroups = [];

    for (const pair of clusters) {
      if (processed.has(pair.id1) || processed.has(pair.id2)) continue;

      consolidationGroups.push({
        memberIds: [pair.id1, pair.id2],
        contents: [pair.content1, pair.content2],
        avgImportance: (pair.importance1 + pair.importance2) / 2
      });

      processed.add(pair.id1);
      processed.add(pair.id2);
    }

    return consolidationGroups;

  } finally {
    connection.release();
  }
}

/**
 * Consolidate a group of episodic memories into semantic knowledge
 */
async function consolidateMemoryGroup(userId, group) {
  const connection = await pool.getConnection();

  try {
    await connection.beginTransaction();

    // Use LLM to synthesize semantic fact from episodes
    const synthesis = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        {
          role: 'system',
          content: `You are consolidating multiple related episodic memories into a single semantic fact.

          Given these related memories:
          ${group.contents.map((c, i) => `${i + 1}. ${c}`).join('\n')}

          Synthesize a single, concise semantic fact that captures the core information.
          Be specific but general enough to apply across contexts.

          Return JSON:
          {
            "fact": "...",
            "confidence": 0.9
          }`
        }
      ],
      response_format: { type: 'json_object' }
    });

    const result = JSON.parse(synthesis.choices[0].message.content);

    // Generate embedding for consolidated fact
    const embedding = await generateEmbedding(result.fact);
    const embeddingJson = JSON.stringify(embedding);

    // Create semantic memory
    const [insertResult] = await connection.query(`
      INSERT INTO memories (
        user_id,
        content,
        embedding,
        memory_type,
        importance,
        confidence,
        source_memory_ids,
        created_at,
        last_accessed
      ) VALUES (?, ?, ?, 'semantic', ?, ?, ?, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
    `, [
      userId,
      result.fact,
      embeddingJson,
      group.avgImportance,
      result.confidence,
      JSON.stringify(group.memberIds)
    ]);

    // Mark source episodes as consolidated (but keep for audit trail)
    await connection.query(`
      UPDATE memories
      SET metadata = JSON_SET(
        COALESCE(metadata, '{}'),
        '$.consolidated',
        TRUE,
        '$.consolidated_into',
        ?
      )
      WHERE id IN (?)
    `, [insertResult.insertId, group.memberIds]);

    await connection.commit();

    return {
      semanticMemoryId: insertResult.insertId,
      fact: result.fact,
      sourceCount: group.memberIds.length
    };

  } catch (error) {
    await connection.rollback();
    throw error;
  } finally {
    connection.release();
  }
}

/**
 * Run full consolidation pipeline for a user
 */
async function consolidateUserMemories(userId) {
  console.log(`Starting consolidation for user ${userId}`);

  const candidates = await findConsolidationCandidates(userId);
  console.log(`Found ${candidates.length} consolidation candidates`);

  const results = [];
  for (const group of candidates) {
    const result = await consolidateMemoryGroup(userId, group);
    results.push(result);

    console.log(`Consolidated: ${result.fact}`);
  }

  return results;
}

// Run consolidation (could be scheduled daily/weekly)
await consolidateUserMemories(123);

Best practices for consolidation:

  1. Preserve source memories: Do not delete episodic memories immediately. Mark them as consolidated and keep for 30-90 days for audit trail.

  2. Run periodically: Daily or weekly consolidation prevents memory table bloat.

  3. Threshold tuning: Similarity threshold of 0.20 works for most cases. Lower (0.15) for stricter matching, higher (0.25) for looser matching.

  4. Transaction safety: Use transactions to ensure atomic consolidation (all or nothing).
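The retention rule in point 1 can be captured in a small, testable guard. A sketch, where the `consolidated` flag and `consolidatedAt` timestamp are illustrative fields (not columns from the schema above; in practice you would read them out of the `metadata` JSON):

```javascript
// Decide whether a consolidated episodic memory has aged past its
// audit-trail retention window and may be purged. retentionDays of
// 30-90 matches the guidance above.
function isPurgeable(memory, now = new Date(), retentionDays = 60) {
  if (!memory.consolidated) return false; // never purge unconsolidated episodes
  const ageMs = now - new Date(memory.consolidatedAt);
  return ageMs > retentionDays * 24 * 60 * 60 * 1000;
}

const now = new Date('2026-03-01T00:00:00Z');
console.log(isPurgeable({ consolidated: true, consolidatedAt: '2025-12-01T00:00:00Z' }, now)); // true: ~90 days old
console.log(isPurgeable({ consolidated: true, consolidatedAt: '2026-02-20T00:00:00Z' }, now)); // false: within window
```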

Hot and Cold Memory Tiering with Placement Rules

TiDB's placement rules let you store recent memories on fast storage and archive old memories on cheaper storage.

Setting Up Storage Tiers

First, label your TiKV nodes with performance tiers:

# Label TiKV instances by storage tier in the TiUP topology file.
# Labels go on each instance (not server_configs, which would apply
# the same label cluster-wide). Hosts shown are placeholders.
tikv_servers:
  - host: 10.0.1.10          # NVMe-backed node
    config:
      server.labels: { performance: "high", tier: "hot" }
  - host: 10.0.1.20          # standard SSD node
    config:
      server.labels: { performance: "standard", tier: "cold" }

Then create placement policies:

-- Policy for hot data (recent memories)
CREATE PLACEMENT POLICY hot_memory
  CONSTRAINTS='[+tier=hot]'
  FOLLOWERS=2;

-- Policy for cold data (archived memories)
CREATE PLACEMENT POLICY cold_memory
  CONSTRAINTS='[+tier=cold]'
  FOLLOWERS=1;

-- Apply to partitioned table
CREATE TABLE memories_partitioned (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(1536) COMMENT "hnsw(distance=cosine)",
    memory_type ENUM('episodic', 'semantic', 'working') NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- ... other columns

    INDEX idx_user_created (user_id, created_at),
    VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
)
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
    -- RANGE partitions must be listed in ascending boundary order;
    -- the newest data lands in the MAXVALUE partition, which gets hot placement
    PARTITION p_archive VALUES LESS THAN (UNIX_TIMESTAMP('2026-03-01')) PLACEMENT POLICY cold_memory,
    PARTITION p_last_month VALUES LESS THAN (UNIX_TIMESTAMP('2026-04-01')) PLACEMENT POLICY cold_memory,
    PARTITION p_recent VALUES LESS THAN (MAXVALUE) PLACEMENT POLICY hot_memory
);

This setup:

  • Stores recent memories (last 30 days) on high-performance NVMe nodes
  • Archives older memories on standard SSD nodes
  • Reduces storage costs while maintaining fast access to active memories

You can add new partitions monthly and reorganize as data ages.
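Adding the monthly partition is easy to script. A hedged sketch that only builds the DDL string: it reorganizes the catch-all MAXVALUE partition, whose name is passed in since it depends on your schema, and the table name assumes the `memories_partitioned` table above. Always review generated DDL before executing it:

```javascript
// Build DDL that splits a new monthly partition out of the catch-all
// MAXVALUE partition. Constructs SQL text only; nothing is executed.
function monthlyPartitionDDL(maxvaluePartition, year, month) {
  const name = `p_${year}_${String(month).padStart(2, '0')}`;
  // Boundary is the first day of the FOLLOWING month
  const [ny, nm] = month === 12 ? [year + 1, 1] : [year, month + 1];
  const boundary = `${ny}-${String(nm).padStart(2, '0')}-01`;
  return `ALTER TABLE memories_partitioned REORGANIZE PARTITION ${maxvaluePartition} INTO (` +
         `PARTITION ${name} VALUES LESS THAN (UNIX_TIMESTAMP('${boundary}')), ` +
         `PARTITION ${maxvaluePartition} VALUES LESS THAN (MAXVALUE))`;
}

console.log(monthlyPartitionDDL('p_recent', 2026, 4));
```

A scheduled job can run this once a month, then apply the new partition's placement policy as data ages.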

Scaling Memory Systems with TiDB

When your memory table grows beyond a single node's capacity, TiDB scales horizontally.

How TiDB Handles Growth

TiDB automatically shards data into Regions (96MB by default). As your table grows:

  1. Automatic splitting: When a Region reaches 96MB, TiDB splits it into two Regions
  2. Load balancing: PD redistributes Regions across TiKV nodes for even load
  3. Transparent to application: Your queries work the same whether you have 1GB or 100TB
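The arithmetic behind Region counts is simple. A sketch using the 96 MiB default split size (illustrative only; PD reports the authoritative numbers):

```javascript
// Rough Region count for a table, given the 96 MiB default Region size.
function estimateRegions(tableSizeGiB, regionSizeMiB = 96) {
  return Math.ceil((tableSizeGiB * 1024) / regionSizeMiB);
}

console.log(estimateRegions(100)); // a 100 GiB memory table spans ~1067 Regions
console.log(estimateRegions(1));   // even 1 GiB is already split into 11 Regions
```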

Adding Capacity

TiDB Cloud:
Simply adjust node count in the UI. TiDB automatically rebalances data.

Self-Managed:

# Scale out TiKV for storage
tiup cluster scale-out my-cluster scale-out.yaml

# Scale out TiFlash for analytics
tiup cluster scale-out my-cluster tiflash-scale-out.yaml

New nodes join the cluster and PD redistributes data automatically. No downtime required.

Monitoring Memory Growth

-- Check Region distribution across TiKV nodes
SELECT 
    store_id,
    address,
    COUNT(*) AS region_count,
    SUM(approximate_size) / 1024 AS size_gb  -- approximate_size is reported in MiB
FROM information_schema.tikv_region_status
GROUP BY store_id, address
ORDER BY region_count DESC;

-- Monitor memory table size
SELECT 
    table_schema,
    table_name,
    ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb,
    table_rows
FROM information_schema.tables
WHERE table_schema = 'ai_memory'
    AND table_name = 'memories';

When you see uneven Region distribution or storage approaching 80% capacity, scale out.
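The 80% rule can be automated in a monitoring job. A sketch that flags overloaded stores (the input shape is illustrative; real figures come from the queries above or from PD):

```javascript
// Flag TiKV stores whose utilization crosses the scale-out threshold.
function storesNeedingScaleOut(stores, threshold = 0.8) {
  return stores
    .filter(s => s.usedGiB / s.capacityGiB >= threshold)
    .map(s => s.address);
}

const stores = [
  { address: 'tikv-1:20160', usedGiB: 410, capacityGiB: 500 }, // 82% -> flag
  { address: 'tikv-2:20160', usedGiB: 300, capacityGiB: 500 }, // 60% -> ok
];
console.log(storesNeedingScaleOut(stores)); // [ 'tikv-1:20160' ]
```

Wire the output into an alert or directly into a `tiup cluster scale-out` runbook.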

Complete RAG Application with TiDB

Let me bring it all together with a complete Retrieval-Augmented Generation application.

import mysql from 'mysql2/promise';
import OpenAI from 'openai';

const openai = new OpenAI();
const pool = mysql.createPool({
  host: process.env.TIDB_HOST,
  port: 4000,
  user: process.env.TIDB_USER,
  password: process.env.TIDB_PASSWORD,
  database: 'ai_memory'
});

class RAGSystem {
  constructor(userId) {
    this.userId = userId;
  }

  /**
   * Chat with memory-enhanced AI
   */
  async chat(userMessage, sessionId = null) {
    // 1. Retrieve relevant memories
    const memories = await this.retrieveMemories(userMessage);

    // 2. Build context from memories
    const memoryContext = memories.length > 0
      ? `Here is what I remember about you:\n${memories.map(m => `- ${m.content}`).join('\n')}\n\n`
      : '';

    // 3. Generate response with memory context
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        {
          role: 'system',
          content: `You are a helpful AI assistant with memory.

          ${memoryContext}

          Use the provided memories to personalize your responses.
          Reference past interactions naturally when relevant.`
        },
        { role: 'user', content: userMessage }
      ]
    });

    const aiResponse = response.choices[0].message.content;

    // 4. Extract and store new memories (background)
    this.processMessage(userMessage, sessionId).catch(console.error);

    // 5. Store this interaction as episodic memory
    this.storeInteraction(userMessage, aiResponse, sessionId).catch(console.error);

    return aiResponse;
  }

  async retrieveMemories(queryText) {
    const connection = await pool.getConnection();

    try {
      const queryEmbedding = await this.generateEmbedding(queryText);
      const embeddingJson = JSON.stringify(queryEmbedding);

      const [memories] = await connection.query(`
        SELECT
          id,
          content,
          memory_type,
          (1 - VEC_COSINE_DISTANCE(embedding, ?)) AS relevance,
          (
            (1 - VEC_COSINE_DISTANCE(embedding, ?)) * 0.60 +
            EXP(-TIMESTAMPDIFF(DAY, created_at, CURRENT_TIMESTAMP) / 7.0) * 0.25 +
            importance * 0.15
          ) AS score
        FROM memories
        WHERE user_id = ?
          AND memory_type IN ('semantic', 'episodic')
          AND is_deleted = FALSE
          AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP)
        ORDER BY score DESC
        LIMIT 10
      `, [embeddingJson, embeddingJson, this.userId]);

      if (memories.length > 0) {
        const memoryIds = memories.map(m => m.id);
        await connection.query(`
          UPDATE memories
          SET last_accessed = CURRENT_TIMESTAMP,
              access_count = access_count + 1
          WHERE id IN (?)
        `, [memoryIds]);
      }

      return memories;

    } finally {
      connection.release();
    }
  }

  async processMessage(message, sessionId) {
    const connection = await pool.getConnection();

    try {
      // Extract facts using LLM
      const extraction = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: [
          {
            role: 'system',
            content: `Extract memorable facts from the user message.
            Return JSON: {"facts": [{"fact": "...", "type": "semantic|episodic", "importance": 0.8}]}`
          },
          { role: 'user', content: message }
        ],
        response_format: { type: 'json_object' }
      });

      const result = JSON.parse(extraction.choices[0].message.content);
      const facts = result.facts || [];

      // Store each important fact
      for (const fact of facts.filter(f => f.importance > 0.6)) {
        const embedding = await this.generateEmbedding(fact.fact);
        const embeddingJson = JSON.stringify(embedding);

        // Check for duplicates
        const [existing] = await connection.query(`
          SELECT id FROM memories
          WHERE user_id = ?
            AND memory_type = ?
            AND VEC_COSINE_DISTANCE(embedding, ?) < 0.15
          LIMIT 1
        `, [this.userId, fact.type, embeddingJson]);

        if (existing.length === 0) {
          await connection.query(`
            INSERT INTO memories (
              user_id, session_id, content, embedding,
              memory_type, importance
            ) VALUES (?, ?, ?, ?, ?, ?)
          `, [
            this.userId, sessionId, fact.fact, embeddingJson,
            fact.type, fact.importance
          ]);
        }
      }

    } finally {
      connection.release();
    }
  }

  async storeInteraction(userMessage, aiResponse, sessionId) {
    const connection = await pool.getConnection();

    try {
      const interaction = `User: ${userMessage}\nAssistant: ${aiResponse}`;
      const embedding = await this.generateEmbedding(interaction);

      await connection.query(`
        INSERT INTO memories (
          user_id, session_id, content, embedding,
          memory_type, importance
        ) VALUES (?, ?, ?, ?, 'episodic', 0.5)
      `, [this.userId, sessionId, interaction, JSON.stringify(embedding)]);

    } finally {
      connection.release();
    }
  }

  async generateEmbedding(text) {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: text
    });
    return response.data[0].embedding;
  }
}

// Usage
const rag = new RAGSystem(123);

console.log(await rag.chat("I am working on a healthcare AI project"));
// Later...
console.log(await rag.chat("What projects am I working on?"));
// AI remembers the healthcare project

This complete example shows:

  • Memory-enhanced chat with context retrieval
  • Automatic memory extraction and storage
  • Interaction logging for episodic memory
  • Proper connection management
  • Background processing for performance

Performance Optimization

Let me share TiDB-specific optimizations for AI memory systems.

Query Optimization

-- Use EXPLAIN to check if vector index is used
EXPLAIN SELECT content
FROM memories
WHERE user_id = 123
ORDER BY VEC_COSINE_DISTANCE(embedding, '[0.1, 0.2, ...]')
LIMIT 10;

-- The plan should show the vector (ANN) index being used, not a full table scan
-- If not, check that:
-- 1. Vector index exists
-- 2. Query uses same distance metric as index
-- 3. TiDB version is 8.4.0+
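The `'[0.1, 0.2, ...]'` placeholder stands in for a full embedding literal. TiDB's vector functions accept a JSON-style array string, which `JSON.stringify` on a JavaScript number array produces directly — a small sketch (the `toVectorLiteral` helper name is mine):

```javascript
// Builds the string literal VEC_COSINE_DISTANCE expects as its second
// argument, e.g. '[0.1,0.2,0.3]'. JSON.stringify already emits this form
// for plain number arrays; the guard catches NaN/Infinity, which would
// otherwise serialize as null and break the query.
function toVectorLiteral(embedding) {
  if (!Array.isArray(embedding) || !embedding.every(Number.isFinite)) {
    throw new TypeError('embedding must be an array of finite numbers');
  }
  return JSON.stringify(embedding);
}

console.log(toVectorLiteral([0.1, 0.2, 0.3])); // [0.1,0.2,0.3]
```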

Index Tuning for Vector Search

-- Drop and recreate the vector index
ALTER TABLE memories DROP INDEX idx_embedding;

-- TiDB's vector index DDL takes the distance function as an expression;
-- the function you index (cosine here) must match the one your queries use
ALTER TABLE memories ADD VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)));

-- TiDB builds vector indexes with HNSW under the hood. Build parameters
-- such as m and ef_construction are not exposed through index DDL in
-- current releases, so check the docs for your version before planning
-- around them.

Batch Operations

// Bad: Individual inserts
for (const memory of memories) {
  await db.query('INSERT INTO memories VALUES (?, ...)', [memory]);
}

// Good: Batch insert (serialize each embedding to a vector literal first)
const values = memories.map(m => [m.userId, m.content, JSON.stringify(m.embedding)]);
await db.query('INSERT INTO memories (user_id, content, embedding) VALUES ?', [values]);
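One caveat with bulk `VALUES ?`: mysql2 expands it into a single statement, and a few thousand 1536-dimension embeddings can exceed the server's packet and transaction size limits. A sketch of splitting the batch into fixed-size slices (`chunk` is a hypothetical helper, not a mysql2 API):

```javascript
// Splits a large row batch into fixed-size slices so each bulk INSERT
// stays well under max_allowed_packet and transaction-size limits.
function chunk(rows, size) {
  const out = [];
  for (let i = 0; i < rows.length; i += size) {
    out.push(rows.slice(i, i + size));
  }
  return out;
}

// Sketch: insert rows 500 at a time
// for (const batch of chunk(values, 500)) {
//   await db.query(
//     'INSERT INTO memories (user_id, content, embedding) VALUES ?',
//     [batch]
//   );
// }
console.log(chunk([1, 2, 3, 4, 5], 2).length); // 3
```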

Connection Pooling

Always use connection pools in production:

const pool = mysql.createPool({
  host: process.env.TIDB_HOST,
  port: 4000,
  user: process.env.TIDB_USER,
  password: process.env.TIDB_PASSWORD,
  database: 'ai_memory',
  connectionLimit: 20, // Adjust based on load
  queueLimit: 0,
  waitForConnections: true
});

When TiDB Makes Sense (and When It Does Not)

Let me be honest about when you should and should not use TiDB for AI memory.

Use TiDB When:

Your data exceeds single-node capacity: Memory tables approaching 100GB+ benefit from horizontal scaling.

You need HTAP: Running analytics on memory usage patterns alongside transactional queries is a killer feature.

You want consolidation: Tired of managing Pinecone + Postgres + Snowflake + sync logic. TiDB combines them.

You are already MySQL-based: Zero learning curve. Your team's MySQL knowledge transfers directly.

You need strong consistency: TiDB provides ACID transactions across distributed nodes.

Consider Alternatives When:

Small dataset (under 10GB): Postgres with pgvector is simpler and cheaper.

Pure vector search only: If you have no relational data, specialized vector databases like Pinecone might be faster.

Embedded applications: SQLite with sqlite-vec works better for edge/mobile deployments.

Budget constraints for small scale: TiDB Cloud has a generous free tier, but for tiny projects, managed Postgres is cheaper.

You do not need analytics: If you never run analytical queries on your memory data, TiDB's HTAP capability is wasted.

Next Steps

You now have complete knowledge of building AI memory systems with TiDB. Here is how to move forward:

1. Try TiDB Cloud: Sign up for a free TiDB Cloud Serverless cluster at tidbcloud.com. Deploy the schema from this article and test with your data.

2. Read the foundation: If you skipped it, read AI Memory Systems: Everything You Need to Know to understand memory types and patterns deeply.

3. Explore TiDB docs: The official documentation at docs.pingcap.com covers advanced topics like multi-region deployment and security.

4. Join the community: TiDB's Slack community is active and helpful for architecture questions.

5. Start small, scale later: Begin with a single TiDB node. Add TiFlash when you need analytics. Scale out when data grows. TiDB's architecture supports this evolution path.

The future of AI applications is not managing multiple specialized databases. It is using one scalable database that does multiple jobs well. TiDB provides that foundation.

Now go build something that remembers.

If you find this article helpful, share it with others and give me a follow on X https://x.com/codewithveek.
