TL;DR: I took FLXBL's new FILE fields, VECTOR fields, and vector search for a spin by building an AI-powered personal knowledge base called BrainLinks. Upload a PDF, and AI extracts concepts, links them to your existing knowledge, and lets you search semantically across everything you've ever saved. The interesting part? Combining vector similarity with graph traversal—a query pattern that's native in a graph database but requires duct tape and prayer in a relational one. The project is open source, so you can try it yourself.
GitHub: github.com/flxbl-dev/brainlinks
The Idea: Obsidian Meets Semantic Search
I've been using Obsidian for years. The graph view—where you see your notes connected through backlinks—is one of those features that makes you think differently about your knowledge. You notice clusters. You spot gaps. You discover connections you didn't know existed.
But Obsidian's graph is manual. You create the links. You decide which concepts connect. And the search? It's keyword-based. If you wrote about "consensus algorithms" in one note and "distributed agreement protocols" in another, Obsidian has no idea they're about the same thing.
So I built BrainLinks: a knowledge base where AI handles the linking and the search understands meaning, not just keywords. Upload a document, and the system extracts concepts, generates embeddings, discovers relationships to your existing knowledge, and builds a graph you can explore visually.
Think Obsidian's graph view meets Notion meets semantic search—except the graph builds itself.
Why This Project Exists (The FLXBL Angle)
If you've read my previous posts, you know I built a Kevin Bacon path-finder and a graph-native CMS on FLXBL. Both showed that graph databases excel when relationships matter. But I kept hearing the same question:
"Cool, but can it do AI stuff?"
Fair question. I'd just shipped two new features in FLXBL—FILE fields (store and serve files directly) and VECTOR fields (store embeddings and run similarity search)—and I needed a project that exercised everything: files, vectors, graph traversal, relationship properties, and the TypeScript SDK. A knowledge base was the perfect testbed.
The hypothesis was simple: knowledge is inherently graph-shaped, and combining vector search with graph traversal should be strictly more powerful than vector search alone. BrainLinks is the proof (or the counter-proof—I'm an engineer, not a salesman).
The Schema: 4 Entities, 5 Relationships, Zero Junction Tables
| Entity | Purpose | Special Fields |
|---|---|---|
| Document | Uploaded files (PDFs, markdown, plain text) | embedding (VECTOR, 4096 dims) |
| Concept | AI-extracted ideas and topics | embedding (VECTOR, 4096 dims) |
| Tag | Organizational labels (AI-suggested + manual) | color |
| Collection | User-created groupings | — |
| Relationship | Direction | Edge Properties |
|---|---|---|
| MENTIONS | Document → Concept | relevanceScore (0–1) |
| RELATED_TO | Concept → Concept | strength (0–1) |
| TAGGED_WITH | Document → Tag | — |
| CHILD_OF | Tag → Tag | — |
| IN_COLLECTION | Document → Collection | — |
That's it. Four entities. Five relationships. In a relational database, you'd have at least 9 tables (4 entity tables + 5 junction tables), and two of those junction tables would need extra columns for relevanceScore and strength. In FLXBL, those properties just live on the edges.
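To make the schema concrete, here's a hand-written sketch of the entity and edge shapes in TypeScript. These are illustrative types for this post, not the output of FLXBL's codegen:

```typescript
// Illustrative shapes for the entities and the two property-carrying edges.
// Hand-sketched for this post; not what `npx flxbl generate` actually emits.
interface DocumentNode {
  id: string;
  title: string;
  content: string;
  embedding: number[]; // VECTOR, 4096 dims
}

interface ConceptNode {
  id: string;
  name: string;
  description: string;
  embedding: number[]; // VECTOR, 4096 dims
}

// Edge properties live on the relationship itself: no junction table.
interface MentionsEdge {
  source: string;         // Document id
  target: string;         // Concept id
  relevanceScore: number; // 0–1
}

interface RelatedToEdge {
  source: string;   // Concept id
  target: string;   // Concept id
  strength: number; // 0–1
}

// A sample edge: the score rides on the edge, not in a separate table.
const sampleEdge: MentionsEdge = {
  source: 'doc_123',
  target: 'concept_456',
  relevanceScore: 0.87,
};
```

The `doc_123`/`concept_456` ids are made up for illustration; the point is simply that `relevanceScore` is part of the edge, not a column in a third table.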
But the real power isn't the schema simplicity—it's what you can do with it.
The Upload Pipeline: From PDF to Knowledge Graph
When you upload a document to BrainLinks, a lot happens behind the scenes. Here's the full pipeline:
📄 PDF Upload
↓
📝 Text Extraction (pdf-parse)
↓
🧮 Embedding Generation (Ollama qwen3-embedding:8b)
↓
📦 Document Created in FLXBL (with VECTOR field)
↓
🧠 Concept Extraction (Ollama qwen3:8b, JSON mode)
↓
🔍 For each concept:
├── Generate concept embedding
├── Vector search existing concepts (threshold ≥ 0.92)
├── Reuse match OR create new Concept
└── Create MENTIONS edge with relevanceScore
↓
🔗 Link related concepts (RELATED_TO edges, strength > 0.7)
↓
🏷️ AI suggests tags → create Tags + TAGGED_WITH edges
The interesting design decision is the per-concept step: deduplication via vector similarity. If you upload a paper about "machine learning" and you already have a concept called "ML algorithms," the system recognizes these are the same idea (similarity ≥ 0.92) and reuses the existing concept node. This means your knowledge graph converges—upload 50 documents about different aspects of ML, and they all cluster around the same concept nodes.
Here's the concept matching logic:
for (const extracted of concepts) {
  // Generate embedding for concept name
  const conceptEmbedding = await generateEmbedding(extracted.name);

  // Search existing concepts by vector similarity
  const matches = await flxbl.vectorSearch<Concept>('Concept', {
    field: 'embedding',
    vector: conceptEmbedding,
    topK: 3,
    select: ['name', 'description'],
  });

  let conceptId: string;
  if (matches.length > 0 && matches[0].score >= 0.92) {
    // Reuse existing concept — the graph converges
    conceptId = matches[0].id;
  } else {
    // New concept — the graph grows
    const newConcept = await flxbl.concepts.create({
      name: extracted.name,
      description: extracted.description,
      embedding: conceptEmbedding,
    });
    conceptId = newConcept.id;
  }

  // Create MENTIONS edge with relevance score
  await flxbl.relationships('Document', docId)
    .create('MENTIONS', conceptId, {
      relevanceScore: extracted.relevanceScore,
    });
}
Notice how relevanceScore rides on the edge, not in a separate table. When I later ask "which concepts does this document mention, and how relevant are they?" — that's a single traversal, not a JOIN.
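To see why, here's a toy in-memory version of that exact question (plain TypeScript, no FLXBL involved): the relevance score travels with the edge, so one pass over a document's edges answers it:

```typescript
// Toy model: edges carry their own properties, so "which concepts does this
// document mention, and how relevant are they?" is a single pass over the
// edge list. No join table, no foreign keys.
type Mentions = { docId: string; conceptId: string; relevanceScore: number };

const mentions: Mentions[] = [
  { docId: 'd1', conceptId: 'raft', relevanceScore: 0.9 },
  { docId: 'd1', conceptId: 'paxos', relevanceScore: 0.6 },
  { docId: 'd2', conceptId: 'raft', relevanceScore: 0.4 },
];

function conceptsOf(docId: string): Array<[string, number]> {
  return mentions
    .filter(e => e.docId === docId)
    .map(e => [e.conceptId, e.relevanceScore]);
}

// conceptsOf('d1') → [['raft', 0.9], ['paxos', 0.6]]
```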
After concept linking, the system also auto-discovers relationships between concepts:
async function linkRelatedConcepts(
  flxbl: FlxblClient,
  conceptId: string,
  embedding: number[]
): Promise<void> {
  const similar = await flxbl.vectorSearch<Concept>('Concept', {
    field: 'embedding',
    vector: embedding,
    topK: 10,
  });

  for (const match of similar) {
    if (match.id === conceptId) continue; // skip self
    if (match.score < 0.7) continue;      // too dissimilar

    await flxbl.relationships('Concept', conceptId)
      .create('RELATED_TO', match.id, {
        strength: match.score,
      });
  }
}
This is where the graph starts doing things a vector database alone can't. After uploading 20 documents, you don't just have 20 embeddings in a flat list—you have a connected knowledge graph where concepts link to each other with measured strength, and documents connect through those concepts.
The Search: Where Graph + Vector Gets Interesting
BrainLinks supports three search modes, and the differences tell a story about why architecture matters.
Mode 1: Semantic Search (What Pinecone Does)
Embed the query, find the nearest document vectors. Standard RAG stuff:
async function semanticSearch(
  flxbl: FlxblClient,
  queryEmbedding: number[],
  limit: number
): Promise<SearchResult[]> {
  return flxbl.vectorSearch<Document>('Document', {
    field: 'embedding',
    vector: queryEmbedding,
    topK: limit,
    select: ['title', 'content', 'type', 'uploadedAt'],
  });
}
This works fine for direct matches. Search "machine learning," find documents about machine learning. Nothing surprising.
Mode 2: Concept Search (What Only a Graph Can Do)
Here's where it gets interesting. Instead of searching documents directly, we search concepts and then traverse the graph to find documents:
async function conceptSearch(
  flxbl: FlxblClient,
  queryEmbedding: number[],
  limit: number
): Promise<SearchResult[]> {
  // Find concepts similar to the query
  const concepts = await flxbl.vectorSearch<Concept>('Concept', {
    field: 'embedding',
    vector: queryEmbedding,
    topK: 20,
  });

  // For each concept, traverse inbound MENTIONS to find documents
  const docScores = new Map<string, SearchResult>();
  for (const concept of concepts) {
    const rels = await flxbl.relationships('Concept', concept.id)
      .list('MENTIONS', 'in'); // Documents that mention this concept

    for (const rel of rels) {
      const existing = docScores.get(rel.source.id);
      if (existing) {
        // Document matches MULTIPLE concepts — boost its score
        existing.score += concept.score * rel.properties.relevanceScore;
        existing.matchedConcepts.push(concept.name);
      } else {
        docScores.set(rel.source.id, {
          ...rel.source, // spread the Document so id, title, etc. come along
          score: concept.score * rel.properties.relevanceScore,
          matchedConcepts: [concept.name],
        });
      }
    }
  }

  // Sort by accumulated score
  return [...docScores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
Why does this matter? Because concept search finds documents that don't directly match your query vector but are conceptually related.
If you search for "consensus algorithms for distributed databases," semantic search will find documents where those exact concepts appear in the text. Concept search will also find your document about Raft, your notes on Paxos, and that paper about Byzantine fault tolerance—because the AI extracted "distributed consensus" as a concept from all of them, and those concept nodes are connected in the graph.
This is the query that turns into a wall of self-joins in SQL: "find me all documents that share concepts with documents about X." In a graph, it's just... traversal.
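Here's that traversal as a toy in-memory sketch (plain TypeScript, not the FLXBL API). Two hops: seed document → its concepts → other documents mentioning those concepts:

```typescript
// Toy two-hop traversal: given MENTIONS edges, find the documents that
// share at least one concept with a seed document, counting shared concepts.
type Edge = { docId: string; conceptId: string };

function docsSharingConcepts(
  edges: Edge[],
  seedDocId: string
): Map<string, number> {
  // Hop 1: the seed document's concepts
  const seedConcepts = new Set(
    edges.filter(e => e.docId === seedDocId).map(e => e.conceptId)
  );

  // Hop 2: other documents touching any of those concepts
  const shared = new Map<string, number>(); // docId -> # shared concepts
  for (const e of edges) {
    if (e.docId === seedDocId) continue;
    if (seedConcepts.has(e.conceptId)) {
      shared.set(e.docId, (shared.get(e.docId) ?? 0) + 1);
    }
  }
  return shared;
}
```

In SQL this is a self-join through the junction table; here (and in a graph database) it's two hops along edges.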
Mode 3: Combined Search (The Killer Feature)
Run both searches in parallel, merge the results, and boost documents that appear in both:
async function combinedSearch(
  flxbl: FlxblClient,
  queryEmbedding: number[],
  limit: number
): Promise<SearchResult[]> {
  const [semantic, concept] = await Promise.all([
    semanticSearch(flxbl, queryEmbedding, limit),
    conceptSearch(flxbl, queryEmbedding, limit),
  ]);
  return mergeResults(semantic, concept, limit);
}

function mergeResults(
  semantic: SearchResult[],
  concept: SearchResult[],
  limit: number
): SearchResult[] {
  const merged = new Map<string, SearchResult>();
  for (const r of semantic) merged.set(r.id, { ...r, source: 'semantic' });

  for (const r of concept) {
    const existing = merged.get(r.id);
    if (existing) {
      // Appears in BOTH — boost score
      existing.score = (existing.score + r.score) / 2 + 0.05;
      existing.matchedConcepts = r.matchedConcepts;
    } else {
      merged.set(r.id, { ...r, source: 'concept' });
    }
  }

  return [...merged.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
And then there's the related documents feature. After finding your search results, BrainLinks extracts all concepts mentioned by those documents and finds other documents that share those concepts:
async function computeRelatedDocs(
  flxbl: FlxblClient,
  resultIds: string[],
  resultConcepts: string[]
): Promise<RelatedDocument[]> {
  const relatedDocs = new Map<string, RelatedDocument>();

  for (const conceptId of resultConcepts) {
    // Find ALL documents mentioning this concept
    const rels = await flxbl.relationships('Concept', conceptId)
      .list('MENTIONS', 'in');

    for (const rel of rels) {
      if (resultIds.includes(rel.source.id)) continue; // skip already in results

      const existing = relatedDocs.get(rel.source.id);
      if (existing) {
        existing.sharedConceptCount++;
      } else {
        relatedDocs.set(rel.source.id, {
          ...rel.source,
          sharedConceptCount: 1,
        });
      }
    }
  }

  return [...relatedDocs.values()]
    .sort((a, b) => b.sharedConceptCount - a.sharedConceptCount)
    .slice(0, 5);
}
"Show me documents related to my search results through shared concepts" is a multi-hop graph traversal. In SQL with pgvector, you'd need a recursive CTE joining a vector similarity subquery with a junction table. It's possible—but it's the kind of query that makes your ORM file for divorce.
The Honest Comparison: FLXBL vs. The Alternatives
I'm not going to pretend FLXBL is better at everything. Let's be real about the tradeoffs.
Supabase + pgvector
Supabase is excellent. pgvector is genuinely impressive for what it is—vector similarity search bolted onto Postgres. For a straightforward RAG pipeline (embed documents, search by similarity, return results), it's perfectly fine and probably the pragmatic choice for most teams.
Where it falls short for this use case:
| Capability | Supabase + pgvector | FLXBL |
|---|---|---|
| Vector similarity search | ✅ Native (pgvector) | ✅ Native (VECTOR fields) |
| Store embeddings | ✅ vector column type | ✅ VECTOR field type |
| Relationship properties | ⚠️ Junction tables with extra columns | ✅ Native edge properties |
| Multi-hop traversal | ⚠️ Recursive CTEs | ✅ Native graph traversal |
| "Documents sharing concepts" | 🔴 Complex self-joins | ✅ Single traversal |
| Concept deduplication | ⚠️ Manual via SQL + pgvector | ✅ Vector search on Concept entity |
| Combined vector + graph query | 🔴 Two queries + app-level merge with JOINs | ✅ Vector search + traversal in one flow |
| Schema migrations | ⚠️ Migration files | ✅ Schema publish, instant API |
| Raw SQL access | ✅ Full access | 🔴 No direct Cypher |
The BrainLinks concept-search pattern — "embed query → find similar concepts → traverse MENTIONS edges → find documents → boost by shared concept count" — is a handful of FLXBL SDK calls. In Supabase, you'd write something like:
WITH similar_concepts AS (
  SELECT id, name, 1 - (embedding <=> $query_embedding) AS similarity
  FROM concepts
  ORDER BY embedding <=> $query_embedding
  LIMIT 20
),
mentioned_docs AS (
  SELECT
    dm.document_id,
    sc.name AS concept_name,
    sc.similarity * dm.relevance_score AS weighted_score
  FROM similar_concepts sc
  JOIN document_mentions dm ON dm.concept_id = sc.id
),
ranked AS (
  SELECT
    document_id,
    SUM(weighted_score) AS total_score,
    ARRAY_AGG(concept_name) AS matched_concepts,
    COUNT(*) AS concept_count
  FROM mentioned_docs
  GROUP BY document_id
)
SELECT d.*, r.total_score, r.matched_concepts
FROM ranked r
JOIN documents d ON d.id = r.document_id
ORDER BY r.total_score DESC
LIMIT 10;
That's a perfectly valid query. It works. But it's a lot more to reason about than vectorSearch + relationships().list(). And the moment you want to go one more hop ("find documents related to documents that share concepts with my query")—you're adding another CTE layer and your DBA is asking questions.
Pinecone / Weaviate / Qdrant (Standalone Vector DBs)
Dedicated vector databases are fantastic at one thing: similarity search at scale. If you're building a RAG pipeline with millions of document chunks and need sub-millisecond recall, Pinecone is probably where you should be.
But here's the thing: a vector database gives you a flat list of results. There's no concept of "these two documents are related through shared concepts" because there are no concepts—there are just vectors and metadata filters.
BrainLinks' entire value proposition—the concept graph, the multi-hop discovery, the "related documents" feature—doesn't exist in a pure vector database. You'd need to bolt on a separate database (Postgres? Neo4j?) to store the relationships, and now you're managing two data systems instead of one.
FLXBL gives you vectors and graph in the same system, queried through the same SDK, governed by the same schema.
LangChain / LlamaIndex + Postgres
These frameworks are great for prototyping RAG pipelines, and they have "knowledge graph" modules. But in my experience, their graph implementations tend to be thin wrappers around triple stores or in-memory NetworkX graphs. They're not backed by a production graph database, so you hit walls when you need real graph queries, persistence, or scale.
BrainLinks uses FLXBL as the graph and vector store, with the AI models (Ollama/qwen3) handling extraction and embedding. The framework isn't managing the graph—the database is. That's a meaningful architectural difference.
The Knowledge Graph Visualization
The graph visualization uses react-force-graph to render an interactive force-directed layout:
// Node styling by entity type
const NODE_COLORS = {
  document: '#3B82F6',   // Blue
  concept: '#10B981',    // Emerald
  tag: '#F59E0B',        // Amber
  collection: '#8B5CF6', // Violet
};

// Edge properties drive visual weight
function getEdgeWeight(edge: GraphEdge): number {
  if (edge.label === 'MENTIONS') return edge.weight;   // relevanceScore
  if (edge.label === 'RELATED_TO') return edge.weight; // strength
  return 0.5; // default for TAGGED_WITH, etc.
}
The API endpoint supports both full-graph queries (up to 100 nodes per entity type) and local subgraph exploration centered on a specific entity. Click a concept node, and you see all the documents that mention it and all the concepts it relates to—a local neighborhood in the knowledge graph.
The RELATED_TO edges render as dashed lines (to distinguish structural relationships from semantic ones), and edge thickness scales with the strength property. You can immediately see which concept connections are strong vs. tenuous.
Proving It: 41 E2E Tests Back the Architecture
I don't just claim graph + vector is better — I wrote 41 Playwright E2E tests that prove it. Not unit tests mocking a database. End-to-end tests that upload real documents, run the full AI pipeline, and assert structural properties of the resulting knowledge graph.
The test suite splits into two halves. The 31 functional tests cover what you'd expect: navigation, upload flow, search modes, graph rendering, document management, collections. Standard stuff. The interesting part is the 10 data-science tests that verify the architectural claims this entire blog post is built on.
Here's what they prove:
| Claim | Test |
|---|---|
| Concept deduplication converges the graph | Upload 6 diverse docs → MENTIONS edges exceed unique concept count |
| Concepts bridge documents naturally | At least one concept connects 2+ distinct documents |
| Edge properties are native | Every MENTIONS edge carries relevanceScore as weight (0–1), no join table |
| Concept search uses graph traversal | Results include matchedConcepts array — proof of multi-hop query |
| Combined search synthesizes both modes | Combined results contain docs from both semantic and concept result sets |
| Subgraph queries work | ?around={docId}&depth=2 returns a focused neighborhood, not the whole graph |
The 6 test documents are intentionally diverse — neural networks, philosophy of mind, renewable energy, climate science, internet history, and graph databases — with just enough conceptual overlap to stress-test deduplication. When "neural networks" appears in a paper about AI and in a paper about brain science, the system should recognize them as the same concept. The tests verify it does.
The most telling test is the concept bridge assertion. It fetches the full graph, builds a map of which documents mention which concepts, and asserts that at least one concept is shared across multiple documents:
test('concept nodes act as cross-document bridges', async ({ request }) => {
  const graph = await request.get('/api/graph').then(r => r.json());

  // Map: conceptId -> Set of document IDs that mention it
  const conceptToDocuments = new Map<string, Set<string>>();
  for (const edge of graph.edges) {
    if (edge.label === 'MENTIONS') {
      if (!conceptToDocuments.has(edge.target))
        conceptToDocuments.set(edge.target, new Set());
      conceptToDocuments.get(edge.target)!.add(edge.source);
    }
  }

  // At least one concept must bridge 2+ documents
  const bridges = [...conceptToDocuments.values()].filter(docs => docs.size >= 2);
  expect(bridges.length).toBeGreaterThan(0);
});
If that test passes, the graph isn't just a flat collection of disconnected document-concept pairs. It's a connected knowledge structure where concepts act as bridges — exactly the property that makes graph traversal useful for search. And it does pass. Every time.
Running It Locally (With Ollama)
One deliberate choice: BrainLinks uses Ollama with local models instead of the OpenAI API. The embedding model is qwen3-embedding:8b (4096 dimensions) and concept extraction uses qwen3:8b with JSON output mode.
Why? Two reasons. First, running locally means zero API costs and no data leaving your machine—important for a personal knowledge base. Second, it demonstrates that FLXBL's VECTOR fields are model-agnostic. You're not locked into OpenAI's 1536-dimension embeddings or anyone else's. Swap in whatever model fits your use case, set the dimensions in your schema, and FLXBL handles the rest.
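One practical gotcha when swapping models: the embedding length has to match the dimension declared in the schema. A tiny guard like this (my own convention, not an SDK feature) fails fast on mismatches:

```typescript
// Guard: the embedding's length must match the VECTOR field's declared
// dimension (4096 for qwen3-embedding:8b in the BrainLinks schema).
// Catching a mismatch here beats a confusing API error later.
const EMBEDDING_DIMS = 4096;

function assertDims(embedding: number[], expected = EMBEDDING_DIMS): number[] {
  if (embedding.length !== expected) {
    throw new Error(
      `embedding has ${embedding.length} dims, schema expects ${expected}`
    );
  }
  return embedding;
}
```

If you switch to a model with different dimensions, change the schema's declared size and this constant together.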
To get started:
# Install Ollama and pull models
ollama pull qwen3-embedding:8b
ollama pull qwen3:8b
# Clone and configure
git clone https://github.com/flxbl-dev/brainlinks
cd brainlinks
cp .env.example .env.local
# Add your FLXBL API key and Ollama config
# Install and run
npm install
npx flxbl generate # Generate typed client from schema
npm run dev
Upload a PDF, watch the pipeline extract concepts and build relationships, then search semantically across your knowledge base. The graph visualization shows your knowledge taking shape in real time.
The Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Framework | Next.js 16 (App Router) | Server Components, API routes, React 19 |
| Styling | Tailwind CSS 4 | Utility-first, dark mode from day one |
| Graph Visualization | react-force-graph | Interactive, performant, customizable |
| File Upload | react-dropzone | Drag-and-drop with validation |
| PDF Extraction | pdf-parse v2 | Reliable text extraction |
| Embeddings | Ollama (qwen3-embedding:8b) | Local, free, model-agnostic |
| Concept Extraction | Ollama (qwen3:8b) | JSON mode for structured output |
| Backend | FLXBL | Graph-native BaaS with FILE + VECTOR fields |
| Type Generation | @flxbl-dev/cli | Typed client from schema, zero manual types |
What I Learned
Building BrainLinks confirmed a few things I suspected and taught me a few things I didn't expect:
Confirmed: Graph traversal + vector search is more powerful than vector search alone. The concept-mediated discovery genuinely surfaces documents that pure similarity search misses. It's not a gimmick—it's a qualitative improvement in search quality.
Confirmed: Relationship properties eliminate a class of data modeling problems. relevanceScore on MENTIONS edges and strength on RELATED_TO edges are natural, queryable, and don't require extra tables.
Learned: Concept deduplication is the design challenge. The 0.92 similarity threshold for reusing existing concepts is a magic number I tuned by trial and error. Too low and you merge unrelated concepts. Too high and you get duplicates. There's probably a smarter approach involving clustering, but the simple threshold works surprisingly well.
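For the curious, the number being thresholded is a similarity score in [0, 1]. Assuming cosine similarity (the usual metric for text embeddings; I'm not showing FLXBL's internals here), a minimal reference implementation makes the 0.92 cutoff concrete:

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
// FLXBL computes similarity server-side; this local version just
// illustrates what the 0.92 deduplication threshold is comparing.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const DEDUP_THRESHOLD = 0.92;

function shouldReuse(existing: number[], candidate: number[]): boolean {
  return cosineSimilarity(existing, candidate) >= DEDUP_THRESHOLD;
}
```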
Learned: Eventual consistency matters. After creating a document and its relationships, I wait 500ms before querying because Neo4j (under the hood) needs a moment to index the new edges. Not a dealbreaker, but something to account for in your pipeline design.
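If a fixed sleep ever turns flaky, polling with exponential backoff is the sturdier pattern. This is a generic sketch, not part of the FLXBL SDK:

```typescript
// Poll until new edges are visible, with exponential backoff, instead of a
// fixed 500ms sleep. `sleep` is injected so the backoff schedule can be
// tested without real delays. Generic helper sketched for this post.
async function waitForIndex(
  check: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5,
  baseDelayMs = 100
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await check()) return attempt; // visible: report which attempt won
    await sleep(baseDelayMs * 2 ** (attempt - 1)); // 100ms, 200ms, 400ms, ...
  }
  throw new Error('edges never became visible');
}
```

In the pipeline, `check` would be a cheap query for one of the edges you just created.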
Learned: The FLXBL TypeScript SDK + codegen is genuinely pleasant. Defining the schema, running npx flxbl generate, and getting typed clients with autocomplete for entities and relationships saved me from an entire category of bugs. When I fat-fingered a relationship name, the compiler caught it.
Try It Yourself
- GitHub: github.com/flxbl-dev/brainlinks
- FLXBL Platform: platform.flxbl.dev
- FLXBL Docs: flxbl.dev/docs
The repo includes the complete upload pipeline, three search modes, graph visualization, collection management, and a dashboard—all built on 4 entities and 5 relationships.
If you're building anything where knowledge, content, or entities are connected by meaningful relationships—and especially if you want to combine semantic search with structural discovery—give FLXBL a look. The graph-native approach isn't just a different way to store data. It's a different way to think about data. And once you start thinking in graphs, junction tables start looking like what they are: a workaround.
What's the most relationship-heavy AI project you've built? Did you end up fighting your database to make it work? I'd love to hear about it in the comments.
Marko Mijailović is the creator of FLXBL. You can find him on LinkedIn, reach out through email, or join the FLXBL Discord.
