DEV Community

Rizwan Saleem

Building a Lightweight Developer Knowledge Graph to Accelerate Onboarding and Troubleshooting

Onboarding new engineers and speeding up debugging sessions often hinges on access to the right knowledge, not just code. A lightweight developer knowledge graph (KG) is a practical, low-friction approach to organize tribal knowledge, project context, and common decision rationales so teammates spend less time digging and more time delivering. This tutorial shows how to design, implement, and operate a small but powerful KG that fits inside your existing tooling stack.

### Why a knowledge graph for developers

  • Centralizes tacit knowledge (why decisions were made, trade-offs, edge cases).
  • Accelerates onboarding; new hires can infer context faster.
  • Improves incident response by linking symptoms to probable causes and remedies.
  • Encourages consistent terminology and architecture understanding across teams.
  • Scales incrementally: start small (Markdown notes) and grow into a queryable graph.

Key ideas:

  • Represent entities (components, services, commands, people, decisions) as nodes.
  • Capture relationships (depends_on, owned_by, uses, observed_issue, decision_for) as edges.
  • Use simple, accessible storage so adoption is friction-free.

### Core design

  • Entities

    • Component: a service, library, or module.
    • Person: team member or role.
    • Decision: rationale for a choice (e.g., tech stack, architecture).
    • Incident: a real-world failure or outage.
    • Documentation: guides, runbooks, FAQs.
    • Tool: CI/CD tool, monitoring system, local dev utility.
  • Relationships

    • Component depends_on Component
    • Component owned_by Person
    • Decision maps_to Component
    • Incident affects Component
    • Incident linked_to Team
    • Documentation describes Component or Incident
    • Tool used_by Component or Person
  • Provenance and quality

    • Source: where the information came from (PR, meeting notes, doc link)
    • Confidence: a quick rating (0-1) to signal how vetted the entry is
    • Updated_at: timestamp for freshness
  • Storage strategy

    • Start with a local, file-based graph (YAML/JSON) or a lightweight graph database (SQLite with a graph extension, or Nebula, or a simple in-memory graph exported as JSON).
    • Optional: publish a read-only copy to a wiki or docs site for broader visibility.

### Getting started: a minimal schema

A simple JSON-based schema that covers the basics:

  • Node types: Component, Person, Decision, Incident, Documentation, Tool

  • Edges (relationships):

    • "depends_on": Component -> Component
    • "owned_by": Component -> Person
    • "decision_for": Decision -> Component
    • "mitigated_by": Incident -> Documentation
    • "observed_in": Incident -> Documentation
    • "uses": Component -> Tool
    • "describes": Documentation -> Component
    • "related_to": Incident -> Component (optional broader link)
    • Attributes stored on Documentation, Decision, and Incident nodes (not edges):
      • "source": string (URL or note)
      • "confidence": number (0..1)
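Since `source`, `confidence`, and `updated_at` are what make an entry trustworthy, it can help to flag entries that are weakly vetted or stale. A minimal sketch, assuming each node carries optional `source`, `confidence` (0..1), and `updated_at` (ISO 8601 string) fields; the `needsReview` helper, the sample IDs, and the 0.5 / 180-day thresholds are illustrative assumptions, not part of the schema.

```typescript
// Provenance fields from the schema; all optional so partially filled entries still load.
interface Provenance {
  id: string;
  source?: string;      // URL or note (PR, meeting notes, doc link)
  confidence?: number;  // 0..1, how vetted the entry is
  updated_at?: string;  // ISO 8601 timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns the reasons an entry should be re-reviewed (empty = fine).
function needsReview(entry: Provenance, now: Date, minConfidence = 0.5, maxAgeDays = 180): string[] {
  const reasons: string[] = [];
  if (!entry.source) reasons.push("no source");
  if (entry.confidence === undefined || entry.confidence < minConfidence) {
    reasons.push("low or missing confidence");
  }
  const updated = entry.updated_at ? Date.parse(entry.updated_at) : NaN;
  if (Number.isNaN(updated)) {
    reasons.push("missing or invalid updated_at");
  } else if (now.getTime() - updated > maxAgeDays * DAY_MS) {
    reasons.push("stale");
  }
  return reasons;
}

// Usage: print only the entries that need attention.
const provenanceSamples: Provenance[] = [
  { id: "dec-use-jwt", source: "https://example.com/pr/1", confidence: 0.9, updated_at: "2024-05-01T00:00:00Z" },
  { id: "inc-outage-42", confidence: 0.3 },
];
const reviewDate = new Date("2024-06-01T00:00:00Z");
for (const e of provenanceSamples) {
  const reasons = needsReview(e, reviewDate);
  if (reasons.length > 0) console.log(`${e.id}: ${reasons.join(", ")}`);
}
```

Returning a list of reasons rather than a boolean makes the output directly usable as a review checklist.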

Example entry (compact):

```json
{
  "type": "Component",
  "id": "auth-service",
  "name": "Authentication Service",
  "attributes": {
    "language": "Go",
    "version": "v1.2.3"
  },
  "relationships": {
    "owned_by": "alice",
    "depends_on": ["db-service", "redis-cache"],
    "uses": ["jwt-lib", "logging-lib"]
  }
}
```

This keeps the data human-friendly and easy to edit without specialized tooling.
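The entry above embeds its relationships inside the node, while the code later in this tutorial reads a flat list of `{ from, to, type }` edges from `edges.json`. If you prefer authoring in the embedded style, a small conversion step can flatten it. A minimal sketch, assuming every relationship value is either a single ID or an array of IDs; `toEdges` and `AuthoredNode` are hypothetical names.

```typescript
type ID = string;

// Node as authored in the compact example: relationships embedded by edge type.
interface AuthoredNode {
  type: string;
  id: ID;
  name?: string;
  attributes?: Record<string, string>;
  relationships?: Record<string, ID | ID[]>;
}

// Flat edge, as stored in edges.json.
interface Edge {
  from: ID;
  to: ID;
  type: string;
}

// Emit one edge per (source, target, relationship type) triple.
function toEdges(nodes: AuthoredNode[]): Edge[] {
  const result: Edge[] = [];
  for (const node of nodes) {
    const rels: Record<string, ID | ID[]> = node.relationships ?? {};
    for (const [type, target] of Object.entries(rels)) {
      const targets = Array.isArray(target) ? target : [target];
      for (const to of targets) result.push({ from: node.id, to, type });
    }
  }
  return result;
}

const authService: AuthoredNode = {
  type: "Component",
  id: "auth-service",
  name: "Authentication Service",
  attributes: { language: "Go", version: "v1.2.3" },
  relationships: {
    owned_by: "alice",
    depends_on: ["db-service", "redis-cache"],
    uses: ["jwt-lib", "logging-lib"],
  },
};

console.log(toEdges([authService]));
```

For the sample node this yields five edges: one `owned_by`, two `depends_on`, and two `uses`.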

### Step-by-step: build a starter KG locally

1) Pick a storage format

  • Option A: Markdown-based graph
    • Pros: easiest to edit, universal
    • Cons: harder to query; limited validation
  • Option B: JSON/YAML files
    • Pros: simple tooling, easy to version control
    • Cons: you’ll write small utilities to navigate
  • Option C: Lightweight graph database (SQLite with a small graph layer, Neo4j Desktop for local exploration)
    • Pros: powerful queries, relations
    • Cons: setup overhead

For a first version, start with JSON files and a tiny CLI to query them.

2) Define a small repository layout

  • kg/
    • entities.json
    • edges.json
    • docs/
    • onboarding.md
    • runbooks/
    • tools/
    • README.md

3) Create the data model in code (TypeScript example)

  • Install Node.js (and TypeScript) if you don’t have them.
  • Create a minimal graph library to read entities and edges, and query by component.

Code sketch (TypeScript, Node.js):

```typescript
// kg/index.ts
export type ID = string;

// One variant per node type; attributes are free-form key/value pairs.
export type Entity =
  | { type: "Component"; id: ID; name: string; attributes?: Record<string, string> }
  | { type: "Person"; id: ID; name: string; role?: string }
  | { type: "Decision"; id: ID; summary: string; confidence?: number }
  | { type: "Incident"; id: ID; summary: string; severity?: string }
  | { type: "Documentation"; id: ID; name: string; description?: string; notes?: string; source?: string }
  | { type: "Tool"; id: ID; name: string };

// A directed, typed edge, e.g. { from: "auth-service", to: "alice", type: "owned_by" }.
export type Edge = { from: ID; to: ID; type: string };

type Component = Extract<Entity, { type: "Component" }>;
type Person = Extract<Entity, { type: "Person" }>;

export class KG {
  private entities: Entity[] = [];
  private edges: Edge[] = [];

  load(entities: Entity[], edges: Edge[]): void {
    this.entities = entities;
    this.edges = edges;
  }

  // Type guard in the callback narrows the result to a Component.
  findComponent(id: ID): Component | undefined {
    return this.entities.find((e): e is Component => e.type === "Component" && e.id === id);
  }

  // Simple query: IDs of all components that the given component depends on.
  dependsOn(componentId: ID): ID[] {
    return this.edges
      .filter((e) => e.from === componentId && e.type === "depends_on")
      .map((e) => e.to);
  }

  // ID of the owning person, if an owned_by edge exists.
  owner(componentId: ID): ID | undefined {
    return this.edges.find((e) => e.from === componentId && e.type === "owned_by")?.to;
  }

  // One-line report: name, owner, and direct dependencies.
  report(componentId: ID): string {
    const comp = this.findComponent(componentId);
    if (!comp) return `Component ${componentId} not found`;
    const ownerId = this.owner(componentId);
    const owner = this.entities.find((e): e is Person => e.type === "Person" && e.id === ownerId);
    const deps = this.dependsOn(componentId);
    return [
      `Component: ${comp.name} (${comp.id})`,
      owner ? `Owner: ${owner.name}` : "Owner: unknown",
      `Depends on: ${deps.join(", ") || "none"}`,
    ].join(" | ");
  }
}
```

  • This is intentionally minimal. You’ll flesh it out with more relationship types as needed.
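A natural next query is the full (transitive) dependency set: everything a component relies on, directly or indirectly. A minimal sketch using breadth-first search over `depends_on` edges. It is written as a standalone function rather than a `KG` method so it runs on its own; the `transitiveDeps` name and the sample IDs (such as `storage-volume`) are illustrative.

```typescript
type ID = string;
type Edge = { from: ID; to: ID; type: string };

// Breadth-first walk over depends_on edges; returns every component reachable
// from `start`, in BFS order, excluding `start` itself.
function transitiveDeps(edges: Edge[], start: ID): ID[] {
  // Build an adjacency list once so each lookup is a Map hit.
  const adjacency = new Map<ID, ID[]>();
  for (const e of edges) {
    if (e.type !== "depends_on") continue;
    const list = adjacency.get(e.from) ?? [];
    list.push(e.to);
    adjacency.set(e.from, list);
  }

  const visited = new Set<ID>([start]);
  const order: ID[] = [];
  const queue: ID[] = [start];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const next of adjacency.get(current) ?? []) {
      if (visited.has(next)) continue; // also stops infinite loops on cycles
      visited.add(next);
      order.push(next);
      queue.push(next);
    }
  }
  return order;
}

const sampleEdges: Edge[] = [
  { from: "auth-service", to: "db-service", type: "depends_on" },
  { from: "auth-service", to: "redis-cache", type: "depends_on" },
  { from: "db-service", to: "storage-volume", type: "depends_on" },
  { from: "auth-service", to: "alice", type: "owned_by" },
];

console.log(transitiveDeps(sampleEdges, "auth-service"));
// → [ 'db-service', 'redis-cache', 'storage-volume' ]
```

The `visited` set matters because hand-edited data can contain cycles; without it the walk would never terminate.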

4) Seed data

  • Create entities.json and edges.json with realistic entries.
  • Keep a CHANGELOG entry for each update to the KG.
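Hand-edited JSON drifts quickly, so it is worth checking the seed data before each commit (and later in CI, in the spirit of the governance tests mentioned below). A minimal sketch, assuming the `Entity`/`Edge` shapes from `kg/index.ts` (reduced here to the fields the checks need); `validate` is a hypothetical helper, and its three rules (dangling edge endpoints, components without an `owned_by` edge, cycles in `depends_on`) are examples to adapt to your own conventions.

```typescript
type ID = string;
type Entity = { type: string; id: ID };
type Edge = { from: ID; to: ID; type: string };

// Returns human-readable problems; an empty array means the data passed.
function validate(entities: Entity[], edges: Edge[]): string[] {
  const problems: string[] = [];
  const ids = new Set(entities.map((e) => e.id));

  // 1. Every edge must point at entities that exist.
  for (const e of edges) {
    if (!ids.has(e.from)) problems.push(`edge ${e.type}: unknown source ${e.from}`);
    if (!ids.has(e.to)) problems.push(`edge ${e.type}: unknown target ${e.to}`);
  }

  // 2. Every component should have an owner.
  const owned = new Set(edges.filter((e) => e.type === "owned_by").map((e) => e.from));
  for (const ent of entities) {
    if (ent.type === "Component" && !owned.has(ent.id)) problems.push(`component ${ent.id} has no owner`);
  }

  // 3. depends_on must be acyclic: depth-first search with an "in progress" set.
  const deps = new Map<ID, ID[]>();
  for (const e of edges) {
    if (e.type === "depends_on") deps.set(e.from, [...(deps.get(e.from) ?? []), e.to]);
  }
  const done = new Set<ID>();
  const inProgress = new Set<ID>();
  const visit = (node: ID): void => {
    if (done.has(node)) return;
    if (inProgress.has(node)) {
      problems.push(`dependency cycle through ${node}`);
      return;
    }
    inProgress.add(node);
    for (const next of deps.get(node) ?? []) visit(next);
    inProgress.delete(node);
    done.add(node);
  };
  for (const node of deps.keys()) visit(node);

  return problems;
}

const seedEntities: Entity[] = [
  { type: "Component", id: "payment-service" },
  { type: "Person", id: "alice" },
];
const seedEdges: Edge[] = [
  { from: "payment-service", to: "alice", type: "owned_by" },
  { from: "payment-service", to: "billing-db", type: "depends_on" },
];
console.log(validate(seedEntities, seedEdges));
// → [ 'edge depends_on: unknown target billing-db' ]
```

Failing the commit (or CI job) when the returned array is non-empty keeps broken links from reaching the shared graph.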

5) Add a tiny CLI for onboarding queries

  • A command like: node kg/cli.js describe auth-service
  • Or: node kg/cli.js incident outage-42

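CLI sketch (TypeScript, Node.js; assumes CommonJS output so `__dirname` is available and the JSON files sit next to the compiled script, as in the layout above):

```typescript
// kg/cli.ts (compile with tsc, then run: node kg/cli.js describe auth-service)
import { readFileSync } from "node:fs";
import { join } from "node:path";
import { KG, type Entity, type Edge } from "./index";

// Read and parse a JSON data file located next to this script.
function readJson<T>(file: string): T {
  return JSON.parse(readFileSync(join(__dirname, file), "utf8")) as T;
}

function main(): void {
  const kg = new KG();
  kg.load(readJson<Entity[]>("entities.json"), readJson<Edge[]>("edges.json"));

  // process.argv is [node, script, ...args], so user arguments start at index 2.
  const [cmd, id] = process.argv.slice(2);
  if (cmd === "describe" && id) {
    console.log(kg.report(id));
  } else {
    console.log("Usage: describe <component-id>");
    process.exitCode = 1;
  }
}

main();
```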
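The CLI handles only `describe`; the `incident outage-42` command from step 5 needs a query that starts at an incident and follows its edges. A minimal sketch, assuming incidents point at components with `affects` (or the broader `related_to`) edges and at runbooks with `mitigated_by`, as in the schema; `incidentReport`, `incidentsFor`, and the sample IDs are hypothetical. `incidentsFor` answers the reverse question: which incidents have hit a given component.

```typescript
type ID = string;
type Edge = { from: ID; to: ID; type: string };

// Edge types treated as "incident touches component".
const IMPACT_TYPES = new Set(["affects", "related_to"]);

// Summarize one incident: affected components and linked runbooks/docs.
function incidentReport(edges: Edge[], incidentId: ID): string {
  const affected = edges
    .filter((e) => e.from === incidentId && IMPACT_TYPES.has(e.type))
    .map((e) => e.to);
  const mitigations = edges
    .filter((e) => e.from === incidentId && e.type === "mitigated_by")
    .map((e) => e.to);
  return [
    `Incident: ${incidentId}`,
    `Affects: ${affected.join(", ") || "none recorded"}`,
    `Mitigated by: ${mitigations.join(", ") || "none recorded"}`,
  ].join(" | ");
}

// Reverse lookup: all incidents recorded against a component.
function incidentsFor(edges: Edge[], componentId: ID): ID[] {
  return edges
    .filter((e) => e.to === componentId && IMPACT_TYPES.has(e.type))
    .map((e) => e.from);
}

const incidentEdges: Edge[] = [
  { from: "outage-42", to: "payment-service", type: "affects" },
  { from: "outage-42", to: "rb_payment", type: "mitigated_by" },
];

console.log(incidentReport(incidentEdges, "outage-42"));
// → Incident: outage-42 | Affects: payment-service | Mitigated by: rb_payment
```

In `cli.ts` this could become another branch next to `describe`, fed with the same parsed `edges.json` array.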

6) Documentation and onboarding basics

  • Create a short onboarding guide that explains how to add a new entry:
    • Add a new Component: name, id, language, owner
    • Add relationships: depends_on, owned_by, uses
    • Add a Documentation entry that explains how to use the KG for that component
  • Encourage contributors to cite sources (PRs, runbooks, design docs).
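The onboarding recipe (a component with name, id, language, and owner, plus its relationships) can also be captured as a small helper so contributors produce consistently shaped entries. A minimal sketch; `NewComponent` and `newComponentEntries` are hypothetical names, and the output follows the entity and edge shapes used in the copy-paste examples later in this post.

```typescript
type ID = string;
type Edge = { from: ID; to: ID; type: string };
type ComponentEntity = {
  type: "Component";
  id: ID;
  name: string;
  attributes: Record<string, string>;
};

// What a contributor fills in when registering a component.
interface NewComponent {
  id: ID;
  name: string;
  language: string;
  owner: ID;          // Person id, e.g. "alice"
  dependsOn?: ID[];
  uses?: ID[];        // Tool ids
}

// Build the entity plus its owned_by / depends_on / uses edges in one step,
// so the ownership link is never forgotten.
function newComponentEntries(input: NewComponent): { entity: ComponentEntity; edges: Edge[] } {
  const entity: ComponentEntity = {
    type: "Component",
    id: input.id,
    name: input.name,
    attributes: { language: input.language },
  };
  const edges: Edge[] = [{ from: input.id, to: input.owner, type: "owned_by" }];
  for (const dep of input.dependsOn ?? []) edges.push({ from: input.id, to: dep, type: "depends_on" });
  for (const tool of input.uses ?? []) edges.push({ from: input.id, to: tool, type: "uses" });
  return { entity, edges };
}

const created = newComponentEntries({
  id: "payment-service",
  name: "Payment Service",
  language: "Rust",
  owner: "alice",
  dependsOn: ["billing-db"],
});
console.log(JSON.stringify(created.entity));
console.log(JSON.stringify(created.edges));
```

Making `owner` a required field is the design choice here: the type checker, rather than a reviewer, catches a component added without an owner.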

### Practical workflows

  • Onboarding a new engineer

    • Step 1: Share the KG README and the onboarding section.
    • Step 2: Have the new hire pick a component they’ll touch and run the query: “describe <component-id>”.
    • Step 3: Update the KG with any missing context discovered during their first week.
  • Incident response

    • Step 1: Create an Incident node describing the outage.
    • Step 2: Link the affected components to the incident with affects edges.
    • Step 3: Attach or reference runbooks and dashboards in Documentation/Runbooks.
    • Step 4: After resolution, record mitigation steps and rationale in a Decision node if it indicates a permanent change.
  • Architecture reviews

    • Use the KG to map decisions to components and demonstrate dependencies.
    • Capture trade-offs in a Decision node and tie it to the involved Components.

### Examples you can copy-paste

  • Adding a new component

    • Entity: { "type": "Component", "id": "payment-service", "name": "Payment Service", "attributes": { "language": "Rust", "version": "v0.9" } }
  • Linking ownership

    • Edge: { "from": "payment-service", "to": "alice", "type": "owned_by" }
  • Recording a dependency

    • Edge: { "from": "payment-service", "to": "billing-db", "type": "depends_on" }
  • Documenting a runbook

    • Entity: { "type": "Documentation", "id": "rb_payment", "name": "Runbook: Payment failures", "description": "Steps to recover from payment processing errors", "notes": "...", "source": "https://example.com/runbooks/payment" }
    • Edge: { "from": "rb_payment", "to": "payment-service", "type": "describes" }

### Tools and enhancements over time

  • Lightweight UI

    • If you want a quick UI, render the KG in a static site or a tiny React app that reads the JSON data and visualizes nodes and edges with a library like D3 or Vis.js.
    • Quick filter and search by component name, owner, or tag.
  • Richer querying

    • Introduce a small graph query language or use a simple DSL:
    • Find all incidents affecting a component: Incident affects Component
    • Trace ownership: owned_by chain to person
  • Versioning and governance

    • Treat KG data as code: store in a dedicated branch, require PR reviews for updates.
    • Add basic tests to ensure relationships remain consistent (no missing owners, no circular dependencies in critical paths).
  • Automation hooks

    • CI hook: when a new incident document is added, automatically suggest related components based on dependencies.
    • PR template: require linking to a KG entry when introducing a new component or incident.

### Privacy and safety considerations

  • Keep sensitive data out of the KG. Do not store secrets, credentials, or PII in the KG entries.

  • Use access controls if the KG grows beyond a local repo. Consider read-only hosting for broader teams.
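The "no secrets, credentials, or PII" rule is easier to keep if a pre-commit or CI step scans the KG files. A minimal, heuristic sketch: the regular expressions below (an AWS-style access key ID prefix, a private-key header, `password`/`secret`/`token`-style assignments, and email addresses) are illustrative assumptions that will miss many cases and flag some false positives, so treat this as a tripwire rather than a replacement for a dedicated secret scanner. `scanForSensitive` is a hypothetical name, and note that it inspects string values only, not object keys.

```typescript
// Illustrative patterns only; tune them or use a dedicated secret scanner.
const SUSPICIOUS: { label: string; pattern: RegExp }[] = [
  { label: "AWS-style access key id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { label: "private key block", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { label: "credential assignment", pattern: /\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]/i },
  { label: "email address", pattern: /[\w.+-]+@[\w-]+\.[\w.-]+/ },
];

// Walk any parsed JSON value and report the JSON path of each suspicious string.
function scanForSensitive(value: unknown, path = "$"): string[] {
  if (typeof value === "string") {
    return SUSPICIOUS.filter((s) => s.pattern.test(value)).map((s) => `${path}: ${s.label}`);
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => scanForSensitive(item, `${path}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, v]) => scanForSensitive(v, `${path}.${key}`));
  }
  return [];
}

const kgSample = [
  { type: "Documentation", id: "rb_payment", notes: "Restart the worker; token: abc123" },
  { type: "Person", id: "alice", name: "Alice" },
];
console.log(scanForSensitive(kgSample));
// → [ '$[0].notes: credential assignment' ]
```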

### Minimum viable product (MVP) checklist

  • [ ] A local data store for entities and edges (JSON file or small SQLite DB)

  • [ ] A CLI to read and describe a component

  • [ ] A starter dataset with 6-12 entities and 8-12 edges

  • [ ] A short onboarding guide showing how to add components and incidents

  • [ ] A README with usage examples and a simple governance approach

If you want, I can tailor the MVP to your stack (JavaScript/TypeScript, Python, or Go) and generate a ready-to-run project scaffold with sample data. Do you prefer a JSON-file approach for maximum simplicity, or a lightweight SQLite-backed graph for more powerful queries?

-

Rizwan Saleem | https://rizwansaleem.co
