Rizwan Saleem

Posted on Jun 4

Building a Developer-Focused Internal Knowledge Graph to Accelerate Onboarding and Maintenance

#react #typescript #frontend #webdev

Onboarding new developers and keeping a growing team aligned can feel like digging through scattered docs, wiki pages, and vague memories. A lightweight internal knowledge graph (IKG) gives you a practical, scalable way to connect people, concepts, code, and processes. It helps newcomers discover how systems actually work, surfaces ownership and dependencies, and reduces context-switching during critical tasks. This tutorial walks you through designing, implementing, and maintaining an actionable IKG tailored to developer workflows.

What an internal knowledge graph is for developers

An IKG is a directed, labeled graph where nodes represent entities (people, services, components, concepts, tickets, docs) and edges represent relationships (owns, depends-on, documents, implements, reviews). Unlike a flat wiki, a graph emphasizes provenance, discoverability, and traversal. Typical benefits:

Rapid onboarding: new hires can traverse from services to responsibilities to code paths.
Improved incident response: trace a fault to the responsible owner and the implicated subsystems.
Better architecture understanding: reveal cross-team dependencies and data flows.
Living documentation: link code, tests, runbooks, and diagrams so the docs stay tied to how the system actually runs.
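To make the "directed, labeled graph" idea concrete, here is a minimal in-memory sketch in TypeScript. The names (KnowledgeGraph, targets, sources) are hypothetical and nothing beyond the language itself is assumed. The design point is that indexing edges by both source and target lets you ask "who owns this service?" as cheaply as "what does it depend on?".

```typescript
// Minimal in-memory IKG: typed nodes plus directed, labeled edges.
// Node kinds and edge labels are illustrative, taken from this article.
type NodeKind = "Person" | "Service" | "Module" | "Runbook" | "Incident" | "Document";

interface GraphNode {
  id: string;
  kind: NodeKind;
  name: string;
}

interface GraphEdge {
  from: string;  // source node id
  to: string;    // target node id
  label: string; // relationship, e.g. "owns" or "depends_on"
}

// Append an edge to the list stored under `key`, creating the list if needed.
function indexEdge(index: Map<string, GraphEdge[]>, key: string, edge: GraphEdge): void {
  const list = index.get(key);
  if (list) list.push(edge);
  else index.set(key, [edge]);
}

class KnowledgeGraph {
  private nodes = new Map<string, GraphNode>();
  private outgoing = new Map<string, GraphEdge[]>(); // edges keyed by source id
  private incoming = new Map<string, GraphEdge[]>(); // edges keyed by target id

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node);
  }

  addEdge(edge: GraphEdge): void {
    if (!this.nodes.has(edge.from) || !this.nodes.has(edge.to)) {
      throw new Error(`unknown endpoint in edge ${edge.from} -> ${edge.to}`);
    }
    indexEdge(this.outgoing, edge.from, edge);
    indexEdge(this.incoming, edge.to, edge);
  }

  // Nodes that `id` points to through edges with this label.
  targets(id: string, label: string): GraphNode[] {
    return (this.outgoing.get(id) ?? [])
      .filter((e) => e.label === label)
      .map((e) => this.nodes.get(e.to)!);
  }

  // Nodes that point to `id` through edges with this label.
  sources(id: string, label: string): GraphNode[] {
    return (this.incoming.get(id) ?? [])
      .filter((e) => e.label === label)
      .map((e) => this.nodes.get(e.from)!);
  }
}

// Usage: who owns AuthService, and what does it depend on?
const graph = new KnowledgeGraph();
graph.addNode({ id: "p1", kind: "Person", name: "Alice Chen" });
graph.addNode({ id: "s1", kind: "Service", name: "AuthService" });
graph.addNode({ id: "s2", kind: "Service", name: "UserService" });
graph.addEdge({ from: "p1", to: "s1", label: "owns" });
graph.addEdge({ from: "s1", to: "s2", label: "depends_on" });
console.log(graph.sources("s1", "owns").map((n) => n.name)); // [ 'Alice Chen' ]
console.log(graph.targets("s1", "depends_on").map((n) => n.name)); // [ 'UserService' ]
```

Rejecting edges with unknown endpoints keeps dangling references out of the graph, which matters once ingestion is automated.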

Key principles:

Start small with high-value domains (core services, deployment pipelines, critical bugs).
Prioritize explicit ownership and versioned artifacts.
Make it searchable, navigable, and graph-visualizable.
Automate data ingestion from sources of truth (code repos, issue trackers, docs).

Step 1: Define your scope and entities

Decide what to model first. A focused scope makes the graph useful quickly.

Core services: service name, owner, team, language, repository URL, deployment pipeline, API spec URL.
Code artifacts: modules, packages, libraries, version constraints, tests.
People and roles: engineers, tech leads, SREs, product owners, on-call rotations.
Projects and platforms: microservices, data pipelines, CI/CD, observability stack.
Runbooks and docs: where to find troubleshooting steps, incident playbooks, and architectural decisions.
Issues and tasks: tickets, owners, status, linked PRs, post-incident reviews.

Suggested entity types:

Person
Service
Repository
Module
Environment (dev/staging/prod)
Incident
Runbook
Document
Issue/Ticket
Decision/ADR (Architecture Decision Record)

Key relationships, written as relation(source, target):

owns(Person, Service)
depends_on(Service, Service)
resides_in(Service, Environment)
authored_by(Document, Person)
implemented_by(Module, Person)
lives_in(Module, Repository)
references(Incident, Service)
linked_to(Incident, Runbook)
decision_of(ADR, Service)
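A human-readable schema is still worth checking mechanically. The TypeScript sketch below encodes most of the relationship signatures above as data and rejects edges whose endpoints have the wrong types (for example, a Service "owning" a Person). The type and function names are hypothetical; where you run the check (the ingestion job, CI, a review step) is up to you.

```typescript
// Entity types from the list above ("Issue" stands for Issue/Ticket).
type EntityType =
  | "Person" | "Service" | "Repository" | "Module" | "Environment"
  | "Incident" | "Runbook" | "Document" | "Issue" | "ADR";

type RelationName =
  | "owns" | "depends_on" | "resides_in" | "authored_by"
  | "implemented_by" | "references" | "linked_to" | "decision_of";

// Expected [source, target] types per relation, read as relation(source, target).
const RELATIONS: Record<RelationName, [EntityType, EntityType]> = {
  owns: ["Person", "Service"],
  depends_on: ["Service", "Service"],
  resides_in: ["Service", "Environment"],
  authored_by: ["Document", "Person"],
  implemented_by: ["Module", "Person"],
  references: ["Incident", "Service"],
  linked_to: ["Incident", "Runbook"],
  decision_of: ["ADR", "Service"],
};

interface Entity {
  id: string;
  type: EntityType;
}

interface Relation {
  name: RelationName;
  source: Entity;
  target: Entity;
}

// Returns null when the edge matches its signature, otherwise a readable error.
function checkRelation(rel: Relation): string | null {
  const [expectedSource, expectedTarget] = RELATIONS[rel.name];
  if (rel.source.type === expectedSource && rel.target.type === expectedTarget) {
    return null;
  }
  return `${rel.name}(${rel.source.type}, ${rel.target.type}) should be ` +
    `${rel.name}(${expectedSource}, ${expectedTarget})`;
}

const alice: Entity = { id: "p1", type: "Person" };
const auth: Entity = { id: "s1", type: "Service" };
console.log(checkRelation({ name: "owns", source: alice, target: auth })); // null
console.log(checkRelation({ name: "owns", source: auth, target: alice }));
// owns(Service, Person) should be owns(Person, Service)
```

Keeping the signatures in one table means evolving the schema is a one-line change rather than a hunt through validation code.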

Keep a simple, human-readable schema first, then evolve it. Use a lightweight graph database or a flexible store (more on tech options below).

Step 2: Choose a practical data model and storage

You don’t need a heavy graph DB to start. A well-structured set of interconnected CSVs, a JSONL store, or a small graph database works fine.

Options:

Lightweight: a local SQLite database with a simple graph-like schema, or an embedded JSON store (e.g., NeDB, lowdb) for rapid iteration.
Flexible graph DB: tuned for queries like “what services depend on this?” or “who owns this module?” Examples include Neo4j, Nebula Graph, and Dgraph.
Documentation-first approach: store entities as Markdown or MDX files with front-matter linking relationships, enhanced by a lightweight search layer.
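For the documentation-first option, relationships can live in each file's front-matter and be turned into edges at ingestion time. The sketch below assumes a hypothetical convention (an id key, an owner key that becomes an owns edge, and a comma-separated depends_on key) and uses a deliberately naive "key: value" parser; a real setup would more likely use a proper YAML parser.

```typescript
// Assumed front-matter convention (hypothetical):
// ---
// id: auth-service
// type: Service
// owner: alice
// depends_on: user-service, billing-service
// ---
interface FrontMatterEdge {
  from: string;
  to: string;
  label: "owns" | "depends_on";
}

interface ParsedDoc {
  fields: Record<string, string>;
  edges: FrontMatterEdge[];
}

// Naive parser: flat "key: value" lines only (no nesting, quoting, or YAML lists).
function parseFrontMatter(markdown: string): ParsedDoc {
  const lines = markdown.split(/\r?\n/);
  if ((lines[0] ?? "").trim() !== "---") {
    throw new Error("document does not start with front-matter");
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) throw new Error("front-matter is not closed");

  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // ignore lines that are not key: value
    fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }

  const id = fields["id"];
  if (!id) throw new Error("front-matter needs an id");

  // Relationship keys become edges; all other keys stay plain fields.
  const edges: FrontMatterEdge[] = [];
  if (fields["owner"]) edges.push({ from: fields["owner"], to: id, label: "owns" });
  for (const dep of (fields["depends_on"] ?? "").split(",")) {
    const target = dep.trim();
    if (target) edges.push({ from: id, to: target, label: "depends_on" });
  }
  return { fields, edges };
}

const parsed = parseFrontMatter(
  ["---", "id: auth-service", "type: Service", "owner: alice",
   "depends_on: user-service, billing-service", "---", "# AuthService"].join("\n"),
);
console.log(parsed.edges.length); // 3
```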

Example simple schema (entity types and key fields):

Person: id, name, email, role, teams
Service: id, name, owner_id, language, repo_url
Module: id, name, version, service_id
Incident: id, title, service_id, severity, oncall_ids, runbook_id
Runbook: id, title, content_path
Document: id, title, type (ADR, README, design), authored_by_id, path
ADR: id, title, decision, date, affected_services

Edges can be represented as separate relation tables or embedded in the document as pointers. In a relational setup, you’ll have join tables like service_owners(service_id, person_id), service_dependencies(service_id, depends_on_service_id), incident_runbooks(incident_id, runbook_id), etc.

Step 3: Establish a data ingestion pipeline

Automate data capture from sources teams already use. This reduces drift and keeps the IKG current.

Source control: pull repository names, owners, languages, and last commit dates from Git hosting services via their APIs.
Issue/incident trackers: import issues, incidents, PRs, and owners. Tag important issues as ADRs or design decisions.
Runbooks and docs: watch a docs repo or a dedicated folder in your codebase; extract metadata from front-matter.
People: sync on-call rotations and schedules from your incident system or HR tooling.

Recommended approach:

Create a simple ingestion service (a small Python or Node app) that runs on a schedule and pushes updates to your store.
Use idempotent operations: upsert entities, avoid duplicates.
Maintain provenance: store last_updated timestamps and source-of-truth metadata.
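Idempotent upserts and provenance fit together naturally: key each entity by a natural key from its source, merge on re-sync, and stamp every write. Here is a minimal in-memory sketch in TypeScript; EntityStore and the key format "service:AuthService" are hypothetical, and a real store would do the same thing with an INSERT ... ON CONFLICT statement or your graph DB's merge operation.

```typescript
interface Provenance {
  source: string;      // system of record, e.g. "github" (illustrative)
  lastUpdated: string; // ISO-8601 time of the sync that last wrote the entity
}

interface StoredEntity {
  key: string; // natural key, e.g. "service:AuthService"
  attrs: Record<string, unknown>;
  provenance: Provenance;
}

class EntityStore {
  private byKey = new Map<string, StoredEntity>();

  // Insert or merge by natural key, so re-running a sync never creates duplicates.
  upsert(key: string, attrs: Record<string, unknown>, source: string, now: Date = new Date()): StoredEntity {
    const existing = this.byKey.get(key);
    const entity: StoredEntity = {
      key,
      attrs: { ...(existing ? existing.attrs : {}), ...attrs }, // incoming values win
      provenance: { source, lastUpdated: now.toISOString() },
    };
    this.byKey.set(key, entity);
    return entity;
  }

  find(key: string): StoredEntity | undefined {
    return this.byKey.get(key);
  }

  get size(): number {
    return this.byKey.size;
  }
}

const store = new EntityStore();
store.upsert("service:AuthService", { language: "Go" }, "github", new Date("2024-06-01T00:00:00Z"));
store.upsert("service:AuthService", { repo_url: "https://git.example.com/auth/api" }, "github",
  new Date("2024-06-02T00:00:00Z"));
console.log(store.size); // 1
```

One trade-off of merging: an attribute removed at the source stays in the store until it is deleted explicitly, which is one more reason to keep lastUpdated around for audits.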

Example ingestion snippet (conceptual):

For each repository:
- Upsert the Service by name, resolving owner_id from People when the owner exists.
- Upsert Module with service_id linkage.
- Upsert Incident if linked in issues; attach oncall_ids and runbook_id.

Step 4: Build useful queries and views

Enable developers to discover value quickly. Start with a few practical queries and build outward.

Key queries:

What services does Service A depend on, and who owns them?
Who is the on-call owner for incidents affecting Service B?
Where is the ADR for a major architectural decision, and which services are affected?
Which modules are unversioned or have outdated dependencies?
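The first question is a graph traversal. This TypeScript sketch over in-memory maps (the map shapes and function name are assumptions) walks depends_on edges breadth-first, so it also finds indirect dependencies, and a seen set keeps cycles from looping forever. In PostgreSQL the same walk is usually a WITH RECURSIVE query over service_dependencies; in Cypher it is a variable-length path match.

```typescript
interface ServiceRow {
  owner: string | null; // null when nobody has claimed the service yet
}

interface DependencyHit {
  service: string;
  owner: string;
  depth: number; // 1 = direct dependency, 2 = dependency of a dependency, ...
}

// Breadth-first walk over depends_on edges, collecting every direct or
// indirect dependency of `start` together with its owner.
function dependenciesWithOwners(
  start: string,
  dependsOn: Map<string, string[]>,
  services: Map<string, ServiceRow>,
): DependencyHit[] {
  const result: DependencyHit[] = [];
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 1; frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const name of frontier) {
      for (const dep of dependsOn.get(name) ?? []) {
        if (seen.has(dep)) continue; // skips cycles and already-reported services
        seen.add(dep);
        next.push(dep);
        result.push({ service: dep, owner: services.get(dep)?.owner ?? "unowned", depth });
      }
    }
    frontier = next;
  }
  return result;
}

const dependsOn = new Map<string, string[]>([
  ["AuthService", ["UserService", "TokenStore"]],
  ["UserService", ["TokenStore", "ProfileDB"]],
  ["ProfileDB", ["AuthService"]], // a cycle, to show it is handled
]);
const services = new Map<string, ServiceRow>([
  ["UserService", { owner: "bob" }],
  ["TokenStore", { owner: null }],
  ["ProfileDB", { owner: "dana" }],
]);
console.log(dependenciesWithOwners("AuthService", dependsOn, services));
```

Reporting "unowned" explicitly, rather than dropping those rows, turns the query into a cheap ownership-gap detector.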

Views to ship:

Ownership map: a dashboard showing service → owner → team.
Dependency graph: a visualization of service-to-service dependencies for a given namespace.
Incident continuity: a runbook-linked view showing incident history and corresponding owners.
ADR index: list of decisions by service with linked documents and dates.
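The ownership map is mostly a grouping exercise. This TypeScript sketch (types and names are hypothetical) nests services under team and owner, and files anything without a known owner under an explicit "(unowned)" bucket so gaps are visible instead of silently dropped.

```typescript
interface PersonRow {
  id: string;
  name: string;
  team: string;
}

interface ServiceEntry {
  name: string;
  ownerId: string | null;
}

// team -> owner name -> service names. Services whose owner is missing or
// unknown land under "(no team)" -> "(unowned)".
function ownershipMap(services: ServiceEntry[], people: PersonRow[]): Map<string, Map<string, string[]>> {
  const byId = new Map<string, PersonRow>();
  for (const p of people) byId.set(p.id, p);

  const map = new Map<string, Map<string, string[]>>();
  for (const s of services) {
    const owner = s.ownerId ? byId.get(s.ownerId) : undefined;
    const team = owner?.team ?? "(no team)";
    const ownerName = owner?.name ?? "(unowned)";
    let owners = map.get(team);
    if (!owners) {
      owners = new Map<string, string[]>();
      map.set(team, owners);
    }
    const list = owners.get(ownerName) ?? [];
    list.push(s.name);
    owners.set(ownerName, list);
  }
  return map;
}

const people: PersonRow[] = [{ id: "p1", name: "Alice Chen", team: "Platform" }];
const catalogEntries: ServiceEntry[] = [
  { name: "AuthService", ownerId: "p1" },
  { name: "TokenStore", ownerId: "p1" },
  { name: "LegacyCron", ownerId: null },
];
const byTeam = ownershipMap(catalogEntries, people);
console.log(byTeam.get("Platform")?.get("Alice Chen")); // [ 'AuthService', 'TokenStore' ]
```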

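If you’re using a graph DB, its native query language (Cypher for Neo4j), or a GraphQL or REST layer in front of it, works well. For relational stores, write SQL that joins the entity tables through the relation tables; transitive questions such as “everything Service A depends on, directly or indirectly” need a recursive query (WITH RECURSIVE).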

Step 5: Visualize without overwhelming

A graph visualization helps newcomers see the landscape, but avoid making it a wall of nodes.

Start with a curated subset: the core services and their direct dependencies.
Use filters: by team, by environment, by criticality.
Provide on-click details: clicking a node reveals owner, repo, runbooks, and recent incidents.
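One way to build that curated subset: pick root services by filter, then include only their direct dependencies. In this TypeScript sketch the output is shaped like vis-network's node ({ id, label }) and edge ({ from, to }) items so it could be handed to a network view; the filter fields (team, criticalOnly) and types are assumptions, and an environment filter would follow the same pattern.

```typescript
interface ServiceInfo {
  id: string;
  name: string;
  team: string;
  critical: boolean;
}

// Plain objects shaped like vis-network node and edge items.
interface VisNode { id: string; label: string; }
interface VisEdge { from: string; to: string; }

interface ViewFilter {
  team?: string;
  criticalOnly?: boolean;
}

// Roots = services matching the filter; the view adds only their direct
// dependencies instead of rendering the whole graph.
function curatedView(
  services: ServiceInfo[],
  deps: Array<[string, string]>, // [serviceId, dependsOnId]
  filter: ViewFilter,
): { nodes: VisNode[]; edges: VisEdge[] } {
  const byId = new Map<string, ServiceInfo>();
  for (const s of services) byId.set(s.id, s);

  const roots = services.filter(
    (s) => (!filter.team || s.team === filter.team) && (!filter.criticalOnly || s.critical),
  );
  const rootIds = new Set(roots.map((s) => s.id));

  const edges: VisEdge[] = deps
    .filter(([from]) => rootIds.has(from))
    .map(([from, to]) => ({ from, to }));

  const nodeIds = new Set<string>(rootIds);
  for (const e of edges) nodeIds.add(e.to);
  const nodes: VisNode[] = Array.from(nodeIds).map((id) => ({ id, label: byId.get(id)?.name ?? id }));
  return { nodes, edges };
}

const view = curatedView(
  [
    { id: "auth", name: "AuthService", team: "Platform", critical: true },
    { id: "cron", name: "LegacyCron", team: "Platform", critical: false },
    { id: "pay", name: "PaymentsAPI", team: "Payments", critical: true },
  ],
  [["auth", "pay"], ["cron", "auth"], ["pay", "cron"]],
  { team: "Platform", criticalOnly: true },
);
console.log(view.nodes.map((n) => n.label)); // [ 'AuthService', 'PaymentsAPI' ]
```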

Tooling ideas:

Simple web UI with D3.js or vis-network for interactive graphs.
A lightweight GraphQL API to power the UI and support ad-hoc explorations.
Optional: integrate with the company’s existing dashboards (Grafana, Kibana) for a unified experience.

Step 6: Enrich with practices and governance

A useful IKG isn’t just data; it enforces good practices.

Ownership discipline: always associate a service or component with an owner and team.
ADRs and runbooks: tie decisions and procedures to their contexts; require updates when ownership or dependencies change.
Incident hygiene: link incidents to the runbooks and post-incident reviews, and note improvements.
Documentation etiquette: require short descriptions for new entities and clear paths to primary sources.
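These rules are easy to check mechanically. Here is a sketch of a nightly or CI audit in TypeScript with hypothetical types; the staleness window is a parameter so it can match whatever review policy you adopt, and entities whose timestamp is missing or unparseable are reported as stale rather than skipped.

```typescript
interface AuditEntity {
  id: string;
  kind: string; // e.g. "Service", "Runbook", "Document"
  ownerId?: string;
  description?: string;
  lastUpdated: string; // ISO-8601
}

interface Finding {
  id: string;
  problem: "no owner" | "no description" | "stale";
}

const DAY_MS = 24 * 60 * 60 * 1000;

function audit(entities: AuditEntity[], staleAfterDays: number, now: Date = new Date()): Finding[] {
  const cutoff = now.getTime() - staleAfterDays * DAY_MS;
  const findings: Finding[] = [];
  for (const e of entities) {
    // Ownership discipline: every service needs an owner.
    if (e.kind === "Service" && !e.ownerId) findings.push({ id: e.id, problem: "no owner" });
    // Documentation etiquette: every entity needs a short description.
    if (!e.description || e.description.trim() === "") {
      findings.push({ id: e.id, problem: "no description" });
    }
    // Unparseable timestamps count as stale so someone looks at them.
    const updated = Date.parse(e.lastUpdated);
    if (Number.isNaN(updated) || updated < cutoff) findings.push({ id: e.id, problem: "stale" });
  }
  return findings;
}

const findings = audit(
  [
    { id: "auth", kind: "Service", description: "Login and tokens", lastUpdated: "2024-05-01T00:00:00Z" },
    { id: "old-adr", kind: "Document", description: "", lastUpdated: "2023-01-01T00:00:00Z" },
  ],
  180,
  new Date("2024-06-01T00:00:00Z"),
);
console.log(findings.map((f) => `${f.id}: ${f.problem}`));
```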

Governance tips:

Implement a lightweight review process for new entities (PR-like flow).
Schedule quarterly audits to prune stale data (old ADRs, outdated owners).
Define a policy: if an entity hasn’t been touched in 6-12 months, flag it for review.

Step 7: Practical starter implementation (minimal viable system)

A pragmatic MVP you can deploy in a single afternoon.

Stack:
- Storage: PostgreSQL (or SQLite for local testing)
- API: Node.js with Express, or Python with FastAPI
- Graph layer (optional): Neo4j driver, or a simple relational schema with adjacency tables
- Frontend: minimal React app or static site with search

Core tables (PostgreSQL example):
- persons(id, name, email unique, role, team)
- services(id, name unique, owner_id references persons(id), language, repo_url)
- modules(id, name, version, service_id references services(id))
- incidents(id, title, service_id references services(id), severity, oncall_ids text[], runbook_id)
- runbooks(id, title, content_path)
- documents(id, title, type, author_id, path)
- adrs(id, title, decision, date, service_id)

Relation tables:
- service_dependencies(service_id, depends_on_service_id), with a primary key on the pair
- incident_runbooks(incident_id, runbook_id)
- document_authors(document_id, author_id)

Basic API endpoints:
- GET /services?owner=&dep-on=
- GET /services/:id/graph to fetch nodes/edges for visualization
- GET /incidents?service_id=
- POST/PUT /entities to upsert data (idempotent)

Simple frontend:
- A search bar to query entities by name
- A service card showing owner, language, repo
- A small dependency graph panel for a selected service
- Quick links to runbooks and ADRs
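The filtering behind GET /services?owner=&dep-on= can be a plain function, which keeps it testable outside the web framework. In this TypeScript sketch the record shape and function name are assumptions, and empty parameters (as in ?owner=) are treated as "no filter". An Express handler might call it with req.query after checking that the values are strings.

```typescript
interface ServiceRecord {
  name: string;
  owner: string;       // owner's email or id
  dependsOn: string[]; // names of services this one depends on
}

interface ServicesQuery {
  owner?: string;
  "dep-on"?: string;
}

// Absent or empty parameters do not filter.
function filterServices(all: ServiceRecord[], query: ServicesQuery): ServiceRecord[] {
  const owner = query.owner;
  const depOn = query["dep-on"];
  return all.filter(
    (s) => (!owner || s.owner === owner) && (!depOn || s.dependsOn.includes(depOn)),
  );
}

const catalog: ServiceRecord[] = [
  { name: "AuthService", owner: "alice@example.com", dependsOn: ["UserService"] },
  { name: "BillingService", owner: "alice@example.com", dependsOn: [] },
  { name: "ProfileService", owner: "bob@example.com", dependsOn: ["UserService"] },
];
console.log(filterServices(catalog, { owner: "alice@example.com", "dep-on": "UserService" }).map((s) => s.name));
// [ 'AuthService' ]
```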

Upserting a service with its owner, step by step:

- If the owner is not found in persons, create a placeholder person with a note to fill in later.
- Insert or update the service with owner_id, language, and repo_url.
- Upsert its dependencies in service_dependencies.

Sample SQL (PostgreSQL):

-- Assumes UNIQUE constraints on persons.email and services.name, and a primary key on service_dependencies(service_id, depends_on_service_id).
INSERT INTO persons (name, email, role, team)
VALUES ('Alice Chen', 'alice@example.com', 'Senior Engineer', 'Platform')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name, role = EXCLUDED.role, team = EXCLUDED.team;
WITH owner AS (SELECT id FROM persons WHERE email = 'alice@example.com')
INSERT INTO services (name, owner_id, language, repo_url)
VALUES ('AuthService', (SELECT id FROM owner), 'Go', 'https://git.example.com/auth/api')
ON CONFLICT (name) DO UPDATE SET owner_id = EXCLUDED.owner_id, language = EXCLUDED.language, repo_url = EXCLUDED.repo_url;
-- Link the two services only if both exist; re-running is a no-op.
INSERT INTO service_dependencies (service_id, depends_on_service_id)
SELECT s.id, d.id FROM services s JOIN services d ON d.name = 'UserService'
WHERE s.name = 'AuthService'
ON CONFLICT DO NOTHING;
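In an ingestion job these statements take values from source systems, so they should be parameterized rather than built from string literals. This TypeScript sketch produces { text, values } objects, the shape node-postgres accepts in client.query. The ServiceInput type and the placeholder-person convention (email reused as name, role set to 'placeholder: fill in', team left NULL) are assumptions, as are unique constraints on persons.email, services.name, and the service_dependencies pair.

```typescript
// Parameterized query in the { text, values } shape used by node-postgres.
interface Query {
  text: string;
  values: unknown[];
}

interface ServiceInput {
  name: string;
  ownerEmail: string;
  language: string;
  repoUrl: string;
  dependsOn: string[]; // names of services this one depends on
}

function buildServiceUpsert(s: ServiceInput): Query[] {
  const queries: Query[] = [
    {
      // Create a placeholder owner if unknown; an existing person is left untouched.
      text:
        "INSERT INTO persons (name, email, role) VALUES ($1, $2, 'placeholder: fill in') " +
        "ON CONFLICT (email) DO NOTHING",
      values: [s.ownerEmail, s.ownerEmail],
    },
    {
      text:
        "INSERT INTO services (name, owner_id, language, repo_url) " +
        "VALUES ($1, (SELECT id FROM persons WHERE email = $2), $3, $4) " +
        "ON CONFLICT (name) DO UPDATE SET owner_id = EXCLUDED.owner_id, " +
        "language = EXCLUDED.language, repo_url = EXCLUDED.repo_url",
      values: [s.name, s.ownerEmail, s.language, s.repoUrl],
    },
  ];
  for (const dep of s.dependsOn) {
    queries.push({
      // Only links services that already exist; re-running is a no-op.
      text:
        "INSERT INTO service_dependencies (service_id, depends_on_service_id) " +
        "SELECT s.id, d.id FROM services s JOIN services d ON d.name = $2 " +
        "WHERE s.name = $1 ON CONFLICT DO NOTHING",
      values: [s.name, dep],
    });
  }
  return queries;
}

// With node-postgres, inside BEGIN/COMMIT: for (const q of queries) await client.query(q);
const upserts = buildServiceUpsert({
  name: "AuthService",
  ownerEmail: "alice@example.com",
  language: "Go",
  repoUrl: "https://git.example.com/auth/api",
  dependsOn: ["UserService"],
});
console.log(upserts.length); // 3
```

Running the queries in order inside one transaction keeps a half-finished sync from leaving a service without its edges.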

This MVP is intentionally simple but immediately useful. You can scale later with a full graph DB and richer queries.

Step 8: Collaboration and adoption

Start with a small team champion: pick a couple of teams whose dependencies are high-impact and get their buy-in.
Encourage “documentation-as-code”: link every new architectural decision, incident, and runbook to the IKG entry.
Make it visible: host a lightweight dashboard in your internal portal and promote quick wins like onboarding a new engineer in under 15 minutes.

Onboarding tip:

Create a guided tour that starts from a common task (e.g., “deploy a feature to production”) and exposes the required services, owners, and runbooks in one flow.

Step 9: Maintenance and evolution

Automate updates: schedule nightly syncs from repos, incident trackers, and docs.
Review cadence: quarterly data quality checks; quarterly ADR refresh.
Expand gradually: add data lineage for data pipelines, security policies, and tenancy boundaries as needed.

Common pitfalls:

Overcomplicating the schema too early.
Treating the IKG as just a data sink; it must drive workflows.
Stale ownership: make sure owners are notified of changes and can claim or hand off responsibility.

Example starter roadmap

Week 1: Define entities, set up storage, implement ingestion for two core services, create basic UI.
Week 2: Add incidents and runbooks, ADRs, and a dependency view; publish onboarding guide.
Week 3: Integrate with CI/CD to reflect deployment environments; enable search across docs and code.
Week 4: Roll out governance policy and conduct first data quality audit.

Illustration (conceptual): envision a core services hub node connected to owners, modules, and runbooks; adjacent nodes show dependencies and incidents, with ADRs attached to guide future changes.
From here, tailor the MVP to your exact stack (e.g., PostgreSQL vs. SQLite, a graph DB vs. relational tables, React vs. Svelte). A Python ingestion layer with FastAPI and PostgreSQL and a Node.js/TypeScript stack with GraphQL are both reasonable starting points.

Rizwan Saleem | https://rizwansaleem.co