A deep dive into using Neo4j, Beam Search, and Betweenness Centrality for intelligent team assembly
The Problem: The Bus Factor Crisis
Every software team lives with an invisible risk: the Bus Factor—the minimum number of people who, if they left tomorrow, would bring your project to its knees.
The research is sobering:
- 50% of open-source projects have Bus Factor ≤ 2 (Avelino et al., 2016)
- Developer turnover increases defects by 40-60% (Foucault et al., 2015)
Traditional HR systems track skills but miss structural dependencies. Someone might be a "communication bridge" between the frontend and infrastructure teams without appearing in any report. When they leave, two teams that used to collaborate seamlessly suddenly can't talk to each other.
I built SmartChimera to solve this: a graph-based system that detects organizational linchpins and forms resilient teams. Here's how.
Architecture Overview
SmartChimera is a full-stack application built with Neo4j, FastAPI, and React. The architecture follows a modular design where each component has a single responsibility:
The system runs as containerized services with Docker Compose:
# docker-compose.yml
services:
neo4j:
image: neo4j:5.14.0
ports:
- "7474:7474" # Browser UI
- "7687:7687" # Bolt protocol
backend:
build: ./backend
depends_on:
neo4j:
condition: service_healthy
command: uvicorn app.main:app --host 0.0.0.0 --port 8000
frontend:
build:
context: ./frontend
dockerfile: Dockerfile.dev
ports:
- "5173:5173"
Challenge 1: Detecting Organizational Linchpins
The Problem
Traditional HR systems track skills but miss structural dependencies. Someone might be a "communication bridge" between teams without appearing on any org chart. When they leave, two teams that used to collaborate seamlessly suddenly can't.
The Solution: Hybrid Risk Metric
We combine two complementary signals:
- Betweenness Centrality (BC) — Network topology: who bridges teams
- Project Weight (PW) — Workload concentration: who's overloaded
# linchpin_detector.py - REAL CODE
def compute_combined_risk_score(self) -> Dict[str, float]:
"""
Risk(v) = α · BC_normalized(v) + β · PW(v)/max(PW)
α = β = 0.5 for balanced detection
"""
network_bc = self.compute_betweenness_centrality() # Brandes via NetworkX
project_scores = self._compute_project_dependency_score()
final_scores = {}
all_ids = set(network_bc.keys()) | set(project_scores.keys())
for eid in all_ids:
net_score = network_bc.get(eid, 0.0)
proj_score = project_scores.get(eid, 0.0)
# Weighted unification: 50% Network, 50% Project Weight
final_scores[eid] = (net_score * 0.5) + (proj_score * 0.5)
return final_scores
The Betweenness Centrality computation uses the Brandes algorithm via NetworkX, which runs in O(VE) time—significantly better than the naive O(V³) approach:
def compute_betweenness_centrality(self) -> Dict[str, float]:
"""Compute Brandes BC combined with synthetic BC scores."""
import networkx as nx
G = nx.Graph()
with self.driver.session() as s:
for r in s.run("MATCH (e:Empleado) RETURN e.id as id"):
G.add_node(r['id'])
for r in s.run("MATCH (a:Empleado)-[:TRABAJO_CON]-(b:Empleado) "
"WHERE a.id < b.id RETURN a.id as s, b.id as d"):
G.add_edge(r['s'], r['d'])
brandes_bc = nx.betweenness_centrality(G, normalized=True)
# Combine with pre-computed synthetic BC (use max of both)
...
return self._bc_cache
Why This Matters
This approach catches TWO types of linchpins:
- Social Hubs: High BC, low projects — communication bottlenecks
- Workhorses: Low BC, high projects — overloaded specialists
Both are organizational risks, but they require different mitigation strategies.
Challenge 2: Beam Search with Multi-Objective Optimization
The Problem
Forming an optimal team from N candidates is NP-hard. If you have 100 candidates and need a team of 5, that's C(100,5) = 75+ million combinations to evaluate. Exhaustive search simply doesn't scale.
The Solution: Beam Search
We maintain the top-W partial solutions at each step, pruning aggressively. This gives us O(k × n × W) complexity instead of exponential—polynomial time for a real-world approximation.
# smart_team_formation.py - REAL CODE
BEAM_WIDTH = 10
# State: (team_list, team_ids, covered_skills, score)
beam = [([], set(), set(), 0.0)]
for step in range(k): # k = team size
candidates_pool = []
for (curr_team, curr_ids, curr_covered, curr_score) in beam:
for candidate in candidates:
if candidate['id'] in curr_ids:
continue
c_skills = get_skills(candidate)
# Multi-objective scoring
coverage_score = len(new_skills) * weights['skill_coverage']
depth_score = get_depth(candidate, required) * weights['skill_depth']
collab_score = get_collab_edges(driver, candidate['id'], curr_ids) * weights['collaboration']
redundancy_score = len(overlap_skills) * weights['redundancy']
bc_penalty = bc_score * weights['bc_penalty'] # Penalize linchpins!
total = coverage_score + depth_score + collab_score + redundancy_score - bc_penalty
candidates_pool.append((new_team, new_ids, new_covered, total))
# Prune to top-W
candidates_pool.sort(key=lambda x: x[3], reverse=True)
beam = candidates_pool[:BEAM_WIDTH]
The Result
Our benchmarks show:
- Beam width 10 achieves ~98% of optimal quality with significant speedup
- Response time: <500ms per recommendation on 150-node graphs
- Memory: Constant O(W × k) space regardless of candidate pool size
Challenge 3: Context-Aware Formation with Mission Profiles
The Problem
A team for "legacy maintenance" needs completely different traits than one for "R&D innovation". One-size-fits-all doesn't work in the real world.
The Solution: 9 Configurable Mission Profiles
Each profile adjusts the weight coefficients in our multi-objective function:
# mission_profiles.py - REAL CODE
MISSION_PROFILES = {
'mantenimiento': {
'name': 'Mantenimiento Crítico (Resilient)',
'description': 'Maximum stability. Penalizes risk and demands redundancy.',
'weights': {
'skill_coverage': 2.0,
'skill_depth': 1.0, # Stability > Brilliance
'collaboration': 2.0,
'redundancy': 5.0, # CRITICAL: Must have backups
'bc_penalty': 20.0 # VETO: No Linchpins allowed
}
},
'innovacion': {
'name': 'I+D / Deep Tech (Growth)',
'description': 'Prioritize technical geniuses. Accept Bus Factor risk.',
'weights': {
'skill_coverage': 1.0,
'skill_depth': 10.0, # CRITICAL: Only experts
'collaboration': 0.5,
'redundancy': 0.0,
'bc_penalty': -5.0 # BONUS: We WANT linchpins (they're the experts!)
}
},
'entrega_rapida': {
'name': 'Speed Squad (Agile)',
'description': 'Teams that already know each other. Maximize prior collaboration.',
'weights': {
'skill_coverage': 2.0,
'collaboration': 10.0, # CRITICAL: Must have worked together
'availability': 4.0, # Must be free NOW
'bc_penalty': 0.0
}
},
# ... 6 more profiles including:
# - legacy_rescue (SRE mode)
# - junior_training (Skill development)
# - crisis_response (Firefighting)
# - architecture_review (Seek linchpins for strategic decisions)
# - security_audit (Maximum paranoia and redundancy)
# - cloud_migration (Broad technology coverage)
}
Why This Matters
The same algorithm produces completely different teams based on strategic context:
- Maintenance mode: Stable, redundant, no single points of failure
- Innovation mode: Expert-heavy, accepts risk for maximum capability
- Speed Squad: Prioritizes teams with prior collaboration history
Notice how the bc_penalty weight can be negative: for architecture reviews, we deliberately seek linchpins because they hold institutional knowledge. Sometimes you want your best people on the job, risk be damned.
Lessons Learned
1. Validate Your Metrics Rigorously
Our initial Bus Factor metric was calculated as 1 - avg_BC. When the algorithm itself minimizes BC, this became a tautology: we were measuring success by the very thing we optimized!
We caught this through rigorous statistical validation with N=500 Monte Carlo simulations. The lesson: always have independent validation metrics that measure outcomes your algorithm doesn't directly optimize.
2. Graph Databases Enable New Questions
With Neo4j, queries like "who bridges the frontend and backend teams?" become a single Cypher query:
MATCH (a:Empleado)-[:PERTENECE]->(t1:Team {name: 'frontend'})
MATCH (b:Empleado)-[:PERTENECE]->(t2:Team {name: 'backend'})
MATCH path = (a)-[:TRABAJO_CON*..3]-(b)
WITH nodes(path) as bridge_candidates
UNWIND bridge_candidates as person
RETURN person.nombre, count(*) as bridge_frequency
ORDER BY bridge_frequency DESC
Try doing that with join-heavy SQL. Graph structures make relationship-centric queries trivial.
3. Heuristics Beat ML for Transparency
We could have trained a neural network to predict "good teams." But HR decisions require explainability. Managers need to understand why a team was recommended.
Beam Search with explicit weights gives us full auditability: "This candidate was selected because they add 2 new skills, have worked with 3 team members before, and have low Bus Factor risk." That's a conversation you can have with a VP. A neural network output isn't.
Results
| Metric | Value |
|---|---|
| Dataset | 150-node organizational graph |
| Response time | <500ms per recommendation |
| Validation | N=500 Monte Carlo simulations |
| Algorithm | Beam Search O(k × n × W) |
| Mission Profiles | 9 configurable strategies |
| Stack | Neo4j + FastAPI + React + TypeScript |
| Recognition | 🏆 2nd Place - UNSA Engineering Project Fair 2025 |
What's Next?
SmartChimera is available on GitHub. Future plans include:
- Reinforcement Learning for automatic weight optimization based on team outcomes
- Temporal graphs to model evolving collaboration patterns over time
- Privacy-preserving federated analysis for multi-organization deployment
If you're building HR tech with graph algorithms, working on organizational analytics, or just interested in applied graph theory—let's connect!
Santiago Palma — Universidad Nacional de San Agustín, Perú
Tags: #graphs #neo4j #python #fastapi #react #algorithms #opensource #hrtech
📚 Research References:
- Avelino, G. et al. (2016). "A Novel Approach for Estimating Truck Factors" - ICPC
- Foucault, M. et al. (2015). "Impact of Developer Turnover on Quality" - FSE
- Brandes, U. (2001). "A Faster Algorithm for Betweenness Centrality" - Journal of Mathematical Sociology





Top comments (0)