Introduction
One of the biggest problems in software teams is not writing code. Code eventually gets written, refactored, tested, and deployed. The real challenge, most of the time, is this:
"Why was this decision made?"
When a developer joins a project, they can't understand the work just by looking at the repository. They can see the code, but not the story behind it. Why was a service split this way? Why is an interface designed so oddly? Why does a test specifically check that edge case? Why has a file turned into something everyone is afraid to touch? The answers to these questions usually lie not in the code itself, but in the team's history.
That history is scattered across different tools:
- Jira issues
- GitHub Pull Requests
- Review comments
- Commit messages
- Branch names
- Incident records
- Release notes
- Sometimes Slack/Teams conversations
That's why, for a new developer, the learning process usually goes like this:
Look at the code → Find something you don't understand → Search Jira → Search PR → Search Slack → Ask the old developer → Repeat
This is team memory loss. And this loss costs time, causes errors, and exhausts people.
Section 1: Questions That Team Memory Should Answer
When a well-structured team memory system is in place, it should be able to answer questions like:
- Who changed this file the most?
- Who last touched this file?
- Which commits resolved this issue?
- Which PR was this change discussed in?
- Why has this component changed so frequently in the last 90 days?
- Has this bug occurred before?
- Which issues and PRs should a new developer read to learn the auth module?
- Who is the most suitable reviewer for a new PR?
- Which files carry technical risk?
- Which component is too dependent on a single person?
What these questions have in common: the answers are not in a single record. The answers are hidden in relationships.
For example, to answer "who knows the auth module?" it's not enough to just count commits. You need to look at all of these together:
- People who committed to auth files
- People who reviewed auth PRs
- People who commented on auth issues
- People who fixed auth bugs
- People who have been active recently
- People who made changes with large churn
- Files with revert or incident history
So team memory is essentially a relationship problem.
Section 2: Why an LLM-Free Architecture?
LLMs are powerful, but it's not always right to put an LLM at the core of every problem. For systems like team memory, the main requirements are:
- Accuracy
- Auditability
- Reproducibility
- Low cost
- Long-term maintainability
- Permission and privacy control
- Evidence-based response generation
Let me also add a personal note here. I already write a series about AI-free life on Dev.to, and this time I wanted to write something on the software engineering side without AI as well. Honestly, the motivation behind this article is partly to push myself out of repetition and partly to give you some food for thought: you can build quite useful, technically clean, and maintainable systems without tying every problem to an LLM.
Let me be even more direct: we're a bit tired of it. Constant AI, constant agents, constant RAG, constant prompts. These are certainly valuable topics, but sometimes you just want to see solid, old-school engineering. This article was written with exactly that motivation.
Why LLM-Free for Team Memory?
An LLM-centric approach carries several risks.
2.1 Hallucination Risk
An LLM might behave as if there's a relationship between an issue and a commit when no such relationship exists in the real system. Pointing to the wrong PR, showing the wrong person as an expert, or misinterpreting a past bug fix causes serious time loss.
In team memory, answers should not be "educated guesses." Answers must come with evidence.
2.2 Auditability Problem
If a system says "this file is risky," it should be able to explain why:
src/auth/token_service.py changed 18 times in the last 90 days.
5 of those changes are linked to bug fixes.
4 different developers have touched the file.
A race condition was discussed in the last two PRs.
The test file was not updated at the same rate.
This kind of answer can be discussed, verified, and improved. An LLM saying "it looked risky to me" doesn't deliver the same quality.
2.3 Cost and Latency
LLMs are not needed for questions like:
- Which commit resolved this issue?
- Who last touched this file?
- Which files did this PR change?
These are pure data queries. A SQL query or a graph traversal answers them directly, without a model call.
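As a minimal sketch of such a data query, the snippet below answers "who last touched this file?" with Python's built-in `sqlite3` module. The table columns (`hash`, `author`, `committed_at`, `commit_hash`, `path`), the `last_toucher` helper, and the commit dates are assumptions made for illustration; the article does not spell out the real schema.

```python
import sqlite3

# Hypothetical minimal schema; column names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commits (hash TEXT PRIMARY KEY, author TEXT, committed_at TEXT);
CREATE TABLE commit_files (commit_hash TEXT, path TEXT);
""")

# Illustrative sample rows (the dates are made up for the example).
conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", [
    ("f00ba47", "Mehmet Turac", "2026-05-18"),
    ("b91c0de", "Ayşe Demir", "2026-05-20"),
])
conn.executemany("INSERT INTO commit_files VALUES (?, ?)", [
    ("f00ba47", "src/auth/token_service.py"),
    ("b91c0de", "src/auth/token_service.py"),
])


def last_toucher(conn, path):
    """Return (author, hash, date) of the newest commit touching path, or None."""
    return conn.execute(
        """
        SELECT c.author, c.hash, c.committed_at
        FROM commits c
        JOIN commit_files cf ON cf.commit_hash = c.hash
        WHERE cf.path = ?
        ORDER BY c.committed_at DESC, c.hash  -- hash breaks ties deterministically
        LIMIT 1
        """,
        (path,),
    ).fetchone()


print(last_toucher(conn, "src/auth/token_service.py"))
```

The explicit tie-breaker in `ORDER BY` is a small design choice that keeps the answer stable when two commits share a timestamp.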
2.4 Reproducibility
For team memory, the same question should always produce the same answer on the same data. LLM-based systems can give a different answer each time the same question is asked. That is unacceptable for auditing and debugging.
Section 3: Core Architecture
The foundation of the system is a memory store. This store holds the following:
- Git commit log
- Jira issues and comments
- GitHub PRs, reviews, and review comments
- File paths and components
- Developer identities
On top of this, agents query the memory store, score the results, and produce explainable outputs.
The basic flow:
Jira / GitHub / Git
↓
Ingestion Layer
↓
Memory Store (relational + graph)
↓
Agents (ContextAgent, ExpertiseAgent, RiskAgent, ...)
↓
Explainable Output (CLI / API / Bot)
The Core Principle: Everything Is a Relationship
PROJ-1247 issue
→ linked to PR #382
→ resolved by commits f00ba47 and b91c0de
→ changed src/auth/token_service.py
→ contributed by Mehmet Turac and Ayşe Demir
→ reviewed by Burak Kaya
With this information, a new developer no longer has to search randomly.
Section 4: Classic Multi-Agent Logic
I'm not using the word "agent" in the LLM agent sense here. In this architecture, an agent is:
A small service with a specific task, which queries memory, makes rule-based decisions, and produces evidence-backed output.
So what we call an agent is not a bot running prompts. It's a perfectly classical software component.
ContextAgent
Extracts context for an issue, PR, or file.
ExpertiseAgent
Calculates the most knowledgeable people for a file or component.
RiskAgent
Finds risky files based on signals like high churn, bug fixes, and contributor spread.
ReviewRoutingAgent
Suggests suitable reviewer candidates for a new PR.
OnboardingAgent
For a new developer on a given component, lists the most valuable issues and PRs to read.
HygieneAgent
Reports data quality problems in the memory store.
Each agent works with scoring and rule-based logic.
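To make "a small service that makes rule-based decisions and produces evidence-backed output" concrete, here is one possible shape for ReviewRoutingAgent. Everything in it is a hypothetical sketch: the `Suggestion` dataclass, the `suggest` method, the "authors do not review their own PR" rule, and the per-file scores for `session_manager.py` are assumptions, not the repo's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    developer: str
    score: float
    evidence: list[str] = field(default_factory=list)


class ReviewRoutingAgent:
    """Hypothetical sketch: ranks reviewers from per-file expertise scores."""

    def __init__(self, file_scores: dict[str, dict[str, float]]):
        # file_scores maps a file path to {developer name: expertise score}.
        self.file_scores = file_scores

    def suggest(self, changed_files: list[str], author: str, limit: int = 3) -> list[Suggestion]:
        totals: dict[str, Suggestion] = {}
        for path in changed_files:
            for dev, score in self.file_scores.get(path, {}).items():
                if dev == author:
                    continue  # assumed rule: authors do not review their own PR
                entry = totals.setdefault(dev, Suggestion(dev, 0.0))
                entry.score += score
                entry.evidence.append(f"{path}: expertise {score}")
        # Sort by score (descending), then by name so the ranking is deterministic.
        return sorted(totals.values(), key=lambda s: (-s.score, s.developer))[:limit]


agent = ReviewRoutingAgent({
    "src/auth/token_service.py": {"Ayşe Demir": 92.0, "Mehmet Turac": 80.5},
    "src/auth/session_manager.py": {"Burak Kaya": 40.0, "Mehmet Turac": 30.0},
})
changed = ["src/auth/token_service.py", "src/auth/session_manager.py"]
for s in agent.suggest(changed, author="Mehmet Turac"):
    print(s.developer, s.score, s.evidence)
```

Each suggestion carries the list of facts that produced its score, so a reader can check the ranking instead of trusting it.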
Section 5: Data Model
The minimum entity set for the first version is:
Developer
Repository
Issue
Commit
File
PullRequest
Review
IssueComment
Even with this model, a powerful memory system can be built.
Developer
A developer can appear with different identities across systems:
- Git author email
- GitHub username
- Jira account id
- Display name
These need to be linked to a single developer record.
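One simple way to do this linking is an alias table keyed by (system, identifier). The sketch below assumes such a table; `build_alias_index`, `resolve_developer`, the numeric developer id, and the lowercase normalization are illustrative choices rather than the repo's actual approach.

```python
from typing import Optional


def build_alias_index(aliases):
    """Map (system, normalized identifier) to a single developer id."""
    return {
        (system, identifier.strip().lower()): developer_id
        for system, identifier, developer_id in aliases
    }


# Hypothetical alias rows: one person seen under three identities.
ALIAS_INDEX = build_alias_index([
    ("git", "mehmet@example.com", 1),
    ("github", "mturac", 1),
    ("jira", "Mehmet Turac", 1),
])


def resolve_developer(system: str, identifier: str) -> Optional[int]:
    """Return the developer id for an identity, or None if it is unknown."""
    return ALIAS_INDEX.get((system, identifier.strip().lower()))


print(resolve_developer("github", "MTurac"))            # known alias, case-insensitive
print(resolve_developer("git", "unknown@example.com"))  # unknown identity
```

Returning `None` for unknown identities, rather than guessing by name similarity, leaves unmatched identities for a person to map by hand.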
Commit
Commits are among the most reliable events in the system. Hash, message, date, author, and changed files are stored.
File
Files should be stored not just as paths, but with component information.
For example:
src/auth/** → auth
src/payment/** → payment
infra/** → infra
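A mapping like this can be applied with an ordered list of glob rules. The sketch below uses `fnmatchcase` from the standard library; the `COMPONENT_RULES` list and the `component_for` helper are hypothetical names, and the first matching rule wins by assumption.

```python
from fnmatch import fnmatchcase

# Ordered (pattern, component) rules taken from the example mapping above.
COMPONENT_RULES = [
    ("src/auth/**", "auth"),
    ("src/payment/**", "payment"),
    ("infra/**", "infra"),
]


def component_for(path, rules=COMPONENT_RULES):
    """Return the component of the first matching rule, or None if none match."""
    for pattern, component in rules:
        # In fnmatch, "*" also matches "/", so "src/auth/**" covers nested paths.
        if fnmatchcase(path, pattern):
            return component
    return None


print(component_for("src/auth/token_service.py"))         # auth
print(component_for("tests/auth/test_token_refresh.py"))  # None
```

A `None` result is useful in its own right: it is exactly the "file missing component information" case that a hygiene report can surface.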
Issue
Issues give us business context. Summary, status, priority, type, component, and timestamps are stored.
PullRequest
PRs show us how a change was discussed within the team. Reviewers, changed files, linked issues, and commits are among the key fields.
Section 6: Schema
CREATE TABLE developers (...);
CREATE TABLE repositories (...);
CREATE TABLE issues (...);
CREATE TABLE files (...);
CREATE TABLE commits (...);
CREATE TABLE commit_files (...);
CREATE TABLE commit_issues (...);
CREATE TABLE pull_requests (...);
CREATE TABLE pr_commits (...);
CREATE TABLE pr_files (...);
CREATE TABLE pr_issues (...);
CREATE TABLE reviews (...);
CREATE TABLE issue_comments (...);
These tables represent graph thinking in a relational model. Join tables like commit_files, commit_issues, pr_files, pr_issues serve as relationships.
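The sketch below shows how two join tables can act as graph edges: it walks from an issue to its commits and on to the files those commits changed. The column lists are elided in the schema above, so every column name here is an assumption, as is the split of files between the two commits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the article's schema elides them.
conn.executescript("""
CREATE TABLE commits (hash TEXT PRIMARY KEY, author TEXT);
CREATE TABLE commit_issues (commit_hash TEXT, issue_key TEXT);
CREATE TABLE commit_files (commit_hash TEXT, path TEXT);
""")
conn.executemany("INSERT INTO commits VALUES (?, ?)", [
    ("f00ba47", "Mehmet Turac"),
    ("b91c0de", "Ayşe Demir"),
])
conn.executemany("INSERT INTO commit_issues VALUES (?, ?)", [
    ("f00ba47", "PROJ-1247"),
    ("b91c0de", "PROJ-1247"),
])
# Which commit touched which file is assumed for illustration.
conn.executemany("INSERT INTO commit_files VALUES (?, ?)", [
    ("f00ba47", "src/auth/token_service.py"),
    ("f00ba47", "src/auth/session_manager.py"),
    ("b91c0de", "tests/auth/test_token_refresh.py"),
])


def issue_footprint(conn, issue_key):
    """Traverse issue -> commits -> files through the join tables."""
    return conn.execute(
        """
        SELECT DISTINCT c.hash, c.author, cf.path
        FROM commit_issues ci
        JOIN commits c ON c.hash = ci.commit_hash
        JOIN commit_files cf ON cf.commit_hash = c.hash
        WHERE ci.issue_key = ?
        ORDER BY c.hash, cf.path
        """,
        (issue_key,),
    ).fetchall()


for row in issue_footprint(conn, "PROJ-1247"):
    print(row)
```

Each `JOIN` corresponds to one hop in the relationship chain, which is why a relational store is enough for a first version.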
Section 7: Agent Scores
Expertise Score
When finding an expert for a file, looking only at commit count can be misleading. So the score can be calculated as follows:
expertise_score =
commit_count * 10
+ review_count * 8
+ issue_comment_count * 2
+ churn / 20
+ recency_bonus
This score is not an absolute truth; it's a ranking signal. What matters is that the score is explainable.
Bad output:
Ayşe is an expert on this topic.
Good output:
Ayşe made 5 commits in this file recently, reviewed 3 PRs,
last activity was 2026-05-20, and total churn value is 320.
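A minimal sketch of the formula and of a "good output" line follows. The article does not define how `recency_bonus` is computed, so it is passed in as a plain number here; the `FileActivity` dataclass and the `explain` helper are hypothetical names, and the comment count of 0 is an assumption because the example sentence does not mention comments.

```python
from dataclasses import dataclass


@dataclass
class FileActivity:
    developer: str
    commit_count: int
    review_count: int
    issue_comment_count: int
    churn: int
    last_activity: str  # ISO date string


def expertise_score(a: FileActivity, recency_bonus: float = 0.0) -> float:
    """Weighted sum from the formula above; recency_bonus is supplied by the caller."""
    return (
        a.commit_count * 10
        + a.review_count * 8
        + a.issue_comment_count * 2
        + a.churn / 20
        + recency_bonus
    )


def explain(a: FileActivity, recency_bonus: float = 0.0) -> str:
    """Render the score together with the raw signals behind it."""
    score = expertise_score(a, recency_bonus)
    return (
        f"{a.developer}: score {score:.1f} "
        f"(commits: {a.commit_count}, reviews: {a.review_count}, "
        f"comments: {a.issue_comment_count}, churn: {a.churn}, "
        f"last activity: {a.last_activity})"
    )


ayse = FileActivity("Ayşe", 5, 3, 0, 320, "2026-05-20")
print(explain(ayse))
```

With no recency bonus, the sample works out to 50 + 24 + 0 + 16 = 90.0, and every term can be traced back to a countable record.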
Risk Score
Explainable signals are needed for risk too:
risk_score =
churn
+ bug_count * 100
+ contributor_count * 25
+ commit_count * 5
This is a simple starting point. In production, signals like test coverage, incidents, revert commits, deployment failures, and code ownership can be added.
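The same formula can return its terms separately, so the report shows where the score comes from. In this sketch, `risk_breakdown` is a hypothetical helper, and the churn value of 600 is invented; the other inputs reuse the token_service.py example from Section 2.2 (18 changes, 5 bug fixes, 4 developers).

```python
def risk_breakdown(churn, bug_count, contributor_count, commit_count):
    """Return each weighted term of the risk formula plus the total."""
    parts = {
        "churn": churn,
        "bug_fixes": bug_count * 100,
        "contributors": contributor_count * 25,
        "commits": commit_count * 5,
    }
    parts["total"] = sum(parts.values())  # summed before "total" is added
    return parts


# churn=600 is an assumed value for illustration only.
report = risk_breakdown(churn=600, bug_count=5, contributor_count=4, commit_count=18)
for name, value in report.items():
    print(f"{name}: {value}")
```

Here the total is 600 + 500 + 100 + 90 = 1290, and the breakdown makes it visible that bug fixes and churn dominate the result.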
Section 8: Example Usage Scenario
A new developer picks up issue PROJ-1247.
They run this from the CLI:
teammemory issue-context PROJ-1247
The system produces:
Issue: PROJ-1247
Summary: Token refresh race condition
Status: In Progress
Priority: High
Component: auth
Related PRs:
- #382 Fix token refresh race condition [merged]
Commits:
- f00ba47 Mehmet Turac — PROJ-1247 guard token refresh with per-session lock
- b91c0de Ayşe Demir — PROJ-1247 add regression test for refresh race
Changed files:
- src/auth/token_service.py
- src/auth/session_manager.py
- tests/auth/test_token_refresh.py
People in context:
- Mehmet Turac
- Ayşe Demir
- Burak Kaya
This output was generated without an LLM, because everything is based on relationships in the database.
Then the developer wants to see file experts:
teammemory file-experts src/auth/token_service.py
Output:
Experts for src/auth/token_service.py
1. Ayşe Demir — score 92.0
commits: 4, reviews: 2, comments: 1, churn: 430, last activity: 2026-05-20
2. Mehmet Turac — score 80.5
commits: 3, reviews: 1, comments: 2, churn: 390, last activity: 2026-05-18
This answer is not a guess either; it is a calculated signal.
Section 9: Data Hygiene
The success of this system depends on data quality. If commit messages don't contain issue keys, PR descriptions are empty, or issues aren't linked to the right components, the team memory stays incomplete.
That's why HygieneAgent is critically important.
What it reports:
- Commits that don't contain an issue key
- PRs not linked to an issue
- Empty PR descriptions
- Issues marked as Done but not linked to any commit
- Files missing component information
This report is not a blame tool — it's a tool for improving memory.
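The first check in that list can be sketched with a regular expression. The pattern below assumes issue keys look like `PROJ-1247` (an uppercase project key, a hyphen, a number); the helper names and the second sample commit (`abc1234`) are made up for illustration.

```python
import re

# Assumed key shape: uppercase project key, hyphen, digits (e.g. PROJ-1247).
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_issue_keys(message):
    """Return all issue keys mentioned in a commit message, in order."""
    return ISSUE_KEY.findall(message)


def commits_missing_issue_key(commits):
    """Return hashes of commits whose message contains no issue key."""
    return [commit_hash for commit_hash, message in commits
            if not ISSUE_KEY.search(message)]


commits = [
    ("f00ba47", "PROJ-1247 guard token refresh with per-session lock"),
    ("abc1234", "fix typo in readme"),  # hypothetical commit without a key
]
print(extract_issue_keys(commits[0][1]))
print(commits_missing_issue_key(commits))
```

A pattern this loose also matches strings such as `UTF-8`, so a stricter version would probably check candidates against the list of known project keys.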
Section 10: Moving to Production
The demo runs with SQLite. The recommended structure for production:
PostgreSQL = raw event store, audit, checkpoint, agent outputs
Neo4j/AGE = relationship analysis and traversal
FastAPI = controlled access layer
CLI/Bot = developer workflow integration
Things to pay attention to in production:
- Incremental sync
- Webhook + scheduled backfill
- Idempotent ingestion
- Rate limit management
- Identity resolution
- Permission control
- Audit log
- Token security
- Repository-based access
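Of the items above, idempotent ingestion is easy to sketch: if commits are keyed by hash, re-running the same sync should add nothing. The example uses SQLite's `INSERT OR IGNORE`; the table columns and the `ingest_commits` helper are assumptions, not the repo's actual ingestion code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the primary key on hash is what makes re-runs safe.
conn.execute("CREATE TABLE commits (hash TEXT PRIMARY KEY, author TEXT, message TEXT)")


def ingest_commits(conn, commits):
    """Insert commits keyed by hash and return how many rows were actually new."""
    before = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO commits (hash, author, message) VALUES (?, ?, ?)",
        commits,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
    return after - before


batch = [
    ("f00ba47", "Mehmet Turac", "PROJ-1247 guard token refresh with per-session lock"),
    ("b91c0de", "Ayşe Demir", "PROJ-1247 add regression test for refresh race"),
]
print(ingest_commits(conn, batch))  # first run: both rows are new
print(ingest_commits(conn, batch))  # second run: nothing is added
```

The same property lets a webhook and a scheduled backfill deliver the same event without creating duplicate rows.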
Identity resolution is especially important. If the same person appears as mehmet@example.com in Git, mturac on GitHub, and Mehmet Turac in Jira, all of these need to be linked to a single developer record.
Section 11: Strengths of This Approach
- Fully auditable.
- Inexpensive.
- Produces the same answer to the same query on the same data.
- No LLM latency.
- No model dependency.
- No prompt brittleness.
- Data security is easier to control.
- Small agents are testable.
- Can be incrementally added to legacy projects.
- Instills engineering discipline in the team.
Section 12: Weaknesses
- No natural language querying.
- If data quality is poor, results degrade.
- Informal decision sources like Slack are left out of the first version.
- Initial identity matching is tedious.
- Score design requires care.
- If the reason for a decision isn't written in a commit or PR, the system can't know it.
These limitations are not flaws. On the contrary, they are the system's honesty. It doesn't make things up when it doesn't know.
Section 13: Roadmap
Phase 1 — Local Demo
- SQLite schema
- Seed data
- CLI
- ContextAgent
- ExpertiseAgent
- RiskAgent
Phase 2 — Real Git Ingestion
- Pulling commits from a local repo
- Fetching file changes
- Extracting Jira keys from commit messages
Phase 3 — Jira/GitHub Import
- Jira JSON import
- GitHub PR JSON import
- Review records
- PR-issue relationships
Phase 4 — API
- FastAPI endpoints
- Simple dashboard
- GitHub Action integration
Phase 5 — Production Memory
- PostgreSQL event store
- Neo4j graph projection
- Webhook sync
- Permission control
- Audit log
Conclusion
The main idea of this article is simple:
First build the data model correctly for team memory. Don't rush to LLMs.
Jira, GitHub, and Git already give us an incredibly valuable event history. If we correctly link this history, we can produce reliable answers to questions like:
- Who changed what?
- Why did they change it?
- Which issue was it related to?
- Which PR was it discussed in?
- Which files are risky?
- Which developer has current context in which area?
- Where should a new person start?
In this system, answers don't come with "the model thought so." Answers come from commit, issue, PR, and review records.
Sometimes the best engineering is not using the most impressive technology; it's correctly scoping the problem and building a simpler, more reliable, and more explainable solution.
And this repo is trying to show exactly that:
No LLM.
No RAG.
No prompt.
No embedding.
There is data.
There are relationships.
There are rules.
There is evidence.
TeamMemory LLM’siz (TeamMemory Without an LLM)
TeamMemory LLM’siz is a fully deterministic team memory example for software teams, built on Jira + GitHub + Git commit logs.
This repo was prepared specifically to show the following:
Not every team memory problem requires an LLM, RAG, embeddings, prompts, or an agentic workflow. Sometimes the right data model, good ingestion, solid queries, and small deterministic agents give more reliable results.
This example has no LLM.
No RAG.
No vector database.
No prompt.
No model calls.
Instead, it has:
- SQLite event/memory store
- Git commit ingestion
- Jira/GitHub JSON import
- Deterministic agent classes
- CLI
- Optional FastAPI API
- Seed demo data
- Evidence-backed outputs
Quick start
cd teammemory-llmsiz
python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[api,dev]"
teammemory init-db --reset
teammemory seed
teammemory issue-context PROJ-1247
teammemory file-experts src/auth/token_service.py
teammemory component-risk auth
teammemory onboarding auth
teammemory review-suggest 382
teammemory hygiene
To run the API:
uvicorn teammemory.api:app --reload
Example endpoints:
curl http://127.0.0.1:8000/issues/PROJ-1247/context
curl "http://127.0.0.1:8000/files/experts?path=src/auth/token_service.py"
curl http://127.0.0.1:8000/components/auth/risk…