Nimai Charan

Mining Git Internals to Build a Year-in-Review Dashboard

Every year-end, products like Spotify Wrapped or YouTube Music Wrapped present personalized retrospectives using user data. That idea led to a simple question:

Can we generate a similar “year-in-review” for engineers—purely from Git, without relying on GitHub or Bitbucket APIs?

The result was IQnext Wrapped: an internal tool that analyzes raw Git repositories to generate a visual, gamified summary of engineering activity across multiple years.

This post documents the architecture, core logic, and Git internals behind the project.

High-Level Architecture

The system is intentionally simple:

Git Repositories
      ↓
Go-based Git Miner
      ↓
Aggregated JSON Files
      ↓
Next.js Visualization Layer


There is no database, no backend service, and no scheduled jobs.

The entire pipeline runs locally and produces static artifacts.

Component Breakdown

1. Golang Data Miner

The core of the system is a Go program that analyzes repositories directly via the .git directory.

Key characteristics:

  • Uses go-git/v5
  • Clones and analyzes 12 internal repositories
  • Traverses all branches, not just main
  • Deduplicates commits across branches
  • Computes aggregate statistics:
    • Total commits
    • Lines added / deleted
    • Activity by author
    • Activity by weekday and month

This miner is effectively the data warehouse of the system.

2. JSON as the Data Ledger

After analysis, all computed metrics are written to JSON files.

This design decision was deliberate:

  • Flat files simplify development
  • Zero operational overhead
  • Easy consumption by frontend
  • Immutable, auditable snapshots

The JSON files act as:

  • The database
  • The API
  • The single source of truth

3. Frontend: Next.js as the Storyteller

The UI is built using Next.js 14 + TypeScript and consumes the generated JSON directly.

Key features:

  • Year-over-year comparison
  • Team rankings and consistency scores
  • Achievement badges (e.g., Early Bird, Weekend Warrior)
  • Framer Motion for transitions and animations
  • Session-based authentication for internal access

The frontend does no computation—it purely visualizes precomputed data.

Core Insight: The .git Directory Is a Database

Most developers treat .git as an opaque implementation detail. For this project, it is the primary data source.

Understanding Git internals was critical.

Entry Points: .git/HEAD and .git/refs

HEAD is usually a symbolic reference to the current branch, and each branch reference under .git/refs ultimately resolves to a commit hash.

By iterating through references:

  • All branches become visible
  • All reachable commits can be collected

This ensures contributions are captured regardless of where they occurred.
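With go-git, this reference walk might be sketched as follows (a sketch, not the tool's actual code; it assumes an already-opened *git.Repository and skips symbolic refs like HEAD):

```go
package main

import (
	"fmt"

	git "github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
)

// listBranchHeads prints every local and remote-tracking branch
// together with the commit hash at its tip.
func listBranchHeads(repo *git.Repository) error {
	refs, err := repo.References()
	if err != nil {
		return err
	}
	return refs.ForEach(func(ref *plumbing.Reference) error {
		// Symbolic refs (e.g. HEAD -> refs/heads/main) carry no hash directly.
		if ref.Type() != plumbing.HashReference {
			return nil
		}
		if ref.Name().IsBranch() || ref.Name().IsRemote() {
			fmt.Println(ref.Name().Short(), "->", ref.Hash())
		}
		return nil
	})
}
```

Collecting the hashes printed here gives the starting points for a full history walk over every branch.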

Storage Layer: .git/objects and the Commit DAG

Git stores data as a content-addressable object graph.

  • Loose objects (.git/objects/xx/, where xx is the first two hex digits of the object hash)
  • Packfiles (.git/objects/pack/)
  • Objects are compressed using zlib

Using repo.CommitObjects(), go-git transparently:

  • Decompresses objects
  • Reconstructs commits, trees, and blobs
  • Exposes them as Go structs

This enables full traversal of the Directed Acyclic Graph (DAG) of commits.
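In code, that traversal is a single iterator. A hedged sketch (assumes an opened *git.Repository; not the tool's exact implementation):

```go
package main

import (
	git "github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing/object"
)

// countCommits walks every commit object in the repository's object
// store — loose objects and packfiles alike — and counts them.
func countCommits(repo *git.Repository) (int, error) {
	iter, err := repo.CommitObjects()
	if err != nil {
		return 0, err
	}
	n := 0
	err = iter.ForEach(func(c *object.Commit) error {
		// c.Hash, c.Author, c.Committer, and c.TreeHash arrive
		// already decompressed and decoded by go-git.
		n++
		return nil
	})
	return n, err
}
```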

Understanding Change: Parents, Trees, and Diffs

Each commit points to:

  • One or more parent commits
  • A tree representing a snapshot of the repository

To compute line statistics:

  • Compare a commit to its parent
  • Diff the parent tree against the current tree
  • Drill down to blob-level differences
  • Count additions and deletions

This is how metrics like “lines added” and “lines removed” are derived—without any external tooling.
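go-git exposes this parent-vs-tree diff directly through Commit.Stats(), which returns per-file addition and deletion counts. A sketch of summing them per commit (an illustrative helper, not the tool's actual code):

```go
package main

import (
	"github.com/go-git/go-git/v5/plumbing/object"
)

// addedDeleted sums line additions and deletions for one commit.
// go-git computes the diff against the commit's parent internally
// (root commits are diffed against the empty tree).
func addedDeleted(c *object.Commit) (added, deleted int, err error) {
	stats, err := c.Stats()
	if err != nil {
		return 0, 0, err
	}
	for _, fs := range stats {
		added += fs.Addition
		deleted += fs.Deletion
	}
	return added, deleted, nil
}
```

Summed across all unique commits, these two counters become the "lines added" and "lines removed" headline numbers.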

The Commit Analyzer

All collected data funnels into a central structure:

type RepositoryStats struct {
    Name              string           `json:"name"`
    TotalCommits      int              `json:"total_commits"`
    TotalLinesAdded   int              `json:"total_lines_added"`
    TotalLinesDeleted int              `json:"total_lines_deleted"`
    CommitsByAuthor   map[string]int   `json:"commits_by_author"`
    CommitsByMonth    map[string]int   `json:"commits_by_month"`
    CommitsByWeekday  map[string]int   `json:"commits_by_weekday"`
}


To avoid double-counting commits across branches, the analyzer:

  • Iterates through all reachable commits
  • Tracks seen commit hashes
  • Aggregates metrics only once per unique commit
// seenCommits tracks hashes already counted on an earlier branch
seenCommits := make(map[string]bool)
var allCommitHashes []plumbing.Hash

branchIter.ForEach(func(c *object.Commit) error {
    if !seenCommits[c.Hash.String()] {
        seenCommits[c.Hash.String()] = true
        allCommitHashes = append(allCommitHashes, c.Hash)
    }
    return nil
})


This guarantees a complete and accurate view of contribution history.

Design Constraint: One Core File

The entire backend logic lives in one file: main.go.

Everything else—frontend, animations, UI polish—is secondary.

This constraint forced:

  • Clear data flow
  • Minimal abstractions
  • Easier reasoning and debugging

Lessons Learned

  1. Git already contains everything
    • APIs are convenience layers, not requirements
  2. Flat files are underrated
    • JSON works extremely well for read-heavy analytics
  3. Branch-aware analytics matter
    • main-only analysis hides real work
  4. Understanding internals pays off
    • Treating Git as a database unlocks new possibilities

Conclusion

IQnext Wrapped started as a side project inspired by year-end review products, but became a deep dive into Git internals, data modeling, and disciplined architecture.

The project reinforced a simple idea:

Many problems do not need more infrastructure—only better understanding of the systems we already use.

The Go-based Git analyzer is robust, the frontend is expressive, and most importantly, the system remains simple, inspectable, and dependency-free.
