DEV Community

Abdelrahman Farag

DepCast: From Research Scripts to a Real Tool — Finishing a Dependency Intelligence Protocol

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

DepCast is a compatibility intelligence protocol for software package ecosystems. It answers a simple question: "Is it safe to upgrade this dependency right now?"

The core idea is that every failing CI build after a dependency upgrade is a signal — and these signals currently evaporate, unseen and unaggregated. DepCast captures them through a four-factor Compatibility Risk Score (CRS):

CRS(t) = 0.335·V(r) + 0.035·E(r) + 0.584·D(t) + 0.046·H(m)

Where V(r) is API surface volatility, E(r) is downstream exposure, D(t) is observed failure rate from early adopters, and H(m) is maintainer history. Each release gets rated SAFE, WAIT, or AVOID.
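To make the weighting concrete, here is a minimal sketch that evaluates the formula term by term. The weights are the ones published above and the inputs are those of the first demo command later in this post; the helper name `crs_terms` is hypothetical and not part of the DepCast API.

```python
# Weights copied from the CRS formula in this post.
WEIGHTS = {"V": 0.335, "E": 0.035, "D": 0.584, "H": 0.046}


def crs_terms(V, E, D, H):
    """Return each factor's weighted contribution to the score (illustrative helper)."""
    values = {"V": V, "E": E, "D": D, "H": H}
    return {name: WEIGHTS[name] * values[name] for name in WEIGHTS}


# Same inputs as `depcast score --V 0.8 --D 0.9 --E 0.2 --H 0.3`.
terms = crs_terms(V=0.8, E=0.2, D=0.9, H=0.3)
score = sum(terms.values())

print({name: round(value, 4) for name, value in terms.items()})
print(round(score, 3))  # 0.814, the score shown in the demo output
```

With these inputs the D(t) term alone contributes about 0.526 of the 0.814 total, which shows how strongly the observed failure rate dominates the score.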

The research behind it is real: 51 confirmed breaking npm releases analyzed with SIR epidemiological modeling, 40 non-breaking controls, and AUC-ROC validation on the combined 91-release dataset. The project started as work toward an arXiv submission and an eventual MSR 2027 paper.

Repo: github.com/ahafarag/depcast

Demo

Here's what DepCast looks like now as a working CLI tool:

$ depcast score --V 0.8 --D 0.9 --E 0.2 --H 0.3

  CRS Score:  0.814
  Rating:     AVOID
  Components: V(r)=0.800  E(r)=0.200  D(t)=0.900  H(m)=0.300

  → Pin to prior version and await patch.

$ depcast check chalk@5.0.0

  Package:    chalk@5.0.0
  CRS Score:  0.409
  Rating:     WAIT
  Components: V(r)=0.000  E(r)=0.605  D(t)=1.000  H(m)=0.030

$ depcast list

  Package                   Version      CRS  Rating
  ───────────────────────── ──────────── ──────  ──────
  webpack@4.0.0             4.0.0         0.892  AVOID
  react@16.0.0              16.0.0        0.847  AVOID
  chalk@5.0.0               5.0.0         0.409  WAIT
  lodash@4.0.0              4.0.0         0.156  SAFE
  ...

  Total: 91 releases

The test suite runs 29 tests in under a second:

$ pytest tests/ -v
========================= 29 passed in 0.26s =========================

The Comeback Story

Where It Was (The "Before")

DepCast started as a research project — a collection of Python scripts that I ran manually in sequence to produce data for a paper. The "product" was a CSV file and two matplotlib figures.

Here's what the repo looked like before the finish-up:

  • 16 commits and 8 standalone scripts in a scripts/ folder
  • No package structure — just python scripts/05_compute_crs_validation.py executed from the root
  • No tests of any kind
  • No CI/CD
  • No CLI — the only way to interact was to edit script parameters and re-run
  • No Docker — reproducibility depended on the reader getting the exact same Python environment
  • No releases or tags
  • The core CRS formula was buried inside a 500-line validation script

It worked for my purposes — generating figures and CSV outputs for a paper draft. But nobody else could realistically use it. The README was thorough (it described the methodology, the math, the research agenda), but the code was a researcher's notebook, not a tool.

What Changed (The "After")

I added 10 new files that transformed DepCast from scripts-in-a-folder to an installable Python package with a real CLI:

Package structure (depcast/)

  • __init__.py — clean public API
  • crs.py — core CRS module extracted from script 05, with input validation, a CRSResult dataclass, and proper docstrings
  • cli.py — four commands: score (compute CRS from features), check (look up a package@version), list (show all scored releases), pipeline (run the full analysis)

Test suite (tests/)

  • test_crs.py — 24 tests covering scoring math, rating thresholds, boundary conditions, input validation, custom weights, and default weight invariants
  • test_cli.py — 5 tests covering CLI commands via subprocess

CI/CD and packaging

  • .github/workflows/ci.yml — GitHub Actions running tests across Python 3.9–3.12 with flake8 linting, coverage reporting, and package build verification
  • pyproject.toml — PEP 621 metadata with depcast CLI entry point, dev dependencies, and test configuration
  • Dockerfile — reproducible execution environment
  • CHANGELOG.md — documents the v0.5 → v1.0.0 transition

The key architectural decision was extracting the CRS computation into a standalone module (depcast/crs.py). The original script 05 loaded CSV files, merged dataframes, normalized features, trained a logistic regression, plotted five-panel figures, and computed CRS scores — all in one 500-line function. The new compute_crs() function does exactly one thing: take four normalized feature values, apply weights, and return a result. That's what made the CLI and the test suite possible.
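A rough sketch of what such a single-purpose function could look like follows. The names `compute_crs` and `CRSResult` come from this post, but the fields, the signature, and especially the rating cut-offs (`SAFE_MAX`, `WAIT_MAX`) are assumptions made for illustration; the real module may differ.

```python
from dataclasses import dataclass

# Weights from the CRS formula earlier in this post.
DEFAULT_WEIGHTS = {"V": 0.335, "E": 0.035, "D": 0.584, "H": 0.046}

# Hypothetical cut-offs for illustration only; the post does not state the real ones.
SAFE_MAX = 0.3
WAIT_MAX = 0.7


@dataclass
class CRSResult:
    """Assumed result shape: score, rating, and the inputs that produced them."""
    score: float
    rating: str
    components: dict


def compute_crs(V, E, D, H, weights=None):
    """Take four normalized features, apply weights, and return a result."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    components = {"V": V, "E": E, "D": D, "H": H}

    # Input validation: every feature must already be normalized to [0, 1].
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    score = sum(weights[name] * components[name] for name in components)

    if score < SAFE_MAX:
        rating = "SAFE"
    elif score < WAIT_MAX:
        rating = "WAIT"
    else:
        rating = "AVOID"
    return CRSResult(score=round(score, 3), rating=rating, components=components)


result = compute_crs(V=0.8, E=0.2, D=0.9, H=0.3)
print(result.score, result.rating)
```

Because the function has no file I/O, plotting, or model training, both a CLI command and a unit test can call it directly with plain numbers.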

My Experience with GitHub Copilot

GitHub Copilot was most useful in three specific phases of the finish-up:

1. Test generation — the highest-leverage moment

Writing tests for a CRS module requires thinking through boundary conditions, weight invariants, and edge cases. Copilot was genuinely helpful here. After I wrote the first two test methods (test_safe_at_zero and test_safe_at_boundary), Copilot predicted the pattern and started suggesting the full boundary-condition matrix: the WAIT boundaries, the AVOID boundaries, the combined-feature cases. I accepted most suggestions and adjusted the expected values to match the actual weight math.

The TestDefaultWeights class was almost entirely Copilot-suggested — it recognized that checking weight invariants (sum to 1.0, D(t) is heaviest, all four features present) is a standard pattern for score-based systems. That's the kind of structural test I might have forgotten to write myself.
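The three invariants named above can be sketched as follows. Only the class name `TestDefaultWeights` and the invariants themselves come from this post; the `DEFAULT_WEIGHTS` dictionary and the method names are assumptions, and the real tests may be written differently.

```python
import math

# Assumed representation of the default weights, taken from the CRS formula.
DEFAULT_WEIGHTS = {"V": 0.335, "E": 0.035, "D": 0.584, "H": 0.046}


class TestDefaultWeights:
    def test_weights_sum_to_one(self):
        # Compare with a tolerance, since the weights are floats.
        assert math.isclose(sum(DEFAULT_WEIGHTS.values()), 1.0, abs_tol=1e-9)

    def test_failure_rate_is_heaviest(self):
        # D(t), the observed failure rate, should carry the largest weight.
        assert max(DEFAULT_WEIGHTS, key=DEFAULT_WEIGHTS.get) == "D"

    def test_all_four_features_present(self):
        assert set(DEFAULT_WEIGHTS) == {"V", "E", "D", "H"}
```

Tests like these guard against someone retuning one weight later and silently breaking the assumption that scores stay within [0, 1].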

2. CLI scaffolding — argparse boilerplate

The CLI in depcast/cli.py follows a standard subcommand pattern (argparse with subparsers). Copilot is very good at this — after I defined the first subcommand (score), it predicted the structure for check, list, and pipeline with the right argument types and help strings. The CSV-reading logic in cmd_check was mostly Copilot, and it got the rsplit("@", 1) parsing right on the first suggestion.
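For readers unfamiliar with the pattern, a minimal sketch of subcommands plus the package@version split is shown below. The four command names and the `rsplit("@", 1)` call come from this post; the function names `parse_spec` and `build_parser`, the help strings, and the choice to make every flag required are assumptions.

```python
import argparse


def parse_spec(spec):
    """Split 'name@version' on the last '@' (hypothetical helper)."""
    parts = spec.rsplit("@", 1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected package@version, got {spec!r}")
    return parts[0], parts[1]


def build_parser():
    parser = argparse.ArgumentParser(prog="depcast")
    sub = parser.add_subparsers(dest="command", required=True)

    score = sub.add_parser("score", help="compute CRS from four features")
    for flag in ("--V", "--E", "--D", "--H"):
        score.add_argument(flag, type=float, required=True)

    check = sub.add_parser("check", help="look up a package@version")
    check.add_argument("spec")

    sub.add_parser("list", help="show all scored releases")
    sub.add_parser("pipeline", help="run the full analysis")
    return parser


args = build_parser().parse_args(
    ["score", "--V", "0.8", "--E", "0.2", "--D", "0.9", "--H", "0.3"]
)
print(args.command, args.V, args.D)
print(parse_spec("chalk@5.0.0"))
print(parse_spec("@babel/core@7.0.0"))
```

Splitting on the last `@` rather than the first matters for npm, because scoped package names such as `@babel/core` begin with an `@` of their own.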

3. GitHub Actions workflow — CI matrix

The CI workflow is mostly boilerplate, and Copilot generated a working multi-version Python matrix on the first try. The one manual adjustment was the install step: Copilot initially suggested pip install -r requirements.txt, which I replaced with pip install -e ".[dev]" because the project declares its dev dependencies as an optional extra in pyproject.toml, something Copilot had no way of knowing.

Where Copilot didn't help much:

The actual CRS module design — deciding what to extract from the 500-line script, what the API should look like, how to handle the dataclass structure — was entirely manual thinking. Copilot can't make architectural decisions for you. It also suggested incorrect pytest assertions a few times (wrong expected values for the weight math), which I caught because I knew the formula. The lesson: Copilot accelerates the mechanical work of writing code, but you still need to understand what you're testing and why.


DepCast is open source under the MIT license. The research targets MSR 2027 and is pending arXiv cs.SE submission.
