DEV Community

Rizwan Saleem


Adopting a branching model for data science experiments: a pragmatic guide to versioned notebooks and reproducible workflows

In data science, teams chase insights with notebooks, scripts, and pipelines. The fast feedback loop can turn chaotic quickly: experiments spawn many branches of code, data, and results. A reproducible branching strategy helps you track ideas, share findings, and revert to known-good baselines without drowning in merge conflicts or ambiguous results. This guide walks through a practical branching model tailored to data science workflows, with concrete commands, folder layouts, and tips you can apply today.

Overview and goals

  • Keep experiment sprawl from disrupting the main line of research.
  • Keep data, code, and results reproducible across machines and environments.
  • Separate exploratory work from production-ready code and datasets.
  • Make it easy to compare experiments and roll back when needed.
  • Integrate with CI/CD pipelines for automated checks on baseline experiments.

Key concepts:

  • Baseline branch: a stable reference containing the most recent publishable results or a known good state.
  • Feature/experiment branches: isolated work to test ideas, including code, configs, and references to data versions.
  • Data versioning: reference large data through pointers rather than duplicating files; store metadata and hashes.
  • Results provenance: track which code, data, and parameters produced a given result.

### Repository layout

Adapt this layout to your project, but keep the separation between code, data references, and results traceable.

  • project/
    • src/ # Python or R analysis code (modules and scripts)
    • notebooks/ # Jupyter notebooks; keep them as small as possible or convert to script-based notebooks
    • data/ # local cache for large data files; keep it out of version control (list it in .gitignore)
    • data-refs/ # small manifests pointing to data sources/versioned datasets
    • results/ # outputs; store summaries, plots, and artifacts
    • configs/ # experiment configs, hyperparameters, environment specs
    • dev-tools/ # scripts for experiments, runners, utilities
    • .gitignore
    • README.md
    • requirements.txt (or environment.yml)
    • workflow.md # notes about the branching strategy and processes

Notes:

  • Do not commit large data files. Use data versioning tooling or cloud storage with verifiable hashes.
  • Keep notebooks lightweight; prefer converting exploratory steps into scripts or modular functions for reuse.

### Core branching model

This model combines stability with flexible experimentation. It borrows concepts from Git flow and lightweight feature branches but adapts them to data-heavy workflows.

  • main: the production baseline. Contains the most recent, reproducible results and production-ready code.
  • dev: a staging line for ongoing work before it’s ready for main. Used for integrating multiple experiments and validating end-to-end pipelines.
  • baseline: a tag-like branch that marks a validated, reproducible state of a particular dataset and model configuration. Implement baselines either as tagged commits on main with an accompanying data-refs entry, or as a dedicated baseline branch per milestone.
  • experiments/NAME: short-lived branches for individual experiments, where NAME is a concise identifier such as etl-augment-v1 or model-tuning-2026-06.
  • hotfixes/BUG-#: quick fixes to main that require fast turnarounds.
  • data-patches/NAME: branches or commits that adjust data preprocessing steps, data-refs, or dataset configurations (not raw data).
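The naming scheme above is easy to enforce mechanically, for example from a pre-push hook or a CI job. A minimal sketch, assuming a "concise identifier" means lowercase letters, digits, dots, and hyphens, and reading BUG-# as a numeric ticket ID; both rules and the function name are hypothetical:

```python
import re

# Hypothetical rules for the branch types described above.
LONG_LIVED = {"main", "dev", "baseline"}
SHORT_LIVED = re.compile(
    r"(experiments|data-patches)/[a-z0-9][a-z0-9.-]*"  # NAME: lowercase identifier
    r"|hotfixes/BUG-\d+"                               # BUG-#: numeric ticket ID
)


def is_valid_branch(name: str) -> bool:
    """Return True if `name` follows the branching model's naming scheme."""
    return name in LONG_LIVED or SHORT_LIVED.fullmatch(name) is not None


if __name__ == "__main__":
    for branch in ["experiments/etl-augment-v1", "hotfixes/BUG-42", "my-branch"]:
        print(branch, "ok" if is_valid_branch(branch) else "rejected")
```

Baseline tags such as baseline/2026-06-02-idea-name are tags rather than branches, so a check like this would only be applied to branch names.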

Branch lifecycle:

  • Create an experiment branch from dev or baseline for an isolated idea.
  • Commit small, meaningful changes with descriptive messages.
  • After local validation, bring the branch up to date with dev by rebasing onto it or merging it in.
  • When an experiment proves useful and reproducible, merge its changes into dev, then into main after verification.
  • For a successful model run, record the exact data refs and environment, and create a baseline reference.

### Data versioning and reproducibility

Data management is usually the hardest part. These practices keep experiments trustworthy.

  • Data references: store a manifest in data-refs/ that maps logical data names to storage locations, sizes, and checksums.
    • Example data-refs/movies-dataset.yaml:
        name: movies-dataset
        location: s3://data-bucket/datasets/movies/2026-06
        sha256: abc123...
        size: 12.3G
  • Data provenance: log the data version, the preprocessing steps, and the code version used to produce results.
  • Environment as code: pin dependencies with exact versions (pip-compile, poetry lock, or conda env file).
  • Deterministic runs: seed random number generators, fix time-dependent shuffles, and document non-deterministic parts.
  • Lightweight data samples: for quick iteration, include a small synthetic dataset or downsampled data; store a sample-refs manifest.
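Checking a local copy against its manifest is what makes the sha256 field useful. A minimal sketch, assuming manifests stay flat `key: value` files like the movies-dataset example above (a YAML library would be the sturdier choice for anything nested); the function names are hypothetical:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def read_manifest(path: Path) -> dict[str, str]:
    """Parse a flat `key: value` manifest such as data-refs/movies-dataset.yaml.

    Splits on the first colon only, so values like s3:// URLs survive intact.
    """
    entries = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            key, _, value = stripped.partition(":")
            entries[key.strip()] = value.strip()
    return entries


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_local_copy(manifest_path: Path, local_file: Path) -> bool:
    """True if the local file's checksum matches the manifest's sha256 entry."""
    expected = read_manifest(manifest_path)["sha256"]
    return sha256_of(local_file) == expected.lower()
```

Chunked hashing keeps memory flat even for a multi-gigabyte file, though the full read still costs time; one option is to re-verify only when a file's size or modification time changes.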

Example: data-refs/mnist-sample.yaml

    name: mnist-sample
    location: s3://data-bucket/datasets/mnist/small
    sha256: 9f0b...
    size: 50MB

### Workflow: day-to-day with notebooks and scripts

1) Start from a baseline

  • Check out main and ensure you have a reproducible environment.
  • Pull the latest data-refs and environment specs.
  • Run the baseline notebook cells (or script) once to confirm the end-to-end pipeline still works.
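Before trusting a baseline run, it helps to confirm that the installed packages actually match the pins. This sketch assumes simple `name==version` lines (as written by pip freeze or pip-compile) and uses only the standard library's importlib.metadata; the function names are hypothetical, and extras and environment markers are not handled:

```python
from importlib import metadata


def parse_pins(lines) -> dict:
    """Map package name -> pinned version for simple `name==version` lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue  # skips ranges like pandas>=2.0 and --hash continuation lines
        name, _, version = line.partition("==")
        parts = version.split()  # pip-compile may append a trailing backslash
        if parts:
            pins[name.strip().lower()] = parts[0]
    return pins


def environment_drift(pins: dict) -> dict:
    """Return {package: (pinned, installed)} for each pin the environment does not match."""
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # pinned but not installed at all
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift
```

For example, `environment_drift(parse_pins(open("requirements.txt")))` returns an empty dict when the environment matches the lock; anything else is drift worth fixing before comparing results.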

2) Create an experiment branch

  • git checkout -b experiments/idea-name
  • Keep the scope focused: a single hypothesis, a single change in code or config.

3) Manage code and data references

  • Keep data manipulation in scripts or modules; notebooks should call these modules rather than contain all logic.
  • Add or update data-refs entries to reflect any new datasets or versions you rely on.
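Because notebooks in this model are thin shells over modules, their committed form should hold little beyond code and narrative. A minimal sketch that clears outputs and execution counts, assuming notebooks are standard nbformat v4 JSON; dedicated tools such as nbstripout cover more edge cases and can run as a git filter:

```python
import json
import sys
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from code cells (nbformat v4 layout)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path: Path) -> None:
    """Rewrite a .ipynb file in place with one-space indent and sorted keys, as Jupyter writes it."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    strip_outputs(notebook)
    text = json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    for arg in sys.argv[1:]:  # e.g. notebooks/*.ipynb expanded by the shell
        strip_file(Path(arg))
```

Running it (under a hypothetical name such as strip_outputs.py) on notebooks/ before committing keeps diffs limited to code and markdown, which also reduces notebook merge conflicts.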

4) Validate locally

  • Run a full, deterministic pipeline on a small sample first.
  • Capture key metrics and create a simple results summary (plots, tables, and a short narrative).
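A deterministic small-sample run has two parts: seeding random number generators, and choosing the sample itself reproducibly. The sketch below seeds Python's `random` module and selects records by hashing a stable record ID, so the same rows are chosen regardless of row order or machine; the salt string and function names are illustrative assumptions:

```python
import hashlib
import os
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG; add numpy/torch seeding here if the project uses them."""
    random.seed(seed)
    # Affects only subprocesses started after this call; the current
    # interpreter's string-hash randomization is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)


def in_sample(record_id: str, fraction: float, salt: str = "sample-v1") -> bool:
    """Keep roughly `fraction` of records, decided by a hash of a stable ID.

    hash() of strings is randomized per process, so SHA-256 is used instead;
    the same IDs land in the sample on every machine and run.
    """
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return position < fraction
```

Changing the salt (say, to sample-v2) draws a different but equally reproducible sample, which is worth recording in the sample-refs manifest.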

5) Document results in results/

  • Save artifacts with clear names, e.g., results/2026-06-02-experiment-idea-name/summary.json, plots/, and a README.md describing setup and outcomes.
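The summary.json in each results folder is a natural home for provenance. One way to populate it, sketched with hypothetical field and function names, is to capture the current git commit, the config path, the data-refs used, the Python version, and the metrics in a single record:

```python
from __future__ import annotations

import json
import platform
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_commit() -> str:
    """HEAD commit hash, or "unknown" when git or a repository is unavailable."""
    try:
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def write_summary(run_dir: Path, metrics: dict, config: str,
                  data_refs: list[str], commit: str | None = None) -> Path:
    """Write run_dir/summary.json linking metrics to code, config, data, and runtime."""
    record = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "git_commit": commit or current_commit(),
        "config": config,
        "data_refs": data_refs,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "metrics": metrics,
    }
    run_dir.mkdir(parents=True, exist_ok=True)
    out = run_dir / "summary.json"
    out.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return out
```

`git rev-parse HEAD` does not capture uncommitted edits, so a stricter version might also record whether `git status --porcelain` reports any changes.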

6) Review and merge

  • When the experiment is reproducible and results are clear, open a pull request to dev.
  • Have teammates review the code, data references, and results provenance.
  • After CI passes (see the automated checks under Environment and reproducibility), merge into dev. Once dev is stable, merge dev into main.
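Whether a result is reproducible can be checked automatically before merging: re-run the experiment in CI and compare its metrics with the recorded ones. Exact equality is often too strict, because floating-point results can vary across hardware and library builds, so this sketch compares within tolerances. The default tolerances, the script name, and the assumption that summary files carry a top-level `metrics` key are all hypothetical:

```python
import json
import math
import sys


def unreproduced_metrics(recorded: dict, rerun: dict,
                         rel_tol: float = 1e-3, abs_tol: float = 1e-6) -> list:
    """Names of recorded metrics that are missing from the re-run or differ beyond tolerance."""
    mismatches = []
    for name, expected in recorded.items():
        actual = rerun.get(name)
        if actual is None or not math.isclose(expected, actual,
                                              rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    # Hypothetical usage: python check_repro.py recorded/summary.json rerun/summary.json
    with open(sys.argv[1]) as f:
        recorded = json.load(f)["metrics"]
    with open(sys.argv[2]) as f:
        rerun = json.load(f)["metrics"]
    failed = unreproduced_metrics(recorded, rerun)
    if failed:
        print("not reproduced:", ", ".join(failed))
        sys.exit(1)  # non-zero exit fails the CI job
    print("all metrics reproduced")
```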

7) Create a baseline when warranted

  • If a particular experiment becomes the new standard, capture its state as a baseline:

    • git tag baseline/2026-06-02-idea-name
    • Update a baseline manifest describing the data-refs and environment used.

### Practical commands you’ll use
  • Create an experiment branch from dev:

    • git checkout dev
    • git pull --rebase
    • git checkout -b experiments/idea-name
  • Run a quick test script (example in Python):

    • python run_baseline.py --data mnist-sample --config configs/baseline.yaml
  • Stage and commit focused changes:

    • git add src/ notebooks/ configs/
    • git commit -m "Experiment: hyperparameter sweep for model X; updated baseline config"
  • Update data references:

    • Edit data-refs/mnist-sample.yaml to point to a new dataset version
    • Commit data-refs changes with a clear message
  • Rebase your experiment on latest dev:

    • git fetch origin
    • git rebase origin/dev
  • Create a summary of results:

    • mkdir -p results/2026-06-02-idea-name
    • cp metrics.json results/2026-06-02-idea-name/
    • echo "Experiment summary" > results/2026-06-02-idea-name/README.md
  • Merge back to dev after review:

    • git checkout dev
    • git merge --no-ff experiments/idea-name
    • git push origin dev
  • Create a baseline tag:

    • git tag baseline/2026-06-02-idea-name
    • git push origin baseline/2026-06-02-idea-name

### Environment and reproducibility
  • Use a single, shareable environment spec:

    • Python: poetry lock or pip-compile to pin versions
    • R: renv snapshots (packrat on older projects)
    • Conda: environment.yml with exact package versions
  • Containerize when possible:

    • Dockerfile or nix-shell to reproduce the exact runtime
    • Include a minimal, reproducible Docker command for others to run
  • Automate checks:

    • Linting for code quality
    • Static checks for notebooks (e.g., nbQA to run flake8 on notebooks)
    • Small unit tests or sanity checks on key functions
    • End-to-end test with a tiny sample dataset

### Notebooks: best practices
  • Keep notebooks as narrative shells that call modular code.

  • Limit the amount of raw data inside notebooks; load data via scripts that reference data-refs.

  • Clear outputs: reset cell outputs before committing, and avoid large outputs in the repo.

  • Use nbextensions or JupyterLab code folding to keep a clean view of experiments.

  • Version notebook artifacts:

    • Store a notebook skeleton in notebooks/ and generate executed notebooks during runs, with provenance recorded in results/.

### Collaboration and review
  • PRs should include:

    • A short summary of the hypothesis and approach
    • A reproducible runbook: steps to reproduce results, environment specs, data-refs
    • A provenance section listing code commits, data references, and parameter choices
  • Review checklist:

    • Is the data provenance complete and verifiable?
    • Are the results reproducible with the given environment and data refs?
    • Are there any data governance or privacy concerns?
    • Is the experiment scope clearly stated and contained?

### Common pitfalls and how to avoid them
  • Pitfall: Not pinning data references or environment versions.

    • Solution: Maintain a data-refs manifest and an environment lock; require CI to validate reproducibility on a clean environment.
  • Pitfall: Long-running experiments on the main branch.

    • Solution: Use dev and feature branches; never run exploratory code on main without explicit intent.
  • Pitfall: Merge conflicts in notebooks.

    • Solution: Convert notebook-focused explorations into modular scripts; use notebooks mainly for storytelling and quick validation.
  • Pitfall: Missing provenance for results.

    • Solution: Always record a summary.json in the run's results/ folder that references the code commit, data refs, config, and environment.

### Quick-start checklist
  • [ ] Define baseline and create a data-refs manifest for the current state.

  • [ ] Create an experiments/NAME branch from dev.

  • [ ] Implement a focused hypothesis with bounded changes.

  • [ ] Run a deterministic, small-scale test; record results in results/ and summarize in README.

  • [ ] Update data-refs as needed; pin environment versions.

  • [ ] Submit a PR to dev with a clear provenance section.

  • [ ] If validated, merge to dev, then to main and tag a new baseline.

### Example scenario: tuning a model with a controlled data subset

1) Baseline

  • main points to a baseline with dataset version v1 and config baseline.yaml.
  • Run: python train.py --config configs/baseline.yaml --data data-refs/mnist-sample.yaml
  • Save results: results/2026-06-02-baseline/summary.json
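The post does not show train.py itself. A hypothetical sketch of its command-line interface with argparse, matching the config and data arguments in the command above as `--config` and `--data` flags; the `--seed` and `--out` options are illustrative additions, not part of the original:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Train a model from a config and a data-refs manifest.")
    parser.add_argument("--config", type=Path, required=True,
                        help="experiment config, e.g. configs/baseline.yaml")
    parser.add_argument("--data", type=Path, required=True,
                        help="data-refs manifest, e.g. data-refs/mnist-sample.yaml")
    parser.add_argument("--seed", type=int, default=0,
                        help="random seed to record alongside the results")
    parser.add_argument("--out", type=Path, default=Path("results"),
                        help="directory that receives summary.json and plots")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    for path in (args.config, args.data):
        if not path.is_file():
            print(f"missing input: {path}")
            return 2
    # Training is project-specific: load the config and manifest here,
    # train, then write metrics and provenance under args.out.
    print(f"config={args.config} data={args.data} seed={args.seed}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Requiring explicit paths, rather than defaulting to whatever data sits on disk, keeps every run tied to a named config and data-refs manifest.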

2) Experiment branch

  • git checkout dev
  • git checkout -b experiments/model-tune-v2
  • Modify configs/tuning.yaml to adjust learning rate and regularization
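If the tuning explores several learning rates and regularization strengths rather than a single pair, expanding them into one config per run keeps each run small and individually reproducible. A sketch with hypothetical key names (`learning_rate`, `regularization`, `run_name`):

```python
from itertools import product


def tuning_grid(learning_rates, regularizations, base: dict) -> list:
    """One config dict per (learning rate, regularization) pair, each with a stable run name."""
    runs = []
    for lr, reg in product(learning_rates, regularizations):
        config = dict(base, learning_rate=lr, regularization=reg)  # base is not modified
        config["run_name"] = f"lr{lr:g}-reg{reg:g}"  # e.g. lr0.001-reg0.1
        runs.append(config)
    return runs


if __name__ == "__main__":
    import json
    for run in tuning_grid([1e-2, 1e-3], [0.0, 0.1], {"model": "X"}):
        print(json.dumps(run, sort_keys=True))
```

Each `run_name` can double as the folder name under results/, so runs from the same sweep stay side by side and comparable.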

3) Reproducibility

  • Update data-refs to point to mnist-sample v1.1
  • Lock environment to exact versions

4) Validation

  • Run training on small subset; log metrics; generate a concise plot
  • Commit changes with a message like: "Experiment: tune LR and reg for model X on mnist-sample v1.1"

5) Merge and baseline

  • PR to dev; after review, merge
  • If results improve, create a new baseline tag: baseline/2026-06-02-model-tune-v2
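Step 7 of the workflow calls for a baseline manifest describing the data-refs and environment behind a baseline tag, but the post does not show one. A hypothetical sketch that fingerprints the data-refs manifests and the environment lock file alongside the tag and commit; the function names and the output location are assumptions:

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Checksum of a small text file such as a data-refs manifest or lock file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def baseline_manifest(tag: str, commit: str, data_refs: list[Path], env_lock: Path) -> dict:
    """Tie a baseline tag to its commit and fingerprints of its data refs and environment."""
    return {
        "tag": tag,
        "commit": commit,
        "data_refs": {str(p): file_sha256(p) for p in sorted(data_refs)},
        "environment_lock": {str(env_lock): file_sha256(env_lock)},
    }


def save_manifest(manifest: dict, out_dir: Path) -> Path:
    """Write the manifest as JSON; slashes in the tag become nested folders."""
    out = out_dir / f"{manifest['tag']}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return out
```

Saving it under a folder such as baselines/ (an assumed location) lets a reviewer re-hash the referenced files later and confirm nothing changed after the tag was created.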


Rizwan Saleem | https://rizwansaleem.co
