Adopting a branching model for data science experiments: a pragmatic guide to versioned notebooks and reproducible workflows
In data science, teams chase insights with notebooks, scripts, and pipelines. The quick feedback loop can turn chaotic fast: experiments spawn many branches of code, data, and results. A solid, reproducible branching strategy helps you track ideas, share results, and revert to known-good baselines without drowning in merge conflicts or ambiguous results. This guide walks through a practical branching model tailored to data science workflows, with concrete commands, folder layouts, and tips you can apply today.
### Overview and goals
- Prevent experiment sprawl from breaking main research progress.
- Keep data, code, and results reproducible across machines and environments.
- Separate exploratory work from production-ready code and datasets.
- Make it easy to compare experiments and roll back when needed.
- Integrate with CI/CD pipelines for automated checks on baseline experiments.
Key concepts:
- Baseline branch: a stable reference containing the most recent publishable results or a known good state.
- Feature/experiment branches: isolated work to test ideas, including code, configs, and references to data versions.
- Data/version control: treat large data with pointers rather than duplicating files; store metadata and hashes.
- Results provenance: track which code, data, and parameters produced a given result.

### Repository layout
Adapt this layout to your project, but keep the separation between code, data references, and results traceable.
- project/
  - src/ # Python, R, or notebooks with analysis code
  - notebooks/ # Jupyter notebooks; keep them as small as possible or convert to script-based notebooks
  - data/ # large data files should not be stored in version control
  - data-refs/ # small manifests pointing to data sources/versioned datasets
  - results/ # outputs; store summaries, plots, and artifacts
  - configs/ # experiment configs, hyperparameters, environment specs
  - dev-tools/ # scripts for experiments, runners, utilities
  - .gitignore
  - README.md
  - requirements.txt (or environment.yml)
  - workflow.md # notes about the branching strategy and processes
Notes:
- Do not commit large data files. Use data versioning tooling or cloud storage with verifiable hashes.
- Keep notebooks lightweight; prefer converting exploratory steps into scripts or modular functions for reuse.

### Core branching model
This model combines stability with flexible experimentation. It borrows concepts from Git flow and lightweight feature branches but adapts them to data-heavy workflows.
- main: the production baseline. Contains the most recent, reproducible results and production-ready code.
- dev: a staging line for ongoing work before it’s ready for main. Used for integrating multiple experiments and validating end-to-end pipelines.
- baseline: a validated, reproducible state of a particular dataset and model configuration. Treat a baseline either as a tag on a specific commit in main with an accompanying data-refs entry, or as a dedicated baseline branch per milestone.
- experiments/NAME: short-lived branches for individual experiments (NAME can be a concise identifier: e.g., etl-augment-v1, model-tuning-2026-06).
- hotfixes/BUG-#: quick fixes to main that require fast turnarounds.
- data-patches/NAME: branches or commits that adjust data preprocessing steps, data-refs, or dataset configurations (not raw data).
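A naming convention like this is easy to check mechanically, for instance in a pre-push hook or a CI job. The following is a minimal Python sketch, not part of any tool mentioned in this guide. The function name classify_branch and the characters allowed in NAME are assumptions, and the baseline/ prefix is included only to cover names such as baseline/2026-06-02-idea-name.

```python
import re

# Long-lived branches from the model above.
LONG_LIVED = {"main", "dev"}

# Prefixes taken from the branch list; the characters allowed in NAME are an
# assumption made for this sketch.
SHORT_LIVED = re.compile(
    r"(?P<kind>experiments|hotfixes|data-patches|baseline)/"
    r"(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
)


def classify_branch(branch: str) -> str:
    """Return the branch kind, or raise ValueError if the name is off-convention."""
    if branch in LONG_LIVED:
        return branch
    match = SHORT_LIVED.fullmatch(branch)
    if match is None:
        raise ValueError(f"branch name does not follow the convention: {branch!r}")
    return match.group("kind")


if __name__ == "__main__":
    for candidate in ["dev", "experiments/etl-augment-v1", "hotfixes/BUG-12", "scratch"]:
        try:
            print(candidate, "->", classify_branch(candidate))
        except ValueError as error:
            print("rejected:", error)
```

Rejecting off-convention names early keeps the branch list readable; the pattern can be loosened if your team uses other prefixes.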
Branch lifecycle:
- Create an experiment branch from dev or baseline for an isolated idea.
- Commit small, meaningful changes with descriptive messages.
- Rebase the experiment branch onto the latest dev to keep it current, and validate locally.
- When an experiment proves useful and reproducible, merge its changes into dev, then into main after verification.
- For a successful model run, record the exact data refs and environment, and create a baseline reference.

### Data versioning and reproducibility
Data management is the hardest part. Use these practices to keep experiments trustworthy.
- Data references: store a manifest in data-refs/ that maps logical data names to storage locations, sizes, and checksums.
- Example data-refs/movies-dataset.yaml:

      - name: movies-dataset
        location: s3://data-bucket/datasets/movies/2026-06
        sha256: abc123...
        size: 12.3G

- Data provenance: log the data version, the preprocessing steps, and the code version used to produce results.
- Environment as code: pin dependencies with exact versions (pip-compile, poetry lock, or conda env file).
- Deterministic runs: seed random number generators, fix time-dependent shuffles, and document non-deterministic parts.
- Lightweight data samples: for quick iteration, include a small synthetic dataset or downsampled data; store a sample-refs manifest.
Example: data-refs/mnist-sample.yaml

    - name: mnist-sample
      location: s3://data-bucket/datasets/mnist/small
      sha256: 9f0b...
      size: 50MB

### Workflow: day-to-day with notebooks and scripts
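A data-refs manifest only helps if its checksum is actually checked. Before starting from a baseline, a local copy of a dataset can be compared against its data-refs entry. The sketch below uses only the Python standard library and assumes the manifest has already been parsed into a dict (for example with a YAML library). The function names sha256_of_file and verify_entry are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_entry(entry: dict, local_path: Path) -> bool:
    """Compare a local file against the 'sha256' field of a parsed data-refs entry."""
    expected = entry["sha256"].strip().lower()
    return sha256_of_file(local_path) == expected
```

This assumes the manifest hash covers a single file. A dataset stored as a directory or an S3 prefix would need a per-file listing or some agreed way of combining hashes, which this sketch does not attempt.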
1) Start from a baseline
- Check out main and ensure you have a reproducible environment.
- Pull the latest data-refs and environment specs.
- Run a quick set of baseline cells (or the baseline script) to confirm the end-to-end pipeline still works.
2) Create an experiment branch
- git checkout -b experiments/idea-name
- Keep the scope focused: a single hypothesis, a single change in code or config.
3) Manage code and data references
- Keep data manipulation in scripts or modules; notebooks should call these modules rather than contain all logic.
- Add or update data-refs entries to reflect any new datasets or versions you rely on.
4) Validate locally
- Run a full, deterministic pipeline on a small sample first.
- Capture key metrics and create a simple results summary (plots, tables, and a short narrative).
5) Document results in results/
- Save artifacts with clear names, e.g., results/2026-06-02-experiment-idea-name/summary.json, plots/, and a README.md describing setup and outcomes.
6) Review and merge
- When the experiment is reproducible and results are clear, open a pull request to dev.
- Have teammates review the code, data references, and results provenance.
- After CI passes (see the automated checks under Environment and reproducibility), merge into dev. Then, once dev is stable, merge dev into main.
7) Create a baseline when warranted
- If a particular experiment becomes the new standard, capture its state as a baseline:
  - git tag baseline/2026-06-02-idea-name
  - Update a baseline manifest describing the data-refs and environment used.

### Practical commands you’ll use
- Create an experiment branch from dev:
  - git checkout dev
  - git pull --rebase
  - git checkout -b experiments/idea-name
- Run a quick test script (example in Python):
  - python run_baseline.py --data mnist-sample --config configs/baseline.yaml
- Stage and commit focused changes:
  - git add src/ notebooks/ configs/
  - git commit -m "Experiment: hyperparameter sweep for model X; updated baseline config"
- Update data references:
  - Edit data-refs/mnist-sample.yaml to point to a new dataset version
  - Commit data-refs changes with a clear message
- Rebase your experiment on latest dev:
  - git fetch origin
  - git rebase origin/dev
- Create a summary of results:
  - mkdir -p results/2026-06-02/idea-name
  - cp metrics.json results/2026-06-02/idea-name/
  - echo "Experiment summary" > results/2026-06-02/idea-name/README.md
- Merge back to dev after review:
  - git checkout dev
  - git merge --no-ff experiments/idea-name
  - git push origin dev
- Create a baseline tag:
  - git tag baseline/2026-06-02-idea-name
  - git push origin baseline/2026-06-02-idea-name

### Environment and reproducibility
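Pinned packages cover only part of reproducibility; random seeds and a record of the runtime matter too. The sketch below is a minimal illustration using only the standard library. draw_sample and describe_environment are hypothetical helpers, and libraries such as NumPy or PyTorch keep their own random state that would need separate seeding.

```python
import platform
import random
import sys


def draw_sample(items, k: int, seed: int) -> list:
    """Draw a reproducible sample using a private generator, not global state."""
    rng = random.Random(seed)
    return rng.sample(list(items), k)


def describe_environment() -> dict:
    """Collect basic runtime facts to store next to a run's results."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    first = draw_sample(range(1000), k=5, seed=42)
    second = draw_sample(range(1000), k=5, seed=42)
    print(first == second)  # same seed, same sample
    print(describe_environment())
```

Using a private random.Random instance rather than the module-level functions keeps one component's seeding from silently changing another's draws.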
- Use a single, shareable environment spec:
  - Python: poetry lock or pip-compile to pin versions
  - R: packrat or renv snapshots
  - Conda: environment.yml with exact package versions
- Containerize when possible:
  - Dockerfile or nix-shell to reproduce the exact runtime
  - Include a minimal, reproducible Docker command for others to run
- Automate checks:
  - Linting for code quality
  - Static checks for notebooks (nbqa, flake8)
  - Small unit tests or sanity checks on key functions
  - End-to-end test with a tiny sample dataset

### Notebooks: best practices
- Keep notebooks as narrative shells that call modular code.
- Limit the amount of raw data inside notebooks; load data via scripts that reference data-refs.
- Clear outputs: reset cell outputs before committing, and avoid large outputs in the repo.
- Use nbextensions or JupyterLab code folding to keep a clean view of experiments.
- Version-notebook artifacts:
  - Store a notebook skeleton in notebooks/ and generate executed notebooks during runs, with provenance recorded in results/.

### Collaboration and review
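One check worth automating before a PR is opened is that notebooks carry no stored outputs. A notebook file is JSON, so a small standard-library script can clear them. This is a sketch that assumes the nbformat 4 layout (a top-level cells list whose code cells have outputs and execution_count fields). The names has_outputs, strip_outputs, and strip_file are hypothetical, and a dedicated tool such as nbstripout is likely the better choice in practice.

```python
import copy
import json
from pathlib import Path


def has_outputs(notebook: dict) -> bool:
    """True if any code cell still carries outputs or an execution count."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            return True
    return False


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook with code-cell outputs cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path: Path) -> bool:
    """Rewrite a .ipynb file in place; return True if anything changed."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    if not has_outputs(notebook):
        return False
    cleaned = strip_outputs(notebook)
    path.write_text(json.dumps(cleaned, indent=1) + "\n", encoding="utf-8")
    return True
```

A CI job could call has_outputs on every notebook under notebooks/ and fail the build if any returns True, leaving the actual stripping to the author.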
- PRs should include:
  - A short summary of the hypothesis and approach
  - A reproducible runbook: steps to reproduce results, environment specs, data-refs
  - A provenance section listing code commits, data references, and parameter choices
- Review checklist:
  - Is the data provenance complete and verifiable?
  - Are the results reproducible with the given environment and data refs?
  - Are there any data governance or privacy concerns?
  - Is the experiment scope clearly stated and contained?

### Common pitfalls and how to avoid them
- Pitfall: Not pinning data references or environment versions.
  - Solution: Maintain a data-refs manifest and an environment lock; require CI to validate reproducibility on a clean environment.
- Pitfall: Long-running experiments on main branch.
  - Solution: Use dev and feature branches; never run exploratory code on main without explicit intent.
- Pitfall: Merge conflicts in notebooks.
  - Solution: Convert notebook-focused explorations into modular scripts; use notebooks mainly for storytelling and quick validation.
- Pitfall: Missing provenance for results.
  - Solution: Always record a results.json with a reference to code commit, data refs, config, and environment.

### Quick-start checklist
- [ ] Define baseline and create a data-refs manifest for the current state.
- [ ] Create an experiments/NAME branch from dev.
- [ ] Implement a focused hypothesis with bounded changes.
- [ ] Run a deterministic, small-scale test; record results in results/ and summarize in README.
- [ ] Update data-refs as needed; pin environment versions.
- [ ] Submit a PR to dev with a clear provenance section.
- [ ] If validated, merge to dev, then to main and tag a new baseline.
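The provenance the checklist asks for can be captured in a small JSON record written next to each run's artifacts. The following is one possible shape, not a standard: the field names and the build_record, write_record, and current_commit helpers are assumptions for illustration. The commit hash is read with git rev-parse HEAD and falls back to "unknown" when git is unavailable or the directory is not a repository.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def current_commit() -> str:
    """Return the current git commit hash, or 'unknown' if it cannot be read."""
    try:
        done = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return done.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_record(metrics: dict, data_refs: list, config_path: str,
                 commit: Optional[str] = None) -> dict:
    """Assemble a provenance record; all field names are illustrative."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "code_commit": commit or current_commit(),
        "data_refs": data_refs,
        "config": config_path,
        "python": sys.version.split()[0],
        "metrics": metrics,
    }


def write_record(record: dict, out_dir: Path) -> Path:
    """Write the record as results.json inside out_dir, creating it if needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "results.json"
    target.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return target
```

A training script would call build_record with its own metrics, the data-refs files it read, and the config path, then pass the result to write_record with the run's directory under results/.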
### Example scenario: tuning a model with a controlled data subset
1) Baseline
- main points to a baseline with dataset version v1 and config baseline.yaml.
- Run: python train.py --config configs/baseline.yaml --data data-refs/mnist-sample.yaml
- Save results: results/2026-06-02-baseline/summary.json
2) Experiment branch
- git checkout dev
- git checkout -b experiments/model-tune-v2
- Modify configs/tuning.yaml to adjust learning rate and regularization
3) Reproducibility
- Update data-refs to point to mnist-sample v1.1
- Lock environment to exact versions
4) Validation
- Run training on small subset; log metrics; generate a concise plot
- Commit changes with a message like: "Experiment: tune LR and reg for model X on mnist-sample v1.1"
5) Merge and baseline
- PR to dev; after review, merge
- If results improve, create a new baseline tag: baseline/2026-06-02-model-tune-v2
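Whether results improve in the scenario's last step is easier to judge with an explicit comparison than by eye. As a rough sketch, assuming each run's summary.json holds a flat mapping of metric names to numbers (an assumption; this guide does not fix the file's schema), two runs can be compared as follows. compare_metrics and its higher_is_better argument are hypothetical.

```python
def compare_metrics(baseline: dict, candidate: dict, higher_is_better=frozenset()) -> dict:
    """Compare metrics present in both runs.

    Metrics named in higher_is_better improve when they rise; all others
    (losses, error rates) are treated as improving when they fall.
    """
    report = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[name] - baseline[name]
        improved = delta > 0 if name in higher_is_better else delta < 0
        report[name] = {
            "baseline": baseline[name],
            "candidate": candidate[name],
            "delta": delta,
            "improved": improved,
        }
    return report
```

Metrics that appear in only one run are left out of the report, so a renamed metric shows up as missing rather than as a false improvement. The sketch also ignores run-to-run noise; a real decision to tag a new baseline would usually want repeated runs or a tolerance.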
Rizwan Saleem | https://rizwansaleem.co