Jen Chan

Posted on Mar 2, 2019 • Edited on Mar 4, 2019

Hack or maybe not: "Deleting" master when it gets too big

#todayilearned #git #discuss #productivity

Just documenting a pro-tip from a friend. Every developer's opinion is precious to me because I don't work among developers. Please don't go to Twitter to flame him.

I'm just making notes for my future self.

My initial thoughts: Perhaps the trick here is the difference between pull and fetch. Pull fetches new commits from the remote and merges them into your current branch, and with a huge repo that's risky...?

Update, for my own education and edification:

@joelnet pointed out:

"For those of you that are confused, master is not deleted on the remote repo. It is only deleted on the local repo. The way git fetch works, you always have a local copy. ...once you do `git checkout -b my-new-branch origin/master`, you'll have an identical copy of `origin/master`[.] it'll just be called `my-new-branch` locally."

@yucer added:

"He means not checking out the master branch locally, just to use it as base to checkout your branch. You can checkout your new branch directly from the remote ("origin" in this case). The difference is that, when you checkout master locally then git build[s] a working tree for it. ...to understand it correctly, you need to keep in mind that you can't work in all of the branches. ...git stores the changes differentially, when you checkout a branch it needs to calculate the complete state of the files and make the needed changes to [your] local files to reach those changes."

Conrad's school of thought seems to resonate with something I read a long while back on Mark Longair's blog "GIT: FETCH AND MERGE, DON’T PULL":

...by both fetching and merging in one [git pull] command, your working directory is updated without giving you a chance to examine the changes you’ve just brought into your repository. The safe ways to change remote-tracking branches are with git fetch or as a side-effect of git-push; you can’t work on remote-tracking branches directly. In contrast, you can always switch to local branches and create new commits to move the tip of the branch forward. So what you mostly do with remote-tracking [local] branches is one of the following:

1. Update them with git fetch
2. Merge from them into your current branch
3. Create new local branches based on them
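A minimal sandbox sketch of that fetch, examine, merge loop, assuming a POSIX shell and Git 2.28+ (for `git init -b`). The temp-directory repos, the `teammate` clone and the file `app.txt` are made up so the script can run without touching a real project:

```shell
#!/bin/sh
# Sandbox sketch: fetch, examine, then merge. All repos live in a temp dir.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# A hypothetical teammate pushes v2 from their own clone.
git clone -q "$tmp/remote.git" "$tmp/teammate"
echo v2 > "$tmp/teammate/app.txt"
git -C "$tmp/teammate" commit -qam "v2"
git -C "$tmp/teammate" push -q

# 1. Update the remote-tracking branch; local master and the working tree stay put.
git fetch -q origin
# 2. Examine what arrived before it reaches master.
git log --oneline master..origin/master
git diff --stat master origin/master
# 3. Merge deliberately; --ff-only refuses instead of creating a merge commit.
git merge -q --ff-only origin/master
```

Using `--ff-only` is a choice of this sketch, not part of Longair's advice: it makes the merge step stop if master has diverged rather than silently recording a merge commit.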

I have yet to try working this way, but I can see git repos being updated like this (fetch first, then merge) in very large projects.

@codemouse92 prefers to maintain an easy-to-access, canonical copy of what worked before.

You [already] pull the entire thing down the first time with `git fetch`, and then the incremental changes later with a basic `git pull`. ...`git pull` uses the same bandwidth as `git fetch`, you'll still [be] saving yourself a step by not deleting master off your local copy every time. ...Don't delete `master`, leave it alone.

Oldest comments (29)

Jason C. McDonald • Mar 2 '19

Ohhhhhhhhhhhhhhhhh, no, please don't. master should be your canonical "we checked that this code works". If deleting master seems like a "good idea," the team workflow is seriously borked, and you need to get outside help to overhaul it.

Jen Chan • Mar 2 '19

Thanks for giving me perspective. I always enjoy these threads 'cause I'm never 100% sure of the approach, but my lack of experience makes me real malleable.

Sal Rahman • Mar 3 '19

The author did not actually delete master on the remote repo, if that's what you're implying.

Jason C. McDonald • Mar 3 '19

While I wasn't entirely certain which master he referred to, my concern stands in either case. See my replies to the other comment pointing out that he meant "delete local master".

Sal Rahman • Mar 3 '19 • Edited

If deleting master seems like a "good idea," the team workflow is seriously borked [...].

Like how?

JavaScript Joel • Mar 2 '19

For those of you that are confused, master is not deleted on the remote repo. It is only deleted on the local repo.

If all your merging happens on the remote repo, you don't need master locally. You can fetch and then create your feature branch from origin/master.

I'm not sure of the benefits of this because I don't do it myself. You still need to update your local repo with a fetch, so I am doubtful of the savings.

Not endorsing, just offering an explanation.

Comment deleted

JavaScript Joel • Mar 3 '19

I often need to be able to refer back the canonical "this worked" version, and the easiest and most reliable way to do that is to keep a copy locally.

The way git fetch works, you always have a local copy.

When you do this...

git fetch
git checkout -b my-new-branch origin/master

... Your branch my-new-branch is identical to origin/master.

So at any time you can always get that "this worked" version.
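Here is a sandbox sketch of that flow with the local master actually deleted, assuming a POSIX shell and Git 2.28+. The scratch repos, the teammate clone and `app.txt` are hypothetical:

```shell
#!/bin/sh
# Sandbox sketch: no local master, branch straight off origin/master.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# Delete the local master; Git won't delete the checked-out branch, so detach first.
git checkout -q --detach
git branch -q -D master

# A hypothetical teammate pushes v2 in the meantime.
git clone -q "$tmp/remote.git" "$tmp/teammate"
echo v2 > "$tmp/teammate/app.txt"
git -C "$tmp/teammate" commit -qam "v2"
git -C "$tmp/teammate" push -q

# The two commands from the comment above.
git fetch -q origin
git checkout -q -b my-new-branch origin/master
git log --oneline -1
```

Because the new branch starts from a remote-tracking branch, Git also sets it to track `origin/master` by default, which `git branch -vv` will show.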

Comment deleted

JavaScript Joel • Mar 3 '19

No point wasting extra bandwidth to pull the entire thing down from scratch each time.

The whole thing is pulled down during git fetch.

Otherwise there is no way you could create a new branch off of it.

This can be confirmed by disconnecting from the internet during the git checkout stage, or by running git branch -a to see all available branches.

There is no avoiding pulling the whole master branch.
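To try that offline check without unplugging anything, this sketch (assuming a POSIX shell and Git 2.28+; all paths are throwaway) fetches, then moves the bare "remote" out of reach before branching:

```shell
#!/bin/sh
# Sandbox sketch: after git fetch, branching off origin/master needs no network.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# A hypothetical teammate pushes v2.
git clone -q "$tmp/remote.git" "$tmp/teammate"
echo v2 > "$tmp/teammate/app.txt"
git -C "$tmp/teammate" commit -qam "v2"
git -C "$tmp/teammate" push -q

# The fetch is the only step that talks to the remote.
git fetch -q origin

# "Go offline": move the remote out of reach and show that fetching now fails.
mv "$tmp/remote.git" "$tmp/unreachable.git"
if git fetch -q origin 2>/dev/null; then
  echo "remote is still reachable"
else
  echo "remote is unreachable"
fi

# Everything needed for the branch is already local.
git branch -a
git checkout -q -b offline-branch origin/master
```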

Comment deleted

JavaScript Joel • Mar 4 '19

You pull the entire thing down the first time with git fetch, and then the incremental changes later with a basic git pull.

git pull and git fetch will use the same bandwidth. The difference between the two is git pull will also perform a merge. But you could do a git fetch and then a merge to achieve the same results of a pull.
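A sandbox sketch of that equivalence, assuming a POSIX shell and Git 2.28+. Two identical clones (hypothetical `a` and `b`) receive the same upstream commit, one via `git pull` and the other via `git fetch` plus `git merge`; both use `--ff-only` so recent Git versions don't stop to ask how to reconcile divergent branches:

```shell
#!/bin/sh
# Sandbox sketch: git pull == git fetch + git merge, shown on two clones.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/a"
git clone -q "$tmp/remote.git" "$tmp/b"

# A hypothetical teammate pushes v2.
git clone -q "$tmp/remote.git" "$tmp/teammate"
echo v2 > "$tmp/teammate/app.txt"
git -C "$tmp/teammate" commit -qam "v2"
git -C "$tmp/teammate" push -q

# Clone a: one command.
git -C "$tmp/a" pull -q --ff-only
# Clone b: the same result in two steps.
git -C "$tmp/b" fetch -q origin
git -C "$tmp/b" merge -q --ff-only origin/master

git -C "$tmp/a" log --oneline -1
git -C "$tmp/b" log --oneline -1
```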

you'll still [be] saving yourself a step by not deleting master off your local copy every time.

The method laid out by OP would only require deleting the local version of master one time, not every time.

But because both pull and fetch get all the files, there isn't much saved by performing the process described in the original tweets.

Comment deleted

JavaScript Joel • Mar 4 '19 • Edited

don't create extra work for yourself by deleting it (yes, locally) when you're only going to need it again.

The main point of OP's post was that you can delete the master branch locally because using the process laid out above, you would never need a local master branch again.

There is no extra work. It is actually less work. You would only need to run 2 commands instead of 3, and with any pull there is the possibility of merge conflicts.

normal method

git checkout master
git pull # possible merge conflicts here
git checkout -b my-new-branch

OP's method

git fetch
git checkout -b my-new-branch origin/master
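The merge-conflict risk in the normal method only shows up when local master has commits the remote lacks. This sketch stages exactly that (a hypothetical stray commit on local master that edits the same line as a teammate's push), assuming a POSIX shell and Git 2.28+, and then runs both methods:

```shell
#!/bin/sh
# Sandbox sketch: "normal method" vs "OP's method" when local master has diverged.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# Hypothetical slip: a commit made directly on local master...
echo "local edit" > app.txt
git commit -qam "local edit"
# ...while a teammate pushes a different change to the same line.
git clone -q "$tmp/remote.git" "$tmp/teammate"
echo "remote edit" > "$tmp/teammate/app.txt"
git -C "$tmp/teammate" commit -qam "remote edit"
git -C "$tmp/teammate" push -q

# Normal method: checkout master, pull (which merges), then branch.
conflicted=no
git checkout -q master
if ! git pull -q --no-rebase origin master; then
  conflicted=yes
  echo "normal method: the pull stopped on a merge conflict"
  git merge --abort
fi

# OP's method: fetch only moves origin/master; no merge happens.
git fetch -q origin
git checkout -q -b my-new-branch origin/master
```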

Jen Chan • Mar 2 '19 • Edited

Ooh thanks for identifying that only the local is deleted.

So this was my thought process: I thought they were pulling from a forked repo. I also thought they meant check out master as my-new-branch, delete master locally AND remotely, and push my-new-branch as the authoritative new master.

It seems weird not to keep previous work around for future reference. I suppose they wanted to reduce redundancy of branches or repo files at that point. My mind likes to make up stories.

JavaScript Joel • Mar 3 '19

The work is always there in the .git folder. You just don't have the "branch".

But once you do git checkout -b my-new-branch origin/master, you'll have an identical copy of origin/master; it'll just be called my-new-branch locally.

So when you build your feature with it, it's built with the latest files.

Ben Sinclair • Mar 2 '19

I don't understand why this would do anything at all. my-branch-name is still going to have all the history of master in it, and now things are just that bit more confusing for new people starting on the project. And it doesn't delete master either locally or remotely. Are you sure this isn't a joke?
If you want to clean stuff up you can always squash some commits and run git gc, I guess, but unless your repo contains some seriously bonkers history it's probably not going to be an issue.

Jen Chan • Mar 2 '19

I assume they decide to keep all the history up til that point... or methodically rewrite it

I think they meant to make a copy/alternate branch, push that, and deploy that going forward?

yucer • Mar 3 '19

No. He means not checking out the master branch locally, just to use it as base to checkout your branch.

You can checkout your new branch directly from the remote ("origin" in this case).

The difference is that, when you checkout master locally, git builds a working tree for it.

In order to understand it correctly, you need to keep in mind that you can't work in all of the branches at once: only one branch is checked out in your working tree at a time.

Given that git stores the changes differentially, when you checkout a branch it needs to calculate the complete state of the files and make the needed changes to your local files to reach that state.

I guess git calculates the commits it needs to revert from your branch in order to reach the merge base (the most recent commit both branches share) and then starts to apply the ones from the other branch.

That would trigger a lot of modifications to the local files and may take a long time, especially if the files are big or there are a lot of them in master.

That's my guess: the problem is not so much the history as the number of files.
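For what it's worth, Git records each commit as a full snapshot of its tree; on disk, packfiles delta-compress objects, which may be the "differential" storage described above. A branch switch compares the current tree with the target tree and rewrites only the paths that differ; it does not walk back to a merge base. This sandbox sketch (POSIX shell, Git 2.28+, made-up files) lists the paths a switch between two branches would have to touch:

```shell
#!/bin/sh
# Sandbox sketch: which paths would a branch switch touch?
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q -b master "$tmp/repo"   # -b needs Git 2.28+
cd "$tmp/repo"

# Hypothetical project: 200 small files on master.
for i in $(seq 1 200); do
  echo "file $i" > "file$i.txt"
done
git add .
git commit -qm "200 files"

# A feature branch that edits one file and adds another.
git checkout -q -b feature
echo "changed" > file1.txt
echo "new" > extra.txt
git add .
git commit -qm "feature work"
git checkout -q master

# Only paths that differ between the two trees are written on a switch.
git diff --name-status master feature
changed=$(git diff --name-only master feature | wc -l | tr -d ' ')
total=$(git ls-files | wc -l | tr -d ' ')
echo "$changed paths differ; master tracks $total"
git checkout -q feature
```

On a real clone, `git diff --stat HEAD origin/master` gives a similar preview before you switch.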

Ben Sinclair • Mar 3 '19

Oh, gotcha.

yucer • Mar 3 '19

If the problem is really with a big history... I guess it can be solved by cloning the repo with a specific depth. The parameter is --depth. For example:

git clone --depth=100 <repository-url>
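A sandbox sketch of a shallow clone, assuming a POSIX shell and Git 2.28+; the five-commit source repo is made up. Note the `file://` URL: for a plain local path, Git ignores `--depth` and prints a warning.

```shell
#!/bin/sh
# Sandbox sketch: a shallow clone keeps only the most recent commits.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical source repo with five commits (git init -b needs Git 2.28+).
git init -q -b master "$tmp/full"
for i in $(seq 1 5); do
  echo "v$i" > "$tmp/full/app.txt"
  git -C "$tmp/full" add app.txt
  git -C "$tmp/full" commit -qm "v$i"
done

# file:// makes Git honor --depth for a local source.
git clone -q --depth 1 "file://$tmp/full" "$tmp/shallow"

git -C "$tmp/full" rev-list --count HEAD
git -C "$tmp/shallow" rev-list --count HEAD
git -C "$tmp/shallow" rev-parse --is-shallow-repository
```

If more history is needed later, `git fetch --deepen=<n>` or `git fetch --unshallow` can extend a shallow clone.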

Sal Rahman • Mar 3 '19

The only benefit that I see here is that I don't end up accidentally pushing to master, which has happened to many projects that I have worked on in the past.
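A sandbox sketch of why that works, assuming a POSIX shell and Git 2.28+ (repo paths and branch names are hypothetical): with no local `master`, a habitual `git push origin master` has nothing to push and fails. It only guards against that one slip; `git push origin HEAD:master` would still go through.

```shell
#!/bin/sh
# Sandbox sketch: without a local master, "git push origin master" has no source.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# Work OP's way: feature branch off origin/master, local master deleted.
git checkout -q -b my-feature origin/master
git branch -q -D master
echo "work in progress" > app.txt
git commit -qam "wip"

# The habitual slip: Git finds no local ref named master and refuses.
if git push -q origin master 2>/dev/null; then
  pushed=yes
else
  pushed=no
fi
echo "pushed to master: $pushed"
```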

JavaScript Joel • Mar 4 '19 • Edited

One benefit is that you would not need to perform a merge. Any time you do git pull, you run the risk of having merge conflicts and having to merge.

New Branch with master

git checkout master
git pull # Possible Merge Conflicts!
git checkout -b my-new-branch

New Branch without master

git fetch
git checkout -b my-new-branch origin/master

The second method will never run into a merge conflict while creating the feature branch.

Another benefit is that you do not need to enter (check out) the master branch to create a new feature branch. The feature branch can be created from whatever branch you are on.

Marco Carrozzo • Mar 3 '19 • Edited

Yeah, but how do I check a branch's size?
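Git has no single "branch size", since branches share most of their objects. This sketch (POSIX shell, Git 2.28+, a throwaway repo with hypothetical files) shows three rough proxies: the number of commits reachable from a branch, the total bytes of the files in its latest snapshot, and the size of the whole object database:

```shell
#!/bin/sh
# Sandbox sketch: rough "size" measures for one branch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
git init -q -b master "$tmp/repo"   # -b needs Git 2.28+
cd "$tmp/repo"

# Hypothetical content: two 6-byte files in two commits.
printf 'hello\n' > a.txt
git add a.txt
git commit -qm "add a"
printf '12345\n' > b.txt
git add b.txt
git commit -qm "add b"

branch=master   # hypothetical; origin/master or any branch name works too

# Commits reachable from the branch (history shared with other branches included).
commits=$(git rev-list --count "$branch")
# Total bytes of the files in the branch's latest snapshot (4th ls-tree field).
bytes=$(git ls-tree -r -l "$branch" | awk '{ s += $4 } END { print s + 0 }')
echo "$branch: $commits commits, $bytes bytes in its current tree"

# Size of the whole object database (all branches together), human-readable.
git count-objects -vH
```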

JavaScript Joel • Mar 4 '19

master isn't gone. You still have full access to it.

You can even do:

git checkout origin/master

To see what is in origin/master without the need for creating a local branch.

You can also do:

git checkout -b new-feature origin/master

At this point, new-feature is identical to origin/master.

Be sure to run git fetch to sync any changes to the remote repo with your local.
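A sandbox sketch (POSIX shell, Git 2.28+, hypothetical repo and branch names) of inspecting `origin/master` without a local branch, either by reading a file straight out of it with `git show` or by checking it out briefly, which leaves HEAD detached:

```shell
#!/bin/sh
# Sandbox sketch: look at origin/master without a local master branch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# Feature work, with the local master deleted.
git checkout -q -b my-feature origin/master
git branch -q -D master
echo "feature" > app.txt
git commit -qam "feature"

# A hypothetical teammate pushes v2; sync it with a fetch.
git clone -q "$tmp/remote.git" "$tmp/teammate"
echo v2 > "$tmp/teammate/app.txt"
git -C "$tmp/teammate" commit -qam "v2"
git -C "$tmp/teammate" push -q
git fetch -q origin

# Read a file from origin/master without switching branches.
git show origin/master:app.txt
git log --oneline -2 origin/master

# Or check it out briefly (detached HEAD), then return to the previous branch.
git checkout -q origin/master
echo "current HEAD: $(git rev-parse --abbrev-ref HEAD)"
git checkout -q -
```

Commits made while detached belong to no branch, so create one with `git checkout -b` before doing real work there.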

Sal Rahman • Mar 3 '19

This whole thread devolved into a micromanagement session on how people work with branches on their local machine.

Jen Chan • Mar 4 '19

Everything is an exercise of interpretation 🤷🏻‍♀️

Jason C. McDonald • Mar 4 '19 • Edited

I deleted my previous comments on this to spare everyone the overblown misunderstanding and unnecessary re-explanation they prompted. So, I'll try again.

Keeping a local copy of master is important for being able to see the "this works" version of the code, apart from your working branch, on your local machine using local tools. You don't edit it, you don't work on it, you only git pull it when it has been updated remotely.

This has nothing to do with branching. You can certainly branch from remote -- it may even save you some effort -- but that's beside my point. The local master copy is your clean "reading copy" of "this code works".
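One way to keep that reading copy beside your working branch is a linked worktree, which checks `master` out into a separate directory. This is a swapped-in suggestion rather than the workflow described above, sketched under the assumptions of a POSIX shell and Git 2.28+ with throwaway paths and hypothetical names:

```shell
#!/bin/sh
# Sandbox sketch: a local master kept as a separate "reading copy" directory.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Hypothetical shared remote whose master holds v1 (git init -b needs Git 2.28+).
git init -q -b master "$tmp/seed"
echo v1 > "$tmp/seed/app.txt"
git -C "$tmp/seed" add app.txt
git -C "$tmp/seed" commit -qm "v1"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# Day-to-day work happens on a feature branch...
git checkout -q -b my-feature
echo "half-finished" > app.txt
git commit -qam "wip"

# ...while master is checked out in its own directory for reference.
git worktree add "$tmp/master-copy" master

# Later, a hypothetical teammate pushes v2 to master.
git clone -q "$tmp/remote.git" "$tmp/teammate"
echo v2 > "$tmp/teammate/app.txt"
git -C "$tmp/teammate" commit -qam "v2"
git -C "$tmp/teammate" push -q

# Refresh the reading copy; the feature branch's files are not touched.
git -C "$tmp/master-copy" pull -q --ff-only
```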

Muhammad Arslan Aslam • Mar 10 '19

I've always worked like this. I start off with master, push an "initial commit", then check out a develop/feature branch and never turn back to master. Ever again.
