Johannes

Posted on Nov 26, 2018 • Originally published at jite.eu

Wtf is git?

#git #vcs #development

Introduction

Version control... One of the cornerstones of the development process.

There are a lot of Version Control Systems, but the one mainly used in my field of work is Git. I work a lot with git and I work a lot with people who use git. But now and then a person who has not used git before comes along. This post is intended to give some basic information on how to work with Git and a general idea of what it is. It's mainly written for people with some basic knowledge of some type of version control system, but it should hopefully give something useful to people who have never used one too.

As always, if you find any oddities, let me know in comments and I'll take a look as soon as possible!

What is Git

Git is a version control system: a program which records the changes you make to your files - sort of like checkpoints - and keeps a history of those changes.

It's not just useful for creating a history of changes though, it's also one of the core parts of a development project.

Each developer in the project runs their own git client, and each time they have made a change they commit it to their local git repository. Whenever they feel that it's time to do so, they push their local changes up to a server, a remote.
When a developer has pushed their changes, the other developers can pull those changes down from said remote to their own code base and then push their own! The remote repository will always contain all the developers' (pushed) file changes!

Most developers have an idea of what Git is, and I bet that most developers nowadays use it in one way or another. But a lot of developers have only had the chance to learn the basics of it, and even more have no idea how to use the git Command Line Interface (CLI) in a good way. Now, that's okay, if you don't want to use the CLI then don't, using a git GUI is not a bad thing! But the point of this post is to give some info on how to use the git CLI and how to commit, merge, push and pull with it.

The repository

Initializing and cloning a remote repository

When a project is first started, someone should set up the remote git repository.
The remote server in the following examples will be GitHub, which hosts git repositories and is free for public projects. There are a lot of sites that offer free hosting of git repositories, and you could host your own git server, but GitHub is one of the most well known, so to keep it easy, it's the one used in the examples.

The easiest way is to click "new repository" in the menu and choose a repository name (the description, readme and license options are quite self-explanatory, so I'll leave those to you to try out).

When the repository has been created on the remote, it's possible to clone it. This can be done over either HTTPS or SSH.
If the repository is public, you don't have to log in locally to clone it, but if it is private you do, and git will tell you how!

Cloning is done quite easily by the following command (in a directory where the project should be):

git clone git@github.com:some_user/some_repository.git

After a wait (which depends on how large the repository is) the repository will have been downloaded locally to your folder.

Any changes to the local repository will not be on the remote repository if no push is made.
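
To see that last point in action without a GitHub account, here is a minimal sketch where a local bare repository stands in for the remote. The scratch directory, the placeholder identity and the file name are assumptions made for the example, they are not part of any real project.

```shell
# Placeholder identity (an assumption) so the commit works on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)                            # scratch directory for the demo
git init --quiet --bare "$work/remote.git"   # stands in for the repository on GitHub

# Cloning an empty repository only prints a warning.
git clone --quiet "$work/remote.git" "$work/project"
cd "$work/project"
echo "hello" > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"

# The commit exists locally, but the remote has nothing until a push is made.
local_commits=$(git rev-list --count HEAD)
remote_branches=$(git ls-remote --heads origin | wc -l | tr -d ' ')
echo "local commits: $local_commits, branches on remote: $remote_branches"
```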

Initializing with init

Another way to start off a project is to not have it on a remote from the start. This can be useful if you have not yet decided where to put the repository, or just want to create it and get started before you even bother sharing it with others.

To initialize a local git repository, you use the following command:

git init

Now, this creates the files that git requires (placed in a .git directory in the project dir) and you can now make changes and commit them to the local repository.

At any time you can add a remote server.

git remote add origin git@github.com:some_user/some_repository.git

Note: origin is the conventional name for the default remote (it's the name that git clone sets up for you).

Now the remote is the same as the one in the clone example!
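
Put together, the init route could look something like the sketch below. The scratch directory is an assumption for the example, and the URL is the same placeholder as above. Adding a remote only stores the address, nothing is contacted until you fetch or push.

```shell
work=$(mktemp -d)                        # scratch directory for the demo
mkdir "$work/project" && cd "$work/project"

git init --quiet                         # creates the .git directory
git remote add origin git@github.com:some_user/some_repository.git

git remote -v                            # lists origin for both fetch and push
origin_url=$(git config --get remote.origin.url)
echo "origin points at: $origin_url"
```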

Committing

Committing is a word that - for me - was a bit confusing, as I came from an SVN (another version control system) background. In SVN, commit means that you send your changes to the server. It does not mean that in git. In git, commit means that you "save" the currently added file changes to the local repository.

Before you can commit, you will have to add the files you wish to commit.

Adding files is done with git add <file>, that's all. Before committing, though, I would recommend that you check the difference between your file and what was last committed, so you can be sure that you didn't make any changes that you didn't intend to (git diff <file> before adding, or git diff --cached <file> once the file is added).

After adding the changes that you wish to "save", the commit command is used. When you use the commit command you should always leave a message. The message is stored locally and, whenever you push, on the remote, and it makes it easier for both you and any collaborators to see what a given commit is for.

The commit message should describe the changes you made, not in depth, but at least so that you or anyone looking in the logs can understand what you were thinking.

Always leave a message!

Committing is done with the git commit -m "Message" command, and after that is done, you will have a "checkpoint" in the code!

If you feel that the message is getting too long because you made too many changes, the commit is too big! Don't fret though, most people who start with git do this - to be honest, I still do from time to time - but best practice is to commit often!

So, in conclusion:

# Edit file a.txt and b.txt
git add a.txt b.txt
git commit -m "Updated a.txt with something and changed b.txt with something else!"
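
If you want to see the checks around a commit as well, here is a rough sketch of the whole routine in a throwaway repository. The scratch directory, placeholder identity and file contents are assumptions for the example.

```shell
# Placeholder identity (an assumption) so the commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
mkdir "$work/project" && cd "$work/project"
git init --quiet
echo "first line" > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"

echo "second line" >> a.txt
git status --short              # ' M a.txt': modified, but not added yet
git diff a.txt                  # what changed compared with what was last added
git add a.txt
git diff --cached --stat        # what the next commit will contain
git commit --quiet -m "Add a second line to a.txt"

git log --oneline               # two checkpoints, newest first
commit_count=$(git rev-list --count HEAD)
```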

Pulling and Pushing

Push is the word that is used for sending your code to the server. This is often done when you have made enough changes to complete a specific task, or in some cases, at the end of the day or at a specific time, depending on the project and team (I'll describe my personal flavour of git-flow in a later post, but for now, let's leave it at this).

Pull is the word that is used for fetching the data from the server and adding it to your own codebase.

Push without pull?

A good rule of thumb is to never push your changes without first checking if there are other changes on the server.

When you push, git checks your local commits against the branch on the server. Luckily, git refuses the push if the server has commits that you don't have locally yet, so you can't actually overwrite someone else's work by accident, and it will let you know instead. Nevertheless, don't rely on that.

So before pushing, you should do a pull...
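
Here is a small sketch of what that looks like, with a local bare repository standing in for the server and two clones standing in for two developers. The names alice and bob, the placeholder identity and the file names are all made up for the example.

```shell
# Placeholder identity (an assumption) so the commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"        # stands in for the server

# The first developer creates the shared branch and pushes it.
git clone --quiet "$work/remote.git" "$work/alice"
cd "$work/alice"
git checkout --quiet -b develop
echo "one" > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"
git push --quiet origin develop

# The second developer clones, then both commit on their own.
git clone --quiet -b develop "$work/remote.git" "$work/bob"
cd "$work/alice"
echo "two" > b.txt
git add b.txt
git commit --quiet -m "Add b.txt"
git push --quiet origin develop
cd "$work/bob"
echo "three" > c.txt
git add c.txt
git commit --quiet -m "Add c.txt"

# The server has a commit that this clone has not seen, so the push is refused.
if git push --quiet origin develop 2>/dev/null; then
  push_result=accepted
else
  push_result=rejected
fi
echo "push without pull: $push_result"

git pull --quiet --no-rebase --no-edit origin develop   # merge in the remote changes first...
git push --quiet origin develop                         # ...and now the push goes through
```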

Is it safe to pull?!

No, it's not always safe to pull.

That might sound like pulling will kill you, but it's not that bad!

If you pull and there are changes on the server that conflict with your changes, there will be merge conflicts. Merge conflicts are okay, they are something that you will have to deal with every now and then, but there are ways to avoid them as much as possible.

Tip number 1: Use branches!

Small disclaimer: This part is mainly personal preference, other people might disagree, but this is the way that I feel has been working best, especially when we have had junior developers in the team!

Always work in branches.

When you have a task, create a branch. There is usually a remote branch that the team merges all their changes into, and that branch is usually the best one to start off from, to use as the base for your branch.

Check out the branch with the git checkout <branch name> command and make sure it is up to date with the remote (by pulling). If you have not changed anything yourself in the branch, it should be totally fine to pull, no merge conflicts should be possible. Then, when it's up to date, create a branch from it with the git branch <your new branch name> command. When the branch is created, check it out with the git checkout <your new branch name> command, and you will have your own branch, ready to develop in!

Whenever you are done with your task, you commit all your changes. When committed, you make sure that there are no conflicts between your code and the other branch's code, and then you can merge!

It's possible to merge on the remote server, in some cases it's easier, as it will give you a lot of info about merge conflicts and such, but I prefer to do it locally and in this post I will describe how to do that.

In conclusion:

git checkout develop    # Checkout the develop branch which will be used as base.
git pull origin develop # Make the develop branch up to date with remote.
git branch my-branch
git checkout my-branch  # Now the current branch will be 'my-branch'.
# Edit file a.txt and b.txt
git add a.txt b.txt
git commit -m "Updated a.txt with something and changed b.txt with something else!"
# At this point, the local repository has your changes in the my-branch branch!
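
As a side note, creating and checking out a branch can be done in one step with git checkout -b. The sketch below shows it in a throwaway repository where a local develop branch plays the role of the shared branch. The scratch directory and placeholder identity are assumptions for the example.

```shell
# Placeholder identity (an assumption) so the commit works on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
mkdir "$work/project" && cd "$work/project"
git init --quiet
git checkout --quiet -b develop      # name the first branch develop
echo "base" > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"

# Same result as: git branch my-branch, followed by git checkout my-branch
git checkout --quiet -b my-branch
current_branch=$(git rev-parse --abbrev-ref HEAD)
git branch                           # lists develop and my-branch, with * on the current one
```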

Diff

The git diff command is not only used to show the difference between your current changes and your last commit, it can also be used to show the differences between two branches.
When you do this, you get detailed information about everything that differs between the branches. With this info it's possible for you to spot potential merge conflicts, and fix them, even before they are merge conflicts!

To check the diff, you use the git diff <your new branch name>..<the other branch> command (note the two dots). This will show you the difference between your branch and the other branch in their current state.

When you check the diff, look for the places where both you and the other branch have changed the same part of the code, those places are where merge conflicts might happen. Fix those, and you should rarely have to see a conflict!
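
One way to narrow that search down, sketched below, is to list the files that each side has changed since the branches split and see which files show up in both lists. This uses the three-dot form of git diff, which compares against the common ancestor instead of comparing the two tips. It is just one possible approach, and the branch names, placeholder identity and file names are assumptions for the example.

```shell
# Placeholder identity (an assumption) so the commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
mkdir "$work/project" && cd "$work/project"
git init --quiet
git checkout --quiet -b develop
printf 'line 1\nline 2\n' > a.txt
echo "notes" > b.txt
git add a.txt b.txt
git commit --quiet -m "Add a.txt and b.txt"

git checkout --quiet -b my-branch
echo "mine" >> a.txt
git add a.txt
git commit --quiet -m "Change a.txt on my-branch"

git checkout --quiet develop
echo "theirs" >> a.txt
echo "more notes" >> b.txt
git add a.txt b.txt
git commit --quiet -m "Change a.txt and b.txt on develop"
git checkout --quiet my-branch

git diff --stat my-branch..develop     # everything that differs between the two tips

# Files changed on each side since the branches split (three dots = since the common ancestor).
git diff --name-only develop...my-branch | sort > "$work/mine.txt"
git diff --name-only my-branch...develop | sort > "$work/theirs.txt"
both_changed=$(comm -12 "$work/mine.txt" "$work/theirs.txt")
echo "changed on both sides: $both_changed"
```

A file that turns up on both sides is not guaranteed to conflict (the changes may be in different parts of it), but it's the first place to look.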

And then we pull..?

No...

After we have diffed the branch, we merge the other branch into ours, so that our code and the remote code are at the same point + the changes made in the new branch!

I always recommend people to use the --ff-only argument. The --ff part means "fast forward": git simply moves your branch forward to the other branch's latest commit, without creating a merge commit. That is only possible when your branch has no commits of its own that the other branch lacks. The only part makes the whole thing a slight bit safer. If git can't fast forward (because the branches have diverged), it will say that it can't, and the code will stay as it was when you typed the command. I use it sort of like a dry-run. If it does complete successfully, I'm happy, if it does not, I know so and prepare to fix any conflicts!
If it refuses, I usually drop the only part, run a normal merge, and fix the merge conflicts that show up.

git merge <other branch> --ff-only
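
A small sketch of that "dry-run" behaviour, in a throwaway repository where both branches have a commit the other one lacks. The branch names, placeholder identity and file names are assumptions for the example.

```shell
# Placeholder identity (an assumption) so the commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
mkdir "$work/project" && cd "$work/project"
git init --quiet
git checkout --quiet -b develop
echo "base" > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"

git checkout --quiet -b my-branch
echo "mine" > b.txt
git add b.txt
git commit --quiet -m "Add b.txt on my-branch"

git checkout --quiet develop
echo "theirs" > c.txt
git add c.txt
git commit --quiet -m "Add c.txt on develop"
git checkout --quiet my-branch

# Both branches have moved on, so a fast forward is not possible and nothing is changed.
before=$(git rev-parse HEAD)
if git merge --ff-only develop 2>/dev/null; then
  ff_result=fast-forwarded
else
  ff_result=refused
fi
after=$(git rev-parse HEAD)
echo "ff-only merge: $ff_result"

# A regular merge works here (the branches touch different files) and creates a merge commit.
git merge --quiet --no-edit develop
```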

Oh no! Conflicts!

Conflicts happen, and it's okay! Whenever you encounter a conflict, you should deal with it. Don't panic, it might be a pain, but it's fixable!

So what is done when a conflict happens? Well, you merge manually, preferably with the help of a merge tool.

A merge tool is a tool which helps you with the merge (the name kind of hints at that). While the merge tool is running, a few files will be created. They are named filename.orig and are backup files. If they are still in the folder when you are done merging, it's okay to remove them, they should never be added to source control.
Remember, it's all quite safe, you still have your changes saved in the repository, so unless you do something really wrong, you won't lose your (or anyone else's) code. I personally prefer the graphical tools that come with my IDE, but not everyone has an IDE. There are free tools online, and git has the git mergetool command, which launches a merge tool for you from the command line. So whenever you get a conflict, you can start the default tool by typing git mergetool.

The merge tool will show you the differences and where there are merge conflicts. A conflict is, as stated earlier, a part of the code which has been changed both by you and by someone else.

There is usually an option to select either your or the other person's part of the code, but you could also use both, or change it to something completely different!

When the merge conflicts are fixed, you add and commit the files, which completes the merge.

At the end, you will have a branch which contains both your code and the other branch's code. And that's when you can push (or rather (imho) create a pull-request).
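
For reference, here is a sketch of the whole conflict routine without a merge tool, where the conflicted file is simply rewritten by hand. The branch names, placeholder identity and file contents are assumptions for the example.

```shell
# Placeholder identity (an assumption) so the commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
mkdir "$work/project" && cd "$work/project"
git init --quiet
git checkout --quiet -b develop
echo "greeting = hello" > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"

# Both branches change the same line, which is what causes the conflict.
git checkout --quiet -b my-branch
echo "greeting = hi" > a.txt
git add a.txt
git commit --quiet -m "Use hi as greeting"
git checkout --quiet develop
echo "greeting = hey" > a.txt
git add a.txt
git commit --quiet -m "Use hey as greeting"
git checkout --quiet my-branch

git merge --no-edit develop >/dev/null 2>&1 || echo "merge stopped on a conflict"
git status --short                                 # 'UU a.txt': changed on both sides
conflicted=$(git diff --name-only --diff-filter=U) # the files that still need fixing

# Resolve by hand (or with git mergetool). Here a combined version is kept.
echo "greeting = hi and hey" > a.txt
git add a.txt                                      # marks the conflict as resolved
git commit --quiet --no-edit                       # completes the merge
```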

The push command

When you are done with your merge it's always a good idea to make sure that no one in the team has had the audacity to push while you were merging! Hehe... Do the above routine again to make sure, and when you see that there is nothing left to merge, you can push to the remote.

If you use your own branch and will be creating a pull-request, just push the branch with the git push origin <my branch name> command, but if it's a shared branch, use that branch's name instead.
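
As a sketch, pushing your own branch could look like this, again with a local bare repository standing in for the server. The -u flag is optional, it just makes git remember which remote branch your local branch belongs to. The branch names, placeholder identity and file names are assumptions for the example.

```shell
# Placeholder identity (an assumption) so the commits work on a machine without git configured.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"     # stands in for the server
git clone --quiet "$work/remote.git" "$work/project"
cd "$work/project"
git checkout --quiet -b develop
echo "base" > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"
git push --quiet origin develop

git checkout --quiet -b my-branch
echo "feature" > b.txt
git add b.txt
git commit --quiet -m "Add b.txt"

git push --quiet -u origin my-branch           # the branch now exists on the remote too
git ls-remote --heads origin                   # lists develop and my-branch
upstream=$(git rev-parse --abbrev-ref "my-branch@{upstream}")
echo "my-branch tracks: $upstream"
```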

When you have finally pushed the code to the server, the whole cycle starts once again!

Last words

Hopefully the information above has given you an idea of how to git. It's not that hard, and it's really not that scary!

There are a lot of commands and a lot of different ways to do stuff with git, but this is the absolute basics, the parts that you need to know to use git in a decent way.
But remember, the most important thing of all when it comes to using git (or any other type of development tool), is to try to use it the same way as your team does. And never be afraid to ask! It's better to ask and get an answer than to do it wrong and have to fix stuff!

This post is cross-posted from my blog and can be found here: https://jite.eu/2017/12/19/git-intro/ (including a bunch of other posts!).

Top comments (7)

Krzysztof Zaporowski • Nov 26 '18

small remark - there's huge discussion between master-only and feature-branches approach, so I wouldn't be such definite by saying that developer should ALWAYS work with branches. It depends on the situation, as almost everything in our industry ;)

Johannes • Nov 26 '18

You are right, I will make it more clear that it's a personal opinion (will update after food!). :)

Johannes • Nov 26 '18

There we go, added a disclaimer at the branches part ;)

ADS-BNE • Nov 27 '18 • Edited

I use git frequently but it frustrates the hell out of me. It just seems to do random things that no one can explain.

For example, yesterday I merged my code onto our development branch, like I have done hundreds of times before. However, this time the pull request got updated with a whole lot of code from other people's branches - hundreds of files I had never touched. This happens not often, but with some regularity. No one in my team knows why and the only advice given is to copy out your files and start a new branch from scratch.

Right now, too, I am trying to update a local development branch: 'git pull origin dev-branch' but am getting the error Could not read from remote repository. Again, no one knows why. I can access the repo fine on Github and it was working fine yesterday, but today it's not (and is costing me hours of time in debugging).

Git just seems way too temperamental to me. Does anyone else have this similar experience?

Johannes • Nov 27 '18

Have not noticed that type of issue, if I don't count the times people have used git commit -a. Could it be that your git client changes line endings or "lint fixes" files? Or that you merge/rebase other people's branches before pushing?

Could not read from remote repository. I get when the repository is down or my net is broken, could that be the issue in your case?

Johannes • Feb 25 '19

I do like rebase in many cases (even if it might mess up the history a small bit), but I choose to leave it out in this part, as it's a slight bit over the general level of the text, hehe. A good point though!

Johannes • Nov 26 '18

Nice, good links :)