Introduction to the GitHub fork workflow: why is it so complex?

Mathieu Ferment ・7 min read

Dear reader,

This blog post is intended to help people who want to better understand the process of contributing to an open source project on GitHub for the first time. It does not explain "how" to use the GitHub fork workflow (multiple posts already cover that part; you'll find links at the end) but rather "why" it is used and why it seems so complex the first time you look at it.

So I'll continue this post assuming that:

  • you are willing to contribute to an open source project hosted on GitHub and using the standard GitHub fork workflow
  • you have already learnt about it (and maybe used it, but not mastered it)
  • you feel like this system is unnecessarily complex to set up and use

It's about people

The GitHub fork workflow is actually quite a nice answer to a complex question: "how can multiple developers work together on a codebase without hindering each other's work?" Because this is what OSS* projects are at their very core: multiple developers working together on a codebase.

*OSS = Open Source Software

Please let that sink in. Have you ever considered how complex it would be to write a book together with multiple people?

Well, if you leave everybody free to modify any part of the book, this is going to be messy! People erasing other people's parts (to replace them with their own), people disagreeing on sentences, or people writing duplicate sentences because they did not agree beforehand on who would write which part ...

In order to avoid a mess, you need to manage how people work together. You need to provide people with a process, a workflow that allows them to focus on the business value of the book (writing it!) rather than wasting time synchronizing with everybody's agenda.

Obviously someone will need to be in charge of this workflow, carefully keeping track of what each person is doing, and your workflow needs to let this person handle the contributions of each member of the group. It must provide:

  • a way to review the contributions
  • a way to merge contributions into the book
  • an easy way to do both (so handling everything with emails is a no-go), as it is expected there will be a lot of contributors but very few maintainers

So the GitHub fork system aims to solve this problem, although the subject is source code and not a book ... even if the GitHub fork system could very well be used to structure how a book is written.

Let's break the GitHub workflow down

Now that we know what problem the GitHub fork workflow aims to solve, let's see how it unfolds.

Let's consider the following scenario: a contributor looks at the OSS project PrestaShop and thinks the logo is actually quite bad and would look better in black and white. He then wants to suggest a better version of this logo.

PrestaShop logo

Please meet Preston, mascot and logo of PrestaShop

At the beginning of this story, there is only the official repository, the authoritative home of the official source code. It is a git repository, hosted on github.com.

One single repository

Now comes the contributor who wishes to suggest a better version of the logo. He is not allowed to do this directly on the official repository (otherwise anybody could modify anything in the project, and this would lead to chaos).

So GitHub requires him to create his own copy of the official repository, which is called a fork.

This copy is, at first, identical to the official repository. But this fork is by no means subordinate to the official repository; it has its own life. The contributor can do whatever he wants with this fork: modify all files, delete all files, use it as a project of his own ... this fork can live a life of its own, and belongs to the contributor only.

I fork the original repository

Back to our use case. The contributor now wants to modify the logo on his fork rather than on the official repository. Since his fork is "his property", he can do whatever he wants with it, including altering the logo to make it black and white.

But his fork is a repository on GitHub. This is not a convenient environment for making modifications (although GitHub provides quite a nice online editor), especially for images. The contributor will consequently create a git copy, a clone, of his GitHub repository on his own computer.

I create a local copy

At this point there are 3 versions of the PrestaShop OSS project:

  • the official main repository, with the standard logo; hosted on GitHub
  • the contributor's fork, with the standard logo so far; hosted on GitHub
  • the contributor's local copy of his fork, with the standard logo so far; hosted on his computer

3 repositories alike

Since the local copy is on the contributor's computer, he can edit it using his favorite image editor (did someone say GIMP?) and improve the logo. He does so by altering the logo to make it more "classy" ... meaning black and white.

Now there are 3 versions of the PrestaShop OSS project:

  • the official main repository, with the standard logo; hosted on GitHub
  • the contributor's fork, with the standard logo so far; hosted on GitHub
  • the contributor's local copy of his fork, with the modified logo; hosted on his computer
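
To make this state concrete, here is a toy Python model of the three repositories. It is only a sketch and not how git works internally: each repository is assumed to be a plain dictionary of files, and the file name `img/logo.png` is made up for the illustration. What it shows is that a fork and a clone are full, independent copies, so editing the local copy leaves the other two untouched.

```python
import copy

# Toy model (an assumption for illustration): a "repository" is just a dict
# mapping file paths to their content. The path below is hypothetical.
official = {"img/logo.png": "standard logo"}

# Forking and cloning both produce full, independent copies.
fork = copy.deepcopy(official)   # hosted on GitHub, owned by the contributor
local = copy.deepcopy(fork)      # hosted on the contributor's computer

# The contributor edits only his local copy.
local["img/logo.png"] = "black and white logo"

# Print the logo held by each of the three repositories.
for name, repo in [("official", official), ("fork", fork), ("local", local)]:
    print(f"{name:8} -> {repo['img/logo.png']}")
```

Only the last line printed mentions the black and white logo: nothing has been pushed anywhere yet.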

Local copy has modified logo

The contributor then updates his fork by pushing his changes from the copy on his computer to GitHub. This is a standard git process where you host the code online and edit it on your computer. Usually the branches of your local copy mirror those of the GitHub repository.

I push local changes to my fork

Now there are 3 versions of the PrestaShop OSS project:

  • the official main repository, with the standard logo; hosted on GitHub
  • the contributor's fork, with the modified logo; hosted on GitHub
  • the contributor's local copy of his fork, with the modified logo; hosted on his computer

Finally, the contributor can submit a Pull Request, to ask the official repository to take the modifications he made on his fork ... and make them part of the official repository.

I submit a Pull Request

The official repository maintainers will be able to look at his Pull Request, review it, suggest modifications and either accept or refuse it.
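
The last two steps (push, then Pull Request) can be sketched with the same kind of toy Python model. Everything here is an assumption made for the illustration: `Repo`, `PullRequest`, `merge` and `refuse` are hypothetical names, not the GitHub API, and the file name `logo.png` is invented. The point is that the official repository only changes when a maintainer accepts the request.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical stand-in for a repository: an owner and a dict of files."""
    owner: str
    files: dict = field(default_factory=dict)


@dataclass
class PullRequest:
    """Hypothetical stand-in for a Pull Request on a single file."""
    source: Repo          # the contributor's fork
    target: Repo          # the official repository
    path: str             # the file the contributor proposes to change
    status: str = "open"  # "open", "merged" or "refused"

    def merge(self):
        # Maintainer accepts: the change lands in the official repository.
        self.target.files[self.path] = self.source.files[self.path]
        self.status = "merged"

    def refuse(self):
        # Maintainer refuses: the official repository stays untouched.
        self.status = "refused"


official = Repo("PrestaShop", {"logo.png": "standard logo"})
fork = Repo("contributor", dict(official.files))  # fork: a copy on GitHub
local = dict(fork.files)                          # clone: a copy on the computer

local["logo.png"] = "black and white logo"        # edit on the computer
fork.files.update(local)                          # push: the fork catches up

pr = PullRequest(source=fork, target=official, path="logo.png")
before_review = official.files["logo.png"]        # nothing has changed yet
pr.merge()                                        # the maintainer accepts
after_merge = official.files["logo.png"]

print(before_review, "->", after_merge, f"({pr.status})")
```

Calling `pr.refuse()` instead of `pr.merge()` would leave the official logo as it was, which is exactly the control the maintainers need.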

Summary

When you are told about the GitHub fork workflow for the first time, you might think "Why? Why does this contribution process involve 3 git repositories, with me pushing my work from my computer to a fork, then the fork requesting a merge into the official repository? This is such a waste of time ...". If the contribution is a quick fix (like a missing comma), you might spend more time applying the workflow* than making the necessary code changes!

*For this kind of use case, the GitHub online editor is quite nice, as it allows you:

  • to modify your GitHub fork directly (no need for a local copy)
  • to create a Pull Request straight from a file of the official repository you want to modify (no need to create a fork yourself: GitHub creates one for you behind the scenes)

But when you consider the big picture, it actually makes sense. Each of these 3 repositories serves a purpose.

The official repository is the "location" of the official source code and must be modified only by approved contributions.

Forks allow developers to do whatever they want and allow specific code changes to be submitted for merge into the official repository.

Since forks are hosted on GitHub, they are not convenient to edit directly, which is why it is better to make a local git clone of your fork on your computer and work on this copy. On your computer you have access to all your developer tools and environment.

The "hard work" of making an open source Pull Request actually comes from moving changes between these three repositories.

Also, keep in mind that in our scenario we focused on the contributor's point of view:

Contributor point of view

But for the project maintainers, the situation looks more like this!

Pull Requests everywhere, maintainer point of view

The GitHub fork workflow also allows open source maintainers to keep track of all incoming contributions and manage them efficiently. An efficient tool is necessary for a quality project, which is why so many projects use the GitHub fork workflow.

Some parts that I have not covered

In this blog post, I did not talk about:

  • the different branches and branching strategies, which can add another layer of complexity to this workflow
  • the issues of keeping a fork up to date with the original repository, be it syncing the main branches or rebasing pull request branches

I might cover them in another blog post, as this one is already quite long to read for an "introduction".

After theory comes practice

Here are some great tutorials explaining "how" to contribute to an open source project on GitHub.

Discussion

Tadas Davidsonas

Nice explanations Mathieu :) Very nice to find some reads from PrestaShop dev.