Ben Link

Posted on Oct 10

The Adventures of Blink S2e5: Source Control with GitHub

#buildinpublic #github #beginners #devops

Hey pals, and welcome to the next Adventure of Blink! If you've been following along, we're on a journey to build the world's most complicated game of Hangman! 😬 Joking aside, we're definitely building with some overkill, but the point is to learn how these tools and techniques work.

So that you can follow along, click HERE for the source code.

TL;DR: YouTube

Shouldn't Source Control be at the BEGINNING?

Usually, yeah it would. In fact, whenever I start a new coding project, it's my custom to build the repo where it will be stored first. This serves two purposes:

It makes me think about the end product from the beginning. This is critical because it's much harder to clean up a project that was built haphazardly than it is to discipline myself to deliver in an organized fashion from the beginning.
It stores the project in the cloud. This benefits me because if I need to ask someone for help with something, I can easily share it. It also protects me from losing my project if something happens to my laptop. GitHub (or GitLab or whatever your preferred SCM tool is) will be much less likely to lose everything than the one hard drive in your local computer!

So why, then, are we here in episode 5 and just getting started with it? Fair question. I wanted to wait until we could look at some more advanced features of GitHub with respect to our project. The world doesn't necessarily need another

git clone https://www.github.com/LinkBenjamin/theAdventuresOfBlinkS2-hangman
cd theAdventuresOfBlinkS2-hangman
git checkout -b newbranch
git commit -am "here's my update"
git push --set-upstream origin newbranch

tutorial. While we'll certainly cover that, I plan to gloss over it on the way to something more interesting: GitHub Actions!

First things first: Git vs GitHub vs GitLab vs...

It's easy to get confused, because everything's so similarly-named. So out of the gate, let's set some definitions:

Git is a distributed version control system created by Linus Torvalds (yeah, THAT Linus Torvalds!). It defines the operations that you can do on a repository - creating, editing, merging, and deleting branches, stuff like that. It doesn't (natively) have a website, or even a concept of accounts with credentials.
GitHub is a company that launched in April 2008 and was purchased by Microsoft in 2018. It provides a web-based platform for hosting and sharing Git repositories (among a bunch of other great features!). It is built on Git.
GitLab is another company, founded in 2014. It provides a similar service to GitHub. It is also built on Git.

How does it work?

Git lets you track changes using a concept called a "commit". Basically, you clone (think: download) a copy of the repository to your local computer and then make a bunch of edits in your working directory. When you've completed your edits for the session, you create a "commit" containing the updated file(s) that you wish to include (you can select them individually yourself). You can then push this new commit (or several of them, if you've been collecting them) to a remote repository (like one hosted on GitHub or GitLab).
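Here's a minimal sketch of that clone, edit, commit, push cycle. It's an illustration rather than the project's actual setup: a local bare repository (remote.git) stands in for GitHub so it runs without a network, and the file names and the "Example Dev" identity are made up.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; remote.git is a hypothetical stand-in for a GitHub/GitLab repo.
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"

# Clone: download a copy of the repository to work on locally.
git clone --quiet "$sandbox/remote.git" "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Edit files in the working directory (both files are made up for this demo).
echo "guess a letter" > notes.txt
echo "scratch" > scratch.txt

# Stage only the file that belongs in this commit, then commit it.
git add notes.txt
git commit --quiet -m "Add notes"

# Push the new commit to the remote repository.
# The branch name depends on your Git settings, so look it up instead of guessing.
branch=$(git symbolic-ref --short HEAD)
git push --quiet origin "$branch"

# Show the commit history, then what is still uncommitted (scratch.txt).
git log --oneline
git status --short
```

Notice that scratch.txt never left your machine: only what you staged with git add went into the commit, and only commits get pushed.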

You can also create branches, or longer-running copies of the code, which can then eventually be merged back into the main branch. This is helpful if you have a larger series of changes to implement and want to keep them separate until you've finished the whole thing.
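To make branching concrete, here's a small sketch under some assumptions: the branch name (scoreboard) and the files are hypothetical, and everything happens in a throwaway local repository. It collects two commits on a branch and then merges them back.

```shell
#!/bin/sh
set -eu

sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet .
git config user.name "Example Dev"
git config user.email "dev@example.com"

# First commit on the default branch (its name depends on local Git settings).
echo "v1" > game.txt
git add game.txt
git commit --quiet -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# Create a branch and collect a series of related commits on it.
git checkout --quiet -b scoreboard
echo "score: 0" > score.txt
git add score.txt
git commit --quiet -m "Add scoreboard file"
echo "score: 10" > score.txt
git commit --quiet -am "Update starting score"

# Back on the default branch, the scoreboard work is not visible yet.
git checkout --quiet "$main_branch"
ls

# Merge the finished work back in, then list the resulting history.
git merge --quiet scoreboard
git log --oneline
```

Until the merge, the default branch stays exactly as it was, which is the whole point of keeping a larger change on its own branch.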

When do I commit and when do I branch?

Generally speaking, you want to keep commits small and numerous. In modern software development, you hear lots of talk about being Agile and delivering continuously (I even covered these terms back in season 1 here on The Adventures of Blink!). Those concepts work best when our changes are smaller and more frequent, rather than big and slow.

Branching strategies abound in the wild - and everybody's got opinions. Personally, I'm a fan of something called trunk-based development, which means that you don't have long-lived branches.

Your goal is to keep your main branch always in a releasable state. Pure Trunk-Based Development is very hard to implement in a team; it requires tremendous discipline for the team to never create branches for collecting the commits for a larger feature. Most often we see a modified version where the main branch is the releasable product and the team works together in a single branch that then gets merged into main when the feature is ready for implementation.

Digging a little deeper

Now that we've talked about the basics of git, let's check out some features that will come in handy as we adopt it as our source control tool.

.gitignore

Not everything in your project folder needs to be stored in your git repository. Sometimes you'll have temporary files, or compiled files, or other stuff that you don't want cluttering up your project space.

Git provides for a .gitignore file in your project's root where you list all the file and folder patterns that you don't want git to track.

In a Python app, for instance, you'd include your .venv virtual environments, the __pycache__ folders, and your .env file (because it could contain credentials that you don't want to share with others!). In Java, you'd want to exclude any .class files.

Here's the structure of our hangman game's .gitignore file so far:

# Ignore the location of the docker volume for the DB
/database/mongo_data/
.env
__pycache__/
.venv
.pytest_cache

Now whenever we work with git, it skips any of these files that aren't already being tracked. This means we can continue building and testing locally but none of the temporary files will be pushed to our git repository.
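If you want to see the ignore rules at work, here's a quick sketch. It assumes a throwaway repository with a few made-up files, and uses a subset of the patterns from the listing above.

```shell
#!/bin/sh
set -eu

sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet .

# A subset of the patterns from our .gitignore.
printf '%s\n' '.env' '__pycache__/' '.venv' > .gitignore

# Hypothetical files: one secret, one cache file, one real source file.
mkdir -p hangman/__pycache__
echo "SECRET=123" > .env
echo "cache" > hangman/__pycache__/game.cpython-312.pyc
echo "print('hangman')" > hangman/game.py

# check-ignore -v reports which pattern matched each ignored path.
git check-ignore -v .env hangman/__pycache__/game.cpython-312.pyc

# Stage everything, then list what Git is tracking; ignored files are skipped.
git add .
git ls-files
```

Only .gitignore and hangman/game.py end up staged, even though we asked git to add everything.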

README.md

This might not seem like too big of a deal at first glance. "So what, your project has a README file." The important part of this is the '.md' at the end. That extension is used for markdown files, which provide easy-to-use formatting options. What's more, your README's markdown will be rendered in the GitHub or GitLab websites, meaning that you can build clean documentation directly into your codebase.

Branch Protections

If you have a need to keep things stable in production and don't want surprise deployments, you can add Branch Protections to specific branches. This will block anyone from editing the protected branch directly (with a push); instead, they'll be forced to use a Pull Request which provides an approval process for merging the code.

And now for our feature presentation… Actions

Storing your code is nifty. And being able to pick apart various commits to see what changed and when is extremely useful. But storing source code doesn’t get us closer to our desired state: IN USE. What would really be handy is a way to help our code progress through its development lifecycle… from “written” to “verified” to “deployed”. As programmers, we’re usually thinking about testing our code and tinkering with it to ensure we work out any bugs and such.

Git doesn’t do that

Git, however, is strictly a tool for tracking and storing the code. It just doesn’t care what you’re going to do with it… as long as the files are tracked and organized, Git has done its job. This is where some of those “extras” really shine…

GitHUB and GitLAB both offer the ability to run automation around your code when it’s been committed and merged into their tools. Today we’re focusing on GitHUB’s feature set, because it’s where my account lives and I’m more familiar with it 🙃, but be aware you could do this kind of setup in both (and other SCM platforms too!).

The Anatomy of an Action

In GitHub Actions, your automation is defined in YAML format in a workflow file. These are stored in your project alongside your code, but in a special folder at the project root that you create. To use GitHub Actions, you first create a .github folder in your project and then you add a ‘workflows’ folder inside it, like this:

[Image: the .github/workflows folder in the project tree]

Any Actions workflows you write (and you can have lots of them if needed) are stored in workflows. You can name the individual yaml files whatever you like.
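From a terminal, setting up that folder might look like the sketch below. A temp directory stands in for your project root here, and pytest.yml is simply the file name we use later in this post.

```shell
#!/bin/sh
set -eu

# Run this from your project root; a temp directory stands in for it here.
project_root=$(mktemp -d)
cd "$project_root"

# GitHub looks for workflows in .github/workflows at the repository root.
mkdir -p .github/workflows

# Create an empty workflow file to fill in; any .yml or .yaml name works.
touch .github/workflows/pytest.yml

# List the files under .github to confirm the layout.
find .github -type f
```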

Building an Action to validate our tests

Here’s the real reason I wanted to wait until Episode 5 - because we have some test automation built for our frontend app now. But if you’re in a hurry and not thinking about the whole Software Development Lifecycle, you might forget to run those tests before you push your code up, and that could lead to broken code being pushed despite our Test-Driven approach! So what we really want is to let GitHub run these tests for us on every push, so that a failing change gets flagged before it’s merged into main. Let’s see how to set this up:

# On specifies when this workflow will be executed by GitHub
on:
  # push sets the trigger when code is pushed, with conditions specified below
  push:
    # branches says we’re filtering by branch name
    branches:
      # These filters say to accept any branch except for ‘main’.  We do this because we want the tests to run when we push our code up and give us instant feedback, but we don’t want to re-run them when we PR/merge.
      - '*'
      - '*/*'
      - '**'
      - '!main'
    # Paths gives us the ability to state that we want the workflow to run ONLY when code in certain parts of the codebase are changed.  For example: if we change the database docker file, there’s no sense in unit testing our frontend code again!
    paths:
      # Run when anything ending in .py in the frontend folder changes… even nested items.
      - 'hangman/**/*.py'
      # Also run if we make changes to the Action workflow file.
      - '.github/workflows/pytest.yml'

# Jobs are related sets of steps we can run on our codebase.  You might have “run unit tests” like we’re doing, but you can make more complex flows that integration-test our entire app - run the database container, the api container, and the frontend app all together and do whatever validation you like.
jobs:
  # This is a name we define for our job
  Frontend-unit-test:
    # runs-on lets us pick our platform
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12"]
    steps:
    - uses: actions/checkout@v4
    - name: 'Set up Python'
      uses: actions/setup-python@v5
      with:
        # Use the version listed in the matrix above
        python-version: ${{ matrix.python-version }}
    - name: 'Install dependencies'
      run: |
        pip install -r hangman/requirements.txt
        pip install pytest
    - name: 'Run pytest'
      run: |
        pytest

Running the Action

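Simply creating this file in .github/workflows on your own machine isn't enough - GitHub can only run a workflow that has been committed and pushed. For a push trigger like ours, GitHub reads the workflow file from the branch being pushed, so it can run before it ever reaches main; some other triggers (scheduled runs, for example) only work once the file is on the default branch. Either way, first things first: merge it to main, so that every branch you create from then on carries the workflow.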

Then, once the workflow appears in main, you can make edits to your code & tests, push them on a branch, and watch the runs in the Actions tab of your repository on GitHub.

Note that if your test run fails, you'll get an email to the mailbox associated with your GitHub account to let you know.

Wrapping up

There you have it - now we know how to manage the source of our application in GitHub, and we even built an Actions workflow that will run our unit tests any time we push new code to a branch. Now we get instant feedback on whether our tests still pass!

We could even add a pull_request section under "on" so that this workflow runs on every pull request, and pair it with a branch protection rule that requires the check to pass. That combination prevents the merge until the tests are successful.

This might feel small and contrived in the context of a hangman video game, but this is a core principle for anyone doing continuous integration - you have to build these kinds of protections for your main branch so that you know it's always in a deployable state. Now that you have a hint of what GitHub Actions can do... what other guardrails might you build for your app?

This concludes another Adventure of Blink - I'm so glad you came by today, and I hope you'll join me again as we begin to integrate our application components - we'll connect our frontend application to the database through our API layer. Take care, and see you next time!
