How we use "ship small" to rapidly build new features at GitHub

Mike Coutermarsh ・ 6 min read

"Ship small! fast!" and "ship to learn!"

You've probably seen those slogans shouted all over Twitter. But what does shipping small actually look like in practice?

I'm on the GitHub Actions team, and we "ship small" every day to build new features quickly. Personally, this is my favorite way of building software. In this post I'll show you how we do it.

Let's build a new feature

For this post, let's pretend we're building a new feature for a web app. It has a complex UI and backend and will take a small team roughly 6 weeks to build.

Project planning: define the minimal scope

To start, the lead engineer, product manager, and engineering manager collaborate to define the minimal scope. We determine which parts are a MUST for this new feature to be valuable to users.

The lead engineer then breaks down the tasks and prioritizes them on a project board. Their job here isn't to figure out every technical detail. Their job is to find enough information to break the project into pieces small enough that people can get started.

During planning we aren't spending hours hammering out small details. Perfect upfront planning can be the enemy of shipping. We know that as long as we have the minimal scope defined, we can always iterate and improve from there. Software is never complete.

There will still be things to figure out, but we don't need to wait to get started.

Start writing code

Getting started as soon as possible is important. I've never seen a software project go exactly to plan. As soon as code starts being written, we know we will uncover new information that may change our approach.

We need to be flexible and roll with it as we learn more. As individuals find problems, they document them in new issues, which get added to the project plan.

The first pull request: the boilerplate

Since this feature has a UI, one engineer does the boilerplate setup for it: a blank page that has a URL and sits behind a feature flag. It's essentially a "hello, world" for the feature.

This should be a very quick task. They open the pull request, tag everyone else working on the feature, and we ship it to production. Everyone on the team is added to the feature flag and can now see the blank page in production.
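A flag-gated boilerplate page boils down to a handler that checks the flag before rendering anything. The following is a minimal Python sketch, not GitHub's code: the `FeatureFlags` class, the `new_feature` flag name, and `new_feature_page` are hypothetical stand-ins for a real flag library and web framework.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store. A real app would use a flag library
    (the author mentions Flipper for Ruby) backed by a database."""

    def __init__(self) -> None:
        self._actors: dict[str, set[str]] = {}

    def enable_for(self, flag: str, user_id: str) -> None:
        # Add one user (e.g. a teammate or stakeholder) to the flag.
        self._actors.setdefault(flag, set()).add(user_id)

    def enabled(self, flag: str, user_id: str) -> bool:
        return user_id in self._actors.get(flag, set())


def new_feature_page(flags: FeatureFlags, user_id: str) -> tuple[int, str]:
    """The "hello, world" page for the new feature.

    Users who aren't flagged in get a 404, so the page stays invisible to them
    even though its code is deployed to production.
    """
    if not flags.enabled("new_feature", user_id):
        return 404, "Not Found"
    return 200, "<h1>Hello, world</h1>"


flags = FeatureFlags()
for teammate in ["alice", "bob"]:
    flags.enable_for("new_feature", teammate)

print(new_feature_page(flags, "alice"))    # (200, '<h1>Hello, world</h1>')
print(new_feature_page(flags, "mallory"))  # (404, 'Not Found')
```

Returning a 404 rather than a "coming soon" message is one design choice among several; it keeps the unfinished feature completely hidden from everyone outside the flag.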

I cannot emphasize enough how important this first pull request is.

Once it's merged, it's much easier for others to build on top of it without running into each other. If your goal is to "ship small," I believe it's critical to start the project off that way too. The first change sets the example for the changes that come after it: it's now culturally accepted to ship things to production that aren't complete yet.

We can now have parallel work streams:

  • One engineer can start working on the backend that feeds data to this page.
  • Another engineer can start working on the frontend.
  • The designer works on the UI. If they can code, they jump in and write markup and CSS.

Everyone can now contribute, and we can confidently ship partially completed work to production (remember, it's behind a flag).

As small pieces of work are completed, everyone opens pull requests, gets reviews from each other, and merges to production. The whole team shares a lot of context about the project.

If we need reviews from other teams (security, for example), these small PRs are low effort for outside teams to review, which makes it likely we get feedback faster.

This is important because:

  • It allows people to work async (we are remote and spread across time zones).
  • Each pull request is small and reviewed by everyone else, which makes it easy to review and keeps everyone aware of progress and context.
  • Project stakeholders can be added to the feature flag and view progress in production (and give feedback!).

Never get blocked, start ASAP

Often when building software, we find ourselves believing "we need X from another person" before we can start. I've never found this to be 100% true. I have seen software projects stuck in a holding pattern while engineers wait on each other (this can burn days or even weeks). There is almost always something we can start on.

For example, maybe you need data from the API before starting the frontend. Stick it in a JSON file and fake the API. You're then unblocked and can get started. You'll then learn things about the data you need and will be able to provide this information to the developer who is working on the API. This will make their job easier and probably get you the API endpoint faster and exactly how you need it.
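Faking the API can be as simple as having the frontend code depend on a small interface and backing it with canned JSON until the real endpoint exists. This is a minimal Python sketch under assumptions: the `WorkflowsAPI` interface, the fixture's shape, and the function names are hypothetical, and in a real project the JSON would live in a fixture file rather than a string.

```python
import json
from typing import Protocol


class WorkflowsAPI(Protocol):
    """The interface the frontend code depends on (hypothetical)."""

    def list_workflows(self) -> list[dict]: ...


class FakeWorkflowsAPI:
    """Serves canned JSON until the real endpoint exists."""

    def __init__(self, fixture_json: str) -> None:
        self._data = json.loads(fixture_json)

    def list_workflows(self) -> list[dict]:
        return self._data["workflows"]


def render_workflow_names(api: WorkflowsAPI) -> list[str]:
    # Only talks to the interface, so swapping in the real client later
    # requires no changes here.
    return [w["name"] for w in api.list_workflows()]


FIXTURE = '{"workflows": [{"name": "CI"}, {"name": "Deploy"}]}'
api = FakeWorkflowsAPI(FIXTURE)
print(render_workflow_names(api))  # ['CI', 'Deploy']
```

When the real endpoint ships, a client with the same `list_workflows` method replaces `FakeWorkflowsAPI`, and the fixture doubles as a concrete description of the data the frontend needs.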

Production code is the best code

There's a reason we want to get changes into production as fast as possible.

When code is successfully merged and deployed to production, we've now eliminated a bunch of uncertainties about it. We know it passes CI, we know it won't bring the site down.

Even though it's behind a flag, there's always the possibility of something going wrong. Getting it into production increases our trust in it and makes it easier for the team to build on it moving forward. Confidence is a huge driver of momentum, and we need both if we're going to build this feature quickly.

Feedback loop

As this project progresses, we see changes in production daily. Once the feature is somewhat working, we have a good opportunity to start inviting more people into the process. I like to share it with people from security, documentation, support, sales, biz dev, etc. We add them to the feature flag and ask them for feedback.

Remember how we started with a very basic project plan? At this point feedback is coming in; we turn that feedback into new issues (tasks on the project board), and it feeds right back into the development process.

We are now ITERATING on the feature and making it better. Iterating is easy for the team because we've been making small changes all along. People share context on how everything works, so contributing is easy. We have confidence because the majority of the code is already in production.

Personally, this is my favorite phase of every project. If things were set up correctly from the start, there should be a lot of momentum driving everything forward at this point.

This process of feedback -> code -> ship continues to loop until we have something we can ship to real users.

Shipping to real users

Once the team has accomplished the minimal scope, it's time to start getting this new feature out to real users. The high-level process we follow is "Staff ship" -> get feedback -> "General availability". There can be a lot of other details here (documentation, capacity, security, abuse), depending on the feature.

The "Staff ship" is when we make the feature available to every employee and ask for feedback. Our company is large enough that this works really well for uncovering potential problems. During this time we also watch instrumentation more closely for errors and performance problems. The ways people use our product are highly variable, and we can't anticipate everything.

"General availability" is when we make the feature available to everyone. We have tools available to let us roll it out slowly to a % of users. As we do this we continue to
monitor things closely. If we see serious problems we can always turn it off, fix the problem and then continue the roll out.
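The post only says GitHub has tools for rolling out to a percentage of users. The sketch below shows one common way such a gate can work, combining a staff gate, a percentage gate, and a kill switch; it is not GitHub's or Flipper's actual implementation, and `Rollout`, its fields, and the CRC32 bucketing are assumptions chosen for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical rollout state for one feature flag."""

    name: str
    staff_enabled: bool = False  # "Staff ship": every employee sees it
    percentage: int = 0          # general availability: 0-100% of users
    killed: bool = False         # turn everything off if problems appear

    def enabled_for(self, user_id: int, is_staff: bool) -> bool:
        if self.killed:
            return False
        if is_staff and self.staff_enabled:
            return True
        # Hash flag name + user id into a stable bucket from 0 to 99, so a
        # user who is in at 10% stays in as the percentage grows.
        bucket = zlib.crc32(f"{self.name}:{user_id}".encode()) % 100
        return bucket < self.percentage


flag = Rollout("filter_bar", staff_enabled=True)   # staff ship
flag.percentage = 10                               # begin GA with 10% of users
print(flag.enabled_for(user_id=42, is_staff=True))  # True: staff always see it
flag.killed = True                                 # serious problem? turn it off
print(flag.enabled_for(user_id=42, is_staff=True))  # False
```

Hashing instead of picking users at random keeps each user's experience consistent between requests, and it makes raising the percentage strictly additive.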

Principles

This was a pretty generalized view of how we work and the details will vary from project to project. But I think I can distill it down to a few important points.

  • Start by defining the MVP. Review and cut scope until it truly feels minimal.
  • Software is never done. We are OK with shipping things that aren't perfect. We can always ship another PR to improve it.
  • Always be unblocking. Make it easy for people to contribute at all times (even if they're alone in a separate time zone).
  • The project plan starts minimal and evolves as we ship code. Invite others early to give feedback, incorporate that into the plan as the project progresses.

Some important points

  • you need to be able to deploy and roll back quickly (if you only deploy a couple of times a week, this won't work well)
  • you need feature flags to gate code and ship only to specific users
  • this is a general example of how teams I've been on work (GitHub is a big company and individual teams do what works best for them)
  • I'm an engineer, so this is an engineering-centric view of the process

Discussion

Emmanuel N Kyeyune

Great post! I have a question though that maybe wasn't addressed clearly. When you say 'ship small' I take it to also mean the ability to break the feature down into smaller logical parts and make small pull requests that can get reviewed faster and merged into production.

I often run into situations where I receive or make massive pull requests because, well, all the code written was needed to make the feature functional (or makes logical sense together).

Is this a situation you encounter and if so how do you prevent it? Do you manage to keep all your PRs small enough to make the review process better for everyone?

Mike Coutermarsh Author

Is this a situation you encounter and if so how do you prevent it? Do you manage to keep all your PRs small enough to make the review process better for everyone?

Definitely! Happens all the time and I still do this occasionally myself.

I have a few ideas around this.

  • Having feature flags helps a lot with this, because the feature doesn't have to be functional to ship it.
  • A lot of this is cultural: the team has to be OK with merging things that are incomplete. (For PRs like this, I recommend including an "up next" section in the description that lists the plans to make it functional.)
  • If you find you have a huge branch, sometimes I'll create a new branch and cherry-pick specific files into it (just the backend, for example). I put that in a PR to get it reviewed. Once it's merged, I create another branch and cherry-pick in the frontend.

Also, sometimes making the PR small is more effort than it's worth, so sometimes it's necessary to have people review a large PR. I definitely try to avoid it if possible, though.

miku86

Hey Mike, thanks for your great post.

What's small for you? Are we talking about a PR per day? Per 3 days? Per week? Or is it more about the amount of code?

Sometimes there is a lot of code to write for a very small thing.
Any guidelines on this?

Mike Coutermarsh Author

Hey!

For me, it's generally a day or two of work for a PR. I like to think about the person who is going to be reviewing it. Will it be easy for them?

Using feature flags and being OK with shipping partially working features helps make everything smaller.

I think it's OK to ship an empty page, or a page with static fake data that needs to be connected to a database later. As long as it's all behind a flag and users don't see it, it helps the development process move faster.

miku86

Thanks a lot,
I will keep that in mind!

Kyle Pollich

Thanks for this, Mike! Really enjoyed this post. I have one question about the "parallel work streams" you mentioned.

  • 1 engineer can start working on the backend that feeds data to this page.
  • Another engineer can start working on the frontend.
  • The designer works on the UI. If they can code, they jump in and write markup + CSS.

What does the frontend work look like while the UI design is still being worked on? Is this like a "wiring up" of the initial boilerplate/wireframe that gets put up?

Thanks again!

Mike Coutermarsh Author

❤️✨

It depends on how design-heavy the feature is. Usually we have low-fidelity sketches done during planning. Then we have our own design system, Primer, that we can copy/paste components from to build out the page. We can get really far with that alone.

This page for example: github.com/actions/starter-workflo...

I remember when we started it, we had some rough design ideas. We were confident we'd be showing "boxes" with workflows in them. So one engineer got started by passing an array of fake data to the frontend, and then looping over it to render a partial for each item in the array.

As this was happening, the designer was iterating on what the final page should look like. Once they had that, we were able to go in and adjust that partial/layout to look how the designer wanted.
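The "array of fake data plus a partial" approach can be sketched as follows. GitHub's app uses Ruby on Rails view partials; this Python version is only an analogy, and `FAKE_WORKFLOWS`, `workflow_box`, and `starter_workflows_page` are hypothetical names with made-up data.

```python
from html import escape

# Hypothetical fake data standing in for the real backend query.
FAKE_WORKFLOWS = [
    {"name": "Example workflow 1", "description": "Build and test a project"},
    {"name": "Example workflow 2", "description": "Publish a package"},
]


def workflow_box(workflow: dict) -> str:
    """The "partial": markup for one box. Its layout can be restyled later
    to match the final design without touching the page or the data."""
    return (
        '<div class="box">'
        f"<h3>{escape(workflow['name'])}</h3>"
        f"<p>{escape(workflow['description'])}</p>"
        "</div>"
    )


def starter_workflows_page(workflows: list[dict]) -> str:
    # Loop over the array and render the partial once per item.
    boxes = "".join(workflow_box(w) for w in workflows)
    return f"<section>{boxes}</section>"


print(starter_workflows_page(FAKE_WORKFLOWS))
```

Because the page only knows about "a list of items rendered through one partial," the fake list can later be replaced with real data and the partial with the designer's markup, independently of each other.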

My recommendation for people would be to always start on the piece you feel you have enough information to be confident in implementing (even if it looks terrible first pass). Just getting fake data on the page can sometimes be more work than we anticipate, so getting that out of the way and working is usually a nice win and saves time later.

Bruno Paz

Great article!

What about improvements to existing features? Do you also use feature flags the same way?

And regarding Pull requests, if you work incrementally doing very small changes, while each PR might be much easier to digest, there will be many of them per day.

Can you talk a bit about how your code review process is organized so that the amount of PRs waiting for review doesn't become a bottleneck?

Thank you! ;)

Mike Coutermarsh Author

What about improvements to existing features? Do you also use feature flags the same way?

Yes, we often do. It really depends on the risk and is a judgement call for the team.

A recent example: we added a "filter bar" to the Actions tab.

We feature flagged this one because it can generate complex queries and the risk is it could cause performance problems when used against huge data sets. We also wanted to be able to flag in specific people and have them test the functionality before rolling out to everyone.

Can you talk a bit about how your code review process is organized so that the amount of PRs waiting for review doesn't become a bottleneck?

Yup! A tip I learned from @andreasklinger a few years ago. When you work on a remote team you need everyone to dedicate time to doing code reviews. Unblocking each other is a priority, so most people will dedicate time in the AM (or whenever works for them) each day and they'll go through all the PRs and review them.

I find that when starting these projects, the first few PRs usually take a little more effort to review because we're setting the foundation for the feature. But after that, things are small incremental improvements to the base. Since people are familiar with the area of code, they get pretty quick at reviewing.

We rarely see anything go unreviewed for more than 24 hours. It helps that we are all on the same project and are motivated to get it done. If someone started a PR but is now outside of their working hours, we'll often have someone else push commits to it to finish it up and ship it. Seeing multiple people's commits on a single PR is pretty common.

One more thought on being "unblocking". We almost never block a PR unless we see something in the code that would cause a huge problem. We'll approve it, and leave "non-blocking notes" for things to improve. Then it's up to the submitter to take that feedback and apply it if they want. This reduces the amount of back and forth (critically important when in different timezones).

Bruno Paz

Thanks for the response. Perfectly clear ;)

Just one more question about feature flags.

How do you manage the cleanup process of older feature flags that are no longer needed, for example after the feature is generally available in production?

Mike Coutermarsh Author

We're not great at that :). We often leave them in code for a couple months. Then we file issues to remove them and people will pick those up when they're looking for a quick/easy task to do.
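Filing those cleanup issues is a manual step. One hypothetical way to make it cheaper is a small script that lists flags that have been fully enabled for a while; the `FlagStatus` record, the sample flags, and the 60-day threshold (roughly "a couple months") are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class FlagStatus:
    """Hypothetical record of a flag's rollout state."""
    name: str
    fully_enabled_since: Optional[date] = None  # None until rolled out to 100%


def cleanup_candidates(flags: List[FlagStatus], today: date,
                       min_age: timedelta = timedelta(days=60)) -> List[str]:
    """Names of flags that have been on for everyone for at least `min_age`."""
    return sorted(
        f.name
        for f in flags
        if f.fully_enabled_since is not None
        and today - f.fully_enabled_since >= min_age
    )


flags = [
    FlagStatus("filter_bar", fully_enabled_since=date(2020, 1, 6)),
    FlagStatus("new_feature"),  # still rolling out
    FlagStatus("recent_flag", fully_enabled_since=date(2020, 3, 1)),
]
for name in cleanup_candidates(flags, today=date(2020, 3, 15)):
    print(f"Remove feature flag: {name}")  # e.g. the title of a cleanup issue
```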

Maelig

Thanks for your sharing, it gives me hope that one day, I'll work in teams like that, maybe in 20 years :)

I have a question about production visibility: can you describe (or maybe you've talked about it elsewhere, or will) how your "feature flags" work? They're the main thing that lets you push code to production without giving access to "real" users.

Thanks ;)

Mike Coutermarsh Author

:)

We use Flipper + some custom UI built internally. It works for Ruby apps.

If you're not using Ruby, I recommend looking at launchdarkly.com.

Toni Van de Voorde

Great post and this is exactly how we are working at Adsdaq too. Since we introduced feature flags we not only can deliver/deploy fast but we completely eliminated our "long-living" branches which are a real pain.

There is one thing not covered here though which is very important to know when starting this direction and that is "backwards compatibility" ... Sometimes you need to change an existing feature which impacts something either in the api or the database. Since you are deploying and hiding with a feature flag, changes to those things must make sure the existing stuff keeps working! This can sometimes be very challenging. But not doing it immediately breaks the "deploy quickly and often" paradigm.
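A backwards-compatible API change behind a flag is usually additive: the existing response shape is never altered, and the new data only appears for flagged-in users. A minimal Python sketch, where `serialize_run` and the `status`/`conclusion` field names are hypothetical:

```python
def serialize_run(run: dict, new_fields_enabled: bool) -> dict:
    """Hypothetical API serializer for a workflow run.

    The existing "status" field is always returned unchanged, so current
    clients keep working; the new "conclusion" field is only added for
    users in the feature flag.
    """
    payload = {"id": run["id"], "status": run["status"]}
    if new_fields_enabled:
        payload["conclusion"] = run.get("conclusion")
    return payload


run = {"id": 1, "status": "completed", "conclusion": "success"}
print(serialize_run(run, new_fields_enabled=False))  # old shape
print(serialize_run(run, new_fields_enabled=True))   # old shape + new field
```

Removing or renaming `status` would be the breaking version of this change; it can only happen once no clients depend on it.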

Ben Rometsch

How are you guys managing your feature flags? We Open Sourced our Feature Flag platform Bullet Train last year and would love any feedback you have.

Do you think it's better to roll your own feature flag system or rely on a third party?

Mike Coutermarsh Author

Haven't heard of Bullet Train, looks nice!

We use Flipper, which is open source: github.com/jnunemaker/flipper. We have a custom UI on top of it for administration.

I'd always use something open source or paid. Building one doesn't seem like a good use of time when there are so many great OSS ones out there.

Yo

I found Unleash - Open source enterprise ready feature toggle service

unleash.github.io

Serhii Ohorodnyk

Really good post! Thank you for your effort.
I have a question about feature flags in the backend. What if your new feature requires relatively big or significant database migrations? How does that impact the process? Do you maintain two data structures (old and new) until the feature is done?

Mike Coutermarsh Author

That really depends on the migration + feature.

We can never take downtime at GitHub. So when making any DB change, it has to work with both the old and new code.

A recent example: we just had a feature that required us to move data to a new table. The process was like this:

  • Create the new table (no code changes)
  • Change writes to both the old and new table
  • Update the new feature to read from the new table (feature is behind a flag, so OK if data is incomplete)
  • Run backfill, transferring old data to the new table (slowly over a week)
  • Change old feature to read from the new table
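The steps above follow the expand-and-contract (dual-write) pattern. Here is a minimal Python sketch using an in-memory SQLite database; the table names, columns, batch size, and helper functions are hypothetical, and a real backfill at GitHub's scale would run as a throttled background job against the production database rather than a simple loop.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE old_settings (user_id INTEGER PRIMARY KEY, value TEXT)")
    # Rows that existed before the migration started.
    conn.executemany("INSERT INTO old_settings VALUES (?, ?)", [(1, "a"), (2, "b")])

# Step 1: create the new table (no application code changes yet).
with conn:
    conn.execute("CREATE TABLE new_settings (user_id INTEGER PRIMARY KEY, value TEXT)")


# Step 2: writes go to both the old and the new table in one transaction.
def write_setting(user_id: int, value: str) -> None:
    with conn:
        conn.execute("INSERT OR REPLACE INTO old_settings VALUES (?, ?)", (user_id, value))
        conn.execute("INSERT OR REPLACE INTO new_settings VALUES (?, ?)", (user_id, value))


# Steps 3 and 5: readers switch to the new table (the new feature first,
# behind its flag, then the old feature once the backfill is done).
def read_setting(user_id: int):
    row = conn.execute(
        "SELECT value FROM new_settings WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


# Step 4: copy old rows over in small batches, pausing between batches so
# the backfill doesn't overload the database.
def backfill(batch_size: int = 100, pause_seconds: float = 0.0) -> None:
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT user_id, value FROM old_settings WHERE user_id > ? "
            "ORDER BY user_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        with conn:
            # OR IGNORE: a row that was already dual-written is at least as
            # fresh as the copy just read, so never overwrite it.
            conn.executemany("INSERT OR IGNORE INTO new_settings VALUES (?, ?)", rows)
        last_id = rows[-1][0]
        time.sleep(pause_seconds)


write_setting(3, "c")                    # new write lands in both tables
print(read_setting(1))                   # None: not backfilled yet (OK behind a flag)
backfill(batch_size=1)
print(read_setting(1), read_setting(3))  # a c
```

Using `INSERT OR IGNORE` in the backfill (rather than overwriting) is what makes it safe to run while dual writes are live.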