amu

Posted on Jul 5, 2023

How to Study Large Code Bases?

#programming #tutorial #learning #softwareengineering

I remember 5-6 years back, I was working on a large code base for the first time. The project had a short deadline and our task was to literally refactor the whole repository. I freaked out the first week and messaged our tech lead to ask for advice on how to tackle this code! I was SO STRESSED.
Since then, over the years, I’ve tried different ways to face these initially intimidating code swarms! I am sure they will evolve further over time, but I want to share them now; maybe they will be useful to somebody. Frankly, now I enjoy large, challenging code bases A LOT.

The fundamental statement is this: for a project that a team of devs has worked on for hundreds of thousands of hours, you just can’t learn the whole code base all at once. So we have to find ways to do it bit by bit, much like divide-and-conquer algorithms.

Here are a bunch of ideas I have used that you can mix and match to your liking:

Following the Footsteps

A huge code base has grown over time. You face the end product, but there’s a history to it. The best way to go about this is to use the version control system and read the change history at the most granular level possible, commit by commit. If the history is healthy (small, well-described commits), following the micro-changes won’t be intimidating. Moreover, since each change records its author, you can ask the original authors questions too.
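As a rough sketch of what reading the history can look like, the Python snippet below parses `git log` output into records and orders them oldest first, so a file’s story reads from the beginning. It assumes the project uses Git. The `Commit`, `parse_log`, and `file_history` names, the tab-separated format string, and the example path are my own choices for illustration.

```python
import subprocess
from typing import List, NamedTuple


class Commit(NamedTuple):
    short_hash: str
    author: str
    date: str
    subject: str


# Tab-separated fields: abbreviated hash, author name, author date, subject line.
LOG_FORMAT = "%h%x09%an%x09%ad%x09%s"


def parse_log(text: str) -> List[Commit]:
    """Turn tab-separated `git log` output into Commit records, oldest first."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any tabs inside the subject line intact.
        short_hash, author, date, subject = line.split("\t", 3)
        commits.append(Commit(short_hash, author, date, subject))
    # git prints newest first; reverse so the history reads chronologically.
    commits.reverse()
    return commits


def file_history(path: str, repo_dir: str = ".") -> List[Commit]:
    """Run `git log` for one file (following renames) and parse the result."""
    result = subprocess.run(
        ["git", "log", "--follow", "--date=short",
         f"--pretty=format:{LOG_FORMAT}", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


# Example (hypothetical path; run inside the repository you are studying):
#     for c in file_history("src/orders.py"):
#         print(c.date, c.short_hash, c.author, c.subject)
```

Once a commit looks interesting, `git show <hash>` displays its full diff, and the author field tells you whom to ask.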

Do Small Tasks

Most likely, when the code is huge, you are not expected to know all of it. You specialize in parts of it and increase your domain of knowledge gradually. So for step 0, a good way would be choosing a small task in isolated components of the system (as opposed to a task that involves knowledge of multiple parts).
In a company, this happens naturally when you join a team. For an open-source project, the closest equivalent is “low-hanging fruit” issues, which are basically beginner-friendly problems.

Ask Specific Edge Case Questions. Be edgy.

Asking tricky questions is truly a skill. As you practice, it becomes really enjoyable to use your x-ray vision for edge cases. The edgier the question, the more knowledge gained.

Example: Let’s say you want to learn about a class in the code.

Here is the “vanilla” question:

What are the attributes and the methods?

Here’s the edgy version:

What is the public interface?
How frequently, and in what situations, is the interface used?
What are the classes that this class is connected with? (association/inheritance/aggregation/any connection)
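Some of these edgier questions can be partly answered by the code itself. Here is a minimal sketch, assuming the code base is in Python. The `Account` and `SavingsAccount` classes and the `describe` helper are hypothetical and exist only for the demo. It lists a class’s public methods and its ancestors using the standard `inspect` module. It cannot tell you how often the interface is used; for that you still need a text search or your IDE’s “find usages”.

```python
import inspect


class Account:  # hypothetical example class
    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._balance = 0

    def deposit(self, amount: int) -> None:
        self._balance += amount

    def balance(self) -> int:
        return self._balance

    def _audit(self) -> None:  # private by convention (leading underscore)
        pass


class SavingsAccount(Account):  # hypothetical subclass
    def add_interest(self, rate: float) -> None:
        self._balance += int(self._balance * rate)


def describe(cls: type) -> dict:
    """Summarize the public methods and the ancestors of a class."""
    public_methods = sorted(
        name
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    )
    # The MRO lists cls first and `object` last; drop both to keep real ancestors.
    ancestors = [base.__name__ for base in inspect.getmro(cls)[1:-1]]
    return {"public_methods": public_methods, "ancestors": ancestors}


print(describe(SavingsAccount))
# {'public_methods': ['add_interest', 'balance', 'deposit'], 'ancestors': ['Account']}
```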

Here’s an even edgier version:

What’s the runtime flow of this object?
- Who creates this object?
- When is it created?
- How long is it alive?
- When does it die?

We can get even edgier:

With my current knowledge, what criticisms do I have about the class? How would I make it better?
- Are the public interface and the private members properly laid out?
  - Are they cohesive enough?
  - Are they reasonably coupled to the outside?
- Is the abstraction okay?
  - Does this class have things that could be moved to higher abstractions (for example, to the base class or to other interfaces)?
  - Should I break this class down into smaller objects? In other words, does it obey the single responsibility principle?
- How would I improve the naming of attributes and methods within the class?

(Note that you probably won’t literally refactor the class. But when you have a vision of improvement, your mind dissects the current state of the class pretty well, and then provides further ideas on how to make it even better.)
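The runtime-flow questions (who creates the object, when, and how long it lives) can also be checked empirically. The sketch below is one possible approach in Python, not something taken from any particular code base. A hypothetical `track_lifetime` decorator records the calling function at construction time and uses `weakref.finalize` to note when the object is collected. `Session` and `handle_request` are made-up names.

```python
import inspect
import time
import weakref

events = []  # (event, class name, detail) tuples, in order of occurrence


def track_lifetime(cls):
    """Class decorator: record who creates instances and when they die."""
    original_init = cls.__init__

    def traced_init(self, *args, **kwargs):
        # stack()[1] is the Python frame that called the constructor.
        caller = inspect.stack()[1].function
        born = time.monotonic()
        events.append(("created", cls.__name__, f"by {caller}"))

        def on_death():
            lifetime = time.monotonic() - born
            events.append(("died", cls.__name__, f"after {lifetime:.3f}s"))

        # The callback closes over plain values only, never over self,
        # so it does not keep the object alive.
        weakref.finalize(self, on_death)
        original_init(self, *args, **kwargs)

    cls.__init__ = traced_init
    return cls


@track_lifetime
class Session:  # hypothetical class under study
    def __init__(self, user: str) -> None:
        self.user = user


def handle_request() -> None:
    session = Session("amu")
    # In CPython the object is freed when this function returns, because
    # its last reference disappears; other implementations may free it later.


handle_request()
for event in events:
    print(event)
```

The first event printed shows that `handle_request` created the `Session`; the second shows how long it stayed alive. On a real class, this quickly reveals unexpected creators or objects that live longer than you assumed.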

Draw Diagrams

Diagrams are underrated. It’s shocking to me how under-used they are when people explain how code works online. The standard diagrams in the software field follow a modeling language called UML (Unified Modeling Language). UML provides a bunch of diagram types, each suited to a different situation: sequence diagrams, class diagrams, use-case diagrams, deployment diagrams, collaboration diagrams, and many more. If you want to learn about UML, Martin Fowler has a great short book on it called UML Distilled (available via his website or Amazon).
You don’t have to use UML. You can use any visual way that comes to your mind. UML is just a standard language that engineers know all around the world.
As for the medium, you can use paper, but it’s not a very modification-friendly choice. I personally like to use draw.io, an open-source, free, online diagram platform. There are a bunch of paid ones available too.

Use Breakpoints

If you want to dive into the runtime flow of the code, breakpoints are a huge help. The details depend on your IDE and your domain of software (web, desktop, etc.), but the concept is the same. The most primitive way is to use logs. But oftentimes, breakpoints give you much more information: the call stack, variable values, etc.
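To make this concrete, here is a small Python sketch contrasting the two approaches. The `apply_discount` and `checkout` functions and the `pause` flag are hypothetical. Python’s built-in `breakpoint()` drops into the `pdb` debugger by default; other languages and IDEs have their own equivalents.

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(funcName)s: %(message)s")
log = logging.getLogger(__name__)


def apply_discount(price: float, percent: float, pause: bool = False) -> float:
    """Return price reduced by percent; hypothetical function under study."""
    # Primitive approach: a log line shows only what you thought to print.
    log.debug("price=%s percent=%s", price, percent)

    if pause:
        # Debugger approach: execution stops here and you can inspect anything.
        # In pdb try: `where` (call stack), `p price`, `n` (next), `c` (continue).
        breakpoint()

    discounted = price * (1 - percent / 100)
    return round(discounted, 2)


def checkout(prices: list, percent: float) -> float:
    """Sum the discounted prices; gives the call stack a second level."""
    return sum(apply_discount(p, percent) for p in prices)


print(checkout([100.0, 50.0], 10))  # prints 135.0
```

Calling `apply_discount(100.0, 10, pause=True)` from a terminal stops inside the function, where `where` shows how execution got there. A log line alone would not give you that.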

Explain the Code to Others

This is a huge, huge help. Learning by explaining has been researched a lot and is generally reported to be highly effective (see Google Scholar).

Edgy question: to whom?

To the authors of the code itself, if they’re on your team.
- They can correct you if you’re wrong, because they literally wrote the code.
- They can provide the perspective on why they coded the way they did in the first place. The knowledge of the context helps a lot. For example, the code may be dirty, but the context might be that they didn’t have enough time. So it’s natural for you to not understand the code easily.
If you don’t have anyone around, you can talk about the code out loud, as if you were recording a video. Even better if you can get a friend to listen; maybe they will gain something out of it too.
I use this a lot: Intentionally explain the code to someone who has no idea about the project and ask them how much they understand your lecture. This provides insights in 2 ways:
- Can you improve the way you communicate technically?
- Did you understand the code properly in the first place?

Mix and Match

You can mix and match these ideas in any way you can imagine. For example, you can design some really hard questions, answer them by drawing specific diagrams, and then present those diagrams to a friend.

That’s it for now

I will probably add more to this article later. I hope it helps someone out there, because I personally really struggled a lot in the beginning.


DEV Community
