Priyanshu Verma

Posted on Jun 19

How to Read Open Source Code, Python Edition

#programming #opensource #learning #python

Hi,

Today I was reading an open-source project that I found interesting, and it made me realize that reading a project is not easy. There can be multiple entry points and exits. Some files may exist for build tooling, testing, or configuration and may not contain information that is immediately useful when trying to understand how the project works.

As a beginner in open source, it is important to know how to read and understand a project before contributing. This helps you build a mental model of the codebase, understand how components interact, and identify which files need to be modified when adding a feature or fixing a bug.

Let's start with the basics.

This time we are going to read a Python package. Python projects are generally easier to navigate because they often have a straightforward and descriptive file structure.

When opening a Python repository, the first thing I look for is the root-level files:

project/
├── README.md
├── pyproject.toml
├── requirements.txt
├── LICENSE
├── tests/
└── some_lib/

The README is often the best place to start. It usually provides an overview of the problem the project is trying to solve, how the library or application is intended to be used, and the main features it offers.

Most maintainers include usage examples, and these examples often reveal the public API of the package and the common workflows users are expected to follow.

In many projects, you may also find additional documentation such as contribution guides, development setup instructions, or architecture documents. Spending a few minutes reading these files can provide valuable context and make the source code much easier to understand before diving into the implementation.

Find the Package Directory

The package directory usually has the same name as the library:

requests/
numpy/
fastapi/
your_package/

This folder contains the actual source code.
If you don't see a folder like these and there is an __init__.py file in the root directory, you are already inside the package directory.
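
If the repository has many folders, a few lines of Python can point out which ones are packages. This is only a minimal sketch: find_packages is a helper name I made up, and the temporary folder layout simply mimics the project tree shown earlier.

```python
import tempfile
from pathlib import Path


def find_packages(root):
    """Return the names of top-level folders that contain an __init__.py."""
    return sorted(p.parent.name for p in Path(root).glob("*/__init__.py"))


# Build a tiny fake repository so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "some_lib").mkdir()
    (Path(tmp) / "some_lib" / "__init__.py").touch()
    (Path(tmp) / "tests").mkdir()  # no __init__.py, so not reported
    found = find_packages(tmp)

print(found)  # ['some_lib']
```

On a real checkout you would call find_packages(".") from the repository root. Some projects keep their package inside another folder, so treat an empty result as a hint to look one level deeper.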

Open __init__.py

One of the most useful files for understanding a Python package is __init__.py.
For example:

from .client import Client
from .models import User
from .connection import InternalConnection

__all__ = ["Client", "User"]

This file often acts as the package's public interface. It tells us which classes and functions the author expects users to interact with.
From this single file we can already discover important modules:

client.py
models.py

__all__ is not a keyword. It is a module-level list of names that tells Python what to export when someone uses:

from package import *

In this example only Client and User are exported. InternalConnection is left out.
Without __all__, a star import brings in every name that doesn't start with _. A common misconception is that __all__ makes things private. It does not. Even though only Client and User are exported, you can still reach other classes or functions, such as InternalConnection, with an explicit import.
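
A quick way to convince yourself is to build a throwaway module in memory and star-import it. The module name demo_pkg and its empty classes are made up for this sketch; they only mirror the example above.

```python
import sys
import types

# Create a fake module whose contents mirror the __init__.py example.
demo_pkg = types.ModuleType("demo_pkg")
exec(
    "class Client: ...\n"
    "class User: ...\n"
    "class InternalConnection: ...\n"
    "__all__ = ['Client', 'User']\n",
    demo_pkg.__dict__,
)
sys.modules["demo_pkg"] = demo_pkg

# Run a star import inside a fresh namespace and see what arrived.
namespace = {}
exec("from demo_pkg import *", namespace)
star_names = sorted(name for name in namespace if not name.startswith("__"))
print(star_names)  # ['Client', 'User']

# __all__ does not hide anything from an explicit import.
from demo_pkg import InternalConnection

print(InternalConnection.__name__)  # InternalConnection
```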

When reading real libraries, think of __all__ as the package author saying:

"These are the names I officially support. Everything else is internal implementation detail."

Follow the Public API

Instead of reading every file, start with the objects exposed in __init__.py.
For example, if you see:

from .client import Client

go to client.py and inspect the Client class.

Look for:

Constructor (__init__)
Public methods
External dependencies
Internal modules it imports

This often leads you through the project's main execution path.
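
You can also ask Python itself for this checklist. Here is a small sketch using the standard inspect module. The Client class is a hypothetical stand-in, not code from any real library, and public_methods is a helper name I chose.

```python
import inspect


class Client:
    """Hypothetical stand-in for a class exposed by a library."""

    def __init__(self, url, *, timeout=10):
        self.url = url
        self.timeout = timeout

    def search(self, query):
        return []

    def _connect(self):
        pass


def public_methods(cls):
    """List the methods of cls whose names do not start with an underscore."""
    members = inspect.getmembers(cls, inspect.isfunction)
    return [name for name, _ in members if not name.startswith("_")]


print(inspect.signature(Client.__init__))  # (self, url, *, timeout=10)
print(public_methods(Client))  # ['search']
```

In a real project you would import the class from the package instead of defining it yourself.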
Sometimes you will see * in __init__ or in other function signatures. If you know Python well you already understand what that means, but since this post is for beginners, let me explain.
There are three ways you will find * used:

The first is a bare *, like this example:

def greet(name, *, age):
    print(name, age)

In this case we can call the function as greet("Priyanshu", age=18) but not as greet("Priyanshu", 18). A bare * marks every argument after it as keyword-only. This makes calls more readable and less error-prone.
That is why many large libraries define their functions this way.

The second form looks like this:

def func(*args):
    print(args)  # args is a tuple of the extra positional arguments

Here * means "collect the extra positional arguments into a tuple". It is used when a function doesn't know how many arguments will be passed, so *args gathers them all into a tuple. You will also find * used when calling a function:

nums = [1, 2, 3]
add(*nums)  # same as add(1, 2, 3)

Here * unpacks the list into separate positional arguments (assuming add accepts three of them), which is convenient when the values are already in a list or tuple.

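The last form looks like this:

def func(**kwargs):
    print(kwargs)  # kwargs is a dictionary of the extra keyword arguments

Here ** collects keyword arguments as key-value pairs into a dictionary, the same way *args collects positional arguments into a tuple.
When calling a function, ** unpacks a dictionary into keyword arguments. Note that the function must accept those names, so print(**kwargs) would fail here because print has no name or age parameter.

def greet(name, age):
    print(name, age)

kwargs = {
    "name": "Priyanshu",
    "age": 18,
}
greet(**kwargs)
# or like this
greet(name="Priyanshu", age=18)

Both calls are equivalent, and both styles are common.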
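
To see where each argument ends up when these forms are combined, here is a small sketch. The function describe and its parameter names are made up for illustration.

```python
def describe(first, *args, flag=False, **kwargs):
    """Return a dictionary showing where each argument landed."""
    return {"first": first, "args": args, "flag": flag, "kwargs": kwargs}


result = describe(1, 2, 3, flag=True, color="red")
print(result)
# {'first': 1, 'args': (2, 3), 'flag': True, 'kwargs': {'color': 'red'}}
```

Because flag comes after *args, it can only be passed by keyword, just like age in the greet example.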

Following Imports

After understanding the public API, the next thing I usually do is follow imports.
Almost every file imports something from somewhere else. These imports are like roads connecting different parts of the project. If you can follow these roads, you can slowly understand how the whole system works.

For example, suppose you open client.py and see something like this:

from .database import Database
from .embeddings import Embedder

This immediately tells us that the Client class probably depends on a Database and an Embedder. We don't know exactly what they do yet, but now we have some direction. Instead of reading random files, we can follow these imports and see where they lead.

As you continue doing this, a picture starts forming in your head. You begin to understand which files are responsible for storing data, which files handle business logic, and which files simply provide utility functions.

One thing I learned while reading open-source projects is that codebases are not just collections of files. They are graphs. Every import creates a connection between two parts of the system. The more connections you follow, the clearer the architecture becomes.
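
If you want to see these connections without opening every file, Python's standard ast module can list the imports of a file without running it. This is a rough sketch: list_imports is a helper name I chose, and the source string simply reuses the client.py example plus one standard-library import.

```python
import ast

# Pretend this is the content of client.py.
source = """
from .database import Database
from .embeddings import Embedder
import json
"""


def list_imports(code):
    """Return the module names imported by the given source code."""
    found = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.level counts the leading dots of a relative import.
            found.append("." * node.level + (node.module or ""))
    return found


print(list_imports(source))  # ['.database', '.embeddings', 'json']
```

Names that start with a dot point to other files in the same package, so those are the roads worth following first.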

Understanding Object Creation

At some point you will encounter a class that creates other objects inside its constructor.

For example:

class Client:
    def __init__(self):
        self.db = Database()

When I see code like this, I immediately open database.py.

The reason is simple. The Client class is telling us that it depends on a Database object. If we want to understand how the Client works, we also need to understand what the Database is doing.

This process repeats again and again. One file leads to another file, which leads to another file. Gradually you start discovering the major components of the project and how they communicate with each other.

Finding Where Execution Starts

A common question beginners have is: "Where does the project actually start?"
The answer depends on the project, but every application has an entry point somewhere.
Sometimes it is a main.py file. Sometimes it is a CLI command. Sometimes it is a web server startup script.

You may come across something like this:

def main():
    app.run()

if __name__ == "__main__":
    main()

This is often a good place to begin tracing execution because it shows what happens when the program starts.

Libraries are slightly different. Instead of looking for a main function, you usually start from the public API exposed through __init__.py and follow the path from there.

Following a Single Feature

When I first started reading open-source projects, I tried to understand everything at once. That approach never worked.
A better approach is to pick a single feature and follow it from beginning to end.
Suppose the package exposes a search function:

client.search("python")

Start by locating the implementation of the search method and follow its execution flow. As you move through the code, observe which functions are called, how data moves between components, whether a database is involved, whether embeddings are generated, and how the results are processed before being returned.

Following a single execution path naturally reveals the most important parts of the project and how they work together. This approach is much easier to manage than trying to understand the entire codebase at once.
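
Here is a toy version of that trace. Everything in it is hypothetical: the Embedder, Database, and Client classes are tiny stand-ins I wrote to show the shape of the path, not how any real library implements search.

```python
class Embedder:
    def embed(self, text):
        # Toy "embedding": the length of each word.
        return [len(word) for word in text.split()]


class Database:
    def __init__(self):
        self.rows = {"python": "A programming language"}

    def query(self, key):
        return self.rows.get(key)


class Client:
    def __init__(self):
        self.db = Database()
        self.embedder = Embedder()

    def search(self, text):
        vector = self.embedder.embed(text)  # step 1: text becomes numbers
        hit = self.db.query(text)           # step 2: the database is asked
        return {"query": text, "vector": vector, "result": hit}  # step 3


client = Client()
print(client.search("python"))
# {'query': 'python', 'vector': [6], 'result': 'A programming language'}
```

Reading search top to bottom tells you which two files to open next, and in which order.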

Reading Configuration Files

Once I have a rough understanding of the source code, I usually go back and look at configuration files.
One of the most useful files in modern Python projects is pyproject.toml.
At first glance it may look like a boring configuration file, but it often contains a lot of useful information.
It can tell you the package name, version, dependencies, build system, and sometimes even command line entry points.
If you see dependencies such as FastAPI, SQLAlchemy, or Pydantic, you can already make educated guesses about the architecture of the project before reading more code.

These clues become surprisingly useful when navigating larger repositories.

Do Not Ignore Tests

For a long time I completely ignored test files because I thought they were only useful for maintainers.
That was a mistake.
Tests are often some of the easiest files to read in a project because they show how the author expects the library to be used.
A complex implementation might take hundreds of lines to understand, but a test can often explain the same behavior in ten lines.
Many times I have understood a feature faster by reading its tests than by reading the implementation itself.

So whenever you feel lost, open the test directory and see how the library is being used there.
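
To show what I mean, here is the kind of test you might find. The Client class and its search behavior are invented for this sketch; in a real project the tests would import the class from the package.

```python
class Client:
    """Hypothetical stand-in for the library class under test."""

    def __init__(self):
        self._docs = ["python tips", "rust tips"]

    def search(self, query):
        return [doc for doc in self._docs if query in doc]


def test_search_returns_matching_documents():
    client = Client()
    assert client.search("python") == ["python tips"]


def test_search_with_no_match_returns_empty_list():
    assert Client().search("go") == []


# A test runner such as pytest would discover these; here we call them directly.
test_search_returns_matching_documents()
test_search_with_no_match_returns_empty_list()
```

Even without reading the implementation, the two test names and assertions tell you what search accepts, what it returns, and what happens when nothing matches.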

Using Your Editor as a Navigation Tool

Modern editors make reading code significantly easier.

When you see a function call, use "Go to Definition". When you see a class name, jump to its implementation. When you want to know where something is used, use "Find References".
These features save an enormous amount of time because you do not have to manually search through dozens of files.

Over time, navigating a codebase becomes less about scrolling through folders and more about jumping directly between related pieces of code.
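
When an editor is not at hand, you can ask the same "where is this defined?" question from a Python prompt. This sketch uses the standard inspect module on json.dumps from the standard library, only because it is available everywhere.

```python
import inspect
import json

# Which file defines json.dumps, and on which line does it start?
source_file = inspect.getsourcefile(json.dumps)
lines, first_line = inspect.getsourcelines(json.dumps)

print(source_file)
print(first_line)
print(lines[0].rstrip())
```

This works for code written in Python. For functions implemented in C, inspect cannot show the source.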

Final Thoughts

Reading code is a skill that improves through repetition.
The first few projects you read will probably feel confusing. There will be unfamiliar files, unfamiliar patterns, and many moments where you have no idea what is happening.

That is completely normal.

The goal is not to understand every line of code. The goal is to build enough context to understand how the project works, how data moves through the system, and which files are responsible for a particular feature.

Once you reach that point, the project starts feeling much smaller. Instead of seeing hundreds of unrelated files, you begin to see a set of connected components working together to solve a problem.

Reading code also makes you a much better developer. Every project teaches you something new. Sometimes you discover a cleaner way to structure code. Sometimes you learn a pattern or technique you had never seen before. Other times you may realize that a certain implementation can be improved and start thinking about alternative approaches yourself.

This is one of the biggest benefits of reading open-source projects. You are not only learning how a specific project works, but also how experienced developers think, organize code, and solve problems.

And that is usually the point where contributing stops feeling intimidating and starts becoming enjoyable. Instead of just consuming code, you start learning from it, questioning it, and eventually improving it.

AND remember this is Priyanshu Verma