Carving the Path to Modularity: A Lobzik Tool Case Study on the ProtonMail Android App

Mikhail Levchenko — Sat, 01 Jul 2023 15:39:07 +0000

Premature and overdue modularisation problems

When you begin building your project, there may be a strong temptation to modularise it right from the start in order to save costs down the line. However, I believe that premature modularisation not only drains your resources but can also hinder the long-term success of your project. At the early stages, your product vision may not be fully formed, and it can undergo significant changes. The module boundaries that you establish initially can quickly become outdated as the project evolves through numerous iterations and achieves success.

However, as your team grows and the number of features starts to accumulate, the tech debt of modularisation can send a chill down your spine. You begin to notice an increase in git conflicts and heisenbugs caused by a lack of clear separation of concerns within your application. Eventually, you reach a point where you declare "enough is enough" and decide to modularise your monolith. But the question remains: where do you begin unraveling this tangled mess? You don't want to stop developers from pumping out new features for your project, so you need to pinpoint the most impactful areas that can be extracted with minimal effort. But how can you achieve this without spending a lot of time delving into the void your codebase has become?

Introducing Lobzik: The Modularisation Toolkit

Having been tasked with modularising the codebase of my work project, a whopping 200kloc monolith, I embarked on a quest to find a way to reason about modularising this chonky boy. Being an enthusiast of graphs, I was interested in the network of dependencies within the monolith. Soon enough, I discovered that this network is a good fit for community detection algorithms, which could reveal structures that looked like modules. After weeks of experimentation, I devised a way to extract the dependency graph, selected the most suitable community detection methods, and came up with the tricks needed to yield optimal results.

These findings led to the birth of my pet project: the Lobzik Gradle Plugin. Rather than relying on complex GUI graph toolkits like Gephi or spinning up Jupyter Notebooks filled with NetworkX Python code, you can integrate my tool into your build pipeline. Lobzik provides guidance, pointing you towards the optimal path for modularising your project. However, the tool takes some knowledge to operate, so let this article serve as your guide to using it correctly.

Applying Lobzik to the ProtonMail Android App

As the reference project, I've chosen the ProtonMail Android App, which is one of the largest open-source Android apps that has not been modularised yet. With over 50kloc of Kotlin (about 65kloc including Java) in the main module, it truly represents a monolith that is worth modularising.

cloc app/src/main --include-lang=Kotlin,Java

github.com/AlDanial/cloc v 1.96  T=0.39 s (2388.4 files/s, 249735.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Kotlin                         766           8867          16675          52105
Java                           156           2371           3381          13006
-------------------------------------------------------------------------------
SUM:                           922          11238          20056          65111
-------------------------------------------------------------------------------

Setting up Lobzik

To start using Lobzik, we need to apply the xyz.mishkun.lobzik plugin in the root build.gradle.kts file:

plugins {
   // ...
   id("xyz.mishkun.lobzik") version "0.6.0"
}

Then, we can set up the basic configuration in the same build.gradle.kts file as shown below:

lobzik {
    monolithModule.set(":app")
    packagePrefix.set("ch.protonmail.android")
    variantName.set("betaDebug")
}

Here, we set the name of our monolith module (notice the ":" in the module name!), the name of the variant we will be analyzing, and the package prefix of our classes. With this configuration, only the code in packages starting with ch.protonmail.android inside the :app module will be checked, using the betaDebug variant. This is crucial for the tool to work, because we don't want library dependencies and the Kotlin standard library cluttering our dependency graph.
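To make the effect of packagePrefix more concrete, here is a minimal Kotlin sketch of the kind of filtering described above. It is not Lobzik's actual implementation: the Dependency type, the keepInternal function and the sample class names are hypothetical and exist only for illustration.

```kotlin
// Hypothetical model of one "class A references class B" edge.
data class Dependency(val from: String, val to: String)

// Keep only the edges whose two endpoints both live under the package prefix.
fun keepInternal(edges: List<Dependency>, prefix: String): List<Dependency> =
    edges.filter { it.from.startsWith(prefix) && it.to.startsWith(prefix) }

fun main() {
    val edges = listOf(
        Dependency("ch.protonmail.android.Foo", "ch.protonmail.android.Bar"),
        Dependency("ch.protonmail.android.Foo", "kotlin.collections.List"),
        Dependency("ch.protonmail.android.Bar", "okhttp3.OkHttpClient")
    )
    // Only the Foo -> Bar edge survives; library and stdlib edges are dropped.
    keepInternal(edges, "ch.protonmail.android").forEach { println(it) }
}
```

Without such a filter, every class would appear connected to the same handful of library types, which would blur the structure we are trying to detect.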

Running Lobzik for the first time

Now that we are all set up, we can run Lobzik for the first time using the command:

./gradlew lobzikReport

If everything was set up correctly, you will find the build/reports/lobzik/analysis/report.html file in your project root. Now let's take a closer look at how to interpret this report.

Interpreting the Lobzik Report

The Lobzik report consists of four sections:

Core Candidates
Monolith Modules Table
Module Graphs
Whole Graph

Monolith Modules Table

The first thing that catches our eye is the Monolith Modules Table. It lists all of the modules detected by Lobzik. They can be sorted by several metrics: conductance, cut and monolithCut.

The conductance score is the core metric of this part of the report, as it indicates the benefit-to-effort ratio of extracting modules. A lower score is preferable, with a score of 0 indicating that extracting the module requires virtually no effort since it has no dependencies on other modules.

The cut and monolithCut scores show us how many dependencies have to be broken to extract the module. They help refine the estimate of how much effort the extraction will take.
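For intuition, the sketch below computes a cut and a conductance score using the textbook graph-theory definitions on an undirected graph: the cut counts edges that cross the module boundary, and conductance divides the cut by the smaller of the two "volumes" (sums of degrees) on either side. This is an assumption about what the metrics mean in general, not a copy of Lobzik's code. The article does not define monolithCut, so it is left out, and the class names are made up.

```kotlin
// Undirected dependency edge between two classes (direction ignored for simplicity).
typealias Edge = Pair<String, String>

// Number of edges with exactly one endpoint inside the module.
fun cut(edges: List<Edge>, module: Set<String>): Int =
    edges.count { (a, b) -> (a in module) != (b in module) }

// Sum of degrees of the given nodes: every edge endpoint inside the set counts once.
fun volume(edges: List<Edge>, nodes: Set<String>): Int =
    edges.count { it.first in nodes } + edges.count { it.second in nodes }

// Textbook conductance: cut / min(volume(module), volume(rest)).
// By convention this sketch returns 0.0 when either side has no edges at all.
fun conductance(edges: List<Edge>, module: Set<String>): Double {
    val all = edges.flatMap { listOf(it.first, it.second) }.toSet()
    val smaller = minOf(volume(edges, module), volume(edges, all - module))
    return if (smaller == 0) 0.0 else cut(edges, module).toDouble() / smaller
}

fun main() {
    // Two tightly knit groups of classes joined by a single dependency C - D.
    val edges = listOf(
        "A" to "B", "B" to "C", "A" to "C",
        "D" to "E", "E" to "F", "D" to "F",
        "C" to "D"
    )
    val module = setOf("A", "B", "C")
    println(cut(edges, module))          // 1
    println(conductance(edges, module))  // 1/7, about 0.143
}
```

A module with many internal dependencies and a single outgoing one ends up with a low score, which matches the "lower is better" reading of the table.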

The names of the modules are automatically generated from their classes using the TF-IDF method. Clicking on a module name will take us to the detailed report in the Module Graphs section.
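As a rough illustration of TF-IDF naming, the sketch below treats every module as a "document" made of the camel-case words of its class names and picks the words that are frequent inside the module but rare across modules. The tokenisation, the exact scoring formula and the sample class names are all assumptions made for this example; Lobzik's own implementation may differ.

```kotlin
import kotlin.math.ln

// Split a class name such as "MailboxViewModel" into camel-case words.
fun tokens(className: String): List<String> =
    Regex("[A-Z][a-z0-9]*").findAll(className).map { it.value }.toList()

// For each module (a list of class names), join its highest-scoring TF-IDF words.
fun nameModules(modules: List<List<String>>, top: Int = 2): List<String> {
    val docs = modules.map { classes -> classes.flatMap { tokens(it) } }
    // Document frequency: in how many modules does each word occur?
    val df = docs.flatMap { it.toSet() }.groupingBy { it }.eachCount()
    return docs.map { words ->
        words.groupingBy { it }.eachCount()
            .mapValues { (word, count) ->
                val tf = count.toDouble() / words.size
                val idf = ln(docs.size.toDouble() / df.getValue(word))
                tf * idf
            }
            .entries.sortedByDescending { it.value }
            .take(top)
            .joinToString("") { it.key }
    }
}

fun main() {
    val modules = listOf(
        listOf("MailboxActivity", "MailboxViewModel", "MailboxItemAdapter"),
        listOf("ContactsActivity", "ContactsViewModel", "ContactsGroupAdapter")
    )
    println(nameModules(modules))  // [MailboxItem, ContactsGroup]
}
```

Words such as Activity or ViewModel appear in every module, so their score drops to zero and they never make it into a name.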

Module Graphs

This section contains per-module detailed reports, each presenting three subsections:

Dependency graph of the module and its neighbourhood
List of all classes belonging to this module
List of dependencies that need to be broken to extract this module

Whole Graph

This section at the bottom of the report shows the module dependency graph, which can help identify modules that are easier to extract because they have fewer dependencies on the rest of the project.

The "Star" problem

A careful reader may notice that I have omitted the first section of the report, called Core Candidates. This section is collapsed under a spoiler, but it plays a crucial role in enhancing the report's effectiveness. To fully comprehend its value, let's explore what I refer to as the "star" problem.

Let's consider a scenario where we have a class called ListUtils that contains various list utilities. This class is heavily used throughout our codebase, resulting in numerous connections to other nodes in the network. Due to the high degree of connections, our community detection algorithm of choice, the Louvain method, may mistakenly identify this class as the core of a large community. It's important to note that community detection algorithms were initially designed for social networks, where such hubs represent a significant community led by an outstanding individual.

However, for a codebase modularisation problem this class should be extracted to the core modules. By doing so, we can reveal a better modularisation path for the rest of the code, as depicted in the image above. To assist in visualizing the benefits of extracting such classes, Lobzik offers the ignoredClasses configuration parameter which accepts a list of regexes of class names that should be excluded from the analysis.

lobzik {
    // ...
    ignoredClasses.addAll("^ListUtils$")
}
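Before adding a pattern to ignoredClasses, it can be worth checking which class names it would actually catch. The sketch below assumes that patterns are applied to simple class names with "find" semantics (Kotlin's Regex.containsMatchIn). The article does not say how Lobzik applies them, so treat that as an assumption; the ignored helper and the sample names are hypothetical.

```kotlin
// Returns the names in which at least one pattern finds a match.
fun ignored(names: List<String>, patterns: List<String>): List<String> {
    val regexes = patterns.map { Regex(it) }
    return names.filter { name -> regexes.any { it.containsMatchIn(name) } }
}

fun main() {
    val names = listOf("ListUtils", "ArrayListUtils", "ListUtilsTest", "MailboxActivity")
    // The ^ and $ anchors keep the pattern from catching longer names.
    println(ignored(names, listOf("^ListUtils$")))  // [ListUtils]
    // Without anchors the same text matches anywhere inside the name.
    println(ignored(names, listOf("ListUtils")))    // [ListUtils, ArrayListUtils, ListUtilsTest]
}
```

Under this assumption, an unanchored pattern can silently remove more classes from the analysis than intended.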

But how do we identify such classes, you may ask? These classes are commonly found in the parts of the code responsible for Dependency Injection (DI) and Navigation, as they serve as the glue that connects otherwise loosely coupled feature code. It is a good idea to ignore your Application class and well-known utility classes too. But can we automatically identify more core classes once we have eliminated the obvious ones? This is where the Core Candidates section of the report becomes valuable.

Core Candidates

The Core Candidates section presents a table of the classes that fall in the 95th percentile by the Degree or Authority metric. Thoroughly reviewing this list can help identify the classes that should be excluded from the report. In the case of ProtonMail, the following classes might be considered for elimination:

lobzik {
    // ...
    ignoredClasses.addAll(
        ".*UserManager$",
        ".*Constants$",
        ".*ProtonMailApiManager$",
        ".*Util.*",
        ".*ProtonMailApplication$",
        ".*ResponseBody$",
        "Base.*",
        ".*Module",
        "^Message$",
        "^User$",
        "^ProtonMailApi$"
    )
}
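For intuition about how a table like Core Candidates can be produced, here is a minimal sketch that ranks classes by degree and keeps those at or above a percentile threshold. It uses the nearest-rank percentile method on an undirected graph and covers only Degree, not Authority. Both choices, as well as the function and class names, are assumptions made for this example rather than a description of Lobzik's internals.

```kotlin
import kotlin.math.ceil

// Degree of every class: the number of edges that touch it.
fun degrees(edges: List<Pair<String, String>>): Map<String, Int> =
    edges.flatMap { listOf(it.first, it.second) }.groupingBy { it }.eachCount()

// Classes whose degree is at or above the given percentile (nearest-rank method).
fun coreCandidates(edges: List<Pair<String, String>>, percentile: Double = 95.0): List<String> {
    val degree = degrees(edges)
    if (degree.isEmpty()) return emptyList()
    val sorted = degree.values.sorted()
    val rank = ceil(percentile / 100.0 * sorted.size).toInt().coerceIn(1, sorted.size)
    val threshold = sorted[rank - 1]
    return degree.filterValues { it >= threshold }.keys
        .sortedByDescending { degree.getValue(it) }
}

fun main() {
    // A "star": one utility class referenced by ten otherwise unrelated classes.
    val edges = (1..10).map { "ListUtils" to "C$it" }
    println(coreCandidates(edges))  // [ListUtils]
}
```

The hub of the star is the only class that clears the threshold, which is exactly the kind of class worth reviewing for the ignoredClasses list.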

By eliminating these classes from the report, we can improve our algorithm's performance, measured by the modularity score, going from 0.586 to 0.684. A great improvement! Now we can use the Lobzik report to start extracting each of the 24 detected modules one by one.
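To give a feel for what a modularity score measures, the sketch below computes the standard Newman modularity of a partition on an unweighted, undirected graph: for each community it compares the share of edges that stay inside it with the share expected if edges were placed at random. Lobzik's graph may be directed or weighted, so treat this as an illustration of the metric under simplifying assumptions and not as its exact computation. The function name and the toy graph are made up.

```kotlin
// Newman modularity of a partition of an undirected graph:
// Q = sum over communities of (internalEdges / m) - (degreeSum / 2m)^2
fun modularity(edges: List<Pair<String, String>>, communityOf: Map<String, Int>): Double {
    val m = edges.size.toDouble()
    if (m == 0.0) return 0.0
    val internal = mutableMapOf<Int, Int>()   // edges fully inside each community
    val degreeSum = mutableMapOf<Int, Int>()  // sum of node degrees per community
    for ((a, b) in edges) {
        val ca = communityOf.getValue(a)
        val cb = communityOf.getValue(b)
        if (ca == cb) internal[ca] = (internal[ca] ?: 0) + 1
        degreeSum[ca] = (degreeSum[ca] ?: 0) + 1
        degreeSum[cb] = (degreeSum[cb] ?: 0) + 1
    }
    return degreeSum.keys.sumOf { c ->
        val expected = degreeSum.getValue(c) / (2 * m)
        (internal[c] ?: 0) / m - expected * expected
    }
}

fun main() {
    // Two triangles joined by a single edge C - D.
    val edges = listOf(
        "A" to "B", "B" to "C", "A" to "C",
        "D" to "E", "E" to "F", "D" to "F",
        "C" to "D"
    )
    val split = mapOf("A" to 0, "B" to 0, "C" to 0, "D" to 1, "E" to 1, "F" to 1)
    val lumped = edges.flatMap { listOf(it.first, it.second) }.associateWith { 0 }
    println(modularity(edges, split))   // 5/14, about 0.357
    println(modularity(edges, lumped))  // 0.0
}
```

Putting everything into one community scores zero, while a partition that follows the natural grouping scores higher, which is why a rising modularity score suggests cleaner module boundaries.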

Conclusion

You can find a fork of the ProtonMail client with Lobzik integrated on my GitHub. I hope you will enjoy using Lobzik for modularising your codebase. I encourage you to try it, and don't hesitate to submit any issues to the project's GitHub.

DEV Community: Mikhail Levchenko