Ilya Vorobiev

How to calculate directory size on Android?

This is the second part in a series of articles describing how to calculate the size of an uncompressed app on disk and report it. This part covers the implementation of the components mentioned in the first part.

Schema of the report and Transport

I will stick to a simple structure: the report will be a map, where a path is the key and its size is the value. The map will be flat, meaning that the path to a parent directory and the paths to all its children are just different keys of the same map.
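To make the flat layout concrete, here is a small sketch. The paths and sizes are made up for illustration, and childrenOf is a hypothetical helper (not part of the collector) that shows how the hierarchy can still be recovered from the keys.

```kotlin
// Hypothetical paths and sizes, for illustration only
val exampleReport: Map<String, Long> = mapOf(
    "/data/user/0/com.example.app/cache" to 12_288L,
    "/data/user/0/com.example.app/cache/images" to 8_192L,
    "/data/user/0/com.example.app/cache/images/avatar.png" to 4_096L,
)

// Hypothetical helper: direct children of a directory, recovered from the flat keys
fun childrenOf(report: Map<String, Long>, parent: String): Map<String, Long> =
    report.filterKeys { key ->
        key.startsWith("$parent/") && '/' !in key.removePrefix("$parent/")
    }
```

Parent and child entries live side by side, so a consumer never has to walk a nested structure to find a particular path.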

With this schema, the transport takes the map of paths and sizes as input and logs it wherever we need it. So the interface for the transport will look like this:

interface AppSizeTransport {
   suspend fun sendAppSizeCollection(appSizeCollection: Map<String, Long>): Result<Unit>
}

Note: I use a suspend function since the transport will most likely report the metrics to a backend and hence needs to do that asynchronously. For the same reason, it returns a Result, so the caller can tell whether the report succeeded.

If the report fails, the system should retry. In that case, however, we may need to run AppSizeCollector again, as the state of the filesystem may have changed.
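One possible shape for such a retry is sketched below. collectAndSendWithRetry is a hypothetical helper, not part of the article's classes, and it leaves out backoff between attempts. The point is only that the data is collected again on every attempt. Because the function is inline, the lambdas may call suspend functions when it is invoked from a coroutine.

```kotlin
// Hypothetical helper: re-collects the data before every attempt, so a retry
// never sends a stale snapshot of the filesystem.
inline fun <T> collectAndSendWithRetry(
    maxAttempts: Int,
    collect: () -> T,
    send: (T) -> Result<Unit>,
): Result<Unit> {
    require(maxAttempts > 0) { "maxAttempts must be positive" }
    var lastResult: Result<Unit> = Result.failure(IllegalStateException("not attempted"))
    for (attempt in 1..maxAttempts) {
        lastResult = send(collect())
        if (lastResult.isSuccess) break
    }
    return lastResult
}
```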

Configuration of the AppSizeCollector

The filesystem is not under our control, which means thousands of files may be created. If we are going to send this information to the backend from hundreds of thousands of devices, we may need to pay a lot for storage. That’s why the system needs a way to control how much data it sends.

There are a few metrics that we 100% need:

  • Total size of the app
  • Size of each top-level directory (e.g. caches, files, etc) to understand what impacts size the most

The number of top-level directories is limited and fixed (apart from the contents of the data dir, but let’s hope it is not used actively in your app), so it is safe to report those. Deeper in the file tree the number of files is practically unbounded, so a limit on how deep to report must be introduced.

Sometimes a particular path needs to be investigated, or there is an important directory somewhere deep in the tree that must be monitored. For that, a list of exceptions to the limit is needed.

With all these requirements, the configuration of the system should look something like this:

data class AppSizeCollectorConfig(
   val importantDirsLimit: Map<String, Int>, // Paths that must be reported, each mapped to how deep to report
   val globalLimit: Int = 3, // General limit on how deep we will report
)
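As a rough usage sketch, a config with one exception could be built as shown below. The downloads path is made up, and limitFor is a hypothetical helper that mirrors the lookup the collector performs later in this article: the per-path limit if one exists, the global limit otherwise.

```kotlin
data class AppSizeCollectorConfig(
    val importantDirsLimit: Map<String, Int>,
    val globalLimit: Int = 3,
)

// Hypothetical helper: per-path limit if present, global limit otherwise
fun AppSizeCollectorConfig.limitFor(path: String): Int =
    importantDirsLimit[path] ?: globalLimit

// Hypothetical path, for illustration only
val exampleConfig = AppSizeCollectorConfig(
    importantDirsLimit = mapOf("/data/user/0/com.example.app/files/downloads" to 6),
)
```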

Calculating application size

The main component of the system is the AppSizeCollector, which will scan the disk, calculate the size of the app and then pass the result to the transport.

Given that, the dependencies of the class are quite clear: we need a context instance to resolve top-level paths and the transport object. I will also add CoroutineDispatcher to the constructor in order to provide concurrent IO operation and make the overall class Main-safe.

class AppSizeCollector(
   private val context: Context,
   private val transport: AppSizeTransport,
   private val dispatcher: CoroutineDispatcher = Dispatchers.IO
)

The main method of this class takes the config as input and returns the result of the operation. For now, I will not add support for specific errors and will only report whether the operation succeeded; a more specific result type would improve the system’s error handling and debuggability.

suspend fun traverseAndCollect(config: AppSizeCollectorConfig): Result<Unit>

As described previously, there are 6 top-level directories that we are interested in, but in practice it is a bit more complex. There may be multiple external directories associated with the app. Furthermore, the internal data dir contains the internal cache and internal files directories alongside everything else. The full set of top-level directories that we want to traverse looks like this:

val dirsToTraverse = (context.externalCacheDirs.filterNotNull() +
       context.getExternalFilesDirs(null).filterNotNull() +
       context.externalMediaDirs.filterNotNull() +
       context.filesDir +
       context.cacheDir +
       (context.filesDir.parentFile?.listFiles() ?: emptyArray<File>())
       ).toSet()

Once we have a list of the directories we will need to iterate through them and calculate the size, however, there might be a few caveats.

Calculating the size of files and directories

If your app’s minimum API level is 26 or higher, the task is much easier: you can use Files.walk. But if you have to support older versions, there are a few edge cases that must be accounted for.
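For the API 26+ case, a minimal sketch with java.nio.file could look like the following. directorySizeWithWalk is a name chosen for this example, and the function sums logical file lengths (what file.length() reports), not the space occupied on disk.

```kotlin
import java.nio.file.Files
import java.nio.file.LinkOption
import java.nio.file.Path

// Sums the logical length of every regular file under root.
// Files.walk does not follow symbolic links unless FOLLOW_LINKS is passed,
// and the NOFOLLOW_LINKS filter keeps the links themselves out of the sum.
fun directorySizeWithWalk(root: Path): Long =
    Files.walk(root).use { paths ->
        paths
            .filter { Files.isRegularFile(it, LinkOption.NOFOLLOW_LINKS) }
            .mapToLong { Files.size(it) }
            .sum()
    }
```

The stream returned by Files.walk holds open directory handles, which is why it is wrapped in use.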

The main difficulty comes from the fact that calling the length() method on a directory will not return the total size of its contents. This forces us to recursively traverse the file tree and calculate the size.

While we are iterating recursively we want to avoid following symbolic links, as this carries a risk of getting stuck in a loop or counting the same directory twice. Instead, we will detect whether the current file is a symlink and count the size of the link itself rather than its contents.

To detect if the file is a symlink, I will use this utility function:

private fun File.isSymLink(): Boolean {
   val canon: File = if (parent == null) {
       this
   } else {
       val canonDir: File = parentFile.canonicalFile
       File(canonDir, name)
   }
   return canon.canonicalFile != canon.absoluteFile
}

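It rebuilds the file path from the canonical path of its parent (which is the path when all the symlinks are resolved), so any symlinks higher up in the path are ruled out. It then compares this path with the file’s own canonical path: if the two differ, the file itself is a symlink.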
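If only API 26+ has to be supported, the same question can be put to java.nio.file directly. The extension name isSymLinkNio is made up for this sketch.

```kotlin
import java.io.File
import java.nio.file.Files

// API 26+ only: checks the file itself, without following the link
fun File.isSymLinkNio(): Boolean = Files.isSymbolicLink(toPath())
```

On older API levels neither File.toPath() nor java.nio.file.Files is available, which is why a path-comparison helper is needed there.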

If the current file that we are scanning is neither a symlink nor a directory, this is the end of our recursion and we need to return its size and add it to the hashmap, which we will then pass to our transport.

var fileSize = dir.getSizeOnDisk()
if (dir.isFile) {
   // Check if current level of the recursion is within the limit defined in the config
   if (currentLevel < limit) {
       dirSizes[dir.path] = fileSize
   }
   return fileSize
}

currentLevel and limit will be passed as arguments to the recursive function, and we will talk a bit later about its signature. You might have noticed a weird function getSizeOnDisk which is not a part of the Java File API. What does it do?

How much disk space does the file use?

Filesystems usually don’t operate on individual bytes; they read and write in fixed-size chunks called blocks. A file’s contents are allocated in whole blocks, so on a filesystem with a block size of 4KB even a file with a size of 1B will generally take 4KB of actual space. Note that a call to file.length() will return 1 in this case, since that is the logical length of the file. This is also why an empty directory may use 4KB of space: it is the minimum allocation needed to store the directory entries. Since we are interested in disk space, we can’t use file.length() to calculate the size.
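The rounding itself is plain integer arithmetic. The sketch below assumes the block size is already known; roundUpToBlocks is a name chosen for this example.

```kotlin
// Rounds a logical length up to a whole number of blocks
fun roundUpToBlocks(lengthBytes: Long, blockSizeBytes: Long): Long {
    require(lengthBytes >= 0) { "length must not be negative" }
    require(blockSizeBytes > 0) { "block size must be positive" }
    val blocks = (lengthBytes + blockSizeBytes - 1) / blockSizeBytes
    return blocks * blockSizeBytes
}
```

With a 4096-byte block, a 1-byte file rounds up to 4096 bytes, a 4096-byte file stays at 4096, and a 4097-byte file takes 8192.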

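Android provides different ways to find the block size: you may look at the StatFs class, for example. I will use Os.lstat, which wraps the Linux lstat system call and, unlike stat, does not follow symbolic links. The call returns the file's size and a block size (st_blksize), from which I can derive how much disk space a given file takes. Keep in mind that st_blksize is the preferred I/O block size reported for the file, which typically, but not necessarily, matches the filesystem's allocation unit.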

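private fun File.getSizeOnDisk(): Long {
   // Use absolutePath, not canonicalPath: canonicalPath resolves symlinks,
   // so lstat would describe the link target instead of the link itself
   val lstatData = Os.lstat(absolutePath)
   val blockSize = lstatData.st_blksize
   // Round the logical size up to a whole number of blocks
   return (lstatData.st_size + blockSize - 1) / blockSize * blockSize
}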

Recursion for directories

Now we are able to get the size of the file and stop recursion if we encountered a symlink. The last part is to calculate the size of the directory, which should be trivial now. However, we haven’t yet defined the signature of the recursive function, so let’s do it!

private suspend fun calculateDirSize(
   dir: File,
   currentLevel: Int,
   limit: Int,
   dirSizes: HashMap<String, Long>,
   config: AppSizeCollectorConfig
): Long

Here dir is the current file or directory to look at, and currentLevel is the current depth of the recursion. limit determines whether dir is reported on its own or only counted towards the size of its ancestors. dirSizes is the map that we are going to pass to the transport, and config is needed to look up the exceptional paths.

Here is the chunk responsible for directory traversal:

dir.listFiles()?.forEach { file ->
   val nextLimit = config.importantDirsLimit[dir.path] ?: config.globalLimit
   fileSize += calculateDirSize(file, currentLevel + 1, nextLimit, dirSizes, config)
}
if (currentLevel < limit) {
   dirSizes[dir.path] = fileSize
}
return fileSize

Since this introduces blocking IO and the function is declared as suspend, I must make it Main-safe. The resulting function looks like this:

private suspend fun calculateDirSize(
   dir: File,
   currentLevel: Int,
   limit: Int,
   dirSizes: HashMap<String, Long>,
   config: AppSizeCollectorConfig
): Long = withContext(dispatcher) {
   if (dir.isSymLink()) {
       return@withContext dir.getSizeOnDisk()
   }
   var fileSize = dir.getSizeOnDisk()
   if (dir.isFile) {
       // Check if current level of the recursion is within the limit defined in the config
       if (currentLevel < limit) {
           dirSizes[dir.path] = fileSize
       }
       return@withContext fileSize
   }
   dir.listFiles()?.forEach { file ->
       val nextLimit = config.importantDirsLimit[dir.path] ?: config.globalLimit
       fileSize += calculateDirSize(file, currentLevel + 1, nextLimit, dirSizes, config)
   }

   if (currentLevel < limit) {
       dirSizes[dir.path] = fileSize
   }
   fileSize
}
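To experiment with the depth limit off-device, here is a simplified sketch of the same recursion that runs on a plain JVM. It is not the function above: it uses File.length() instead of the on-disk size, counts directories themselves and symlinks as zero bytes, takes a single limit with no per-path exceptions, and skips coroutines. collectSizes is a name chosen for this example.

```kotlin
import java.io.File
import java.nio.file.Files

// Simplified, synchronous variant for experimenting with the depth limit
fun collectSizes(
    dir: File,
    currentLevel: Int,
    limit: Int,
    sizes: MutableMap<String, Long>,
): Long {
    // Never descend into symlinks; in this sketch they count as zero bytes
    if (Files.isSymbolicLink(dir.toPath())) return 0L
    var total = if (dir.isFile) dir.length() else 0L
    // listFiles() returns null for regular files, which ends the recursion
    dir.listFiles()?.forEach { child ->
        total += collectSizes(child, currentLevel + 1, limit, sizes)
    }
    // Entries deeper than the limit still count towards their ancestors
    if (currentLevel < limit) sizes[dir.path] = total
    return total
}
```

Calling it with limit = 2 reports the root and its direct children only, while everything below still contributes to their totals.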

And our main function will turn into this:

suspend fun traverseAndCollect(config: AppSizeCollectorConfig): Result<Unit> =
   withContext(dispatcher) {
       val dirsToTraverse = (context.externalCacheDirs.filterNotNull() +
               context.getExternalFilesDirs(null).filterNotNull() +
               context.externalMediaDirs.filterNotNull() +
               context.filesDir +
               context.cacheDir +
               (context.filesDir.parentFile?.listFiles() ?: emptyArray<File>())
               ).toSet()

       // Each coroutine fills its own map, because a plain HashMap
       // is not safe for concurrent writes from several threads
       val partialResults = dirsToTraverse.map { dir ->
           async {
               val sizes = hashMapOf<String, Long>()
               val limit = config.importantDirsLimit[dir.path] ?: config.globalLimit
               val total = calculateDirSize(dir, 0, limit, sizes, config)
               sizes to total
           }
       }.awaitAll()

       // Merge the per-directory results once all traversals have finished
       val dirSizes = hashMapOf<String, Long>()
       var totalAppSize = 0L
       partialResults.forEach { (sizes, total) ->
           dirSizes.putAll(sizes)
           totalAppSize += total
       }
       dirSizes["/"] = totalAppSize
       transport.sendAppSizeCollection(dirSizes)
   }

Note that I’ve added some concurrency here, which may speed up the overall process. Since coroutines are lightweight, the same could be done inside calculateDirSize as well, but I’ll leave that outside the scope of this article.

I also use “/” as the key for the overall app size.

Making a LogcatTransport implementation

I will not cover the backend reporting part, but I will leave a debug transport here that just prints the results to Log so you can analyze and debug the system.

const val TRANSPORT_TAG = "APP_SIZE"

class LogCatAppsizeTransport : AppSizeTransport {
   override suspend fun sendAppSizeCollection(appSizeCollection: Map<String, Long>): Result<Unit> {
       Log.d(TRANSPORT_TAG, "App size collection is ${appSizeCollection.size}")
       appSizeCollection.entries.forEach { entry ->
           Log.d(TRANSPORT_TAG, "Path ${entry.key} is ${entry.value} bytes")
       }
       return Result.success(Unit)
   }
}
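When reading such a log by hand, the largest entries are usually the interesting ones. A small formatting helper along these lines could be applied to the map before printing. formatReport is hypothetical and not part of the transport above.

```kotlin
// Hypothetical helper: the `top` largest entries, biggest first, in KiB
fun formatReport(report: Map<String, Long>, top: Int = 10): List<String> =
    report.entries
        .sortedByDescending { it.value }
        .take(top)
        .map { (path, bytes) -> "$path: ${bytes / 1024} KiB" }
```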

This wraps up the second part of the Calculating your App’s size article. In the last part, I will show how to stitch everything together.
