David Haley

Posted on May 7, 2024

Container size analysis: TensorFlow 2.8 base image vs Deep Learning

#docker #tensorflow #optimization #ai

TL;DR: building our DeepCell container from a base TensorFlow image cut load time by 50% and image size by more than 60% compared with using the Deep Learning container.

	Deep Learning image	Base TF image	Reduction
Uncompressed	19.5 GB	7.2 GB	63%
Compressed	8.4 GB	3.2 GB	62%
Batch job load time	6 min	3 min	50%
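
For anyone who wants to check the Reduction column, here is a minimal sketch of the arithmetic. The figures are copied from the table; the names `measurements` and `reduction_pct` are just illustrative.

```python
# Figures copied from the table above (sizes in GB, load time in minutes).
measurements = {
    "Uncompressed": (19.5, 7.2),
    "Compressed": (8.4, 3.2),
    "Batch job load time": (6.0, 3.0),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


reductions = {
    name: reduction_pct(before, after)
    for name, (before, after) in measurements.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.0f}% reduction")
```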

This post covers how we rebuilt our container on the smaller base image, and why the Deep Learning container is so big to begin with. The long and short of it is that you pay a steep price to have so many development tools available, and you typically don't need those for production tasks.

Optimizing our container

Our DeepCell journey began on Vertex AI. Google provides pre-built TensorFlow images as part of their Deep Learning Container Images.

These containers purport to let you:

Quickly prototype with a portable and consistent environment for developing, testing, and deploying your AI applications with Deep Learning Containers. These Docker images use popular frameworks and are performance optimized, compatibility tested, and ready to deploy.

Cool beans. Our DeepCell version uses TF2.8 so we picked this image from Google's list: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-8.py37

It runs Python 3.7, which fortunately is still supported by DeepCell. (I've had mixed experiences with Python version support across bioinformatics tools.)

Our initial container build was simple:

FROM us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-8.py37

ADD https://api.github.com/repos/dchaley/deepcell-imaging/git/refs/heads/main version.json

RUN git clone https://github.com/dchaley/deepcell-imaging.git

WORKDIR "/deepcell-imaging"

RUN pip install --user --upgrade --quiet -r requirements.txt

ENTRYPOINT ["python", "benchmarking/deepcell-e2e/benchmark.py"]

Our requirements file is pretty simple. We verified in the build logs that it didn't reinstall TensorFlow; note that the packages to install do not include TF:

Requirement already satisfied: tensorflow~=2.8.0 in /opt/conda/lib/python3.7/site-packages (from deepcell==0.12.9->-r requirements.txt (line 1)) (2.8.4)

...

Installing collected packages: tensorflow-addons, snakeviz, smart_open, qtpy, opencv-python-headless, lxml, jupyter-core, iniconfig, imagecodecs, cython, pytest, google-api-core, deepcell-toolbox, qtconsole, jupyter-console, deepcell-tracking, google-cloud-notebooks, google-cloud-bigquery, spektral, google-cloud-aiplatform, jupyter, deepcell

This resulted in a whopping ~20 GB container 😩

The compressed artifact size was ~8.5 GB: this is the amount of data that must be transmitted before unpacking.

The impact of all this? A six-minute start time for Google Batch jobs, measured from starting the container download …

2024-04-30 14:56:20.896 PDT
gce: Pulling from deepcell-on-batch/deepcell-benchmarking-us-central1/benchmarking

… until executing the container:

2024-04-30 15:02:23.233 PDT
Executing runnable container:
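
The six-minute figure can be read straight off those two log lines. A minimal sketch, assuming both timestamps are in the same time zone (they are both PDT here) and using a format string I wrote to match how they appear in the log:

```python
from datetime import datetime

# Timestamps copied from the two log lines above (both PDT, same day).
LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
pull_started = datetime.strptime("2024-04-30 14:56:20.896", LOG_FORMAT)
container_started = datetime.strptime("2024-04-30 15:02:23.233", LOG_FORMAT)

# Elapsed time between starting the image pull and executing the container.
startup = container_started - pull_started
minutes, seconds = divmod(startup.total_seconds(), 60)
print(f"Startup took {int(minutes)} min {seconds:.1f} s")
```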

I wasn't thrilled with a six-minute minimum feedback cycle 😤 We tried image streaming to reduce startup time but alas, the container was so large it couldn't run without provisioning additional boot disk space.

We figured we must be able to build a container from a slimmer TensorFlow base image. We knew the DeepCell team had done some work scaling DeepCell using Kubernetes on GKE. Their Dockerfile confirmed it: just use TF's image.

We switched our base to TF's, grabbed the apt maintenance work they did, and updated our Dockerfile [diff].

The result: 7.2 GB uncompressed and 3.2 GB compressed, and ~3 min from starting to fetch the container to beginning to execute it.

	Deep Learning image	Base TF image	Reduction
Uncompressed	19.5 GB	7.2 GB	63%
Compressed	8.4 GB	3.2 GB	62%
Batch job load time	6 min	3 min	50%

That's better 😎 But I couldn't help but wonder … why?

Container size analysis

Let's take a deep dive into what's in the containers. The containers are too large to open in Cloud Shell 🫠 so we'll do it the old-fashioned way, locally.

Let's use ncdu to explore the file system.
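
If installing ncdu in the container isn't an option, the same kind of per-directory summary can be approximated in Python. This is a rough sketch, not a reimplementation of ncdu: `dir_usage` and `_size_once` are hypothetical helper names, it sums apparent file sizes (`st_size`) rather than disk blocks, and it attributes a hard-linked file to the first directory where it is encountered. It is demonstrated on a throwaway temp directory.

```python
import os
import tempfile


def _size_once(path, seen):
    """Size of `path` in bytes, or 0 if unreadable or already counted."""
    try:
        st = os.lstat(path)  # lstat: do not follow symlinks
    except OSError:
        return 0
    key = (st.st_dev, st.st_ino)
    if key in seen:  # a hard link to a file we already counted
        return 0
    seen.add(key)
    return st.st_size


def dir_usage(root):
    """Return {entry name: bytes} for each top-level entry under `root`."""
    seen = set()  # (device, inode) pairs already counted
    usage = {}
    with os.scandir(root) as entries:
        for entry in entries:
            total = 0
            if entry.is_dir(follow_symlinks=False):
                for dirpath, _dirnames, filenames in os.walk(entry.path):
                    for name in filenames:
                        total += _size_once(os.path.join(dirpath, name), seen)
            else:
                total += _size_once(entry.path, seen)
            usage[entry.name] = total
    return usage


# Small self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "lib"))
    os.makedirs(os.path.join(tmp, "local"))
    with open(os.path.join(tmp, "lib", "big.bin"), "wb") as fh:
        fh.write(b"\0" * 4096)
    with open(os.path.join(tmp, "local", "small.bin"), "wb") as fh:
        fh.write(b"\0" * 1024)
    demo_usage = dir_usage(tmp)

# Print largest entries first, similar to ncdu's default ordering.
for name, size in sorted(demo_usage.items(), key=lambda kv: -kv[1]):
    print(f"{size:>8} bytes  {name}")
```

Inside a container, the call would be something like `dir_usage("/usr")`. Pointing it at `/` also walks pseudo-filesystems such as /proc, so the numbers there are less meaningful.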

Deep Learning

This container was built from the Deep Learning base. Let's boot it up & install ncdu.

$ docker run -it --entrypoint bash us-central1-docker.pkg.dev/deepcell-on-batch/deepcell-benchmarking-us-central1/benchmarking@sha256:8cc9b89e5869a4d468d64810b2ae47e242cc106519b2b8d7c4a9daa07856bdde
root@55a486270459:/deepcell-imaging# apt update && apt install ncdu

Begin scanning the root directory:

root@55a486270459:/deepcell-imaging# ncdu /

It scans pretty quickly. So far the summary just tells us we have a lot in usr and opt (common places to install libraries). Let's start with usr.

    6.6 GiB [ 53.9%] /lib
    4.9 GiB [ 39.6%] /local
  363.3 MiB [  2.9%] /share
  276.8 MiB [  2.2%] /bin
  144.1 MiB [  1.1%] /src

A bit odd to have stuff in both lib and local, but let's see. lib is mostly cuDNN (the CUDA Deep Neural Network library) and TensorRT (the libnvinfer files):

--- /usr/lib -------------------------
                     /..
    5.5 GiB [ 83.2%] /x86_64-linux-gnu
  938.5 MiB [ 13.8%] /google-cloud-sdk



--- /usr/lib/x86_64-linux-gnu ----------------------------
                      /..
    1.4 GiB [ 24.5%]  libcudnn_static.a
  956.8 MiB [ 16.9%]  libnvinfer_builder_resource.so.8.6.1
  839.4 MiB [ 14.8%]  libcudnn_cnn_infer_static.a
  675.1 MiB [ 11.9%]  libcudnn_cnn_infer.so.8.2.0
  271.8 MiB [  4.8%]  libcudnn_ops_infer.so.8.2.0
  227.3 MiB [  4.0%]  libcudnn_cnn_train_static.a
  225.5 MiB [  4.0%]  libnvinfer.so.8.6.1

Static libraries are used to compile from source. We aren't doing that. Maybe we need the dynamic libraries for inference, I'm not sure. But the static libraries here are over 2.5 GB…
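
As a sanity check on the "over 2.5 GB" figure, here is a small sketch that adds up the three `.a` files in the listing above. ncdu reports binary units (MiB/GiB), so the sum is converted to decimal GB as well; the 1.4 GiB entry is rounded by ncdu, so treat the result as approximate.

```python
MIB = 2**20  # bytes in one mebibyte, the unit ncdu reports
GIB = 2**30  # bytes in one gibibyte

# Sizes of the static archives (*.a) from the listing above, in bytes.
static_libs = {
    "libcudnn_static.a": 1.4 * GIB,
    "libcudnn_cnn_infer_static.a": 839.4 * MIB,
    "libcudnn_cnn_train_static.a": 227.3 * MIB,
}

total_bytes = sum(static_libs.values())
print(f"{total_bytes / GIB:.2f} GiB = {total_bytes / 1e9:.2f} GB")
```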

Surprising also to see a gig in the cloud sdk… it looks like the sdk ships its own Python distro and some other stuff.

--- /usr/lib/google-cloud-sdk --
                     /..
  382.3 MiB [ 40.7%] /lib
  296.7 MiB [ 31.6%] /platform
  169.5 MiB [ 18.1%] /bin

As for /usr/local:

--- /usr/local ----------------
                     /..
    3.4 GiB [ 70.0%] /cuda-11.3
  850.0 MiB [ 17.0%] /share
  603.9 MiB [ 12.1%] /cuda-12.2

Well… do we actually need 2 versions of CUDA? (Why is 12.2 so much smaller?) About half of the 11.3 version is static libraries again.

So far we're at ~4 GB of CUDA-related static libraries (which we don't need).

How about that /usr/local/share directory…

--- /usr/local/share/.cache --
                     /..
  850.0 MiB [100.0%] /yarn

Nearly a gig of yarn package caches 😑 That brings us to ~5 GB of stuff we don't need.

Alright, bouncing back to /opt (the other big directory, with 6 GB):

--- /opt --------------------
                     /..
    4.8 GiB [ 79.1%] /conda
    1.3 GiB [ 20.9%] /nvidia

Conda is a Python distribution; before digging into it, let's check out what's in nvidia:

--- /opt/nvidia --------------------
                     /..
    1.3 GiB [100.0%] /nsight-compute



--- /opt/nvidia/nsight-compute -----
                     /..
  651.5 MiB [ 50.0%] /2021.1.1
  651.3 MiB [ 50.0%] /2021.1.0

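So we have more than half a gig on an old version. What is Nsight anyhow?

NVIDIA Nsight™ Systems is a system-wide performance analysis tool

Well, we don't need that … (strictly speaking, the directory here is Nsight Compute, NVIDIA's CUDA kernel profiler and a sibling of the Nsight Systems tool quoted above, but it's a profiling tool either way). So we're at ~6 GB of stuff we don't need. Let's go back to /opt/conda (~5 GB); as expected most of the stuff is in packages & libraries: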

--- /opt/conda -----------
                     /..
    4.5 GiB [ 94.0%] /pkgs
    3.2 GiB [ 67.0%] /lib

Most of the 4.5 GB of pkgs is in something called dlenv-tf-2-8-gpu-1.0.20230926-py37hab20f5e_0 which in turn is ~3 GB of libraries.

--- /opt/conda/pkgs/dlenv-tf-2-8-gpu-1.0.20230926-py37hab20f5e_0 -----
                     /..
    2.9 GiB [ 81.3%] /lib
  623.1 MiB [ 17.3%] /share

The libraries are Python 3.7 site-packages: mostly TensorFlow (~1 GB), plus a bunch of smaller Python libraries. We presumably need this stuff!

--- /opt/conda/pkgs/dlenv-tf-2-8...e_0/lib/python3.7/site-packages ---
                     /..
    1.1 GiB [ 39.5%] /tensorflow
  282.5 MiB [  9.7%] /ray
  116.9 MiB [  4.0%] /pyarrow
   98.2 MiB [  3.4%] /llvmlite
   84.3 MiB [  2.9%] /scipy
   83.9 MiB [  2.9%] /sklearn
   78.8 MiB [  2.7%] /plotly
   69.5 MiB [  2.4%] /tensorflow_io
   58.6 MiB [  2.0%] /clang
   50.3 MiB [  1.7%] /apache_beam
   46.6 MiB [  1.6%] /google

How about share?

--- /opt/conda/pkgs/dlenv-tf-2-8...0.20230926-py37hab20f5e_0/share ---
                     /..
  621.3 MiB [ 99.7%] /jupyter

--- /opt/conda/pkgs/dlenv-tf-2-8...f5e_0/share/jupyter/lab/staging ---
                     /..
  480.7 MiB [ 88.5%] /node_modules
   57.1 MiB [ 10.5%] /build

Half a gig for Jupyter's JS dependencies & build files. So, ~6.5 GB of unused stuff.

How about the lib sibling to pkgs (3.2 GB)? Almost all of it is … another Python distribution?

--- /opt/conda/lib/python3.7 ------
                     /..
    2.9 GiB [ 98.5%] /site-packages

--- /opt/conda/lib/python3.7/site-packages ---
                     /..
    1.1 GiB [ 38.3%] /tensorflow
  282.5 MiB [  9.4%] /ray
  117.0 MiB [  3.9%] /pyarrow
   98.2 MiB [  3.3%] /llvmlite
   84.3 MiB [  2.8%] /scipy
   83.9 MiB [  2.8%] /sklearn
   78.8 MiB [  2.6%] /plotly
   69.5 MiB [  2.3%] /tensorflow_io
   58.6 MiB [  1.9%] /clang
   50.8 MiB [  1.7%] /google
   50.3 MiB [  1.7%] /apache_beam

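These appear to be the same packages as the dlenv-etc folder… ~3 GB of duplication, bringing our unused total to ~9.5 GB. (One caveat: pkgs and lib add up to more than /opt/conda's 4.8 GiB total, and conda normally hard-links files from pkgs into the environment, so some of this may be the same files listed twice rather than extra bytes on disk.)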
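
Whether two trees like pkgs and lib are separate copies or hard links to the same files can be checked by comparing inode numbers. A minimal sketch, with hypothetical helper names `inode_index` and `shared_bytes`, demonstrated on a temp directory that contains one hard-linked file and one real copy:

```python
import os
import shutil
import stat
import tempfile


def inode_index(root):
    """Map (device, inode) -> size in bytes for regular files under `root`."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):  # skip symlinks and special files
                index[(st.st_dev, st.st_ino)] = st.st_size
    return index


def shared_bytes(root_a, root_b):
    """Bytes in files that exist in both trees as the same inode."""
    index_a = inode_index(root_a)
    index_b = inode_index(root_b)
    return sum(size for key, size in index_a.items() if key in index_b)


# Demo: "linked.so" is hard-linked into both trees, "copied.so" is duplicated.
with tempfile.TemporaryDirectory() as tmp:
    pkgs = os.path.join(tmp, "pkgs")
    lib = os.path.join(tmp, "lib")
    os.makedirs(pkgs)
    os.makedirs(lib)
    with open(os.path.join(pkgs, "linked.so"), "wb") as fh:
        fh.write(b"\0" * 2048)
    with open(os.path.join(pkgs, "copied.so"), "wb") as fh:
        fh.write(b"\0" * 512)
    os.link(os.path.join(pkgs, "linked.so"), os.path.join(lib, "linked.so"))
    shutil.copy(os.path.join(pkgs, "copied.so"), os.path.join(lib, "copied.so"))
    demo_shared = shared_bytes(pkgs, lib)

print(f"{demo_shared} bytes are hard-linked between the two trees")
```

In the container this would be called as `shared_bytes("/opt/conda/pkgs", "/opt/conda/lib")`. I haven't run it against the image, so what it would report there is an open question.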

Since that's nearly all of our ~12 GB difference I stopped here.
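
Here is the running tally from the walkthrough in one place. The per-item numbers are the approximate sizes read off the ncdu listings above (mixing GB and GiB loosely, as the post does), so this is a ballpark check rather than a precise accounting.

```python
# Approximate sizes in GB, as read from the ncdu listings in this post.
unneeded_gb = {
    "CUDA/cuDNN static libraries": 4.0,
    "yarn cache": 0.85,
    "Nsight Compute (two versions)": 1.3,
    "Jupyter lab staging files": 0.5,
    "apparent site-packages duplication": 3.0,
}

total_unneeded = sum(unneeded_gb.values())
size_difference = 19.5 - 7.2  # uncompressed image sizes from the table
share = total_unneeded / size_difference

print(f"Identified ~{total_unneeded:.1f} GB of a {size_difference:.1f} GB "
      f"difference ({share:.0%})")
```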

Container size analysis: TensorFlow base

Let's do a quick scan of the container built off the base TensorFlow image.

Let's open up the container. Ooh, fancy...

root@2317ea736b48:/deepcell-imaging# apt update && apt install ncdu
root@2317ea736b48:/deepcell-imaging# ncdu /

This time most of the contents are in usr and root:

--- / --------------------
    5.5 GiB [ 79.7%] /usr
    1.3 GiB [ 19.0%] /root

Most of root is Python 3.8 site-packages, made up of a lot of small libraries:

--- /root/.local/lib/python3.8 ----
                     /..
    1.0 GiB [100.0%] /site-packages

--- /root/.local/lib/python3.8/site-packages ----
                     /..
   85.5 MiB [  8.5%] /scipy
   83.6 MiB [  8.3%] /google
   74.5 MiB [  7.4%] /imagecodecs
   72.3 MiB [  7.2%] /cv2
   62.1 MiB [  6.1%] /opencv_python_headless.libs
   61.9 MiB [  6.1%] /pandas
   45.7 MiB [  4.5%] /sklearn

whereas /usr looks like this:

--- /usr ------------------
                     /..
    3.1 GiB [ 57.2%] /local
    2.2 GiB [ 40.1%] /lib

Almost all of lib is cuDNN and TensorRT (libnvinfer):

--- /usr/lib/x86_64-linux-gnu -------------------
                     /..
  757.3 MiB [ 36.9%]  libcudnn_cnn_infer.so.8.1.0
  442.8 MiB [ 21.6%]  libnvinfer.so.7.2.2
  267.4 MiB [ 13.0%]  libcudnn_ops_infer.so.8.1.0

whereas local is split across more CUDA + python files:

--- /usr/local ----------------
                     /..
    1.7 GiB [ 55.4%] /cuda-11.2
    1.4 GiB [ 43.7%] /lib

--- /usr/local/cuda-11.2/targets/x86_64-linux/lib ---
                     /..
  382.7 MiB [ 25.4%]  libcusolver.so.11.1.0.152
  219.6 MiB [ 14.6%]  libcusparse.so.11.4.1.1152
  186.6 MiB [ 12.4%]  libcusolverMg.so.11.1.0.152
  181.3 MiB [ 12.0%]  libcufft.so.10.4.1.152
  176.7 MiB [ 11.7%]  libcublasLt.so.11.4.1.1043

--- /usr/local/lib/python3.8/dist-packages ---
                     /..
    1.1 GiB [ 84.0%] /tensorflow

It looks like the cuDNN files in /usr/lib are separate from the CUDA toolkit libraries (cuSOLVER, cuSPARSE, cuFFT, cuBLAS) in /usr/local.

Conclusions

The Deep Learning container seems better suited for:

compiling tools from source
training, not just predicting
using notebooks for iterative development
overall development tasks

The TensorFlow base image seems better suited for:

running the specific thing you want to run once you've figured out how to run it.

Future work?

Google has optimized container images for Vertex AI. We'd use: us-docker.pkg.dev/vertex-ai-restricted/prediction/tf_opt-gpu.2-8:latest

I get the sense from the docs these only work on Vertex AI & need you to train the model on Vertex AI as well:

The optimization occurs when Vertex AI uploads a model, before it runs.

At some point it may be worth investigating the cost of predicting via Vertex AI online models vs. predicting with an open-source container on Batch. But if the container is so large again because of training code, we may lose whatever benefits we gained…