As a machine learning engineer who works on many different deep learning models, I am constantly bothered by unexpected environment issues.
Do these scenarios look familiar? What is going wrong with the machine learning development environment?
- Being able to pip install torch doesn't mean you no longer have to deal with low-level native dependencies.
- Containers are necessary for a consistent environment, especially for the GPU stack.
- Dockerfiles are hard to reuse.
Setting up the environment is only the first step of the real work. It should be easy, but it never is, although admittedly it is much easier than in the days when we had to search for how to install NumPy.
Meanwhile, from the machine learning infra engineers' perspective:
Infra engineers are never the enemies of machine learning engineers. A better tool can make everyone happy.
Let's sum up our requirements:
- Machine learning engineers should submit container images instead of raw code, because they know the model's dependencies best.
- Infra engineers should maintain a tool that helps machine learning engineers build container images following best practices.
- At the same time, machine learning engineers don't want to sacrifice the development experience: they should still be able to use Jupyter Notebook and VSCode as usual.
So far, so good. None of these requirements is impossible to meet.
Let's introduce the new tool: envd.
It provides the following features:
- Write Python-like functions instead of Dockerfiles, and share them across your team
- Built on BuildKit, with better caching and parallel builds
- Integrated with Jupyter Notebook and VSCode
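To give a feel for the two BuildKit properties in that list, here is a simplified model in plain Python. It is a sketch under stated assumptions, not BuildKit's actual solver: each step's cache key is assumed to be a hash of its parent's key plus its own instruction, so editing a late step leaves earlier cache entries valid, and steps whose dependencies are all finished are grouped into stages that a builder could run concurrently. The instructions and step names are hypothetical.

```python
import hashlib
from graphlib import TopologicalSorter  # Python 3.9+


def cache_keys(instructions):
    """Chain-hash a linear list of build instructions (simplified model).

    Each key depends on its own instruction and on every earlier one, so a
    change invalidates that step and everything after it, but nothing before.
    """
    keys, parent = [], ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def cached_prefix(old, new):
    """Number of leading steps of `new` that can reuse cache entries from `old`."""
    count = 0
    for a, b in zip(cache_keys(old), cache_keys(new)):
        if a != b:
            break
        count += 1
    return count


def parallel_stages(deps):
    """Group a dependency graph into stages; steps within one stage do not
    depend on each other, so they could be built concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical instructions: only the last one changes between builds.
old = ["base ubuntu20.04", "cuda 11.6 cudnn 8", "pip install torch"]
new = ["base ubuntu20.04", "cuda 11.6 cudnn 8", "pip install torch transformers"]
print(cached_prefix(old, new))  # only the pip step needs rebuilding

# Hypothetical step graph: node -> set of steps it depends on.
deps = {
    "base": set(),
    "cuda": {"base"},
    "python": {"base"},
    "pip_packages": {"python"},
    "final": {"cuda", "pip_packages"},
}
print(parallel_stages(deps))
```

In this model the CUDA and Python steps land in the same stage because neither depends on the other, which is the kind of independence a DAG-based builder can exploit and a strictly sequential Dockerfile build cannot.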
The syntax looks like this:

    def build():
        base(os="ubuntu20.04", language="python")
        install.cuda(version="11.6", cudnn="8")
        install.python_packages(name=["torch"])

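Because the build definition above consists of ordinary function calls, its body is also valid Python. The sketch below defines hypothetical stand-ins for base, install.cuda and install.python_packages that merely record each call as a step, then runs the same body. This is not how envd evaluates the file (the recorder and the step format are assumptions); it only illustrates that the definition is a sequence of parameterized calls rather than Dockerfile text, which is what makes it easy to reuse.

```python
from types import SimpleNamespace

# Hypothetical recorder: each stand-in call appends one (kind, args) step.
# These functions only mimic the call shapes of the example; they are not envd's API.
STEPS = []


def base(os, language):
    STEPS.append(("base", {"os": os, "language": language}))


def _cuda(version, cudnn):
    STEPS.append(("cuda", {"version": version, "cudnn": cudnn}))


def _python_packages(name):
    STEPS.append(("python_packages", {"name": list(name)}))


# Expose the stand-ins under the dotted names used in the example.
install = SimpleNamespace(cuda=_cuda, python_packages=_python_packages)


def build():
    # Same body as the build function shown above.
    base(os="ubuntu20.04", language="python")
    install.cuda(version="11.6", cudnn="8")
    install.python_packages(name=["torch"])


build()
for kind, args in STEPS:
    print(kind, args)
```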
Run envd up, and you are dropped into an isolated container environment.
To reuse functions written by your teammates, include them like this:

    lib = include("https://github.com/tensorchord/envdlib")
    lib.jupyter_lab(host_port=8888)

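The value of include() is that a helper is written once and reused by every project that needs it. The plain-Python sketch below models that idea with a hypothetical jupyter_lab helper placed in a namespace that stands in for the included library. The steps it appends (installing jupyterlab and exposing a port) are illustrative assumptions, not the actual behavior of envdlib's jupyter_lab, and the two project functions are invented for the example.

```python
from types import SimpleNamespace


def jupyter_lab(steps, host_port=8888):
    """Hypothetical shared helper that appends a Jupyter Lab setup to a step list."""
    steps.append(("python_packages", ["jupyterlab"]))
    steps.append(("expose_port", host_port))


# Stand-in for the object returned by include(...): a namespace of shared helpers.
lib = SimpleNamespace(jupyter_lab=jupyter_lab)


def cv_project():
    # Hypothetical computer-vision environment reusing the shared helper.
    steps = [("base", "ubuntu20.04", "python"), ("python_packages", ["torch", "torchvision"])]
    lib.jupyter_lab(steps, host_port=8888)
    return steps


def nlp_project():
    # Hypothetical NLP environment reusing the same helper on another port.
    steps = [("base", "ubuntu20.04", "python"), ("python_packages", ["transformers"])]
    lib.jupyter_lab(steps, host_port=8889)
    return steps


print(cv_project())
print(nlp_project())
```

Both projects get the same Jupyter setup from one definition, so a fix to the shared helper reaches every project that includes it.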
It's also much faster. See the benchmark below:
More features are coming! Feel free to open an issue or join our Discord community to discuss with us.