Jonathan Fetterolf

DO YOU YAML?

Say what?

Futurama Fry - Is that an alien language?

Introduction to YAML

YAML stands for "YAML Ain’t Markup Language", which is known as a recursive acronym. YAML is often used for writing configuration files. It’s human readable, easy to understand, and can be used with many programming languages. Although YAML is commonly used in many disciplines, it has received criticism for its reliance on significant whitespace, the difficulty of editing .yml files, and the complexity of the standard. Despite the criticism, a well-maintained environment .yml file makes it much easier to reproduce the results of a project and helps keep the virtual environment’s packages from clashing with system packages. (If you’re looking for alternatives to YAML itself, options include StrictYAML, a type-safe parser for a restricted subset of YAML, and NestedText.)

One of the first steps in joining an existing data science project is setting up your virtual environment. This makes sure that the dependencies and packages used for this project do not interfere with, or overwrite, those used by your other projects. There will often be a file with the .yml extension in the project files so you can quickly get working on the existing project. Below, I’ll quickly run through the steps I take to create a virtual environment on my M2 MacBook with Anaconda already installed.

Steps

So where is that YAML file on GitHub, and what do I do with it?

First, projects typically have one .yml file, but sometimes you’ll see special instructions in the project’s README.

The YAML file is usually at the root level of the repository, but it can sometimes be further down in the directory tree. Click the file name on GitHub to view its contents.

To save the .yml file, click the Raw button on the file’s GitHub page.

Then, in the newly opened tab, right-click and choose Save As.
(Make sure to save the file somewhere you can easily find it, as you’ll need to navigate to it!)

Open up a new terminal session and navigate to the directory where you saved the .yml file.

To create this new environment, I’ll enter:

conda env create -f geoenvironment.yml

After the virtual environment is done installing, you’ll need to activate it to use it. To do so, you’ll need to know its name, which conda normally displays along with the command to activate it once the install finishes. If not, check a list of your environments by entering:

conda info --envs

Then activate the new environment by entering the following (replace 'project-env' with the name of your virtual environment):

conda activate project-env
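
If you’re unsure which name to pass to conda activate, the .yml file itself usually records it in a top-level name: field. Here is a minimal Python sketch, assuming that layout. The sample text and the read_env_name helper are made up for illustration, and the line scan is deliberately naive (a real YAML parser would be more robust).

```python
# Hypothetical sample standing in for the contents of an environment .yml file.
SAMPLE_YML = """\
name: project-env
dependencies:
  - python
  - numpy
"""


def read_env_name(yaml_text):
    """Return the value of the top-level `name:` key, or None if absent."""
    for line in yaml_text.splitlines():
        # Top-level keys start at column 0, so indented lines are skipped.
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    return None


print(read_env_name(SAMPLE_YML))  # prints project-env
```

To try it on a real file, read the file’s text first (for example with pathlib.Path("geoenvironment.yml").read_text()) and pass that string to the helper.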

Now you’re ready to start chugging on that existing project.

When you’re done working in that virtual environment, don’t forget to deactivate and switch to the next environment you want!

conda deactivate

If you’re starting a project from scratch, I prefer to begin with a very basic virtual environment and add the packages I need as I go along. My basic framework usually consists of:
Python
NumPy
pandas
Matplotlib
& sometimes Seaborn
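
For reference, that basic framework could be written down as an environment file. The sketch below builds the file’s text in Python. The environment name basic-env and the build_environment_yml helper are placeholders I made up, and no versions are pinned, so conda would install whatever versions it resolves.

```python
# Package names as conda knows them; "seaborn" is the optional extra.
BASIC_PACKAGES = ["python", "numpy", "pandas", "matplotlib"]


def build_environment_yml(name, packages):
    """Build the text of a minimal conda environment file."""
    lines = [f"name: {name}", "dependencies:"]
    # Each dependency is a YAML list item, indented under `dependencies:`.
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


# "basic-env" is a hypothetical environment name.
print(build_environment_yml("basic-env", BASIC_PACKAGES + ["seaborn"]))
```

Saving the printed text as basic-env.yml and running conda env create -f basic-env.yml should then create the environment.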

Finally, once you've created your environment and you're ready to unleash it on the world, you can run a simple command to export the .yml file. Once you have your file, you can upload it or share it with whomever you need. Here is the command to export the currently active environment (feel free to replace "environment" with whatever file name you prefer; the environment's own name is recorded inside the file):

conda env export > environment.yml
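
One thing to know before sharing: the exported file typically ends with a prefix: line holding the environment's path on your own machine, which collaborators don't need. Here is a minimal Python sketch for dropping it, assuming the export is already available as text. The sample content, the path in it, and the strip_prefix helper are invented for illustration.

```python
# Hypothetical export output; the prefix path is a made-up example.
EXPORTED = """\
name: project-env
dependencies:
  - python=3.10
  - numpy
prefix: /Users/me/anaconda3/envs/project-env
"""


def strip_prefix(exported_text):
    """Return the export text without the machine-specific `prefix:` line."""
    kept = [
        line
        for line in exported_text.splitlines()
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


print(strip_prefix(EXPORTED))
```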

The whole process of creating and activating a new virtual environment is pretty simple when it works… However, if you run into errors, such as conda not being able to find the right packages, it can get a little hairy. Luckily, there are great resources out there that are just a quick Google search away. The most useful resources I found for these errors were on Stack Overflow and Apple Developer.

If you want to create a virtual environment from a .yml file, here’s a link to one of my projects (Tanzanian Water Wells: Predicting the Functionality of Water Wells in Tanzania) where you can try it out!

In Summary:

YAML isn’t scary (it also ain’t markup language).
The .yml file is an important part of many data science workflows.
The .yml file is used to ensure that everyone works with the same packages and versions.
The .yml file helps with reproducibility.
Including a .yml file in a project makes collaboration easier.
YAML is altogether pretty simple.

Resources

The official YAML Web Site

If you’re looking for further resources on running TensorFlow and Keras on a newer MacBook, I recommend checking out this YouTube video: How to Install Keras GPU for Mac M1/M2 with Conda

If you’re looking for a resource on how to install Anaconda, I highly recommend that you go straight to the source, anaconda.com.

The M2 MacBook gave me some challenges when trying to work with TensorFlow and Keras due to some fancy chip architecture which you can read about here: TensorFlow with GPU support on Apple Silicon Mac with Homebrew and without Conda / Miniforge

If you want to take a dive into the YAML world, here's an in-depth tutorial: YAML: Everything You Need to Get Started in Minutes

Further, Very Serious Notes

Why did the YAML cross the road?

To get away from the package that broke the YAML's backend.

Want to Follow Along?

GitHub | LinkedIn | Twitter