
Arseny Zinchenko

Originally published at rtfm.co.ua

What is: YAML – its overview, basic data types, YAML vs JSON, and PyYAML

YAML – is one of the most popular formats of the…

Well, actually, they don’t know the format of what…

Originally it was the «Yet Another Markup Language», later it became «YAML Ain’t Markup Language»:

Originally YAML was said to mean Yet Another Markup Language,[12] referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain’t Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.

The Russian Wikipedia even puts the word “friendly” in quotes, and I absolutely agree with that.

In fact, YAML is just another data serialization format. Since version 1.2 it is a superset of JSON, with some additional abilities.

In a recent survey in the Ukrainian DevOps community, “YAML vs JSON”, YAML took about 90% of the votes.

As for me, JSON is still the most convenient, but YAML is used in so many places that I have to use it.

In this post we will take a closer look at YAML’s data types and make a quick comparison with JSON.

YAML main principles

  • always use UTF-8 to avoid possible issues
  • never use TAB for indentation

YAML syntax validation

To check YAML syntax on Linux, you can use yamllint.

Install it, for example on Arch Linux:

$ sudo pacman -S yamllint

And check a file:

$ yamllint monitoring.yml
monitoring.yml
1:1       warning  missing document start "---"  (document-start)
20:34     error    trailing spaces  (trailing-spaces)
22:32     error    trailing spaces  (trailing-spaces)
23:37     error    trailing spaces  (trailing-spaces)
33:7      error    wrong indentation: expected 8 but found 6  (indentation)
35:9      error    wrong indentation: expected 10 but found 8  (indentation)
36:11     error    wrong indentation: expected 12 but found 10  (indentation)

Although this file is used by Ansible without any problems, there are still some issues in its formatting.

JSON validation

For comparison, JSON documents can be validated from the Linux console using Python’s json module:

$ python -m json.tool < json-example.json
{
    "key1": "value1"
}
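The same check can be done from a script. Below is a minimal sketch that uses only the standard json module; the helper name validate_json is hypothetical and not part of any library. It also shows that a trailing comma, which is easy to leave behind, is rejected by the JSON parser:

```python
import json


def validate_json(text):
    """Return (True, None) for valid JSON, else (False, error description)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


good = '{"key1": "value1"}'
bad = '{"key1": "value1",}'  # trailing comma is not allowed in JSON

print(validate_json(good))
print(validate_json(bad))
```

The error position comes from the lineno and colno attributes of json.JSONDecodeError, so the message points at the place where parsing stopped.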

vim plugin

There is also the vim-yaml plugin for vim.

Add to your .vimrc:

...
" https://vimawesome.com/plugin/vim-yaml-all-too-well
Plug 'avakhov/vim-yaml'
" add yaml stuffs
au! BufNewFile,BufReadPost *.{yaml,yml} set filetype=yaml foldmethod=indent
autocmd FileType yaml setlocal ts=2 sts=2 sw=2 expandtab
...

Reload config and install it:

:source %
:PlugInstall

PyYAML

To work with YAML from Python there is the PyYAML library.

Some examples below.
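As a first taste, here is a minimal sketch that assumes PyYAML is installed (for example with pip install pyyaml). The document content is made up for illustration. It uses yaml.safe_load, which builds only plain Python objects; recent PyYAML versions require an explicit Loader argument for yaml.load, so safe_load is the simpler choice for ordinary data files:

```python
import yaml  # PyYAML

# A made-up document, only for illustration
document = """
name: example
ports:
  - 80
  - 443
"""

# safe_load returns plain Python objects: dict, list, str, int and so on
data = yaml.safe_load(document)

print(data)
print(type(data["ports"]).__name__)
```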

YAML formatting

Comments in YAML

One of the few advantages of YAML is the ability to add comments to its files.

Comments use the usual syntax: they start with #.

A comment can be placed on its own line or after a value at the end of a line; a # inside a quoted string is not a comment.

Examples:

---
# I'm comment
- name: somestring
  value1: "# I'm not a comment!"
  value: anotherstring  # another comment
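To see what a parser does with these comments, here is a small sketch assuming PyYAML is installed. It loads the example above: the comments are dropped, while the quoted value that merely contains a # is kept as a string:

```python
import yaml

document = """
---
# I'm comment
- name: somestring
  value1: "# I'm not a comment!"
  value: anotherstring  # another comment
"""

data = yaml.safe_load(document)
print(data)
```

Note that comments do not survive loading, so a load followed by a dump will lose them.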

Indentations

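The main headache with YAML is indentation.

Indentation must use spaces (never TABs), and all keys at the same level of a block must be indented by the same number of spaces.

Strictly speaking, a parser only requires consistency within each block, but linters such as yamllint expect one indentation width throughout the file, so if two spaces are used in one place, use two spaces everywhere.

The convention is to use two spaces, although it can be any number as long as it is the same everywhere.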

For example:

---
parent_key:
    key1: "value1"
    key2: "value2"
    key3: "%value3"

This will be a valid form, but the next example:

---
parent_key1:
    key1: "value1"
    key2: "value2"
    key3: "%value3"
parent_key2:
  key1: "value1"
  key2: "value2"
  key3: "%value3"

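Will not pass a linter such as yamllint, because it mixes four and two spaces. A parser still accepts it, since each block is consistent internally.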
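How strictly indentation is enforced depends on the tool. The sketch below, which assumes PyYAML is installed, suggests that the parser itself only needs each block to be consistent internally, while TAB indentation is rejected outright. Reporting mixed widths is typically the job of a linter such as yamllint:

```python
import yaml

# Four spaces under the first key, two spaces under the second
mixed = """
parent_key1:
    key1: "value1"
parent_key2:
  key1: "value1"
"""
print(yaml.safe_load(mixed))

# A TAB character used for indentation
tabbed = "parent_key:\n\tkey1: value1\n"
try:
    yaml.safe_load(tabbed)
    tab_accepted = True
except yaml.YAMLError as err:
    tab_accepted = False
    print("rejected:", type(err).__name__)
```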

In Python, which is often criticized for its dependence on whitespace, such mixed formatting can be used, although it violates the PEP 8 style guide:

#!/usr/bin/env python
def a():
    print("A")

def b():
  print("B")

a()
b()

Results:

$ python spaces.py
A
B

Single-line YAML

Besides the standard block style with indentation, you can use a JSON-like flow style:

---
parent_key: {key1: "value1", key2: "value2"}
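Both notations describe the same data. A short sketch, assuming PyYAML is installed, that loads the single-line form and the equivalent indented form and compares the results:

```python
import yaml

flow = 'parent_key: {key1: "value1", key2: "value2"}'

block = """
parent_key:
  key1: "value1"
  key2: "value2"
"""

flow_data = yaml.safe_load(flow)
block_data = yaml.safe_load(block)

print(flow_data)
print(flow_data == block_data)
```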

Literal Block Scalar

YAML supports multiline strings and has three main forms of them: the plain one, the literal style using “|”, and the folded style using “>”.

The plain format looks like:

---
string: This
    is
    some text
    without newlines

Result in the Python console:

>>> import yaml
>>> yaml.safe_load(open('yaml-example.yml'))
{'string': 'This is some text without newlines'}

Using the | (literal style) preserves all newlines and trailing spaces:

---
string: |
    This
    is
    some text
    with newlines

Result is:

>>> yaml.safe_load(open('yaml-example.yml'))
{'string': 'This\nis\nsome text\nwith newlines\n'}

And using the > (Folded style):

---
string: >
    This
    is
    some text
    without newlines

This returns the whole text as one line plus a trailing newline:

>>> yaml.safe_load(open('yaml-example.yml'))
{'string': 'This is some text without newlines\n'}

Still, all lines of such a block must share the same indentation.
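The three forms can be compared side by side. This is a small sketch, assuming PyYAML is installed, that loads the same text written in the plain, literal, and folded styles and prints the resulting strings with repr so the newlines are visible:

```python
import yaml

plain = "string: This\n    is\n    some text\n    without newlines\n"
literal = "string: |\n    This\n    is\n    some text\n    with newlines\n"
folded = "string: >\n    This\n    is\n    some text\n    without newlines\n"

results = {}
for name, doc in [("plain", plain), ("literal", literal), ("folded", folded)]:
    results[name] = yaml.safe_load(doc)["string"]
    print(name, repr(results[name]))
```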

Also, check this great answer on StackOverflow:

There are NINE (or 63*, depending how you count) different ways to write multi-line strings in YAML.

YAML basic data formats

YAML uses three main data formats:

  • scalars: the simplest type, a single value such as a string, a number, or a boolean, usually seen in a key:value view
  • list/sequence: data ordered by indexes
  • dictionary/mapping: key:value pairs whose values can be scalars or nested data, including other data types

Scalars

The basic data type is the scalar: a single value, here shown as a simple key:value pair, much like a variable in programming:

---
key1: "value1"
key2: "value2"

Using quotes for values is recommended to avoid possible issues with special characters:

$ cat example.yml
---
key1: "value1"
key2: "value2"
key3: %value3

Validate it:

$ yamllint example.yml
example.yml
4:7       error    syntax error: found character '%' that cannot start any token

Still, you can skip quotes for boolean true/false values and for integer types.
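The effect of quoting is easy to check. The sketch below assumes PyYAML is installed and uses made-up key names: unquoted true and 42 become a Python bool and int, the quoted "42" stays a string, and an unquoted value starting with % is rejected by the parser, just as yamllint reported above:

```python
import yaml

data = yaml.safe_load("""
flag: true
count: 42
quoted: "42"
text: value
""")

for key, value in data.items():
    print(key, repr(value), type(value).__name__)

# An unquoted value that starts with % cannot be parsed
try:
    yaml.safe_load("key3: %value3")
    percent_accepted = True
except yaml.YAMLError as err:
    percent_accepted = False
    print("error:", type(err).__name__)
```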

Scalars – YAML vs JSON

For example, a scalar in YAML:

---
key: "value"

And JSON:

{
    "key": "value"
}

Python

An example of YAML scalars in Python:

>>> import yaml
>>> yaml.safe_load("""
... key: "value"
... """)
{'key': 'value'}

Or from a file:

>>> import yaml
>>> yaml.safe_load(open('yaml-example.yml'))
{'key': 'value'}
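In a script, as opposed to a quick console session, it is better to close the file explicitly. A minimal sketch, assuming PyYAML is installed; the temporary directory is used only to make the example self-contained:

```python
import os
import tempfile

import yaml

# Create a small YAML file in a temporary directory for the demo
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "yaml-example.yml")
    with open(path, "w", encoding="utf-8") as f:
        f.write('---\nkey: "value"\n')

    # The with-block closes the file as soon as loading is done
    with open(path, encoding="utf-8") as f:
        data = yaml.safe_load(f)

print(data)
```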

Lists in YAML

Lists (also called sequences or collections) represent an ordered collection of data in which each element can be accessed by its index.

For example:

# SIMPLE LIST
- element1
- element2

Nested lists in YAML

Lists can also include nested lists:

# SIMPLE LIST
- element1
- element2
# nested list
-
  - element1

Or a list can be named, that is, used as the value of a key:

---
itemname:
  - valuename

Lists can also contain scalars or dictionaries:

---
itemname:
  - valuename
  - scalar: "value"
  - dict: {item1: "value1", item2: "value2"}

Lists – YAML vs JSON

List in YAML:

---
- item1
- item2
- item3

List in JSON:

[
    "item1",
    "item2",
    "item3"
]

Nested list in YAML:

---
- item1
- item2
- item3
-
  - nested1

Nested list in JSON:

[
    "item1",
    "item2",
    "item3",
    [
        "nested1"
    ]
]

Python and YAML-lists

Here everything is similar to the scalar example:

>>> yaml.safe_load(open('yaml-example.yml'))
['item1', 'item2', 'item3', ['nested1']]
>>> for i in yaml.safe_load(open('yaml-example.yml')):
...   print(i)
...
item1
item2
item3
['nested1']
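Index access and the comparison with JSON can be put into one short sketch, assuming PyYAML is installed. It loads the nested list from the YAML vs JSON examples, reads elements by index, and checks that the JSON version produces the same Python list. Since YAML 1.2 is a superset of JSON, the YAML parser is also expected to read the JSON text directly:

```python
import json

import yaml

yaml_text = """
- item1
- item2
- item3
-
  - nested1
"""
json_text = '["item1", "item2", "item3", ["nested1"]]'

items = yaml.safe_load(yaml_text)

print(items[0])      # first element
print(items[3][0])   # first element of the nested list
print(items == json.loads(json_text))
print(yaml.safe_load(json_text) == items)
```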

Dictionaries

Dictionaries, also called mappings, contain key:value data, but unlike scalars, which are a basic type, the values of a dictionary can be nested elements, for example, a list:

---
key1: "value1"
key2:
  - value2
  - value3

Or another nested dictionary:

---
key1: "value1"
key2:
  - value2
  - value3

key3:
  key4: "value4"
  key5: "value5"
  key6:
    key7: "value7"

Dictionary – JSON vs YAML

Dictionary in YAML:

---
key1: "value1"
key2:
  - value2
  - value3

Dictionary in JSON:

{
    "key1": "value1",
    "key2": [
        "value2",
        "value3"
    ]
}

Python

>>> yaml.safe_load(open('yaml-example.yml'))
{'key1': 'value1', 'key2': ['value2', 'value3']}
>>> type(yaml.safe_load(open('yaml-example.yml')))
<class 'dict'>
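Nested values are reached by chaining keys and indexes. A minimal sketch, assuming PyYAML is installed, based on a shortened version of the nested dictionary example above:

```python
import yaml

data = yaml.safe_load("""
key1: "value1"
key2:
  - value2
  - value3
key3:
  key4: "value4"
  key6:
    key7: "value7"
""")

print(data["key2"][1])                  # second element of the nested list
print(data["key3"]["key6"]["key7"])     # value two levels down
print(data.get("missing", "default"))   # safe lookup for an absent key
```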

All the usual Python dictionary operations are supported:

>>> data = yaml.safe_load(open('yaml-example.yml'))
>>> type(data)
<class 'dict'>
>>> data.update({'key3': 'value3'})
>>> print(data)
{'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': 'value3'}
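The updated dictionary can also be written back. This is a small sketch, assuming PyYAML is installed, that serializes the same data with yaml.safe_dump and with json.dumps, and then checks that both texts load back to the original dictionary:

```python
import json

import yaml

data = {"key1": "value1", "key2": ["value2", "value3"], "key3": "value3"}

# default_flow_style=False asks for the indented block style
yaml_text = yaml.safe_dump(data, default_flow_style=False)
json_text = json.dumps(data, indent=4)

print(yaml_text)
print(json_text)

# Both formats round-trip to the same Python object
print(yaml.safe_load(yaml_text) == data)
print(json.loads(json_text) == data)
```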

In general, that’s all.

