Leo Di Donato

2020, the year of unexpectedness

This time of the year normally makes me reflect on my life and professional accomplishments.

This last year, 2020, has given me a new perspective and awareness about a few things... 💭

It taught me:

  • how important our personal relationships, friends, and family (people we often take for granted) are
  • how I can help people keep their spirits up
  • how important it is for me that they do the same with me, helping my energy to stay positive
  • how all these make the difference even in the work environment

I feel like we all learned something similar the hard way.

A global pandemic, political and social unrest, environmental concerns, isolation. 🦠
We all have had to face them.
It has been a tough one, let's admit it.

Furthermore, I don't think the SARS-CoV-2 virus (i.e., the cause of COVID-19) cares about the 📆 Gregorian calendar: 2021 will be no different from 2020 in many respects unless we commit to taking the right steps and making the right decisions.

So it's on us to make 2021 a little better by applying all the lessons the pandemic taught us, and by using the tools we built to face it.

The first and most important lesson is to keep a sense of balance and normalcy as we go through the new year.

I'll try to achieve this balance starting now, by looking back at my highlights with more indulgence than usual.

In practice, I always feel like I could have done more, and done it better.

Today, while prepping this blog post and going through all I did, all we did, I suddenly realized that even if it's always possible to do more, that's not the correct yardstick to measure ourselves by.

Especially because the word "more" is not an easily quantifiable number. And no one really knows how to weigh it correctly when the world around us is changing so fast and our lives seem completely different from what we were used to.

So, let me tell you about the things from 2020 I'm most proud of!

Deep into the eBPF VM in the Linux kernel

Italy 🇮🇹 was the first western country to impose a total lockdown. 🔒

Unknowns everywhere. Ambulances. A lot. Deaths. A lot of deaths. Words, too many words. 😢

So, I turned off the television and any media device in my house, reaching unprecedented levels of isolation. 🚨

The only sounds I remember distinctly are:

  • the fan noise while compiling Falco and the Linux kernel
  • the fast pace typing on the keyboard
  • the ambulances
  • people singing from the balconies

All of a sudden, the only thing I could do during my evenings and nights was to code and debug with my friend Lorenzo.
I felt alone. But I also quickly realized how lucky I was to have a real friend with whom I share a passion!

The positive aspect of the lockdown was that the unexpected free time came in really handy for looking at a set of strange issues (896, 1610) that users had been experiencing for some months (since around October 2019) while trying to get Falco to work properly on some Linux kernels (with the eBPF driver, clearly).

Long story short: users found Falco to hang on Linux kernels 4.19.y when using the eBPF driver.
The CPUs were starting to soft-lock under certain (and unknown) circumstances.

After banging our heads a bit against various brick walls 🧱, we thought we had created a reproducer...
But still, we had no idea at all what the cause of the problem was.

Kernel stack trace during the softlock

Shortly after, we discovered that to reproduce the crash reliably we had to compile a Linux kernel on the host!
Or stress the system in a similar way.

Nataly experiencing the crash

Besides the fact we wanted to use our new spare time to fix it, there were various factors that caught our attention:

  • it was happening only in certain conditions
  • initially, it seemed related to the events (syscall) frequency
  • it was happening regardless of the specific syscall

Lorenzo and I first started bisecting the commits of the Linux kernel 4.19.y.
By doing so we discovered that the problem was introduced by commit 849fa50662fb.
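
For readers who have never bisected: the idea is a binary search over the history between a known-good and a known-bad commit. Here is a minimal sketch in C, assuming a linear history addressed by index and a hypothetical `is_bad` predicate. In real life `git bisect` drives the search, and the "predicate" is building a kernel, booting it, and running the reproducer. The names `first_bad_commit` and `toy_is_bad` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero when the commit at `index` shows the bug. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Find the first bad commit in a linear history of n commits.
 * Assumes every commit before the culprit is good and every commit from
 * the culprit onward is bad (the same assumption bisecting relies on).
 * Preconditions: commit 0 is known good, commit n-1 is known bad.
 */
size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0;     /* newest commit known to be good */
    size_t bad = n - 1;  /* oldest commit known to be bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;   /* culprit is at mid or earlier */
        else
            good = mid;  /* culprit is after mid */
    }
    return bad;
}

/* Toy predicate: ctx points to the index of the (pretend) culprit. */
int toy_is_bad(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

With this shape, a history of 100 commits needs at most 7 test builds instead of up to 98, which matters a lot when every step means compiling and booting a kernel.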

We found that the problem was affecting almost the whole 4.19.y series, from 4.19.19 to the long-term Linux kernel release 4.19.133. We knew we needed to responsibly disclose our findings to the BPF subsystem kernel maintainers.
So we contacted Daniel Borkmann privately.

The commit introducing the issue

Commit 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") introduced the do_refine_retval_range function in the BPF verifier. The original intent of this function was to fix a situation where the LLVM compiler optimizations were messing with registers r2 and r1 when testing the return value of the bpf_probe_read_str and bpf_get_stack helpers against the buffer size (i.e., retval > bufsize). Both helpers return either a negative error code or a length equal to or smaller than the buffer size.
So, the goal of do_refine_retval_range was to check the return values against the correct boundaries (e.g., meta->msize_*).

do_refine_retval_range()

In the meantime, with another very long cycle of bisects, we also discovered that other Linux kernel major releases were affected (e.g., 5.0 kernels). We prepared a gist containing a table of the affected kernels (which you can find here) and sent it out to Daniel.

Kernels were affected up to commit e2ae4ca2, which indirectly solved the issue: starting from it, the problem was no longer present...

Commit e2ae4ca2 indirectly solves the issue?

We continued debugging, debugging, and debugging 🔬. We created patches. Applied and tested them. Tested various commit reverts. Looked at the generated assembly, with JIT enabled or not. Generated .dot files via bpftool. For days and nights.
In those days, uncovering the root cause was the only thing I cared about. 🔨

Leo sending assembly instructions to Daniel

While talking with the BPF subsystem maintainers we quickly discovered they were also debugging hard to get to the bottom of the problem.
We shared with them some self-explanatory images of the xlated Falco eBPF driver, with and without the patch we drafted once we got close to fully understanding the root cause.

You can find some of the material (.dot files, xlated dumps) in this gist.
But I think the following image speaks for itself. 💡

The loop soft-locking the CPUs

A beautiful infinite loop 🔄 in a branch consequent to the return value of bpf_probe_read_str. A helper we use a lot in the Falco eBPF probe. We got it! 🔦

Due to the mentioned commit, the eBPF verifier was buggy. It was marking portions of our eBPF probe as unreachable, causing our probe to hit the sanitizing code (hence the misleading jump - 1). 🐛

As soon as we got it, we patched the Falco eBPF probe (take a look at the patch here) to avoid triggering the buggy branch analysis mechanism in the eBPF verifier. 🏅

How? By checking the return value of the bpf_probe_read_str function exactly against -EFAULT (remember it returns either a negative error code or a length equal to or smaller than the buffer size).

Check the return value of bpf_probe_read_str() against EFAULT

fix(driver/bpf): exact check on bpf_probe_read_str() return value #1612

The bpf_probe_read_str returns a value >= 0 or -EFAULT (-14) when there's a page fault. To avoid issues with the BPF VM branch analysis we need to check for the negative case exactly.

Fixes #1610 Ref https://github.com/falcosecurity/falco/issues/896

Co-authored-by: Lorenzo Fontana lo@linux.com Signed-off-by: Leonardo Di Donato leodidonato@gmail.com
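
To make the idea concrete, here is a minimal user-space sketch in C of what "checking exactly" means. It is not the actual Falco patch: `mock_probe_read_str` is a hypothetical stand-in for the kernel helper (it simulates a page fault with a NULL source), and `store_string` is an invented caller. The only point is the shape of the failure test: an equality comparison with -EFAULT instead of a range comparison such as `res < 0` or `res > size`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space mock, NOT the kernel helper: copies a
 * NUL-terminated string into dst and returns the number of bytes written,
 * trailing NUL included (never more than size), or -EFAULT when the source
 * cannot be read (simulated here by a NULL pointer).
 */
long mock_probe_read_str(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && size > 0);
    if (src == NULL)
        return -EFAULT;

    size_t n = strlen(src);
    if (n >= size)
        n = size - 1;  /* truncate, leaving room for the NUL */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return (long)(n + 1);
}

/*
 * Invented caller: returns the number of bytes stored, or 0 on a fault.
 * The failure branch is an exact comparison with -EFAULT, so everything
 * after it deals with a single, well-defined case.
 */
size_t store_string(char *dst, size_t size, const char *src)
{
    long res = mock_probe_read_str(dst, size, src);

    if (res == -EFAULT)
        return 0;

    return (size_t)res;
}
```

In user space both styles of check behave the same; the difference only mattered for how the buggy verifier reasoned about the branches that follow the comparison.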

We also promptly updated the Falco driver version in the Falco core with pull-request 1131.

Easy peasy lemon squeezy. 🍋 Innit?

Not exactly. The process was a month long. The real bug was still present in the Linux kernel.

But we continued working side by side with the BPF subsystem maintainers, who were preparing another patch slightly different from ours. We tested the new patch and confirmed it fixed the issue.
A week later, a commit titled "bpf: fix buggy r0 retval refinement for tracing helpers" landed in various Linux kernel releases, definitively solving the problem!

Greg!

You can take a look at the final patch by looking at this commit in the Linux kernel 4.19.y series.

The same was applied to 5.4.y and 5.6.y series too.

It was a tough, long, and unique experience. But also very rewarding. Seeing my name in a Linux kernel commit made me cry. 🤩

Lorenzo and I had a lot of fun, but also moments in which results were not coming and our will wavered.

Anyway, during that month we learnt a lot. For example, knowing how to effectively debug the eBPF VM in the Linux kernel is priceless in my opinion, and that alone was worth the effort.

In case you also wanna know how to embark on such tasks, I suggest you watch Lorenzo's lightning talk here. 📽

Podcasts, conferences, live streams

I began 2020 by recording a podcast about eBPF and Falco for the Kubernetes Podcast from Google. 📼

I suffer from impostor syndrome, thus initially the anxiety was overwhelming.

I'm grateful I've been able to talk about two of my favorite topics in the world with people at Google: eBPF and Falco. Only today, after one year, I've completely realized how lucky I feel for that opportunity. 🥠

eBPF and Falco with Leonardo Di Donato

Conferences: I always loved to get to meet peers in person. Conferences were that moment in my life.

We all know how 2020 took that privilege away from us.

When COVID-19 came as a shock to all of us, some conferences were canceled.
Other times I declined talks that got accepted.
Too much uncertainty, too many changes. So there I was, sad for the flights I couldn’t take to run away for a bit, and for the friends I couldn't meet.

After mid-2020, I was finally able to emotionally re-calibrate myself.
I adapted and I started to participate in various virtual events and gave a bunch of talks.

Here they are:

You can find the slides in the leodido/presentations GitHub repository. Ping me over Twitter in case GitHub LFS rate-limits you. Or you can watch the recordings on YouTube.


Other talks may have gone lost

2020

Date | Title | Slides | Video | Conference | Type
2020.11.20 | Bypass Falco | Slides | Watch 📼 | KubeCon + CloudNativeCon North America 2020 Virtual | Talk
2020.10.29 | Intro to Falco | ✖️ | Watch 📼 | Rawkode Live | Live stream
2020.09.27 | Falco, runtime security analysis through syscalls | Slides | Watch 📼 | RomHack Rome 2020 | Talk
2020.08.20 | Going Beyond CI/CD with Prow | Slides | Watch 📼 | KubeCon + CloudNativeCon Europe 2020 Virtual | Talk
2020.08.19 | Designing a gRPC Interface for Kernel Tracing with eBPF | Slides | Watch 📼 | KubeCon + CloudNativeCon Europe 2020 Virtual | Talk
2020.06.20 | Falco: runtime security analysis through syscalls | Slides | Watch 📼 | BSides Athens 2020 | Talk
2020.02.18 | eBPF and Falco | ✖️ | Listen 🔈 | Kubernetes Podcast from Google | Podcast

I learnt how difficult it is to set up the tooling for recording a good video. I discovered the hard way how much I prefer giving in-person talks. Entering a room with other peeps, smiling at them, and going with the flow while talking about nerdy things is way better than sitting in front of my laptop, tweaking the way I present a topic countless times, and wasting hours editing clunky recordings. That is not my job.

Yeah, I could have given more speeches. But I'm really proud of the quality I've been able to deliver in the talks I gave. They were a completely new format, in totally different conditions, and I now feel I did pretty well.

Especially with my latest talk, Bypass Falco.

Back in July, while I was prepping the talks for KubeCon EU 2020, a thought flashed through my mind.

What if I prepared a talk showing how to bypass the software I build?

We all know that the scope of Falco is to detect unknowns and threats at runtime, right?

In this world, it is very common to challenge things others do. It's very rare to challenge the tools we build, the decisions we take.
Will the audience understand the reasoning behind my choice?

In the end, I just wanted to be transparent, and to bypass Falco to later make it stronger. Also, I wanted to show people how to do the same, and have them help us fix whatever they found.

Thus, I wrote an abstract and I submitted the talk for KubeCon NA 2020. The talk got accepted and I told my father: "Look Pa, I'm gonna teach people how to bypass the software I build".
Boy, did you call that one...

I'll spare you the details, but what ensued was a fairly predictable (and stereotypical) Italian family drama: he started yelling at me, and shouted that to do such a thing was a grave and unforgivable mistake. That Sysdig could have, nay - should have - fired me for even conceiving something as twisted as this. 😭
Indeed, according to him, presenting in public the shortcomings of the software I contribute to creating amounted to admitting some sort of failure, either mine or of my team or of my company as a whole. In his eyes, it was an admission of defeat, the product of a subversive attitude.

I tried - in vain, as he was still shouting - to explain that a constructive attitude begins with a quest for our weaknesses. It is the first, and necessary, step in gaining self-awareness. It is only by getting to really know our limits that we can hope to overcome them.

Nothing. My father kept telling me that Sysdig was gonna fire me. That my idea was nonsense, and so on.
I felt proud of my idea, initially. But at that moment, I was filled with doubts. And very very anxious and sad.

Nevertheless, I prepared my talk. I put everything I had into uncovering bypasses for Falco, preparing the deck, and finally recording the video.

KubeCon NA 2020 started and my talk went on air.

The feedback was astonishing! Beyond my dreams. People got my message... It meant a lot to me! 💌

A huge flow of positive tweets hit me! 🙈

Also, my talk got mentioned (here by Equinix, here by StackRox) as one of the best KubeCon NA 2020 talks to watch! This completely blew my mind. 🤯

Watch the following video starting at 45:45 if you don't believe me!

Even more interesting than the social mentions: after watching my talk, people sent pull requests to the Falco drivers to support the missing syscalls that could be used to bypass it! 👇

update(driver): execveat support #1723

sysdig-CLA-1.0-signed-off-by: Tommy McCormick mccormickt9@gmail.com

update(driver): add copy_file_range syscall support #1724

Support copy_file_range syscall

Add support for copy_file_range syscall

Launch sysdig:

sudo ./userspace/sysdig/sysdig proc.name=copy_file_range and evt.type=copy_file_range --bpf=./driver/bpf/probe.o

Example output:

21393 04:06:29.368028000 3 copy_file_range (76573) > copy_file_range fd_in=3(<f>/home/crash/Documents/local/sysdig-repo/tests/src) off_in=0 fd_out=4(<f>/home/crash/Documents/local/sysdig-repo/tests/dst) off_out=2 len=16 flags=0(O_NONE) 

For testing purposes

#define _GNU_SOURCE /* must come before any #include to take effect */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv) {
  int fd_in, fd_out;
  struct stat st; /* metadata of the source file */
  loff_t len, ret;

  if (argc != 3) {
    fprintf(stderr, "Usage: %s <source> <destination>\n", argv[0]);
    exit(EXIT_FAILURE);
  }

  fd_in = open(argv[1], O_RDONLY);
  if (fd_in == -1) {
    perror("open (argv[1])");
    exit(EXIT_FAILURE);
  }

  /* The source file size is the number of bytes we ask the kernel to copy. */
  if (fstat(fd_in, &st) == -1) {
    perror("fstat");
    exit(EXIT_FAILURE);
  }

  len = st.st_size;

  fd_out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0644);
  if (fd_out == -1) {
    perror("open (argv[2])");
    exit(EXIT_FAILURE);
  }

  /* Read from offset 0 and write at offset 2, so that both offsets
     show up as distinct values in the captured event. */
  loff_t off_in = 0;
  loff_t off_out = 2;

  /* Call the raw syscall rather than relying on a libc wrapper. */
  ret = syscall(__NR_copy_file_range, fd_in, &off_in, fd_out, &off_out, len, 0);

  if (ret == -1) {
    perror("copy_file_range");
    exit(EXIT_FAILURE);
  }

  close(fd_in);
  close(fd_out);
  exit(EXIT_SUCCESS);
}

Compile it

gcc -o copy_file_range.o copy_file_range.c

Run it

./copy_file_range.o srcfile dstfile

Notes

sysdig-CLA-1.0-signed-off-by: Luca Montechiesi lucamontechiesi@gmail.com

It was so amazing to see people getting inspired by my talk!

I leave you the YouTube recording of my Bypass Falco talk in case you wanna send some pull-request to the Falco drivers too! 😜

Falco

I will try to keep it compact, but Falco and its community have grown so much this year that I feel like this could be a separate blog post.

My feeling was spot on. I simply can't make this a mile-long blog post.

So I moved my review of Falco's 2020 to the Falco blog.

You can find it there. Just let me say I was impressed by the things we all (the Falco maintainers and the whole Falco community) did on Falco. And I'm only referring to the topics I remembered without any effort, not all of them.

Family

During 2020, my little brother, Francesco, found himself unable to work at the hostel where he was employed in Milan, just like many other people out there. 😰

A really bad situation... I could have given him a fish and fed him, but I knew it wouldn't have worked in the long run.

So we started talking a lot, first to figure out how to use the new and unexpected free time, and what his desires were.

While brainstorming, suddenly everything was crystal clear! 🔮

He wanted to be back in tech. When he was 7 years old we used to code together some games in C++, others with Javascript engines and HTML5. Basically, we had fun together. 🕹

But then he quit abruptly, because I was stupid enough (and also way younger) to push him to learn hard things he was not really passionate about.

So this time I knew the mistakes to avoid.

The only thing I needed to do was to help him by teaching him how to fish.
How to discover by himself the technologies, the frameworks, the programming languages, etc. he wanted to experiment with. How to approach them.

And to sit there, ready to help whenever he asked me for guidance.

The process took the whole year, but as of today, it's complete and really successful.
He did a lot of interviews. Cracked some of them, failed others. We both learned a lot!

He now works as a Software Frontend Engineer at Chili TV during the day.

But what's more important is that he spends his free time putting so much effort and passion into learning Linux, improving his coding skills, and playing with Arduino and Raspberry!

His progress goes so fast that sometimes he scares the hell out of me! 😱

I consider helping him find his path as my top accomplishment of the year since every time I look at him playing and learning some new things he now knows he's passionate about, I can finally see he is happy. And this warms my heart more than anything! 🧡

You can clearly see how happy he is with today's project: building a simple music tone recorder with Arduino.

Who knows which project he will choose tomorrow! I'll wait until tomorrow morning to find out! 💫


To new beginnings, happy 2021 everyone! 🥳
