Ian Eyberg


Serverless Security with Unikernels

Security is one of those topics where, on one hand, you see a lot of passionate developers who get upset whenever there is a new data breach (and those seem to happen daily), yet on the other hand there is a very large skills gap in understanding how hackers (the bad kind) think, what makes them tick and, most importantly, how they operate.

I think it's important developers start thinking about security in a more holistic manner.

Let me give you an example. I was talking to a VP of engineering the other day who said they are rather good on security because all of their instances are inside a VPC. I agreed that was a good approach versus exposing everything on the internet, but then I brought up the Capital One hack, the DoorDash hack and many others that occurred just this year. You can bet that if someone was only alerted 4-5 months after an attack, as in the case of DoorDash, the miscreants have been all up and down those servers. Exploiting an SSRF (server-side request forgery) vulnerability is one thing, but once the attack has escalated to the point where the attacker has landed a shell inside the VPC, this thinking falls apart.


Why is that?

At the end of the day, attackers don't care which exploit or vulnerability they use to get onto your server. They only care about getting onto your server to run their programs. For example, cryptojacking attacks like the one that afflicted Tesla are very popular nowadays because, unlike ransomware, you don't have to wait to get paid - it just starts making money immediately! It doesn't matter that Tesla had exposed Kubernetes to the world; the attackers just wanted to mine some Monero and couldn't care less how they broke in. This is the point.

I'm going to show you real life attacks on Google Cloud here in a bit but first I want to set some expectations.

The Problem with Multiple Processes (or 'Just Use Threads')

Most attacks today rely on the ability to run other programs on a given server/instance/container/etc. If you can't do that because fork/execve and friends have been seccomp'd out, the attack gets progressively harder because now you have to resort to more exotic techniques such as ROP gadgets. However, the end desire remains the same - unfettered access to run whatever program the attacker wants - so typically the end goal is to pop a shell.

child process

Not being a day-to-day JS developer, I did a quick search on GitHub to see how popular forking a new process might be. This picture shows that it definitely is not unpopular.

The recent paper "A fork() in the road" argues very well that we should not be using fork at all in 2020. We have had native threads in Linux since ~2000 (yes, 20 years ago).

In the past there wasn't a strong demand to get rid of it because Linux itself was designed to run on real machines - not virtual ones. This is important to point out because how else would you run other programs on the same physical server? However, that proposition can now be re-examined, at least for cloud computing use cases, which are built entirely on virtual machines.

For languages such as Java and Go you get threading out of the box, so you can have as much performance as you have threads/cores available. For interpreted languages such as JavaScript and Ruby it's been common to stick X application servers behind a load balancer/reverse proxy to scale up. At the end of the day you get the exact same vCPUs that you buy regardless. If you've got one vCPU, user-land threads, async, and such might help you out some, but forking off a half dozen worker processes won't - then you are just fighting the operating system scheduler.
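For Node specifically, the built-in worker_threads module is one way to use extra cores without forking a process. The following is only a minimal sketch, assuming Node 12 or newer and a CPU-bound job; the helper name sumToInThread and the inline worker source are illustrative and not part of the original example.

```javascript
const { Worker } = require('worker_threads');

// Source for the worker thread. It runs inside the same process as the
// caller, so no fork()/execve() is involved.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData.n; i++) total += i;
  parentPort.postMessage(total);
`;

// Hypothetical helper: sum 1..n on a separate thread and resolve with the result.
function sumToInThread(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

sumToInThread(1000).then((total) => {
  console.log('sum computed on a worker thread:', total);
});
```

Whether threads actually buy you anything still depends on how many vCPUs the instance has, as noted above.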

Serverless Security

Serverless is clearly a desire for many developers today who don't wish to manage and run infrastructure. That makes sense as we keep pumping out tremendous amounts of software, and DevOps salaries, at least in my neck of the woods (SF), are through the roof. Unfortunately, a lot of the status quo serverless offerings are built on top of popular cloud services, leading to vendor lock-in.

Unikernels are a fresh way of looking at this problem space, as they allow one to deploy the same code to any number of vendors using tried and true VMs as the base artifact, albeit not the types of VMs you might be used to.

Running Node the Old Way

Let's show how you might normally provision this JavaScript webserver. (And yes, I understand that this would be automated, but it's the same thing -- work with me here.) First we spin up an instance. Ok, nothing abnormal here.

Then we ssh in. Wait - hold on.

Right off the bat we are explicitly allowing users to jump into an instance and run arbitrary commands. In fact every single configuration management tool out there, including Terraform, Puppet, and Chef, is explicitly built on this concept, which is odious from the start.

Ok, let's continue.

Once we are on the instance we install node.js:

eyberg@instance-1:~$ sudo apt-get install nodejs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libicu57 libuv1
The following NEW packages will be installed:
  libicu57 libuv1 nodejs
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 11.2 MB of archives.
After this operation, 45.2 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://deb.debian.org/debian stretch/main amd64 libicu57 amd64 57.1-6+deb9u3 [7,705 kB]
Get:2 http://deb.debian.org/debian stretch/main amd64 libuv1 amd64 1.9.1-3 [84.4 kB]
Get:3 http://deb.debian.org/debian stretch/main amd64 nodejs amd64 4.8.2~dfsg-1 [3,440 kB]
Fetched 11.2 MB in 0s (42.9 MB/s)
Selecting previously unselected package libicu57:amd64.
(Reading database ... 37215 files and directories currently installed.)
Preparing to unpack .../libicu57_57.1-6+deb9u3_amd64.deb ...
Unpacking libicu57:amd64 (57.1-6+deb9u3) ...
Selecting previously unselected package libuv1:amd64.
Preparing to unpack .../libuv1_1.9.1-3_amd64.deb ...
Unpacking libuv1:amd64 (1.9.1-3) ...
Selecting previously unselected package nodejs.
Preparing to unpack .../nodejs_4.8.2~dfsg-1_amd64.deb ...
Unpacking nodejs (4.8.2~dfsg-1) ...
Setting up libuv1:amd64 (1.9.1-3) ...
Setting up libicu57:amd64 (57.1-6+deb9u3) ...
Processing triggers for libc-bin (2.24-11+deb9u4) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up nodejs (4.8.2~dfsg-1) ...
update-alternatives: using /usr/bin/nodejs to provide /usr/bin/js (js) in auto mode

Notice something strange? That's right. It didn't matter that my user is a non-root user - I could immediately 'sudo' my way to doing whatever I wanted on the instance.

That whole concept of 'least privilege' and 'user separation' that security devs like to talk about is, by default, not present on many servers.

Unfortunately, as soon as we do that we realize that Debian 9 (the first instance that Google offered to give us) comes with node version 4.

eyberg@instance-1:~$ nodejs --version
v4.8.2

Now our options are to either trash this instance or download a tarball. Let's go for the latter (even knowing that if someone else touches this instance it might cause problems down the road).

eyberg@instance-1:~$ wget https://nodejs.org/dist/v12.13.0/node-v12.13.0-linux-x64.tar.xz
--2019-11-14 18:08:59--  https://nodejs.org/dist/v12.13.0/node-v12.13.0-linux-x64.tar.xz
Resolving nodejs.org (nodejs.org)... 104.20.23.46, 104.20.22.46, 2606:4700:10::6814:172e, ...
Connecting to nodejs.org (nodejs.org)|104.20.23.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14055156 (13M) [application/x-xz]
Saving to: ‘node-v12.13.0-linux-x64.tar.xz’

node-v12.13.0-linux-x64.tar.xz                     100%[================================================================================================================>]  13.40M  --.-KB/s    in 0.1s

2019-11-14 18:08:59 (114 MB/s) - ‘node-v12.13.0-linux-x64.tar.xz’ saved [14055156/14055156]
eyberg@instance-1:~$ unxz node-v12.13.0-linux-x64.tar.xz
eyberg@instance-1:~$ tar xf node-v12.13.0-linux-x64.tar

Let's jump into the code!

What this next snippet does is stand up a webserver that offers two URLs to list the contents of a directory. One is a lot safer than the other, as we'll soon find out. (Again, I'm not a JS dev, so excuse the ugliness of the code.)

var http = require('http');
var fs = require('fs');
var url = require('url');

const { exec } = require('child_process');

var port = 80;

http.createServer(function (req, res) {

  if (req.url == '/safe') {
    var files = fs.readdirSync('/');
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end(files + '\n');

  } else {

    try {
      var cmd = 'ls';
      var resbody = '';

      var query = url.parse(req.url, true).query;

      // this is *unsafe*
      if (query.cmd) {
        cmd = query.cmd;
      }

      exec(cmd, (err, stdout, stderr) => {
        if (err) {
          resbody = err;
          console.error(err);
        } else {
          resbody = stdout;
        }

        res.writeHead(200, {'Content-Type': 'text/plain'});
        res.end(resbody + '\n');
      });

    } catch (e) {
      console.error(e);

      res.writeHead(200, {'Content-Type': 'text/plain'});
      res.end(e + '\n');
    }

  }

}).listen(port, "0.0.0.0");
console.log('Server running at http://127.0.0.1:' + port + '/');

Now let's run our program:

eyberg@instance-1:~/node-v12.13.0-linux-x64/bin$ ./node bob.js
Server running at http://127.0.0.1:80/
events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: listen EACCES: permission denied 0.0.0.0:80
    at Server.setupListenHandle [as _listen2] (net.js:1283:19)
    at listenInCluster (net.js:1348:12)
    at doListen (net.js:1487:7)
    at processTicksAndRejections (internal/process/task_queues.js:81:21)
Emitted 'error' event on Server instance at:
    at emitErrorNT (net.js:1327:8)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  code: 'EACCES',
  errno: 'EACCES',
  syscall: 'listen',
  address: '0.0.0.0',
  port: 80
}

Oh no! We forgot ports under 1024 are 'privileged'. Well, no problem here, because 'sudo make me a sandwich', right?

We have gone from bad to worse. A sane setup would probably have a frontend proxy sitting in front of this that drops privileges after getting set up and then forwards requests on, but now you might need to call in your DevOps person, huh?

eyberg@instance-1:~/node-v12.13.0-linux-x64/bin$ sudo su
root@instance-1:/home/eyberg/node-v12.13.0-linux-x64/bin# ./node bob.js
Server running at http://127.0.0.1:80/
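For contrast, the "drop privileges after getting set up" idea mentioned above might look roughly like this in Node. This is a sketch under assumptions, not what the instance above is running: the helper name dropPrivileges is made up for illustration, the account names 'nobody' and 'nogroup' are assumptions about the host, and the proc parameter exists only so the ordering can be exercised without being root.

```javascript
// Drop root once the privileged port has been bound.
// `proc` defaults to the real process object; a stand-in can be injected.
function dropPrivileges(user, group, proc = process) {
  if (typeof proc.getuid !== 'function' || proc.getuid() !== 0) {
    return false; // not root (or not a POSIX platform): nothing to drop
  }
  proc.setgroups([group]); // clear supplementary groups first
  proc.setgid(group);      // then the group, while we still have permission
  proc.setuid(user);       // finally the user; root is gone after this call
  return true;
}

// Usage sketch (assumed account names):
// server.listen(80, '0.0.0.0', () => dropPrivileges('nobody', 'nogroup'));
```

The order matters: once setuid has run, the process no longer has permission to change its groups.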

Ok, let's hit it up:

➜  ~  curl -XGET http://34.68.46.143/
bob.js
node
npm
npx

Well - that works but is it safe?

➜  ~  curl -XGET http://34.68.46.143/?cmd="touch%20tmp"

This first query passes in the command "touch tmp", which creates a new file in that directory - bad news bears. The %20 you might recognize as the URL encoding for the space character.
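To see how that query string turns back into a command on the server side, here is a small sketch. It swaps in the WHATWG URL class rather than the url.parse call used in the server code; the base 'http://localhost' is only a placeholder needed to parse a relative path.

```javascript
// The path as it arrives in req.url for the curl call above.
const requestPath = '/?cmd=touch%20tmp';

// A base is required to parse a relative URL; this one is a placeholder.
const parsed = new URL(requestPath, 'http://localhost');
const cmd = parsed.searchParams.get('cmd');

console.log(JSON.stringify(cmd));             // prints "touch tmp"
console.log(encodeURIComponent('touch tmp')); // prints touch%20tmp
```

The decoded string is what ends up being handed to exec unchanged.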

➜  ~  curl -XGET http://34.68.46.143/
bob.js
node
npm
npx
tmp

As we can see, we can run arbitrary commands on our end server and, worse, it's running as root.

This is a frequently abused software development pattern called 'shelling out'. There is almost never a good reason to do this, and if you have code linters or static analysis set up in your CI there's a good chance they'll flag it, or whoever is reviewing your PRs should.
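If you really do have to run an external program on a regular Linux box, one commonly suggested mitigation is to skip the shell entirely and pass untrusted text as a single argument. A minimal sketch, assuming an echo binary is on the PATH; the helper name echoUntrusted is hypothetical and only there to show the behavior:

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: run a fixed program with untrusted text as ONE argument.
// execFile does not start a shell, so ';', '&&', '|' and friends are passed
// through as literal characters instead of being interpreted.
function echoUntrusted(text) {
  return execFileSync('echo', [text], { encoding: 'utf8' });
}

// The would-be injection is simply echoed back as text.
process.stdout.write(echoUntrusted('hello; touch pwned'));
```

This narrows the hole rather than closing it: the process can still start other programs, which is exactly the capability an attacker is after.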

Now if we refactor the offending command injection into the '/safe' equivalent we might get this instead:

➜  ~  curl -XGET http://34.68.46.143/safe
bin,boot,dev,etc,home,initrd.img,initrd.img.old,lib,lib64,lost+found,media,mnt,opt,proc,root,run,sbin,srv,sys,tmp,usr,var,vmlinuz,vmlinuz.old

Leaking out your root filesystem probably isn't the best thing to do but at least you aren't injecting commands anymore.

Now, this is just one 41-line program, but it is running on a full-blown Linux system. Let's see what else is on here before we retire this example.

Attack Surface

Envision Normandy 1944.


The attack surface, when we talk about Linux systems, is the amount of utter crap that we can attack.

root@instance-1:~# find / -type f | wc -l
76369

76,000 files! Just to run a 41-line JavaScript program?

I wonder how many shared libraries we have on this system?

root@instance-1:~# find / -type f -regex ".*\.so.*" | wc -l
751

750?? If we check out node we can see there are only 8 explicitly linked to node - why do we want/need the rest?

root@instance-1:~# ldd /home/eyberg/node-v12.13.0-linux-x64/bin/node
        linux-vdso.so.1 (0x00007ffeaf7e9000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fef28ab4000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fef28732000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fef2842e000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fef28217000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fef27ffa000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fef27c5b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fef28cb8000)

What about executables?

root@instance-1:~# find / -type f -executable | wc -l
1339

1300?!? We can attack 1300 programs on this fresh instance? All we did was install node. Let's try that query again.

root@instance-1:~# find / -type f -executable | xargs file | grep executable | wc -l
754

Well we drilled it down close to halfway but still 750??
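The same kind of tally can be taken from inside Node, which is handy when there is no shell to run find in. This is a rough sketch rather than an exact equivalent of the commands above: the function name countFiles is made up, and it counts a file as executable if any execute bit is set, whereas find -executable checks access for the current user.

```javascript
const fs = require('fs');
const path = require('path');

// Walk a directory tree and count regular files and those with an execute bit.
function countFiles(root) {
  let files = 0;
  let executables = 0;
  const stack = [root];

  while (stack.length > 0) {
    const dir = stack.pop();
    let entries;
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch (err) {
      continue; // unreadable directory: skip it
    }
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        stack.push(full); // symlinks are not followed
      } else if (entry.isFile()) {
        files++;
        try {
          if (fs.statSync(full).mode & 0o111) executables++;
        } catch (err) {
          // file vanished or is unreadable: count it as a plain file
        }
      }
    }
  }
  return { files, executables };
}

// Usage: node count.js /some/dir
if (process.argv[2]) {
  console.log(countFiles(process.argv[2]));
}
```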

A heavily seccomp'd container infrastructure might prevent some of this behavior, but then you are missing out on the whole serverless part of the idea, and container security does not have a great track record. Also, we haven't even begun to talk about why the Linux kernel is 15+ MLOC - half of it is just drivers for hardware that doesn't exist in a virtual machine, then there's all the support for users, and IPC and scheduling and .... anyways, that's for a different blogpost.

So we've now shown that merely setting up a node webserver can be a pain even when we aren't doing things like putting it into an init manager or dropping privileges or any other sane activity.

Securing it becomes a whole new level of batshittery.

Serverless Unikernels

Let's start fixing the problem now that we have identified it. Let's take this same node.js webserver and turn it into a unikernel using the Nanos kernel and the OPS unikernel orchestrator.

If it's the first time you've done this you might want to check out this tutorial first.

Before we build the image - want to see the entirety of the filesystem first? I didn't show you the filesystem in the previous example cause no one wants to sift through 20+ pages of a tree listing.

➜  sec-article  ops pkg contents node_v12.13.0
File :/node
File :/package.manifest
Dir :/sysroot
Dir :/sysroot/lib
Dir :/sysroot/lib/x86_64-linux-gnu
File :/sysroot/lib/x86_64-linux-gnu/libc.so.6
File :/sysroot/lib/x86_64-linux-gnu/libdl.so.2
File :/sysroot/lib/x86_64-linux-gnu/libgcc_s.so.1
File :/sysroot/lib/x86_64-linux-gnu/libm.so.6
File :/sysroot/lib/x86_64-linux-gnu/libnss_dns.so.2
File :/sysroot/lib/x86_64-linux-gnu/libpthread.so.0
File :/sysroot/lib/x86_64-linux-gnu/libresolv.so.2
Dir :/sysroot/lib64
File :/sysroot/lib64/ld-linux-x86-64.so.2
Dir :/sysroot/proc
File :/sysroot/proc/meminfo
Dir :/sysroot/usr
Dir :/sysroot/usr/lib
Dir :/sysroot/usr/lib/x86_64-linux-gnu
File :/sysroot/usr/lib/x86_64-linux-gnu/libstdc++.so.6

Yep - that's all 20 entries of it. Actually 8 of those are just directory entries.

Ok, let's build the image first:

➜  sec-article  cat build.sh
#!/bin/sh

export GOOGLE_APPLICATION_CREDENTIALS=~/gcloud.json
ops image create -c config.json -p node_v12.13.0 -a main.js
➜  sec-article  ./build.sh
[node main.js]
bucket found: my-bucket
Image creation started. Monitoring operation operation-1573756133681-59752a74faae6-6944eb54-9f7ee502.
............
Operation operation-1573756133681-59752a74faae6-6944eb54-9f7ee502 completed successfully.
Image creation succeeded node-image.
gcp image 'node-image' created...

Then we can boot it:

➜  sec-article  ops instance create -z us-west1-b -i node-image
ProjectId not provided in config.CloudConfig. Using my-project from default credentials.Instance creation started using image projects/my-project/global/images/node-image. Monitoring operation operation-1573756213461-59752ac110224-644f49fc-4440b475.
.....
Operation operation-1573756213461-59752ac110224-644f49fc-4440b475 completed successfully.
Instance creation succeeded node-image-1573756213.
➜  ~  curl -XGET http://35.247.123.61/
Error: spawn ENOSYS
➜  ~  curl -XGET http://35.247.123.61/safe
dev,etc,kernel,lib,lib64,main.js,node_v12.13.0,proc,sys,usr
➜  ~  curl -XGET http://35.247.123.61/cmd\="touch%20tmp"
Error: spawn ENOSYS

So we can see we are safely not allowing any other processes to be spawned on the end machine. It isn't a matter of filtering the calls either - the system itself straight up doesn't have support for it. If you want more programs running just boot up another instance and if you want more performance out of the server look at other languages.
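If you want to check this property from inside an application, a small probe along these lines could do it. This is a sketch only: probeSpawn is a hypothetical helper, and the assumption that the unikernel reports ENOSYS here is taken from the curl output above rather than from testing this exact call.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical probe: try to start a trivial child process (node itself,
// running an empty script) and report whether the platform allowed it.
function probeSpawn() {
  const result = spawnSync(process.execPath, ['-e', ''], { timeout: 5000 });
  if (result.error) {
    // e.g. an error code such as ENOSYS when process creation is unsupported
    return { canSpawn: false, code: result.error.code };
  }
  return { canSpawn: true, code: null };
}

console.log(probeSpawn());
```

On an ordinary Linux instance this should report that spawning works, which is the capability the attacks described earlier depend on.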

There are other reasons why we advocate serverless unikernels like this besides security, and in upcoming blog posts we'll start showing other superpowers of this style of infrastructure.
