Matt Morgan for AWS Community Builders

Posted on Mar 8, 2021

rego.fyi: A Study in Serverless Authorization with Open Policy Agent

#cdk #security #serverless #opa

Open Policy Agent is a decision engine built on a declarative language called Rego. OPA is pronounced "oh-pa", but to na kang setóp da mesach! so go ahead and say "oh-pee-ayy" because I know you want to. OPA is general-purpose and written in Go. It allows the creation and compilation of policies written in Rego which can then be hydrated with some kind of data source and then compared against an input to produce a result. OPA docs, linked above, are excellent so check them out for more information.

tl;dr
Why OPA?
Serverless OPA
rego.fyi
Rego Policy
OPA Authorizer
Web App
AWS CDK
Conclusion

tl;dr

The code is here!
rego.fyi

Why OPA?

The paradigm presented by OPA is compelling when applied to the use case of multi-tenant SaaS applications. Such applications have complex authorization rules, which may include ensuring a user acts within their tenant or personal data, ensuring a user has the right role or permission, ensuring the user belongs to a tenant that is subscribed to the service being provided, and many others. Often these rules are implemented in imperative logic throughout the application. A check for subscription may be implemented in middleware, while limiting the scope of tenant access is often found in the WHERE clause of a SQL query. Spreading authorization concerns throughout an application makes it very hard to audit and understand what rules the application is actually enforcing, and it makes it easy for bugs to creep in.

The more complex the application, the more authorization becomes a problem that needs a single solution. This is the problem OPA can solve. It's enticing to think about authorizing a microservice architecture with OPA. This would allow developers to focus on the problem the service needs to solve and share a common authorization abstraction.

Serverless OPA

The usual way to implement OPA for microservices is to stand up an authorization service implementing OPA and have other services invoke it over HTTP using some of the published middleware. Even the OpenFaaS version written in Go depends on a standalone authorization service. I wanted to see if I could use OPA in a 100% serverless environment, making policy decisions in an API Gateway request authorizer without the overhead of additional HTTP requests or the need to run a separate service. I found inspiration in this excellent sls-lambda-opa repo.

I think of this solution as a layered architecture. The bottom layer is the authorizer, which implements the OPA library and is capable of compiling Rego policies. On top of that is the policy itself, which states what gets compared: a claim like permissions or subscriptions, or the HTTP resource and method. Above that is the service- or endpoint-specific data that supplies the actual resources, methods and subscriptions to evaluate against. Finally, we have the user's session or context, delivered in a JSON Web Token (JWT).

Each of these layers can be decoupled from the others. An authorizer function implementation might be used across several services with the same policy but different data hydrating the policy.

rego.fyi

My first attempt at putting all this together was a fairly terrible demo that needed a REST client and the copying of tokens. I had the urge to build a little fullstack app and so I came up with rego.fyi. In some ways, it's a less impressive take on The Rego Playground, but mine is serverless and built on the kind of architecture I want to work with. I'll also caveat that I am absolute trash when it comes to visual design. I'm just awful at it. Respect to those who are good, but I'm not one of you.

The architecture of my app is a little different than what I'd envision using in production. I wanted to be able to experiment with different policies, so the policies, along with data and the user input are all sent to be evaluated by my authorizer function. The authorizer compiles the policy in real time, makes the policy decision based on the data and user input, then returns an IAM policy document specifying whether my Lambda function can be invoked, as appropriate.

In a real application, I likely wouldn't want to compile the policy on request, but instead compile it once on startup. I don't have the expectation of needing to change policies on the fly, though if I did, the policy could be loaded from S3 or a database. What I would probably do in a real application is load the policy from a Lambda Layer.
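To make the "compile once on startup" idea concrete, here is a minimal sketch in Go using only the standard library. The `compiledPolicy` type and `compilePolicy` function are hypothetical stand-ins for whatever the OPA library would return and do; the point is only the caching pattern, which lets warm Lambda invocations skip the expensive step.

```go
package main

import (
	"fmt"
	"sync"
)

// compiledPolicy is a hypothetical stand-in for the prepared object
// a policy engine would hand back after compiling a policy.
type compiledPolicy struct {
	source string
}

var (
	compileOnce  sync.Once
	cached       *compiledPolicy
	compileCount int // counts how often the expensive step actually runs
)

// compilePolicy simulates the expensive compile step (assumption:
// the real work would be done by the OPA library).
func compilePolicy(source string) *compiledPolicy {
	compileCount++
	return &compiledPolicy{source: source}
}

// getPolicy compiles the policy on first use and reuses it afterwards,
// so repeated calls in the same process pay the cost only once.
func getPolicy() *compiledPolicy {
	compileOnce.Do(func() {
		cached = compilePolicy("package policy\n\ndefault allow = false")
	})
	return cached
}

func main() {
	// Simulate three invocations handled by the same warm process.
	for i := 0; i < 3; i++ {
		getPolicy()
	}
	fmt.Println("compile count:", compileCount) // prints "compile count: 1"
}
```

Package-level state like this survives between invocations of a warm Lambda container, which is why the pattern pays off there.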

Rego Policy

I am by no means an expert on the Rego language and won't give an overview here when there are already useful docs. I did manage to put together a workable policy for my experiment.

package policy

import data.requests
import data.permissions
import data.subscriptions

default allow = false

allow {
    check_policy[input]
}

check_policy[input] {
    r = requests[_]
    some i; match_with_wildcard(permissions, input.permissions[i])
    some j; match_with_wildcard(subscriptions, input.subscriptions[j])
    match_with_wildcard(r.methods, input.method)
    match_with_wildcard(r.resources, input.resource)
}

match_with_wildcard(allowed, value) {
    allowed[_] = "*"
}
match_with_wildcard(allowed, value) {
    allowed[_] = value
}

The syntax of this policy is explained well in the docs, but just to call out a few things: everything in check_policy can be considered an AND comparison, while the duplicative call signature of match_with_wildcard makes it an OR comparison, since the function is true if either body is satisfied. some i; match_with_wildcard(permissions, input.permissions[i]) is a fairly elegant one-liner that makes sure at least one item in the user's input array matches an item in the allowed array, or that the allowed array contains the wildcard.
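For readers more at home in imperative code, the AND/OR reading of the policy can be restated in plain Go. This is only an illustration of the logic, not how OPA evaluates Rego, and the type and function names (`Data`, `Input`, `anyMatch`, `sampleData`) are my own inventions. The sample values are the ones used elsewhere in this post.

```go
package main

import "fmt"

// Request, Data and Input mirror the shapes the policy expects.
type Request struct {
	Methods   []string
	Resources []string
}

type Data struct {
	Requests      []Request
	Permissions   []string
	Subscriptions []string
}

type Input struct {
	Method        string
	Resource      string
	Permissions   []string
	Subscriptions []string
}

// matchWithWildcard mirrors the two match_with_wildcard bodies:
// true if allowed contains "*" OR allowed contains value.
func matchWithWildcard(allowed []string, value string) bool {
	for _, a := range allowed {
		if a == "*" || a == value {
			return true
		}
	}
	return false
}

// anyMatch mirrors `some i; match_with_wildcard(allowed, values[i])`:
// at least one of the user's values must match.
func anyMatch(allowed, values []string) bool {
	for _, v := range values {
		if matchWithWildcard(allowed, v) {
			return true
		}
	}
	return false
}

// allow mirrors check_policy: every condition must hold (AND)
// for at least one entry in Requests.
func allow(d Data, in Input) bool {
	if !anyMatch(d.Permissions, in.Permissions) || !anyMatch(d.Subscriptions, in.Subscriptions) {
		return false
	}
	for _, r := range d.Requests {
		if matchWithWildcard(r.Methods, in.Method) && matchWithWildcard(r.Resources, in.Resource) {
			return true
		}
	}
	return false
}

// sampleData returns illustrative values for one protected API.
func sampleData() Data {
	return Data{
		Requests:      []Request{{Methods: []string{"GET"}, Resources: []string{"/orders"}}},
		Permissions:   []string{"start_order", "view_invoice"},
		Subscriptions: []string{"newsletter"},
	}
}

func main() {
	good := Input{Method: "GET", Resource: "/orders",
		Permissions: []string{"start_order"}, Subscriptions: []string{"newsletter"}}
	bad := good
	bad.Subscriptions = []string{"pizza_of_the_month"}

	fmt.Println(allow(sampleData(), good)) // prints "true"
	fmt.Println(allow(sampleData(), bad))  // prints "false"
}
```

Compared with this, the Rego version is shorter and keeps the rules declarative, which is the appeal.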

Note the policy defines fields I care about, namely the HTTP method and resource as well as my custom claims of permissions and subscriptions. This policy doesn't include anything about the values that should be compared to, but does describe the shape of the data and how it should be compared. In order to give those values, we need a data file.

{
  "requests": [{ "methods": ["GET"], "resources": ["/orders"] }],
  "permissions": ["start_order", "view_invoice"],
  "subscriptions": ["newsletter"]
}

This data file could apply to one API while another data file, with permissions like cuddle_hedgehogs or introspect_navel, protects a different API. That's the power of this layered approach! But the best part is that OPA ships with a testing framework for Rego policies.

The best way to experience a good separation of concerns is with some solid unit tests. They are quite easy to write.

package policy

test_get_allowed {
    allow with input as {"permissions":["start_order"], "resource": "/orders", "method":"GET", "subscriptions":["newsletter"]}
}

test_get_wrong_subcription_denied {
    not allow with input as {"permissions":["start_order"], "resource": "/orders", "method":"GET", "subscriptions":["pizza_of_the_month"]}
}

test_get_wrong_permission_denied {
    not allow with input as {"permissions":["change_password"], "resource": "/orders", "method":"GET", "subscriptions":["newsletter"]}
}

I'm providing the different user inputs and expecting them to either be allowed or not. The opa test command also gives me test coverage out of the box!

% opa test . -c
{
  "files": {
    "policy.rego": {
      "covered": [
        {
          "start": {
            "row": 7
          },
          "end": {
            "row": 7
          }
        },
        {
          "start": {
            "row": 9
          },
          "end": {
            "row": 10
          }
        },
        {
          "start": {
            "row": 13
          },
          "end": {
            "row": 18
          }
        },
        {
          "start": {
            "row": 22
          },
          "end": {
            "row": 22
          }
        },
        {
          "start": {
            "row": 24
          },
          "end": {
            "row": 25
          }
        }
      ],
      "not_covered": [
        {
          "start": {
            "row": 21
          },
          "end": {
            "row": 21
          }
        }
      ],
      "coverage": 92.3
    },
    "policy_test.rego": {
      "covered": [
        {
          "start": {
            "row": 3
          },
          "end": {
            "row": 4
          }
        },
        {
          "start": {
            "row": 7
          },
          "end": {
            "row": 8
          }
        },
        {
          "start": {
            "row": 11
          },
          "end": {
            "row": 12
          }
        }
      ],
      "coverage": 100
    }
  },
  "coverage": 94.75
}

Okay, that's a bit verbose and looks like I need another test, but still really useful. There's a --format=pretty option, but it doesn't seem to do anything. Perhaps it's a WIP.

Anyway, this is great! Our old applications with some of the authorization logic in middleware, some in SQL and some in between just can't compete with the ability to unit test the policy logic separate from any application code.

OPA Authorizer

The only difference between a RequestAuthorizer and a TokenAuthorizer is that the TokenAuthorizer only sees the specified token while a RequestAuthorizer sees the entire request. The RequestAuthorizer is a better fit here, since I want to look at things like the HTTP method and path.

My authorizer function needs to:

Unpack my "token" (which really consists of the policy, data and user token, base64 encoded for demo purposes)
Compile the policy with the provided data.
Compare the user input and request to the policy.
Return an appropriate IAM policy to allow or deny access to the function handler.
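The steps above can be sketched roughly as follows, using only the Go standard library. This is not the code from the repo: the `payload` field names, the `"user"` principal and the `decider` function type (which stands in for the compile and compare steps that the OPA library would perform) are all assumptions. The response shape follows the usual API Gateway authorizer output with an `execute-api:Invoke` statement.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// payload is a hypothetical shape for the demo "token".
type payload struct {
	Policy string                 `json:"policy"`
	Data   map[string]interface{} `json:"data"`
	Input  map[string]interface{} `json:"input"`
}

type statement struct {
	Action   string `json:"Action"`
	Effect   string `json:"Effect"`
	Resource string `json:"Resource"`
}

type policyDocument struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

type authorizerResponse struct {
	PrincipalID    string         `json:"principalId"`
	PolicyDocument policyDocument `json:"policyDocument"`
}

// decider stands in for compiling the policy and evaluating the input.
type decider func(p payload, method, path string) (bool, error)

// pack builds a demo token: JSON, then base64.
func pack(p payload) (string, error) {
	raw, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// unpack reverses pack.
func unpack(token string) (payload, error) {
	var p payload
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return p, fmt.Errorf("decode token: %w", err)
	}
	if err := json.Unmarshal(raw, &p); err != nil {
		return p, fmt.Errorf("parse token: %w", err)
	}
	return p, nil
}

// respond builds the IAM policy that allows or denies the invocation.
func respond(allowed bool, methodArn string) authorizerResponse {
	effect := "Deny"
	if allowed {
		effect = "Allow"
	}
	return authorizerResponse{
		PrincipalID: "user", // placeholder principal
		PolicyDocument: policyDocument{
			Version:   "2012-10-17",
			Statement: []statement{{Action: "execute-api:Invoke", Effect: effect, Resource: methodArn}},
		},
	}
}

// authorize ties the steps together and denies on any error.
func authorize(token, method, path, methodArn string, decide decider) authorizerResponse {
	p, err := unpack(token)
	if err != nil {
		return respond(false, methodArn)
	}
	allowed, err := decide(p, method, path)
	return respond(err == nil && allowed, methodArn)
}

func main() {
	token, _ := pack(payload{Policy: "package policy"})
	getOnly := func(p payload, method, path string) (bool, error) { return method == "GET", nil }
	arn := "arn:aws:execute-api:us-east-1:123456789012:example/prod/GET/orders"
	out, _ := json.Marshal(authorize(token, "GET", "/orders", arn, getOnly))
	fmt.Println(string(out))
}
```

Denying whenever anything goes wrong (bad token, evaluation error) keeps the failure mode safe, which matters more in an authorizer than a friendly error message.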

This authorizer needs to be written in Go because the supporting OPA libraries are only available to the Go runtime. I normally write TypeScript, but this was my second try at Go and I think I did okay, thanks to countless examples and tutorials across the Internet. In fact, it's fair to say that imitation is a sincere form of flattery.

This was also the first time I've used Go with Lambda and I must say, this might be addicting.

Lambda Console — Millisecond billing, yeah!

I'm doing around 50ms with cold starts and single-digits otherwise, even though I'm doing all of those things (unpacking, compiling, deciding, generating) on every request. If the policy is compiled at start time, it's even faster.

Web App

My poor design skills notwithstanding, I'm reasonably good at programming in React. I built off a previous effort to do a simple fullstack non-CRA (Create React App) React app with esbuild. I think it works pretty well.

In order to keep my app from looking like complete trash, I used Material-UI, a React component library that implements Google's Material Design, and that was pretty easy. I also delved into Testing Library and I find it quite nice and loved that I could write tests without having to render anything shallowly. I used React Context for state and I like that a lot better than working with Redux. You can check my code or ask about this in the comments as I'm not going to go into great detail here, but as someone who doesn't program in React every day, it's nice checking in and seeing these innovations.

The app is just one page with no routing. When thinking about how to do this app, I actually thought about trying to figure out some kind of client-side JWT or perhaps do a round trip to a backend to get a user token. Ultimately I decided that cryptographic signing is beyond the scope of what I wanted to do in this app, so all it does is stringify the various form fields, base64 encode that string and finally pass the whole thing as an Authentication header. This allows me to have a decoding piece in my authorizer but of course it's in no way secure.
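To show why a base64-encoded payload is in no way secure, here is a small sketch of the round trip. It is written in Go for consistency with the authorizer, although the web app itself does the encoding in TypeScript, and the `formFields` shape is an assumption. Anyone holding the header value can decode it, edit the claims and re-encode it, and nothing on the receiving side can tell the difference because there is no signature.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// formFields is a hypothetical shape for the stringified form values.
type formFields struct {
	Policy string `json:"policy"`
	Data   string `json:"data"`
	Input  string `json:"input"`
}

// encode stringifies the fields as JSON and base64 encodes the result.
func encode(f formFields) (string, error) {
	raw, err := json.Marshal(f)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decode reverses encode; this is the "decoding piece" of an authorizer.
func decode(header string) (formFields, error) {
	var f formFields
	raw, err := base64.StdEncoding.DecodeString(header)
	if err != nil {
		return f, err
	}
	err = json.Unmarshal(raw, &f)
	return f, err
}

func main() {
	header, _ := encode(formFields{Input: `{"permissions":["view_invoice"]}`})

	// Tampering: decode, change the claims, encode again.
	f, _ := decode(header)
	f.Input = `{"permissions":["start_order"]}`
	forged, _ := encode(f)

	// The forged header decodes just as cleanly as the original.
	got, err := decode(forged)
	fmt.Println(got.Input, err)
}
```

A signed JWT closes this gap, because the authorizer can verify the signature before trusting any claim.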

AWS CDK

I love working with CDK. My post on CDK S3 websites covers most of the topics relating to asset bundling, but it's worth remarking on here. My CDK app handles all the asset bundling of my React web app (written in TypeScript), my authorizer function (written in Go) and my function handler (written in TypeScript). Because of the way asset bundling works in CDK, I can safely cdk deploy and only the parts of the application that have changed will deploy. This is great for a fullstack application and makes deployments very fast. I'll dig into this topic a bit more in a future post.

I also was able to get a unified npm test command that runs all the tests for:

CDK infrastructure as code written in TypeScript
React web app written in TypeScript
Lambda function handler written in TypeScript
Authorizer function written in Go
Policy written in Rego

% npm t

> cdk-esbuild-s3-website@0.0.1 pretest /Users/mattmorgan/mine/rego.fyi
> npm run lint


> cdk-esbuild-s3-website@0.0.1 lint /Users/mattmorgan/mine/rego.fyi
> eslint . --ext=.js,.ts


> cdk-esbuild-s3-website@0.0.1 test /Users/mattmorgan/mine/rego.fyi
> npm run test:opa && npm run test:go && npm run test:ts


> cdk-esbuild-s3-website@0.0.1 test:opa /Users/mattmorgan/mine/rego.fyi
> opa test ./opa

PASS: 3/3

> cdk-esbuild-s3-website@0.0.1 test:go /Users/mattmorgan/mine/rego.fyi
> go test ./...

ok      _/Users/mattmorgan/mine/rego.fyi/fns/go 1.222s

> cdk-esbuild-s3-website@0.0.1 test:ts /Users/mattmorgan/mine/rego.fyi
> jest --coverage --silent

 PASS   node  fns/ts/lambdalith.spec.ts
 PASS   dom  ui/providers/PayloadsProvider.spec.tsx
Bundling asset Default/AuthZFun/Code/Stage...
go version go1.16 darwin/amd64
Bundling asset WebTestStack/DeployWebsite/Asset1/Stage...
Bundling asset ApiTestStack/AuthZFun/Code/Stage...
0.8.56
go version go1.16 darwin/amd64
Bundling asset TestStack/DeployWebsite/Asset1/Stage...
0.8.56

> cdk-esbuild-s3-website@0.0.1 build /Users/mattmorgan/mine/rego.fyi
> npm run clean && npm run build:website


> cdk-esbuild-s3-website@0.0.1 build /Users/mattmorgan/mine/rego.fyi
> npm run clean && npm run build:website

 PASS   dom  ui/components/Header.spec.tsx
Bundling asset Default/LambdalithFn/Code/Stage...
 PASS   dom  ui/components/TextArea.spec.tsx

> cdk-esbuild-s3-website@0.0.1 clean /Users/mattmorgan/mine/rego.fyi
> rimraf cdk.out coverage website/js


> cdk-esbuild-s3-website@0.0.1 clean /Users/mattmorgan/mine/rego.fyi
> rimraf cdk.out coverage website/js

Bundling asset ApiTestStack/LambdalithFn/Code/Stage...
 PASS   node  cdk/lambda.spec.ts
 PASS   dom  ui/components/RequestControl.spec.tsx

> cdk-esbuild-s3-website@0.0.1 build:website /Users/mattmorgan/mine/rego.fyi
> NODE_ENV=production ts-node --files esbuild.ts build

 PASS   dom  ui/App.spec.tsx

> cdk-esbuild-s3-website@0.0.1 build:website /Users/mattmorgan/mine/rego.fyi
> NODE_ENV=production ts-node --files esbuild.ts build

 PASS   node  cdk/restApi.spec.ts
Running build...
Running build...
Bundling asset TestStack/AuthZFun/Code/Stage...
 PASS   node  cdk/website.spec.ts (7.308 s)
go version go1.16 darwin/amd64
Bundling asset TestStack/LambdalithFn/Code/Stage...
 PASS   node  cdk/rego.fyi-stack.spec.ts (8.261 s)
-----------------------|---------|----------|---------|---------|-------------------
File                   | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
-----------------------|---------|----------|---------|---------|-------------------
All files              |     100 |      100 |     100 |     100 |
 cdk                   |     100 |      100 |     100 |     100 |
  getCFAndZone.ts      |     100 |      100 |     100 |     100 |
  lambda.ts            |     100 |      100 |     100 |     100 |
  rego.fyi-stack.ts    |     100 |      100 |     100 |     100 |
  restApi.ts           |     100 |      100 |     100 |     100 |
  website.ts           |     100 |      100 |     100 |     100 |
 fns/ts                |     100 |      100 |     100 |     100 |
  lambdalith.ts        |     100 |      100 |     100 |     100 |
 opa                   |     100 |      100 |     100 |     100 |
  policy.rego          |     100 |      100 |     100 |     100 |
 ui                    |     100 |      100 |     100 |     100 |
  App.tsx              |     100 |      100 |     100 |     100 |
 ui/components         |     100 |      100 |     100 |     100 |
  Header.tsx           |     100 |      100 |     100 |     100 |
  RequestControl.tsx   |     100 |      100 |     100 |     100 |
  Sidebar.tsx          |     100 |      100 |     100 |     100 |
  TextArea.tsx         |     100 |      100 |     100 |     100 |
 ui/pages              |     100 |      100 |     100 |     100 |
  Rego.tsx             |     100 |      100 |     100 |     100 |
 ui/providers          |     100 |      100 |     100 |     100 |
  PayloadsProvider.tsx |     100 |      100 |     100 |     100 |
-----------------------|---------|----------|---------|---------|-------------------

Test Suites: 10 passed, 10 total
Tests:       27 passed, 27 total
Snapshots:   6 passed, 6 total
Time:        8.804 s, estimated 9 s

Thanks to jest projects, I'm able to run my .tsx tests with the jsdom test environment and the .ts tests with node. Fullstack testing!

Conclusion

My success at getting this working as a RequestAuthorizer makes a very compelling case for introducing OPA into serverless projects. We can join the delegation of complex authorization rules with API Gateway request validation and find ourselves in a world where our function handlers are very simple - or maybe even skip them entirely.


Oldest comments (3)

Ankit • Jan 31 '22

The Rego policy example you have presented has two methods, but both have the same signature. How does this work for match_with_wildcard?

Matt Morgan AWS Community Builders • Jan 31 '22

Hi Ankit, that's an OPA convention. I mentioned this above, but you aren't the first to wonder about it. "The duplicative call signature of match_with_wildcard makes it an OR comparison" is how I put it. I don't find the OPA docs super clear on this point, but it is explained here: openpolicyagent.org/docs/latest/po...

Hope that helps clear it up.