What's next for AssemblyLift? Data-oriented cloud dev through WebAssembly and capability-based security

#webassembly #rust #serverless #cloudnative

After nearly two years in part-time development, AssemblyLift has been through three major revisions, sitting now at v0.3.2.

The focus up to now has been on creating a low-friction environment for building applications with functions and APIs, and on the core functionality of the WebAssembly runtime. Using a simple TOML definition, AssemblyLift makes it easy to deploy Rust functions to AWS Lambda fronted by API Gateway. TL;DR it's a nifty little API-oriented compute framework!

Conspicuously missing from this framework, however, is any notion of data. Where do we store it? Who is allowed to access it? I've personally never written a dynamic web service that didn't need to interact with a data store in some way or another!

It's common to pair serverless compute like Lambda with a serverless database like DynamoDB. This pairing requires the Lambda function to have an attached IAM policy granting it the necessary access to DynamoDB resources. Currently, attaching IAM policies in AssemblyLift is done either via the AWS Console or by including your own Terraform code with the AssemblyLift project (this process is covered in another blog post).

This service-to-service permissions model is common among cloud providers, and is usually implemented as Role-Based Access Control (RBAC). In our example above, a Lambda function is assigned an execution role which has one or more policies attached. If the function is allowed to assume the role, and if the attached policies allow it, the function can access DynamoDB. Or more accurately, it can access whichever DDB resources are specified in the policy resources list. AWS policies also allow you to specify a list of conditions on the data which can be accessed, for example by only allowing GetItem for items with a particular prefix on the primary key. The official AWS docs contain more details on DynamoDB access management.

In a serverless context, this model of access control can create some problems. To maintain the Principle of Least Privilege it is recommended to keep a one-to-one relationship between functions and roles, i.e. functions should not share roles. At scale you may have hundreds of functions, each of which needs its own access policy. This means that a change to the access policy for any piece of data potentially has to be replicated many times. This complexity increases the risk of having misconfigured and/or inconsistent access policies, even if they are represented as code. Using policy conditions to restrict data access is also problematic, as your database "schema" (the format of your keys) is then hard-coded separately from the function code.

We didn't want to expose this complexity in AssemblyLift. It's the kind of "plumbing" code nobody enjoys writing, and it's also the kind that tends to get glossed over simply because it's hard to get right. Fortunately, it is now 2021 and we have some new tools at our disposal!

Entity-Capability Data Access Control

Entity-Capability Data Access Control (ECDAC) is the tentative name for the data access model being developed for AssemblyLift (as a kind of nod to the Entity-Component-System architecture which inspired it). The model is intended to alleviate the issues with policy definition described above. It is also meant to address issues with maintaining & communicating entity/object schema definitions between many functions and services, which will be discussed below.

In this scheme there are (as you might have guessed) two essential components:

Entities
Capabilities

When we talk about an Entity, we mean an object of some kind with a unique ID (e.g. Person, Account, Post, Comment, etc). This is the same sort of entity concept which is often used in database design, including DynamoDB "single-table" design.

Capabilities refers to capabilities in the capability-based computing/security sense. Connor Hicks at Suborbital wrote an excellent blog post which discusses the details and benefits of the capability-based model better than I probably can here. In short, capabilities define what an application component can do. A Capability is typically some kind of unforgeable token or object, which embeds a set of access rights.

An important distinction between access rights encoded as a Capability and access rights encoded in an RBAC policy is that Capabilities can be transmitted and shared, and possessing a valid Capability implies valid access. By possessing a valid Capability, application code can verify that it is allowed to perform an action. In an RBAC system, on the other hand, the entire function must assume a role (effectively pretending to be a "user"), and in turn take on the union of all permissions provided by the attached policies. Access is validated independently of the function code (after the code has already begun executing) by an authorization service, and the request is rejected if the policy doesn't allow the action.
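To make the distinction concrete, here is a minimal Rust sketch of the "possession implies access" idea. It is not AssemblyLift code: the `capability` module, `grant`, `permits`, and `load_user` are hypothetical names, and module privacy is only a stand-in for real unforgeability, which would normally need cryptographic or runtime support.

```rust
mod capability {
    use std::collections::HashSet;

    /// Actions a capability may grant (hypothetical names).
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
    pub enum Action {
        Load,
        Store,
    }

    /// The fields are private, so code outside this module cannot build or
    /// alter a Capability. It can only be obtained from `grant` and then
    /// cloned and passed around.
    #[derive(Debug, Clone)]
    pub struct Capability {
        resource: String,
        actions: HashSet<Action>,
    }

    /// Hypothetical issuer: the only place a Capability is created.
    pub fn grant(resource: &str, actions: &[Action]) -> Capability {
        Capability {
            resource: resource.to_string(),
            actions: actions.iter().copied().collect(),
        }
    }

    impl Capability {
        /// True if this capability covers `action` on `resource`.
        pub fn permits(&self, resource: &str, action: Action) -> bool {
            self.resource == resource && self.actions.contains(&action)
        }
    }
}

use capability::{grant, Action, Capability};

/// The callee checks the capability it was handed. There is no role to
/// assume and no external authorization service to consult.
fn load_user(cap: &Capability, resource: &str) -> Result<&'static str, &'static str> {
    if cap.permits(resource, Action::Load) {
        Ok("loaded")
    } else {
        Err("denied")
    }
}
```

A caller would obtain a capability once, for example `grant("users", &[Action::Load])`, and could hand a clone of it to any other component that needs the same access.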

In our ECDAC model, an entity is composed of a unique ID, a shape describing any necessary schema, and a set of capabilities defining how data described by this entity can be accessed. This inversion of control -- keeping access control with the data rather than the service -- is made possible by using WebAssembly modules to implement our Entities!

The implementation details are still being worked out, but the central idea is that our data Capability is compiled as an executable WASM module, which is an Entity insofar as it implements the Entity ABI. An Entity is a Capability as a WASM module.

A service defines an Entity as a kind of dependency. This means the service possesses an Entity, which in turn means possessing a Capability. An AssemblyLift guest will interact with Entities through a high-level API, which delegates to the Entity module to verify the Capability and operate on data in a particular location according to the Entity shape/schema. The Entity module is self-verifying and needs no outside processes to validate the requested action. If we want Entities to perform database operations on behalf of the guest, they will need either an IOmod or WASI environment. Otherwise the Entity will have to return to the AssemblyLift host to complete the action (I'm personally torn on what the best choice here is 🙃).
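Since the Entity ABI is not yet specified, the following is only a sketch of how that delegation could look in Rust. The `EntityAbi` trait, its methods, `StaticEntity`, and the `store` function are all assumptions made for illustration. The entity is modelled as a plain struct rather than a WASM module, and no data is actually written.

```rust
/// Data actions an Entity can authorize (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Load,
    Store,
}

/// One field of the Entity shape; only the name is modelled here.
pub struct Field {
    pub name: String,
}

/// A capability carried by the Entity: which action is allowed at which location.
pub struct Cap {
    pub action: Action,
    pub location: String,
}

/// Stand-in for the Entity ABI. The real ABI is still being designed,
/// so these method names are assumptions.
pub trait EntityAbi {
    fn id(&self) -> &str;
    fn shape(&self) -> &[Field];
    fn verify(&self, action: Action, location: &str) -> bool;
}

/// An Entity as described above: an ID, a shape, and a set of capabilities.
pub struct StaticEntity {
    pub id: String,
    pub shape: Vec<Field>,
    pub caps: Vec<Cap>,
}

impl EntityAbi for StaticEntity {
    fn id(&self) -> &str {
        &self.id
    }

    fn shape(&self) -> &[Field] {
        &self.shape
    }

    fn verify(&self, action: Action, location: &str) -> bool {
        self.caps
            .iter()
            .any(|c| c.action == action && c.location == location)
    }
}

/// Hypothetical high-level guest API: it asks the Entity to verify the
/// action and checks the record against the shape before any write.
pub fn store<E: EntityAbi>(
    entity: &E,
    location: &str,
    record: &[(&str, &str)],
) -> Result<(), String> {
    if !entity.verify(Action::Store, location) {
        return Err(format!("{} may not be stored at {}", entity.id(), location));
    }
    for (name, _value) in record {
        if !entity.shape().iter().any(|f| f.name == *name) {
            return Err(format!("{} has no field named {}", entity.id(), name));
        }
    }
    // The actual write would happen here, via an IOmod, WASI, or the host.
    Ok(())
}
```

The point of the sketch is that both the access check and the schema check live with the Entity, so every function that depends on it gets the same rules.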

An example Entity definition for an AssemblyLift project might look like the following:

[entity]
name = "user"

[[fields]]
name = "name"
type = "string"
attributes = ["searchable"] # Optional

[[fields]]
name = "email"
type = "string"
attributes = ["unique", "primary"] # Optional

[[actions]]
action = "load"
[actions.cap]
location = "asml/service/service-name/*"
effect = "allow"

[[actions]]
action = "store"
[actions.cap]
location = "dynamodb/region/table-name"
effect = "allow"

In this example we have an Entity representing a user, which defines a name field and an email field. The Entity specifies two actions, load and store, each of which has an associated Capability. The load action specifies a location, a unique identifier for a resource which can perform that action on the Entity (i.e. the Entity says "who can retrieve me?"). Similarly, the store action specifies a location indicating where the Entity can be stored (i.e. the Entity says "who can receive me?").
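As a rough illustration of how the action rules in the definition above might be evaluated, here is a small Rust sketch. The matching semantics are assumptions, not part of any published AssemblyLift behaviour: a trailing `/*` is treated as "anything under this prefix", any other location must match exactly, nothing is allowed by default, and an explicit deny wins over an allow. The names `ActionCap`, `location_matches`, `is_allowed`, and `user_caps` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    Allow,
    Deny,
}

/// One `[[actions]]` entry from the Entity definition.
struct ActionCap {
    action: &'static str,
    location: &'static str,
    effect: Effect,
}

/// Assumed semantics: a trailing `/*` matches any path below the prefix;
/// anything else must match exactly.
fn location_matches(pattern: &str, location: &str) -> bool {
    match pattern.strip_suffix("/*") {
        Some(prefix) => location
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('/')),
        None => pattern == location,
    }
}

/// Default deny. A matching `Deny` short-circuits; otherwise at least one
/// matching `Allow` is required.
fn is_allowed(caps: &[ActionCap], action: &str, location: &str) -> bool {
    let mut allowed = false;
    let matching = caps
        .iter()
        .filter(|c| c.action == action && location_matches(c.location, location));
    for cap in matching {
        match cap.effect {
            Effect::Deny => return false,
            Effect::Allow => allowed = true,
        }
    }
    allowed
}

/// The two actions from the `user` Entity definition above.
fn user_caps() -> Vec<ActionCap> {
    vec![
        ActionCap {
            action: "load",
            location: "asml/service/service-name/*",
            effect: Effect::Allow,
        },
        ActionCap {
            action: "store",
            location: "dynamodb/region/table-name",
            effect: Effect::Allow,
        },
    ]
}
```

Under these assumptions, a function at `asml/service/service-name/get-user` could load a user, while a function in a different service could not.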

From within our function code, we can interact with an API which abstracts the low-level Entity system implementation.

let user_prototype = Entity::new("user");
match user_prototype.query_all().await {
    // ...
}

We see a number of benefits over traditional RBAC/IAM:

Reduced policy complexity: by defining access in terms of database entities, we eliminate the need to maintain unique policies for each function in an application.
Simplified access definition: data-oriented access control decouples resource access from data access, and lets us reduce the surface area of the policy language.
Improved data access transparency: "who can access this data?" is answered by the data's Entity definition.
Consistent entity schema: the shape of an Entity is stored in the module, distributing schema rules to every function which has the Entity.