<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Samrose Ahmed</title>
    <description>The latest articles on DEV Community by Samrose Ahmed (@samrose).</description>
    <link>https://dev.to/samrose</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F827026%2F5a1402de-cd4e-4bb2-9203-89c26b440b05.jpeg</url>
      <title>DEV Community: Samrose Ahmed</title>
      <link>https://dev.to/samrose</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/samrose"/>
    <language>en</language>
    <item>
      <title>How we build a regional cell-based cloud architecture on AWS</title>
      <dc:creator>Samrose Ahmed</dc:creator>
      <pubDate>Tue, 12 Apr 2022 00:00:00 +0000</pubDate>
      <link>https://dev.to/apptrail/how-we-build-a-regional-cell-based-cloud-architecture-on-aws-ggd</link>
      <guid>https://dev.to/apptrail/how-we-build-a-regional-cell-based-cloud-architecture-on-aws-ggd</guid>
      <description>&lt;p&gt;At least for me, building on the cloud gives often gives me unexpected joy. To think that I can deploy servers in Bahrain or Japan from my couch is still something I get excited about!&lt;/p&gt;

&lt;p&gt;At Apptrail, we run completely on the cloud, specifically AWS, and one of the main reasons is how easy it is to launch services in new geographical regions. We deploy our services to independent cloud regions and provide our customers with regional endpoints (e.g. &lt;code&gt;events.us-west-2.apptrail.com&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Here are some of our learnings from building a fully regional service on AWS.&lt;/p&gt;

&lt;h2&gt;What are cloud regions?&lt;/h2&gt;

&lt;p&gt;Cloud providers have a concept of regions for their cloud. These are isolated geographical regions where their physical infrastructure is hosted and where they offer cloud services. For example, at the time of this article, AWS offers 26 different cloud regions. Applications built on the cloud can leverage cloud regions to build a concept of regions into their own services.&lt;/p&gt;

&lt;h3&gt;Cellular architecture&lt;/h3&gt;

&lt;p&gt;To understand regionality, it's good to understand &lt;em&gt;cellular architecture&lt;/em&gt;. Cellular (or cell-based) architecture is a way of separating systems into isolated &lt;em&gt;cells&lt;/em&gt; to reduce the blast radius when something goes wrong. The idea is that a failure should never cross cells, and cellularization is a way of creating those independent partitions. The method of choosing a cell can be anything, from random assignment based on ID to something logical like geographical region. You can also have nested cells for further isolation (e.g. &lt;em&gt;large volume customers in the US West region&lt;/em&gt; could be a particular cell). AWS Availability Zones, for example, are sub-regional cells. All in all, cellularization is a very powerful way of designing available systems; you can &lt;a href="https://www.youtube.com/watch?v=swQbA4zub20"&gt;learn more about it here&lt;/a&gt;.&lt;/p&gt;
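&lt;p&gt;As a minimal illustration (the cell names and hash-based routing here are hypothetical, not any particular system's implementation), a cell router might deterministically assign customers to cells by hashing an ID:&lt;/p&gt;

```typescript
import { createHash } from "crypto";

// Hypothetical cells; a real system might use regions or nested partitions.
const CELLS = ["cell-1", "cell-2", "cell-3"];

// Deterministically map a customer ID to a cell by hashing it, so a
// customer always lands in the same cell and failures stay contained.
function cellFor(customerId: string): string {
  const digest = createHash("sha256").update(customerId).digest();
  const index = digest[0] % CELLS.length;
  return CELLS[index];
}

console.log(cellFor("customer-42"));
```

&lt;p&gt;Because assignment is deterministic, a failure in one cell only affects the customers hashed into it.&lt;/p&gt;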

&lt;h3&gt;Why regions exist&lt;/h3&gt;

&lt;p&gt;Regions, then, are just cells based on geographical region, and they serve several purposes. From an availability and reliability perspective, regions are at the center of a cloud's reliability strategy. In a cellular architecture, regions serve as the first and most fundamental cells. They provide logical containment: resources in one region are isolated from those in others, which helps prevent widespread outages.&lt;/p&gt;

&lt;p&gt;Regions also map to actual geography, which is itself an important part of why they exist. One benefit of this is latency: giving customers a way to ensure their workloads run in a particular region lets them run close to their own users.&lt;/p&gt;

&lt;p&gt;Another important use case is compliance and data sovereignty. Many countries are passing legislation requiring user data and other sensitive data to be physically stored in locations under their jurisdiction. Cloud regions are a way to ensure compliance with such regulations.&lt;/p&gt;

&lt;h2&gt;Multi region architectures&lt;/h2&gt;

&lt;p&gt;Because cloud providers do the hard work of making sure all their services are available in isolated regions, one benefit customers get is being able to run multi region workloads. Multi region architecture is a way to get another layer of reliability beyond multi availability zone. You can, for example, run a load balanced service across multiple regions using Route 53 for latency based routing and health check based failover. One can also use regions for data preservation and backup, e.g. with Amazon S3. Region-wide outages are pretty rare at AWS, so active multi region for services is likely overkill (it may also come with downsides like cross region data transfer costs), but everyone has their own availability requirements, and multi region architectures are a powerful tool for increased availability.&lt;/p&gt;

&lt;h2&gt;Regionalized architectures&lt;/h2&gt;

&lt;p&gt;A regionalized architecture is one where regionality is part of the interface of the service. For example, in regional architectures, there are dedicated regional endpoints that clients use for each supported region. Each regional instance of a service should be independent of the instances running in other regions.&lt;/p&gt;

&lt;p&gt;As an example, say we have a video processing API. Our clients use this API to upload, process, and retrieve videos. To regionalize this API, we can deploy it to multiple regions and offer endpoints like &lt;code&gt;region1.videoapi.example.com&lt;/code&gt; and &lt;code&gt;region2.videoapi.example.com&lt;/code&gt; for our clients to use.&lt;/p&gt;
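&lt;p&gt;A client-side sketch of what that contract looks like (the region and domain names are the hypothetical ones from the example above):&lt;/p&gt;

```typescript
// The region is part of the service's public contract, so clients embed it
// directly in the hostname they call.
const SUPPORTED_REGIONS = ["region1", "region2"];

function endpointFor(region: string): string {
  if (!SUPPORTED_REGIONS.includes(region)) {
    throw new Error(`Unsupported region: ${region}`);
  }
  return `https://${region}.videoapi.example.com`;
}

console.log(endpointFor("region1")); // → https://region1.videoapi.example.com
```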

&lt;h4&gt;Difference between multi region and regionalized architectures&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hQVHe4Zw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/regional-diagram-ef9b5ea0c5dec022647f24f931252019.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hQVHe4Zw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/regional-diagram-ef9b5ea0c5dec022647f24f931252019.png" alt="" width="880" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Comparison between regional (left) and multi region architectures&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The key difference between multi region and regionalized architectures is that in regional architectures, the region is part of the contract of the service, whereas in multi region architectures it is solely part of the implementation. In a multi region architecture, regions are used as cells for increased availability or as a failover. There is no guarantee that a specific customer's requests will run in a specific region. Rather, the customer doesn't even have to know about the concept of region vis-à-vis the service. In a regional architecture, however, the region is part of the public API, and customers choose or are informed of the region they are associated with.&lt;/p&gt;

&lt;h3&gt;Choosing regions for a regional architecture&lt;/h3&gt;

&lt;p&gt;When building a regionalized architecture, one needs to pick a list of geographical regions to support and a methodology to assign them. You're likely deployed on a cloud, so a simple decision is to choose your regions to correspond to cloud regions. This is &lt;a href="https://apptrail.com/docs/applications/guide/regions"&gt;what we do at Apptrail&lt;/a&gt;. You can rename the regions or simply reuse the cloud provider's names, as we do ourselves.&lt;/p&gt;

&lt;p&gt;However, a one-to-one mapping is not necessary. You can also come up with regions that correspond to multiple cloud regions. For example, a region structure could look as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Region&lt;/th&gt;
&lt;th&gt;AWS Region(s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;us&lt;/td&gt;
&lt;td&gt;us-east-1, us-west-2, us-east-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;eu&lt;/td&gt;
&lt;td&gt;eu-west-1, eu-west-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jp&lt;/td&gt;
&lt;td&gt;ap-northeast-1, ap-northeast-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;in&lt;/td&gt;
&lt;td&gt;ap-south-1, ap-south-2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Regionalization scheme using multi region cells&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Under such a scheme, each application region consists of multiple cloud regions, which can improve availability.&lt;/p&gt;
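&lt;p&gt;The table above can be expressed as a simple mapping that a service resolves at request time (a sketch; the region names mirror the table):&lt;/p&gt;

```typescript
// Each application region is a cell backed by several cloud regions.
const REGION_MAP: { [appRegion: string]: string[] } = {
  us: ["us-east-1", "us-west-2", "us-east-2"],
  eu: ["eu-west-1", "eu-west-2"],
  jp: ["ap-northeast-1", "ap-northeast-2"],
  in: ["ap-south-1", "ap-south-2"],
};

// Resolve the cloud regions backing an application region.
function cloudRegionsFor(appRegion: string): string[] {
  const regions = REGION_MAP[appRegion];
  if (regions === undefined) {
    throw new Error(`Unknown application region: ${appRegion}`);
  }
  return regions;
}

console.log(cloudRegionsFor("jp"));
```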

&lt;h2&gt;When to use a regional architecture&lt;/h2&gt;

&lt;p&gt;Most services likely don't need to be fully regionalized. Unlike multi-regionality, which is solely a technical engineering decision, choosing to regionalize your public APIs or services is more of a product decision. It should generally serve some need or purpose for your customers. We commonly observe the following use cases:&lt;/p&gt;

&lt;h4&gt;Data residency requirements&lt;/h4&gt;

&lt;p&gt;Regionalizing your services is a way of allowing your customers to ensure their data is stored and processed in a specific region. This is often requested for regulatory reasons.&lt;/p&gt;

&lt;h4&gt;Latency&lt;/h4&gt;

&lt;p&gt;For other applications, minimizing client latency is highly important, and keeping requests in the region closest to a customer is a way to achieve this. However, since keeping requests in region is not usually a strict requirement, complete regionalization may or may not be the right approach here: a single endpoint with latency based routing may also fulfill the requirements. The experience you want for your customers is important to consider (e.g. do you want your customers to have to think about regions?).&lt;/p&gt;

&lt;h2&gt;Choosing components to regionalize&lt;/h2&gt;

&lt;p&gt;When evaluating a regional architecture, a natural question is which components to regionalize, i.e. which services should have a publicly regional interface. There are several considerations here, and the answer goes back to why one is using a regional architecture in the first place. A useful heuristic we use is the distinction between &lt;em&gt;control planes&lt;/em&gt; and &lt;em&gt;data planes&lt;/em&gt;. Generally, control plane services shouldn't be regional, but data plane services can be. To understand why, consider the common actions in the control plane: e.g. billing, user management, or other global configuration. These often have single region dependencies and are relatively infrequent. For us, it's neither important nor desirable for these to be regionalized. On the other hand, our data plane services that process and store important data are regionalized.&lt;/p&gt;

&lt;p&gt;Note that while this is a general guideline that we ourselves have found useful, it's in no way prescriptive. There are many cases where one may make the control plane regionalized as well. In general, the reason for regionalization should be central to this decision. One should consider how regionalizing a service will help or prevent them from achieving that goal.&lt;/p&gt;

&lt;h2&gt;How we deploy to many regions&lt;/h2&gt;

&lt;p&gt;Infrastructure and deployment automation is key to being able to maintain consistency, reliability, and understandability while deploying many services across different regions. Thankfully, infrastructure as code solutions, particularly the AWS CDK, make this much easier.&lt;/p&gt;

&lt;h3&gt;Staged deployment using Waves&lt;/h3&gt;

&lt;p&gt;We use a concept of &lt;em&gt;waves&lt;/em&gt; when deploying software at Apptrail. Going back to cellular architecture, a wave is essentially a set of cells to deploy to concurrently. For example, currently, our waves are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Wave&lt;/th&gt;
&lt;th&gt;Apptrail Regions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Wave 1&lt;/td&gt;
&lt;td&gt;ap-south-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wave 2&lt;/td&gt;
&lt;td&gt;eu-west-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wave 3&lt;/td&gt;
&lt;td&gt;us-west-2, us-east-1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Our current waves at Apptrail&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Deploying to one wave at a time ensures that faulty changes don't cause global outages. Combined with bake times (adding wait times between waves to monitor for degradation) and automated rollbacks, waves can help ensure that you don't have catastrophic failures. We've found them to be very useful for staging our changes. You can &lt;a href="https://aws.amazon.com/builders-library/automating-safe-hands-off-deployments/"&gt;learn more about waves here&lt;/a&gt;.&lt;/p&gt;
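&lt;p&gt;In outline, a wave-based rollout looks something like this (a simplified sketch, not our actual deployment tooling; the bake check is stubbed out):&lt;/p&gt;

```typescript
// Waves deploy sequentially; regions within a wave can deploy concurrently.
const WAVES = [
  { name: "Wave 1", regions: ["ap-south-1"] },
  { name: "Wave 2", regions: ["eu-west-1"] },
  { name: "Wave 3", regions: ["us-west-2", "us-east-1"] },
];

const deployed: string[] = [];

// bakeIsHealthy stands in for watching alarms/metrics for a bake period.
function deployAllWaves(bakeIsHealthy: (wave: string) => boolean): void {
  for (const wave of WAVES) {
    for (const region of wave.regions) {
      deployed.push(region); // deploy the change to this region
    }
    // Bake between waves: if monitoring shows degradation, stop and roll
    // back before the change reaches later (larger) waves.
    if (!bakeIsHealthy(wave.name)) {
      throw new Error(`${wave.name} failed bake; rolling back`);
    }
  }
}

deployAllWaves(() => true);
console.log(deployed.join(", "));
```

&lt;p&gt;The point of the structure is that a faulty change is caught while it has only reached the earliest, smallest wave.&lt;/p&gt;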

&lt;h3&gt;Infrastructure as Code - CDK&lt;/h3&gt;

&lt;p&gt;Completely automating your infrastructure is essential when deploying to many regions, since you duplicate your entire application and services each time you deploy to a new region. We use the AWS CDK to help us do this. CDK is a wrapper around CloudFormation that lets you write your infrastructure as real code (e.g. we use TypeScript). It makes building reusable abstractions (called &lt;em&gt;constructs&lt;/em&gt;) as easy as writing a class or a function (&lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html"&gt;learn more about it here&lt;/a&gt;; we think it's one of the coolest things AWS offers!).&lt;/p&gt;

&lt;p&gt;The CDK also comes with useful high level abstractions out of the box, so you don't have to reinvent the wheel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mBRrI_vp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/apptrail-waves-ffbffb671a220d613cf67fdf09505129.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mBRrI_vp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/apptrail-waves-ffbffb671a220d613cf67fdf09505129.png" alt="" width="563" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A pipeline deploying one of our regional services to one of our waves&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For example, we use &lt;a href="https://aws.amazon.com/blogs/developer/cdk-pipelines-continuous-delivery-for-aws-cdk-applications"&gt;CDK Pipelines&lt;/a&gt; for deploying all of our infrastructure at Apptrail. In CDK Pipelines, stages and &lt;em&gt;waves&lt;/em&gt; are supported natively so you can easily deploy cellular applications. At Apptrail, we've developed a standard &lt;em&gt;Pipeline&lt;/em&gt; construct that sets up our standard waves and makes creating a regionalized service conforming to Apptrail regions and waves very simple.&lt;/p&gt;
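&lt;p&gt;As a hedged sketch of what wave setup looks like in CDK Pipelines (the repo name, stage class, and stage contents below are placeholders, not our real Pipeline construct; deploying this requires an AWS account, so treat it as an infrastructure-as-code fragment):&lt;/p&gt;

```typescript
import { App, Stack, Stage, StageProps } from "aws-cdk-lib";
import { CodePipeline, CodePipelineSource, ShellStep } from "aws-cdk-lib/pipelines";
import { Construct } from "constructs";

// Placeholder stage: in practice this holds the service's stacks for one region.
class MyServiceStage extends Stage {
  constructor(scope: Construct, id: string, props?: StageProps) {
    super(scope, id, props);
    // new MyServiceStack(this, "Service");
  }
}

class PipelineStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    const pipeline = new CodePipeline(this, "Pipeline", {
      synth: new ShellStep("Synth", {
        input: CodePipelineSource.gitHub("my-org/my-repo", "main"),
        commands: ["npm ci", "npx cdk synth"],
      }),
    });

    // Waves run one after another; stages inside a wave deploy in parallel.
    const wave1 = pipeline.addWave("Wave1");
    wave1.addStage(new MyServiceStage(this, "ApSouth1", { env: { region: "ap-south-1" } }));

    const wave3 = pipeline.addWave("Wave3");
    wave3.addStage(new MyServiceStage(this, "UsWest2", { env: { region: "us-west-2" } }));
    wave3.addStage(new MyServiceStage(this, "UsEast1", { env: { region: "us-east-1" } }));
  }
}

new PipelineStack(new App(), "DeploymentPipeline");
```

&lt;p&gt;A shared construct can wrap this wave setup once so every new service picks up the standard waves automatically.&lt;/p&gt;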

&lt;h2&gt;Other considerations around regional architectures&lt;/h2&gt;

&lt;p&gt;Here are some other things we've learned building regional applications on AWS.&lt;/p&gt;

&lt;h4&gt;Use separate AWS accounts per region&lt;/h4&gt;

&lt;p&gt;This is a general AWS best practice, and it reinforces thinking of each instance of a service deployed in a region as separate, with a clearly delineated blast radius.&lt;/p&gt;

&lt;h4&gt;Managing cost&lt;/h4&gt;

&lt;p&gt;Regionalization inherently comes with some additional cost. However, when an application is architected correctly, the additional cost due to regionalization should mainly be a fixed cost per region, which is generally negligible at scale. Regionalized services generally shouldn't have excessive cross-region data transfer (databases and other global data are often the exception, but this should be low cost).&lt;/p&gt;

&lt;h4&gt;Global data&lt;/h4&gt;

&lt;p&gt;A common consideration when evaluating a regionalized or multi region architecture is what to do with global data. There's no one answer here. Keeping in mind our earlier discussion on choosing components to regionalize and data planes &amp;amp; control planes, there are several ways we could deal with this. For example, we could keep global data in one control plane region, and our regionalized data plane services can depend on this control plane. This is what we use ourselves at Apptrail. However, this may or may not be acceptable. If latency is the primary motivation for regionalization, then this is less than ideal. There are several ways to deal with this, including DynamoDB Global Tables and database replication, that are beyond the scope of this article (see this &lt;a href="https://www.youtube.com/watch?v=2e29I3dA8o4"&gt;video&lt;/a&gt; for some more information). On the other hand, if the main reason for regionalization is data residency, and our control plane only stores non-sensitive configuration, then this is a more fitting approach.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;As it gets easier and easier to deploy to the cloud, and with increased concern about data sovereignty, regional architectures are becoming more common. This article gave an overview of how we maintain our regional services on AWS at Apptrail. You might not need regionalization for your next project, but I hope this was helpful and informative.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What do you think of regionalization, do you use it in your applications? Feel free to reply.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>engineering</category>
      <category>aws</category>
      <category>software</category>
    </item>
    <item>
      <title>The difference between internal and customer facing audit logs</title>
      <dc:creator>Samrose Ahmed</dc:creator>
      <pubDate>Mon, 07 Mar 2022 00:00:00 +0000</pubDate>
      <link>https://dev.to/apptrail/the-difference-between-internal-and-customer-facing-audit-logs-2p65</link>
      <guid>https://dev.to/apptrail/the-difference-between-internal-and-customer-facing-audit-logs-2p65</guid>
      <description>&lt;p&gt;As a software company, you likely store audit logs internally for debugging, security, and compliance. But can your customers access these audit logs self service?&lt;/p&gt;

&lt;p&gt;Learn about the difference between storing audit logs internally and offering your customers self service access to their own audit logs.&lt;/p&gt;

&lt;h2&gt;What are audit logs?&lt;/h2&gt;

&lt;p&gt;Audit logs are a record of activity in an application. They help answer questions like who, what, where, when, and how a specific action occurred, and what resources it affected. They are used for debugging, security, monitoring, and compliance.&lt;/p&gt;

&lt;h2&gt;Internal audit logs&lt;/h2&gt;

&lt;p&gt;A software company should store audit logs internally as a best practice. They are needed for developer debugging and for answering support questions like what happened during a specific incident. Maintaining audit logs is also required for compliance with common standards like SOC 2.&lt;/p&gt;

&lt;p&gt;Internal audit logs are generally stored in log providers like CloudWatch or Datadog, or object stores like Amazon S3. They are only accessible by employees of the software company, ideally only the security or DevOps teams. An employee should be able to query the logs to extract specific data in response to common questions like who performed a specific action.&lt;/p&gt;

&lt;p&gt;In a multitenant SaaS company, each audit log is likely associated with a specific tenant. For example, if we examine an audit log for a DeleteUser operation, the audit log will contain information about which tenant the operation was related to. However, employees generally have access to all tenants' audit logs.&lt;/p&gt;

&lt;h3&gt;How can a customer access their audit logs?&lt;/h3&gt;

&lt;p&gt;In a situation where a SaaS company is following best practices and storing audit logs for all actions in its application, how does a customer of the company access audit logs relating to their account? Because the audit logs are internal, not partitioned by tenant, and not accessible except to internal employees, the process for a customer to access their audit logs is manual. A common flow: the customer creates a support ticket, an engineer or support associate queries the internal logs, and the information is posted back to the customer in the ticket. Companies often build internal tooling to make this process easier for their employees.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kdEQutxS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0cuoeuymnxob7tykwac8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kdEQutxS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0cuoeuymnxob7tykwac8.png" alt="" width="827" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;Disadvantages of only maintaining internal audit logs&lt;/h4&gt;

&lt;p&gt;Exclusively storing audit logs internally, without offering a way for customers to easily consume those audit logs, comes with many drawbacks. The SaaS company is essentially serving as a human proxy to the internal audit logs system. First, it is reactive, meaning the customer can only request their audit logs after an event has occurred, when often they need them most while the event is occurring (or even before, to perform security monitoring). Second, the entire process is manual, which wastes both the SaaS company's and the customers' time and severely limits the number of audit logs the customer can request. Third, it prevents customers from being able to extract full value out of their audit logs, and keeps them from having visibility into activity in their account.&lt;/p&gt;

&lt;h2&gt;External, or customer facing, audit logs&lt;/h2&gt;

&lt;p&gt;External audit logs are audit logs that the customers of a SaaS product can access self service.&lt;/p&gt;

&lt;p&gt;A key first requirement here, naturally, is that each customer is only able to access their own audit logs and not any other customer's audit logs. This requires storing the audit logs so that they are partitioned by customer. It also requires building an API service that lets authenticated customers query their own logs.&lt;/p&gt;
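&lt;p&gt;A minimal in-memory sketch of that partitioning (the record shape and function names are assumed for illustration, not any real product's API): events are stored per tenant, and the query path only ever reads the authenticated caller's own partition.&lt;/p&gt;

```typescript
interface AuditEvent {
  tenantId: string;
  action: string;
  actor: string;
  timestamp: string;
}

// Partitioned store: one log stream per tenant.
const logsByTenant: { [tenantId: string]: AuditEvent[] } = {};

function record(event: AuditEvent): void {
  const partition = logsByTenant[event.tenantId] ?? [];
  partition.push(event);
  logsByTenant[event.tenantId] = partition;
}

// The API layer derives the tenant ID from the caller's credentials, so a
// tenant can only ever see its own partition.
function queryOwnLogs(authenticatedTenantId: string, action?: string): AuditEvent[] {
  const partition = logsByTenant[authenticatedTenantId] ?? [];
  if (action === undefined) {
    return partition;
  }
  return partition.filter((e) => e.action === action);
}

record({ tenantId: "acme", action: "DeleteUser", actor: "alice", timestamp: "2022-03-07T00:00:00Z" });
record({ tenantId: "globex", action: "CreateUser", actor: "bob", timestamp: "2022-03-07T00:01:00Z" });
console.log(queryOwnLogs("acme").length); // → 1
```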

&lt;p&gt;Such a system allows a SaaS customer to access their audit logs automatically, without needing to involve the SaaS company.&lt;/p&gt;

&lt;h3&gt;Advantages of self service audit logs&lt;/h3&gt;

&lt;p&gt;Being able to access audit logs self service opens many use cases for SaaS customers. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The customer is free to query and retrieve as many audit logs as they want. Ideally, they should be able to continuously export all their audit logs out of the SaaS for analytics, monitoring, or archival.&lt;/li&gt;
&lt;li&gt;Both the SaaS owner and SaaS customer do not need to expend manual effort to exchange audit logs.&lt;/li&gt;
&lt;li&gt;Customers can build proactive security monitoring systems that use audit logs for threat detection, using SIEM or other tools. This allows customers to make realtime use of their audit events.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In general, externally facing audit logs give SaaS customers full insight into their SaaS activity. This enables them to consume and use SaaS audit logs themselves for security monitoring, auditing, risk management, or any other use that requires audit events.&lt;/p&gt;

&lt;h2&gt;When should I use internal or external audit logs?&lt;/h2&gt;

&lt;p&gt;Internal and external audit logs are not mutually exclusive; rather, they are complementary.&lt;/p&gt;

&lt;p&gt;When evaluating internal versus external audit logs, the answer is fairly simple. Every company should maintain internal audit logs. This is a security best practice, is mandated by compliance standards like SOC 2, and is required for debugging and for answering questions in the case of an incident.&lt;/p&gt;

&lt;p&gt;SaaS companies should also offer their customers a way to access their own audit logs self service. This is the best way to ensure your customers can maintain a strong security posture, and it spares you from having to handle manual audit requests.&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;To summarize, SaaS companies should store audit logs internally and should also offer a system for their customers to easily access their own audit logs without a manual process. To outline the differences between internal and external audit logs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Internal audit logs&lt;/th&gt;
&lt;th&gt;External audit logs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Used by&lt;/td&gt;
&lt;td&gt;SaaS company employees&lt;/td&gt;
&lt;td&gt;SaaS customers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implementation&lt;/td&gt;
&lt;td&gt;Log providers, object stores&lt;/td&gt;
&lt;td&gt;Custom: API layer + data store or managed (&lt;a href="https://apptrail.com"&gt;Apptrail&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer access&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Self service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We can see that internal and external audit logs serve two different purposes and that internal audit logs are not a substitute for external audit logs. Internal audit logs are a baseline measure that lets the &lt;em&gt;employees&lt;/em&gt; of a SaaS company audit all API and user activity whereas external, or customer facing, audit logs allow customers to access their own audit logs.&lt;/p&gt;

&lt;p&gt;If you're an engineer or owner working on a SaaS product, consider building externally facing audit logs for your customers.&lt;/p&gt;

</description>
      <category>software</category>
      <category>logging</category>
      <category>programming</category>
      <category>security</category>
    </item>
    <item>
      <title>S3 POST Policy - The hidden S3 feature you haven't heard of</title>
      <dc:creator>Samrose Ahmed</dc:creator>
      <pubDate>Mon, 14 Feb 2022 00:00:00 +0000</pubDate>
      <link>https://dev.to/apptrail/s3-post-policy-the-hidden-s3-feature-you-havent-heard-of-k2g</link>
      <guid>https://dev.to/apptrail/s3-post-policy-the-hidden-s3-feature-you-havent-heard-of-k2g</guid>
      <description>&lt;p&gt;Say you're building an application and you need to let your users upload files to S3. How would you go about it?&lt;/p&gt;

&lt;p&gt;Particularly, imagine our clients are uploading a lot of files of different sizes, and are sensitive to latency.&lt;/p&gt;

&lt;p&gt;Let's walk through a journey of AWS APIs, and explore a little known feature of S3 called &lt;em&gt;POST Policies&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;Presigned URLs&lt;/h2&gt;

&lt;p&gt;Your immediate instinct may be to use &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html"&gt;S3 presigned URLs&lt;/a&gt;. Presigned URLs let you create a URL that you can share to allow a user to download from or upload to an S3 bucket. You create the presigned URL server side using IAM credentials that have the required S3 permissions and then share the URL to allow user actions. Clients simply use HTTP clients to connect to the URL. You can set an expiry time to ensure the access is short lived, and you can also attach an IAM policy to the presigned URL to limit the permissions the client has.&lt;/p&gt;

&lt;p&gt;All in all, presigned URLs are pretty powerful and sound like a great choice for allowing credential-less S3 actions. They have one major limitation, however: &lt;em&gt;S3 presigned upload URLs require you to know the content length beforehand.&lt;/em&gt; Because of the way presigned URLs work, using AWS4 signatures, the Content-Length is a required component when generating the presigned URL. This means we can't return one presigned URL and allow our clients to upload objects of variable size while it remains valid.&lt;/p&gt;

&lt;p&gt;As a workaround, we can have the client request a presigned URL every time they want to perform an upload. This is not necessarily a big deal, and may be perfectly suitable for many scenarios.&lt;/p&gt;

&lt;p&gt;However, this results in an additional call for every upload and may not be desirable. For example, say we want our clients to have a short lived session where they can upload a large number of objects, and latency is important.&lt;/p&gt;

&lt;h2&gt;A new API using temporary credentials&lt;/h2&gt;

&lt;p&gt;Presigned URLs don't seem to work well for our usecase, so let's try a new approach. AWS IAM offers APIs to &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html"&gt;request temporary security credentials&lt;/a&gt;. There's a few different APIs, but let's see if we can use &lt;a href="https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html"&gt;AssumeRole&lt;/a&gt; to return temporary credentials to our client.&lt;/p&gt;

&lt;p&gt;As an approach, we can call STS AssumeRole server side and return temporary IAM credentials to our client. We can use IAM &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#policies_session"&gt;session policies&lt;/a&gt; to limit the S3 permissions the client has access to. We can also use the &lt;a href="https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html#API_AssumeRole_RequestParameters"&gt;DurationSeconds&lt;/a&gt; parameter to limit the validity of the credentials, though only down to a minimum of 15 minutes.&lt;/p&gt;

&lt;p&gt;Our clients would then use the credentials and upload files using the AWS SDK. If we're offering this as a part of our API, we'd likely want to write a language native client that wraps the AWS SDK and takes care of refreshing the credentials.&lt;/p&gt;

&lt;p&gt;Security-wise, you may feel icky having to return credentials using your API, but the approach is generally sound as long as the credentials are short lived and you are giving access to authenticated callers with narrow permissions.&lt;/p&gt;

&lt;p&gt;This approach works but, besides requiring us to implement this API, comes with several downsides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Having to vend and manage raw, short-lived credentials.&lt;/li&gt;
&lt;li&gt;Our client needs to depend on heavyweight AWS SDKs, rather than a simple HTTP Client.&lt;/li&gt;
&lt;li&gt;We can't control the size of the object our client uploads. This can be a concern with untrusted or semi-trusted clients, who could upload very large files to our S3 bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Enter POST Policy&lt;/h2&gt;

&lt;p&gt;We're dissatisfied with our previous approach. Let's explore a lesser known Amazon S3 feature: &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-HTTPPOSTConstructPolicy.html"&gt;POST Policy&lt;/a&gt;. You might be thinking, &lt;em&gt;"POST?, isn't it S3 PUT object?"&lt;/em&gt;, and you're right, but Amazon actually introduced a POST API for uploading S3 objects to enable browser based S3 uploads.&lt;/p&gt;

&lt;p&gt;As a note, you'll generally hear of POST Policy in the context of browser based uploads, but nothing inherently prevents us from using it in any environment.&lt;/p&gt;

&lt;p&gt;A POST policy is essentially a JSON document that you create, sign, and return to your clients to specify what conditions are required for a successful POST object upload.&lt;/p&gt;

&lt;h4&gt;
  
  
  What does a POST policy look like?​
&lt;/h4&gt;

&lt;p&gt;POST policies are fairly powerful: you can specify the exact date time the policy expires and include conditions on properties like the &lt;em&gt;ACL&lt;/em&gt;, &lt;em&gt;Bucket&lt;/em&gt;, &lt;em&gt;Key prefix&lt;/em&gt;, and &lt;em&gt;Content length range&lt;/em&gt; (minimum and maximum). You can also use operators like "starts_with" in addition to exact matches to add dynamic logic to your policy.&lt;/p&gt;

&lt;p&gt;Let's take a look at an example POST policy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expiration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2022-02-14T13:08:46.864Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"conditions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"acl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bucket-owner-full-control"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"bucket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-bucket"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"starts-with"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stuff/clientId"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"content-length-range"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1048576&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10485760&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The policy results in the following conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The policy expires on Mon Feb 14 2022 13:08:46 UTC. After this time, any client using the policy to perform a POST upload will get a 403 error.&lt;/li&gt;
&lt;li&gt;The ACL on the object must be &lt;code&gt;Bucket owner full control&lt;/code&gt;, which ensures we, as the bucket owner, have full control of uploaded objects.&lt;/li&gt;
&lt;li&gt;We specify a specific bucket by name (&lt;code&gt;my-bucket&lt;/code&gt;) to allow uploads to.&lt;/li&gt;
&lt;li&gt;The S3 key of the uploaded object must have a specific prefix. Here, we use it to ensure our client only has permission to upload under their prefix by specifying their client ID as the key prefix.&lt;/li&gt;
&lt;li&gt;The uploaded object must be between 1MB and 10MB in size.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Signing the policy​
&lt;/h4&gt;

&lt;p&gt;After forming a POST policy, we have to &lt;em&gt;sign&lt;/em&gt; the policy using valid IAM credentials (with the requisite permissions), similar to how we sign presigned URLs. You can view the complete procedure for calculating the signature &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html#sigv4-post-signature-calc"&gt;here&lt;/a&gt;, but it essentially involves Base64 encoding the policy and signing it using AWS SigV4. Unfortunately, unlike presigned URLs, the AWS SDKs don't provide helper methods to create the POST policy. You can write it yourself or consult the few examples and community libraries out there. If you're using Java/JVM, check out &lt;a href="https://github.com/minio/minio-java/blob/master/api/src/main/java/io/minio/PostPolicy.java"&gt;Minio's implementation&lt;/a&gt; as a well maintained reference.&lt;/p&gt;
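&lt;p&gt;As a rough sketch of that procedure in Python (standard library only; per the S3 docs, the string to sign for a POST upload is simply the Base64 encoded policy document):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import base64
import hashlib
import hmac
import json

def sign_post_policy(policy, secret_key, date_stamp, region):
    """Base64 encode a POST policy and sign it with a SigV4 signing key.

    date_stamp (YYYYMMDD) and region must match the x-amz-date and
    x-amz-credential form fields. Returns (encoded_policy, signature)."""
    encoded = base64.b64encode(json.dumps(policy).encode()).decode()

    def hmac_sha256(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()

    # Standard SigV4 signing key derivation chain
    k_date = hmac_sha256(("AWS4" + secret_key).encode(), date_stamp)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, "s3")
    k_signing = hmac_sha256(k_service, "aws4_request")

    signature = hmac.new(k_signing, encoded.encode(), hashlib.sha256).hexdigest()
    return encoded, signature
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;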

&lt;h4&gt;
  
  
  Letting our clients perform POST uploads​
&lt;/h4&gt;

&lt;p&gt;Once we've created the POST policy, our clients can use it to perform S3 POST uploads. S3 POST uploads are multipart form data requests to the S3 bucket URL (e.g. &lt;code&gt;https://examplebucket.s3-us-west-2.amazonaws.com/&lt;/code&gt;) containing the fields specified in the POST policy. As a code example in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;uuid&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Add an endpoint for the client to request a POST Policy
&lt;/span&gt;&lt;span class="n"&gt;post_policy_form_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/postPolicyFormData"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;post_policy_form_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s"&gt;"x-amz-date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"20220213T233352Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"x-amz-signature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"efa9bbc&amp;lt;...&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"acl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"bucket-owner-full-control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"x-amz-security-token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;...&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"x-amz-algorithm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"AWS4-HMAC-SHA256"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"x-amz-credential"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"ASIA&amp;lt;..&amp;gt;."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"policy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"eyJleHBpcmF...&amp;lt;base64 encoded policy&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".json"&lt;/span&gt;
&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"client_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"file_content"&lt;/span&gt;
&lt;span class="n"&gt;multipart_form_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;post_policy_form_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;upload_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://examplebucket.s3-us-west-2.amazonaws.com"&lt;/span&gt;
&lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upload_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;multipart_form_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;POST policies satisfy all of our requirements.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We are using signed policies without raw credentials.&lt;/li&gt;
&lt;li&gt;Our clients can make HTTP requests without the AWS SDK.&lt;/li&gt;
&lt;li&gt;We can granularly control the expiration, permissions, and object properties.&lt;/li&gt;
&lt;li&gt;Our clients can upload objects for as long as the POST policy hasn't expired, without needing to make additional requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion​
&lt;/h2&gt;

&lt;p&gt;We took a short look at the S3 object upload landscape, and discovered a powerful feature called POST Policies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use cases
&lt;/h3&gt;

&lt;p&gt;Let's take a look at some use cases that POST policies unlock:&lt;/p&gt;

&lt;h4&gt;
  
  
  Browser based uploads​
&lt;/h4&gt;

&lt;p&gt;Particularly for large files, POST policies provide a convenient way to let your clients upload files client side, without needing to go through a server proxy. This is especially useful if you're using API Gateway or Lambda, which have payload size limits. Additionally, uploading directly to S3 can give your clients better upload speeds.&lt;/p&gt;

&lt;h4&gt;
  
  
  Short lived low-latency upload sessions​
&lt;/h4&gt;

&lt;p&gt;As we discussed, we can use POST policies to let our clients maintain a short lived, controlled session with specific permissions where they can upload many objects of variable size, without any additional latency besides S3 latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guidance​
&lt;/h3&gt;

&lt;p&gt;As a takeaway, if you are looking to incorporate S3 object uploads from your clients in your application, follow this general guidance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prefer presigned URLs. They're simpler, are already supported in the AWS SDKs and are well documented.&lt;/li&gt;
&lt;li&gt;Otherwise, use POST policies: as we discussed, when latency is important or you want form based browser uploads.&lt;/li&gt;
&lt;li&gt;Don't use the second approach we discussed (using &lt;code&gt;AssumeRole&lt;/code&gt;). You can generally achieve the equivalent using a POST policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Are you using POST policies, or do you know of an interesting use case they enable? Feel free to reply.&lt;/p&gt;

</description>
      <category>software</category>
      <category>aws</category>
      <category>serverless</category>
      <category>s3</category>
    </item>
    <item>
      <title>What makes a good audit trail?</title>
      <dc:creator>Samrose Ahmed</dc:creator>
      <pubDate>Sat, 05 Feb 2022 00:00:00 +0000</pubDate>
      <link>https://dev.to/apptrail/what-makes-a-good-audit-trail-42ff</link>
      <guid>https://dev.to/apptrail/what-makes-a-good-audit-trail-42ff</guid>
      <description>&lt;p&gt;At Apptrail, we obsess over audit trails and how to make them most valuable for our customers, so we thought we'd examine some real world examples of audit trails and see what makes some stand out from others.&lt;/p&gt;

&lt;p&gt;There are a lot of details and requirements that go into building an audit trails solution, from availability and immutability to security and delivery, but let's examine some of the features that differentiate solutions from one another.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's an audit trail anyway?​
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;If you're already familiar with audit trails, feel free to skip this section.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An audit trail is a way to record user or API activity and surface that information. Generally, an audit trail lets admins or users answer the &lt;em&gt;Who&lt;/em&gt;, &lt;em&gt;When&lt;/em&gt;, &lt;em&gt;Where&lt;/em&gt;, and &lt;em&gt;What&lt;/em&gt; of an action. Audit trails can be used to monitor suspicious activity or replay activity in the aftermath of an event.&lt;/p&gt;
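&lt;p&gt;For example, a single audit event might be recorded as a structured document like this (the field names here are illustrative, not any particular product's schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "eventName": "user.LoginSucceeded",
  "eventTime": "2022-02-05T09:14:02Z",
  "actor": { "id": "user_123", "email": "alice@example.com" },
  "sourceIpAddress": "192.0.2.10",
  "resource": "account_settings"
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;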

&lt;p&gt;We use &lt;em&gt;audit trails&lt;/em&gt;, &lt;em&gt;audit logs&lt;/em&gt;, and &lt;em&gt;audit events&lt;/em&gt; here interchangeably.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some real world examples​
&lt;/h2&gt;

&lt;p&gt;Let's take a whirl through some popular tools that offer audit logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Admin​
&lt;/h3&gt;

&lt;p&gt;Google Workspace has a mature audit logs offering. Let's see what we can do:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PnnNuGiC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/gadmin1-82647547805b5fba819d9c6619f8cece.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PnnNuGiC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/gadmin1-82647547805b5fba819d9c6619f8cece.png" alt="G Admin audit logs" width="880" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Immediately, we can see all the recent actions that took place within Google Workspace. The information contains the event name, description, time, who performed the action (&lt;em&gt;actor&lt;/em&gt;), and the IP address associated with the actor. The data is filterable and offers a basic CSV export through the UI. Google offers several audit logs, and data retention (how long the data is stored) ranges from 6 to 15 months.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stripe​
&lt;/h3&gt;

&lt;p&gt;Stripe uses logs in a few places. Let's take a look.&lt;/p&gt;

&lt;h4&gt;
  
  
  Security history​
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GEtDpOmY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/stripe1-bbc2ef7620b5aa3760c98ab8b8244714.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GEtDpOmY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/stripe1-bbc2ef7620b5aa3760c98ab8b8244714.png" alt="Stripe security history" width="880" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These events contain important actions, and similarly add information about the when, who, and where of the action. The results are accessible in the Dashboard UI, and exportable to CSV.&lt;/p&gt;

&lt;h4&gt;
  
  
  Request logs​
&lt;/h4&gt;

&lt;p&gt;Stripe also offers developer oriented request logs. These are often used for debugging but are also essentially audit logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aoo_I9lC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/stripe2-eb6db5bd9019140a58eee0e445783115.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aoo_I9lC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://apptrail.com/assets/images/stripe2-eb6db5bd9019140a58eee0e445783115.png" alt="Stripe request logs" width="880" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Stripe request logs contain full request, response, and context data for every HTTP request made to the API. They're viewable through the Stripe Dashboard UI and filterable on each dimension. Stripe request logs have a 15 month data retention period.&lt;/p&gt;

&lt;h3&gt;
  
  
  Github​
&lt;/h3&gt;

&lt;p&gt;Github offers a pretty full featured &lt;a href="https://docs.github.com/en/enterprise-cloud@latest/admin/user-management/managing-organizations-in-your-enterprise/streaming-the-audit-logs-for-organizations-in-your-enterprise-account"&gt;audit logs solution&lt;/a&gt; to its Enterprise customers. You can access Github audit logs by 1) using the web UI, 2) polling with the REST API, and 3) streaming to destinations like S3 or Splunk using their audit log streaming feature.&lt;/p&gt;

&lt;p&gt;Streaming audit logs is an important feature that a lot of audit logs solutions lack. It unlocks use cases, like exploring large amounts of data or retaining ownership over your data, that a UI or API based approach doesn't allow.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS CloudTrail​
&lt;/h3&gt;

&lt;p&gt;AWS offers audit logs for most of its services using &lt;a href="https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html"&gt;CloudTrail&lt;/a&gt;. You can query audit logs from the Console UI and using the AWS APIs. You can also deliver AWS audit logs to your S3 bucket or CloudWatch logs group. CloudTrail offers a pretty limited (heavily paginated and throttled) &lt;a href="https://docs.aws.amazon.com/awscloudtrail/latest/APIReference/API_LookupEvents.html"&gt;LookupEvents&lt;/a&gt; API to query audit data but in general nudges you towards sending audit logs to S3.&lt;/p&gt;

&lt;h3&gt;
  
  
  And more​
&lt;/h3&gt;

&lt;p&gt;There's many more software services offering audit logs. For the sake of brevity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://support.zendesk.com/hc/en-us/articles/4408828001434"&gt;Zendesk&lt;/a&gt; offers audit logs through a UI, with filters, CSV export, and a REST query API. Logs are retained for 1 year, which is not configurable.&lt;/li&gt;
&lt;li&gt;1Password offers a UI based &lt;a href="https://support.1password.com/activity-log/"&gt;activity log&lt;/a&gt; with viewable and searchable activity including event, actor, and date, but omitting context such as IP. The activity log has a non configurable retention of 6 months. Additionally, they offer a more extensive REST &lt;a href="https://blog.1password.com/introducing-events-api/"&gt;Events API&lt;/a&gt; for programmatic access and access to more audit events.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison​
&lt;/h2&gt;

&lt;p&gt;That covers a bit about audit trails, how customers use them, and what elements make them most useful to customers. Summarizing our survey:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Filterable?&lt;/th&gt;
&lt;th&gt;Exportable?&lt;/th&gt;
&lt;th&gt;API access?&lt;/th&gt;
&lt;th&gt;Streaming?&lt;/th&gt;
&lt;th&gt;Configurable retention?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;G Admin&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stripe&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1Password&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zendesk&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Github&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudTrail&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;td&gt;✔️&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We can see audit trails vary on several dimensions:&lt;/p&gt;

&lt;h4&gt;
  
  
  Self service access​
&lt;/h4&gt;

&lt;p&gt;Admins being able to access the audit logs themselves, without needing a manual request, is the first step to a customer facing audit trail. At the bare minimum, a web UI to explore audit logs should be provided.&lt;/p&gt;

&lt;h4&gt;
  
  
  Exportability​
&lt;/h4&gt;

&lt;p&gt;Viewing audit logs in a UI is good for one-off use cases, but users often want to export the data for analysis with other tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  Programmatic API access​
&lt;/h4&gt;

&lt;p&gt;Users want to be able to interact with their data programmatically, for scripting, workflows, etc. The API should allow for querying events by time and filtering on fields. These APIs are generally paginated and throttled, as there is an unbounded number of audit logs.&lt;/p&gt;
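&lt;p&gt;A client consuming such an API typically follows pagination tokens until the results are exhausted. A small sketch, where &lt;code&gt;fetch_page&lt;/code&gt; stands in for an HTTP call to a hypothetical events endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def list_all_events(fetch_page):
    """Drain a paginated audit events API.

    fetch_page(token) returns a dict with an "events" list and an optional
    "nextToken"; in practice it would wrap an HTTP request to the provider."""
    events, token = [], None
    while True:
        page = fetch_page(token)
        events.extend(page["events"])
        token = page.get("nextToken")
        if not token:
            return events
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;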

&lt;h4&gt;
  
  
  Audit log streaming​
&lt;/h4&gt;

&lt;p&gt;When there's a large volume of audit logs, a poll based API is not sufficient, and users will want to have their data pushed into tools like S3 or Splunk for data analysis.&lt;/p&gt;

&lt;h4&gt;
  
  
  Data retention​
&lt;/h4&gt;

&lt;p&gt;Audit logs must be immutable to protect their integrity, but they are usually retained for a period of time, ranging from months to years. Ideally, this should be configurable by the user. Adding audit log streaming also automatically enables this, as it gives the customer ownership of their audit data (essentially unlimited retention).&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion​
&lt;/h2&gt;

&lt;p&gt;That was a short overview of some popular tools and how they offer audit trails to their customers. We can see there are a range of different features that go into an audit logs solution, and software providers vary in what they currently offer.&lt;/p&gt;

&lt;p&gt;Do you have any examples of great audit logs? Feel free to share.&lt;/p&gt;



&lt;p&gt;Interested in adding world class audit logs to your own SaaS? &lt;a href="https://apptrail.com?utm_source=blog"&gt;&lt;strong&gt;Apptrail&lt;/strong&gt;&lt;/a&gt; is fully managed Audit trails as a Service. &lt;a href="https://dev.to/docs/applications/guide"&gt;Learn more&lt;/a&gt; or &lt;a href="https://apptrail.com/signup?utm_source=blog"&gt;Get started&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>auditlogs</category>
      <category>saas</category>
      <category>software</category>
      <category>security</category>
    </item>
  </channel>
</rss>
