Think of Hivelight as Jira for law firms.
As a productivity management system, Hivelight deals with lots of events generated by users creating tasks, moving tasks across the Kanban board, assigning them, and many more actions. All these events need to be stored, as we need to be able to provide a history of what happened, and when, for every task, comment, milestone, etc. a user has touched or seen.
We also need to be able to react to these events. For instance, when a user assigns a task to another user, we need to let the assignee know this task has been assigned to them. Or when a task is completed, we need to update the progress status of the milestone it belongs to.
Fortunately, our application is serverless and we are using the Serverless Framework to orchestrate the deployment. This makes building event-driven applications very easy.
Most of our data is stored in DynamoDB tables, and DynamoDB Streams will help us tremendously. We will also use EventBridge to create an application event bus.
I will try to explain how we deal with events by oversimplifying our setup.
Producing events
Let's imagine a user has been working on a task and marks it as done. This updates the task's status in the Task DynamoDB table, which in turn puts the old task with the old status (OLD_IMAGE), as well as the new task with the new status (NEW_IMAGE), into the DynamoDB Stream.
This means it is easy for us to determine what has changed by creating a Lambda function that ingests that DynamoDB Stream.
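With the Serverless Framework, wiring a function to that stream takes only a few lines of configuration. Here is a minimal sketch, assuming a Task table resource named TaskTable whose stream is enabled with the NEW_AND_OLD_IMAGES view type (the function name and handler path are illustrative, not our exact setup):

```yaml
taskStreamProcessor:
  handler: src/functions/taskStreamProcessor.handler
  events:
    - stream:
        type: dynamodb
        # Illustrative resource name; the table's stream must use the
        # NEW_AND_OLD_IMAGES view type to receive both images
        arn:
          Fn::GetAtt: [TaskTable, StreamArn]
        batchSize: 10
        startingPosition: LATEST
```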
Let's imagine this DynamoDB Stream record:
```js
{
  OLD_IMAGE: {
    id: 1234,
    name: "My task",
    status: "IN_PROGRESS",
    lastModifiedBy: "Alice",
    lastModifiedDate: "2023-02-21",
  },
  NEW_IMAGE: {
    id: 1234,
    name: "My task",
    status: "DONE",
    lastModifiedBy: "Bob",
    lastModifiedDate: "2023-03-29",
  }
}
```
We can create a simple algorithm that will make sense of what has changed.
Here, we can determine that the status has changed from IN_PROGRESS to DONE. We also note that the control fields have been updated. This means we can determine not only that the status was updated, but also who updated it and when.
This information is enough for us to create an event. That is what the Lambda function does: it ingests changes from the DynamoDB table and creates the corresponding events.
Using the DynamoDB Stream record example above, we could create an event that looks like this:
```js
{
  name: "task.completed",
  taskId: "1234",
  user: "Bob",
  date: "2023-03-29"
}
```
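The diffing logic inside that Lambda function might look roughly like the sketch below. It uses the simplified record shape from above (a real DynamoDB Stream record nests the images under dynamodb.OldImage and dynamodb.NewImage in DynamoDB's attribute-value format, so production code would unmarshall them first), and the status-to-event-name mapping is illustrative:

```ts
// Simplified task shape matching the example record above.
interface TaskImage {
  id: number;
  name: string;
  status: string;
  lastModifiedBy: string;
  lastModifiedDate: string;
}

// Illustrative mapping from a new status to an application event name.
const statusEvents: Record<string, string> = {
  DONE: "task.completed",
  IN_PROGRESS: "task.started",
};

// Compare the two images and, if the status changed, build an event
// enriched with who made the change and when (the control fields).
function buildEvent(oldImage: TaskImage, newImage: TaskImage) {
  if (oldImage.status === newImage.status) return undefined;

  const name = statusEvents[newImage.status];
  if (!name) return undefined;

  return {
    name,
    taskId: String(newImage.id),
    user: newImage.lastModifiedBy,
    date: newImage.lastModifiedDate,
  };
}
```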
Storing events
Before sending this event to the event bus, let's save it so we can query it later.
This is done by simply saving the event into another DynamoDB table called Event. This table has several indexes that allow us to very quickly query events by user, date, event name, etc.
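This could be as small as a single PutCommand. A sketch with AWS SDK v3 (the Event table's exact key schema is omitted here):

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Persist an application event so it can be queried later.
async function storeEvent(event: {
  name: string;
  taskId: string;
  user: string;
  date: string;
}) {
  await docClient.send(
    new PutCommand({
      TableName: "Event",
      Item: event,
    })
  );
}
```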
As a side note, I have been interested in Amazon Timestream since its private launch was first announced. It has finally become generally available, but its cost and write-operation latency made it unsuitable for our budget and technology. We are charged by the millisecond when using Lambda functions, and a 500 ms latency for every write request to Amazon Timestream doesn't cut it at our stage of application growth.
In comparison, storing and querying events in a DynamoDB table is fast, simple, and cost-effective for our workload type.
Ingesting events
Once the event is stored in the DynamoDB Event table, a second Lambda function ingests the Event table's DynamoDB Stream and publishes each event to the EventBridge event bus we named ApplicationEvents.
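Stripped down, the publishing step is a single PutEvents call. A sketch with AWS SDK v3 (in our setup, this runs inside the Lambda function consuming the Event table's stream):

```ts
import {
  EventBridgeClient,
  PutEventsCommand,
} from "@aws-sdk/client-eventbridge";

const eventBridge = new EventBridgeClient({});

// Publish an application event to our custom ApplicationEvents bus.
async function publishEvent(event: {
  name: string;
  taskId: string;
  user: string;
  date: string;
}) {
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: "ApplicationEvents",
          // Using the event name as the source lets consumers
          // pattern-match on it, as shown in the snippet below.
          Source: event.name,
          DetailType: "ApplicationEvent",
          Detail: JSON.stringify(event),
        },
      ],
    })
  );
}
```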
We can now listen to this event anywhere in our application. For instance, if we want to trigger a Lambda function when a task is completed, we can specify that in our Serverless Framework definition file (serverless.yml) like so:
```yaml
onTaskCompleted:
  handler: src/functions/onTaskCompleted.handler
  events:
    - eventBridge:
        eventBus: arn:aws:events:${aws:region}:${aws:accountId}:event-bus/ApplicationEvents
        pattern:
          source:
            - task.completed
```
Whenever a task is completed, the onTaskCompleted Lambda function will be executed with an event that looks like this:
```js
{
  version: "0",
  id: "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx",
  "detail-type": "ApplicationEvent",
  source: "task.completed",
  account: "123412341234",
  time: "2023-03-29T17:31:09Z",
  region: "us-east-1",
  resources: [],
  detail: {
    name: "task.completed",
    date: "2023-03-29",
    taskId: "1234",
    user: "Bob"
  }
}
```
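On the consuming side, the handler only needs to read detail. A minimal sketch (the downstream reactions are hypothetical):

```ts
import type { EventBridgeHandler } from "aws-lambda";

interface TaskCompletedDetail {
  name: string;
  taskId: string;
  user: string;
  date: string;
}

// Triggered by the task.completed pattern defined in serverless.yml.
export const handler: EventBridgeHandler<
  "ApplicationEvent",
  TaskCompletedDetail,
  void
> = async (event) => {
  const { taskId, user } = event.detail;

  // Hypothetical reactions: notify the assignee, update the progress
  // of the milestone the task belongs to, etc.
  console.log(`Task ${taskId} completed by ${user}`);
};
```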
Neat!
We have seen how creating, publishing, and subscribing to application events is made easy with DynamoDB Streams, Lambda, and an EventBridge event bus. This opens up so many possibilities by decoupling event producers from event consumers, allowing for reliable, cheap, and efficient cross-service, cross-application, and even cross-AWS-account communication.
But what about querying?
Querying events
We have all the events sitting in a table, and that can help us extract valuable information about user behaviour, application usage, etc.
Let's imagine a scenario where we would like to know how much task activity a user produces, as well as how many tasks have changed status per day for a given user.
We could easily get that by querying our Event table with the queries below. Note that user, date, and name are DynamoDB reserved words, hence the #placeholder attribute names.
Get all activity for a user where from and to are timestamps:
```js
{
  TableName: "Event",
  KeyConditionExpression: "#user = :user AND #date BETWEEN :from AND :to",
  ExpressionAttributeValues: {
    ":user": user,
    ":from": from,
    ":to": to
  },
  ExpressionAttributeNames: {
    "#user": "user",
    "#date": "date"
  },
  ProjectionExpression: "#date"
}
```
Get all task-related activity for a user where from and to are timestamps:
```js
{
  TableName: "Event",
  KeyConditionExpression: "#user = :user AND #date BETWEEN :from AND :to",
  ExpressionAttributeValues: {
    ":user": user,
    ":from": from,
    ":to": to,
    ":event1": "task."
  },
  ExpressionAttributeNames: {
    "#user": "user",
    "#date": "date",
    "#name": "name"
  },
  ProjectionExpression: "#date",
  FilterExpression: "begins_with(#name, :event1)"
}
```
Get how many tasks have been completed or started for a user where from and to are timestamps:
```js
{
  TableName: "Event",
  KeyConditionExpression: "#user = :user AND #date BETWEEN :from AND :to",
  ExpressionAttributeValues: {
    ":user": user,
    ":from": from,
    ":to": to,
    ":event1": "task.completed",
    ":event2": "task.started"
  },
  ExpressionAttributeNames: {
    "#user": "user",
    "#date": "date",
    "#name": "name"
  },
  ProjectionExpression: "#date",
  FilterExpression: "begins_with(#name, :event1) OR begins_with(#name, :event2)"
}
```
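Each of these query objects can be handed straight to the DocumentClient. A sketch with AWS SDK v3, assuming user and date are the partition and sort keys of the table (if they belong to a secondary index instead, an IndexName parameter would be added):

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Run the first query above for one user and a date range.
const { Items: events = [] } = await docClient.send(
  new QueryCommand({
    TableName: "Event",
    KeyConditionExpression: "#user = :user AND #date BETWEEN :from AND :to",
    ExpressionAttributeNames: { "#user": "user", "#date": "date" },
    ExpressionAttributeValues: {
      ":user": "Bob",
      ":from": "2023-03-01",
      ":to": "2023-03-31",
    },
    ProjectionExpression: "#date",
  })
);
```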
We are only retrieving the date of each event via the ProjectionExpression value, as we are not interested in the full event. We then count events per day with something like:
```js
const dailyCounts = events.reduce((acc, event) => {
  const date = event.date;
  if (!acc[date]) {
    acc[date] = 1;
  } else {
    acc[date]++;
  }
  return acc;
}, {});
```
This should give us something like:
```js
{
  ...
  "2023-03-27": 23,
  "2023-03-28": 45,
  "2023-03-29": 22,
  ...
}
```
This makes it easy to create activity visualizations, such as a GitHub-style activity calendar chart, to better understand the activity patterns and trends of a specific user or entity by querying the Event table and counting the number of occurrences per day.
For instance, the screenshot below shows all the daily activity events produced by me:
Or, showing only my log-in events:
What do you think? What are you using to record user- and system-generated events? Have you been successful in implementing Amazon Timestream? Let me hear your thoughts.