
Marko Djakovic for AWS Community Builders

Posted on • Originally published at marko.dj

Building a Cloud Native Serverless Chat On AWS

I've wanted to write about this for over a year probably, but I kept putting it off for various reasons. I built a real-time chat on AWS by (ab)using the IoT Core service, and I've been eager to share my experience. Recently, AWS announced AppSync Events, and my immediate thought was, "wow, I can replace the IoT Core solution with this!"

...a new solution for building secure and performant serverless WebSocket APIs to power real-time web and mobile experiences at any scale.

Not so fast... It turns out the initial version of AppSync Events has some significant limitations. For starters, there's no way to persist data by integrating with other AWS services. Persistence, along with many other features, is said to be on the roadmap for next year. I'm eagerly awaiting these updates, but until then, let me walk you through how to build real-time apps on AWS using this IoT Core-based approach. It's a surprisingly straightforward method that makes effective use of AWS serverless capabilities.

I'll present a cloud architecture for a real-time chat application, accompanied by useful code snippets and infrastructure written in CDK, ready to be deployed and tested in your AWS environment. Along the way, I'll explain the design choices, discuss the options available, and also highlight some caveats of this approach.

Architecture

We are building a simple group chat with some pre-defined channels that users can subscribe to in order to send and receive messages. The application will use IoT Core Topics as the backbone for real-time messaging, with a few additional AWS serverless services to support things like data persistence and authorization. Apart from the real-time segment, the app will have a supporting REST API to fetch existing messages per channel.

Figure: Serverless Chat Architecture

This architecture allows us to publish messages and subscribe to topics directly on the IoT Core service. Published messages are automatically stored in a DynamoDB table, and subscribed clients receive them in real time. As the diagram shows, there is also a Lambda authorizer, which is needed to allow actions on IoT Core. To fetch previous messages upon connecting, clients use the get messages API, which queries the DynamoDB table for messages from the requested channel. I won't go into the technicalities of this API, as it's a pretty standard serverless API and not the main focus of the article.

The chat message payload is quite simple as well:

{
  "channel": "general",
  "timestamp": "2018-11-07T00:25:00.073UTC",
  "username": "johndoe",
  "message": "hello world"
}
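As a small illustration, the payload above could be typed and checked on the client before it is rendered or published. This is only a sketch: the field names come from the example payload, while the `ChatMessage` interface and the `isChatMessage` guard are hypothetical names that are not taken from the repository.

```typescript
// Hypothetical type for the chat payload shown above. The field names come
// from the example; the interface and guard names are assumptions.
interface ChatMessage {
  channel: string;
  timestamp: string;
  username: string;
  message: string;
}

// Runtime check that an unknown value (for example a parsed MQTT payload)
// is an object carrying all four fields as strings.
function isChatMessage(value: unknown): value is ChatMessage {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return ['channel', 'timestamp', 'username', 'message'].every(
    (key) => typeof candidate[key] === 'string',
  );
}

// Parse the example payload and a deliberately incomplete one.
const incoming: unknown = JSON.parse(
  '{"channel":"general","timestamp":"2018-11-07T00:25:00.073UTC","username":"johndoe","message":"hello world"}',
);
const accepted = isChatMessage(incoming);
const rejected = isChatMessage({ channel: 'general' });
```

A guard like this only checks the shape of the message; it does not validate the timestamp format or the channel name.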

Before diving into the implementation, let's cover some basic concepts important for understanding the solution.

AWS IoT Core

AWS IoT Core is a pretty powerful and feature-rich service, and the depths of its possibilities are beyond the scope of this article. What's important for us at the moment is to understand that it's essentially an MQTT broker that enables real-time communication between connected applications, using Topics, Rules and Actions. Every AWS account has an endpoint in each region that can be used to connect to the broker, subscribe to topics, and publish messages to them.

Topics, Rules & Actions

Creating a Topic isn't necessary, as it is just a logical resource specified as a plain string at runtime. Usually, topic names are divided by slashes into logical parts, for example chat/general. One of the standout features of IoT Core is Rules, which allow you to filter, transform, and route messages to other AWS services through Actions. For instance, you can set up a rule to listen for messages published to a topic, and then trigger an action to store the message in DynamoDB or invoke a Lambda function, or do a plethora of other things listed under AWS IoT rule actions. I believe this service is often overlooked partly because of its name, and partly because of its apparent complexity, but it can definitely be effective outside the IoT realm. Let's suppose that our application publishes messages to the chat/general topic. A rule can be created in CDK to automatically store them in a DynamoDB table:

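import { IotSql, TopicRule } from '@aws-cdk/aws-iot-alpha';
import { DynamoDBv2PutItemAction } from '@aws-cdk/aws-iot-actions-alpha';

// `table` is assumed to be the DynamoDB Table construct (an ITable), not a table name string
new TopicRule(this, 'SendMessagesToDynamoDB', {
  topicRuleName: 'chat_ddb_rule',
  // Select every field of each message published to chat/general
  sql: IotSql.fromStringAsVer20160323("SELECT * FROM 'chat/general'"),
  // Write each selected message to the table as an item
  actions: [
    new DynamoDBv2PutItemAction(table),
  ]
});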

The code snippet above creates a topic rule, which uses an SQL-like syntax for filtering data from the topic, and defines a DynamoDB PutItem action. The result of the action is a message being stored in the given DynamoDB table. One condition is that the table key names must match the field names in the message payload. If they don't, the SQL statement can also rename fields, for example SELECT username AS userId, in case the table key is userId. For a deeper look at this very powerful feature, the full spec is available in the AWS IoT SQL reference.

Authorization

The IoT Core regional endpoint is readily available, and after setting up a topic rule, you might expect to be ready to start publishing messages. Not quite yet. By default, no clients are permitted to connect to the broker. IoT Core provides robust and flexible authorization methods to manage access to its resources. In this example, we use a custom Lambda authorizer, allowing you to implement any custom logic to secure your application. The only condition is that the authorizer Lambda's response must include a valid policy object. While there are numerous options for authorization, this article focuses on a simplified approach. For a deeper dive into the available options, refer to the IoT Core Authorization Guide.

To keep things straightforward, this example uses a highly permissive policy that grants full access to the client application by default. The goal is to demonstrate real-time communication and help you quickly test the setup. However, this permissive policy should never be used in a production environment. The Lambda authorizer's response is an IoTCustomAuthorizerResult object, which contains the following policy document:

[
  {
    Version: '2012-10-17',
    Statement: [
      {
        Action: 'iot:Connect',
        Effect: 'Allow',
        Resource: `arn:aws:iot:${region}:${accountId}:client/User*`
      },
      {
        Action: 'iot:Subscribe',
        Effect: 'Allow',
        Resource: `arn:aws:iot:${region}:${accountId}:topicfilter/serverlesschat/channels/*`
      },
      {
        Action: ['iot:Receive', 'iot:Publish'],
        Effect: 'Allow',
        Resource: `arn:aws:iot:${region}:${accountId}:topic/serverlesschat/channels/*`
      }
    ]
  }
]

The policy document consists of three statements:

  • allowing clients with IDs starting with User to connect to the broker
  • allowing subscriptions to all topics under serverlesschat/channels/*
  • allowing publishing and receiving of messages on the above topics
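To make the effect of the three statements more concrete, here is a rough sketch that rebuilds them as plain data and checks a few requests against them. It is a simplified stand-in for what IoT Core does, not its real evaluation logic: it only treats `*` as "any sequence of characters" and ignores everything else a real policy engine handles. The names `buildChatStatements`, `matchesResource` and `isAllowed`, as well as the region and account ID, are placeholders.

```typescript
// Sketch only: mirrors the three statements of the policy document above.
interface PolicyStatement {
  Action: string | string[];
  Effect: 'Allow' | 'Deny';
  Resource: string;
}

function buildChatStatements(region: string, accountId: string): PolicyStatement[] {
  const arn = `arn:aws:iot:${region}:${accountId}`;
  return [
    { Action: 'iot:Connect', Effect: 'Allow', Resource: `${arn}:client/User*` },
    { Action: 'iot:Subscribe', Effect: 'Allow', Resource: `${arn}:topicfilter/serverlesschat/channels/*` },
    { Action: ['iot:Receive', 'iot:Publish'], Effect: 'Allow', Resource: `${arn}:topic/serverlesschat/channels/*` },
  ];
}

// Simplified matcher: `*` matches any sequence of characters, every other
// character is compared literally.
function matchesResource(pattern: string, resource: string): boolean {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`).test(resource);
}

// True if at least one Allow statement covers both the action and the resource.
function isAllowed(statements: PolicyStatement[], action: string, resource: string): boolean {
  return statements.some((statement) => {
    const actions = Array.isArray(statement.Action) ? statement.Action : [statement.Action];
    return (
      statement.Effect === 'Allow' &&
      actions.includes(action) &&
      matchesResource(statement.Resource, resource)
    );
  });
}

// Placeholder region and account ID.
const chatStatements = buildChatStatements('eu-west-1', '123456789012');
const base = 'arn:aws:iot:eu-west-1:123456789012';
const canConnect = isAllowed(chatStatements, 'iot:Connect', `${base}:client/User123`);
const strangerCanConnect = isAllowed(chatStatements, 'iot:Connect', `${base}:client/Admin1`);
const canPublish = isAllowed(chatStatements, 'iot:Publish', `${base}:topic/serverlesschat/channels/general`);
```

Note that subscribing is checked against a topicfilter resource while publishing and receiving are checked against a topic resource, which is why the policy needs separate statements for them.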

Connecting by using an MQTT client with an appropriate client id is then quite simple:

mqtt.connect(brokerUrl, { clientId: 'User123' })
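The brokerUrl in the call above is not spelled out, so here is one way it might be put together for MQTT over WebSockets with a custom authorizer. Treat this as an assumption rather than the repository's actual code: the endpoint host and the authorizer name below are placeholders, and `buildBrokerUrl` is a hypothetical helper. It relies on passing the authorizer name in the `x-amz-customauthorizer-name` query parameter.

```typescript
// Hypothetical helper: builds a WebSocket URL that points at the account's
// IoT Core endpoint and names the custom authorizer to invoke on connect.
function buildBrokerUrl(endpoint: string, authorizerName: string): string {
  const authorizer = encodeURIComponent(authorizerName);
  return `wss://${endpoint}/mqtt?x-amz-customauthorizer-name=${authorizer}`;
}

// Placeholder endpoint and authorizer name; substitute your own values.
const exampleBrokerUrl = buildBrokerUrl(
  'example-ats.iot.eu-west-1.amazonaws.com',
  'ChatAuthorizer',
);
```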

Caveats

There are some caveats regarding the authorizer that proved tricky while developing this solution, so I'd like to highlight them specifically.

Caveat 1: Granting the IoT Core service permission to invoke the Lambda authorizer function is essential for it to work. In CDK it can be as easy as:

import { ServicePrincipal } from 'aws-cdk-lib/aws-iam';
authLambda.grantInvoke(new ServicePrincipal('iot.amazonaws.com'))

This small detail can be a real pain to troubleshoot, because there are no visible errors or logs; the clients just get rejected. So make sure it's set in case you face connection issues.

Caveat 2: As mentioned earlier, the authorizer response should be of the IoTCustomAuthorizerResult type. However, for an unknown reason, returning that object directly didn't work: the policy just wasn't applied. Only when I changed the handler's return type to a plain string and stringified the result did the policy actually apply. Refer to this piece of code in src/authorizer-handler.ts:

return JSON.stringify(response);

I am still not quite sure why, but I remember having similar issues with this in Java as well.
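For context, a handler following this workaround might look roughly like the sketch below. It is not the code from src/authorizer-handler.ts. The response fields follow the documented shape of a custom authorizer response, declared locally so the example has no dependencies, while the principal ID, the timeouts, the region, the account ID and the function names are all placeholder assumptions.

```typescript
// Local declaration of the custom authorizer response shape, so the sketch
// does not depend on any type package.
interface AuthorizerResult {
  isAuthenticated: boolean;
  principalId: string;
  disconnectAfterInSeconds: number;
  refreshAfterInSeconds: number;
  policyDocuments: object[];
}

// Builds the response and returns it as a JSON string instead of an object,
// which is the workaround described above.
function buildAuthorizerResponse(region: string, accountId: string): string {
  const response: AuthorizerResult = {
    isAuthenticated: true,
    principalId: 'chatUser', // placeholder identifier
    disconnectAfterInSeconds: 3600, // placeholder value
    refreshAfterInSeconds: 300, // placeholder value
    policyDocuments: [
      {
        Version: '2012-10-17',
        Statement: [
          {
            Action: 'iot:Connect',
            Effect: 'Allow',
            Resource: `arn:aws:iot:${region}:${accountId}:client/User*`,
          },
          // ...the Subscribe and Receive/Publish statements would follow here
        ],
      },
    ],
  };
  return JSON.stringify(response);
}

// Lambda-style entry point; region and account ID are hard-coded placeholders.
async function handler(): Promise<string> {
  return buildAuthorizerResponse('eu-west-1', '123456789012');
}
```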

If all this so far feels a bit blurry, don’t worry! The complete code for the authorizer, along with the CDK setup and an example client, is available in the GitHub repository linked in the next section.

Show Me the Code

To demonstrate the chat functionality, I’ve created a simple HTML client powered by some JavaScript. This basic client connects to the IoT Core endpoint, enables publishing and subscribing to channels, and retrieves any persisted messages from the REST API for the selected channel. Each client instance gets a random username in the format of Userxyz, where xyz is a random three-digit number. You can find it, along with the CDK code, in the GitHub repository below. Clone it, deploy it to your AWS account, and start testing right away.

https://github.com/imflamboyant/serverless-aws-chat

Deployment and usage instructions are provided in the repository's README file.
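If you'd rather write your own client, the two small pieces it needs before connecting are a client ID that the policy accepts and the topic for a channel. The sketch below shows one possible way to produce both. It assumes usernames between User100 and User999 and assumes that each channel maps to a topic under serverlesschat/channels/, as the policy suggests; `randomUsername` and `channelTopic` are hypothetical names, and the repository's client may do this differently.

```typescript
// Generates a username such as User123. The random source is injectable so
// the function can be exercised deterministically.
function randomUsername(random: () => number = Math.random): string {
  const suffix = 100 + Math.floor(random() * 900); // 100..999
  return `User${suffix}`;
}

// Maps a channel name to its topic, assuming the prefix used in the policy.
function channelTopic(channel: string): string {
  return `serverlesschat/channels/${channel}`;
}

const username = randomUsername();
const generalTopic = channelTopic('general');
```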

Of course, you're not limited to this example. You can create your own clients using any technology, such as SPA frameworks or mobile applications.

Conclusions

In this article I’ve demonstrated how easy it can be to create a real-time chat application on AWS. I have not covered pricing, but given the cheap combined costs of all the involved AWS services, you can process hundreds of thousands of messages before this solution costs you a few dollars.

What else I like about it is that the extension possibilities are huge. You could extend the API to also manage users and channels, integrate the authorizer with your preferred Identity Provider using JWKS and validate JWTs, or enhance security even more by adjusting the authorizer to check user permissions before allowing subscriptions to specific channels. The chat itself could also be enriched with features like message reactions, threads, and image sharing, or even GenAI capabilities by integrating with Bedrock.
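As a rough idea of what per-channel permissions could look like, the authorizer might generate statements only for the channels a given user may access, instead of the wildcard used earlier. This is a sketch of that idea under assumptions, not something the repository implements: `buildScopedStatements`, the channel list, the region and the account ID are all hypothetical, and where the list of allowed channels comes from is left open.

```typescript
interface ScopedStatement {
  Action: string | string[];
  Effect: 'Allow';
  Resource: string;
}

// Builds statements that let one client ID connect and use only the listed
// channels, rather than everything under serverlesschat/channels/.
function buildScopedStatements(
  region: string,
  accountId: string,
  clientId: string,
  allowedChannels: string[],
): ScopedStatement[] {
  const arn = `arn:aws:iot:${region}:${accountId}`;
  const statements: ScopedStatement[] = [
    { Action: 'iot:Connect', Effect: 'Allow', Resource: `${arn}:client/${clientId}` },
  ];
  for (const channel of allowedChannels) {
    statements.push(
      {
        Action: 'iot:Subscribe',
        Effect: 'Allow',
        Resource: `${arn}:topicfilter/serverlesschat/channels/${channel}`,
      },
      {
        Action: ['iot:Receive', 'iot:Publish'],
        Effect: 'Allow',
        Resource: `${arn}:topic/serverlesschat/channels/${channel}`,
      },
    );
  }
  return statements;
}

// Placeholder values: User123 may use only the general and random channels.
const scopedStatements = buildScopedStatements(
  'eu-west-1',
  '123456789012',
  'User123',
  ['general', 'random'],
);
```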

This architecture lays a solid foundation not only for chat systems but for any real-time communication features you might envision.

In the next article, I’ll migrate this solution to AppSync Events, explore options to handle the lack of data persistence, and compare its features with those of IoT Core.

Thank you for reading, and stay tuned for the next article!
