Sanket Barapatre

Posted on May 17, 2021

Getting Asynchronous with SQS and SNS

#aws #eventdriven #microservices

Preface

A microservices architecture is a good way to get independently deployable services that are easier to manage, scale, and develop, and that are fault tolerant.

A business functionality is generally achieved through meaningful, validated communication between multiple microservices, which in turn interact with their respective database systems to persist meaningful state.

The Why

Looking at the disadvantages of this setup, namely its bottlenecks and points of failure, we can pinpoint several areas:

A microservice failing to process a request because of outdated or faulty business logic, request validation failures, unhandled conditions, a disabled internal feature, or a failed connection to its database.
A microservice being unavailable because of restarts or re-deployments, e.g. pod restarts, or new features and enhancements being deployed.

Synchronous communication between such microservices can be dangerous when:

Any deployed microservice restarts or is re-deployed.
There is high latency or a network failure in reaching the microservice, or the microservice is generally unavailable.

In a chain of REST API calls from one microservice to the next, a failure at any link can fail the whole business functionality.

The What

To give microservices some slack, so that they can restart or take time to respond, one way is to make the architecture loosely coupled and have the services talk to each other using events rather than typical request-response HTTP calls.

HTTP calls can time out occasionally because of network failures, high latency, a microservice being reloaded, rate limits being exceeded, or some other general downtime of a microservice.

The How

A simple mechanism to achieve this is to use AWS Simple Queue Service (SQS) and Simple Notification Service (SNS) as the means of communication.
In a nutshell,

A microservice sends out a short, meaningful message to an SNS topic.
An SNS topic can have several SQS queues subscribed to it, each listened to by a microservice. Hence the fan-out of a message.
A microservice listens for incoming messages using an SQS listener module.

Such a system also makes faults easier to find, since we can monitor the complete flow of a message from its generation at the source to its final consumption by a microservice.
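
The three steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names (publish_event, unwrap_sns_body, poll_once) are hypothetical, the clients are passed in as parameters, and it assumes the SQS subscription does not use raw message delivery, so SNS wraps the payload in a notification document.

```python
import json


def publish_event(sns_client, topic_arn, event):
    """Publish one event (a dict) to an SNS topic as a JSON string."""
    response = sns_client.publish(TopicArn=topic_arn, Message=json.dumps(event))
    return response["MessageId"]


def unwrap_sns_body(body):
    """Return the event dict carried by an SQS message body.

    Assumption: without raw message delivery, SNS wraps the payload in a
    JSON notification whose "Message" field holds the published string.
    """
    parsed = json.loads(body)
    if isinstance(parsed, dict) and parsed.get("Type") == "Notification":
        return json.loads(parsed["Message"])
    return parsed


def poll_once(sqs_client, queue_url, handler):
    """Receive one batch, run handler on each event, delete on success."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    handled = 0
    for message in response.get("Messages", []):
        try:
            handler(unwrap_sns_body(message["Body"]))
        except Exception:
            # Leave the message on the queue; it becomes visible again
            # after the visibility timeout and can be retried.
            continue
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        handled += 1
    return handled
```

With boto3, the clients would typically be created with boto3.client("sns") and boto3.client("sqs"). Passing them in keeps the functions easy to exercise with a fake client.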

The Benefits

Traceability & Fault Tolerance

Traceability is ensured by using the following properties in the envelope of a message:

eventId: each message (or event) generated by a microservice has a unique id, preferably a UUID, so that the event can be uniquely identified and stored.
traceId: a business flow involves several messages, e.g. place order, cart checkout, payment processing, order placed. When a message is generated in response to an incoming message, it carries the same traceId, so the complete flow can be traced. A traceId is generated at the source and passed on to every new message generated as a result.
spanId: a spanId is similar to a traceId, except that it need not cover all messages. It is an additional safety net that spans two linked events, e.g. if a microservice consumes message A and sends out message B, both have the same spanId, so we know these messages are linked.
version: every message that we consume can have a version. In case of a breaking change in a message, we can bump the version so that it is processed differently. The version also communicates whether a change to an event is breaking, major, or minor.
context: the business context, which gives a rough idea of what the event means. For example, ORDER_PROCESSING could be the business context for the messages in which a customer selects a means of payment, the actual payment is processed, and the payment status is updated.

This is what an event can look like:

{
    "eventId": "518c9aac-b6b9-11eb-b34e-8f1bd39e6f13",
    "traceId": "57df4c56-b6b9-11eb-a0a2-f72362d4cbcb",
    "spanId": "5e9245a8-b6b9-11eb-9ced-935329a9daeb",
    "version": "1.1.0",
    "context": "ORDER_PROCESSING",
    "data": {
        "name": "customer name",
        "items": "I bought this"
    }
}
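
As a rough illustration of how such an envelope could be built and propagated, here is a small Python sketch. The helper names new_event and follow_up_event are hypothetical, random (version 4) UUIDs are an assumption, and reusing the consumed message's spanId in the follow-up is one reading of the spanId description above.

```python
import uuid


def new_event(context, data, version="1.0.0"):
    """Create the first event of a business flow with fresh ids."""
    return {
        "eventId": str(uuid.uuid4()),
        "traceId": str(uuid.uuid4()),
        "spanId": str(uuid.uuid4()),
        "version": version,
        "context": context,
        "data": data,
    }


def follow_up_event(consumed, data, context=None, version="1.0.0"):
    """Create an event produced in response to a consumed one."""
    return {
        "eventId": str(uuid.uuid4()),    # always unique per event
        "traceId": consumed["traceId"],  # same business flow
        "spanId": consumed["spanId"],    # assumption: links consumed and produced event
        "version": version,
        "context": context or consumed["context"],
        "data": data,
    }


if __name__ == "__main__":
    placed = new_event("ORDER_PROCESSING", {"name": "customer name"})
    paid = follow_up_event(placed, {"status": "PAID"})
    print(placed["traceId"] == paid["traceId"])  # both belong to one flow
```

Searching logs or stored events by traceId then returns every message of one flow, while eventId still identifies each message individually.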

Testable

Using SQS and SNS for event-driven communication helps testability, since each microservice is a black box that consumes one message and produces another. Multiple scenarios can therefore be tested by feeding in different incoming messages and checking the outgoing messages.
AWS also provides a means of sending a test message to SNS or SQS, so that we can test a deployed application as well as replay messages that may have failed.

Flexible with microservice availability
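
To make the black-box idea concrete, here is a small sketch of a handler written as a plain function, together with two scenario tests. The handler handle_payment_selected, its validation rules, and the PAYMENT_PENDING status are all hypothetical and only serve as an illustration.

```python
def handle_payment_selected(event):
    """Hypothetical handler: consume one event, return the event to emit."""
    if event["context"] != "ORDER_PROCESSING":
        raise ValueError("unexpected context: " + event["context"])
    if not event["data"].get("items"):
        raise ValueError("event carries no items")
    return {
        "traceId": event["traceId"],  # keep the flow traceable
        "version": event["version"],
        "context": event["context"],
        "data": {"name": event["data"]["name"], "status": "PAYMENT_PENDING"},
    }


def make_incoming(items):
    """Build an incoming test message with the given items."""
    return {
        "traceId": "trace-1",
        "version": "1.1.0",
        "context": "ORDER_PROCESSING",
        "data": {"name": "customer name", "items": items},
    }


def test_valid_event_produces_pending_payment():
    outgoing = handle_payment_selected(make_incoming("I bought this"))
    assert outgoing["traceId"] == "trace-1"
    assert outgoing["data"]["status"] == "PAYMENT_PENDING"


def test_event_without_items_is_rejected():
    try:
        handle_payment_selected(make_incoming(""))
    except ValueError:
        return
    raise AssertionError("expected the handler to reject the event")


if __name__ == "__main__":
    test_valid_event_produces_pending_payment()
    test_event_without_items_is_rejected()
```

Because the handler knows nothing about SQS or SNS, each scenario only needs a different incoming dictionary and no AWS access at all.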

Introducing DLQs: Dead Letter Queues

A microservice can have some downtime, and all the messages that it was supposed to consume are stored in an AWS SQS queue. Once the service is up, it can start consuming messages from where it left off and keep on working.
If a microservice takes time to load, a message that has been retried from SQS a configurable number of times without success is sent to a Dead Letter Queue (or DLQ).
This DLQ stores all failed messages, which can be pushed back to the main queue to be replayed.
Also, if a microservice is not able to process a message because a feature is currently unavailable, or because the message is broken or invalid, the message is retried and finally moved to the DLQ. There, we can later analyse the message and update how our microservice handles it.
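
The DLQ setup and the replay step could look roughly like the sketch below. The function names are hypothetical, the SQS client is passed in as a parameter, and the retry limit of 5 is an arbitrary example value. The replay copies only the message body, not its message attributes.

```python
import json


def redrive_policy(dlq_arn, max_receive_count=5):
    """Build the queue attributes that attach a DLQ to a source queue."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),  # receives before the move
    }
    return {"RedrivePolicy": json.dumps(policy)}


def attach_dlq(sqs_client, queue_url, dlq_arn, max_receive_count=5):
    """Configure the main queue to move repeatedly failing messages to the DLQ."""
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=redrive_policy(dlq_arn, max_receive_count),
    )


def replay_dlq(sqs_client, dlq_url, main_queue_url):
    """Move one batch of messages from the DLQ back to the main queue."""
    response = sqs_client.receive_message(QueueUrl=dlq_url, MaxNumberOfMessages=10)
    moved = 0
    for message in response.get("Messages", []):
        # Send first, delete second: a crash in between duplicates a
        # message instead of losing it.
        sqs_client.send_message(QueueUrl=main_queue_url, MessageBody=message["Body"])
        sqs_client.delete_message(
            QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"]
        )
        moved += 1
    return moved
```

Since a replayed message may arrive more than once, consumers are best written so that handling the same eventId twice is harmless.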

Conclusion

Asynchronous communication using AWS Simple Queue Service (SQS) and Simple Notification Service (SNS) decouples microservices and is an easy way to get an Event Driven Architecture with events as the means of communication.