Aula is the Learning Experience Platform (LXP) for higher education. We make it easy for educators to create community-first learning experiences that truly engage students.
- React, React Native and Redux.
- AWS Lambda/serverless running on Node.js, S3, SNS, SQS, etc.
- MongoDB Atlas.
Let's dive right in! 🏊‍♂️
We work with higher education institutions, a regulated sector where data privacy and security are of the utmost importance. No Aula back-end service should be able to access data from multiple institutions. Additionally, institutions should scale according to their needs in a cost-effective manner.
That is why each partner institution we work with gets their own fully isolated environment. In practical terms, this translates into a separate sub-account and Virtual Private Cloud (VPC) on AWS. Traffic goes in and out of the VPC through a NAT instance running in Docker on an EC2 machine.
Every time we sign a new partner institution we set up a VPC and all the other AWS resources we require. This is an involved and error-prone process, which is why we let machines do it!
We rely heavily on infrastructure as code and use Terraform to define and provision new environments. Terraform removes the human-error factor and makes creating new environments a far easier process.
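As a rough illustration, a per-institution environment can be captured in a reusable Terraform module. The module name, variables and CIDR range below are all hypothetical, not our actual configuration:

```hcl
# Hypothetical sketch: one module invocation per partner institution.
# Real resource names, variables and network ranges will differ.
module "institution_env" {
  source = "./modules/institution"

  institution_name = "example-university"
  aws_account_id   = var.sub_account_id # the institution's AWS sub-account
  vpc_cidr         = "10.42.0.0/16"     # isolated network per institution
}
```

Because the module encapsulates the whole environment, provisioning a new institution becomes a matter of adding one block and running `terraform apply`.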
Each use case requires a particular type of storage. As previously mentioned, every data store is isolated per institution and lives within the corresponding VPC.
- MongoDB via Atlas for our long-lived application data: posts, comments, messages, assignments, etc.
- Elasticsearch running on EC2 to power our search feature.
- S3 to store images, videos and other files users can share on the platform.
- Redis for our WebSocket server instances to communicate and deliver real-time updates to our front-ends.
At Aula, we have built our back-end around the microservices paradigm. We believe it to be the best fit for our use case because:
- We can collaborate more easily without conflicts.
- Services can auto-scale independently in a much more cost-effective way.
- Issues affecting one service can be isolated, allowing the rest of the application to function as usual.
We minimise the drawbacks of microservices with:
- Shared utility libraries to reduce boilerplate.
- Tooling to manage deployment complexity.
Most of our back-end logic runs on AWS Lambda and the Serverless framework with the Node.js runtime. This allows us to focus on the business logic rather than managing servers. We find that Lambda functions scale phenomenally well. Migrating from a Docker/EC2 model to Lambda has truly transformed Aula Engineering, dramatically improving stability and developer experience.
Let's go through the journey of two common workflows at Aula, creating a post and sending a message, and see what happens under the hood.
- The client makes an HTTP request to an API endpoint.
- API Gateway, managed by AWS, invokes the corresponding Lambda function that implements the route handler.
- The Lambda function handles authentication, runs some business logic, saves content to the database and pushes an event to our SNS event bus. This allows other services to react to the event in a non-blocking, fault-tolerant way.
- An SQS queue for live updates picks up the event and triggers a new Lambda function that forwards it to our WebSocket server.
- Our WebSocket server runs on Fargate and communicates across instances via Redis. The new post or message event reaches the relevant users via sockets! Fargate removes a lot of the complexity of scaling a cluster of containers.
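The route handler at the heart of this flow can be sketched as follows. This is a minimal, hypothetical create-post handler (all names are illustrative, and the real handler also validates input and enforces permissions); its dependencies are injected so the business logic stays unit-testable without AWS:

```javascript
// Hypothetical sketch of a "create post" Lambda route handler.
// `authenticate`, `db` and `publish` are injected dependencies standing in
// for the auth layer, MongoDB and the SNS event bus respectively.
async function createPostHandler(event, { authenticate, db, publish }) {
  const user = await authenticate(event.headers.Authorization);
  if (!user) return { statusCode: 401, body: "Unauthorized" };

  const post = {
    ...JSON.parse(event.body),
    authorId: user.id,
    createdAt: Date.now(),
  };
  const saved = await db.posts.insertOne(post);

  // Push an event to the bus so other services (live updates, notifications,
  // search indexing) can react in a non-blocking, fault-tolerant way.
  await publish("post.created", { postId: saved.insertedId, authorId: user.id });

  return { statusCode: 201, body: JSON.stringify({ id: saved.insertedId }) };
}
```

The handler returns as soon as the event is published; everything downstream happens asynchronously via SNS and SQS.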
The Lambda, SNS, SQS pattern described above powers many other features, such as push notifications and search indexing. Dead letter queues (DLQs) capture messages that repeatedly fail to be processed, giving us great error-recovery capabilities: messages are not lost and processing can be retried.
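The queue-plus-DLQ behaviour can be illustrated with a small in-memory sketch. This mimics the pattern rather than the AWS API, and the names and retry limit are hypothetical:

```javascript
// In-memory sketch of a queue backed by a dead letter queue: a message
// that keeps failing is moved to the DLQ instead of being lost, so it
// can be inspected and reprocessed later.
function createQueue({ maxReceives = 3 } = {}) {
  const deadLetters = [];

  async function process(messages, handler) {
    for (const message of messages) {
      let attempts = 0;
      while (attempts < maxReceives) {
        try {
          await handler(message);
          break; // processed successfully
        } catch {
          attempts += 1; // SQS would make the message visible again
        }
      }
      if (attempts === maxReceives) deadLetters.push(message);
    }
  }

  return { process, deadLetters };
}
```

In production SQS tracks the receive count for us and moves messages to the configured DLQ automatically once the limit is exceeded.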
Aula is a data-driven company. We rely heavily on user analytics to make product decisions. Furthermore, educators using Aula depend on student engagement data to identify those who may be falling behind and need a bit of help.
It's essential that our analytics pipeline is reliable.
Both our clients and back-end services may generate analytics events. These events are processed by a Lambda function that uses Kinesis Firehose to aggregate them into an S3 bucket. Analytics events and our MongoDB store are synchronised into our Snowflake data warehouse, then transformed for consumption by Metabase, our Business Intelligence tool.
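For illustration, the aggregation step can be mimicked in memory. Firehose buffers incoming events and delivers them to S3 in batches, commonly as newline-delimited JSON; this hypothetical sketch batches by count only, whereas Firehose also buffers by size and time:

```javascript
// Hypothetical sketch of event aggregation: group analytics events into
// batches and serialise each batch as newline-delimited JSON, the shape
// of a typical Firehose delivery to S3.
function aggregateEvents(events, batchSize) {
  const batches = [];
  for (let i = 0; i < events.length; i += batchSize) {
    const batch = events.slice(i, i + batchSize);
    // One S3 object per batch, one JSON record per line.
    batches.push(batch.map((e) => JSON.stringify(e)).join("\n"));
  }
  return batches;
}
```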
At Aula, we love React for its one-way data flow, composability and huge community. We use React in all our client applications. On mobile, React Native gives us all the benefits of React plus access to native features when needed.
On top of being fantastic tools, this consistency keeps the mental burden on the team to a minimum, allowing us to stay flexible and work across the product with ease.
To avoid repetition when building features for web and mobile, both platforms share the entire Redux store: actions, reducers and selectors. This reuse also means fewer bugs and a more consistent experience across devices 🐛!
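A shared slice might look like the following minimal sketch, written with plain functions in the Redux style (the action, reducer and selector names are hypothetical). Only the React components around it differ per platform:

```javascript
// Hypothetical shared Redux slice, imported by both web and mobile.
const ADD_POST = "posts/add";

// Action creator shared by both clients.
const addPost = (post) => ({ type: ADD_POST, payload: post });

// Reducer: a pure function, so it runs identically on web and mobile.
function postsReducer(state = [], action) {
  switch (action.type) {
    case ADD_POST:
      return [...state, action.payload];
    default:
      return state;
  }
}

// Selectors are shared too, so both clients derive data the same way.
const selectPostsByAuthor = (state, authorId) =>
  state.filter((post) => post.authorId === authorId);
```

Because reducers and selectors are pure functions with no platform dependencies, they can be tested once and reused everywhere.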
Managing a collection of microservices, multiple front-ends and many different environments can certainly become cumbersome without the appropriate tooling. We have automated all these processes with a combination of off-the-shelf products and our own Aula CLI.
Once a PR is merged, another CircleCI job is spun up, which:
- Builds the front-ends and services that have changed.
- Creates a new monorepo version.
- Deploys the version to our staging environment.
- Runs E2E tests written in Cypress.
- Deploys the version to our internal Aula environment. We dog-food our own product 🐶!
We trigger deployments to production via the Aula CLI, which in turn spins up jobs on CircleCI.
The Aula CLI also provides tooling around other common workflows such as test user creation and individual service deployments.
Observability is paramount when maintaining a SaaS product. You should not wait until your users tell you something is not quite working before you act 😱!
All our back-end services output logs and metrics into CloudWatch. Alarms are then set based on our SLAs. For example, when the availability of a REST endpoint goes below 99.9% for 1 minute, an alarm is raised and routed to Opsgenie, our on-call management tool.
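The alarm rule can be expressed as a small predicate. CloudWatch evaluates this for us in production; the sketch below is hypothetical and uses the 99.9% threshold from the example above:

```javascript
// Hypothetical sketch of the availability check behind the alarm:
// given the requests seen in the evaluation window, alarm when the
// fraction of successful responses drops below the SLA threshold.
function shouldAlarm(requests, { slaPct = 99.9 } = {}) {
  if (requests.length === 0) return false; // no data, nothing to evaluate
  const ok = requests.filter((r) => r.statusCode < 500).length;
  const availability = (ok / requests.length) * 100;
  return availability < slaPct;
}
```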
Alerts are periodically tuned to maximise how useful they are whilst keeping noise to a minimum.
Client-side errors are tracked via Sentry.
We have a public product portal where you can peek into our roadmap and see what we're working on right now.
Do you want to join a remote and diverse team, work with exciting technologies and build a community-first platform that helps educators make learning truly engaging? We're looking for Senior Software Developers!