DEV Community

Ray Smets

AWS Best Practices at Work

The Story
Nexkey is a smarter access control system, featuring a mobile app and cloud-connected hardware to make any door smart. We allow businesses to manage access instantly, get insights into their space to streamline operations, message users, and automate workflows through a flexible API.

We take reliable and secure access control seriously. To realize our vision of a society free of physical keys, these are tenets we must uphold as an organization. Frictionless key sharing, real-time alerting, and audit logs all depend on responsive, secure, and scalable cloud infrastructure. To be truly frictionless, the experience must feel immediate to users, even as the number of devices and users under management grows.

I joined the company in October 2018, and shortly afterward I had the pleasure of leading the backend re-architecture initiative, taking calculated steps to ensure that we can best serve our growing customer base. Looking back over the last year, we have solid empirical data to show for our work, highlighted by a 31 ms average response time.

Figure 1. Our SLA report as produced via New Relic.

How it was done

  • Network re-architecture. Proper use of VPCs, dynamic firewalls, and application load balancers.
  • Database modifications. More efficient database indexing and a new hosted database service.
  • Service-oriented app decoupling.
  • Lots of code refactoring.

Network Redesigned
Although this change had one of the most dramatic effects on our scalability and response times, it was relatively easy to make. The former configuration still used the default Virtual Private Cloud (VPC) settings, and the database was not part of that VPC. We found that an AWS CloudFormation template made it very simple to stand up a new VPC with properly configured public and private subnets and their corresponding NAT gateways, which gave us the basis for a reproducible VPC. Like nearly all SaaS organizations, we currently run multi-tenant Production and Staging environments; however, creating a single-tenant VPC to meet the standards of a large enterprise customer would now be trivial.
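
To make the public/private split concrete, here is a minimal planning sketch in Python that carves one public and one private subnet per Availability Zone out of a VPC CIDR block. It is not the CloudFormation template mentioned above; the `10.0.0.0/16` range, the AZ names, the `/24` subnet size, and the `igw`/`nat-<az>` route labels are illustrative assumptions.

```python
import ipaddress


def plan_vpc(vpc_cidr, azs, new_prefix=24):
    """Sketch a VPC layout: one public and one private subnet per AZ.

    Public subnets send 0.0.0.0/0 to an internet gateway ("igw"). Each
    private subnet sends 0.0.0.0/0 to a NAT gateway in the same AZ, so its
    instances can reach out to the internet but cannot be reached from it.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    if len(blocks) < 2 * len(azs):
        raise ValueError("VPC CIDR is too small for the requested subnets")
    plan = []
    for i, az in enumerate(azs):  # first blocks go to the public tier
        plan.append({"name": f"public-{az}", "cidr": blocks[i],
                     "default_route": "igw"})
    for i, az in enumerate(azs):  # next blocks go to the private tier
        plan.append({"name": f"private-{az}", "cidr": blocks[len(azs) + i],
                     "default_route": f"nat-{az}"})
    return plan


if __name__ == "__main__":
    for subnet in plan_vpc("10.0.0.0/16", ["us-east-1a", "us-east-1b"]):
        print(subnet["name"], subnet["cidr"], "->", subnet["default_route"])
    # public-us-east-1a 10.0.0.0/24 -> igw
    # public-us-east-1b 10.0.1.0/24 -> igw
    # private-us-east-1a 10.0.2.0/24 -> nat-us-east-1a
    # private-us-east-1b 10.0.3.0/24 -> nat-us-east-1b
```

Keeping a NAT gateway per AZ, as assumed here, means private subnets in one zone keep outbound access if another zone's NAT gateway fails, at the cost of running more than one gateway.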

Database Improvements
We use MongoDB, a non-relational database, and in October 2018 we were using mLab as our hosted database service. One highlight of the now-defunct service was its auto-indexing, which we used to our advantage because it was extremely simple yet powerful. When mLab was acquired and its service slated for decommissioning, we opted to switch to ScaleGrid because it lets us manage the database instances inside our own, newly created VPCs. Many competing hosted database services run in their own third-party cloud, and you must peer a connection to get the benefits of private networking. While this is viable, it did not give us the control we wanted over our data layer. ScaleGrid affords that control, along with the peace of mind of managed redundancy and observability tools to keep an eye on our data persistence layer.

Security
With the migration of our services into a properly configured VPC, we inherently have much greater control over our network. Prior to this work, all of our services and instances were exposed over public interfaces. Now everything is safeguarded behind a Web Application Firewall, load balancers, and very strict security group settings. We have also migrated nearly all of our services to a serverless AWS compute solution, ECS Fargate, which not only decreases operational complexity but also strengthens our security posture, because the underlying compute is managed, patched, and updated behind the scenes. We will never have to manually apply a server security patch again. Furthermore, Firecracker, which manages the microVMs under the ECS Fargate hood, allows for extremely fast spin-up times, further bolstering our on-demand scalability.
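
As a rough illustration of what very strict security group settings can look like, the toy Python model below chains three hypothetical groups: only the load balancer is reachable from the internet, the app tier accepts traffic only from the load balancer's group, and the database accepts traffic only from the app tier. It is not the EC2 API; the group names and ports are assumptions, and the model ignores that real security groups are stateful (return traffic is allowed automatically).

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # either a CIDR block or the name of another security group


@dataclass
class SecurityGroup:
    name: str
    inbound: list = field(default_factory=list)


def allows(group, port, src_ip, src_groups):
    """Return True if any inbound rule admits the traffic; otherwise deny."""
    for rule in group.inbound:
        if rule.port != port:
            continue
        if rule.source in src_groups:  # sender is a member of a referenced group
            return True
        if "/" in rule.source and ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.source):
            return True
    return False  # security groups hold only allow rules, so the default is deny


# Hypothetical three-tier chain: internet -> ALB -> app -> database.
alb_sg = SecurityGroup("alb-sg", [Rule(443, "0.0.0.0/0")])
app_sg = SecurityGroup("app-sg", [Rule(8080, "alb-sg")])
db_sg = SecurityGroup("db-sg", [Rule(27017, "app-sg")])

if __name__ == "__main__":
    print(allows(alb_sg, 443, "203.0.113.7", set()))       # True: internet to ALB
    print(allows(app_sg, 8080, "203.0.113.7", set()))      # False: internet to app
    print(allows(db_sg, 27017, "10.0.2.15", {"app-sg"}))   # True: app to database
```

Referencing groups instead of IP ranges keeps the rules valid as Fargate tasks come and go with new private IPs.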

Taking advantage of the new application load balancers from the VPC migration, we were able to adopt off-the-shelf Web Application Firewall configurations that include rules against botnets, network scanners, and known blacklisted IPs. This lets us stop noisy and potentially malicious traffic at the gate, before it ever reaches our backend services.
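
A web application firewall in front of the load balancer decides whether each request is forwarded at all. The Python sketch below only mimics the idea behind two such checks, an IP reputation blocklist and a crude scanner user-agent match. The CIDR ranges (taken from the reserved documentation ranges), the user-agent markers, and the `inspect` function are hypothetical; this is not how AWS WAF rules are written or configured.

```python
import ipaddress

# Hypothetical reputation list; a managed rule group would supply and update this.
BLOCKED_NETWORKS = [ipaddress.ip_network(c) for c in ("198.51.100.0/24", "203.0.113.66/32")]
# Hypothetical markers of common network-scanner user agents.
SCANNER_MARKERS = ("masscan", "zgrab", "nikto")


def inspect(src_ip, user_agent):
    """Return "BLOCK" or "ALLOW" before the request reaches any backend."""
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return "BLOCK"
    ua = user_agent.lower()
    if any(marker in ua for marker in SCANNER_MARKERS):
        return "BLOCK"
    return "ALLOW"


if __name__ == "__main__":
    print(inspect("198.51.100.5", "Mozilla/5.0"))  # BLOCK (reputation list)
    print(inspect("192.0.2.1", "Nikto/2.1.6"))     # BLOCK (scanner user agent)
    print(inspect("192.0.2.1", "Mozilla/5.0"))     # ALLOW
```

In practice the value of managed rules is that the lists are maintained for you; the sketch only shows that the decision is made before any application code runs.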

It is also worth noting that our mobile apps implement certificate pinning as an additional measure to ensure the authenticity of the network traffic between our servers and our mobile applications.
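
Certificate pinning itself lives in the mobile apps, but the core check is small enough to sketch in Python: hash the certificate the server presents and accept the connection only if the digest matches one of a small set of pins shipped with the app. This is an illustrative sketch, not Nexkey's implementation. It pins the whole DER-encoded certificate, whereas many apps pin the hash of the public key (SPKI) instead so that a renewed certificate with the same key still validates; the placeholder byte strings stand in for real certificates.

```python
import base64
import hashlib


def pin_for(der_cert):
    """Base64-encoded SHA-256 digest of a DER-encoded certificate."""
    return base64.b64encode(hashlib.sha256(der_cert).digest()).decode("ascii")


def is_pinned(der_cert, pins):
    """Accept the server only if its certificate matches a shipped pin."""
    return pin_for(der_cert) in pins


if __name__ == "__main__":
    current = b"placeholder-current-cert"
    backup = b"placeholder-backup-cert"
    pins = {pin_for(current), pin_for(backup)}  # a backup pin eases rotation
    print(is_pinned(current, pins))             # True
    print(is_pinned(b"attacker-cert", pins))    # False
```

In Python, the DER bytes of a live connection's certificate are available from `ssl.SSLSocket.getpeercert(binary_form=True)`; on iOS and Android the same comparison is done through the platform's TLS or networking APIs.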

Service-Oriented Decoupling
While we still have a mostly monolithic application, some key operations have been decoupled so they can scale elastically. One of the most notable is the WebSocket service, whose decoupling halved our response time. Using the new application load balancers from the VPC migration together with our new ECS service configurations, we simply applied load balancer listener rules to divert traffic to separate target groups, which scale independently of one another.
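
An Application Load Balancer listener evaluates its rules in priority order (lowest number first) and forwards a matching request to that rule's target group, falling back to a default action. The Python sketch below models that behavior for path-based routing. The `/ws/*` pattern and the `websocket-tg`/`app-tg` target group names are hypothetical, since the post doesn't say which conditions were used, and `fnmatchcase` only approximates ALB wildcard matching (it also treats `[...]` as a character class).

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules: (priority, path pattern, target group).
RULES = [
    (20, "/api/*", "app-tg"),
    (10, "/ws/*", "websocket-tg"),
]
DEFAULT_TARGET_GROUP = "app-tg"


def route(path):
    """Return the target group for a path, trying the lowest priority number first."""
    for _priority, pattern, target_group in sorted(RULES):
        if fnmatchcase(path, pattern):
            return target_group
    return DEFAULT_TARGET_GROUP


if __name__ == "__main__":
    print(route("/ws/connect"))  # websocket-tg
    print(route("/api/doors"))   # app-tg
    print(route("/"))            # app-tg (default action)
```

In a setup like this, each target group is registered to its own ECS service, so the WebSocket tasks can be scaled on their own metrics without resizing the main app.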

So that the formerly tightly coupled WebSocket service and our main app can still communicate with each other, we use Redis as a fast, distributed message queue.
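
The post doesn't say whether this uses Redis pub/sub or list-based queues. The sketch below assumes pub/sub and swaps in a tiny in-memory broker so it runs without a server; with the real redis-py client, publishing would be `redis.Redis().publish(channel, message)` and the subscriber side would use `pubsub()` and `subscribe()`. The `site:<id>` channel scheme, the `door.unlocked` event, and the class and function names are hypothetical.

```python
import json
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for Redis pub/sub so the sketch runs without a Redis server."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)
        # Like Redis PUBLISH, report how many subscribers received the message.
        return len(self.subscribers[channel])


def publish_door_event(broker, site_id, door_id):
    """Main app side: emit a hypothetical 'door.unlocked' event for a site."""
    payload = json.dumps({"type": "door.unlocked", "door_id": door_id})
    return broker.publish(f"site:{site_id}", payload)


class WebSocketService:
    """WebSocket side: subscribe to a site's channel and push events to clients."""

    def __init__(self, broker, site_id):
        self.pushed = []  # stands in for frames written to connected sockets
        broker.subscribe(f"site:{site_id}", self.on_message)

    def on_message(self, raw):
        self.pushed.append(json.loads(raw))


if __name__ == "__main__":
    broker = InMemoryBroker()
    ws = WebSocketService(broker, "hq")
    print(publish_door_event(broker, "hq", "front-door"))    # 1
    print(ws.pushed)  # [{'type': 'door.unlocked', 'door_id': 'front-door'}]
    print(publish_door_event(broker, "annex", "side-door"))  # 0: nobody listening, message dropped
```

Pub/sub suits fan-out to many WebSocket tasks, but it is fire-and-forget: as the last line shows, a message published while nobody is subscribed is simply lost, which is where a list- or stream-based queue would be the safer choice.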

Conclusion
As a result of this work, we are now 11x faster while serving 5x more traffic to our web portal and mobile apps, and we have far more effective security policies in place. We attribute this to leveraging the vast range of resources that AWS provides and taking the time to implement many operational best practices, and we have seen the benefits firsthand. Furthermore, by moving to serverless compute infrastructure, we benefit both from industry-driven cost reductions and from infrastructure that is easy to manage and scale. It’s a fun time over here! We have lots of exciting, well-engineered innovation happening on the hardware and mobile app fronts as well. Thanks for reading, and please follow for updates.

Ray Smets, Nexkey Lead Backend Engineer
