<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raphael Chinenye</title>
    <description>The latest articles on DEV Community by Raphael Chinenye (@rapixar_dev).</description>
    <link>https://dev.to/rapixar_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F398630%2F9352490c-9754-4f5b-92ca-762926471fcd.jpeg</url>
      <title>DEV Community: Raphael Chinenye</title>
      <link>https://dev.to/rapixar_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rapixar_dev"/>
    <language>en</language>
    <item>
      <title>How To Provision AWS ECS for Deploying Microservices</title>
      <dc:creator>Raphael Chinenye</dc:creator>
      <pubDate>Fri, 27 Aug 2021 10:20:59 +0000</pubDate>
      <link>https://dev.to/rapixar_dev/how-to-provision-aws-ecs-for-deploying-microservices-nin</link>
      <guid>https://dev.to/rapixar_dev/how-to-provision-aws-ecs-for-deploying-microservices-nin</guid>
      <description>&lt;p&gt;The process of deploying applications on the AWS Elastic Container Service (ECS) Fargate, which is a Managed serverless container orchestration service provided by AWS, has a couple of requirements that should be covered before the process is carried out. The applications/services should conform to the Microservice architecture by including the necessary bits that qualifies an application to thrive in the Microservice architecture, these include Containerization, having a Logging mechanism, a status check mechanism and should avoid over-ramped features. Having these requirements will not only enhance application development but will also help to ensure that applications and services are running as expected. These considerations should be incorporated in the design and implementation process of the applications/services. &lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;p&gt;Once the application/services are ready to be deployed to the ECS environment, there are several touch points that should be considered before proceeding with the setup; the required supporting services are enumerated in the How-To Article referenced on this aspect. Before setting up the cloud or on-premise infrastructure the application will need, a clearly defined set of requirements should be obtained from the development team, stating the different resources the application needs to run. The infrastructure and architecture teams should then analyze those requirements and advise on the best choices after carefully considering the costs and implications of using those resources. Once there is a general consensus, the infrastructure is set up based on that agreement. These requirements may include, but are not limited to, databases, cache services, messaging services such as ActiveMQ, Kafka and AWS SQS, serverless platforms for batch jobs, and machine learning services such as AWS Rekognition; they can also be seen in the How-To Article above. These services should mostly be set up before deploying the services/applications that will integrate with them.&lt;br&gt;
An AWS account with access to the following basic services is required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ECS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CodePipeline&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CodeBuild&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CloudWatch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EC2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;S3&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ECR&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;VPC&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following should also be in place for the required services to build on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An assigned VPC for the ECS cluster and load balancer, with subnets spread across multiple Availability Zones&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A repository on GitHub that serves as the source of the application code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Dockerfile for building the Docker image used to deploy the services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All the environment variables and application properties needed for the application to run&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A predefined region for the creation and deployment&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the requirements above are in place, the applications are ready to be deployed on the platform. Containers play a vital role in the deployment and functioning of microservices in general, and they play the same role on ECS. Before starting work on deploying the containers, however, the required services and infrastructure need to be provisioned. &lt;/p&gt;

&lt;h1&gt;
  
  
  Setting Up the Infrastructure
&lt;/h1&gt;

&lt;h2&gt;
  
  
  AWS Elastic Container Registry
&lt;/h2&gt;

&lt;p&gt;An AWS ECS cluster is created first. A cluster is a logical separation or aggregation of ECS services and tasks, and it is where cluster-level features such as Container Insights are set. After setting up the cluster, a repository should be defined where container images will be stored and pulled from when deploying. This is done via AWS Elastic Container Registry (ECR), which provides a managed, secure way to host container image repositories, public or private. Unless the company decides to provide open source images, all images should be private and secure, and they should follow the standard process when being created, which is also described in the How-To article. The ECR configuration should also enable the scan-on-push option, so that every pushed container image is scanned for vulnerabilities; images with high-severity findings should be escalated appropriately.&lt;/p&gt;
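&lt;p&gt;As an illustrative sketch (the repository name and settings below are placeholders, not values from this article), a repository with scan-on-push enabled can be described in a JSON input file and passed to the &lt;code&gt;aws ecr create-repository&lt;/code&gt; command via &lt;code&gt;--cli-input-json&lt;/code&gt;:&lt;/p&gt;

```json
{
  "repositoryName": "example-service",
  "imageTagMutability": "IMMUTABLE",
  "imageScanningConfiguration": {
    "scanOnPush": true
  },
  "encryptionConfiguration": {
    "encryptionType": "AES256"
  }
}
```

&lt;p&gt;With &lt;code&gt;scanOnPush&lt;/code&gt; set, every image pushed to the repository is scanned automatically, so vulnerable images surface before they reach a task definition.&lt;/p&gt;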

&lt;h2&gt;
  
  
  Task Definitions
&lt;/h2&gt;

&lt;p&gt;After an ECR repository is created for storing the container images of a particular service following the approved process, the ECS task definition comes next. This is where we define the structure and configuration that should be provisioned with the containers, including the environment variables, the container image URI, the task name and any other parameters that may be needed. Some parameters need to be ascertained from the development team’s requirements, such as the compute capacity needed to run the application workload, which determines how much compute to assign to a microservice. The container definitions then hold any configuration required at the container level, following the stipulated policies. The first image pushed to the repository should be a test image from a test build run with the standard cloud continuous integration service as of the time of writing, AWS CodeBuild, following the SOP and processes.&lt;/p&gt;
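&lt;p&gt;As a hedged sketch (the family name, account ID, region, ports and environment values are placeholders), a minimal Fargate task definition registered with &lt;code&gt;aws ecs register-task-definition --cli-input-json&lt;/code&gt; could look like this:&lt;/p&gt;

```json
{
  "family": "example-service",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "example-service",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/example-service:latest",
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "environment": [
        { "name": "APP_ENV", "value": "staging" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/example-service",
          "awslogs-region": "eu-west-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
```

&lt;p&gt;The &lt;code&gt;cpu&lt;/code&gt; and &lt;code&gt;memory&lt;/code&gt; values are where the compute capacity agreed with the development team is expressed, and the &lt;code&gt;environment&lt;/code&gt; list carries the variables the application needs to run.&lt;/p&gt;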

&lt;h2&gt;
  
  
  Application Load Balancer
&lt;/h2&gt;

&lt;p&gt;After the task definition is created, an application load balancer on AWS Elastic Load Balancing is configured to route traffic to the services running on the cluster. The load balancer accepts traffic on ports 80 and 443 and forwards it to the required target groups by matching the appended endpoints. These target groups point to the tasks running from the task definition.&lt;/p&gt;
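&lt;p&gt;The endpoint-matching rule described above can be sketched as an input file for &lt;code&gt;aws elbv2 create-rule --cli-input-json&lt;/code&gt;; the ARNs and the path pattern here are illustrative placeholders:&lt;/p&gt;

```json
{
  "ListenerArn": "arn:aws:elasticloadbalancing:eu-west-1:123456789012:listener/app/example-alb/abc123/def456",
  "Priority": 10,
  "Conditions": [
    { "Field": "path-pattern", "Values": ["/orders/*"] }
  ],
  "Actions": [
    {
      "Type": "forward",
      "TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/orders-tg/0123456789abcdef"
    }
  ]
}
```

&lt;p&gt;Requests whose path matches the pattern are forwarded to the named target group, which in turn points at the running ECS tasks.&lt;/p&gt;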

&lt;h2&gt;
  
  
  AWS ECS Service
&lt;/h2&gt;

&lt;p&gt;After the load balancer is configured, ECS services are created to run the tasks as a logical group that can be managed and scaled together, which allows for easier orchestration and administration. The service is configured to receive requests from the allocated load balancer and is assigned a security group for screening requests that come in to the containers. Based on the requirements, a health check grace period is set to allow the applications to start up before health checks begin, and the services are placed in the appropriate VPC and subnets. It is also in the service configuration that health checks are set, and if the console is used, the routing rule for the application load balancer is configured here as well. Depending on the environment and requirements, service auto scaling may be configured based on known behaviors and metrics, with the option of setting the scaling trigger to track memory, CPU or network activity. This is a vital consideration during application development too, as it shapes how the auto scaling configuration will be designed. &lt;/p&gt;
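&lt;p&gt;Pulling the pieces above together, a hedged sketch of an input file for &lt;code&gt;aws ecs create-service --cli-input-json&lt;/code&gt; (all names, ARNs, subnet and security group IDs are placeholders) might look like this:&lt;/p&gt;

```json
{
  "cluster": "example-cluster",
  "serviceName": "example-service",
  "taskDefinition": "example-service:1",
  "desiredCount": 2,
  "launchType": "FARGATE",
  "healthCheckGracePeriodSeconds": 120,
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/orders-tg/0123456789abcdef",
      "containerName": "example-service",
      "containerPort": 8080
    }
  ],
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-0aaa1111", "subnet-0bbb2222"],
      "securityGroups": ["sg-0ccc3333"],
      "assignPublicIp": "DISABLED"
    }
  }
}
```

&lt;p&gt;Note how the grace period, the target group attachment, and the VPC placement discussed above each map to one field of the service definition.&lt;/p&gt;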

&lt;p&gt;After deploying the service, a pipeline can be set up using the build project that was used to test-run CI for ECR. It is also imperative to set up monitoring and observability to track key performance indicators, metrics and activities. This requires a dashboard that tracks services based on a logical aggregation of parameters: logs; metrics such as point-in-time memory consumption, CPU consumption and the number of running tasks; network activity such as the number of connections; and monitoring of non-application services as well.&lt;/p&gt;
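&lt;p&gt;As one illustrative monitoring building block (the names, threshold and SNS topic are placeholders, not prescriptions), a CloudWatch alarm on service CPU can be described as an input file for &lt;code&gt;aws cloudwatch put-metric-alarm --cli-input-json&lt;/code&gt;:&lt;/p&gt;

```json
{
  "AlarmName": "example-service-high-cpu",
  "Namespace": "AWS/ECS",
  "MetricName": "CPUUtilization",
  "Dimensions": [
    { "Name": "ClusterName", "Value": "example-cluster" },
    { "Name": "ServiceName", "Value": "example-service" }
  ],
  "Statistic": "Average",
  "Period": 60,
  "EvaluationPeriods": 5,
  "Threshold": 80,
  "ComparisonOperator": "GreaterThanThreshold",
  "AlarmActions": ["arn:aws:sns:eu-west-1:123456789012:ops-alerts"]
}
```

&lt;p&gt;The same &lt;code&gt;AWS/ECS&lt;/code&gt; metrics feeding this alarm are what a dashboard would aggregate per service.&lt;/p&gt;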

</description>
      <category>aws</category>
      <category>docker</category>
      <category>microservices</category>
    </item>
    <item>
      <title>Threading the future with Hivemind</title>
      <dc:creator>Raphael Chinenye</dc:creator>
      <pubDate>Sat, 19 Sep 2020 15:10:55 +0000</pubDate>
      <link>https://dev.to/rapixar_dev/threading-the-future-with-hivemind-35nn</link>
      <guid>https://dev.to/rapixar_dev/threading-the-future-with-hivemind-35nn</guid>
      <description>&lt;p&gt;Decentralization is a wave that’s been rocking the software world recently, from the advent of Torrent, a file transfer protocol for peer-to-peer file sharing that makes it possible to distribute data all over the internet in a decentralized manner, to the coming of the already popular cryptocurrency, a decentralized ledger managing system for coin assets owned by individuals, with torrents, large file sizes can actually be fetched off the internet for little to no cost at all and with cryptocurrency, the hassles and exhausting protocols required by banks is skipped like Mr. Fantastic seriously stepping over the Empire state building in New York.&lt;br&gt;
Decentralization not only reduces the workload on individual servers, it also distributes resources across different locations. The amount of data and compute needed to improve on existing models like GPT-3, with its 175 billion parameters, is projected to grow by thousands of folds, yet as of the time of writing, even the most accelerated hardware can only train about 0.15% of that future projection in realistic time. That’s why a group of intelligent individuals developed something that could help solve this growing issue in Machine Learning and AI.&lt;br&gt;
Ladies and Gentlemen, I give you HiveMind!&lt;br&gt;
In the Machine Learning world, breakthroughs have come from many directions, especially in the neural network space: from &lt;a href="https://arxiv.org/abs/1912.11370"&gt;bigger Convolutional Neural Networks (CNNs)&lt;/a&gt;, which keep getting better at Computer Vision, to the way &lt;a href="https://w4ngatang.github.io/static/papers/superglue.pdf"&gt;pre-trained transformers&lt;/a&gt; are rocking the Natural Language Processing world, and we haven’t even talked about the &lt;a href="https://arxiv.org/abs/2006.16668"&gt;amazing GPT-3&lt;/a&gt;, which could be the one that wrote this up ;). What all these sophisticated models have in common is that they rely on training a lot of parameters. Take a look at this below:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GhtF2jiI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/qj8jb1kv8h91v9ef50ze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GhtF2jiI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/qj8jb1kv8h91v9ef50ze.png" alt="Measure of increasing parameters Vs test error loss" width="800" height="574"&gt;&lt;/a&gt;&lt;br&gt;
The above image shows that the test loss declines as the number of parameters increases; the more the merrier, right? Not quite, because training large neural networks is far from cheap. It cost $25 million to train the previous largest language model, which could get you a sizable property in the most luxurious parts of Lagos, and even the popular GPT-3 requires about $4 million for a single training run with cloud GPUs and about $250 million for the entire cluster, DFKM. Crazy, isn’t it? It has been shown that the more parameters a model is trained on, the better it performs, but given these costs, even if Jeff Bezos decided to train larger neural networks, he’d go bankrupt before the next FIFA World Cup, let alone researchers who are not supported by mega corporations; unless, of course, we want to limit the scope of AI to a select few, which surely won’t let AI prosper. The advancements in AI are due to the growing community of enthusiastic individuals willing to take AI further and further. The seeming restrictions are not caused by obscene policies, overpriced tools or the Infinite Tsukuyomi; it is the sheer need for computational power, which can be quite expensive.&lt;br&gt;
Inspired by the overly intuitive blockchain technology, and with us, the ones not funded by mega-rich corporations, in mind, a group of highly ambitious individuals introduced the &lt;a href="https://learning-at-home.github.io/"&gt;learning@home&lt;/a&gt; project: crowdsourced training of large neural networks using a Decentralized Mixture-of-Experts.&lt;/p&gt;

&lt;p&gt;Now, what exactly is Hivemind? It’s not some apocalyptic Artificial General Intelligence (AGI) whose interconnected neurons, in the form of smaller neural networks, combine in complexity to become one heck of a megalithic intelligence, phew. According to the Learning@home documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Hivemind is a library for decentralized training of large neural networks. To meet this objective, Hivemind models use a specialized layer type: the Decentralized Mixture of Experts (DMoE)“&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sounds really cool already, right? And then it’s open sauce, I mean, open source ;). This Decentralized Mixture of Experts (DMoE) comes with a couple of huge benefits that get your fingers itching to get your hands on the library:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They allow you to train models that may be too large for a single computer&lt;/li&gt;
&lt;li&gt;They give good, quite competitive performance even when the internet connection is poor. Think BitTorrent&lt;/li&gt;
&lt;li&gt;They have no central coordinator, so there is no single point of failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Hivemind, a combination of numerous experts, termed a “Mixture of Experts”, replaces a single neural network layer with numerous smaller “experts”. Only a few of these experts are used in the computation for a single input, and they all have distinct parameters. The experts are placed on different computers by the Decentralized Mixture of Experts layer, which is responsible for discovering, distributing and managing the assignment of experts across the network by sharing their metadata over the popular &lt;a href="https://www.educative.io/edpresso/what-is-a-distributed-hash-table"&gt;Distributed Hash Table (DHT)&lt;/a&gt;. Once the peers are discovered, the output is computed by averaging the responses of the chosen experts for a particular input.&lt;br&gt;
In Hivemind, all peers in the network:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Host one or more experts, depending on their hardware&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run model training asynchronously, calling experts from other peers in the network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Discover other peers using the Distributed Hash Table&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On each forward pass, a peer first determines what “speciality” of experts is needed to process the current inputs using a small “gating function” module. Then it finds the most suitable experts from other peers in the network using the DHT protocol. Finally, it sends forward pass requests to the selected experts, collects their outputs and averages them for the final prediction. Compared to traditional architectures, the Mixture-of-Experts needs much less bandwidth as every input is only sent to a small fraction of all experts.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uuC_pRmR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/0lmda5j9utzaclpgcs39.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uuC_pRmR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/0lmda5j9utzaclpgcs39.jpg" alt="The Hvemind training process" width="800" height="355"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RB_YrHPz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/r7ld7wfrcxbpedt0v0pu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RB_YrHPz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/i/r7ld7wfrcxbpedt0v0pu.jpg" alt="Phase 2" width="800" height="461"&gt;&lt;/a&gt;&lt;br&gt;
More importantly, the decentralized Mixture-of-Experts layers are inherently fault-tolerant: if some of the chosen experts fail to respond, the model simply averages the remaining ones and calls that &lt;a href="https://jmlr.org/papers/v15/srivastava14a.html"&gt;dropout&lt;/a&gt;. In the event that all k experts fail simultaneously, a peer backtracks and finds another k experts across the DHT. Finally, since every input is likely to be processed by different experts, Hivemind peers run several &lt;a href="https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent"&gt;asynchronous training&lt;/a&gt; batches to better utilize their hardware. Pretty intuitive.&lt;/p&gt;
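&lt;p&gt;The forward pass described above can be sketched in a few lines of plain Python. This is not the real hivemind API; the expert “calls” are stand-in local functions and every name is illustrative, but it shows the top-k gating, the averaging, and the backtracking when all chosen experts fail:&lt;/p&gt;

```python
def top_k_experts(gating_scores, k):
    """Pick the k expert ids with the highest gating scores."""
    return sorted(gating_scores, key=gating_scores.get, reverse=True)[:k]

def dmoe_forward(x, experts, gating_scores, k=2):
    """Average outputs of the chosen experts, dropping any that fail.

    `experts` maps an expert id to a callable that may raise ConnectionError
    (a stand-in for an unreachable peer). Failed experts are simply dropped,
    like dropout; if all k fail, backtrack to the next-best experts, as a
    real peer would by searching the DHT again.
    """
    candidates = top_k_experts(gating_scores, len(gating_scores))
    outputs = []
    while not outputs:
        if not candidates:
            raise RuntimeError("no experts reachable")
        chosen, candidates = candidates[:k], candidates[k:]
        for expert_id in chosen:
            try:
                outputs.append(experts[expert_id](x))
            except ConnectionError:
                pass  # peer failed to respond: drop it from the average
    return sum(outputs) / len(outputs)

# Toy demo: three "experts", one of which is offline.
def offline(x):
    raise ConnectionError

experts = {"a": lambda x: x + 1, "b": offline, "c": lambda x: x * 2}
scores = {"a": 0.9, "b": 0.8, "c": 0.1}
print(dmoe_forward(10.0, experts, scores, k=2))  # expert "b" fails, so only "a" is averaged: 11.0
```

&lt;p&gt;In the real library the experts live on remote peers discovered via the DHT and the inputs and outputs are tensors, but the fault-tolerance story is the same: missing responses are just dropped from the average.&lt;/p&gt;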

&lt;p&gt;So some people, like some people I know, may after all this be wondering, &lt;strong&gt;“What is Hivemind really for?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In short, Hivemind is designed to let you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run crowdsourced deep learning projects using computational power from volunteers and other participants&lt;/li&gt;
&lt;li&gt;Train Neural Networks on multiple servers with varying compute, bandwidth and reliability &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hivemind is certainly a big leap into the future. Giving everyone a chance to explore the depth of Artificial Intelligence through deep learning is a major upgrade from the past. The creators of Hivemind are also planning a worldwide open deep learning experiment. There’s more to come ;)&lt;/p&gt;

&lt;p&gt;Hivemind is in the early alpha stage at the time of writing: the core functionality to train decentralized models is there, but the interface is still in active development. If you want to try Hivemind for yourself or contribute to its development, take a look at the &lt;a href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html"&gt;quickstart tutorial&lt;/a&gt;. Feel free to contact learning@home on &lt;a href="https://github.com/learning-at-home/hivemind/issues"&gt;GitHub&lt;/a&gt; with any questions, feedback and issues.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>blockchain</category>
    </item>
  </channel>
</rss>
