DEV Community: Vignesh C

Azure ML - Automobile Price Predictor

Vignesh C — Tue, 23 Jan 2024 20:27:47 +0000

The objective is to create and train a machine learning model that predicts automobile prices from different parameters, using Azure ML and a public data set. We will also go a step further and set up and deploy a web service.

Steps:

1) Navigate to https://studio.azureml.net/ and sign up / sign in.

2) Add new experiment using the (+) symbol at the bottom and select blank experiment.

3) Use the public Automobile price data set and drag it to the mapping area.

Click the data set and select Visualize to preview the data.

4) Search the experiment items for "Select Columns in Dataset". Drag it to the mapping area and configure it to include all columns except 'normalized-losses'.

5) Link them as below.

6) Search the experiment items for "Clean Missing Data". Drag it to the mapping area as the next step. Set the cleaning mode to remove the entire row.

7) Save and Run experiment.

8) After careful feature engineering, determine the key parameters and select only those columns from the data set. To train the model, we also feed in the price field, which is the value to be predicted.

9) Now we introduce a step to split the data. 75% of the data will be used for training the model. Introduce the step below.

10) For price prediction, we will use linear regression to train the model. Introduce the below steps.

11) Score the model

12) Evaluate the model and visualize

13) The model is now ready to be deployed. The changes below exclude the actual price field at prediction time. We will use the Scored Labels instead.

14) Now we are ready to set up the web service.

15) We will feed the input of the web service to the Score Model step. Run the experiment and deploy the web service.

16) An API key is generated for further usage of the web service.

17) Test the model with real-time data.
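
The clean, split, train, score and evaluate steps above can be sketched outside the studio in plain Python. This is only an illustration under stated assumptions: the rows are made-up (engine size, price) pairs rather than the real Automobile price data, a single feature is used, and the helper names (clean_missing, split, train_linear_regression, score, evaluate) are hypothetical, not Azure ML APIs.

```python
import random
from statistics import fmean

# Hypothetical rows of (engine_size, price); None marks a missing value.
# The price follows 4000 + 90 * engine_size plus a small deterministic wobble.
rows = []
for i in range(40):
    engine_size = 70 + 5 * i
    wobble = ((4 * i) % 11 - 5) * 100.0
    rows.append((engine_size, 4000.0 + 90.0 * engine_size + wobble))
rows.append((130, None))  # one incomplete row, to be dropped during cleaning


def clean_missing(data):
    """Step 6: remove every row that contains a missing value."""
    return [row for row in data if None not in row]


def split(data, fraction=0.75, seed=0):
    """Step 9: shuffle, then keep `fraction` of the rows for training."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def train_linear_regression(data):
    """Step 10: ordinary least squares fit of price = slope * x + intercept."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_mean, y_mean = fmean(xs), fmean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in data)
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_mean - slope * x_mean


def score(model, data):
    """Step 11: predicted prices (the 'scored labels') for each row."""
    slope, intercept = model
    return [slope * x + intercept for x, _ in data]


def evaluate(data, predictions):
    """Step 12: mean absolute error and coefficient of determination."""
    actual = [y for _, y in data]
    mae = fmean(abs(a - p) for a, p in zip(actual, predictions))
    mean_actual = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predictions))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return mae, 1 - ss_res / ss_tot


cleaned = clean_missing(rows)
train_rows, test_rows = split(cleaned)
model = train_linear_regression(train_rows)
scored_labels = score(model, test_rows)
mae, r_squared = evaluate(test_rows, scored_labels)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} MAE={mae:.2f} R^2={r_squared:.3f}")
```

The held-back 25% is never seen during training, which is why the evaluation numbers say something about how the model would behave on new cars.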

Understanding OAuth 2.0

Vignesh C — Tue, 15 Aug 2023 20:47:07 +0000

What is OAuth?

OAuth (Open Authorization) is an open standard for authorization that allows web apps to request access to third-party systems on behalf of their users without sharing any account credentials.

OAuth 2.0 is an authorization framework that delegates access and permissions between APIs and applications in a safe and reliable exchange. It is designed for use by both websites and apps.

It also allows for a greater variety of tokens, such as short-lived access tokens and long-lived refresh tokens.

Key Components of OAuth 2.0

Resource Owner - The user who owns the resources
Client - The application requesting access
Authorization Server - Issues access tokens
Resource Server - Hosts protected resources
Access Token - Grants access with specific scopes
Redirect URI - Where the user is sent after permission is granted
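
To see how these components fit together, here is a minimal sketch of the first leg of the authorization code grant: the client builds the URL that sends the resource owner to the authorization server, then reads the code that comes back on the redirect URI. The endpoint, client ID, redirect URI and scopes are made-up placeholders, and the helper names are hypothetical; the query parameter names follow the OAuth 2.0 specification.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical authorization server endpoint, for illustration only.
AUTHORIZE_ENDPOINT = "https://auth.example.com/authorize"


def build_authorization_url(client_id, redirect_uri, scopes, state):
    """URL the client sends the resource owner to, asking for consent."""
    params = {
        "response_type": "code",      # ask for an authorization code
        "client_id": client_id,       # identifies the client application
        "redirect_uri": redirect_uri, # where the user returns after consent
        "scope": " ".join(scopes),    # the access being requested
        "state": state,               # random value echoed back, guards against CSRF
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


def read_callback(callback_url, expected_state):
    """Extract the authorization code from the redirect, checking state first."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged callback")
    return query["code"][0]


state = secrets.token_urlsafe(16)
url = build_authorization_url(
    "my-client-id",
    "https://app.example.com/callback",
    ["profile", "email"],
    state,
)

# Simulated redirect back from the authorization server.
callback = f"https://app.example.com/callback?code=abc123&state={state}"
code = read_callback(callback, state)
print(url)
print(code)
```

In a real flow the client would next exchange the code for an access token at the authorization server's token endpoint; that step is left out here.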

BEST PRACTICES

Choose the Right Grant Type
Implement Secure Token Management
Consistent User Consent Mechanisms
Regular Security Reviews and Updates

Git Cheat Sheet

Vignesh C — Wed, 17 Feb 2021 17:32:40 +0000

Credits: Nina Jaeschke

Reporting in Power BI - Quick Steps

Vignesh C — Mon, 18 Jan 2021 04:12:24 +0000

Power BI is a business analytics service by Microsoft. It aims to provide interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.

Power BI Reporting

Steps:

1. Prepare Data

Importing Data (from data sources)
Transforming Data (Edit query in Query editor)
- Rename table
- Append Queries (stack the rows of two or more tables)
- Fixing Metadata
- Filter Rows (condition)
- Remove Columns
- Adding Columns
- Merge queries (inner, outer, left, right)
- Split by delimiter

Cleansing Data (Edit query in Query editor)

2. Create Data Model, Enhance

Define table relationships
- Fix data model for summarization
- Fix query and create relationships
Configure Model properties

Adding Calculated columns and Measures (DAX - Data Analysis Expressions)

Create hierarchies

3. Create Report

Report elements

Visualization elements
Static elements
Page

Format

Chart (Conditional Formatting)
Page (background)

4. Publish Report

PBIX file
Pin to Dashboard

5. Visualize Data

Filters, Slicers
Highlighting, Drilling

6. Refresh Data

Publish dataset
Refreshing data
Power BI Gateway set up - Personal
Power BI Gateway - Enterprise

Architecting with Google Kubernetes Engine

Vignesh C — Mon, 18 Jan 2021 03:42:37 +0000

Subscribe to my Youtube channel

Kubernetes is an open-source platform that helps you orchestrate and manage your container infrastructure on-premises or in the cloud.

It automates the deployment, scaling, load balancing, logging, monitoring, and other management features of containerized applications. These are the features that are characteristic of typical platform-as-a-service solutions. Kubernetes also facilitates the features of infrastructure-as-a-service, such as allowing a wide range of user preferences and configuration flexibility.

Google Kubernetes Engine is a managed Kubernetes service on Google infrastructure. GKE helps you to deploy, manage, and scale Kubernetes environments for your containerized applications on Google Cloud. More specifically, GKE is a component of the Google Cloud compute offerings. It makes it easy to bring your Kubernetes workloads into the cloud.

Cloud Run is a managed compute platform that enables you to run stateless containers via web requests or Pub/Sub events. Cloud Run is serverless: it abstracts away all infrastructure management so you can focus on developing applications. It is built on Knative, an open-source, Kubernetes-based platform that builds, deploys, and manages modern serverless workloads. Cloud Run gives you the choice of running your containers either fully managed or in your own GKE cluster.

Cloud Functions is an event-driven, serverless compute service for simple, single-purpose functions that are attached to events. In Cloud Functions, you simply upload your code written in JavaScript, Python, or Go; Google Cloud will automatically deploy appropriate computing capacity to run that code.

We will deploy and manage containerized applications on Google Kubernetes Engine (GKE) and the other tools on Google Cloud:

• Deploy solution elements—including infrastructure components like pods, containers, deployments, and services—along with networks and application services

• Deploy practical solutions, including security and access management, resource management, and resource monitoring

Cloud Build

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Cloud%20Build

Configure Pod Autoscaling and NodePools

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Configure%20Pod%20Autoscaling%20and%20NodePools

Creating GKE Deployments

• Create deployment manifests, deploy to cluster, and verify Pod rescheduling as nodes are disabled

• Trigger manual scaling up and down of Pods in deployments

• Trigger deployment rollout (rolling update to new version) and rollbacks

• Perform a Canary deployment

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Creating%20GKE%20Deployments
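
As a rough illustration of what a deployment manifest and a canary setup look like, the sketch below builds two Deployment manifests as Python dictionaries and prints them as JSON, which kubectl apply -f accepts alongside YAML. The app name, image tags, replica counts and the deployment_manifest helper are all hypothetical and are not taken from the linked repository.

```python
import json


def deployment_manifest(name, image, replicas, track):
    """Build an apps/v1 Deployment manifest as a plain dictionary."""
    labels = {"app": "demo-app", "track": track}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # change this value to scale Pods up or down
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "web", "image": image}]},
            },
        },
    }


# Stable version plus a small canary running the new image (hypothetical tags).
stable = deployment_manifest("demo-app", "example.com/demo-app:1.0", 4, "stable")
canary = deployment_manifest("demo-app-canary", "example.com/demo-app:2.0", 1, "canary")

# A Service that selects only app=demo-app would spread requests over all
# five Pods, so roughly this share of traffic reaches the canary.
total_replicas = stable["spec"]["replicas"] + canary["spec"]["replicas"]
canary_share = canary["spec"]["replicas"] / total_replicas

print(json.dumps(stable, indent=2))
print(f"canary share of Pods: {canary_share:.0%}")
```

Changing the image in the stable manifest and re-applying it is what triggers a rolling update; the canary simply runs the new version next to the old one until it is trusted.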

Deploy GKE

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Deploy%20GKE

Deploy Jobs

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Deploy%20Jobs

Deploy with Helm

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Deploy%20with%20Helm

Networking

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Networking

Services and Ingress Resources

https://github.com/IamVigneshC/GCP-Architecting-with-Google-Kubernetes-Engine/tree/main/Services%20and%20Ingress%20Resources

Machine Learning journey : Day 4 (Python | Descriptive Statistics | Case Study)

Vignesh C — Wed, 13 Jan 2021 03:15:19 +0000

Cardio Good Fitness Case Study

The market research team at AdRight is assigned the task of identifying the profile of the typical customer for each treadmill product offered by CardioGood Fitness. The market research team decides to investigate whether there are differences across the product lines with respect to customer characteristics. The team decides to collect data on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three months. The data are stored in the CardioGoodFitness.csv file.

The team identifies the following customer variables to study:

product purchased, TM195, TM498, or TM798;
gender;
age, in years;
education, in years;
relationship status, single or partnered;
annual household income ;
average number of times the customer plans to use the treadmill each week;
average number of miles the customer expects to walk/run each week;
self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

Perform descriptive analytics to create a customer profile for each CardioGood Fitness treadmill product line.

# Load the necessary packages

import numpy as np
import pandas as pd

# Load the Cardio Dataset

mydata = pd.read_csv('CardioGoodFitness-1.csv')

Head:

mydata.head()

Tail:

mydata.tail()

Describe:

mydata.describe(include="all")

Info:

mydata.info()

Histogram:

import matplotlib.pyplot as plt
%matplotlib inline

mydata.hist(figsize=(20,30))

Boxplot:

import seaborn as sns

sns.boxplot(x="Gender", y="Age", data=mydata)

Pair Plot

Crosstab:

Countplot:

Pivot table:

Correlation with heat map:
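
The crosstab, pivot table and correlation steps can be sketched with pandas alone. The frame below is a tiny made-up stand-in for mydata, and every column name other than Gender and Age (which appear in the boxplot call above) is an assumption about the CSV, so adjust the names to match the real file.

```python
import pandas as pd

# Hypothetical stand-in rows; the real analysis would use mydata instead.
# Column names other than Gender and Age are assumptions about the CSV.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "Age": [18, 24, 25, 31, 28, 40],
    "Income": [30000, 35000, 48000, 52000, 75000, 90000],
    "Miles": [66, 75, 85, 95, 160, 200],
})

# Crosstab: how many customers of each gender bought each product.
product_by_gender = pd.crosstab(sample["Product"], sample["Gender"])

# Pivot table: average age, income and expected miles per product line.
profile = sample.pivot_table(
    index="Product", values=["Age", "Income", "Miles"], aggfunc="mean"
)

# Correlation matrix of the numeric columns (the input to a heat map).
correlation = sample[["Age", "Income", "Miles"]].corr()

print(product_by_gender)
print(profile)
print(correlation)
```

With seaborn imported as sns, the matching plots would typically be drawn with sns.pairplot(mydata), sns.countplot(x="Product", hue="Gender", data=mydata) and sns.heatmap(correlation, annot=True), again assuming those column names.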

Self Study: Data Science - Machine Learning journey : Day 3 (R Programming | ROC and AUC Curves)

Vignesh C — Sun, 10 Jan 2021 23:08:50 +0000

Before getting into this topic, you may want to refer to the Day 1 post for the basics of ROC and AUC, and to the Day 2 post to install R and RStudio Desktop.

Subscribe to my Youtube channel: MyDigitalWorld - Cloud, DevOps, Data Science

Day 1 video : https://youtu.be/DPjFVNuMHaE
Day 2 video : https://youtu.be/WyFoSSvbNRA

Assume you have a set of 100 candidates preparing for a competitive examination. We are going to draw a ROC chart and compute the AUC in R, plotting the number of hours spent learning or preparing for the exam against the result obtained, either pass or fail. We will use ROC to compare the different thresholds and find the best one, and AUC to determine which ML algorithm performed best.

1. Import libraries for ROC and Random forest algorithm

library(pROC) # install with install.packages("pROC")
library(randomForest) # install with install.packages("randomForest")

2. Generate random numbers for 100 samples for number of hours spent in learning

set.seed(420) # setting the seed ensures that you get the same random numbers, and so the same result, each time you run the script

num.samples <- 100

## generate 100 values from a normal distribution with
## mean 50 and standard deviation 12, then sort them
elearn <- sort(rnorm(n=num.samples, mean=50, sd=12))

3. Now we will decide whether each sample is a pass or not

## rank(elearn) returns 1 for the least time spent, 2 for the second least time, ...
##              ... and it returns 100 for the highest time spent for exam preparation
## So what we do is generate a random number between 0 and 1. Then we see if
## that number is less than rank/100. So, for the least sample, rank = 1.
## This sample will be classified "pass" if we get a random number less than
## 1/100. For the second least sample, rank = 2, we get another random
## number between 0 and 1 and classify this sample "pass" if that random
## number is < 2/100. We repeat that process for all 100 samples
pass <- ifelse(test=(runif(n=num.samples) < (rank(elearn)/num.samples)), 
  yes=1, no=0)
pass ## print out the contents of "pass" to show us which samples were
      ## classified "pass" with 1, and which samples were classified
      ## "not pass" with 0.

4. Plot the data

plot(x=elearn, y=pass)

5. Fit a logistic regression to the data

glm.fit=glm(pass ~ elearn, family=binomial)
lines(elearn, glm.fit$fitted.values)

6. Draw ROC and AUC using pROC

## ROC graphs should be square, since the x and y axes
## both go from 0 to 1. However, the window in which I draw them isn't square
## so extra whitespace is added to pad the sides.
roc(pass, glm.fit$fitted.values, plot=TRUE)

## Now let's configure R so that it prints the graph as a square.
##
par(pty = "s") ## pty sets the aspect ratio of the plot region. Two options:
##                "s" - creates a square plotting region
##                "m" - (the default) creates a maximal plotting region
roc(pass, glm.fit$fitted.values, plot=TRUE)

## NOTE: By default, roc() uses specificity on the x-axis and the values range
## from 1 to 0.  To use 1-specificity (i.e. the 
## False Positive Rate) on the x-axis, set "legacy.axes" to TRUE.
roc(pass, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE)

## If you want to rename the x and y axes...
roc(pass, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage")

## We can also change the color of the ROC line, and make it wider...
roc(pass, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4)

## If we want to find out the optimal threshold we can store the 
## data used to make the ROC graph in a variable...
roc.info <- roc(pass, glm.fit$fitted.values, legacy.axes=TRUE)
str(roc.info)

## and then extract just the information that we want from that variable.
roc.df <- data.frame(
  tpp=roc.info$sensitivities*100, ## tpp = true positive percentage
  fpp=(1 - roc.info$specificities)*100, ## fpp = false positive percentage
  thresholds=roc.info$thresholds)

head(roc.df) ## head() will show us the values for the upper right-hand corner
             ## of the ROC graph, when the threshold is so low 
             ## (negative infinity) that every single sample is called "pass".
             ## Thus TPP = 100% and FPP = 100%

tail(roc.df) ## tail() will show us the values for the lower left-hand corner
             ## of the ROC graph, when the threshold is so high (infinity) 
             ## that every single sample is called "not pass". 
             ## Thus, TPP = 0% and FPP = 0%

## now let's look at the thresholds between TPP 60% and 80%...
roc.df[roc.df$tpp > 60 & roc.df$tpp < 80,]

## We can calculate the area under the curve...
roc(pass, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE)

## ...and the partial area under the curve.
roc(pass, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE, print.auc.x=45, partial.auc=c(100, 90), auc.polygon = TRUE, auc.polygon.col = "#377eb822")

7. Fit the data with a random forest

rf.model <- randomForest(factor(pass) ~ elearn)

## ROC for random forest
roc(pass, rf.model$votes[,1], plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#4daf4a", lwd=4, print.auc=TRUE)

8. Layer logistic regression and random forest ROC graphs

roc(pass, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE)

plot.roc(pass, rf.model$votes[,1], percent=TRUE, col="#4daf4a", lwd=4, print.auc=TRUE, add=TRUE, print.auc.y=40)
legend("bottomright", legend=c("Logistic Regression", "Random Forest"), col=c("#377eb8", "#4daf4a"), lwd=4)

The final chart looks like this:

Refer to my GitHub repository for the code.

Credits: StatQuest

Self Study: Data Science - Machine Learning journey : Day 2 (Statistics | R | Python | Anaconda | Jupyter)

Vignesh C — Sun, 10 Jan 2021 04:40:47 +0000

Prerequisites:

Statistics is generally considered one of the prerequisites for studying machine learning. We need statistics to help transform observations into information and to answer questions about samples of observations.

Statistics is needed in Machine Learning for..

Another prerequisite for data science and machine learning is a programming language: R or Python. R is used for statistical analysis and model building, while Python is used beyond statistics, with a wide range of libraries and better integration with other programming languages.

Applied Statistics:

Two broad categories in the field of statistics:

Descriptive statistics
Inferential statistics

Descriptive statistics is the process of categorizing and describing the information.

Inferential statistics includes the process of analyzing a sample of data and using it to draw inferences about the population from which it was drawn.
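
A small sketch may help separate the two ideas. It uses Python's built-in statistics module on a made-up sample of study hours: the first part only describes the sample, the second uses it to estimate a range for the population mean. The numbers are hypothetical, and the interval relies on a normal approximation, which is an assumption; for a sample this small a t-distribution would usually be preferred.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: hours of preparation reported by 12 candidates.
sample = [42, 55, 48, 61, 50, 39, 58, 47, 53, 66, 44, 51]

# Descriptive statistics: summarize the sample itself.
mean = statistics.fmean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation

# Inferential statistics: use the sample to say something about the
# population it was drawn from, here an approximate 95% confidence
# interval for the population mean (normal approximation).
z = NormalDist().inv_cdf(0.975)
margin = z * stdev / len(sample) ** 0.5
confidence_interval = (mean - margin, mean + margin)

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print(f"95% CI for the population mean: "
      f"{confidence_interval[0]:.2f} to {confidence_interval[1]:.2f}")
```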

We need to get familiarized with all these concepts to continue our machine learning journey effectively. Most of these concepts would have been covered as part of our graduate degree.

Install R Studio

Install R and R Studio Desktop for your version of OS from here..

Sample R code to illustrate AUC and ROC from Day 1:

https://github.com/IamVigneshC/Machine-Learning-Data-Science/blob/master/R/ROC_AUC.R

Install Python

You can install and use Python through the command line or through Anaconda, which comes with a tutorial and references for various libraries.

Once installed, you can open JupyterLab or Jupyter Notebook and work in Python.

Some of my samples to get started:

https://anaconda.org/iamvigneshc

https://github.com/IamVigneshC/Machine-Learning-Data-Science/tree/master/Python

Starting my Data Science - Machine Learning journey : Day 1 (Confusion Matrix | Sensitivity | Specificity | Bias | Variance)

Vignesh C — Fri, 08 Jan 2021 03:19:06 +0000

Subscribe to my Youtube channel: https://youtu.be/DPjFVNuMHaE

Day 1

Types of Machine Learning

Machine Learning Algorithms can be classified into 3 types as follows –

Supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised learning

In Supervised Learning, the dataset on which we train our model is labeled. There is a clear and distinct mapping of input and output, and the model is trained on these example inputs. An example of supervised learning is spam filtering: based on the labeled data, the model is able to determine whether a message is spam or ham. This is an easier form of training.

Unsupervised Learning

In Unsupervised Learning, there is no labeled data. The algorithm identifies the patterns within the dataset and learns them. The algorithm groups the data into various clusters based on their density. Using it, one can perform visualization on high-dimensional data. One example of this type of machine learning algorithm is Principal Component Analysis. K-Means Clustering is another type of Unsupervised Learning, where the data is clustered into groups of similar points.

The learning process in Unsupervised Learning is solely on the basis of finding patterns in the data. After learning the patterns, the model then makes conclusions.

Reinforcement Learning

Reinforcement Learning is an emerging and increasingly popular type of machine learning algorithm. It is used in various autonomous systems like cars and industrial robotics. The aim of this algorithm is to reach a goal in a dynamic environment. It can reach this goal based on several rewards that are provided to it by the system.

It is most heavily used in programming robots to perform autonomous actions, and in making intelligent self-driving cars. Consider the case of robotic navigation: the agent's efficiency can be improved through further experimentation in its environment. This is the main principle behind reinforcement learning. There are similar sequences of action in a reinforcement learning model.

Getting started with the Basics

Machine learning algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so.

Train the ML methods
Test the ML methods

We predict using different ML methods and document the results.

A confusion matrix helps to compare different ML methods and decide which performs best. We record the actual versus predicted results in matrix form, with the size of the matrix depending on the number of categories involved.

Cross Validation is used to decide which machine learning method would be best for our dataset.

Sensitivity and Specificity

Sensitivity measures the proportion of positives that are correctly identified (i.e. the proportion of those who have some condition (affected) who are correctly identified as having the condition)

Specificity measures the proportion of negatives that are correctly identified (i.e. the proportion of those who do not have the condition (unaffected) who are correctly identified as not having the condition)
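
As a worked example, the sketch below counts the four cells of a two-class confusion matrix from made-up actual and predicted labels (1 = has the condition, 0 = does not) and derives both measures from them. The data and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FN, TN, FP) for binary labels where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Proportion of actual positives that were correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of actual negatives that were correctly identified."""
    return tn / (tn + fp)


# Hypothetical test results for ten people.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fn, tn, fp = confusion_counts(actual, predicted)
print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"sensitivity={sensitivity(tp, fn):.2f}")
print(f"specificity={specificity(tn, fp):.2f}")
```

Here three of the four affected people are caught (sensitivity 0.75) and four of the six unaffected people are correctly cleared (specificity about 0.67).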

Bias and Variance

The inability of an ML method to capture the true relationship is called Bias.
The difference in fits between data sets (training vs. testing data) is called Variance.

ROC and AUC

ROC (Receiver Operating Characteristic) graphs and AUC (the area under the curve) are useful for consolidating the information from a ton of confusion matrices into a single, easy-to-interpret graph.

ROC curve makes it easy to identify the best threshold for making a decision
AUC helps in deciding which categorization method is better
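
To make the idea concrete, here is a from-scratch sketch that turns a list of predicted scores into ROC points, one per threshold, and then measures the area under them with the trapezoid rule. The labels and scores are made up, and the function names are hypothetical; in practice a library such as pROC in R would do this for you.

```python
def roc_points(labels, scores):
    """One (false positive rate, true positive rate) point per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC points using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model output: 1 = pass, 0 = fail, with a score per sample.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(f"AUC = {area:.2f}")
```

Each point corresponds to one confusion matrix, which is why a single curve can stand in for a whole stack of them; an AUC of 1.0 would mean some threshold separates the two classes perfectly, while 0.5 is no better than guessing.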

GCP - Streaming IoT Kafka to PubSub

Vignesh C — Wed, 06 Jan 2021 04:05:33 +0000

Subscribe to my YouTube channel : MyDigitalWorld

Objectives:

• Launch a Kafka instance and use it to communicate with Pub/Sub

• Configure a Kafka connector to integrate with Pub/Sub

• Set up topics and subscriptions for message communication

• Perform basic testing of both Kafka and Pub/Sub services

• Connect IoT Core to Pub/Sub

Architecture:

Introduction:

With the announcement of the Google Cloud Confluent managed Kafka offering, it has never been easier to use Google Cloud's great data tools with Kafka. You can use the Apache Beam KafkaIO connector to go straight into Dataflow, but this may not always be the right solution.

Whether Kafka is provisioned in the cloud or on premises, you might want to push to a subset of Pub/Sub topics. Why? For the flexibility of having Pub/Sub as your Google Cloud event notifier. Then you could not only choreograph Dataflow jobs, but also use topics to trigger Cloud Functions.

So how do you exchange messages between Kafka and Pub/Sub? This is where the Pub/Sub Kafka Connector comes in handy.

Tip: Here we use a virtual machine with a single instance of Kafka. This Kafka instance connects to Pub/Sub and exchanges event messages between the two services.

In the real world, Kafka would likely be run in a cluster.

Details:

Youtube link: https://tinyurl.com/y6dd28vr

GitHub: https://github.com/IamVigneshC/GCP-Streaming-IoT-Kafka-to-PubSub

Blog: https://tinyurl.com/y4q9k4cp

Confluent: Developing a Streaming Microservices Application - Kafka

Vignesh C — Wed, 30 Dec 2020 03:46:06 +0000

Background

We build services using a streaming platform. Some will be stateless: simple functions that take an input, perform a business operation and produce an output. Some will be stateful, but read-only, as in when views need to be created so we can serve remote queries. Others will need to both read and write state, either entirely inside the Kafka ecosystem, or by calling out to other services or databases. Having all approaches available makes Kafka's Streams API a powerful tool for building event-driven services.

https://www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql/

Objectives

Persist events into Kafka by producing records that represent customer orders
Write a service that validates customer orders
Write a service that joins streaming order information with streaming payment information and data from a customer database
Define one set of criteria to filter records in a stream
Create a session window to define five-minute windows for processing
Create a state store for the Inventory Service
Create one persistent query that enriches the orders stream with customer information using a stream-table join
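
The session window objective is easier to picture with a small, framework-free sketch. The code below is not the Kafka Streams API; it is a plain-Python illustration of the idea, assuming events arrive as (key, timestamp in seconds) pairs and that a gap of more than five minutes of inactivity for a key closes its session. The event data and the session_windows function are hypothetical.

```python
from collections import defaultdict

INACTIVITY_GAP_SECONDS = 5 * 60  # five-minute session gap


def session_windows(events, gap=INACTIVITY_GAP_SECONDS):
    """Group (key, timestamp) events into per-key sessions.

    Returns {key: [(start, end, event_count), ...]}.
    """
    timestamps_by_key = defaultdict(list)
    for key, timestamp in events:
        timestamps_by_key[key].append(timestamp)

    sessions = {}
    for key, timestamps in timestamps_by_key.items():
        timestamps.sort()
        windows = []
        start = end = timestamps[0]
        count = 1
        for timestamp in timestamps[1:]:
            if timestamp - end <= gap:
                # Still within the inactivity gap: extend the current session.
                end = timestamp
                count += 1
            else:
                # Gap exceeded: close the session and start a new one.
                windows.append((start, end, count))
                start = end = timestamp
                count = 1
        windows.append((start, end, count))
        sessions[key] = windows
    return sessions


# Hypothetical order events: (customer, seconds since start).
events = [("alice", 0), ("bob", 30), ("alice", 120), ("alice", 600)]
sessions = session_windows(events)
print(sessions)
```

Unlike fixed-size windows, a session has no preset length: it grows for as long as events for that key keep arriving within the gap.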

Check out the steps here

Additional References:

https://docs.confluent.io/platform/current/streams/developer-guide/dsl-api.html

https://www.confluent.io/blog/distributed-real-time-joins-and-aggregations-on-user-activity-events-using-kafka-streams/

https://www.confluent.io/product/ksql/

https://github.com/confluentinc

GCP - Automating DevOps Workflows with GitLab and Terraform

Vignesh C — Tue, 29 Dec 2020 00:59:48 +0000

Overview

GitLab is a complete DevOps platform delivered as a single application. GitLab provides a single UI for development and operational teams within organizations to work concurrently throughout the application development to delivery life cycle. As applications become more and more complex, organizations turn to automation, like CI/CD, in hopes of streamlining processes. However, to increase reliability and consistency in automation beyond software development, more and more teams are defining their automation and its workflows declaratively in code and storing them in code repositories. Triggering this automation as code, often and dynamically, to quickly iterate and deliver value into production is becoming more and more popular. This phenomenon is commonly known as GitOps, where tools such as Terraform and GitLab shine.

Google Cloud provides scalable cloud infrastructure with managed services for various forms of compute, storage, networking, etc. With extensive open APIs, almost every aspect of it can be automated and managed via code with the proper tools. Organizations looking to adopt cloud native solutions in an agile way need a better way to manage the inherent complexities when elastically scaling their services, breaking up monolithic applications, and securely operating at speed through cross functional teams. All of this has been the foundation of the DevOps transformation as we recognize it today.

The fundamental challenge DevOps attempts to address is the unification of Developer and Operator workflow. When automating the application development process, Continuous Integration / Continuous Delivery (CI/CD) is crucial in iteratively deploying code reliably, in a secure fashion from source code into production. When automating Infrastructure administration and provisioning, Infrastructure as Code (IaC) is an emerging method to quickly and reliably build deployment targets for applications. IaC enables practitioners to deploy infrastructure through declarative means and maintain the desired state and lifecycle of cloud resources.

GitLab is the single application that provides all of the necessary functionality for DevOps teams out of the box for a seamless, low maintenance, just-commit-code software development, and delivery experience. GitLab enables teams to plan sprints and projects, organize code in git repositories, operationalize application code with CI/CD pipelines, secure source code through various vulnerability scanning, and much more. GitLab provides means to construct flexible automation workflows that are complementary to other tools like Terraform. Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. The infrastructure Terraform can manage includes low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc. Together, GitLab and Terraform can be configured to provide DevOps teams the capability to manage their cloud through IaC, continuously and reliably.

Sample Application Overview

We will be using a sample application called Vote-App, which is representative of an N-tier microservice architecture: a containerized application that will be deployed to a Kubernetes cluster provided by GKE. Below is a high-level overview of the sample application architecture:

Vote UI: A Python containerized microservice that uses Flask to generate a front-end UI for end users to place a 'vote'. This service will be running on GKE.

Results UI: A Node.js containerized microservice used to generate a front-end UI for end users to see voting results. This service will be running on GKE.

Redis: A managed Redis service powered by Cloud Memorystore, which will serve as a cached queuing system for votes. This managed service will store aggregated votes inserted by the Vote UI service.

Postgres: A managed Postgres service powered by Cloud SQL, which will serve as a central database for all votes cast and accounted for. This managed service will be queried by the Results UI to show results.

Worker: A .NET Core containerized microservice that queries the votes queued in the Redis cache and enters them into the Postgres database. This service will be running on GKE.
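
The data flow between these services can be sketched in a few lines. This is only an illustration of the pattern, not the Vote-App code: an in-memory deque stands in for the Redis queue, an in-memory SQLite database stands in for Cloud SQL Postgres, and the table layout, the rule that a voter's latest vote replaces an earlier one, and the function names are all assumptions.

```python
import sqlite3
from collections import deque


def drain_votes(queue, db):
    """Worker: move every queued vote into the database."""
    moved = 0
    while queue:
        voter_id, choice = queue.popleft()
        # Assumption: one row per voter, the latest vote wins.
        db.execute(
            "INSERT OR REPLACE INTO votes (voter_id, choice) VALUES (?, ?)",
            (voter_id, choice),
        )
        moved += 1
    db.commit()
    return moved


def tally(db):
    """Results UI: count the stored votes per choice."""
    rows = db.execute(
        "SELECT choice, COUNT(*) FROM votes GROUP BY choice ORDER BY choice"
    )
    return rows.fetchall()


# Stand-ins for the managed services (hypothetical schema).
vote_queue = deque()                # plays the role of the Redis queue
database = sqlite3.connect(":memory:")  # plays the role of Postgres
database.execute("CREATE TABLE votes (voter_id TEXT PRIMARY KEY, choice TEXT)")

# Vote UI: end users place votes, which land on the queue.
vote_queue.append(("voter-1", "a"))
vote_queue.append(("voter-2", "b"))
vote_queue.append(("voter-1", "b"))  # voter-1 changes their mind

moved = drain_votes(vote_queue, database)
results = tally(database)
print(moved, results)
```

Putting a queue between the UI and the database lets the front end accept votes quickly while the worker writes them at its own pace.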

DevOps Workflow Overview

You will have access to two GitLab users, Developer and Operator. After the initial GitLab configuration and after importing all relevant projects, you will role-play deploying microservices and managing infrastructure via IaC, automated through CI/CD from code commit to running in production.

You will start with only a single, pre-provisioned GKE cluster as a deployment target for your microservices. In this example, we will simply be committing to the Master branch to deploy into a single staging environment. The high level workflow is as follows:

The user will commit code to the application code repositories to scan, package and deploy the service.
This will kick off a series of CI/CD pipelines to
- Execute code security scanning for vulnerabilities
- Package the code into a docker container
- Trigger a child pipeline to provision dependent infrastructure.
- Deploy all necessary Kubernetes resources (pods, services, secrets, etc.) to GKE.
The Terraform child CI/CD pipeline will:
- Use Terraform to init, plan, and apply to provision a Memorystore instance and a Cloud SQL Postgres instance on GCP.
- The CI/CD pipeline will also update and deploy a ConfigMap and Secrets resource to Kubernetes that the microservices will reference for appropriate DB endpoints and credentials.
- Lastly, the services will then be deployed to Kubernetes and be accessible through web endpoints by the end user.

The workflow will be concerned with a single GitLab subgroup spanning multiple GitLab Projects outlined below. Each project listed below is a git repository with a .gitlab-ci.yml file already defined for an automated pipeline to execute upon code commit.

Continue reading here...

Additional References:

https://www.terraform.io/docs/index.html
https://docs.gitlab.com/ee/ci/