DEV Community: shreyan1999

Demystifying DevOps

shreyan1999 — Thu, 20 May 2021 11:15:09 +0000

Software development is one of the most important processes in today’s world. It covers coding, testing, documenting, debugging and much more, and these activities are the building blocks of almost every technology. When we talk about software development, one term that comes to mind is DevOps. The term sounds interesting, yet it confuses many people. Initially, I too was unaware of DevOps. However, as I slowly demystified the term, I learnt quite a lot about its history, its significance and the reasons for its popularity.

DevOps is a combination of the words “Development” and “Operations”. What exactly does this mean? Why is it so popular in the industry? Let's demystify it step by step! In the next few posts, I would like to simplify and explain DevOps.

Serverless on Microsoft Azure: A brief Introduction

shreyan1999 — Tue, 08 Sep 2020 14:46:03 +0000

Nowadays, many entrepreneurs are trying to launch their own products in the market, with most of their focus on the software side of technology. However, this comes with a truckload of obstacles that young entrepreneurs have to overcome. The major factor here is expenditure.
Expenditures are of two types:
1. CapEx (Capital expenditure)
2. OpEx (Operational expenditure)

In the past, both of these expenditures used to be really high, because any software needed a technical setup such as servers and an internet connection running round the clock. This brought further costs, such as the electricity needed to keep the servers running and cooled. Buying the infrastructure and setting it all up comes under capital expenditure, while the cost incurred in running the service comes under operational expenditure.

With the advent of cloud computing, these costs were considerably reduced and people started to migrate to cloud platforms. The cloud provides the infrastructure on a “pay as you go” model, which means we pay only for the resources we use and for the duration for which we use them. This reduced costs to a huge extent.

Any organisation primarily wants to focus on the functioning and business side of the firm, where strategies to improve quality and sales are essential. However, organisations could not do this completely. Though IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) reduced the work, they weren't the optimal solution. One still had to think about the resources needed and about version upgrades, for example. This is where the concept of serverless came into the picture.
Serverless computing doesn’t eliminate servers; it abstracts them away. One only has to worry about building the application. Everything else, such as upgrades, the underlying infrastructure and scaling, is handled by the cloud provider.

Serverless is a simplified model that many businesses are adopting because of its ease of use. The benefits of serverless include reduced DevOps effort, faster shipping of solutions since we focus only on the core logic of the application, and lower cost since we are charged only while our workloads are actually running rather than for the whole time the service is deployed.

Microsoft Azure also has a good serverless offering. There are three major components of serverless compute on Azure:

Azure Functions
Azure Logic Apps
Azure Event Grid
Azure Functions: This is the ability to run custom code, written in languages such as C# or JavaScript (Node.js) among others, on demand and at scale in the cloud. Functions are triggered by events. We will look more into this in an upcoming blog.
Azure Logic Apps: This helps to chain different functions together. It has a “DO THIS THEN THAT” capability and can be considered a serverless workflow.
Azure Event Grid: This is an event routing service which lets us handle events more easily. We will delve into this more in upcoming blogs.
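
The event-driven idea behind these three services can be sketched in plain Python. This is only a conceptual illustration and does not use the Azure SDK: the EventRouter class, the "blob.created" event name and the two handler functions are all hypothetical names invented for this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventRouter:
    """Toy stand-in for an event routing service (hypothetical, not an Azure API)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], str]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], str]) -> None:
        # Register a function to run whenever this event type is published.
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> List[str]:
        # Run every handler registered for this event type, in subscription order.
        return [handler(payload) for handler in self._handlers[event_type]]


def resize_image(payload: dict) -> str:
    # Plays the role of a small "function" that runs only when triggered.
    return f"resized {payload['name']}"


def notify_owner(payload: dict) -> str:
    return f"notified owner of {payload['name']}"


router = EventRouter()
router.subscribe("blob.created", resize_image)  # "do this ..."
router.subscribe("blob.created", notify_owner)  # "... then that"

results = router.publish("blob.created", {"name": "photo.png"})
print(results)
```

Nothing runs until an event arrives, which is the property that lets a provider bill only for actual executions.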

Serverless is a new technology which enables organisations to work efficiently by letting them focus on their business logic rather than on managing infrastructure. That’s it for the introduction. Stay tuned for more!

Big Data on Azure

shreyan1999 — Fri, 10 Jul 2020 08:07:47 +0000

Data is a crucial part of any enterprise service. It is present everywhere, in abundance, in today’s world, and an enormous amount of it is generated every single second. Now one might wonder: if such a huge amount of data is generated every day, how is it managed and how is it useful? We will learn about this in this article. I will also explain how Microsoft Azure is emerging as one of the leading solution providers in the field of data science.

A cloud platform mainly consists of:
Compute/Servers: Networked commodity machines located in data centers.
Storage/Databases: The storage services located on the servers in the data centers.
Intelligence/Analytics: Interacting with people in many possible ways.

Let us see some statistics on how data is changing:
1. More data has been generated in the past 2-3 years than in all of human history before that. With more people moving online during the Covid-19 pandemic, data generation is expected to reach an all-time high.
2. Approximately 2.5 exabytes of data (on the order of 10^18 bytes) are generated every day.

What is the reason behind this rapid growth of data?
1. Internet: The major contributor to the data boom. Approximately 4 billion people use the internet and 5 billion search queries are made, which gives a sense of the crucial role the internet plays in this data boom.

2. Social media: This is definitely a term we are all familiar with. Social media is the next big contributor to the data boom. Every like and share counts! Every viral post on social media makes a huge difference!

3. Internet of Things: Though not yet very popular, it contributes significantly to data generation, and its share is sure to increase in the coming decades.

There are many other contributors, but I have just jotted down a few.

Having learnt where data comes from, let us now explore the types of data that are available:

1. Structured Data: Data that is formatted or well defined with a proper structure, for example in rows and columns.
2. Unstructured Data: Data that has no predefined structure, for example media files and free text files.
3. Semi-Structured Data: A hybrid of the above two formats, for example JSON and XML.
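
To make the three categories concrete, here is a small sketch using only the Python standard library. The sample values (names, cities, the note text) are made up purely for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but records may differ in shape.
json_text = '[{"id": 1, "name": "Asha", "tags": ["a", "b"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; it can only be handled as raw content.
note = "Met Asha in Pune last week, she mentioned a new project."

print(rows[0]["city"])
print([sorted(record.keys()) for record in records])
print(len(note))
```

Note how the second JSON record simply omits the "tags" key, something a table with fixed columns would not allow.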

In today’s world we usually deal with unstructured and semi-structured data. At the enterprise level, around 50% of the data is typically unstructured, in the order of petabytes. To add to that, the industry has seen an explosion of semi-structured and unstructured data in the last few years.

Now let's come to the main topic of the blog. BIG DATA.

The name itself makes it clear to some extent. Data that is too large or complex to be analysed with traditional data processing software is what we call big data. Additionally, it has 3 major characteristics, which are called the 3Vs.
What are the 3Vs?
1. Volume: One of the main characteristics of big data is volume. What volume are we talking about here? It is of the order of petabytes or more. Given the amount of data being generated, petabytes will soon no longer be the appropriate unit.
2. Velocity: Similar to velocity in physics, here it is the rate at which data grows. This growth is usually described as exponential.
3. Variety: This refers to the types of data that come in. As we discussed earlier, data comes in 3 different formats, i.e. structured, unstructured and semi-structured.

All these seem to be in place. But what do we do with such volumes of data?

To understand this better, let us look at some real life examples,

Rolls Royce: Besides being known for automobiles, Rolls Royce also manufactures aircraft engines. It has over 13000 engines in operation, which send real-time data on various engine parameters. The main reason for collecting this data is to increase the fuel efficiency of the engines, which is the biggest concern for airlines. Some landings save fuel whereas others turn out exorbitantly expensive. Improving fuel efficiency by analysing the real-time parameters from all the operational engines is a great step taken by Rolls Royce to improve the quality of its engines.

Hewlett Packard: This is a very familiar company which produces many electronic gadgets, spare parts and accessories. Service is a major concern for such companies, and customers are often not satisfied with the quality of after-sales service. HP collected all the data on customer problems and used a combination of data and AI to improve its technical support. As a result, it observed a 75% increase in automated query resolution compared to the previous methods.

Having seen these two, we can now relate it to the things around us. All the leading social media services rely on this technology for their smooth functioning. Healthcare, transport and e-commerce, to name a few, rely on big data or will switch over to it very soon.
Now let us talk about the ways in which we can analyse this big data. This is a great challenge in itself, right? Handling such precious data is obviously no joke, and it should be taken care of really well. This is exactly where cloud technologies come in handy. Any data we might think of first needs to be stored and then analysed.
For storage we rely on relational databases, and we can also go ahead with a distributed storage system called a data lake.
Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming and interactive analytics. Azure Data Lake works with existing IT investments for identity, management and security for simplified data management and governance.

The next important aspect we need to discuss is compute.
We have options like NoSQL databases, Apache Spark, Hadoop, Azure Synapse Analytics and so on.

Microsoft Azure provides robust services for analysing big data. As we discussed earlier, Azure Data Lake is a wonderful and secure way to store the data. We can later process this data using Spark on Azure Databricks. Azure provides a hassle-free experience with best-in-class cloud security. Azure Stream Analytics is a service for real-time data analytics.

What does Azure Databricks offer?
1. Optimised Spark engine
2. ML runtime
3. MLflow
4. Choice of language
5. Collaborative notebooks
6. Delta Lake
7. Native integrations with Azure services
8. Interactive workspace

Azure has hundreds of services to offer, which makes it a hassle-free platform for deploying any kind of enterprise-ready application. Seems interesting, doesn’t it?

AI Bias: What to do?(Part 3)

shreyan1999 — Fri, 26 Jun 2020 17:23:50 +0000

This is part 3 of the 3-part series. Let's pick up where we left off in the last post. Here are a few ways to reduce bias.

Choosing a suitable learning model for the problem:

It is important to understand the problem first and then identify the best model for the given situation. The machine learns from the data given to it. An unsupervised model learns from its dataset on its own and can show biased output because the model may mix up the data, whereas a supervised model allows more control over bias through data selection. Hence, one should take time to look through the various vulnerabilities, troubleshoot ideas and adopt different strategies when building the model.

Choose a representative training dataset:

The training data should be diverse and cover different groups. One can have different models for different groups, but if the data for one group is insufficient, weighting is usually applied to increase that group's importance in training. However, this can itself introduce bias, because the model is forced to learn from limited data. Therefore, one should be careful with weights, especially large ones.
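
A minimal sketch of such weighting, assuming a made-up training set in which group "B" is underrepresented. The formula used here, n_samples / (n_groups * group_count), is the common "balanced" heuristic (the same one scikit-learn uses for class_weight="balanced"); it is one possible choice, not the only one.

```python
from collections import Counter

# Hypothetical group labels for a small training set: group "B" is underrepresented.
groups = ["A"] * 8 + ["B"] * 2

counts = Counter(groups)
n_samples = len(groups)
n_groups = len(counts)

# "Balanced" weighting: weight = n_samples / (n_groups * count_of_group)
weights = {group: n_samples / (n_groups * count) for group, count in counts.items()}
print(weights)

# After weighting, every group contributes the same total weight to training.
weighted_totals = {group: weights[group] * counts[group] for group in counts}
print(weighted_totals)
```

In this toy case each "B" example counts four times as much as an "A" example, which shows why large weights deserve caution: two examples end up speaking for half of the training signal.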

Monitor performance using real data:

Usually, after training, one uses test data to verify the model, and this test data might not resemble real-world data. The developer then fails to notice the model's vulnerabilities. Therefore, real data should also be used to test the model, so that the developer can identify bias and rectify it.
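
One simple way to look for bias in real data is to break the model's accuracy down by group instead of reporting a single number. The records below are invented for illustration; in practice they would come from logged predictions and later-confirmed outcomes.

```python
from collections import defaultdict

# Hypothetical (group, true_label, predicted_label) records from live usage.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 1, 1),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, y_true, y_pred in records:
    total[group] += 1
    correct[group] += int(y_true == y_pred)

# Accuracy per group, and the single overall figure that hides the gap.
accuracy = {group: correct[group] / total[group] for group in total}
overall = sum(correct.values()) / sum(total.values())
print(overall, accuracy)
```

Here the overall accuracy looks acceptable, while the per-group view shows that group "B" is served much worse than group "A".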

A model for monitoring purposes:

Another possible way to address the problem is to build a machine learning model that can independently identify biasing in the data provided before it can be used in the intended model. This model has already been provided with ideal unbiased data and hence it is able to identify the possible vulnerability in the input data.

Setting the Boundary:

Another good practice is to set a boundary between automation and human intervention. It is necessary to decide to what extent processes or systems should be automated. If the whole system is automated without any human intervention, it becomes highly risky and prone to disasters. For example, a model used by a hiring company might be biased unintentionally, or might have become biased over time for the various reasons discussed earlier. This might lead to biased output and cause trouble for both the job seeker and the employer. Therefore, the best practice is for the AI system or algorithm to give advice and for a human to double-check its decision. This will improve the quality of the algorithm too and will avoid discrepancies in its functioning. Humans and AI can work hand in hand if this is followed.
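
Such a boundary can be sketched as a simple routing rule. The function name, the labels and the thresholds 0.2 and 0.8 are assumptions chosen for illustration, not recommended values: the model only ever suggests, and anything it is unsure about goes to full manual review.

```python
def route_decision(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Decide how a model score in [0, 1] is handled (illustrative thresholds)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "suggest_accept"   # AI advises, a human confirms
    if score <= low:
        return "suggest_reject"   # AI advises, a human confirms
    return "manual_review"        # too uncertain: a human decides from scratch


for score in (0.9, 0.5, 0.1):
    print(score, route_decision(score))
```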

Increasing the availability of data:

The more diverse the data, the higher the accuracy. This is what we have seen in AI systems, but there is another benefit of feeding more data to the system: it can considerably reduce the risk of bias. Giving more data during the initial training makes the model ‘understand’ the data well and not just ‘learn’ it by heart. Let us take an example. Consider a high school student learning maths and appearing for an exam. Though we say memorizing maths is difficult, it is not impossible. So there is a chance that the student simply memorises the questions if the questions are limited and the same pattern reappears in the paper. The best way to avoid this is to vary the pattern of the paper, ensuring that students “understand” maths well. The same analogy applies to our model. It should be given a large and varied amount of data in order to reduce the risk of bias as well as increase accuracy.

There are many ways to overcome bias, but it is mostly linked to human error, inadequate or low-quality data, and model training and monitoring. Many significant developments are taking place in this field, and it is extremely important that AI systems do not fall into the wrong hands, or else we will have no control over the severe consequences that follow. AI is so powerful that it is already being used to commit crimes. On the other hand, AI is also an effective tool for curbing crime; for example, fraudulent transaction detection helps the banking sector strengthen its security. Still, the bane that tags along is very dangerous. Many people who study and develop AI systems are working to provide resources for organizations that want to deploy AI fairly. There is no single fix for the bias problem; it is a long process that must be followed in order to minimize the bias seen in the AI industry. AI has enough potential to benefit business and the economy and to tackle some of the serious issues in society, but that will be possible only if humans trust AI systems, and gaining this trust requires unbiased results. Hence, we can conclude that while AI systems can assist us, they also need our help in order to help us efficiently.

AI Bias: What to do? (Part 2)

shreyan1999 — Fri, 26 Jun 2020 17:18:57 +0000

This is part 2 of the 3-part series. As discussed in the earlier blog, data is really necessary, and if it is not treated well it can lead to disasters. Unfortunately, many AI systems have been victims of biased data. Data can become biased right from the initial step, which is why it should be preprocessed well. For example, a faulty percept (e.g. a faulty sensor) gives out wrong data, and if it is not monitored well it will definitely lead to a tragedy. Therefore, monitoring during the initial step is one of the most important ways to avoid faulty data. This is the case of getting wrong or biased data unintentionally or through mishandling.
But what about data that is itself biased? Before getting into this, let us analyse what bias is. Bias can be seen in both a fair and an unfair manner. Here we will analyse the unfair kind, where there is some sort of discrimination or inequality against certain individuals, groups or communities. AI is a rose, but it has thorns attached as well. While we think of AI being used to reduce discrimination, it can also fall into the wrong hands and be used to increase bias further. Let's see some examples where this has already taken place.
Racism in US Healthcare allocation:
An AI system used in the US to allocate healthcare to about 200 million patients was found to have a problem: Black patients were receiving a lower standard of care from it. Despite their high level of risk, they were assigned to lower risk levels, which was not appropriate. This happened because the algorithm used past healthcare spending as a signal of need, and less money had been spent on Black patients, who were often unable to afford the fees. Hence, the AI concluded that they needed less care and effectively discriminated against them. The system should have been monitored well beforehand so that this issue never occurred.

COMPAS:

This technology was used in US courts to guide sentencing by predicting the likelihood of a criminal reoffending. However, a few years ago it was reported that the algorithm was biased in favour of white defendants: it was prejudiced to predict that Black defendants had a higher tendency to reoffend.

Only male CEOs:

The algorithm wrongly learnt from advertisements that only men can be CEOs, because most advertisements show men as CEOs and in other high-level positions. This was a case where the reinforcement learning algorithm learnt the wrong lesson, leading to confusion and sex discrimination.
The list goes on and on, making bias a bane within the boon of AI. This is why the important question arises: how can we tackle this serious issue and avoid such instances? We are aware of the severe consequences of such mistakes. They hurt people's sentiments and emotions, and in extreme cases they might be life-threatening too!
Though many factors are responsible for bias in Artificial Intelligence, humans are a big one, as the data might be manipulated and biased at the source or at any step of data mining, intentionally or unintentionally. Besides, the algorithms used are also a cause. For example, many unsupervised and reinforcement learning methods are prone to such disasters if not handled carefully, as the model learns on its own and may learn from misleading data, which makes it a biased system in real-life applications, as seen above. It is necessary to look into this issue as it can cause discomfort and disputes among users. So, how do we design a model? The EU has an answer for that. It has suggested that a model should be:

1) Robust

2) Ethical

3) Lawful

So how to reduce bias? Let's see that in the next blog!

AI Bias: What to do? (Part 1)

shreyan1999 — Fri, 26 Jun 2020 17:15:59 +0000

Artificial Intelligence is a technology which covers industries from almost every possible field. We can say that it is a tool which is really helpful to us in many ways.

Basically, Artificial Intelligence is a huge set comprising Machine Learning, Computer Vision, Natural Language Processing and Deep Learning, to name a few. One might be intrigued on hearing the term “Artificial Intelligence”. How can intelligence be artificial? Is it not natural? Yes, of course it is natural for humans, but how about machines being intelligent? It sounds so cool to get the maximum out of machines if they are intelligent, right? So how can machines be intelligent? To understand all this, we should know what intelligence means. Intelligence is not the ability to score marks or excel at something; it is the ability to think and respond through actions, words etc. In other words, it means to act and think humanly along with thinking and acting rationally. That’s what it means when we say everyone is intelligent. Now that we know what intelligence is, let’s see how machines can be intelligent. There are examples like Sophia, a humanoid robot which was granted citizenship in Saudi Arabia because of the way it behaved just like a human. There are many such humanoids and robots in the world today which are artificially intelligent. It is data which aids the process. Data is to the robot what mitochondria are to the cell: the powerhouse of its knowledge, behaviour and actions.
It is analogous to bringing up a child. If children are raised well with good qualities, they can become good citizens. In a way, we say that the surroundings should be good for children in order for them to be better citizens. There are many examples of children getting into the wrong company and spoiling their lives. In our case, it is the data which decides the future of the machine. Good, unbiased data gives a good result, whereas bad and biased data is a threat to the industry and to society and can lead to many disasters. Many movies have shown this through their characters. Machines learn from the data they are fed.
Before training the machine, we need to know how the training stage is reached, because that is how we understand where the data might go wrong. Data is a collection of different pieces of information from countless sources. The more data, the better the result. The data collected initially is very raw, just like gold in a gold mine. Before jewellery is made out of it, gold undergoes several processes, and during these processes its purity is determined in carats. Similarly, the raw data collected from the source undergoes the process of “data processing” or “data mining”, which determines the quality of the result we finally get.
Some major steps are:

1. Data collection: The first stage of data mining, where the data is collected and stored. The data might come from any percepts, such as sensors or people, and can be in any form: image, text or video. This data is then stored in a suitable format.
2. Data treatment: This step mainly takes care of data analysis, for example the variables available, their ranges, their data types and any missing values.
3. Data transformation: The crucial step where missing values are filled in and some miscellaneous data is dropped in order to improve the quality of the data.
4. Data training: Before the model is used in real applications, it should be trained, validated and tested well so that it performs well under all kinds of variation.
5. Model development: The model is the final file produced by training, and it can recognize patterns. We use “algorithms” to train models. An algorithm is a set of instructions that helps the model identify patterns in the data.
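
The first three steps can be sketched on a tiny made-up dataset using only the Python standard library. The sensor names and readings are hypothetical, and filling with the mean is just one simple transformation among many.

```python
from statistics import mean

# 1. Collection: hypothetical sensor readings; None marks a missing value.
raw = [
    {"sensor": "s1", "temp": 21.0},
    {"sensor": "s2", "temp": None},
    {"sensor": "s3", "temp": 23.0},
    {"sensor": None, "temp": 22.0},  # unusable: we do not know the source
]

# 2. Treatment: inspect the data, e.g. count missing values per field.
missing = {key: sum(row[key] is None for row in raw) for key in ("sensor", "temp")}

# 3. Transformation: drop rows without a source, fill missing temps with the mean.
kept = [row for row in raw if row["sensor"] is not None]
fill = mean(row["temp"] for row in kept if row["temp"] is not None)
clean = [{**row, "temp": fill if row["temp"] is None else row["temp"]} for row in kept]

print(missing)
print(clean)
```

Every choice in step 3 (what to drop, what to fill in) shapes what the model later learns, which is one place where bias can enter unnoticed.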

Covid-19 Analysis on Azure ML Service

shreyan1999 — Fri, 19 Jun 2020 17:26:09 +0000

This is written as a part of the MSP Developer Stories initiative by the Microsoft Student Partners (India) program (https://studentpartners.microsoft.com/).
Artificial Intelligence and Machine Learning are two buzzwords that are heard everywhere in today’s tech world. People often treat them as two separate things, but when we study them deeply we find that one is a subset of the other. Artificial Intelligence comprises Machine Learning, Deep Learning and Natural Language Processing, to name a few.
‘Intelligence’ is a very commonly used term nowadays. What is intelligence? It is the power to see, think and act, which includes taking the decisions we make in our day-to-day life. Artificial Intelligence is nothing but creating this ability artificially. But the question arises: why artificial? Let us take an example. Humans can do many calculations, but we are prone to making errors. Complex calculations take a humongous amount of time to do manually and may lead to many human errors. Also, we tend to get tired after a while. However, if the same intelligence is programmed into a machine, it can carry out an enormous number of calculations efficiently, accurately and in very little time. So, basically, the idea is to let machines make the best use of intelligence by giving them the power to analyze and make decisions. There are enormous use cases for Artificial Intelligence, be it facial recognition on our cell phones or tagging pictures on social media. Some people even believe that AI will one day take over from humans completely.
How do we make these machines intelligent? What is the powerhouse? The answer to all these questions is “DATA”. Yes, machines learn from the data we provide. Data might be in any form: text, images, videos or anything else for that matter. Data is analyzed with the help of “algorithms” and we get the results. Unlike regular coding, where we give an input and the expected output is shown, here we feed in the data and get results out in some form. Algorithms do the job of telling the machine how to analyze the data so that it gives good results.
This tutorial can also be done using Machine Learning Studio, a simple drag-and-drop, low-code platform. Basic Python fluency is needed to understand the code in this tutorial. We will analyse Covid-19 data and predict upcoming cases using machine learning algorithms. I have compared two different algorithms, i.e. Linear Regression and Support Vector Regression.
Now let us learn by doing it hands-on. I will be using the Microsoft Azure Machine Learning service in this tutorial.

To open this, navigate to the Azure portal (https://azure.microsoft.com/), sign in using a Microsoft account and create a ‘Machine learning’ service. Fill in the details and create the resource.
Once the resource is deployed, you can see an overview page as shown above. Navigate to ‘Experiments’ and launch the Azure ML interface. A new window pops up and we can see the interface as shown below.

Click on ‘Create New’ and select a new ‘JupyterLab’ file with a recent Python version.
NOTE: In this tutorial we will go through a basic method of training a model in a notebook. The Machine Learning service can further be used to deploy models. We can also use Pipelines and Automated ML.
We now see a new Jupyter notebook where we can build our ML model.
First and foremost, we need to import our 3 basic libraries:
Pandas: The name is derived from “panel data”. Used for data manipulation and analysis.
NumPy: Numerical Python. It is a library for mathematical computation, mainly for working with multidimensional arrays.
Matplotlib.pyplot: A plotting and visualization library. It gives the user flexibility and full control over plotting graphs.
Apart from these, we import some more libraries:
Seaborn: A statistical visualization library built on top of Matplotlib.
Datetime: For working with dates and times.
Timedelta: For representing a duration of time.

We also import a few things from the Scikit-learn library. Scikit-learn is a vast library that is very widely used in the field of machine learning. It includes many methods needed for data preprocessing as well as for fitting algorithms to data. Since we use two algorithms, we import SVR and LinearRegression. GridSearchCV is used for searching over model parameters. Ridge and Lasso are regularized variants of linear regression.
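
As a rough preview of how these pieces fit together, here is a minimal sketch on toy data (a perfect straight line, not the Covid-19 dataset). The parameter grid is an arbitrary assumption chosen to keep the example small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Toy data: y = 2x + 1 exactly (hypothetical, not the tutorial's dataset).
X = np.arange(12, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Linear regression recovers the slope and intercept of the line.
lin = LinearRegression().fit(X, y)

# Grid search tries every parameter combination with 3-fold cross-validation
# and keeps the combination that scores best.
param_grid = {"C": [1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVR(), param_grid, cv=3)
search.fit(X, y)

print(lin.coef_[0], lin.intercept_)
print(search.best_params_)
```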


    import pandas as pd
    import matplotlib.pyplot as plt                           
    import seaborn as sns                                     
    import numpy as np                                        
    import datetime as dt                                     
    from datetime import timedelta                            
    from sklearn.model_selection import GridSearchCV
    from sklearn.linear_model import LinearRegression,Ridge,Lasso
    from sklearn.svm import SVR

Once we are done importing the libraries, we go ahead and import the dataset using the command below. A dataset is a systematic collection of data in rows and columns. The dataset I am using here contains Covid-19 cases from all around the world. Datasets can come in any form, but we use the CSV (comma-separated values) format in this tutorial.

    covid = pd.read_csv('covid_19_data.csv')
    covid.head()  # returns the first 5 rows

The .head() function displays the first 5 rows of the dataset.
Machine learning is all about math. It works on statistics. Hence it is important to make sure that the correct information has been provided. No dataset comes in perfectly clean, just as cereals or rice do not; we need to preprocess the data the way we pick out the stones from cereals or rice. This is called “data preprocessing”. Data preprocessing consists of steps like taking care of missing data, feature scaling, label encoding and many more. It makes sure that the data is ready to be fitted to a machine learning algorithm like Linear Regression.

    print("Size/Shape of the dataset", covid.shape)
    print("Checking for null values \n", covid.isnull().sum())
    print("Checking Data type:", covid.dtypes)

This gives us the shape of the dataset, the number of null values in each column and the data type of each column. Null values are the values missing from the dataset. We need to either fill them with the mean of that particular column or eliminate the whole row containing the missing value.
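
The two options can be sketched on a small made-up DataFrame (the values are invented; only the column names echo the tutorial's dataset):

```python
import pandas as pd

# Hypothetical toy frame with one missing value in each column.
df = pd.DataFrame({"Confirmed": [10.0, None, 30.0], "Deaths": [1.0, 2.0, None]})

# Option 1: fill each missing value with the mean of its own column.
filled = df.fillna(df.mean(numeric_only=True))

# Option 2: drop every row that contains at least one missing value.
dropped = df.dropna()

print(filled)
print(dropped)
```

Filling keeps all rows but invents plausible values, while dropping keeps only observed values at the cost of losing rows; which is better depends on how much data is missing.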

    # axis must be passed by keyword; pandas 2.0 no longer accepts it positionally
    covid.drop(["SNo"], axis=1, inplace=True)
    covid.head()

The serial number column is not used in any of the calculations in our algorithm, hence we drop it from the dataset. The ‘inplace=True’ argument makes the change directly on the covid DataFrame instead of returning a modified copy.
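
The effect of inplace can be seen on a small throwaway DataFrame (made-up values):

```python
import pandas as pd

df = pd.DataFrame({"SNo": [1, 2], "Confirmed": [5, 8]})

# Without inplace: drop returns a new frame and leaves df unchanged.
trimmed = df.drop(columns=["SNo"])
print(list(df.columns), list(trimmed.columns))

# With inplace=True: df itself is modified and the call returns None.
result = df.drop(columns=["SNo"], inplace=True)
print(result, list(df.columns))
```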

    covid["ObservationDate"] = pd.to_datetime(covid["ObservationDate"])

Here we use the pandas library to convert the column into the datetime format. Pandas comes in handy when we work with datasets, especially during the preprocessing step.

    datewise = covid.groupby(["ObservationDate"]).agg({"Confirmed": 'sum', "Recovered": 'sum', "Deaths": 'sum'})

Here we group the rows by observation date and, for each date, sum the Confirmed, Recovered and Deaths columns. The result has one row per date with the worldwide totals for that date.
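
What the grouping does is easier to see on a tiny made-up table with two countries reporting on the first date and one on the second (the numbers and the Country column are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "ObservationDate": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23"]),
    "Country": ["X", "Y", "X"],
    "Confirmed": [1, 2, 4],
    "Recovered": [0, 1, 1],
    "Deaths": [0, 0, 1],
})

# One output row per date; each listed column is summed across that date's rows.
toy_datewise = toy.groupby("ObservationDate").agg(
    {"Confirmed": "sum", "Recovered": "sum", "Deaths": "sum"}
)
print(toy_datewise)
```

The two rows for the first date collapse into one, and the dates become the index of the result, which is why later code can use the index for plotting.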

    # iloc selects by integer position; -1 is the last row, i.e. the most recent date
    print("Basic information")
    print("Total number of confirmed cases around the world", datewise["Confirmed"].iloc[-1])
    print("Total number of Recovered cases around the world", datewise["Recovered"].iloc[-1])
    print("Total number of Death cases around the world", datewise["Deaths"].iloc[-1])
    print("Total number of active cases around the world", datewise["Confirmed"].iloc[-1] - datewise["Recovered"].iloc[-1] - datewise["Deaths"].iloc[-1])
    print("Total number of closed cases", datewise["Recovered"].iloc[-1] + datewise["Deaths"].iloc[-1])

Displaying the latest totals of the confirmed, recovered, death, active and closed cases.
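
The active and closed figures follow a simple rule: active cases are confirmed minus recovered minus deaths, and closed cases are recovered plus deaths. A minimal sketch with made-up numbers (the frame `toy_datewise` is hypothetical):

```python
import pandas as pd

# Hypothetical cumulative totals for two dates.
toy_datewise = pd.DataFrame(
    {"Confirmed": [3, 10], "Recovered": [1, 4], "Deaths": [0, 1]},
    index=pd.to_datetime(["2020-01-22", "2020-01-23"]),
)

latest = toy_datewise.iloc[-1]  # last row = most recent date
active = latest["Confirmed"] - latest["Recovered"] - latest["Deaths"]
closed = latest["Recovered"] + latest["Deaths"]

print(active, closed)
```

Active and closed cases together always add up to the confirmed cases, which is a quick sanity check for the numbers.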

plt.figure(figsize=(15,5))
sns.barplot(x=datewise.index.date,y=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"])
plt.title("Distribution plot for Active cases")
plt.xticks(rotation=90)

plt.figure(figsize=(15,5))
sns.barplot(x=datewise.index.date,y=datewise["Recovered"]+datewise["Deaths"])  #closed cases = recovered + deaths
plt.title("Distribution plot for Closed cases")
plt.xticks(rotation=90)

Plotting the active and closed cases as bar plots, using seaborn on top of matplotlib.

datewise["WeekofYear"]=datewise.index.isocalendar().week  #the weekofyear attribute was removed in pandas 2.0
week_num= []  #for next week projection
weekwise_confirmed =[]
weekwise_recovered = []
weekwise_deaths =[]
w = 1
for i in list(datewise["WeekofYear"].unique()):
    weekwise_confirmed.append(datewise[datewise["WeekofYear"]==i]["Confirmed"].iloc[-1])
    weekwise_recovered.append(datewise[datewise["WeekofYear"]==i]["Recovered"].iloc[-1])
    weekwise_deaths.append(datewise[datewise["WeekofYear"]==i]["Deaths"].iloc[-1])
    week_num.append(w)
    w=w+1
plt.figure(figsize=(8,5))
plt.plot(week_num,weekwise_confirmed,linewidth=3)
plt.plot(week_num,weekwise_recovered,linewidth=3)
plt.plot(week_num,weekwise_deaths,linewidth=3)
plt.xlabel("week number")
plt.ylabel("Number of cases")
plt.title("Weekly Progress of different types of cases")

The plot above shows the weekly progress of the different types of cases.
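
The loop above picks the last cumulative value of every week. The same idea can be sketched with a groupby on a hypothetical two-week series. Note that this sketch uses isocalendar().week to get the week number, which is an assumption for recent pandas versions.

```python
import pandas as pd

# Hypothetical cumulative counts for 14 days, starting on a Monday.
dates = pd.date_range("2020-01-20", periods=14, freq="D")
toy_datewise = pd.DataFrame({"Confirmed": range(1, 15)}, index=dates)

# ISO week number of every date.
toy_datewise["WeekofYear"] = toy_datewise.index.isocalendar().week.astype(int)

# Last cumulative value in each week, same as the .iloc[-1] in the loop.
weekly_confirmed = toy_datewise.groupby("WeekofYear")["Confirmed"].last()

print(weekly_confirmed)
```

Week numbers repeat every year, so if the data covers more than one year, the same week number from different years would be merged into one group.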

plt.figure(figsize = (15,6))
plt.plot(datewise["Confirmed"].diff().fillna(0),label="Daily increase in confirmed cases",linewidth = 3)
plt.plot(datewise["Recovered"].diff().fillna(0),label="Daily increase in Recovered cases",linewidth = 3) #diff() takes the difference between consecutive rows; fillna(0) fills the empty first value
plt.plot(datewise["Deaths"].diff().fillna(0),label="Daily increase in Death cases",linewidth = 3)
plt.xlabel("Timestamp")
plt.ylabel("Daily Increment")
plt.title("Daily increase")
plt.xticks(rotation=90)
plt.legend()
print("Average increase in the number of Confirmed cases every day",np.round(datewise["Confirmed"].diff().fillna(0).mean()))
print("Average increase in the number of Recovered cases every day",np.round(datewise["Recovered"].diff().fillna(0).mean()))
print("Average increase in the number of Death cases every day",np.round(datewise["Deaths"].diff().fillna(0).mean()))

Here diff() gives the daily increase in each type of case, and fillna(0) replaces the empty first value with zero. We then print the average daily increase for each type.
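
What diff() and fillna(0) do in the code above is easy to check on a tiny cumulative series. The numbers below are made up for illustration.

```python
import pandas as pd

# Hypothetical cumulative case counts for four days.
cumulative = pd.Series([1, 3, 6, 10])

# diff() subtracts the previous row from each row; the first row has no
# previous value, so it becomes NaN and fillna(0) turns it into 0.
daily_increase = cumulative.diff().fillna(0)
average_increase = daily_increase.mean()

print(daily_increase.tolist())
print(average_increase)
```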

countrywise=covid[covid["ObservationDate"]==covid["ObservationDate"].max()].groupby(["Country/Region"]).agg({"Confirmed":"sum","Recovered":"sum","Deaths":"sum"}).sort_values(["Confirmed"],ascending =False)
countrywise["Mortality"]=(countrywise["Deaths"]/countrywise["Confirmed"])*100
countrywise["Recovery"]=(countrywise["Recovered"]/countrywise["Confirmed"])*100

fig,(ax1,ax2) = plt.subplots(1,2,figsize = (25,10))
top_15confirmed = countrywise.sort_values(["Confirmed"],ascending=False).head(15)
top_15deaths = countrywise.sort_values(["Deaths"],ascending=False).head(15)
sns.barplot(x=top_15confirmed["Confirmed"],y=top_15confirmed.index,ax=ax1)
ax1.set_title("Top15 countries as per number of confirmed cases")
sns.barplot(x = top_15deaths["Deaths"],y=top_15deaths.index,ax=ax2)
ax2.set_title("Top15 countries as per number of death cases")

We calculate the country wise mortality and recovery rates as shown above, and plot the top 15 countries by confirmed cases and by deaths.
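
The two rates are plain percentages of the confirmed cases. A small sketch with two hypothetical countries, "A" and "B":

```python
import pandas as pd

# Hypothetical latest totals per country.
toy_countrywise = pd.DataFrame(
    {"Confirmed": [200, 50], "Recovered": [100, 40], "Deaths": [10, 1]},
    index=["A", "B"],
)

# Share of confirmed cases that ended in death or in recovery, in percent.
toy_countrywise["Mortality"] = toy_countrywise["Deaths"] / toy_countrywise["Confirmed"] * 100
toy_countrywise["Recovery"] = toy_countrywise["Recovered"] / toy_countrywise["Confirmed"] * 100

print(toy_countrywise)
```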

india_data = covid[covid["Country/Region"]=="India"]
datewise_india = india_data.groupby(["ObservationDate"]).agg({"Confirmed":"sum","Recovered":"sum","Deaths":"sum"})
print(datewise_india.iloc[-1])
print("Total Active Cases:",datewise_india["Confirmed"].iloc[-1]-datewise_india["Recovered"].iloc[-1]-datewise_india["Deaths"].iloc[-1])
print("Total Closed cases",datewise_india["Recovered"].iloc[-1]+datewise_india["Deaths"].iloc[-1])

datewise_india["WeekofYear"] = datewise_india.index.isocalendar().week  #the weekofyear attribute was removed in pandas 2.0
week_num_india = []
india_weekwise_confirmed = []
india_weekwise_recovered = []
india_weekwise_deaths = []
w = 1
for i in list(datewise_india["WeekofYear"].unique()):
    india_weekwise_confirmed.append(datewise_india[datewise_india["WeekofYear"]==i]["Confirmed"].iloc[-1])
    india_weekwise_recovered.append(datewise_india[datewise_india["WeekofYear"]==i]["Recovered"].iloc[-1])
    india_weekwise_deaths.append(datewise_india[datewise_india["WeekofYear"]==i]["Deaths"].iloc[-1])
    week_num_india.append(w)
    w = w+1

This code section does the data analysis for India.

plt.figure(figsize = (8,5))
plt.plot(week_num_india,india_weekwise_confirmed,linewidth=3)
plt.plot(week_num_india,india_weekwise_recovered,linewidth = 3)
plt.plot(week_num_india,india_weekwise_deaths,linewidth = 3)
plt.xlabel("Week number")
plt.ylabel("Number of cases")
plt.title("Weekly Progress of different types of cases")

This code section plots and visualizes the weekly progress of the different types of cases in India.

max_ind = datewise_india["Confirmed"].max()
china_data = covid[covid["Country/Region"]=="Mainland China"]
Italy_data = covid[covid["Country/Region"]=="Italy"]
US_data = covid[covid["Country/Region"]=="US"]
spain_data = covid[covid["Country/Region"]=="Spain"]
datewise_china = china_data.groupby(["ObservationDate"]).agg({"Confirmed":"sum","Recovered":"sum","Deaths":"sum"})
datewise_Italy = Italy_data.groupby(["ObservationDate"]).agg({"Confirmed":"sum","Recovered":"sum","Deaths":"sum"})
datewise_US = US_data.groupby(["ObservationDate"]).agg({"Confirmed":"sum","Recovered":"sum","Deaths":"sum"})
datewise_Spain = spain_data.groupby(["ObservationDate"]).agg({"Confirmed":"sum","Recovered":"sum","Deaths":"sum"})
print ("It took",datewise_india[datewise_india["Confirmed"]>0].shape[0],"days in India to reach",max_ind,"Confirmed Cases")
print ("It took",datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)].shape[0],"days in Italy to reach India's number of Confirmed cases")
print ("It took",datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)].shape[0],"days in US to reach India's number of Confirmed cases")
print("It took",datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)].shape[0],"days in Spain to reach India's number of Confirmed cases")
print ("It took",datewise_china[(datewise_china["Confirmed"]>0)&(datewise_china["Confirmed"]<=max_ind)].shape[0],"days in China to reach India's number of Confirmed cases")

datewise["Days Since"] = datewise.index-datewise.index[0]
datewise["Days Since"] = datewise["Days Since"].dt.days
train_ml = datewise.iloc[:int(datewise.shape[0]*0.90)]
valid_ml = datewise.iloc[int(datewise.shape[0]*0.90):]
model_scores = []

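The first part of the code above compares the number of days each country took to reach India's current number of confirmed cases. The last lines add a Days Since column and split the data for the models: the first 90% of the dates are used for training and the last 10% for validation.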
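
The day count in the comparison is simply the number of dates on which a country had at least one case but had not yet crossed the target number. A sketch with a hypothetical series and a made-up target of 100:

```python
import pandas as pd

# Hypothetical cumulative confirmed cases of one country, one value per day.
toy_country = pd.Series(
    [0, 0, 5, 20, 80, 300],
    index=pd.date_range("2020-01-22", periods=6, freq="D"),
)
target = 100  # stands in for max_ind, India's latest confirmed count

# Days with at least one case that are still at or below the target.
days_to_reach = ((toy_country > 0) & (toy_country <= target)).sum()

print(days_to_reach)
```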

lin_reg = LinearRegression()  #the normalize argument was removed in scikit-learn 1.2

svm = SVR(C=1,degree = 6,kernel= 'poly',epsilon=0.01,gamma ='scale')

lin_reg.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).reshape(-1,1))
svm.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]))  #SVR expects a 1-D target

Here we fit our processed data to the algorithms, that is, our models are being trained here. Training is a very necessary step, as the whole prediction score depends on how well the data is preprocessed and the model is trained.
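
To make the fitting step concrete, here is a minimal sketch that trains a linear regression on a made-up, perfectly linear series with the same 90/10 split. The arrays `days` and `cases` are hypothetical, and real case counts will not be this regular.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 10 days, cases grow by exactly 3 per day.
days = np.arange(10).reshape(-1, 1)  # features must be 2-D: (samples, features)
cases = 3 * np.arange(10) + 2

# First 90% of the days for training, the rest for validation.
split = int(len(days) * 0.90)
model = LinearRegression().fit(days[:split], cases[:split])

# Predict the held-out day.
pred = model.predict(days[split:])

print(model.coef_, model.intercept_, pred)
```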

prediction_valid_lin_reg = lin_reg.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))
prediction_valid_svm = svm.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

Now the models undergo testing. Here they actually start predicting, on the validation dates that they did not see during training.
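
One common way to judge these validation predictions is the root mean squared error (RMSE). The helper `rmse` below is a hypothetical function written for this sketch, and the numbers are made up. In our code, the actual values would be valid_ml["Confirmed"] and the result could be appended to model_scores.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two sequences of equal length."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical actual and predicted values for three validation days.
score = rmse([100, 110, 120], [90, 110, 130])

print(score)
```

A lower RMSE means the predictions are closer to the actual figures, so the model with the lowest score on the validation dates is the better fit.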

new_date = []
new_prediction_lr = []
new_prediction_svm = []
for i in range(1,18):
    new_date.append(datewise.index[-1]+timedelta(days=i))
    new_prediction_lr.append(lin_reg.predict(np.array(datewise["Days Since"].max()+i).reshape(-1,1))[0][0])
    new_prediction_svm.append(svm.predict(np.array(datewise["Days Since"].max()+i).reshape(-1,1))[0])
pd.set_option("display.float_format",lambda x:'%.f' %x)
model_predictions = pd.DataFrame(list(zip(new_date,new_prediction_lr,new_prediction_svm)),columns = ["Dates","LINEAR REGRSN","SVM PREDICTION"])
model_predictions.head(10)

Here we go! The model is now ready to predict the values. The loop forecasts the next 17 days, and .head(10) displays the first 10 of them. We can see more by increasing the number in the .head() function.
Note that the SVR gives the better prediction in this case, as its predicted values are much closer to the actual figures. The model learns from the existing dataset and predicts these values. The best algorithm should be chosen according to the need.