DEV Community: Chebon

The Complete Guide to Time Series Models

Chebon — Thu, 19 Oct 2023 12:22:12 +0000

What is a time series model

A time series is a set of data points ordered in time. Time is the independent variable, and the goal is usually to forecast future values.

Time series models are statistical and mathematical tools used to analyze, process and make forecasts from time series data.

Characteristics of Time Series Data

Stationarity: A time series is said to be stationary if its statistical properties do not change over time, i.e., it has a constant mean and variance, and its covariance depends only on the lag between observations, not on time itself.

From the above image, we see that the process is stationary.

Seasonality: It refers to periodic fluctuations. For instance, power consumption is normally high during the day and low at night.

Trend: The long-term movements or direction of the data.
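
As a rough illustration of the stationarity idea above, the sketch below compares the mean and variance of the first and second halves of a series. The helper name `split_half_summary` and both sample series are made up for this example. It is only a crude diagnostic; a formal check would use a statistical test such as the Augmented Dickey-Fuller test.

```python
from statistics import mean, pvariance


def split_half_summary(series):
    """Return the mean and variance of each half of the series (hypothetical helper)."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return {
        "means": (mean(first), mean(second)),
        "variances": (pvariance(first), pvariance(second)),
    }


# Made-up data: one series oscillates around zero, the other trends upward.
oscillating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
trending = [0.5 * t for t in range(100)]

stable = split_half_summary(oscillating)
drifting = split_half_summary(trending)

print("oscillating:", stable)
print("trending:   ", drifting)
```

For the oscillating series both halves have the same mean and variance. For the trending series the variance matches but the mean shifts between halves, which is enough to rule out stationarity.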

Components of Time Series Data

Seasonal Component: Variations that repeat at a fixed interval, such as daily, monthly, or yearly.
Random Noise Component: The unpredictable, irregular variation in the data that cannot be attributed to any specific pattern.
Trend Component: The long-term movement in the data, which can be either upward or downward.
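
To make the three components concrete, here is a minimal sketch of a classical additive decomposition. It assumes the series is the sum of trend, seasonal, and noise parts, and that the seasonal period is odd (for example 7 for weekly data) so the centered window stays simple. The function names and the sample series are made up for illustration.

```python
def moving_average_trend(series, period):
    """Centered moving average; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(window) / period
    return trend


def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual parts.

    Assumes an odd period and a series that spans several periods.
    """
    trend = moving_average_trend(series, period)

    # Average the detrended values at each position within the season.
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    pattern = [sum(bucket) / len(bucket) for bucket in buckets]

    # Center the pattern so the seasonal effects sum to zero.
    offset = sum(pattern) / period
    pattern = [p - offset for p in pattern]

    seasonal = [pattern[i % period] for i in range(len(series))]
    residual = [
        None if level is None else value - level - season
        for value, level, season in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Made-up series: an upward trend plus a repeating pattern of period 3.
series = [0.5 * t + [1.0, 0.0, -1.0][t % 3] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=3)

print("trend:   ", trend)
print("seasonal:", seasonal)
print("residual:", residual)
```

Because the sample series has no noise, the residual here is essentially zero. On real data the residual is where the random noise component ends up.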

Types of Time Series Models

Prophet: A forecasting model developed by Facebook that is especially designed for time series data with seasonality and trend.
Exponential Smoothing: A family of forecasting models that project future values using a weighted average of historical data, with weights that decay exponentially as observations get older.
Autoregressive Integrated Moving Average (ARIMA): A linear model that combines autoregression, differencing, and moving averages to create a flexible and robust forecasting model.
Seasonal Autoregressive Integrated Moving Average (SARIMA): An extension of the ARIMA model that takes seasonality in the data into consideration.
Vector Autoregression (VAR): A model suited to multivariate time series analysis. It describes the relationships between multiple time series variables.
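
As an example of the simplest model in the list, here is a minimal sketch of simple exponential smoothing (no trend or seasonal terms). The function name, the smoothing factor `alpha = 0.5`, and the sample values are assumptions chosen for illustration, not recommended settings.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted average of the latest observation
    and the previous level, so older data gets exponentially less weight.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = series[0]
    levels = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


# Made-up observations.
observations = [10.0, 12.0, 14.0]
levels = simple_exponential_smoothing(observations, alpha=0.5)

# With this model, the forecast for the next step is the last level.
forecast = levels[-1]
print(levels, forecast)
```

A larger `alpha` reacts faster to recent changes, while a smaller one produces a smoother, slower-moving forecast.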

How to Build a Time Series Model

Prepare the data: This initial step includes loading the data, cleaning it, handling outliers, and transforming the data to make it stationary.
Identify the model parameters: This involves using statistical methods to estimate the parameters of the chosen time series model.
Evaluate the model: This involves checking the performance of the model on data it was not fitted on, such as the most recent observations held out from training.
Use the model to make predictions: After the model has been evaluated and is fit for use, it can be used to predict future values of the data.
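
The four steps above can be sketched end to end in plain Python. This is a toy version under stated assumptions: the data is synthetic and noise-free, differencing is the only transformation, and the model is a first-order autoregression fitted by least squares rather than a full ARIMA. All function names are made up for this example.

```python
def difference(series):
    """First-order differencing, a common way to remove a trend."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x, y = series[:-1], series[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# 1. Prepare the data (synthetic: differences follow d[t] = 1 + 0.5 * d[t-1]).
diffs = [0.0]
for _ in range(19):
    diffs.append(1 + 0.5 * diffs[-1])
series = [100.0]
for d in diffs:
    series.append(series[-1] + d)

# Split chronologically: never shuffle time series data.
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]

# 2. Identify the model parameters on the differenced training data.
c, phi = fit_ar1(difference(train))

# 3. Evaluate with one-step-ahead forecasts on the held-out data.
history = list(train)
predictions = []
for actual in test:
    last_diff = history[-1] - history[-2]
    predictions.append(history[-1] + c + phi * last_diff)
    history.append(actual)  # reveal the true value only after predicting
mae = mean_absolute_error(test, predictions)

# 4. Use the model to predict the next, unseen value.
next_value = series[-1] + c + phi * (series[-1] - series[-2])
print(f"c={c:.3f} phi={phi:.3f} MAE={mae:.6f} next={next_value:.3f}")
```

The error is close to zero only because the synthetic data contains no noise. On real data, the same loop gives an honest estimate of how far off one-step forecasts tend to be.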

Conclusion

For the analysis and forecasting of time-based data, time series modeling is an effective technique. To improve results and make better decisions, time series models can be used in a variety of fields and applications.

EDA using Data Visualization techniques

Chebon — Sat, 07 Oct 2023 00:19:11 +0000

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is a process of describing the data by means of statistical and visualization techniques in order to bring important aspects of that data into focus for further analysis. This involves inspecting the dataset from many angles, and describing and summarizing it without making any assumptions about its contents.
Exploratory data analysis is a significant step to take before diving into statistical modeling or machine learning, to ensure the data is really what it is claimed to be and that there are no obvious errors. It should be part of data science projects in every organization.

Why is it important to perform EDA?

EDA helps you gather insights and make better sense of the data, and it exposes irregularities and unnecessary values so they can be removed. It helps you prepare your dataset for analysis, gives a machine learning model cleaner data to learn from, and leads to more accurate results.

What is Data Visualization?

It is the process of presenting insights through plots, charts, and graphs to communicate findings effectively.

What are some of the visual techniques?

1. Distribution plots: Also known as PDF (probability density function) plots, these are used to analyze one variable at a time. Each feature is plotted as a variable on the X-axis, and the values on the Y-axis represent the normalized density. For instance, suppose our aim is to determine a patient's survival status from a single feature such as the patient's age. Plotting the distribution of age for each survival status is an example of univariate analysis.
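
To show what "normalized density" on the Y-axis means, here is a small sketch that computes histogram densities by hand. The function name and the ages are made up for illustration. In practice, a call such as `plt.hist(ages, bins=4, density=True)` draws the equivalent bars.

```python
def histogram_density(values, bins):
    """Return bin edges and densities whose total area is 1."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("values must not all be identical")
    width = (highest - lowest) / bins

    counts = [0] * bins
    for value in values:
        # The maximum value falls into the last bin.
        index = min(int((value - lowest) / width), bins - 1)
        counts[index] += 1

    edges = [lowest + i * width for i in range(bins + 1)]
    density = [count / (len(values) * width) for count in counts]
    return edges, density


# Made-up patient ages.
ages = [30, 34, 38, 41, 43, 45, 47, 50, 52, 54, 57, 60, 63, 66, 70]
edges, density = histogram_density(ages, bins=4)

for left, right, height in zip(edges, edges[1:], density):
    print(f"{left:.0f}-{right:.0f}: {height:.4f}")
```

Each density is the share of observations in the bin divided by the bin width, so the bar areas add up to 1 regardless of how many patients are in the dataset.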

2. Box plots and violin plots: These also fall under univariate analysis. A box plot, also known as a box-and-whisker plot, summarizes the data in five numbers: minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum. A violin plot displays the same information and additionally shows a smoothed density plot of the underlying distribution.

The isolated points seen in the box plot of positive axillary nodes are the outliers in the data. Such a high number of outliers is not unusual in medical datasets. Also, the patient age and the operation year plots show similar statistics.

Violin plots in general are more informative as compared to the box plots as violin plots also represent the underlying distribution of the data in addition to the statistical summary. In the violin plot of positive axillary nodes, it is observed that the distribution is highly skewed for class label = ‘yes’, while it is moderately skewed for ‘no’.
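
The numbers behind a box plot can be computed directly. The sketch below derives the five-number summary and flags outliers with the 1.5 x IQR rule, which is the default whisker convention in matplotlib. The function names and node counts are made up for this example.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def iqr_outliers(values, whisker=1.5):
    """Return values lying more than `whisker` IQRs outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - whisker * spread, q3 + whisker * spread
    return [v for v in values if v < low or v > high]


# Made-up counts of positive axillary nodes.
nodes = [1, 2, 3, 4, 5, 6, 7, 8, 30]

summary = five_number_summary(nodes)
outliers = iqr_outliers(nodes)
print("summary: ", summary)
print("outliers:", outliers)
```

Points returned by `iqr_outliers` are the ones a box plot would draw as isolated dots beyond the whiskers.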

3. Heatmaps: These are used to observe the correlations among the feature variables. This is particularly important when we are trying to estimate feature importance in regression analysis. Although correlated features do not necessarily hurt the predictive performance of a statistical model, they can make the post-modeling analysis, such as interpreting individual coefficients, unreliable.

The values in the cells are Pearson’s R values, which indicate the correlation among the feature variables. As we can see, these values are nearly 0 for every pair, so there is no notable linear correlation between any pair of variables.
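
For reference, this is roughly how the cell values of such a heatmap are computed. The sketch implements Pearson's R by hand and builds a small correlation matrix. The column names and values are made up and are not taken from the dataset discussed above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up feature columns.
columns = {
    "a": [1, 2, 3, 4, 5],
    "double_a": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
}

# One row per feature, one cell per pair of features.
matrix = {
    row: {col: pearson_r(columns[row], columns[col]) for col in columns}
    for row in columns
}

for name, row in matrix.items():
    print(name, {col: round(value, 2) for col, value in row.items()})
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or no linear relationship. A heatmap simply colors each cell of this matrix.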

4. Contour plot: A graphical technique for representing a 3-dimensional surface by plotting constant z slices, called contours, in a 2-dimensional format. This lets us visualize three-dimensional data in a two-dimensional plot. Here is a diagrammatic representation of how the information from the 3rd dimension can be consolidated into a flat 2-D chart.

5. Scatter plot: A diagram where each value in the data set is represented by a dot.

import matplotlib.pyplot as plt

# Sample data: each (x, y) pair becomes one dot
x = [5, 7, 8, 7, 2, 17, 2, 9,
     4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 100, 86,
     103, 87, 94, 78, 77, 85, 86]

plt.scatter(x, y, c="blue")

# Show the plot
plt.show()

Data Science for Beginners: 2023–2024 Complete Roadmap

Chebon — Fri, 29 Sep 2023 21:57:37 +0000

In the field of computer science, data science has been popular for many years. Recently, many universities have started offering data science programmes.

The need for data science and big data specialists is increasing dramatically as services and applications generate more and more data. The field of data science has emerged as a fantastic career choice for software developers and data nerds.

What is data science?

Data science has an intersection with artificial intelligence but is not a subset of artificial intelligence.

Data science starts with a question that sparks curiosity in a given field. It involves extracting data related to that question from a large source, then processing, analyzing, and visualizing the data to make meaning out of it for IT and business strategies.

In simple terms, it is understanding and making sense of data. A lot of tools are used in data science. They include statistical tools, probabilistic tools, linear and matrix algebra, numerical optimization, and programming.

Career paths in data science

Data science is one of the more lucrative and in-demand professions for qualified experts. Although a job in data science is fulfilling and lucrative, getting started is not that easy. A master’s or bachelor’s degree is not strictly necessary to work in data science; what one needs is the proper skill set and expertise. Below are examples of career choices in the field of data science:

Data Analyst
Data Scientist
Data Engineer
Machine Learning Engineer
NLP Engineer
Business Analyst
Power BI Engineer

Steps involved in developing a suitable machine learning model

Data Collection: Find a suitable dataset to work on. You can import it from a CSV, Excel, or JSON file.
Data Preparation: Put together all the data you have and randomize it. Clean the data by removing unwanted rows and columns, handling missing values, dropping duplicate values, and converting data types. Visualize the data to understand how it is structured and how the variables relate to one another. Finally, split the cleaned data into two sets: a training set and a testing set.
Choose a Suitable Model: The choice depends on the type of data you are working with and on whether you are solving a regression or a classification problem. Options include linear or logistic regression, SVM, XGBoost, LSTM, BiLSTM, Random Forest, and so on.
Training the Model: You pass the prepared data to your machine learning model so that it can find patterns and make predictions.
Model Evaluation: After training your model, you have to check how it is performing. This is done by testing the performance of the model on previously unseen data.
Parameter Tuning: Adjust the model's hyperparameters to see how best to improve its accuracy.
Make Predictions: In the end, you can use your model on unseen data to make predictions.
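
The steps above can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: the data is made up and well separated, the model is a hand-written k-nearest-neighbors classifier (not one of the models named above), and an extra validation split is used so that tuning never touches the test set. All names are hypothetical.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train, point, k):
    """Label `point` by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda example: dist(example[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, examples, k):
    """Share of examples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, features, k) == label
               for features, label in examples)
    return hits / len(examples)


# Data collection: made-up 2-D points stand in for a real CSV file.
data = [((1 + 0.1 * i, 1 + 0.05 * i), "low") for i in range(30)]
data += [((8 + 0.1 * i, 8 + 0.05 * i), "high") for i in range(30)]

# Data preparation: randomize, then split into train / validation / test.
random.Random(42).shuffle(data)
train, validation, test = data[:36], data[36:48], data[48:]

# Model choice and training: k-NN simply stores the training examples.
# Parameter tuning: pick the k that scores best on the validation set.
candidate_ks = [1, 3, 5]
best_k = max(candidate_ks, key=lambda k: accuracy(train, validation, k))

# Model evaluation: score once on data the model has never seen.
test_accuracy = accuracy(train, test, best_k)

# Make predictions on a new, unlabeled point.
prediction = knn_predict(train, (2.0, 1.5), best_k)
print(best_k, test_accuracy, prediction)
```

The perfect score here reflects how cleanly the toy clusters separate, not the quality of the method. With real data, the same structure applies but the numbers will be far less tidy.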