Data preprocessing is the process of converting raw data into a format that a machine learning algorithm can work with, and it is the first step in any machine learning pipeline. Data collection is usually loosely controlled and may result in out-of-range values. Data preparation and filtering steps can take a considerable amount of processing time.
Data preprocessing includes:
- Reading data from files.
- Data cleaning.
- Instance selection.
- Data standardization.
- Data transformation.
- Feature extraction and selection.
The product of data preprocessing is the final training set. In this article, I will walk through some of these preprocessing steps in C++, along with data visualization using the Matplotlib-Cpp library.
This article is part of a series on implementing machine learning basics in C++.
I will use the iris dataset as the example data for each operation. Note that I will be using C++11 in this tutorial.
Reading Data from Files:
After downloading the iris.data file from here, let's read the data with simple file-reading instructions and parse each type of data into a separate vector.
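As a rough illustration of this step, here is a minimal C++11 sketch. It assumes each row of iris.data holds four comma-separated numeric values followed by a class label (for example `5.1,3.5,1.4,0.2,Iris-setosa`). The names `IrisData`, `parse_iris`, and `read_iris_file` are hypothetical, chosen only for this example, and skipping blank or malformed rows is one possible policy rather than the only reasonable one.

```cpp
#include <exception>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: one vector per column of iris.data.
struct IrisData {
    std::vector<float> sepal_length;
    std::vector<float> sepal_width;
    std::vector<float> petal_length;
    std::vector<float> petal_width;
    std::vector<std::string> species;
};

// Parse comma-separated rows from any input stream.
// Blank lines and rows that do not have exactly five fields,
// or whose first four fields are not numeric, are skipped.
IrisData parse_iris(std::istream& in) {
    IrisData data;
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing carriage return left by Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;

        // Split the row on commas.
        std::istringstream row(line);
        std::string field;
        std::vector<std::string> fields;
        while (std::getline(row, field, ',')) fields.push_back(field);
        if (fields.size() != 5) continue;

        try {
            // Convert all four values first so that a bad row
            // never leaves the vectors with unequal lengths.
            const float sl = std::stof(fields[0]);
            const float sw = std::stof(fields[1]);
            const float pl = std::stof(fields[2]);
            const float pw = std::stof(fields[3]);
            data.sepal_length.push_back(sl);
            data.sepal_width.push_back(sw);
            data.petal_length.push_back(pl);
            data.petal_width.push_back(pw);
            data.species.push_back(fields[4]);
        } catch (const std::exception&) {
            continue;  // non-numeric or out-of-range value: skip the row
        }
    }
    return data;
}

// Open the file at `path` and parse it; throws if it cannot be opened.
IrisData read_iris_file(const std::string& path) {
    std::ifstream file(path);
    if (!file.is_open()) {
        throw std::runtime_error("cannot open " + path);
    }
    return parse_iris(file);
}
```

Calling `read_iris_file("iris.data")` from `main` would then give one vector per column, all of the same length. Keeping the parsing in a function that takes a `std::istream&` means the same code can be exercised with a `std::istringstream` when no file is at hand.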