Feature Engineering:
Feature engineering is the process of selecting, transforming, and creating features (also known as variables or inputs) from raw data that are used to train a machine learning model. It involves identifying relevant features from the raw data and then transforming them to make them suitable for use in the model. The goal of feature engineering is to identify and transform features that capture relevant patterns and relationships in the data.
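To make the three activities (selecting, transforming, and creating features) concrete, here is a minimal sketch in plain Python. The records and field names (`diameter_cm`, `weight_g`, `has_cheese`) are made up for illustration and are not taken from any real dataset.

```python
import math

# Hypothetical raw records; the field names are illustrative assumptions.
raw = [
    {"id": 1, "diameter_cm": 30.0, "weight_g": 600.0, "has_cheese": "yes"},
    {"id": 2, "diameter_cm": 20.0, "weight_g": 300.0, "has_cheese": "no"},
    {"id": 3, "diameter_cm": 25.0, "weight_g": 450.0, "has_cheese": "yes"},
]


def engineer(records):
    """Turn raw records into model-ready feature dictionaries."""
    weights = [r["weight_g"] for r in records]
    w_min, w_max = min(weights), max(weights)
    span = (w_max - w_min) or 1.0  # avoid dividing by zero on constant data
    features = []
    for r in records:
        area = math.pi * (r["diameter_cm"] / 2) ** 2
        features.append({
            # Selecting: "id" is left out because it carries no signal.
            # Transforming: encode yes/no as 1/0 and rescale weight to [0, 1].
            "has_cheese": 1 if r["has_cheese"] == "yes" else 0,
            "weight_scaled": (r["weight_g"] - w_min) / span,
            # Creating: a new feature derived from two raw fields.
            "weight_per_cm2": r["weight_g"] / area,
        })
    return features


features = engineer(raw)
```

Which transformations are actually useful depends on the data and the model, so treat this only as a pattern.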
Scenario:
Suppose you are asked to build a machine learning model that takes an image as input and predicts whether or not it shows a pizza. To play around with the concepts yourself, use the SliceofML link in the reference section.
The interactive webpage walks you through the tasks below. To show visually why feature engineering matters, I explore two models (Step 3 and Step 4), repeating the process with different features.
Setting the Accuracy Goal for the Model:
Splitting the Dataset for Training and Testing:
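The interactive page handles the split for you. If you wanted to do it by hand, a minimal sketch could look like the one below. The function name `train_test_split`, the 80/20 ratio, the seed, and the file names are my own assumptions, not something SliceofML prescribes.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


images = [f"img_{i:03d}.jpg" for i in range(10)]  # hypothetical file names
train, test = train_test_split(images, test_fraction=0.2)
```

The important property is that the two sets do not overlap, so the test set stays unseen during training.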
Model 1: The selected features are Cooking and Carbs
The model evaluation is shown below
Model 2: The selected features are Sauce and Cheese availability
The model evaluation is shown below
Although both models use the same dataset for training and testing, their evaluation metrics are completely different because of the features selected while building each model.
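The same effect can be reproduced on a tiny hand-made dataset. This is only a sketch: the rows, the binary column names (`cooked`, `high_carbs`, `has_sauce`, `has_cheese`), and the majority-vote lookup "model" are all assumptions for illustration and do not reproduce the SliceofML data or its results.

```python
from collections import Counter, defaultdict

# Hypothetical columns; the last one is the label.
COLUMNS = ("cooked", "high_carbs", "has_sauce", "has_cheese", "is_pizza")

# Made-up rows: pizzas, other cooked carb-heavy foods, and raw foods.
TRAIN = [
    (1, 1, 1, 1, 1), (1, 1, 1, 1, 1), (1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0), (1, 1, 0, 0, 0), (1, 1, 0, 0, 0), (1, 1, 0, 0, 0),
    (0, 0, 0, 0, 0), (0, 0, 0, 0, 0),
]
TEST = [
    (1, 1, 1, 1, 1), (1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0), (1, 1, 0, 0, 0),
    (0, 0, 0, 0, 0),
]


def fit(rows, feature_names):
    """Learn the majority label for each combination of the chosen features."""
    idx = [COLUMNS.index(name) for name in feature_names]
    votes = defaultdict(Counter)
    for row in rows:
        votes[tuple(row[i] for i in idx)][row[-1]] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    # Fallback for feature combinations never seen during training.
    default = Counter(row[-1] for row in rows).most_common(1)[0][0]
    return idx, table, default


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    idx, table, default = model
    hits = sum(
        table.get(tuple(row[i] for i in idx), default) == row[-1] for row in rows
    )
    return hits / len(rows)


model_1 = fit(TRAIN, ("cooked", "high_carbs"))
model_2 = fit(TRAIN, ("has_sauce", "has_cheese"))
acc_1 = accuracy(model_1, TEST)
acc_2 = accuracy(model_2, TEST)
print(f"Model 1 accuracy: {acc_1:.2f}")
print(f"Model 2 accuracy: {acc_2:.2f}")
```

On this toy data Model 1 scores 0.60 and Model 2 scores 1.00. Being cooked and carb-heavy is also true of bread and rice, so those features cannot separate pizza from other foods, while sauce and cheese can.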
Personas Involved
Sl No | Persona | Activity |
---|---|---|
1 | Data Engineer | Identify the data objects and data source; Design Feature engineering; Data privacy and data masking; Data profiling and cleansing; Split Data for Training and Test |
2 | Data Analyst | Data Insight; Detect outliers and anomalies; Validate the hypothesis of the business case |
3 | Data Scientist / ML Engineer | Decision on packages / library need; Validate the architecture and identify any new Platform / capability needed; Cross validation data split; Evaluate performance on the rest of your training data |
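
The "Cross validation data split" activity in the table can be sketched as a k-fold index generator. The helper name `k_fold_indices` is hypothetical, and this version does not shuffle, so in practice you would shuffle the data first or use a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(f"train on {len(train_idx)} samples, validate on {len(val_idx)}")
```

Each sample lands in the validation fold exactly once, so every part of the training data is used to evaluate the model at some point.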
Watch out!
Overfitting: Overfitting occurs when the model is too complex and performs well on the training data, but poorly on new data. Incorrect feature engineering can lead to overfitting by including too many irrelevant or redundant features, or by transforming features in a way that overemphasizes noise in the data.
Underfitting: Underfitting occurs when the model is too simple and is unable to capture the underlying patterns and relationships in the data. Incorrect feature engineering can lead to underfitting by failing to include relevant features or by transforming features in a way that removes important information.
Bias: Bias occurs when the model is systematically wrong and consistently makes incorrect predictions. Incorrect feature engineering can introduce bias by selecting or transforming features in a way that discriminates against certain groups or overemphasizes certain characteristics of the data.
Reduced interpretability: Incorrect feature engineering can make it difficult to interpret the results of the model. If features are transformed in a way that is difficult to understand or if irrelevant features are included, it can be challenging to understand why the model is making certain predictions.
Decreased efficiency: Incorrect feature engineering can lead to decreased efficiency by including too many features, transforming features in a computationally expensive way, or introducing dependencies between features that make the model more difficult to compute.
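Several of the pitfalls above mention redundant features. One simple check, sketched below, is to flag pairs of numeric features whose Pearson correlation is close to 1 or -1. The column names, the values, and the 0.95 threshold are assumptions for illustration; the right threshold depends on your data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant column: correlation is undefined, treat as 0
    return cov / (sd_x * sd_y)


def redundant_pairs(columns, threshold=0.95):
    """Return feature pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical feature columns; weight_kg repeats the information in weight_g.
columns = {
    "weight_g": [300.0, 450.0, 600.0, 520.0],
    "weight_kg": [0.30, 0.45, 0.60, 0.52],
    "diameter_cm": [30.0, 22.0, 25.0, 35.0],
}
pairs = redundant_pairs(columns)
print(pairs)
```

A flagged pair is only a hint that one of the two features may be dropped. Correlation captures linear relationships only, so it will not catch every kind of redundancy.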
Key Takeaway
Overall, incorrect feature engineering can lead to a range of issues that can impact the performance, interpretability, and efficiency of a machine learning model. It is therefore important to carefully design and evaluate the feature engineering process to ensure that the resulting features are relevant, informative, and appropriate for the problem at hand.