DEV Community

Md. Ismiel Hossen Abir


ValueError: A given column is not a column of the dataframe

Sometimes, when I fit a scikit-learn 'Pipeline', I get the error 'ValueError: A given column is not a column of the dataframe', even though the column clearly exists in the dataset.

I found this error quite interesting. I looked for help on Kaggle and Stack Overflow, eventually found the solution on Stack Overflow, and now I understand why the error occurs.

Let me describe how to overcome this error…

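from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder, StandardScaler

# ohe, scaler, ordinal and label are lists of column names defined earlier;
# label holds the target column
preprocessor = ColumnTransformer(
    transformers=[
        # sparse_output replaces sparse (renamed in scikit-learn 1.2, old name removed in 1.4)
        ('ohe', OneHotEncoder(drop='first', sparse_output=False, handle_unknown='ignore'), ohe),
        ('scaler', StandardScaler(), scaler),
        ('ordinal', OrdinalEncoder(), ordinal),
        # This step causes the error: the target column is not part of the features
        ('label', LabelEncoder(), label)
    ], remainder='passthrough')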

Defining the 'ColumnTransformer' this way raised no error. However, when I tried to fit the pipeline, the 'column not found' error appeared.
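A minimal sketch that reproduces the error, assuming a hypothetical toy DataFrame with one feature ('age') and a target column ('target') that has already been split off into y. The 'ColumnTransformer' still lists 'target', so fitting it on X fails when the columns are validated, before any transformer is fitted.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data: 'age' is a feature, 'target' is the label column
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "target": ["no", "yes", "yes", "no"],
})

# Usual split into features and target before training
X = df.drop(columns="target")
y = df["target"]

# The 'label' step still refers to 'target', which X no longer contains
preprocessor = ColumnTransformer(
    transformers=[
        ("scaler", StandardScaler(), ["age"]),
        ("label", LabelEncoder(), ["target"]),
    ],
    remainder="passthrough",
)

error_message = None
try:
    preprocessor.fit(X, y)
except ValueError as exc:
    # ColumnTransformer cannot find 'target' among the columns of X
    error_message = str(exc)
    print(error_message)
```

Even if the target column were left inside X, 'LabelEncoder' would still not work here, because its 'fit' takes only y and is not meant for feature columns.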

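The cause was the 'label' step: I applied 'LabelEncoder' to the target column inside the preprocessing 'ColumnTransformer'. The pipeline is fitted on the feature DataFrame (X_train), which no longer contains the target column, so 'ColumnTransformer' cannot find it and raises the error. 'LabelEncoder' is also designed for target values (y), not input features. The target should therefore be encoded separately, as shown in the code below: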

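from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()

# Learn the class mapping on the training targets only, then reuse it for the test targets
y_label_train = label_encoder.fit_transform(y_train)
y_label_test = label_encoder.transform(y_test)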
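To show what the separate target encoding produces, here is a small sketch with hypothetical string labels. 'LabelEncoder' sorts the class names and maps them to integers; keeping the fitted encoder around also lets you turn encoded predictions back into the original class names with 'inverse_transform'.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string targets for the train and test splits
y_train = ["no", "yes", "yes", "no", "maybe"]
y_test = ["yes", "no"]

label_encoder = LabelEncoder()
# fit_transform learns the sorted class list and maps each label to an integer
y_label_train = label_encoder.fit_transform(y_train)
# transform reuses the mapping learned from the training labels
y_label_test = label_encoder.transform(y_test)

print(label_encoder.classes_.tolist())  # ['maybe', 'no', 'yes']
print(y_label_train.tolist())           # [1, 2, 2, 1, 0]
print(y_label_test.tolist())            # [2, 1]

# inverse_transform maps encoded values (e.g. model predictions) back to strings
decoded = label_encoder.inverse_transform([0, 2]).tolist()
print(decoded)                          # ['maybe', 'yes']
```

Note that 'transform' raises an error for a label that never appeared in the training targets, which is why the encoder is fitted on y_train and only applied to y_test.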

After removing the 'label' step from the 'ColumnTransformer' and encoding the target this way, the pipeline fits without errors.

If you know another way to solve this error, please share it. I would like to learn other approaches.

Thank you.
