Beginner Questions about ML classification / forecasts

Thu, 19 Sep 2024 09:35:07 +0000

Hello, I hope this is the right place to ask this. I'm relatively new to machine learning and am currently trying to build classification models for tabular data on Google Cloud.
The goal is to apply machine learning to a database of technical equipment to predict whether a piece of equipment might require maintenance. There's a lot of missing data, though, and I'm not sure what the data needs to look like to produce a reliable model.
Most of the available data is static technical data, and information about usage time could only be inferred from other entities. Is it the right approach to build a model that outputs a boolean value for each piece of equipment? Something like "is maintenance due within 1 month?".
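
To make the question more concrete, here is a rough sketch of how I imagine deriving such a boolean target. The field names (`equipment_id`, `snapshot`, `next_maintenance`), the 30-day horizon, and the rule that a missing maintenance date counts as "not due" are all assumptions for illustration, not my real schema:

```python
from datetime import date, timedelta

# Assumption: "within 1 month" is approximated as 30 days.
HORIZON = timedelta(days=30)


def maintenance_due_label(snapshot_date, next_maintenance_date, horizon=HORIZON):
    """Return True if maintenance falls within `horizon` after the snapshot."""
    if next_maintenance_date is None:
        # Assumption: no recorded maintenance is treated as "not due".
        # With lots of missing data it may be safer to drop such rows instead.
        return False
    return snapshot_date <= next_maintenance_date <= snapshot_date + horizon


# Made-up example rows, one per piece of equipment.
records = [
    {"equipment_id": "A", "snapshot": date(2024, 1, 1), "next_maintenance": date(2024, 1, 20)},
    {"equipment_id": "B", "snapshot": date(2024, 1, 1), "next_maintenance": date(2024, 3, 15)},
    {"equipment_id": "C", "snapshot": date(2024, 1, 1), "next_maintenance": None},
]

labels = {
    r["equipment_id"]: maintenance_due_label(r["snapshot"], r["next_maintenance"])
    for r in records
}
print(labels)
```

Each row would then be one piece of equipment at one point in time, with the static technical data as features and this flag as the class to predict.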

I've also tried this credit card transaction dataset: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

but I don't understand some of the technical details. The Kaggle page says:

Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
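
As far as I understand it, the point is that a model can reach very high accuracy on imbalanced data without finding any fraud at all. Here is a small sketch of that idea in plain Python. The labels and scores are made up (not taken from the Kaggle dataset), and `average_precision` is my own hand-written helper for a step-wise area under the precision-recall curve that does not treat tied scores specially:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Ranks samples by descending score and averages the precision
    measured at the rank of each true positive.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Made-up data: 1000 transactions, only the last 2 are fraud (label 1).
y_true = [0] * 998 + [1, 1]

# A useless model that always predicts "not fraud" still looks great on accuracy.
always_negative = [0] * 1000
acc = accuracy(y_true, always_negative)

# A model that outputs scores: one legitimate transaction gets a high score (0.8),
# and the two fraud cases get 0.9 and 0.7.
scores = [0.8] + [0.1] * 997 + [0.9, 0.7]
ap = average_precision(y_true, scores)

print(f"accuracy of always-negative model: {acc:.3f}")
print(f"average precision of scoring model: {ap:.3f}")
```

The always-negative model gets 99.8% accuracy while catching zero fraud cases, which I think is why accuracy is called "not meaningful" here. The average precision depends only on how highly the fraud cases are ranked, and my understanding is that a random ranking would score roughly the fraud rate (0.002 in this toy data).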

But the stats of the model look like this now:

What do I make of this?

DEV Community: M
