Atsushi Suzuki

Common Pitfalls in Machine Learning Model Inference for Beginners and How to Solve Them

When training machine learning models and applying them to inference, it's not uncommon to run into issues, especially when switching computing environments (e.g., training on a GPU and running inference on a CPU). In my case, this showed up as unstable predictions that alternated between two classes (0 and 1).

Here, I'll briefly summarize the issues I encountered and how I resolved them.

1. Model Loading Mistake

I built an inference endpoint on a CPU instance (ml.m5.large) and found that a model trained on a GPU instance (g4dn.2xlarge) did not produce the expected inference results.

Upon checking the logs, I encountered the following warning:

2024-04-25T05:47:04,365 [WARN ] W-9000-model_1.0-stderr MODEL_LOG - Some weights of the model checkpoint at /opt/ml/model/code/pytorch_model.bin were not used when initializing BertModel...

This happened because I tried to load pytorch_model.bin, the file output from the fine-tuned PredictionModel, directly with BertModel.from_pretrained. That method assumes the structure of the base BERT model, so it silently ignored the parameters of the LSTM and linear layers added in PredictionModel, leaving those important weights uninitialized.

from os import path
from transformers import BertConfig

# Problematic: only the base BERT weights in pytorch_model.bin get loaded;
# the added LSTM and linear layers keep their random initialization.
pretrained_config = path.join("/opt/ml/model/code/", "config.json")
pretrained_model = path.join("/opt/ml/model/code/", "pytorch_model.bin")
config = BertConfig.from_pretrained(pretrained_config)

model = PredictionModel(config=config, pretrained_model=pretrained_model)
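For context, PredictionModel is a custom class whose definition isn't shown here; roughly, it is a BERT encoder with an LSTM and a linear head stacked on top. A minimal sketch (the layer names and sizes are my assumptions) makes it clear why BertModel.from_pretrained knows nothing about the extra layers:

import torch.nn as nn
from transformers import BertModel

class PredictionModel(nn.Module):
    def __init__(self, config, pretrained_model=None):
        super().__init__()
        # Base encoder: the only part that from_pretrained knows how to load.
        if pretrained_model is not None:
            self.bert = BertModel.from_pretrained(pretrained_model, config=config)
        else:
            self.bert = BertModel(config)
        # Extra layers added for fine-tuning; their weights live only in the
        # full checkpoint (fine_tuning_model.pt), not in pytorch_model.bin.
        self.lstm = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
        self.classifier = nn.Linear(config.hidden_size, 2)

    def forward(self, input_ids):
        hidden_states = self.bert(input_ids).last_hidden_state
        lstm_out, _ = self.lstm(hidden_states)
        return self.classifier(lstm_out[:, -1, :])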

The issue was resolved by loading the state from fine_tuning_model.pt, which contains all of the model's parameters:

import torch

# Recreate the model architecture, then restore the full fine-tuned state from the checkpoint.
model = PredictionModel(config=config, pretrained_model=None)
model_path = path.join("/opt/ml/model/code/", "fine_tuning_model.pt")
model.load_state_dict(torch.load(model_path))
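For this to work, the training job has to save the complete state dict (BERT plus the added layers) rather than only the base model weights. A minimal sketch of the saving side, assuming the training script holds the fine-tuned model in model:

# Persist every parameter of the fine-tuned model, including the LSTM and linear layers.
torch.save(model.state_dict(), "fine_tuning_model.pt")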

2. Device Assignment for the Model

The model was initially set to use a CUDA device by default, which caused errors in an environment without CUDA support.

2024-04-28T06:59:31,905 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - Exception in model fn Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False...

In a CPU-only environment, the model has to be explicitly assigned to the available device:

# Fall back to CPU when CUDA is not available (e.g., on ml.m5.large).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
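The deserialization message in the log comes from torch.load itself when a checkpoint saved on a GPU is opened on a CPU-only machine. If it persists after moving the model, passing map_location when restoring the state dict also helps; a minimal sketch, reusing model_path from section 1:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Remap tensors that were saved on a CUDA device onto the device actually available.
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)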

3. Disabling Gradient Calculations

During inference, it is recommended to disable gradient calculations when feeding input tensors to the model. As far as I can tell this has no effect on the inference results themselves, but leaving gradients enabled caused unnecessary memory use and longer computation time.

# Disable autograd bookkeeping during inference to save memory and time.
with torch.no_grad():
    model_input = torch.tensor([preprocessed_data.getitem()], dtype=torch.long).to(device)
    model_output = model(model_input)
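With gradients disabled, model_output can then be turned into the final class. A minimal sketch, assuming the model returns logits of shape (1, 2) for the two classes mentioned above:

# Convert raw logits into a stable 0/1 prediction.
probabilities = torch.softmax(model_output, dim=-1)
predicted_class = int(torch.argmax(probabilities, dim=-1))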

Conclusion

Successfully transitioning from training to inference means getting model loading, device assignment, and gradient handling right. Addressing these three points gave me stable predictions and more efficient inference.
