Atsushi Suzuki

Common Pitfalls in Machine Learning Model Inference for Beginners and How to Solve Them

When training machine learning models and applying them to inference, it's not uncommon to run into issues, especially when switching computing environments (e.g., training on a GPU and running inference on a CPU). In my case, this showed up as unstable predictions that alternated between two classes (0 and 1).

Here, I'll briefly summarize the issues I encountered and how I resolved them.

1. Model Loading Mistake

I built an inference endpoint on a CPU instance (ml.m5.large) and found that a model trained on a GPU instance (g4dn.2xlarge) did not produce the expected inference results.

Upon checking the logs, I encountered the following warning:

2024-04-25T05:47:04,365 [WARN ] W-9000-model_1.0-stderr MODEL_LOG - Some weights of the model checkpoint at /opt/ml/model/code/pytorch_model.bin were not used when initializing BertModel...

This happened because I tried to load pytorch_model.bin, the file output from the fine-tuned PredictionModel, directly with BertModel.from_pretrained. That method assumes the structure of the base BERT model, so it silently ignored the parameters of the LSTM and linear layers added in PredictionModel, leaving those important weights uninitialized.

from os import path
from transformers import BertConfig

# Problematic: only the base BERT weights in pytorch_model.bin get loaded;
# the added LSTM and linear layers keep their random initialization.
pretrained_config = path.join("/opt/ml/model/code/", "config.json")
pretrained_model = path.join("/opt/ml/model/code/", "pytorch_model.bin")
config = BertConfig.from_pretrained(pretrained_config)

model = PredictionModel(config=config, pretrained_model=pretrained_model)
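For context, PredictionModel is a custom class whose definition isn't shown here; roughly, it is a BERT encoder with an LSTM and a linear head stacked on top. A minimal sketch (the layer names and sizes are my assumptions) makes it clear why BertModel.from_pretrained knows nothing about the extra layers:

import torch.nn as nn
from transformers import BertModel

class PredictionModel(nn.Module):
    def __init__(self, config, pretrained_model=None):
        super().__init__()
        # Base encoder: the only part that from_pretrained knows how to load.
        if pretrained_model is not None:
            self.bert = BertModel.from_pretrained(pretrained_model, config=config)
        else:
            self.bert = BertModel(config)
        # Extra layers added for fine-tuning; their weights live only in the
        # full checkpoint (fine_tuning_model.pt), not in pytorch_model.bin.
        self.lstm = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
        self.classifier = nn.Linear(config.hidden_size, 2)

    def forward(self, input_ids):
        hidden_states = self.bert(input_ids).last_hidden_state
        lstm_out, _ = self.lstm(hidden_states)
        return self.classifier(lstm_out[:, -1, :])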

The issue was resolved by loading the state from fine_tuning_model.pt, which contains all of the model's parameters:

import torch

# Recreate the model architecture, then restore the full fine-tuned state from the checkpoint.
model = PredictionModel(config=config, pretrained_model=None)
model_path = path.join("/opt/ml/model/code/", "fine_tuning_model.pt")
model.load_state_dict(torch.load(model_path))
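For this to work, the training job has to save the complete state dict (BERT plus the added layers) rather than only the base model weights. A minimal sketch of the saving side, assuming the training script holds the fine-tuned model in model:

# Persist every parameter of the fine-tuned model, including the LSTM and linear layers.
torch.save(model.state_dict(), "fine_tuning_model.pt")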

2. Device Assignment for the Model

The model was initially set to use a CUDA device by default, which caused errors in an environment without CUDA support.

2024-04-28T06:59:31,905 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - Exception in model fn Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False...

In a CPU-only environment, the model has to be explicitly assigned to the available device:

# Fall back to CPU when CUDA is not available (e.g., on ml.m5.large).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
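The deserialization message in the log comes from torch.load itself when a checkpoint saved on a GPU is opened on a CPU-only machine. If it persists after moving the model, passing map_location when restoring the state dict also helps; a minimal sketch, reusing model_path from section 1:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Remap tensors that were saved on a CUDA device onto the device actually available.
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)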

3. Disabling Gradient Calculations

During inference, it is recommended to disable gradient calculations when feeding input tensors to the model. As far as I can tell this has no effect on the inference results themselves, but leaving gradients enabled caused unnecessary memory use and longer computation time.

# Disable autograd bookkeeping during inference to save memory and time.
with torch.no_grad():
    model_input = torch.tensor([preprocessed_data.getitem()], dtype=torch.long).to(device)
    model_output = model(model_input)
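With gradients disabled, model_output can then be turned into the final class. A minimal sketch, assuming the model returns logits of shape (1, 2) for the two classes mentioned above:

# Convert raw logits into a stable 0/1 prediction.
probabilities = torch.softmax(model_output, dim=-1)
predicted_class = int(torch.argmax(probabilities, dim=-1))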

Conclusion

Successfully transitioning from training to inference means getting model loading, device assignment, and gradient handling right. Addressing these three points gave me stable predictions and more efficient inference.
