When training machine learning models and applying them to inference tasks, it's not uncommon to run into issues, especially when switching computing environments (e.g., training on a GPU and running inference on a CPU). This often shows up as unstable predictions, such as the output alternating between two classes (0 and 1).
Here, I'll briefly summarize the issues I encountered and how I resolved them.
1. Model Loading Mistake
I built an inference endpoint on a CPU instance (ml.m5.large) and found that the model, which was trained on a GPU instance (g4dn.2xlarge), did not produce the expected inference results.
Upon checking the logs, I encountered the following warning:
2024-04-25T05:47:04,365 [WARN ] W-9000-model_1.0-stderr MODEL_LOG - Some weights of the model checkpoint at /opt/ml/model/code/pytorch_model.bin were not used when initializing BertModel...
This issue occurred because I attempted to load the pytorch_model.bin file, which was output from the fine-tuned PredictionModel, directly with BertModel.from_pretrained. The BertModel.from_pretrained method assumes the structure of the base BERT model, so it ignored the parameters of the LSTM and linear layers added in PredictionModel, and those important parameters were never loaded.
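To make the mismatch concrete, here is a minimal sketch of what a BERT-based model with extra layers on top can look like. The layer names, sizes, and forward logic below are assumptions for illustration, not the original PredictionModel definition:
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class PredictionModel(nn.Module):
    # Sketch only: the real PredictionModel's layers and dimensions may differ
    def __init__(self, config: BertConfig, pretrained_model=None):
        super().__init__()
        self.bert = BertModel(config)
        if pretrained_model is not None:
            # Optionally warm-start the encoder from a checkpoint path
            self.bert.load_state_dict(torch.load(pretrained_model), strict=False)
        # Extra layers on top of BERT; BertModel.from_pretrained knows nothing about these
        self.lstm = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
        self.classifier = nn.Linear(config.hidden_size, 2)

    def forward(self, input_ids):
        hidden_states = self.bert(input_ids).last_hidden_state
        lstm_out, _ = self.lstm(hidden_states)
        # Classify from the representation of the last token
        return self.classifier(lstm_out[:, -1, :])
The problematic loading code looked like this: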
from os import path
from transformers import BertConfig

# Problematic approach: only the base BERT weights are restored
pretrained_config = path.join("/opt/ml/model/code/", "config.json")
pretrained_model = path.join("/opt/ml/model/code/", "pytorch_model.bin")
config = BertConfig.from_pretrained(pretrained_config)
model = PredictionModel(config=config, pretrained_model=pretrained_model)
The issue was resolved by loading the state from fine_tuning_model.pt, which contained all of the model's parameters:
import torch

# Load the full fine-tuned state dict, including the added LSTM and linear layers
model = PredictionModel(config=config, pretrained_model=None)
model_path = path.join("/opt/ml/model/code/", "fine_tuning_model.pt")
model.load_state_dict(torch.load(model_path))
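For reference, a full-model checkpoint like fine_tuning_model.pt is typically produced on the training side by saving the model's complete state dict. This is a sketch of that step (the output path and variable names are assumptions, not the original training script):
import torch
from os import path

# Saving the full state dict captures the BERT encoder plus the added LSTM
# and linear layers, so everything can be restored at inference time
output_dir = "/opt/ml/model/code/"
torch.save(model.state_dict(), path.join(output_dir, "fine_tuning_model.pt"))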
2. Device Assignment for the Model
The model was initially set to use a CUDA device by default, which led to errors in an environment that does not support CUDA.
2024-04-28T06:59:31,905 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - Exception in model fn Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False...
In an environment with only CPUs, it was necessary to appropriately assign the model to the correct device:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
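On top of moving the model, the deserialization error itself points at torch.load trying to restore tensors that were saved on a GPU. One way to handle this is to pass map_location to torch.load; the sketch below combines both steps (map_location is my addition here, not necessarily the original fix):
import torch
from os import path

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# map_location remaps tensors saved on a CUDA device onto whatever device
# is actually available, avoiding the deserialization error on CPU-only hosts
model_path = path.join("/opt/ml/model/code/", "fine_tuning_model.pt")
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)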
3. Disabling Gradient Calculations
During inference, it is recommended to disable gradient calculations when passing input tensors through the model. While I believe this had no effect on the inference results, failing to do so led to unnecessary memory use and increased computation time.
with torch.no_grad():
    model_input = torch.tensor([preprocessed_data.getitem()], dtype=torch.long).to(device)
    model_output = model(model_input)
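As a related practice not mentioned in the logs above, the model is usually also switched to evaluation mode before inference so that layers such as dropout behave deterministically. A sketch of the combined pattern (model.eval() is my addition, not part of the original fix):
# Evaluation mode disables dropout's randomness; torch.no_grad() skips
# gradient tracking, saving memory and compute during inference
model.eval()
with torch.no_grad():
    model_output = model(model_input)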
Conclusion
Successfully transitioning from training to inference requires addressing key challenges like model loading, device compatibility, and efficient resource management. These solutions ensure more accurate and efficient machine learning applications.