Introduction
PMML (Predictive Model Markup Language) is an XML-based format for saving trained AI/ML models so that you can use them for predictions later (for example, in production). cPMML is a library created by AmadeusITGroup to parse PMML files and run predictions in C++. In this blog, we will train a linear regression model in Python, export it as a PMML file, and then run our predictions in C++.
Creating a model file
Dependencies
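We will need pandas, numpy, scikit-learn and sklearn2pmml. Note that sklearn2pmml also needs a Java runtime on your system, since it delegates the conversion to the JPMML-SkLearn library.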
pip install pandas numpy scikit-learn sklearn2pmml
Imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
The model
Dataset
To keep things simple, let's train a linear regression model to fit the equation y = 2x + 1. We can generate a random dataset for this equation, with a little Gaussian noise added.
X = np.random.rand(100, 1)
Y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
Test/Train data
Next, we'll split the data into training and test sets, holding out 20% of the rows for testing.
df = pd.DataFrame({'X': X.flatten(), 'Y': Y.flatten()})
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
X_train = train_df[['X']]
y_train = train_df['Y']
X_test = test_df[['X']]
y_test = test_df['Y']
Training the model
To train the model, we take LinearRegression from scikit-learn, wrap it in a PMMLPipeline, and fit it on the training data we generated above. We can also check the mean squared error (MSE) on the test set to get an idea of the model's accuracy.
pipeline = PMMLPipeline([
    ("regressor", LinearRegression())
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
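As a sanity check on what the regressor is doing, here is a minimal sketch of the closed-form least-squares fit for a single feature, written with the standard library only. It is not how scikit-learn is implemented internally; the helper name fit_line and the fixed seed are assumptions made for this illustration. On data generated like ours, the slope should land near 2 and the intercept near 1.

```python
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
xs = [rng.random() for _ in range(100)]
ys = [2 * x + 1 + 0.1 * rng.gauss(0, 1) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

If the pipeline's predictions disagree badly with a fit like this, the problem is more likely in the data preparation than in the model.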
Saving the pmml file
If you are satisfied with the performance of your model, you can export it as a PMML file. We will save the model as lr_model.pmml.
sklearn2pmml(pipeline, "lr_model.pmml", with_repr = True)
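Since PMML is plain XML, you can open the exported file and read the fitted coefficients directly. The sketch below parses a hand-written, heavily simplified fragment that follows the PMML RegressionModel layout; it is not the actual output of sklearn2pmml, which contains more elements (data dictionary, mining schema) and the fitted rather than the exact values. The helper name read_linear_model is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment for illustration only; the real
# lr_model.pmml has more elements and the fitted (not exact) values.
PMML_SNIPPET = """\
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="X" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def read_linear_model(pmml_text):
    """Return (intercept, {feature: coefficient}) from a PMML regression table."""
    root = ET.fromstring(pmml_text)
    # '{*}' matches any XML namespace (Python 3.8+), so the PMML version does not matter
    table = root.find(".//{*}RegressionTable")
    intercept = float(table.get("intercept"))
    coefs = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("{*}NumericPredictor")
    }
    return intercept, coefs


intercept, coefs = read_linear_model(PMML_SNIPPET)
x = 10.0
print(intercept + coefs["X"] * x)  # y = intercept + coefficient * x
```

To try this on the real file, read lr_model.pmml into a string and pass it to the same function; the values you get back should be close to 1 and 2.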
Using the model file
The main focus of this blog is using the model in a C++ program. For this, you will need to install the cPMML library.
Installing cPMML
To install the library on your system, you just need to run the command below. It runs cmake, so you should have cmake installed.
git clone https://github.com/AmadeusITGroup/cPMML.git && cd cPMML && ./install.sh
For Mac M1
I ran into some problems while installing this on a Mac M1. Here are the steps to install it smoothly.
- Ensure you have the latest version of cmake installed on your system.
- You can edit the install.sh script to remove the -j 4 flag from the cmake -j 4 .. command. This turns off the parallel build.
- The last line of the install.sh script is sudo ldconfig. Change this to sudo update_dyld_shared_cache. This installs the .dylib or .so library files to the proper destination.
Running the predictions
Include the library
The first thing is to include the library header, along with the standard headers we use.
#include "cPMML.h"
#include <iostream>
#include <string>
#include <unordered_map>
Load the model
Then you can load the model.
int main() {
  cpmml::Model model("lr_model.pmml");
  return 0;
}
Start predictions
The cPMML library takes its input as an unordered_map from strings to strings, mapping each feature name to its value. For us, there is only one input, X.
int main() {
  cpmml::Model model("lr_model.pmml");
  // This should yield a value close to 1
  std::unordered_map<std::string, std::string> input1 = {
    {"X", "0"}
  };
  // This should yield a value close to 21
  std::unordered_map<std::string, std::string> input2 = {
    {"X", "10"}
  };
  std::cout<<"X = 0 Y = "<<model.predict(input1)<<'\n';
  std::cout<<"X = 10 Y = "<<model.predict(input2)<<'\n';
  return 0;
}
Compilation
You can compile the code by linking against the cPMML library.
> g++ -std=c++11 predict.cpp -o predict.o -lcPMML
> ./predict.o
X = 0 Y = 0.967265
X = 10 Y = 21.369305
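Two predictions are enough to recover the line the model learned, which makes for a quick check that the C++ side loaded the same model we trained. The short sketch below does that arithmetic using the two values printed above; the variable names are only for illustration.

```python
# Predictions printed by the C++ program above
y_at_0 = 0.967265
y_at_10 = 21.369305

intercept = y_at_0               # a line's value at x = 0 is its intercept
slope = (y_at_10 - y_at_0) / 10  # rise over run between x = 0 and x = 10

print(f"fitted line: y = {slope:.4f}x + {intercept:.4f}")
```

This gives roughly y = 2.04x + 0.97, which is close to the y = 2x + 1 we used to generate the data, with the gap explained by the noise we added.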
Conclusion
In this blog, we saw how to store a model as a PMML file and load it in C++ using the cPMML library. You can view the code for the above here.
    