| Menu | Next Post: NLP and Elastic: Getting started |
Note: This HandsOn assumes that you have already followed the step-by-step Setup of your Elastic Cloud account and added the Samples available there, so that you can replicate the analysis shown here. If not, please follow the steps described there first. If you haven't completed the Elastic Data Frame - Classification Analysis HandsOn, I also suggest doing that before proceeding, because this tutorial uses the model created there.
Let's create an ingest pipeline with an inference processor:
Kibana>Stack Management>Ingest>Ingest Node Pipelines
You can use any name you like; I'll use pipeline-delay-prediction.
And then you add your inference processor:
Add a processor> Processor=Inference
Open another Kibana page and copy your ML Model ID:
Kibana>Machine Learning>Data Frame Analytics>Models
Go back to the processor configuration page and paste the Model ID, as in the image below:
I also set the optional target field to delay, which is where the inference processor writes its results (by default they go under ml.inference.<processor_tag>). When done, click Add.
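For readers who prefer to define the pipeline outside the UI, the same configuration can be sketched as a request body for the create pipeline API. This is only a minimal sketch, not the method used in this post: the helper name build_pipeline_body and the description text are made up for illustration, and the model ID is the one that appears in the test output in this post, so replace it with your own. The code only builds the JSON and does not contact a cluster.

```python
import json


def build_pipeline_body(model_id: str, target_field: str = "delay") -> dict:
    """Return a pipeline definition with a single inference processor."""
    return {
        "description": "Predict flight delays with a trained model",  # illustrative text
        "processors": [
            {
                "inference": {
                    "model_id": model_id,          # ID copied from the Models page
                    "target_field": target_field,  # field that receives the results
                }
            }
        ],
    }


body = build_pipeline_body("delay-prediction-1626961317123")

# Body for: PUT _ingest/pipeline/pipeline-delay-prediction
print(json.dumps(body, indent=2))
```

The printed JSON could be pasted into Kibana Dev Tools after the PUT line shown in the comment.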
Before creating the pipeline and starting to use it, you can test it. Click Add documents.
You can copy a document from the original index as an example and change its data to simulate a new document.
Take it from the kibana_sample_data_flights index, so that the document does not already contain ML result fields.
Open a new window:
Kibana>Analytics>Discover
Select kibana_sample_data_flights, choose any document, and copy its _id and _index.
Go back to the Ingest Node Pipelines window, paste the index and ID there, and click Add document. You can then change the field values to simulate whatever you want.
I removed the fields FlightDelay, FlightDelayType and FlightDelayMin, for the same reason I removed them during training.
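If you prepare test documents in a script rather than by hand, removing those three fields could look like the sketch below. The function name strip_label_fields and the shortened sample document are hypothetical; only the three field names and their values come from this post.

```python
# Fields that reveal the answer the model is supposed to predict.
LABEL_FIELDS = ("FlightDelay", "FlightDelayType", "FlightDelayMin")


def strip_label_fields(source: dict) -> dict:
    """Return a copy of the document source without the label fields."""
    return {key: value for key, value in source.items() if key not in LABEL_FIELDS}


# Shortened sample; a real document has many more fields.
sample = {
    "FlightNum": "GDZWNB0",
    "Carrier": "Kibana Airlines",
    "FlightDelay": False,
    "FlightDelayType": "No Delay",
    "FlightDelayMin": 0,
}

cleaned = strip_label_fields(sample)
print(cleaned)
```

The original dictionary is left untouched, so it can still be used later to compare the prediction with the real value.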
This is our updated JSON document:
[
{
"_id": "MKx29nkBQAv3jO3lIeem",
"_index": "kibana_sample_data_flights",
"_source": {
"FlightNum": "GDZWNB0",
"DestCountry": "CN",
"OriginWeather": "Clear",
"OriginCityName": "London",
"AvgTicketPrice": 952.4522444587226,
"DistanceMiles": 5743.8378391883825,
"DestWeather": "Rain",
"Dest": "Shanghai Hongqiao International Airport",
"OriginCountry": "GB",
"dayOfWeek": 6,
"DistanceKilometers": 9243.810963470789,
"timestamp": "2021-07-11T23:50:12",
"DestLocation": {
"lat": "31.19790077",
"lon": "121.3359985"
},
"DestAirportID": "SHA",
"Carrier": "Kibana Airlines",
"Cancelled": false,
"FlightTimeMin": 770.3175802892324,
"Origin": "London Gatwick Airport",
"OriginLocation": {
"lat": "51.14810181",
"lon": "-0.190277994"
},
"DestRegion": "SE-BD",
"OriginAirportID": "LGW",
"OriginRegion": "GB-ENG",
"DestCityName": "Shanghai",
"FlightTimeHour": 12.838626338153874
}
}
]
Add the input data that makes sense for your training and testing under "Documents" and click "Run the pipeline".
This is our output:
{
"docs": [
{
"doc": {
"_index": "kibana_sample_data_flights",
"_type": "_doc",
"_id": "MKx29nkBQAv3jO3lIeem",
"_source": {
"FlightNum": "GDZWNB0",
"Origin": "London Gatwick Airport",
"OriginLocation": {
"lon": "-0.190277994",
"lat": "51.14810181"
},
"DestLocation": {
"lon": "121.3359985",
"lat": "31.19790077"
},
"DistanceMiles": 5743.8378391883825,
"FlightTimeMin": 770.3175802892324,
"OriginWeather": "Clear",
"dayOfWeek": 6,
"AvgTicketPrice": 952.4522444587226,
"Carrier": "Kibana Airlines",
"OriginRegion": "GB-ENG",
"DestAirportID": "SHA",
"timestamp": "2021-07-11T23:50:12",
"Dest": "Shanghai Hongqiao International Airport",
"FlightTimeHour": 12.838626338153874,
"Cancelled": false,
"DistanceKilometers": 9243.810963470789,
"OriginCityName": "London",
"delay": {
"prediction_score": 0.4013867640677467,
"model_id": "delay-prediction-1626961317123",
"FlightDelay_prediction": false,
"top_classes": [
{
"class_name": false,
"class_probability": 0.9983471228188069,
"class_score": 0.4013867640677467
},
{
"class_name": true,
"class_probability": 0.0016528771811931647,
"class_score": 0.0016528771811931647
}
],
"prediction_probability": 0.9983471228188069
},
"DestWeather": "Rain",
"OriginCountry": "GB",
"DestCountry": "CN",
"DestRegion": "SE-BD",
"OriginAirportID": "LGW",
"DestCityName": "Shanghai"
},
"_ingest": {
"timestamp": "2021-07-22T14:30:02.675515386Z"
}
}
}
]
}
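To read the delay object from this output programmatically, a small helper along the following lines may be enough. It is a sketch under assumptions: summarize_prediction is a hypothetical name, the dictionary is copied from the output above, and picking the entry with the highest class_probability is just one way to look at top_classes, not a statement about how Elasticsearch chooses the predicted class.

```python
def summarize_prediction(result: dict, label_field: str = "FlightDelay") -> dict:
    """Pull the main values out of the inference result object."""
    # Entry of top_classes with the highest probability.
    top = max(result["top_classes"], key=lambda item: item["class_probability"])
    return {
        "predicted": result[f"{label_field}_prediction"],
        "most_probable_class": top["class_name"],
        "probability": top["class_probability"],
    }


# The "delay" object from the pipeline output above.
delay = {
    "prediction_score": 0.4013867640677467,
    "model_id": "delay-prediction-1626961317123",
    "FlightDelay_prediction": False,
    "top_classes": [
        {"class_name": False, "class_probability": 0.9983471228188069,
         "class_score": 0.4013867640677467},
        {"class_name": True, "class_probability": 0.0016528771811931647,
         "class_score": 0.0016528771811931647},
    ],
    "prediction_probability": 0.9983471228188069,
}

summary = summarize_prediction(delay)
print(summary)
```

In this output the predicted value and the most probable class are both false, and the two class probabilities add up to 1.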
And this was our original document:
{
"_index": "kibana_sample_data_flights",
"_type": "_doc",
"_id": "MKx29nkBQAv3jO3lIeem",
"_version": 1,
"_score": null,
"fields": {
"Origin": [
"London Gatwick Airport"
],
"OriginLocation": [
{
"coordinates": [
-0.190277994,
51.14810181
],
"type": "Point"
}
],
"FlightNum": [
"GDZWNB0"
],
"DestLocation": [
{
"coordinates": [
121.3359985,
31.19790077
],
"type": "Point"
}
],
"FlightDelay": [
false
],
"DistanceMiles": [
5743.838
],
"FlightTimeMin": [
770.31757
],
"OriginWeather": [
"Clear"
],
"dayOfWeek": [
6
],
"AvgTicketPrice": [
952.4523
],
"Carrier": [
"Kibana Airlines"
],
"FlightDelayMin": [
0
],
"OriginRegion": [
"GB-ENG"
],
"DestAirportID": [
"SHA"
],
"FlightDelayType": [
"No Delay"
],
"hour_of_day": [
23
],
"timestamp": [
"2021-07-11T23:50:12.000Z"
],
"Dest": [
"Shanghai Hongqiao International Airport"
],
"FlightTimeHour": [
"12.838626338153874"
],
"Cancelled": [
false
],
"DistanceKilometers": [
9243.811
],
"OriginCityName": [
"London"
],
"DestWeather": [
"Rain"
],
"OriginCountry": [
"GB"
],
"DestCountry": [
"CN"
],
"DestRegion": [
"SE-BD"
],
"OriginAirportID": [
"LGW"
],
"DestCityName": [
"Shanghai"
]
},
"sort": [
1626047412000
]
}
As you can see, I left the other fields and values of the input document unchanged, so that we can compare the original value of the variable we are classifying, in this case "FlightDelay": [false], with our output, in this case "FlightDelay_prediction": false.
The prediction matches the original value, so for this document the result is correct.
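The comparison described above can also be scripted. The sketch below assumes the two JSON shapes shown in this post: the original document as copied from Discover, where every value under "fields" is an array, and the pipeline output with the result under the delay target field. The function name prediction_matches is hypothetical and both documents are shortened to the fields that matter here.

```python
def prediction_matches(original_hit: dict, pipeline_output: dict,
                       label: str = "FlightDelay",
                       target_field: str = "delay") -> bool:
    """Compare the real label with the value predicted by the pipeline."""
    # Discover shows each field value wrapped in an array.
    actual = original_hit["fields"][label][0]
    source = pipeline_output["docs"][0]["doc"]["_source"]
    predicted = source[target_field][f"{label}_prediction"]
    return actual == predicted


# Shortened versions of the original document and the pipeline output.
original_hit = {
    "_id": "MKx29nkBQAv3jO3lIeem",
    "fields": {"FlightDelay": [False]},
}
pipeline_output = {
    "docs": [
        {"doc": {"_id": "MKx29nkBQAv3jO3lIeem",
                 "_source": {"delay": {"FlightDelay_prediction": False}}}}
    ]
}

matches = prediction_matches(original_hit, pipeline_output)
print("Prediction matches the original value:", matches)
```

Running the same check over many documents gives a rough idea of how often the model agrees with the real data, which says more than a single matching document.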
Now you can close the Test Pipeline and click Create pipeline to start using this pipeline, or continue changing the value of the fields to test the model.
You can also test pipelines using the simulate pipeline API.
In this case it would be:
POST /_ingest/pipeline/pipeline-delay-prediction/_simulate
{
"docs": [
{
"_id": "MKx29nkBQAv3jO3lIeem",
"_index": "kibana_sample_data_flights",
"_source": {
"FlightNum": "GDZWNB0",
"DestCountry": "CN",
"OriginWeather": "Clear",
"OriginCityName": "London",
"AvgTicketPrice": 952.4522444587226,
"DistanceMiles": 5743.8378391883825,
"DestWeather": "Rain",
"Dest": "Shanghai Hongqiao International Airport",
"OriginCountry": "GB",
"dayOfWeek": 6,
"DistanceKilometers": 9243.810963470789,
"timestamp": "2021-07-11T23:50:12",
"DestLocation": {
"lat": "31.19790077",
"lon": "121.3359985"
},
"DestAirportID": "SHA",
"Carrier": "Kibana Airlines",
"Cancelled": false,
"FlightTimeMin": 770.3175802892324,
"Origin": "London Gatwick Airport",
"OriginLocation": {
"lat": "51.14810181",
"lon": "-0.190277994"
},
"DestRegion": "SE-BD",
"OriginAirportID": "LGW",
"OriginRegion": "GB-ENG",
"DestCityName": "Shanghai",
"FlightTimeHour": 12.838626338153874
}
}
]
}
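If you want to call the simulate pipeline API from a script instead of Kibana Dev Tools, the request could be prepared as in the sketch below. It is an illustration only: build_simulate_request is a hypothetical helper, the base URL and API key are placeholders, and the document is shortened. The code builds the request but does not send it.

```python
import json
import urllib.request


def build_simulate_request(base_url: str, api_key: str,
                           pipeline_id: str, docs: list) -> urllib.request.Request:
    """Prepare a POST request for the simulate pipeline API."""
    url = f"{base_url.rstrip('/')}/_ingest/pipeline/{pipeline_id}/_simulate"
    payload = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumes API key authentication; adapt to your deployment.
            "Authorization": f"ApiKey {api_key}",
        },
    )


request = build_simulate_request(
    "https://localhost:9200",  # placeholder endpoint
    "<your-api-key>",          # placeholder credential
    "pipeline-delay-prediction",
    [{"_index": "kibana_sample_data_flights",
      "_id": "MKx29nkBQAv3jO3lIeem",
      "_source": {"FlightNum": "GDZWNB0", "Carrier": "Kibana Airlines"}}],
)

print(request.get_method(), request.full_url)
# To send it against a real cluster: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before anything is sent to the cluster.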
This post is part of a series that covers Artificial Intelligence with a focus on Elastic's (Creators of Elasticsearch) Machine Learning solution, aiming to introduce and exemplify the possibilities and options available, in addition to addressing the context and usability.