Samuel Earl

Data Pipelines with Great Expectations | Step 4: Validate data

Set up a Checkpoint

A Checkpoint is a step in a data pipeline that runs an Expectation Suite against a batch of data. New validation results are created each time a Checkpoint validates a batch of data. Checkpoints can also be configured to execute Actions (e.g. send an alert message through Slack or email when a validation fails).

We are going to run our Expectation Suite against the February data in our data folder. To do that, we need to set up a Checkpoint. In your terminal, run the following command from your gx-getting-started directory (make sure to shut down any running Jupyter Notebooks first):

great_expectations checkpoint new getting_started_checkpoint

A new browser tab will open with the edit_checkpoint_getting_started_checkpoint.ipynb Jupyter Notebook.


File Location

The file to edit your Checkpoint is located at
gx-getting-started/uncommitted/edit_checkpoint_getting_started_checkpoint.ipynb.


This file contains the configuration code for a Checkpoint. Scroll down to the second code cell. You should see the following code:

my_checkpoint_name = "getting_started_checkpoint" # This was populated from your CLI command.

yaml_config = f"""
name: {my_checkpoint_name}
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: getting_started_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: getting_started_expectation_suite_taxi.demo
"""
print(yaml_config)

Make sure that the data_asset_name is set to yellow_tripdata_sample_2019-02.csv so we can validate the February data. Now scroll down to the last code cell and uncomment both lines of code. Then run all the cells in the notebook, which will create the Checkpoint configs in your project.

When the last cell finishes running, the Data Docs will open again in a new tab.
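If you want to run the Checkpoint from a script instead of the notebook, the call might look roughly like the sketch below. It assumes a legacy (pre-1.0) release of Great Expectations, where the Data Context has a run_checkpoint() method and the result has a success attribute. The run_checkpoint_and_report helper and the fake_context stand-in are hypothetical names used only so the example runs without a project on disk.

```python
from types import SimpleNamespace


def run_checkpoint_and_report(context, checkpoint_name):
    """Run a named Checkpoint and return "passed" or "failed"."""
    # Assumption: `context` behaves like a legacy (pre-1.0) Data Context,
    # whose run_checkpoint() returns an object with a boolean `success`.
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    status = "passed" if result.success else "failed"
    print(f"Checkpoint {checkpoint_name!r} {status}")
    return status


# Stand-in context so the sketch runs without a Great Expectations project.
# It always reports a failed validation and is NOT part of the library.
fake_context = SimpleNamespace(
    run_checkpoint=lambda checkpoint_name: SimpleNamespace(success=False)
)
status = run_checkpoint_and_report(fake_context, "getting_started_checkpoint")
```

In a real project you would pass the context returned by great_expectations.get_context() (run from the gx-getting-started directory) instead of fake_context, and could call context.open_data_docs() afterwards to view the results.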

Inspect the Validation Results for your Checkpoint

You should see that the first result listed under the “Validation Results” tab shows a failed Expectation Suite (i.e. there is a red “X” in the “Status” column). If you click on that row you will see more details about the failure.

You should see that the Expectation for "values must be greater than or equal to 1 and less than or equal to 6" failed. But why?

In the results of that Expectation you will see a small table with the heading "Sampled Unexpected Values" and a 0 below it. That means the passenger_count column contained the value 0. Since that Expectation was configured to only allow values greater than or equal to 1 and less than or equal to 6 for the passenger_count column, the validation failed. (A taxi ride with 0 passengers does not make sense.)
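To make the failure concrete, here is a minimal plain-Python sketch of the kind of range check this Expectation performs. It is not how Great Expectations implements it. The check_values_between function and the sample passenger counts are made up for illustration, and the result keys only loosely mirror what the Data Docs display.

```python
def check_values_between(values, min_value, max_value):
    """Flag every value that falls outside the inclusive [min_value, max_value] range."""
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "success": not unexpected,  # True only when nothing is out of range
        "unexpected_count": len(unexpected),
        "sampled_unexpected_values": sorted(set(unexpected)),
    }


# Hypothetical sample of passenger_count values (not taken from the real CSV).
passenger_counts = [1, 2, 0, 6, 1, 0, 3]

result = check_values_between(passenger_counts, min_value=1, max_value=6)
print(result)
```

Because the sample contains rows with 0 passengers, success is False and 0 appears as the only distinct unexpected value, which matches the single 0 shown in the Data Docs table.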
