DEV Community

Carlota Soto for CrateDB

Posted on • Updated on • Originally published at crate.io

Visualizing time series data with Grafana and CrateDB ๐Ÿ“ˆ

Grafana is an open-source tool that helps you build real-time dashboards and graphs. It is the perfect complement to CrateDB, which is purpose-built for monitoring large volumes of machine data in real-time.

In this tutorial, I will show you how to:

  • Install CrateDB on-prem / Sign up for CrateDB Cloud
  • Load a sample dataset
  • Install Grafana
  • Connect Grafana to CrateDB or CrateDB Cloud
  • Build your first Grafana dashboard
  • If you already have CrateDB or CrateDB Cloud installed with data loaded into it, jump to the "Connect Grafana to CrateDB" section to get straight to the point.

Letโ€™s get started!

Installing CrateDB / Signing up for CrateDB Cloud

If you want to use CrateDB on-prem, an easy way to do so is with Docker. If you don't have Docker, you can download it here โ€” just follow the instructions that will pop-up. For alternative install options, check out our docs.

Once you have Docker installed, create a directory where your data will be stored. Navigate there with your terminal, and run the following command:

docker run --rm --name=cratedb --publish=4200:4200 --publish=5432:5432 --volume="$(pwd)/var/lib/crate:/data" crate
Enter fullscreen mode Exit fullscreen mode

Then, open http://localhost:4200/ in your browser. You will see the CrateDB Admin UI:

Alt Text

If you prefer to use CrateDB Cloud, you can sign up for free trial. For instructions on how to deploy a free-trial cluster, click here. You'll be done in a few minutes, no credit card required.

Loading a sample dataset

In this tutorial, Iโ€™ll use a sample dataset that includes data published by the NYC Taxi & Limousine Commissionโ€”a reduced-size version of the NYC dataset we've used in other posts. This is a great dataset to learn Grafana, as it includes a wide variety of information while loading fast into CrateDB.

The process of importing the dataset is equivalent for both CrateDB on-prem and CrateDB Cloud. If you're using the latter, you can access the CrateDB Admin UI by clicking on "Open cluster administration" in your cluster "Overview" page:

Alt Text

You will be asked the username and password associated with your cluster (the 24-character password). If you don't remember those credentials, you can check out your username (and change your password) by clicking on "Settings".

Once you introduce your credentials, the CrateDB Admin UI will pop-up:

Alt Text

The first step to importing the data is creating a table. In the CrateDB Console, type the following code, and click on "Execute query":

CREATE TABLE "nyc_taxi_sample" (
  "congestion_surcharge" REAL,
  "dolocationid" INTEGER,
  "extra" REAL,
  "fare_amount" REAL,
  "improvement_surcharge" REAL,
  "mta_tax" REAL,
  "passenger_count" INTEGER,
  "payment_type" INTEGER,
  "pickup_datetime" TIMESTAMP WITH TIME ZONE,
  "pulocationid" INTEGER,
  "ratecodeid" INTEGER,
  "store_and_fwd_flag" TEXT,
  "tip_amount" REAL,
  "tolls_amount" REAL,
  "total_amount" REAL,
  "trip_distance" REAL,
  "vendorid" INTEGER)
WITH ("column_policy" = 'dynamic', "number_of_replicas" = '0', "refresh_interval" = 10000);
Enter fullscreen mode Exit fullscreen mode

Alt Text

Now, import the dataset. Execute this query:

COPY "nyc_taxi_sample" FROM 'https://s3.amazonaws.com/crate.sampledata/nyc.yellowcab/yc.2019.07.gz'
WITH (compression = 'gzip');
Enter fullscreen mode Exit fullscreen mode

The data is now loading into CrateDB. Click on "Tables"; you will see the table you just created ("nyc_taxi_sample") getting filled. In my case, it took less than 10 minutes to get the 6 million records.

Alt Text

Before moving on to the next section, let's explore our table. Run this simple query:

SELECT * FROM nyc_taxi_sample
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Alt Text

Scroll to the right to get familiar with the columns in the table, which display information about the NYC taxi rides. Some examples:

  • pickup_datetime: this column shows the pickup date and time for each taxi ride recorded as a timestamp
  • trip_distance: the ride's distance in miles
  • passenger_count: the number of passengers present in each ride
  • total_amount: the price of the ride, in USD

Installing Grafana

To install Grafana, follow the instructions on their website. There are multiple options; in my case, I used the standalone MacOS installation.

curl -O https://dl.grafana.com/oss/release/grafana-7.4.3.darwin-amd64.tar.gz 
tar -zxvf grafana-7.4.3.darwin-amd64.tar.gz
Enter fullscreen mode Exit fullscreen mode

If youโ€™re following this method too, once you've copied Grafana to your preferred directory, install and start its local web server with the following command:

./bin/grafana-server web
Enter fullscreen mode Exit fullscreen mode

Now, access Grafana by opening http://localhost:3000/ in your browser:

Alt Text

In the previous screen, enter "admin" in both username and password. The screen below will pop up next, where you can set up a new password.

Alt Text

You're now in!

Alt Text

Connecting Grafana to CrateDB or CrateDB Cloud

If you just installed Grafana, click on "Add your first data source".

Alt Text

If you don't see the previous screen, you can also go to "Configuration -> Data Sources" in the left menu.

Alt Text

Here, click on "Add data source":

Alt Text

Once you get to a page like this, look for PostgreSQL. Select it.

Alt Text

You will get to a configuration screen.

If you are using CrateDB on-prem, enter the following values:

  • Name: CrateDB
  • Host: localhost:5432
  • Database: doc
  • User: crate
  • Password: leave it empty
  • SSL mode: disable

Alt Text

Once you've filled it up, click on "Save and test". If everything goes well, you'll see โ€œDatabase Connection OKโ€:

Alt Text

If you are using CrateDB Cloud, the configuration will be the following:

  • Name: CrateDB Cloud
  • Host: to get this information, go to the CrateDB Cloud Console. In the cluster "Overview" page, click on "Learn how to connect to the cluster".

Alt Text

A screen like this will display:

Alt Text

To create a Grafana connection, use the psql client. In this case:

my-cluster.aks1.eastus2.azure.cratedb.net

  • Database: doc
  • User and password: these are the credentials to access your CrateDB Cloud cluster (the 24-character password). Remember that they will show under "Settings" on the cluster "Overview" page
  • SSL mode: require

Once you have everything ready, your configuration page will look like this:

Alt Text

Click on "Save and test". If everything goes well, you'll see a "Database connection OK" message pop up.

Alt Text

Building a dashboard in Grafana

Now we're ready for the final step: let's build a dashboard in Grafana.

If you're new to Grafana, you'll see the screen below. Click on "Dashboards" in the menu on the left. Then, click on "Create your first dashboard":

Alt Text

If you don't see the previous option, you can also create a dashboard under "Create -> Dashboard" on the left menu.

Alt Text

Either way, you will get to a screen like the one below.

Alt Text

In Grafana, dashboards are composed of individual blocks called panels, to which you can assign different visualization types and individual queries. Click on "Add new panel".

In the next screen, we will define the query for your panel, the type of visualization (like graphs, stats, tables, or bar charts), and the time range. I won't get into all the options in the panel editor here, but I recommend you explore the menu on the right to start getting familiar with it. Also, check out the Grafana documentation.

Alt Text

We will start by defining our query. Click on "Edit SQL":

Alt Text

A console will open up where you can write your SQL code.

Alt Text

To build this panel, let's plot the number of rides per day in the first week of July 2019.

Paste this query in the SQL editor:

SELECT date_trunc('day', pickup_datetime) AS time,
COUNT(*) AS rides
FROM nyc_taxi_sample
WHERE pickup_datetime BETWEEN '2019-07-01T00:00:00' AND '2019-07-07T23:59:59'
GROUP BY 1
ORDER BY 1;
Enter fullscreen mode Exit fullscreen mode

Note: something important to know about the "Time series" format mode in Grafana is that your query needs to return a column called "time". Grafana will identify this as your time metric, so make sure the column has the proper datatype (any datatype representing an epoch time). In this query, we're labeling pickup_datetime as "time" for this reason.

This is how it looks like:

Alt Text

Now, set up the following configuration:

  • On the top of the panel, select the appropriate time range for your panelโ€”in this case, from July 1st to July 7th, 2019:

Alt Text

  • Under "Settings" on the right, define the name of your panel.
  • Under "Visualization", select "Graph".
  • Under "Display", select "Bars".

Click on "Apply" on the right side of the screen. Once everything is ready, your graph will show up:

Alt Text

The number of rides per day fluctuates between 100k and 200k, with July 2nd being the busiest day of the week.

Click on the arrow on the top left of the screen to go back to your dashboard. It now contains its first panel:

Alt Text

Before wrapping up, let's add a couple more panels to it. For example, let's include the following information:

  • How many taxi rides took place during the week?
  • What was the average distance per ride per day?

To add more panels to your dashboard, click on "Add panel" at the top of the screen.

Now that you get the basics of panel building, I will go through these a bit quicker, describing only the key steps:

  • Panel 2: How many rides took place during the first week of July?

For building this panel, paste this query in the SQL editor:

SELECT COUNT(*) AS total_rides
FROM nyc_taxi_sample
WHERE pickup_datetime BETWEEN '2019-07-01T00:00:00' AND '2019-07-07T23:59:59';
Enter fullscreen mode Exit fullscreen mode

And under "Visualization", select "Stat" as your panel type. It will look like this:

Alt Text

  • Panel 3: What was the average distance per ride per day?

For building this panel, paste this query in the SQL editor:

SELECT
date_trunc('day', pickup_datetime) AS time,
COUNT(*) as rides,
SUM(trip_distance) as total_distance,
SUM(trip_distance) / COUNT(*) AS average_distance_per_ride
FROM nyc_taxi_sample
WHERE pickup_datetime BETWEEN '2019-07-01T00:00:00' AND '2019-07-07T23:59:59'
GROUP BY time
ORDER BY 1;
Enter fullscreen mode Exit fullscreen mode

With a little bit of formatting, and after selecting "average_distance_per_ride" at the bottom of the graph (it will show the three columns returned by the query above), this is how the panel looks like:

Alt Text

Our average distance per trip ranges around 3 miles on average.

Our dashboard now looks a little bit more complete! If you want to keep building it, check out this blogpost for inspiration, where we use SQL queries to explore the NYC taxi data. You can also customize the look of your dashboard by dragging and dropping the panels, making them larger... Go to the Grafana documentation for more.

Alt Text

Don't forget to click on the "Save dashboard" button before you go.

Alt Text


There's nothing better than trying things by yourself! Download CrateDB or sign up for a CrateDB Cloud free trial. Experiment... And tell us what you think ๐Ÿ˜

Apart from Dev.to, you can reach to the Crate.io team in:

See you around ๐Ÿ

Top comments (0)