Tencent Cloud -Cloud Log Service

Posted on Jun 10

Migrate Grafana Dashboards from Elasticsearch to Tencent Cloud CLS

#grafana #logging #devops #observability

Grafana dashboards often outlive the storage system behind them. If an operations team moves logs from Elasticsearch to Tencent Cloud CLS, the expensive part is not only data migration. Existing panels, variables, operational tools, and habits can also break unless Grafana can query CLS directly.

The source article describes the official Tencent Cloud Monitor Grafana plugin and shows how CLS can replace an Elasticsearch log data source while keeping the dashboard layer in Grafana.

Install the signed Grafana plugin

The CLS data source is maintained by the Tencent Cloud Log Service team and is signed in the official Grafana plugin catalog. The source article states that the setup can be completed in about five minutes and does not require code changes.

Open Grafana Configuration -> Plugins.
Search for Tencent Cloud Monitor.
Install the plugin.
Return to the plugin page and enable it.

The official plugin page referenced by the source article is:

https://grafana.com/grafana/plugins/tencentcloud-monitor-app/

Configure CLS as a data source

After the plugin is enabled:

Open Data Sources.
Click Add data source.
Select Tencent Cloud Monitor.
Fill in the data source name.
Provide Tencent Cloud access credentials.
Select Log Service.
Save the data source.

How the query model maps from Elasticsearch to CLS

The source article compares the two query editors:

Elasticsearch data source	CLS data source
Top query input accepts a Lucene statement for log filtering	Region and log topic are selected first
Auxiliary input areas generate DSL for aggregation	Search-analysis query input accepts CLS Lucene plus SQL
Aggregation is configured through panel fields	Lucene and SQL are separated by the pipe character `|`

In CLS, the Lucene part filters logs and the SQL part performs analysis. The source article points to the CLS search-analysis syntax documentation:

https://cloud.tencent.com/document/product/614/47044
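The split can be made concrete with a small Python sketch that assembles a search-analysis statement from its two halves. `build_cls_query` is a hypothetical helper, not part of the plugin or any Tencent Cloud SDK; it only manipulates strings, and treating an empty filter as `*` is this sketch's own choice.

```python
from typing import Optional


def build_cls_query(lucene: str, sql: Optional[str] = None) -> str:
    """Join a Lucene filter and an optional SQL analysis with the CLS pipe.

    Hypothetical helper for illustration: CLS only ever sees the final string.
    """
    # This sketch treats an empty filter as "*" (match all logs), as in the Top 5 example.
    lucene = lucene.strip() or "*"
    if not sql:
        # Raw-log panels need only the Lucene part.
        return lucene
    # Analysis panels: Lucene filters first, then SQL analyzes the filtered logs.
    return f"{lucene} | {sql.strip()}"


print(build_cls_query("returnCode:500"))
print(build_cls_query("urlPath:$path", "select count(*) as count"))
```

Keeping the two halves separate in code mirrors the editor model: the filter can be reused unchanged while only the SQL differs from panel to panel.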

Query pattern 1: log count over time

For a time-series count panel, the Elasticsearch version uses Metric = Count and Group By = Histogram. In CLS, the same idea is expressed in the SQL part, with the histogram() function producing the time axis and count(*) as the aggregate.

The article notes that common aggregation functions such as Max, Min, and Distinct can be used in the same pattern by replacing count.
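As a rough illustration, the sketch below generates a count-over-time query in the same shape as the histogram queries used later in this post, with the aggregate as a parameter. `histogram_query` and the `value` alias are hypothetical; `timeCost` is borrowed from the latency example. The article does not give the exact CLS syntax for its Distinct variant, so only `count(*)` and `max(...)` are shown.

```python
def histogram_query(aggregate: str = "count(*)", alias: str = "value",
                    interval: str = "1 minute", lucene: str = "*") -> str:
    """Build a time-bucketed CLS query in the shape of the article's examples.

    Changing `aggregate` (for example to "max(timeCost)") plays the role of
    switching the Elasticsearch Metric from Count to Max.
    """
    return (
        f"{lucene} | select "
        f"histogram(cast(__TIMESTAMP__ as timestamp), interval {interval}) as analytic_time, "
        f"{aggregate} as {alias} "
        "group by analytic_time "
        "order by analytic_time "
        "limit 1000"
    )


print(histogram_query())
print(histogram_query("max(timeCost)", alias="max_cost"))
```

Only the aggregate expression changes between the Count and Max versions; the time bucketing, grouping, and ordering stay the same, which is why the panel itself does not need to be rebuilt.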

Query pattern 2: raw log viewing

For raw logs, Elasticsearch uses a Logs metric mode. The CLS data source only needs the corresponding Lucene statement.
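A raw-log panel therefore reduces to building the filter. The following minimal sketch ANDs field conditions together in the style of the filters used later in this post (`urlPath:$path AND region:$region ...`). `lucene_filter` is a hypothetical helper, and it deliberately does no escaping.

```python
from typing import Dict


def lucene_filter(terms: Dict[str, str]) -> str:
    """AND together field:value pairs, in the style of the article's filters.

    No escaping is done; values with spaces or Lucene special characters
    would need quoting in a real query.
    """
    if not terms:
        return "*"  # no conditions: match every log
    return " AND ".join(f"{field}:{value}" for field, value in terms.items())


print(lucene_filter({"urlPath": "$path", "region": "$region"}))
```

Leaving the Grafana variables (`$path`, `$region`) in the generated string is intentional: Grafana substitutes them before the query reaches the CLS data source.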

Query pattern 3: error-code aggregation

The source article aggregates logs by error code and shows that Grafana variables such as $path can be used directly in CLS data source queries.

When drawing the pie chart, the article specifically notes that in the panel options on the right, Value options should be set to All values.
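To show what "using $path directly" means, here is a greatly simplified Python model of variable substitution applied to a hypothetical error-code query. The query text is an assumption built from field names that appear in the latency example (`urlPath`, `returnCode`); the source article's exact SQL may differ. Python's `string.Template` only approximates Grafana's interpolation, which also handles multi-value formatting.

```python
from string import Template
from typing import Dict

# Hypothetical error-code query: the field names come from the article's
# latency example (urlPath, returnCode); the source's exact SQL may differ.
ERROR_CODE_QUERY = (
    "urlPath:$path | select returnCode, count(*) as count "
    "group by returnCode order by count(*) desc"
)


def interpolate(query: str, variables: Dict[str, str]) -> str:
    """Greatly simplified model of Grafana variable substitution.

    Grafana does the real interpolation before the query reaches the CLS
    data source. This only replaces $name tokens and leaves unknown ones,
    such as $__interval, untouched.
    """
    return Template(query).safe_substitute(variables)


# The value is quoted on the assumption that a bare "/" could be read as Lucene syntax.
print(interpolate(ERROR_CODE_QUERY, {"path": '"/api/order"'}))
```

The grouped `returnCode`/`count` rows are what the pie chart consumes, one slice per row, which is why the panel needs All values rather than a single reduced value.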

Query pattern 4: Top 5 request trends

Elasticsearch can use a Group By size value to keep the most frequent fields. The CLS source example uses having with a nested subquery:

* | select histogram(cast(__TIMESTAMP__ as timestamp), interval 1 hour) as analytic_time,
  "action",
  count(*) as count
group by analytic_time, "action"
having "action" in (
  select action group by action order by count(*) desc limit 5
)
order by analytic_time limit 1000

The result is a five-line trend chart.
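The nested query does two passes: the subquery ranks actions by total count, and the outer query keeps per-hour counts only for the winners. A local Python model can make that logic explicit. It is a sketch under the assumption that each log line is reduced to an `(hour, action)` pair; the input shape and action names are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def top_n_trends(rows: Iterable[Tuple[int, str]], n: int = 5) -> Dict[str, Dict[int, int]]:
    """Local model of the HAVING + subquery Top N pattern.

    rows: one (hour, action) pair per log line (a hypothetical input shape).
    Returns per-hour counts for the n most frequent actions overall.
    """
    rows = list(rows)
    # Subquery: select action group by action order by count(*) desc limit n
    totals = Counter(action for _, action in rows)
    top = {action for action, _ in totals.most_common(n)}
    # Outer query: group by analytic_time, action ... having action in (top)
    series: Dict[str, Counter] = defaultdict(Counter)
    for hour, action in rows:
        if action in top:
            series[action][hour] += 1
    return {action: dict(sorted(hours.items())) for action, hours in series.items()}


rows = [(0, "a"), (0, "a"), (1, "a"), (0, "b"), (1, "b"), (1, "c")]
print(top_n_trends(rows, n=2))  # {'a': {0: 2, 1: 1}, 'b': {0: 1, 1: 1}}
```

One difference worth knowing: `Counter.most_common` breaks ties at the cutoff by first appearance, while a SQL `limit` over tied counts gives no such guarantee.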

Query pattern 5: latency buckets

The source article also gives a widely applicable panel for request latency bands. It counts requests in several latency ranges with a single CLS SQL statement:

urlPath:$path AND region:$region AND action:$action AND returnCode:$returnCode
| select
  histogram(cast(__TIMESTAMP__ as timestamp), interval 1 minute) as analytic_time,
  count_if(timeCost <= 500) as "0~500ms",
  count_if(500 < timeCost and timeCost <= 2000) as "500ms~2s",
  count_if(2000 < timeCost and timeCost <= 5000) as "2s~5s",
  count_if(5000 < timeCost) as "over 5s"
group by analytic_time
order by analytic_time
limit 1000
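Because each count_if column is evaluated independently, a gap or overlap between conditions silently drops or double-counts requests. A quick local check of the boundaries helps before pasting them into a panel. This minimal Python sketch assumes `timeCost` is in milliseconds (the aliases suggest so), uses band edges at 500 ms, 2 s, and 5 s as the aliases describe, and labels the last band `over 5s`; the sample values are made up.

```python
from typing import Dict, Iterable


def latency_buckets(time_costs_ms: Iterable[float]) -> Dict[str, int]:
    """Count requests per latency band: <=500 ms, <=2 s, <=5 s, and above.

    Each request lands in exactly one band, mirroring four count_if columns
    whose conditions neither overlap nor leave gaps.
    """
    counts = {"0~500ms": 0, "500ms~2s": 0, "2s~5s": 0, "over 5s": 0}
    for t in time_costs_ms:
        if t <= 500:
            counts["0~500ms"] += 1
        elif t <= 2000:
            counts["500ms~2s"] += 1
        elif t <= 5000:
            counts["2s~5s"] += 1
        else:
            counts["over 5s"] += 1
    return counts


sample = [120, 500, 501, 2000, 4999, 5000, 5001]
result = latency_buckets(sample)
print(result)  # {'0~500ms': 2, '500ms~2s': 2, '2s~5s': 2, 'over 5s': 1}
# Sanity check: the bands partition the sample, so the counts add up.
assert sum(result.values()) == len(sample)
```

Testing the exact edge values (500, 2000, 5000) is the point of the sample: each must land in exactly one band.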

The same source article then shows a percentile-oriented query with approx_percentile:

urlPath:$path AND region:$region AND action:$action AND returnCode:$returnCode
| select
  time_series(__TIMESTAMP__, '$__interval', '%Y-%m-%dT%H:%i:%s+08:00', '0') as time,
  avg(timeCost) as avg,
  approx_percentile(timeCost, 0.50) as P50,
  approx_percentile(timeCost, 0.90) as P90,
  approx_percentile(timeCost, 0.95) as P95
group by time
order by time
limit 10000
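In this query, `time_series` buckets by Grafana's `$__interval`; as we read the CLS documentation, its last argument `'0'` pads empty windows with 0. `approx_percentile` returns an approximate value. To sanity-check what P50/P90/P95 should look like, the Python sketch below computes exact values on a small local sample with the standard `statistics` module. The sample is made up, and real panel values can differ slightly because of the approximation.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(time_costs_ms: Sequence[float]) -> Dict[str, float]:
    """Exact avg/P50/P90/P95 for a small local sample.

    CLS approx_percentile is approximate, so values from a real panel can
    differ slightly from these exact cut points.
    """
    # 99 cut points; index k corresponds to the (k + 1)th percentile.
    cuts = statistics.quantiles(time_costs_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(time_costs_ms),
        "P50": cuts[49],
        "P90": cuts[89],
        "P95": cuts[94],
    }


print(latency_summary(list(range(0, 101))))
```

Plotting avg next to the percentiles, as the article does, is useful because a stable average can hide a growing P95 tail.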

Query pattern 6: migrate Grafana query variables

Constant variables and textbox variables behave the same across data sources. The source article focuses on Query-type variables.

For the $action variable, the Elasticsearch version retrieves the values of the action field with a DSL query. In CLS, the user selects Log Service and the target log topic, then enters the corresponding SQL statement.

The article also shows a Cloud Monitor resource query for listing log topics:

Namespace=QCE/CLS&Action=DescribeInstances&Region=$region&display=${TopicName}/${TopicId}
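The resource query is a query string of `key=value` pairs, and `display` is a template for each option's label. The Python sketch below only illustrates that structure: it splits the string with `urllib.parse.parse_qsl` and renders the label for one topic. The plugin's own parsing may differ, and the topic name and ID are hypothetical.

```python
from string import Template
from typing import Dict
from urllib.parse import parse_qsl

RESOURCE_QUERY = (
    "Namespace=QCE/CLS&Action=DescribeInstances"
    "&Region=$region&display=${TopicName}/${TopicId}"
)


def describe_resource_query(query: str) -> Dict[str, str]:
    """Split the variable query into its key/value parts (illustration only)."""
    return dict(parse_qsl(query, keep_blank_values=True))


def option_label(display_template: str, topic: Dict[str, str]) -> str:
    """Render the display template for one topic record."""
    return Template(display_template).safe_substitute(topic)


parts = describe_resource_query(RESOURCE_QUERY)
print(parts["Action"])  # DescribeInstances
# Hypothetical topic record, used only to show how a label is formed.
print(option_label(parts["display"], {"TopicName": "nginx-access", "TopicId": "topic-123"}))
```

Showing both name and ID in the label keeps options readable while still distinguishing topics that share a name across regions.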

Query pattern 7: combine log topics across regions

Some teams previously stored all data in one Elasticsearch instance but create a separate CLS log topic per region after migration. The source article shows how to query multiple regions and then combine the results with a Grafana Transform.
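The article does not say which transform it uses. As a rough local model of one common choice, an outer join on the time field, the Python sketch below merges per-region series into one table. Region IDs and values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def join_by_time(
    series_by_region: Dict[str, Sequence[Tuple[int, float]]],
) -> List[Dict[str, Optional[float]]]:
    """Outer-join per-region (time, value) series into rows keyed by time.

    Regions with no point at a timestamp get None, like an empty cell in an
    outer join.
    """
    regions = list(series_by_region)
    table: Dict[int, Dict[str, float]] = defaultdict(dict)
    for region, points in series_by_region.items():
        for ts, value in points:
            table[ts][region] = value
    return [
        {"time": ts, **{r: table[ts].get(r) for r in regions}}
        for ts in sorted(table)
    ]


data = {"ap-guangzhou": [(1, 10), (2, 12)], "ap-shanghai": [(2, 7), (3, 5)]}
for row in join_by_time(data):
    print(row)
```

An outer join keeps timestamps that only one region reported, so a quiet region shows up as a gap instead of hiding the other region's data.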

Migration checklist

Install and enable the signed Tencent Cloud Monitor plugin.
Add CLS as a Grafana data source with Tencent Cloud credentials and Log Service enabled.
Translate Elasticsearch Lucene filters to CLS Lucene filters.
Move aggregations into the SQL section after the | pipe.
Recreate dashboards by mapping count, raw logs, error-code aggregation, Top N trends, latency buckets, percentiles, variables, and multi-region transforms.
Keep existing Grafana panels when the visualization intent is still valid; change the query layer instead of rebuilding the whole dashboard stack.

DEV Community