Tencent Cloud -Cloud Log Service

Posted on Jun 10 • Edited on Jun 17

Parsing Mixed-Format Logs with CLS LogListener Pipelines

#observability #logging #cloud #devops

Mixed-format logs are hard to query when collection treats each line as one raw string. A single record can include a timestamp, delimiter-separated fields, embedded JSON, key-value text, wrapper markers, and metadata hidden in the file path. Tencent Cloud Log Service (CLS) uses LogListener composite parsing to structure those logs during collection. You define a processor pipeline, run extraction and transformation steps in order, and upload cleaner structured logs to a CLS log topic.

Best fit for LogListener composite parsing

Use composite parsing when one parsing mode is not enough for the log line.

Scenario	What happens in the raw log	Pipeline operation
Multiple formats in one line	A delimiter split reveals a field that contains JSON or key-value text	Split first, then parse the nested field
Fields need cleanup	Parsed fields include data that should not be uploaded	Drop, rename, or enrich fields before upload
A full record needs several stages	Extraction and field transformation must happen together	Chain processors in the LogListener settings

The practical result is less cleanup at query time: the log topic receives fields that are already close to the structure operators need.

Collection-time parsing flow

The workflow is a three-step pipeline:

1. Split the original log into segments.
2. Apply different processors to the segments that need additional parsing.
3. Assemble the final structured fields for upload to CLS.

For example, a log can be split into four segments, with each segment handled by a different processor before the final field set is uploaded.
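
The split, process, assemble flow can be sketched locally in Python as an ordered chain of functions. This is only an illustration of the idea, not LogListener code: `split_step`, `kv_step`, `run_pipeline`, and the `seg1`/`seg2` keys are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[Record], Record]


def split_step(record: Record) -> Record:
    """Step 1: split the raw line into named segments (hypothetical keys)."""
    out = dict(record)
    raw = out.pop("raw")
    for key, part in zip(["seg1", "seg2"], raw.split(",")):
        out[key] = part
    return out


def kv_step(record: Record) -> Record:
    """Step 2: parse only the segment that needs more work (seg2 here)."""
    out = dict(record)
    key, _, value = out.pop("seg2").partition(":")
    out[key] = value
    return out


def run_pipeline(raw_line: str, steps: List[Step]) -> Record:
    """Step 3: run the steps in order and return the assembled fields."""
    record: Record = {"raw": raw_line}
    for step in steps:
        record = step(record)
    return record


fields = run_pipeline("value1,status:DEAD", [split_step, kv_step])
print(fields)  # {'seg1': 'value1', 'status': 'DEAD'}
```

Order matters in such a chain: `kv_step` can only run after `split_step` has produced `seg2`, which mirrors why nested processors sit under the processor that creates their source field.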

Useful LogListener processors for mixed logs

LogListener processors fall into two groups: processors that extract fields and processors that post-process fields that have already been extracted.

Function	Processor	Use case
Extract fields	`processor_log_string`	Parse fields by multi-character markers (such as line breaks), typically for single-line logs
Extract fields	`processor_multiline`	Parse multiline logs with first-line regex
Extract fields	`processor_multiline_fullregex`	Combine first-line regex with full multiline regex extraction
Extract fields	`processor_fullregex`	Extract fields from a single-line value with regex
Extract fields	`processor_json`	Expand a field value as JSON
Extract fields	`processor_split_delimiter`	Split fields by delimiter characters
Extract fields	`processor_split_key_value`	Extract key-value pairs
Process fields	`processor_drop`	Remove selected fields
Process fields	`processor_timeformat`	Parse a source time field and set the log timestamp
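
Because processors can be nested, a misspelled `type` deep in a configuration is easy to miss. The following is a minimal Python sketch, assuming the configuration shape used in this article (`processors`, `type`, nested `processors`), that walks a pipeline and reports type names not mentioned here. `KNOWN_TYPES`, `walk`, and `unknown_types` are hypothetical helpers, and the list covers only the processors named in this article, so LogListener may accept types that this check would flag.

```python
import json
from typing import Iterator, List, Tuple

# Processor types named in this article (the table above plus meta_processor).
KNOWN_TYPES = {
    "processor_log_string", "processor_multiline",
    "processor_multiline_fullregex", "processor_fullregex",
    "processor_json", "processor_split_delimiter",
    "processor_split_key_value", "processor_drop",
    "processor_timeformat", "meta_processor",
}


def walk(processors: List[dict], depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (nesting depth, type) for every processor, parents first."""
    for proc in processors:
        yield depth, proc.get("type", "<missing type>")
        yield from walk(proc.get("processors", []), depth + 1)


def unknown_types(config_text: str) -> List[str]:
    """Return processor types that are not in KNOWN_TYPES."""
    config = json.loads(config_text)
    return [t for _, t in walk(config.get("processors", [])) if t not in KNOWN_TYPES]


# Example config with a deliberately misspelled nested type.
config_text = """
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {"Delimiter": ",", "ExtractKeys": ["time", "msg1"]},
      "processors": [
        {"type": "processor_jsno", "detail": {"SourceKey": "msg1"}}
      ]
    }
  ]
}
"""
print(unknown_types(config_text))  # ['processor_jsno']
```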

Dropping unneeded fields before upload

Use `processor_drop` when a parsed log contains fields that do not need to be uploaded.

Input:

key1:value1
key2:value2
key3:value3

Goal:

key2:value2

Configuration:

{
  "processors": [
    {
      "type": "processor_drop",
      "detail": {
        "SourceKey": ["key1", "key3"]
      }
    }
  ]
}

Problem -> operation -> result:

Problem	Operation	Result
The log contains fields that are not important for search or analysis	Drop `key1` and `key3`	Only `key2:value2` is uploaded
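
The effect of the drop step can be reproduced locally in a few lines of Python. This is a sketch of the behavior described above, not LogListener's implementation; `drop_fields` is a hypothetical helper name.

```python
from typing import Dict, Iterable


def drop_fields(record: Dict[str, str], keys: Iterable[str]) -> Dict[str, str]:
    """Return a copy of the record without the listed keys (missing keys are ignored)."""
    unwanted = set(keys)
    return {k: v for k, v in record.items() if k not in unwanted}


record = {"key1": "value1", "key2": "value2", "key3": "value3"}
print(drop_fields(record, ["key1", "key3"]))  # {'key2': 'value2'}
```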

File-path metadata enrichment

Sometimes the log body lacks context that exists in the file path. For example, a path can include application, version, directory, and log name details.

Input log:

value1,value2

File path:

/usr/local/loglistener-2.7.4/testdir/test.log

Goal:

msg1:value1
msg2:value2
__TAG__.app: loglistener
__TAG__.ver: 2.7.4
__TAG__.logname: test
__TAG__.logdir: testdir

Configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["msg1", "msg2"]
      }
    },
    {
      "type": "meta_processor",
      "detail": {
        "ExtractKeys": ["FILENAME"]
      },
      "processors": [
        {
          "type": "processor_fullregex",
          "detail": {
            "KeepSource": false,
            "SourceKey": "FILENAME",
            "ExtractRegex": "/\\w+/\\w+/(\\w+)-([^/]+)/(\\w+)/(\\w+).*",
            "ExtractKeys": ["app", "ver", "logdir", "logname"]
          }
        }
      ]
    }
  ]
}

The `meta_processor` step requires LogListener 2.7.4 or later, so check the installed version before relying on file-path metadata.
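
Before deploying a path regex, it can help to try it against a sample path locally. The sketch below runs the same pattern with Python's `re` module; LogListener's regex engine may differ from Python's in details, so treat this as a sanity check rather than a guarantee. `tags_from_path` is a hypothetical helper, and the `__TAG__.` prefix shown in the goal is not reproduced here.

```python
import re

# Same pattern as ExtractRegex above, after JSON unescaping (\\w becomes \w).
EXTRACT_REGEX = r"/\w+/\w+/(\w+)-([^/]+)/(\w+)/(\w+).*"
EXTRACT_KEYS = ["app", "ver", "logdir", "logname"]


def tags_from_path(path: str) -> dict:
    """Map capture groups to keys; return {} if the path does not match."""
    match = re.match(EXTRACT_REGEX, path)
    if match is None:
        return {}
    return dict(zip(EXTRACT_KEYS, match.groups()))


tags = tags_from_path("/usr/local/loglistener-2.7.4/testdir/test.log")
print(tags)
# {'app': 'loglistener', 'ver': '2.7.4', 'logdir': 'testdir', 'logname': 'test'}
```

A path with a different directory depth, such as `/var/log/app.log`, does not match this pattern, so the regex has to be adapted to the actual layout of the collected paths.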

Nested parsing for delimiter, timestamp, and key-value fields

Use nested processors when one delimiter split exposes fields that need their own parsing.

Input:

1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,

Goal:

time: 1571394459
submsg1: http://127.0.0.1/my/course/4
submsg2: 10.135.46.111
submsg3: 200
status: DEAD

Configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["time", "msg1", "msg2"]
      },
      "processors": [
        {
          "type": "processor_timeformat",
          "detail": {
            "KeepSource": true,
            "TimeFormat": "%s",
            "SourceKey": "time"
          }
        },
        {
          "type": "processor_split_delimiter",
          "detail": {
            "KeepSource": false,
            "Delimiter": "|",
            "SourceKey": "msg1",
            "ExtractKeys": ["submsg1", "submsg2", "submsg3"]
          }
        },
        {
          "type": "processor_split_key_value",
          "detail": {
            "KeepSource": false,
            "Delimiter": ":",
            "SourceKey": "msg2"
          }
        }
      ]
    }
  ]
}

This pipeline first splits the line by comma, then parses `time` as a Unix timestamp, splits `msg1` by `|`, and extracts `status` from `msg2`.
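
The same stages can be walked through in plain Python to check the expected field set before configuring the collector. This is a local simulation under the assumption that each stage behaves as described above; it does not call LogListener, and the variable names are illustrative.

```python
from datetime import datetime, timezone

line = "1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,"

# Stage 1: split by comma; zip() stops after three keys, so the empty
# tail after the last comma is ignored.
record = dict(zip(["time", "msg1", "msg2"], line.split(",")))

# Stage 2a: interpret "time" as a Unix timestamp (what %s denotes); the field stays.
log_time = datetime.fromtimestamp(int(record["time"]), tz=timezone.utc)

# Stage 2b: split msg1 by "|" and discard the source field (KeepSource: false).
record.update(zip(["submsg1", "submsg2", "submsg3"], record.pop("msg1").split("|")))

# Stage 2c: turn msg2 ("status:DEAD") into a key-value field and discard msg2.
key, _, value = record.pop("msg2").partition(":")
record[key] = value

print(log_time.isoformat())  # 2019-10-18T10:27:39+00:00
print(record)
```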

Wrapper removal and embedded JSON expansion

Consider an access log wrapped with markers:

2016-01-02 12:59:59/log_start/{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}/log_end/

Goal:

- Keep the timestamp.
- Remove `log_start`, `log_end`, and the empty tail field.
- Expand the JSON object into searchable CLS fields.

Configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "KeepSource": false,
        "Delimiter": "/",
        "ExtractKeys": [
          "time",
          "msg2",
          "msg3",
          "msg4",
          "msg5"
        ]
      },
      "processors": [
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg2"
          }
        },
        {
          "type": "processor_json",
          "detail": {
            "KeepSource": false,
            "SourceKey": "msg3"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg4"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg5"
          }
        }
      ]
    }
  ]
}

Structured output:

time: 2016-01-02 12:59:59
agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0
body_sent: 23
http_host: 127.0.0.1
method: POST
referer: http://127.0.0.1/my/course/4
remote_ip: 10.135.46.111
request: POST /event/dispatch HTTP/1.1
response_code: 200
responsetime: 0.232
upstreamhost: unix:/tmp/php-cgi.sock
upstreamtime: 0.232
url: /event/dispatch
xff: -
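
One detail worth testing with a real sample: the `/` delimiter also appears inside the JSON payload (for example in `url` and `upstreamhost`). The source does not say how LogListener treats a delimiter that occurs inside a field, so the sketch below only shows what Python does. A plain split on every `/` produces more than five segments, while splitting on the wrapper markers keeps the JSON intact. The sample line is abbreviated, and the marker-based approach is an illustration, not the article's configuration.

```python
import json

# Abbreviated version of the sample line; the JSON values still contain "/".
line = (
    '2016-01-02 12:59:59/log_start/'
    '{"remote_ip":"10.135.46.111","body_sent":23,'
    '"upstreamhost":"unix:/tmp/php-cgi.sock","url":"/event/dispatch"}'
    '/log_end/'
)

# A plain split on every "/" yields more than the five expected segments.
naive_parts = line.split("/")
print(len(naive_parts))

# Splitting on the wrapper markers instead keeps the JSON payload intact.
time_part, _, rest = line.partition("/log_start/")
payload, _, _tail = rest.rpartition("/log_end/")

record = {"time": time_part}
record.update(json.loads(payload))  # expand JSON keys into top-level fields
print(record)
```

If the collector turns out to split inside the payload as well, choosing a wrapper delimiter that cannot occur in the JSON body avoids the problem at the source.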

Common pitfalls when building LogListener parsing pipelines

- Parsing JSON before isolating the JSON field.
- Keeping wrapper fields such as `log_start` and `log_end` even though they have no search value.
- Forgetting `KeepSource: false` when the raw field should not remain in the final log body.
- Using one parser for a log line that clearly contains several nested formats.
- Using metadata extraction without checking the LogListener version.

FAQ

When should I use `processor_json` in a LogListener pipeline?

Use `processor_json` after the JSON content has been isolated into a field. In the wrapped access-log example, the pipeline first splits the record by `/`, then applies JSON parsing to `msg3`.

How do I remove fields from logs before they reach CLS?

Use `processor_drop` and point its `SourceKey` at the field that should not be uploaded. This is useful for wrapper markers, empty fields, and values that do not help search or analysis.

Can LogListener parse timestamps during collection?

Yes. Use `processor_timeformat` with the source time field and the expected time format, such as `%s` for a Unix timestamp.

How can I add file path context to logs?

Use `meta_processor` to extract `FILENAME`, then run `processor_fullregex` on it to populate fields such as `app`, `ver`, `logdir`, and `logname`.

Final checklist

- Identify the first split that separates the main log parts.
- Apply nested processors only to the fields that need them.
- Drop wrapper or empty fields before upload.
- Parse time fields with the expected format.
- Expand JSON only after isolating the JSON field.
- Verify the final CLS log record contains searchable fields instead of raw mixed-format text.
