Tencent Cloud -Cloud Log Service


Parsing Mixed-Format Logs with CLS LogListener Pipelines

Mixed-format logs are hard to query when collection treats each line as one raw string. A single record can include a timestamp, delimiter-separated fields, embedded JSON, key-value text, wrapper markers, and metadata hidden in the file path. Tencent Cloud Log Service (CLS) uses LogListener composite parsing to structure those logs during collection. You define a processor pipeline, run extraction and transformation steps in order, and upload cleaner structured logs to a CLS log topic.

Best fit for LogListener composite parsing

Use composite parsing when one parsing mode is not enough for the log line.

| Scenario | What happens in the raw log | Pipeline operation |
| --- | --- | --- |
| Multiple formats in one line | A delimiter split reveals a field that contains JSON or key-value text | Split first, then parse the nested field |
| Fields need cleanup | Parsed fields include data that should not be uploaded | Drop, rename, or enrich fields before upload |
| A full record needs several stages | Extraction and field transformation must happen together | Chain processors in the LogListener settings |

The practical result is less cleanup at query time. The log topic receives fields that are already close to the structure operators need.

Collection-time parsing flow

The workflow is a three-step pipeline:

  1. Split the original log into segments.
  2. Apply different processors to the segments that need additional parsing.
  3. Assemble the final structured fields for upload to CLS.

For example, a log can be split into four sections, with each section processed by a different plugin before the final field set is uploaded.
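The three steps can be sketched locally in Python to reason about processor order before touching a real collection configuration. This is only an illustration of the chaining idea, not LogListener's implementation: the function names, the `raw` field name, and the uppercase step are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# A processor takes the current field map and returns an updated copy.
Processor = Callable[[Dict[str, str]], Dict[str, str]]


def split_by_delimiter(source_key: str, delimiter: str, keys: List[str]) -> Processor:
    """Build a processor that splits one field into named segments."""
    def run(fields: Dict[str, str]) -> Dict[str, str]:
        out = dict(fields)
        parts = out.pop(source_key).split(delimiter)
        out.update(zip(keys, parts))  # segments beyond `keys` are ignored
        return out
    return run


def uppercase(source_key: str) -> Processor:
    """Stand-in for any per-segment processor; it only uppercases one value."""
    def run(fields: Dict[str, str]) -> Dict[str, str]:
        out = dict(fields)
        out[source_key] = out[source_key].upper()
        return out
    return run


def run_pipeline(raw_line: str, processors: List[Processor]) -> Dict[str, str]:
    """Apply processors in order, starting from a single raw field."""
    fields = {"raw": raw_line}  # "raw" is a placeholder name for this sketch
    for processor in processors:
        fields = processor(fields)
    return fields


result = run_pipeline(
    "a,b,c,d",
    [
        split_by_delimiter("raw", ",", ["s1", "s2", "s3", "s4"]),  # step 1: split
        uppercase("s2"),                                            # step 2: process one segment
    ],
)
print(result)  # step 3: the assembled field set
```

Each processor sees the output of the one before it, so a step that parses a segment has to come after the split that creates that segment.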

Useful LogListener processors for mixed logs

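LogListener processors fall into two groups: extraction processors, which pull fields out of the raw text, and field-processing processors, which modify fields that have already been extracted.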

| Function | Processor | Use case |
| --- | --- | --- |
| Extract fields | processor_log_string | Parse advanced single-line logs with multiple characters |
| Extract fields | processor_multiline | Parse multiline logs with first-line regex |
| Extract fields | processor_multiline_fullregex | Combine first-line regex with full multiline regex extraction |
| Extract fields | processor_fullregex | Extract fields from a single-line value with regex |
| Extract fields | processor_json | Expand a field value as JSON |
| Extract fields | processor_split_delimiter | Split fields by delimiter characters |
| Extract fields | processor_split_key_value | Extract key-value pairs |
| Process fields | processor_drop | Remove selected fields |
| Process fields | processor_timeformat | Parse a source time field and set the log timestamp |

Dropping unneeded fields before upload

Use processor_drop when a parsed log contains fields that do not need to be reported.

Input:

key1:value1
key2:value2
key3:value3

Goal:

key2:value2

Configuration:

{
  "processors": [
    {
      "type": "processor_drop",
      "detail": {
        "SourceKey": ["key1", "key3"]
      }
    }
  ]
}

Problem -> operation -> result:

| Problem | Operation | Result |
| --- | --- | --- |
| The log contains fields that are not important for search or analysis | Drop key1 and key3 | Only key2:value2 is uploaded |
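To confirm which fields survive a drop step, the same filtering can be reproduced in a few lines of Python. The helper name `drop_fields` is hypothetical, and the sketch only mirrors the intended effect of the configuration above, not how LogListener applies it.

```python
def drop_fields(fields, keys_to_drop):
    """Return a copy of `fields` without the listed keys."""
    drop = set(keys_to_drop)
    return {key: value for key, value in fields.items() if key not in drop}


parsed = {"key1": "value1", "key2": "value2", "key3": "value3"}
uploaded = drop_fields(parsed, ["key1", "key3"])
print(uploaded)
```

Keys that are listed but absent from the record are simply ignored in this sketch.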

File-path metadata enrichment

Sometimes the log body lacks context that exists in the file path. For example, a path can include application, version, directory, and log name details.

Input log:

value1,value2

File path:

/usr/local/loglistener-2.7.4/testdir/test.log

Goal:

msg1:value1
msg2:value2
__TAG__.app: loglistener
__TAG__.ver: 2.7.4
__TAG__.logname: test
__TAG__.logdir: testdir

Configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["msg1", "msg2"]
      }
    },
    {
      "type": "meta_processor",
      "detail": {
        "ExtractKeys": ["FILENAME"]
      },
      "processors": [
        {
          "type": "processor_fullregex",
          "detail": {
            "KeepSource": false,
            "SourceKey": "FILENAME",
            "ExtractRegex": "/\\w+/\\w+/(\\w+)-([^/]+)/(\\w+)/(\\w+).*",
            "ExtractKeys": ["app", "ver", "logdir", "logname"]
          }
        }
      ]
    }
  ]
}

The metadata processor requires LogListener 2.7.4 or later in this workflow.
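Before deploying a path regex, it can help to test it against a sample path locally. The sketch below uses Python's `re` module with the same pattern as `ExtractRegex` above, with the JSON escaping removed. Two assumptions apply: that the pattern must match the whole path (hence `fullmatch`), and that the `__TAG__.` prefix is added to each key as in the goal output. The function name `extract_path_tags` is hypothetical.

```python
import re

# Same pattern as "ExtractRegex" in the configuration, without JSON escaping.
PATH_PATTERN = re.compile(r"/\w+/\w+/(\w+)-([^/]+)/(\w+)/(\w+).*")
KEYS = ["app", "ver", "logdir", "logname"]


def extract_path_tags(path):
    """Map the regex capture groups to tag fields; empty dict when no match."""
    match = PATH_PATTERN.fullmatch(path)
    if match is None:
        return {}
    return {f"__TAG__.{key}": value for key, value in zip(KEYS, match.groups())}


tags = extract_path_tags("/usr/local/loglistener-2.7.4/testdir/test.log")
for name, value in tags.items():
    print(f"{name}: {value}")
```

Because the pattern hard-codes two leading directories, a path at a different depth, such as `/var/log/app.log`, does not match and yields no tags. That is worth checking whenever logs are collected from more than one directory layout.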

Nested parsing for delimiter, timestamp, and key-value fields

Use nested processors when one delimiter split exposes fields that need their own parsing.

Input:

1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,

Goal:

time: 1571394459
submsg1: http://127.0.0.1/my/course/4
submsg2: 10.135.46.111
submsg3: 200
status: DEAD

Configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["time", "msg1", "msg2"]
      },
      "processors": [
        {
          "type": "processor_timeformat",
          "detail": {
            "KeepSource": true,
            "TimeFormat": "%s",
            "SourceKey": "time"
          }
        },
        {
          "type": "processor_split_delimiter",
          "detail": {
            "KeepSource": false,
            "Delimiter": "|",
            "SourceKey": "msg1",
            "ExtractKeys": ["submsg1", "submsg2", "submsg3"]
          }
        },
        {
          "type": "processor_split_key_value",
          "detail": {
            "KeepSource": false,
            "Delimiter": ":",
            "SourceKey": "msg2"
          }
        }
      ]
    }
  ]
}

This pipeline first splits the line by comma, parses time, splits msg1 by |, and extracts status from msg2.
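The same stages can be traced in Python to verify the expected field set for the sample line. This is a local simulation under stated assumptions, not LogListener code: the function name `parse_nested` is hypothetical, the key-value step splits on the first `:` only, and `%s` is treated as seconds since the Unix epoch, interpreted here in UTC.

```python
from datetime import datetime, timezone


def parse_nested(line):
    """Simulate the comma split and the three nested processors."""
    # Stage 1: split on "," and keep three segments, mirroring ExtractKeys.
    # The trailing comma in the sample produces an empty fourth segment,
    # which is discarded here.
    time_raw, msg1, msg2 = line.split(",")[:3]
    fields = {"time": time_raw}  # KeepSource is true for the time field

    # Stage 2a: interpret the epoch seconds as the log timestamp.
    timestamp = datetime.fromtimestamp(int(time_raw), tz=timezone.utc)

    # Stage 2b: split msg1 on "|" into three sub-fields; msg1 itself is not kept.
    fields.update(zip(["submsg1", "submsg2", "submsg3"], msg1.split("|")))

    # Stage 2c: turn "status:DEAD" into a key-value pair; msg2 is not kept.
    key, _, value = msg2.partition(":")
    fields[key] = value
    return fields, timestamp


sample = "1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,"
fields, timestamp = parse_nested(sample)
print(fields)
print(timestamp.isoformat())
```

The URL in msg1 also contains `:` characters, which is why the key-value step has to run on msg2 only, after the comma split has separated the two.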

Wrapper removal and embedded JSON expansion

Consider an access log wrapped with markers:

2016-01-02 12:59:59/log_start/{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}/log_end/

Goal:

  • Keep the timestamp.
  • Remove log_start, log_end, and the empty tail field.
  • Expand the JSON object into searchable CLS fields.

Configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "KeepSource": false,
        "Delimiter": "/",
        "ExtractKeys": [
          "time",
          "msg2",
          "msg3",
          "msg4",
          "msg5"
        ]
      },
      "processors": [
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg2"
          }
        },
        {
          "type": "processor_json",
          "detail": {
            "KeepSource": false,
            "SourceKey": "msg3"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg4"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg5"
          }
        }
      ]
    }
  ]
}

Structured output:

time: 2016-01-02 12:59:59
agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0
body_sent: 23
http_host: 127.0.0.1
method: POST
referer: http://127.0.0.1/my/course/4
remote_ip: 10.135.46.111
request: POST /event/dispatch HTTP/1.1
response_code: 200
responsetime: 0.232
upstreamhost: unix:/tmp/php-cgi.sock
upstreamtime: 0.232
url: /event/dispatch
xff: -
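To check the expected fields for a wrapped record offline, the unwrapping can be sketched in Python. This sketch deliberately cuts on the full markers `/log_start/` and `/log_end/` rather than on every `/`, because the JSON payload itself contains `/` characters (for example in `url`). How LogListener's delimiter processor treats those characters is not covered here, so the result is worth verifying in a test collection. The function name `unwrap_record` and the shortened sample record are hypothetical.

```python
import json

START, END = "/log_start/", "/log_end/"


def unwrap_record(line):
    """Keep the timestamp, discard the wrapper markers, expand the JSON body."""
    time_part, start_found, rest = line.partition(START)
    payload, end_found, _tail = rest.rpartition(END)
    if not start_found or not end_found:
        raise ValueError("record is missing a wrapper marker")
    fields = {"time": time_part}
    # json.loads keeps JSON types (23 stays an int); field types in the
    # uploaded log may differ.
    fields.update(json.loads(payload))
    return fields


# Shortened version of the access log above, for readability.
sample = (
    '2016-01-02 12:59:59/log_start/'
    '{"method":"POST","url":"/event/dispatch","body_sent":23}'
    '/log_end/'
)
record = unwrap_record(sample)
print(record)
```

Isolating the payload before parsing is the same order the pipeline uses: the JSON step only ever sees the field that holds the object.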

Common pitfalls when building LogListener parsing pipelines

  • Parsing JSON before isolating the JSON field.
  • Keeping wrapper fields such as log_start and log_end even though they have no search value.
  • Forgetting KeepSource: false when the raw field should not remain in the final log body.
  • Using one parser for a log line that clearly contains several nested formats.
  • Using metadata extraction without checking the LogListener version.

FAQ

When should I use processor_json in a LogListener pipeline?

Use processor_json after the JSON content has been isolated into a field. In the wrapped access-log example, the pipeline first splits the record by /, then applies JSON parsing to msg3.

How do I remove fields from logs before they reach CLS?

Use processor_drop and point it to the field that should not be uploaded. This is useful for wrapper markers, empty fields, and values that do not help search or analysis.

Can LogListener parse timestamps during collection?

Yes. Use processor_timeformat with the source time field and the expected time format, such as %s for a Unix timestamp.

How can I add file path context to logs?

Use metadata extraction on FILENAME, then run regex extraction to populate fields such as app, version, log directory, or log name.

Final checklist

  • Identify the first split that separates the main log parts.
  • Apply nested processors only to the fields that need them.
  • Drop wrapper or empty fields before upload.
  • Parse time fields with the expected format.
  • Expand JSON only after isolating the JSON field.
  • Verify the final CLS log record contains searchable fields instead of raw mixed-format text.
