DEV Community

Parse Complex Logs at Collection Time with CLS LogListener Pipelines

Complex production logs rarely arrive in one clean format. A single line may start with a timestamp, continue with delimiter-separated fields, embed a JSON object, and end with extra wrapper text. If the collector can only apply one parser, the downstream log platform receives either a lossy structure or a raw string that operators must clean later.

The source article presents Tencent Cloud CLS LogListener composite parsing as a collector-side pipeline for this situation. LogListener can run one or more processors in sequence, so a log line can be split, decoded, filtered, enriched, and reassembled before it is uploaded to CLS.

When composite parsing is useful

The source documentation describes three scenarios where composite parsing fits:

| Scenario | What happens in the log line | What the pipeline does |
| --- | --- | --- |
| Multiple parsing modes are needed | One part is delimiter-separated, while another part is JSON or key-value text | Apply different processors to different fields after the first split |
| Some fields need post-processing | Parsed fields include values that should be dropped, renamed, or supplemented | Run processors such as field dropping or metadata extraction |
| Both patterns appear together | A line needs multiple extraction steps and field-level transformation | Chain processors in order inside the LogListener configuration |

The flow is simple: split the original log into segments, process each segment with the right processor, and output only the fields that should become structured log content.
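The split, process, output flow can be modeled as an ordered list of functions that each take and return a dictionary of fields. The Python sketch below is a conceptual model only: split_delimiter and drop are hypothetical stand-ins named after the processors, not LogListener's implementation, and the input key line is an assumed name for the raw log text.

```python
from typing import Callable, Dict, List

Fields = Dict[str, str]
Processor = Callable[[Fields], Fields]


def run_pipeline(fields: Fields, processors: List[Processor]) -> Fields:
    """Apply processors in order; each one receives the previous one's output."""
    for processor in processors:
        fields = processor(fields)
    return fields


def split_delimiter(source_key: str, delimiter: str, keys: List[str]) -> Processor:
    """Hypothetical stand-in: replace one field with named segments."""
    def apply(fields: Fields) -> Fields:
        out = {k: v for k, v in fields.items() if k != source_key}
        # zip stops at the shorter list, so segments without a key are ignored
        out.update(zip(keys, fields[source_key].split(delimiter)))
        return out
    return apply


def drop(*keys: str) -> Processor:
    """Hypothetical stand-in: remove the named fields."""
    def apply(fields: Fields) -> Fields:
        return {k: v for k, v in fields.items() if k not in keys}
    return apply


result = run_pipeline(
    {"line": "a,b,c"},
    [split_delimiter("line", ",", ["x", "y", "z"]), drop("y")],
)
print(result)  # {'x': 'a', 'z': 'c'}
```

Passing each processor's output to the next is what makes order significant: a processor can only act on fields that an earlier step has already produced.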

Processor map from the source article

The source article lists the available processors for this workflow in a screenshot; they are transcribed here as text:

| Function | Processor | Use (per the source) |
| --- | --- | --- |
| Extract fields | processor_log_string | Multi-character line parsing, usually for advanced single-line logs |
| Extract fields | processor_multiline | First-line regex parsing for multiline logs |
| Extract fields | processor_multiline_fullregex | First-line regex plus full multiline regex extraction |
| Extract fields | processor_fullregex | Regex extraction for a single-line field |
| Extract fields | processor_json | Expand a field value as JSON |
| Extract fields | processor_split_delimiter | Split fields by one or more delimiter characters |
| Extract fields | processor_split_key_value | Extract key-value pairs |
| Process fields | processor_drop | Drop selected fields |
| Process fields | processor_timeformat | Parse a source time field, convert the time format, and set the log timestamp |

Pattern 1: drop fields before upload

If a raw log contains three key-value pairs but only key2 is useful, the source article uses processor_drop to remove key1 and key3.

Input:

key1:value1
key2:value2
key3:value3

LogListener configuration:

{
  "processors": [
    {
      "type": "processor_drop",
      "detail": {
        "SourceKey": ["key1", "key3"]
      }
    }
  ]
}

Output:

key2:value2

Dropping fields is one of the cheapest log optimizations available: it reduces payload size and storage cost by removing fields that never need to be indexed or analyzed.
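What processor_drop should leave behind can be sanity-checked with a small Python model. This sketch assumes the three input lines are the key-value fields of one record; parse_pairs and drop_fields are hypothetical helpers, not part of LogListener. It also compares the text size before and after dropping, to make the payload saving concrete.

```python
from typing import Dict, Iterable


def parse_pairs(text: str) -> Dict[str, str]:
    """Turn 'key:value' lines into a dict (assumes one pair per line)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key] = value
    return fields


def drop_fields(fields: Dict[str, str], source_keys: Iterable[str]) -> Dict[str, str]:
    """Mimic processor_drop: return a copy without the listed keys."""
    to_drop = set(source_keys)
    return {k: v for k, v in fields.items() if k not in to_drop}


raw = "key1:value1\nkey2:value2\nkey3:value3"
kept = drop_fields(parse_pairs(raw), ["key1", "key3"])
kept_text = "\n".join(f"{k}:{v}" for k, v in kept.items())
print(kept_text)  # key2:value2
print(len(raw), "->", len(kept_text), "characters")  # 35 -> 11 characters
```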

Pattern 2: enrich logs from metadata

The article also shows a metadata-enrichment case. The log body value1,value2 is collected from a file whose path encodes the application name, version, log directory, and log file name, and the collector extracts those values from the path as tag fields. The source notes that meta_processor requires LogListener 2.7.4 or later.

Input:

value1,value2

Path:

/usr/local/loglistener-2.7.4/testdir/test.log

Configuration shape:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["msg1", "msg2"]
      }
    },
    {
      "type": "meta_processor",
      "detail": {
        "ExtractKeys": ["FILENAME"]
      },
      "processors": [
        {
          "type": "processor_fullregex",
          "detail": {
            "KeepSource": false,
            "SourceKey": "FILENAME",
            "ExtractRegex": "/\\w+/\\w+/(\\w+)-([^/]+)/(\\w+)/(\\w+).*",
            "ExtractKeys": ["app", "ver", "logdir", "logname"]
          }
        }
      ]
    }
  ]
}

Output fields shown by the source:

msg1:value1
msg2:value2
__TAG__.app: loglistener
__TAG__.ver: 2.7.4
__TAG__.logname: test
__TAG__.logdir: testdir
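The ExtractRegex in this configuration can be tested outside LogListener before deployment. The sketch below applies the same pattern with Python's re module to the sample path. It assumes Python's regex dialect treats this pattern the same way LogListener does (plausible for \w, [^/], and plain capture groups, but not guaranteed), and it uses fullmatch, which may differ from LogListener's matching semantics. The __TAG__. prefix is copied from the output above; tags_from_path is a hypothetical helper.

```python
import re
from typing import Dict

# ExtractRegex from the configuration above, with the JSON string escaping
# removed (each "\\w" in the JSON text is "\w" in the actual regex).
PATH_REGEX = re.compile(r"/\w+/\w+/(\w+)-([^/]+)/(\w+)/(\w+).*")
KEYS = ["app", "ver", "logdir", "logname"]


def tags_from_path(path: str) -> Dict[str, str]:
    """Hypothetical helper: map the capture groups to __TAG__.-prefixed keys."""
    match = PATH_REGEX.fullmatch(path)
    if match is None:
        return {}  # the path does not have the expected layout
    return {f"__TAG__.{key}": value for key, value in zip(KEYS, match.groups())}


for key, value in tags_from_path("/usr/local/loglistener-2.7.4/testdir/test.log").items():
    print(f"{key}: {value}")
```

A path without the app-version directory, such as /var/log/app.log, returns an empty dict, which is a quick way to see which file layouts the regex will not tag.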

Pattern 3: parse nested fields with child processors

The custom parsing example starts with one comma-separated line:

1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,

The pipeline first splits the line into time, msg1, and msg2. Child processors then convert the Unix timestamp, split msg1 by |, and parse msg2 as key-value content.

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["time", "msg1", "msg2"]
      },
      "processors": [
        {
          "type": "processor_timeformat",
          "detail": {
            "KeepSource": true,
            "TimeFormat": "%s",
            "SourceKey": "time"
          }
        },
        {
          "type": "processor_split_delimiter",
          "detail": {
            "KeepSource": false,
            "Delimiter": "|",
            "SourceKey": "msg1",
            "ExtractKeys": ["submsg1", "submsg2", "submsg3"]
          }
        },
        {
          "type": "processor_split_key_value",
          "detail": {
            "KeepSource": false,
            "Delimiter": ":",
            "SourceKey": "msg2"
          }
        }
      ]
    }
  ]
}

Output:

time: 1571394459
submsg1: http://127.0.0.1/my/course/4
submsg2: 10.135.46.111
submsg3: 200
status: DEAD
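The parent/child structure can be mirrored in plain Python to predict the output before deploying the configuration. This sketch is a local model under two assumptions that should be confirmed against the LogListener documentation: the parent split discards the empty segment after the trailing comma (it has no ExtractKeys entry), and processor_timeformat with "%s" is reduced here to converting Unix seconds into a UTC datetime. parse_course_log is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

SAMPLE = "1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,"


def parse_course_log(line: str) -> Tuple[Dict[str, str], datetime]:
    """Local model of the parent split and its three child processors."""
    # Parent processor_split_delimiter on ",". Assumption: the empty segment
    # after the trailing comma has no key and is discarded.
    time_value, msg1, msg2 = line.split(",")[:3]
    fields: Dict[str, str] = {"time": time_value}  # KeepSource: true keeps "time"

    # Child processor_timeformat, TimeFormat "%s": Unix seconds become the
    # log timestamp (modeled here as a UTC datetime).
    log_time = datetime.fromtimestamp(int(time_value), tz=timezone.utc)

    # Child processor_split_delimiter on msg1 with "|"; KeepSource: false,
    # so msg1 itself is not added to the output.
    fields.update(zip(["submsg1", "submsg2", "submsg3"], msg1.split("|")))

    # Child processor_split_key_value on msg2 with ":"; msg2 is not kept either.
    key, _, value = msg2.partition(":")
    fields[key] = value
    return fields, log_time


fields, log_time = parse_course_log(SAMPLE)
for key, value in fields.items():
    print(f"{key}: {value}")
print("log timestamp:", log_time.isoformat())
```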

Pattern 4: unwrap a real access log with JSON inside

The final source example is a slash-wrapped log line:

2016-01-02 12:59:59/log_start/{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}/log_end/

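Expected behavior:

  1. Split the log by / into five segments: time, msg2, msg3, msg4, and msg5.
  2. Keep the first segment as time.
  3. Drop the wrapper segments: msg2 (log_start), msg4 (log_end), and msg5 (the empty text after the final /).
  4. Expand the JSON segment, msg3, into individual fields.

LogListener configuration:
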
{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "KeepSource": false,
        "Delimiter": "/",
        "ExtractKeys": ["time", "msg2", "msg3", "msg4", "msg5"]
      },
      "processors": [
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg2"
          }
        },
        {
          "type": "processor_json",
          "detail": {
            "KeepSource": false,
            "SourceKey": "msg3"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg4"
          }
        },
        {
          "type": "processor_drop",
          "detail": {
            "SourceKey": "msg5"
          }
        }
      ]
    }
  ]
}

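The final structured fields are time plus the keys of the expanded JSON object: agent, body_sent, http_host, method, referer, remote_ip, request, response_code, responsetime, upstreamhost, upstreamtime, url, and xff. One caution: several JSON values (upstreamhost, url, request, referer, and agent) contain / themselves, so a plain split on / yields far more than five segments for this line. Check how processor_split_delimiter treats delimiters inside the JSON segment in your LogListener version before reusing this configuration.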
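A plain split on / can be checked locally against the sample line. The Python sketch below counts the segments that split produces, then shows one alternative: cut the line on the literal /log_start/ and /log_end/ markers and pass the middle to json.loads. The marker-based cut is a swapped-in technique for illustration, not documented LogListener behavior, and unwrap is a hypothetical helper. Note that json.loads keeps body_sent and responsetime as numbers, whereas LogListener may store every field as text.

```python
import json
from typing import Any, Dict

SAMPLE = (
    '2016-01-02 12:59:59/log_start/'
    '{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,'
    '"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock",'
    '"http_host":"127.0.0.1","method":"POST","url":"/event/dispatch",'
    '"request":"POST /event/dispatch HTTP/1.1","xff":"-",'
    '"referer":"http://127.0.0.1/my/course/4",'
    '"agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0",'
    '"response_code":"200"}'
    '/log_end/'
)

START, END = "/log_start/", "/log_end/"


def unwrap(line: str) -> Dict[str, Any]:
    """Hypothetical helper: keep the time prefix, strip the wrappers, expand the JSON."""
    time_value, sep, rest = line.partition(START)
    if not sep or not rest.endswith(END):
        raise ValueError("line does not match the /log_start/ ... /log_end/ wrapper")
    fields: Dict[str, Any] = {"time": time_value}
    fields.update(json.loads(rest[: -len(END)]))
    return fields


print("plain split on '/':", len(SAMPLE.split("/")), "segments")
print(sorted(unwrap(SAMPLE)))
```

Cutting on the wrapper markers works here because neither marker appears inside the JSON; a payload that could contain /log_end/ would need a stricter rule.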

Practical checklist

  • Use composite parsing when a single parser cannot express the source log format.
  • Keep the first split simple, then apply child processors to specific fields.
  • Drop wrapper or low-value fields before upload when they are not needed for search or analysis.
  • Convert image-only configuration examples into selectable JSON so future operators can copy and review them.
  • Keep the pipeline order explicit; LogListener executes processor configuration in sequence.
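Because order matters, it can help to list a configuration's processors in declaration order before deploying it. The Python sketch below walks a LogListener-style JSON configuration and prints each processor indented by nesting depth. It assumes, based on the examples above, that child processors live under a nested "processors" key and act on fields their parent produced. It also flags detail keys such as Sourcekey that differ from SourceKey only by case, an easy typo to miss in review. The configuration is a made-up example, and list_processors and miscased_keys are hypothetical helpers.

```python
import json
from typing import Any, Dict, List, Tuple


def list_processors(config: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (depth, type) pairs in declaration order, parents before children."""
    ordered: List[Tuple[int, str]] = []

    def walk(processors: List[Dict[str, Any]], depth: int) -> None:
        for proc in processors:
            ordered.append((depth, proc["type"]))
            walk(proc.get("processors", []), depth + 1)

    walk(config.get("processors", []), 0)
    return ordered


def miscased_keys(config: Dict[str, Any]) -> List[str]:
    """Flag detail keys that match 'SourceKey' case-insensitively but not exactly."""
    found: List[str] = []

    def walk(processors: List[Dict[str, Any]]) -> None:
        for proc in processors:
            for key in proc.get("detail", {}):
                if key.lower() == "sourcekey" and key != "SourceKey":
                    found.append(f"{proc['type']}: {key}")
            walk(proc.get("processors", []))

    walk(config.get("processors", []))
    return found


config = json.loads("""
{"processors": [
  {"type": "processor_split_delimiter",
   "detail": {"Delimiter": ",", "ExtractKeys": ["time", "msg1", "msg2"]},
   "processors": [
     {"type": "processor_timeformat", "detail": {"SourceKey": "time", "TimeFormat": "%s"}},
     {"type": "processor_drop", "detail": {"Sourcekey": "msg2"}}
   ]}
]}
""")

for depth, kind in list_processors(config):
    print("  " * depth + kind)
print(miscased_keys(config))  # ['processor_drop: Sourcekey']
```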
