DEV Community

Robin Moffatt

Posted on • Originally published at rmoff.net on

Viewing Kafka messages bytes as hex

I’ve been playing around with the new SerDes (serialisers/deserialisers) that shipped with Confluent Platform 5.5: Protobuf and JSON Schema, added alongside the existing support for Avro. The serialisers (and the associated Kafka Connect converters) take a payload and serialise it into bytes for sending to Kafka, and I was interested in what those bytes look like. For that I used my favourite Kafka Swiss Army knife: kafkacat.

Here’s a message serialised to JSON Schema:

$ kafkacat -b kafka:29092 -t pageviews-js -C -c1

{"viewtime":1,"userid":"User_9","pageid":"Page_57"}

Looks just like a message from another topic serialised as regular JSON, right?

$ kafkacat -b kafka:29092 -t pageviews-j -C -c1

{"viewtime":1,"userid":"User_3","pageid":"Page_77"}

Except it’s not! We can confirm this by looking at the raw bytes of the message itself, piping the output from kafkacat into hexdump.

Check out these magical, pesky bytes at the front of the JSON Schema-encoded message, and note that they’re not there on the JSON message:

$ kafkacat -b kafka:29092 -t pageviews-js -C -c1 | hexdump -C

00000000 00 00 00 00 02 7b 22 76 69 65 77 74 69 6d 65 22 |.....{"viewtime"|
00000010 3a 31 2c 22 75 73 65 72 69 64 22 3a 22 55 73 65 |:1,"userid":"Use|
00000020 72 5f 39 22 2c 22 70 61 67 65 69 64 22 3a 22 50 |r_9","pageid":"P|
00000030 61 67 65 5f 35 37 22 7d 0a |age_57"}.|
00000039

$ kafkacat -b kafka:29092 -t pageviews-j -C -c1 | hexdump -C

00000000 7b 22 76 69 65 77 74 69 6d 65 22 3a 31 2c 22 75 |{"viewtime":1,"u|
00000010 73 65 72 69 64 22 3a 22 55 73 65 72 5f 33 22 2c |serid":"User_3",|
00000020 22 70 61 67 65 69 64 22 3a 22 50 61 67 65 5f 37 |"pageid":"Page_7|
00000030 37 22 7d 0a |7"}.|
00000034
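Because plain JSON text can only start with whitespace or the first character of a value, and never with a NUL byte, the leading 0x00 alone is enough to tell these two messages apart. Here is a minimal POSIX shell sketch of that check, assuming only the standard `od` and `tr` utilities. `has_sr_prefix` is a hypothetical helper name (it is not part of kafkacat), and it is only a heuristic, since it looks at the first byte and nothing else.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the message on stdin starts with
# a 0x00 byte (the leading byte seen on the JSON Schema topic above),
# non-zero otherwise. Heuristic only: it inspects a single byte.
has_sr_prefix() {
  # Read one byte as an unsigned decimal; empty input yields an empty string.
  first=$(od -An -N1 -tu1 | tr -d ' \n')
  [ "$first" = "0" ]
}
```

For example, `kafkacat -b kafka:29092 -t pageviews-js -C -c1 | has_sr_prefix && echo framed` should print `framed` for the JSON Schema topic and nothing for `pageviews-j`, whose first byte is `{` (0x7b).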

The five extra bytes (00 00 00 00 02) are defined in the wire format used by the Schema Registry serdes:

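  • Byte 0: Magic byte, the Confluent serialization format version number; currently always 0.

  • Bytes 1-4: 4-byte schema ID as returned by Schema Registry, stored big-endian, so 00 00 00 02 here means schema ID 2.

  • Bytes 5 onwards: for JSON Schema, the JSON payload itself, which is why the rest of the hexdump reads as plain JSON.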
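To decode those header fields rather than reading them off a hexdump, here is a minimal POSIX shell sketch. It assumes the schema ID is a big-endian (network byte order) 32-bit integer and needs only `od`; `decode_sr_header` is a hypothetical helper name, not something kafkacat or Schema Registry provides.

```shell
#!/bin/sh
# Hypothetical helper: decode the five-byte Schema Registry header from
# the message on stdin and print the magic byte and schema ID.
decode_sr_header() {
  # Read the first five bytes as unsigned decimals into $1..$5
  # (the word splitting here is intentional).
  set -- $(od -An -N5 -tu1)
  if [ "$#" -ne 5 ]; then
    echo "message is shorter than five bytes" >&2
    return 1
  fi
  # Byte 0: magic byte (expected to be 0).
  magic=$1
  # Bytes 1-4: schema ID, assembled as a big-endian 32-bit integer.
  schema_id=$(( ($2 << 24) | ($3 << 16) | ($4 << 8) | $5 ))
  echo "magic=${magic} schema_id=${schema_id}"
}
```

Piping the JSON Schema message into it, as in `kafkacat -b kafka:29092 -t pageviews-js -C -c1 | decode_sr_header`, would print `magic=0 schema_id=2` for the bytes shown above. To see just the payload instead, `tail -c +6` skips the five header bytes.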
