Pavel Sanikovich

High-Performance Marshaling Strategies in Go — What Actually Works at Scale

Marshaling sounds like a solved problem.
You call json.Marshal or proto.Marshal, send the bytes across the wire, and move on with your life.

But once you hit real load — tens of thousands of messages per second, strict p95 budgets, or aggressive CPU constraints — marshaling becomes one of the biggest sources of latency, garbage, and inefficiency.

I didn’t believe it at first either.
Then I profiled a production system and saw 20–40% of CPU time spent on serialization alone.

In this final article of the series, I’ll walk through every marshaling strategy that actually matters, why it works, where it fails, and how to choose the right approach depending on your system’s requirements.

Let’s get into it.


1. The Truth About Marshaling: It’s Always on the Hot Path

You can usually optimize:

  • DB queries
  • cache lookups
  • goroutine pools
  • handlers

…but marshaling happens every single time you:

  • respond to a client
  • publish an event
  • log structured data
  • serialize to Redis
  • write to Kafka or Redpanda
  • store snapshots

Marshaling is unavoidable — which makes it one of the highest-ROI optimizations.

When we optimized it in a highload Go service, we saw:

  • p95 latency: −40%
  • CPU: −22%
  • GC pauses: much smoother
  • memory footprint: reduced
  • node count: went from 10 → 7

All from marshaling improvements.


2. Strategy #1 — “Faster JSON” (But Still JSON)

A lot of teams start here because:

  • JSON is universal
  • JSON is easy
  • JSON works
  • JSON has tooling everywhere

And yet, encoding/json is painfully slow.
Reflection-heavy. Alloc-heavy. Predictably unpredictable.

If you must keep JSON, these are your options:


Option A: jsoniter (drop-in, faster JSON)

import jsoniter "github.com/json-iterator/go"

var json = jsoniter.ConfigCompatibleWithStandardLibrary

data, _ := json.Marshal(v)

Pros:

  • drop-in replacement
  • easy adoption
  • faster than stdlib

Cons:

  • not the fastest JSON possible
  • still allocates
  • still string-based

Good for: APIs, moderate load.


Option B: easyjson (codegen, zero reflection)

Generate struct-specific serializers:

easyjson -all model.go

Pros:

  • massive speed increase (up to 2–3x)
  • near-zero reflection
  • fewer allocations
  • stable, widely used

Cons:

  • code generator boilerplate
  • must remember to re-generate

Good for: High-performance JSON systems.
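
Once the code is generated, marshaling goes through the generated methods instead of reflection. A minimal sketch, assuming a User struct lives in model.go:

import "github.com/mailru/easyjson"

// User gains MarshalJSON / MarshalEasyJSON methods in the generated
// model_easyjson.go, so this call does no reflection.
u := User{ID: 1, Name: "alice"}
data, err := easyjson.Marshal(&u)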


Option C: gojay (streaming-based, insane speed)

For very large JSON payloads, gojay's streaming encoder and decoder outperform everything else on this list.

Good for: Huge arrays, logs, bulk data.
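
gojay works through hand-written encoder callbacks rather than reflection. A minimal sketch (the type and fields here are illustrative):

import "github.com/francoispqt/gojay"

// User is an illustrative type; real code would use your own structs.
type User struct {
    ID   int
    Name string
}

// MarshalJSONObject streams the struct's fields directly into the encoder.
func (u *User) MarshalJSONObject(enc *gojay.Encoder) {
    enc.IntKey("id", u.ID)
    enc.StringKey("name", u.Name)
}

// IsNil lets gojay skip encoding nil objects.
func (u *User) IsNil() bool { return u == nil }

// usage: data, err := gojay.MarshalJSONObject(&User{ID: 1, Name: "alice"})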


3. Strategy #2 — Switch to Binary Formats

JSON is text. Text is slow.

Binary formats solve the problem from both ends:

  • smaller payload
  • faster encode/decode

The two strongest contenders:


MessagePack

import "github.com/vmihailenco/msgpack/v5"

data, _ := msgpack.Marshal(v)
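
A quick round trip looks almost identical to JSON. A sketch, assuming a Trade struct with msgpack tags:

import "github.com/vmihailenco/msgpack/v5"

// Short tag names keep the encoded payload small.
type Trade struct {
    Price    float64 `msgpack:"p"`
    Quantity int64   `msgpack:"q"`
}

data, err := msgpack.Marshal(&Trade{Price: 42.5, Quantity: 10})
if err != nil {
    // handle error
}

var t Trade
err = msgpack.Unmarshal(data, &t)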

Pros:

  • 3× faster than JSON
  • schema-flexible
  • smaller payloads
  • drop-in over JSON

Cons:

  • still not the fastest
  • still allocates
  • requires consumer compatibility

Protobuf

data, _ := proto.Marshal(msg)
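
The full round trip, sketched with a hypothetical pb.Trade message that protoc-gen-go would produce from your trade.proto:

import "google.golang.org/protobuf/proto"

// pb.Trade is a hypothetical generated message type.
msg := &pb.Trade{Price: 42.5, Quantity: 10}

data, err := proto.Marshal(msg)
if err != nil {
    // handle error
}

var out pb.Trade
err = proto.Unmarshal(data, &out)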

Pros:

  • 5–10× faster than JSON
  • smallest payloads
  • strongly typed
  • versioning support
  • industry standard

Cons:

  • requires .proto
  • must maintain schemas
  • learning curve

Good for: Microservices, highload, RPC, messaging.


4. Strategy #3 — Code Generation

Codegen is the “safe” way to get the performance of unsafe/low-level code without using unsafe.

Types of codegen serializers:

  • easyjson (JSON)
  • ffjson (JSON)
  • msgp (MessagePack codegen)
  • protoc (Protobuf)
  • flatbuffers
  • capnproto

Advantages:

  • no reflection
  • predictable performance
  • fewer allocations
  • static typing
  • extremely fast

Disadvantages:

  • codegen step
  • build complexity
  • more generated code to audit

If you want maximum speed without unsafe, this is the way.
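
For example, with msgp the workflow is: annotate the struct, run go generate, and call the generated methods. A sketch, assuming the msgp tool is installed:

//go:generate msgp

// Trade gets MarshalMsg/UnmarshalMsg methods generated by `go generate`.
type Trade struct {
    Price    float64 `msg:"price"`
    Quantity int64   `msg:"quantity"`
}

// usage, after running `go generate ./...`:
//   buf, err := t.MarshalMsg(nil)    // appends the encoding to the given slice
//   rest, err := t.UnmarshalMsg(buf) // returns the unread remainder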


5. Strategy #4 — Buffer Reuse and Preallocation

Reflection is slow — but allocation is worse.

Most serialization libraries allocate temporary buffers on each call.

You can eliminate that by reusing buffers.

Example:

// Preallocate capacity up front so the encoder doesn't grow
// a fresh buffer on every call; keep buf around and Reset() it to reuse.
buf := bytes.NewBuffer(make([]byte, 0, 1024))
enc := json.NewEncoder(buf)
enc.Encode(v)

Results:

  • fewer allocations
  • fewer GC cycles
  • more stable tail latencies

This is a free optimization that many teams miss.
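
To reuse buffers across calls, not just preallocate within one call, a sync.Pool works well. A minimal sketch:

import (
    "bytes"
    "encoding/json"
    "sync"
)

var bufPool = sync.Pool{
    New: func() any { return bytes.NewBuffer(make([]byte, 0, 1024)) },
}

// encodeJSON reuses pooled buffers so steady-state encoding
// allocates only the final output slice.
func encodeJSON(v any) ([]byte, error) {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)

    if err := json.NewEncoder(buf).Encode(v); err != nil {
        return nil, err
    }
    // Copy out: the buffer's backing array goes back to the pool.
    out := make([]byte, buf.Len())
    copy(out, buf.Bytes())
    return out, nil
}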


6. Strategy #5 — Zero-Copy Techniques (Unsafe, Advanced)

This strategy delivers the most brutal performance improvements.

Example: convert string <-> []byte without copying:

// BytesToString reinterprets b as a string without copying.
// unsafe.SliceData handles empty slices; the caller must not modify b afterwards.
func BytesToString(b []byte) string {
    return unsafe.String(unsafe.SliceData(b), len(b))
}

Or reinterpret struct data:

h := (*Header)(unsafe.Pointer(&buf[0]))
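
A slightly fuller sketch — Header here is illustrative, and the struct layout must match the wire format exactly (field order, padding, endianness):

import "unsafe"

// Header is a hypothetical fixed-size record at the start of buf.
type Header struct {
    Version uint16
    Flags   uint16
    Length  uint32
}

// ReadHeader reinterprets the first bytes of buf as a *Header without copying.
// buf must stay alive and unmodified for as long as the returned pointer is used.
func ReadHeader(buf []byte) *Header {
    if len(buf) < int(unsafe.Sizeof(Header{})) {
        panic("buffer too short for Header")
    }
    return (*Header)(unsafe.Pointer(&buf[0]))
}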

You eliminate entire memory copies.
But you pay with safety.

Use only when:

  • data is immutable
  • you understand Go’s memory model
  • you control the lifecycle of buffers
  • you benchmarked gains

Unsafe can reduce CPU by 10–40% in serialization-heavy paths.


7. Strategy #6 — Custom Marshaling (Manual or Semi-Manual)

Sometimes you need full control.

Example: hand-written binary marshaler:

// MarshalBinary packs a Trade into a fixed 16-byte little-endian layout:
// bytes 0..7 hold Price as float64 bits, bytes 8..15 hold Quantity.
func (t *Trade) MarshalBinary() []byte {
    b := make([]byte, 16)
    binary.LittleEndian.PutUint64(b[0:], math.Float64bits(t.Price))
    binary.LittleEndian.PutUint64(b[8:], uint64(t.Quantity))
    return b
}
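
The matching decoder is just as mechanical. A sketch, assuming Quantity is an int64:

// UnmarshalBinary reverses the layout above: Price from bytes 0..7,
// Quantity from bytes 8..15.
func (t *Trade) UnmarshalBinary(b []byte) error {
    if len(b) < 16 {
        return fmt.Errorf("trade: short buffer: %d bytes", len(b))
    }
    t.Price = math.Float64frombits(binary.LittleEndian.Uint64(b[0:]))
    t.Quantity = int64(binary.LittleEndian.Uint64(b[8:]))
    return nil
}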

Advantages:

  • fastest possible
  • smallest payload
  • fully deterministic
  • zero reflection

Disadvantages:

  • extremely verbose
  • must maintain manually
  • brittle
  • requires deep understanding

Use only for ultra-critical hot paths.


8. Strategy #7 — Streaming Marshaling for Large Payloads

For large responses (10 KB+), use streaming.

// Encode streams directly into w instead of building
// the whole payload in memory first.
enc := json.NewEncoder(w)
enc.Encode(v)
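
For large collections, one common pattern — a sketch, assuming the client accepts newline-delimited JSON — is to encode items one at a time instead of materializing the whole response:

// streamTrades writes each trade as one JSON line (NDJSON),
// so memory stays flat no matter how many items there are.
func streamTrades(w http.ResponseWriter, trades []Trade) error {
    w.Header().Set("Content-Type", "application/x-ndjson")
    enc := json.NewEncoder(w)
    for i := range trades {
        if err := enc.Encode(&trades[i]); err != nil {
            return err
        }
    }
    return nil
}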

This avoids:

  • giant temporary buffers
  • multi-step copying
  • unnecessary heap pressure

Streaming regularly reduces large-response latency by 10–25%.


9. Strategy #8 — Removing Fields (Payload Hygiene)

The easiest and most underrated optimization:

Reduce payload size.

When we audited our payloads, we found:

  • 25–30% of fields were unused
  • 10–15% were redundant
  • 5–10% could be computed on client side

Removing junk:

  • shrank payloads
  • reduced CPU
  • reduced latency
  • improved UX

This is the highest ROI improvement after switching formats.
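
In practice this often means a dedicated, trimmed response type instead of serializing the internal model directly (names here are illustrative):

// Internal model: rich, but most fields never matter to the client.
type Order struct {
    ID       string
    UserID   string
    Price    float64
    Quantity int64
    AuditLog []string // internal only
}

// Wire representation: only what the client actually uses.
type OrderResponse struct {
    ID       string  `json:"id"`
    Price    float64 `json:"price"`
    Quantity int64   `json:"quantity,omitempty"`
}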


10. Real Benchmarks from Production

When we implemented these techniques (in stages), total improvements were:

  • CPU: −28%
  • p95 latency: 6.7ms → 2.4ms
  • p50 latency: 2.4ms → 1.1ms
  • allocs/op: down 40–70%
  • network cost: −20–60% depending on format

All without touching the database or main logic.

Serialization alone made these gains.


11. Choosing the Right Strategy

Here’s the decision tree we use in real projects.


If you must keep JSON

Use jsoniter or easyjson.

If you want a fast drop-in binary format

Use MessagePack.

If you're designing microservices

Use Protobuf (ideal).

If you want maximum control

Use codegen or manual marshalers.

If you need extreme performance

Use unsafe zero-copy + custom binary format.

If payloads are massive

Use streaming.

If everything is slow

Start by removing fields.


12. Senior-Level Takeaways

  • Marshaling is a major performance bottleneck in most systems.
  • JSON is the slowest option, but JSON libs vary drastically in speed.
  • Binary formats are the fastest real-world choice.
  • Codegen is the sweet spot between safety and speed.
  • Buffer reuse is essential for stable latency.
  • Unsafe can be used safely, but only by experts.
  • Manual marshaling is unbeatable when done well.
  • Reducing payload size improves everything across the board.

High-performance marshaling is the combination of:

the right format + the right strategy + the right level of control.


13. Want to Go Deeper?

These Educative courses shaped the way I think about performance, distributed systems, and low-level engineering:

Highly recommended for backend engineers working at scale.
