Marshaling sounds like a solved problem.
You call json.Marshal or proto.Marshal, send the bytes across the wire, and move on with your life.
But once you hit real load — tens of thousands of messages per second, strict p95 budgets, or aggressive CPU constraints — marshaling becomes one of the biggest sources of latency, garbage, and inefficiency.
I didn’t believe it at first either.
Then I profiled a production system and saw 20–40% of CPU time spent on serialization alone.
In this final article of the series, I’ll walk through every marshaling strategy that actually matters, why it works, where it fails, and how to choose the right approach depending on your system’s requirements.
Let’s get into it.
1. The Truth About Marshaling: It’s Always on the Hot Path
You can usually optimize:
- DB queries
- cache lookups
- goroutine pools
- handlers
…but marshaling happens every single time you:
- respond to a client
- publish an event
- log structured data
- serialize to Redis
- write to Kafka or Redpanda
- store snapshots
Marshaling is unavoidable — which makes it one of the highest-ROI optimizations.
When we optimized it in a highload Go service, we saw:
- p95 latency: −40%
- CPU: −22%
- GC pauses: much smoother
- memory footprint: reduced
- node count: went from 10 → 7
All from marshaling improvements.
2. Strategy #1 — “Faster JSON” (But Still JSON)
A lot of teams start here because:
- JSON is universal
- JSON is easy
- JSON works
- JSON has tooling everywhere
And yet, encoding/json is painfully slow.
Reflection-heavy. Alloc-heavy. Predictably unpredictable.
If you must keep JSON, these are your options:
Option A: jsoniter (drop-in, fast)
import jsoniter "github.com/json-iterator/go"
// Drop-in: same API surface as encoding/json.
var json = jsoniter.ConfigCompatibleWithStandardLibrary
data, _ := json.Marshal(v)
Pros:
- drop-in replacement
- easy adoption
- faster than stdlib
Cons:
- not the fastest JSON possible
- still allocates
- still string-based
Good for: APIs, moderate load.
Option B: easyjson (codegen, zero reflection)
Generate struct-specific serializers:
easyjson -all model.go
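After generation, each struct gets reflection-free methods. A minimal sketch (the User type is illustrative):
// model.go
type User struct {
    ID   int64  `json:"id"`
    Name string `json:"name"`
}

// `easyjson -all model.go` emits model_easyjson.go; the generated
// MarshalJSON bypasses reflection entirely:
u := User{ID: 1, Name: "ada"}
data, err := u.MarshalJSON()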
Pros:
- massive speed increase (up to 2–3x)
- near-zero reflection
- fewer allocations
- stable, widely used
Cons:
- code generator boilerplate
- must remember to re-generate
Good for: High-performance JSON systems.
Option C: gojay (streaming-based, insane speed)
For very large JSON payloads, gojay's streaming design typically outperforms both options above.
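The trade-off is explicitness: each type implements gojay's encoder interface by hand. A minimal sketch (the User type is illustrative):
import "github.com/francoispqt/gojay"

type User struct {
    ID   int
    Name string
}

// MarshalJSONObject writes fields through the streaming encoder.
func (u *User) MarshalJSONObject(enc *gojay.Encoder) {
    enc.IntKey("id", u.ID)
    enc.StringKey("name", u.Name)
}

// IsNil lets the encoder skip nil objects.
func (u *User) IsNil() bool { return u == nil }

// data, err := gojay.MarshalJSONObject(&User{ID: 1, Name: "ada"})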
Good for: Huge arrays, logs, bulk data.
3. Strategy #2 — Switch to Binary Formats
JSON is text. Text is slow.
Binary formats solve the problem from both ends:
- smaller payload
- faster encode/decode
The two strongest contenders:
MessagePack
import "github.com/vmihailenco/msgpack/v5"
data, _ := msgpack.Marshal(v)
Pros:
- 3× faster than JSON
- schema-flexible
- smaller payloads
- near drop-in replacement for JSON
Cons:
- still not the fastest
- still allocates
- requires consumer compatibility
Protobuf
data, _ := proto.Marshal(msg)
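In context, a minimal round trip looks like this (a sketch; the pb package name is a stand-in for your protoc-generated code):
import (
    "google.golang.org/protobuf/proto"

    pb "example.com/gen/tradepb" // hypothetical generated package
)

msg := &pb.Trade{Price: 42.5, Quantity: 10}
data, err := proto.Marshal(msg) // compact binary encoding
// ...
var out pb.Trade
err = proto.Unmarshal(data, &out) // strongly typed round trip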
Pros:
- 5–10× faster than JSON
- smallest payloads
- strongly typed
- versioning support
- industry standard
Cons:
- requires .proto
- must maintain schemas
- learning curve
Good for: Microservices, highload, RPC, messaging.
4. Strategy #3 — Code Generation
Codegen is the “safe” way to get the performance of low-level code without actually writing unsafe.
Types of codegen serializers:
- easyjson (JSON)
- ffjson (JSON)
- msgp (MessagePack codegen)
- protoc (Protobuf)
- flatbuffers
- capnproto
Advantages:
- no reflection
- predictable performance
- fewer allocations
- static typing
- extremely fast
Disadvantages:
- codegen step
- build complexity
- more generated code to audit
If you want maximum speed without unsafe, this is the way.
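For instance, msgp turns a tagged struct into generated MessagePack encoders (a sketch; the Trade type and tags are illustrative):
//go:generate msgp
type Trade struct {
    Price    float64 `msg:"price"`
    Quantity int64   `msg:"qty"`
}

// `go generate` emits trade_gen.go; the generated method is reflection-free:
// buf, err := trade.MarshalMsg(nil) // appends the encoding to the given slice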
5. Strategy #4 — Buffer Reuse and Preallocation
Reflection is slow — but allocation is worse.
Most serialization libraries allocate temporary buffers on each call.
You can eliminate that by reusing buffers.
Example:
buf := bytes.NewBuffer(make([]byte, 0, 1024)) // preallocate capacity once
encoder := json.NewEncoder(buf)
encoder.Encode(v) // appends within existing capacity, so no mid-encode regrowth
Results:
- fewer allocations
- fewer GC cycles
- more stable tail latencies
This is a free optimization that many teams miss.
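To reuse buffers across calls rather than just preallocating per call, wrap them in a sync.Pool (a minimal sketch, assuming standard-library JSON):
import (
    "bytes"
    "encoding/json"
    "sync"
)

var bufPool = sync.Pool{
    New: func() any { return bytes.NewBuffer(make([]byte, 0, 1024)) },
}

func encode(v any) ([]byte, error) {
    buf := bufPool.Get().(*bytes.Buffer)
    defer bufPool.Put(buf)
    buf.Reset() // keep the underlying array, drop old contents
    if err := json.NewEncoder(buf).Encode(v); err != nil {
        return nil, err
    }
    // Copy out: the buffer returns to the pool and may be overwritten.
    return append([]byte(nil), buf.Bytes()...), nil
}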
6. Strategy #5 — Zero-Copy Techniques (Unsafe, Advanced)
This strategy delivers the most dramatic performance improvements.
Example: convert string <-> []byte without copying:
// Reinterpret the backing array as a string without copying (Go 1.20+).
// unsafe.SliceData handles empty slices; &b[0] would panic on them.
func BytesToString(b []byte) string {
    return unsafe.String(unsafe.SliceData(b), len(b))
}
Or reinterpret struct data:
// Overlay a fixed-layout struct onto the raw buffer: zero decode, zero copy.
h := (*Header)(unsafe.Pointer(&buf[0]))
You eliminate entire memory copies.
But you pay with safety.
Use only when:
- data is immutable
- you understand Go’s memory model
- you control the lifecycle of buffers
- you benchmarked gains
Unsafe can reduce CPU by 10–40% in serialization-heavy paths.
7. Strategy #6 — Custom Marshaling (Manual or Semi-Manual)
Sometimes you need full control.
Example: hand-written binary marshaler:
// Fixed 16-byte layout: price as float64 bits, then quantity.
func (t *Trade) MarshalBinary() []byte {
    b := make([]byte, 16)
    binary.LittleEndian.PutUint64(b[0:8], math.Float64bits(t.Price))
    binary.LittleEndian.PutUint64(b[8:16], uint64(t.Quantity))
    return b
}
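The matching decoder is just as mechanical (a sketch; it assumes Quantity is an int64, mirroring the fixed layout above):
func (t *Trade) UnmarshalBinary(b []byte) error {
    if len(b) < 16 {
        return io.ErrUnexpectedEOF
    }
    t.Price = math.Float64frombits(binary.LittleEndian.Uint64(b[0:8]))
    t.Quantity = int64(binary.LittleEndian.Uint64(b[8:16]))
    return nil
}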
Advantages:
- fastest possible
- smallest payload
- fully deterministic
- zero reflection
Disadvantages:
- extremely verbose
- must maintain manually
- brittle
- requires deep understanding
Use only for ultra-critical hot paths.
8. Strategy #7 — Streaming Marshaling for Large Payloads
For large responses (10 KB+), use streaming.
enc := json.NewEncoder(w) // w is any io.Writer, e.g. http.ResponseWriter
enc.Encode(v)
This avoids:
- giant temporary buffers
- multi-step copying
- unnecessary heap pressure
Streaming regularly reduces large-response latency by 10–25%.
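In an HTTP handler that means encoding straight into the ResponseWriter (a sketch; loadTrades is a hypothetical data source):
func listTrades(w http.ResponseWriter, r *http.Request) {
    trades := loadTrades() // hypothetical
    w.Header().Set("Content-Type", "application/json")
    // Encode streams into w and reuses encoding/json's pooled internal
    // buffer; there is no extra []byte like the one json.Marshal returns.
    if err := json.NewEncoder(w).Encode(trades); err != nil {
        log.Printf("encode trades: %v", err)
    }
}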
9. Strategy #8 — Removing Fields (Payload Hygiene)
The easiest and most underrated optimization:
Reduce payload size.
When we audited our payloads, we found:
- 25–30% of fields were unused
- 10–15% were redundant
- 5–10% could be computed on client side
Removing junk:
- shrank payloads
- reduced CPU
- reduced latency
- improved UX
This is the highest ROI improvement after switching formats.
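In Go, trimming usually starts with struct tags (a sketch; the fields are illustrative):
type OrderView struct {
    ID        string  `json:"id"`
    Price     float64 `json:"price"`
    Notes     string  `json:"notes,omitempty"` // dropped when empty
    DebugInfo string  `json:"-"`               // never serialized
}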
10. Real Benchmarks from Production
When we implemented these techniques (in stages), total improvements were:
- CPU: −28%
- p95 latency: 6.7ms → 2.4ms
- p50 latency: 2.4ms → 1.1ms
- allocs/op: down 40–70%
- network cost: −20–60% depending on format
All without touching the database or main logic.
Serialization alone made these gains.
11. Choosing the Right Strategy
Here’s the decision tree we use in real projects.
If you must keep JSON
Use jsoniter or easyjson.
If you want a fast drop-in binary format
Use MessagePack.
If you're designing microservices
Use Protobuf (ideal).
If you want maximum control
Use codegen or manual marshalers.
If you need extreme performance
Use unsafe zero-copy + custom binary format.
If payloads are massive
Use streaming.
If everything is slow
Start by removing fields.
12. Senior-Level Takeaways
- Marshaling is a major performance bottleneck in most systems.
- JSON is the slowest option, but JSON libs vary drastically in speed.
- Binary formats are the fastest real-world choice.
- Codegen is the sweet spot between safety and speed.
- Buffer reuse is essential for stable latency.
- Unsafe can be used safely, but only by experts.
- Manual marshaling is unbeatable when done well.
- Reducing payload size improves everything across the board.
High-performance marshaling is the combination of:
the right format + the right strategy + the right level of control.
13. Want to Go Deeper?
These Educative courses shaped the way I think about performance, distributed systems, and low-level engineering:
Highly recommended for backend engineers working at scale.
