There’s a special kind of pain only backend engineers know: everything in your service looks perfectly optimized — goroutines tight, DB tuned, caches warm — yet your p95 latency stubbornly stays above the SLA budget.
That was me two years ago.
We had a Go service pushing ~40k responses/sec at peak. CPU usage was rising faster than traffic. Latency graphs showed small but consistent spikes around serialization. At first I ignored them — “JSON is fine”, I told myself. “Encoding isn’t that expensive.”
I was wrong.
Serialization turned out to be one of the most underestimated sources of latency in Go systems. Fixing it didn’t just remove a bottleneck — it reduced our infrastructure cost by ~28%.
This article is the deep dive I wish I had back then.
🟢 1. The Silent Killer: Serialization on the Hot Path
Serialization is always on the critical path:
- reading from DB → serialize to cache
- writing to Kafka → serialize message
- sending HTTP response → serialize to JSON
- distributed systems → serialize across RPC
- snapshotting → serialize to disk
Even small inefficiencies compound massively under load.
In our service, each serialization step was adding:
- +0.4 ms p50
- +1.2 ms p95
At our ~40k responses/sec, that 0.4 ms per call works out to roughly 16 CPU-seconds of encoding work every wall-clock second → a CPU furnace.
🟡 2. Where Serialization Lag Comes From (Real Causes)
Let’s break it down.
1) Reflection Overhead (JSON)
Go's encoding/json is fantastic for convenience… and terrible for performance.
Reflection dominates the flamegraph:
```text
reflect.Value.Interface
reflect.Value.Field
encodeState.reflectValue
```
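You can see this locally with a micro-benchmark (`Payload` here is a made-up stand-in, not our real struct; drop it in a `_test.go` file and run `go test -bench=. -benchmem`). The allocs/op column tells the reflection story:

```go
package payload

import (
	"encoding/json"
	"testing"
)

// Payload is a stand-in for a typical API response struct.
type Payload struct {
	ID    int64    `json:"id"`
	Name  string   `json:"name"`
	Tags  []string `json:"tags"`
	Score float64  `json:"score"`
}

func BenchmarkJSONMarshal(b *testing.B) {
	p := Payload{ID: 1, Name: "order", Tags: []string{"a", "b"}, Score: 0.92}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(&p); err != nil {
			b.Fatal(err)
		}
	}
}
```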
2) Excessive Allocations
Dynamic encoding = tons of small allocations → GC pressure → latency spikes.
3) String Encoding Costs
Converting everything to string (numbers, booleans, timestamps) is expensive.
4) Deep Struct Trees
Nested structs = recursive reflection = death by a thousand cuts.
5) Repeated Schema Discovery
JSON repeatedly re-discovers field names and tags.
6) Large Payloads
Serialization grows linearly with size — and sometimes worse.
🔵 3. Flamegraph Example: What We Saw in Production
This is a simplified mock of the actual flamegraph segment:
```text
45% CPU  encoding/json.Marshal
28%      reflect.Value.Field
 7%      encodeState.string
 5%      encodeState.structEncoder
 4%      map iteration
```
This explains EVERYTHING:
- Almost half of CPU load was just serializing.
- Nothing else in the system consumed more CPU.
When serialization dominates, optimizing queries or handlers is meaningless.
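To capture this view from your own service, the stock net/http/pprof setup is enough (the port and duration below are just examples):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Expose the profiler on a side port, then grab a 30s CPU profile with:
	//   go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```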
🟣 4. Step-by-Step: How We Reduced Serialization Lag
Now — the real meat.
Here’s exactly what worked (in descending order of impact).
🔥 Step 1 — Swap JSON for a Faster Library
This alone gave us a 40–55% improvement.
Options (best → good):
- jsoniter (drop-in replacement, faster)
- EasyJSON (code generation, zero reflection)
- GoJay (stream-based, absurdly fast)
Why it works:
These libraries eliminate reflection (via code generation) or reduce it significantly.
Example:

```go
import jsoniter "github.com/json-iterator/go"

// Drop-in: same Marshal/Unmarshal surface as encoding/json.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

data, err := json.Marshal(payload)
```
Latency drop: ~35–45%
CPU drop: ~30%
Almost no code changes.
🟢 Step 2 — Use Binary Formats (MsgPack / Protobuf)
The biggest structural win.
Switching to MessagePack or Protobuf improves:
- encode speed
- decode speed
- payload size
- GC pressure
- memory locality
Gains we measured:
| Format | Improvement |
|---|---|
| MessagePack | ~3× faster serialization |
| Protobuf | ~6× faster serialization |
When to choose what:
- MessagePack: quick win, minimal friction
- Protobuf: long-term maximum performance
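A minimal sketch with MessagePack, assuming github.com/vmihailenco/msgpack/v5 (the `Payload` type is illustrative):

```go
package main

import (
	"fmt"

	"github.com/vmihailenco/msgpack/v5"
)

type Payload struct {
	ID   int64  `msgpack:"id"`
	Name string `msgpack:"name"`
}

func main() {
	// Encode to a compact binary form: no text conversion for numbers.
	data, err := msgpack.Marshal(&Payload{ID: 42, Name: "order"})
	if err != nil {
		panic(err)
	}

	var out Payload
	if err := msgpack.Unmarshal(data, &out); err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, decoded: %+v\n", len(data), out)
}
```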
🟡 Step 3 — Preallocate Buffers
This is a trick few Go engineers use.
Avoid allocating a fresh output slice on every call:

```go
data, _ := json.Marshal(v) // allocates a new []byte every time
```

Prefer encoding into a preallocated buffer:

```go
buf := make([]byte, 0, 1024) // capacity sized for a typical payload
enc := json.NewEncoder(bytes.NewBuffer(buf))
```

Or, for protobuf, append into a reusable slice (google.golang.org/protobuf; msgpack encoders can write into a reused buffer the same way):

```go
buf := make([]byte, 0, 512)
data, err := proto.MarshalOptions{}.MarshalAppend(buf, payload)
```
Why it works:
You remove reallocations → fewer copies → fewer GC cycles.
Typical improvement: 5–15%.
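A sketch of taking this one step further with a pooled buffer (`encodePooled` is an illustrative helper name, not from our codebase):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// bufPool hands out reusable encode buffers instead of allocating per call.
var bufPool = sync.Pool{
	New: func() any { return bytes.NewBuffer(make([]byte, 0, 1024)) },
}

func encodePooled(v any) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled buffers come back dirty
	defer bufPool.Put(buf)

	if err := json.NewEncoder(buf).Encode(v); err != nil { // Encode appends '\n'
		return nil, err
	}
	// Copy out before the buffer returns to the pool and gets reused.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	data, err := encodePooled(map[string]int{"a": 1})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s", data)
}
```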
🔵 Step 4 — Remove Unnecessary Fields (The Ruthless Pass)
We audited the payload.
We asked:
“Does the client actually need this?”
Turns out 20–35% of fields were never used.
Removing unused data reduced encoding time by ~12% and network weight by ~25%.
This is the easiest optimization mentally — the hardest politically.
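In Go the mechanics are mostly struct tags; a sketch of the pattern (`Order` and its fields are made up):

```go
type Order struct {
	ID    int64   `json:"id"`
	Total float64 `json:"total"`

	// Internal-only: json:"-" drops the field from the payload entirely.
	InternalRef string `json:"-"`

	// Rarely populated: omitempty skips it when empty.
	CouponCode string `json:"coupon_code,omitempty"`
}
```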
🟣 Step 5 — Avoid Encoding Large Collections Repeatedly
If a payload contains:
- a big list
- nested structures
- computed metadata
…cache the encoded version.
Example. Instead of re-encoding on every request:

```go
data, _ := json.Marshal(bigList) // same bytes recomputed thousands of times
```

encode once and serve the cached bytes:

```go
cachedBytes.Store(key, encoded) // e.g. a sync.Map of key -> []byte
```
We saw 20% latency reduction on endpoints using heavy lists.
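A minimal, runnable version of that pattern with sync.Map (`encodedList` is an illustrative helper; cache invalidation is left out of the sketch):

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

var encodedCache sync.Map // cache key -> pre-encoded []byte

// encodedList returns cached bytes when present and encodes on a miss.
// Concurrent misses may encode twice; the last Store wins, which is harmless here.
func encodedList(key string, bigList []string) ([]byte, error) {
	if v, ok := encodedCache.Load(key); ok {
		return v.([]byte), nil
	}
	data, err := json.Marshal(bigList)
	if err != nil {
		return nil, err
	}
	encodedCache.Store(key, data)
	return data, nil
}

func main() {
	data, _ := encodedList("top-sellers", []string{"a", "b", "c"})
	fmt.Println(string(data))
}
```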
🔥 Step 6 — Use Streaming Encoders
For large responses (10KB+), use streaming:
```go
enc := json.NewEncoder(w) // streams straight into the ResponseWriter
if err := enc.Encode(object); err != nil {
	log.Printf("encode: %v", err) // headers are already sent; just log
}
```
This prevents huge temporary buffers → improves memory locality.
Improvement: 10–20% on large payloads.
🧠 Step 7 — Flatten Structs
Deep nested structures cause:
- recursive reflection
- many allocations
- excessive pointer chasing
Flattening them (denormalizing slightly) improved performance by 5–10%.
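A before/after sketch (these shapes are hypothetical):

```go
type Contact struct {
	Email string `json:"email"`
	Phone string `json:"phone"`
}

type Profile struct {
	Contact *Contact `json:"contact"`
}

// Before: every level means another recursive reflection pass and,
// with pointers, another allocation and cache miss.
type UserNested struct {
	Profile *Profile `json:"profile"`
}

// After: the same fields, one flat struct, one pass.
type UserFlat struct {
	Email string `json:"email"`
	Phone string `json:"phone"`
}
```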
🧪 Step 8 — Avoid Interfaces in Hot Structs
Go's encoders must inspect the dynamic type behind every interface{} at runtime → slow path.
If your struct has:

```go
Meta map[string]interface{}
```

you are sabotaging your own performance.
Replace it with typed maps or concrete structs (generics can help where the shape is parametric).
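A before/after sketch (Meta's real shape depends on your domain):

```go
// Before: the encoder must inspect the dynamic type of every value.
type EventBefore struct {
	Meta map[string]interface{} `json:"meta"`
}

// After: a typed map keeps the encoder on the static fast path.
type EventAfter struct {
	Meta map[string]string `json:"meta"`
}
```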
📉 Actual Final Results After Applying All Steps
Our real improvements after a 3-week serialization refactor:
| Metric | Before | After |
|---|---|---|
| p50 latency | ~2.4 ms | ~1.1 ms |
| p95 latency | ~6.7 ms | ~2.4 ms |
| CPU usage | baseline | −28% |
| GC pauses | frequent | rare |
| Node count | 12 | 8 |
Those savings were worth thousands of dollars monthly.
Serialization was the hidden bottleneck all along.
🧠 Senior-Level Lessons Learned
- Serialization lies on the exact hot path → always profile it.
- JSON is never “free”.
- Libraries matter more than you expect.
- Buffer allocations are invisible killers.
- Binary formats boost performance across the stack.
- Schema control (protobuf) = speed + clarity + evolution.
- Interfaces kill serialization speed.
- CPU flamegraphs don’t lie.
🎯 Real Recommendations (Practical)
Best “drop-in” upgrade
→ jsoniter
Easiest, big gains, minimal risk.
Best long-term upgrade
→ Protobuf
For serious scalability.
Best mid-level win
→ MessagePack
Perfect when you want speed without .proto overhead.
Best structural improvement
→ removing fields, flattening structs, reducing nesting.
📚 If You Want to Go Deeper
These Educative courses shaped my understanding of performance and distributed systems:
Highly recommended (affiliate friendly).