
Pavel Sanikovich


Reducing Serialization Lag in Highload Go Systems — What Actually Works in Production

There’s a special kind of pain only backend engineers know: everything in your service looks perfectly optimized — goroutines tight, DB tuned, caches warm — yet your p95 latency stubbornly stays above the SLA budget.

That was me two years ago.

We had a Go service pushing ~40k responses/sec at peak. CPU usage was rising faster than traffic. Latency graphs showed small but consistent spikes around serialization. At first I ignored them — “JSON is fine”, I told myself. “Encoding isn’t that expensive.”

I was wrong.

Serialization turned out to be one of the most underestimated sources of latency in Go systems. Fixing it didn’t just remove a bottleneck — it reduced our infrastructure cost by ~28%.

This article is the deep dive I wish I had back then.


🟢 1. The Silent Killer: Serialization on the Hot Path

Serialization is always on the critical path:

  • reading from DB → serialize to cache
  • writing to Kafka → serialize message
  • sending HTTP response → serialize to JSON
  • distributed systems → serialize across RPC
  • snapshotting → serialize to disk

Even small inefficiencies compound massively under load.

In our service, each serialization step was adding:

  • +0.4 ms p50
  • +1.2 ms p95

Multiply that by tens of thousands of calls per second and you get a CPU furnace: at 40k responses/sec, even 0.4 ms of encoding work per response adds up to roughly 16 CPU-seconds of work every second.


🟡 2. Where Serialization Lag Comes From (Real Causes)

Let’s break it down.

1) Reflection Overhead (JSON)

Go's encoding/json is fantastic for convenience… and terrible for performance.

Reflection dominates the flamegraph:

reflect.Value.Interface
reflect.Value.Field
encodeState.reflectValue

2) Excessive Allocations

Dynamic encoding = tons of small allocations → GC pressure → latency spikes.

3) String Encoding Costs

Converting everything to string (numbers, booleans, timestamps) is expensive.

4) Deep Struct Trees

Nested structs = recursive reflection = death by a thousand cuts.

5) Repeated Schema Discovery

Reflection-based encoders resolve field names and struct tags at runtime instead of baking the schema into generated code, the way EasyJSON or protobuf do.

6) Large Payloads

Serialization grows linearly with size — and sometimes worse.


🔵 3. Flamegraph Example: What We Saw in Production

This is a simplified mock of the actual flamegraph segment:

45% CPU: encoding/json.Marshal
    28% reflect.Value.Field
    7%  encodeState.string
    5%  encodeState.structEncoder
    4%  map iteration

This explains EVERYTHING:

  • Almost half of CPU load was just serializing.
  • Nothing else in the system consumed more CPU.

When serialization dominates, optimizing queries or handlers is meaningless.
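
If you want to capture the same kind of profile yourself, Go's built-in net/http/pprof endpoint is enough. A minimal sketch (port and comments are illustrative):

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
    // Expose the profiling endpoint on a private port, then inspect with:
    //   go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... start the actual service here ...
}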


🟣 4. Step-by-Step: How We Reduced Serialization Lag

Now — the real meat.
Here’s exactly what worked (in descending order of impact).


🔥 Step 1 — Swap JSON for a Faster Library

This alone gave us 40–55% improvement.

Options (best → good):

  1. jsoniter (drop-in replacement, faster)
  2. EasyJSON (code generation, zero reflection)
  3. GoJay (stream-based, absurdly fast)

Why it works

Because you eliminate reflection or reduce it significantly.

Example

import jsoniter "github.com/json-iterator/go"

// Drop-in: same Marshal/Unmarshal surface as encoding/json.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

data, err := json.Marshal(payload)
if err != nil {
    // handle the error instead of discarding it
}

Latency drop: ~35–45%
CPU drop: ~30%

Almost no code changes.


🟢 Step 2 — Use Binary Formats (MsgPack / Protobuf)

The biggest structural win.

Switching to MessagePack or Protobuf improves:

  • encode speed
  • decode speed
  • payload size
  • GC pressure
  • memory locality

Gains we measured:

Format         Improvement
MessagePack    ~3× faster serialization
Protobuf       ~6× faster serialization

When to choose what:

  • MessagePack: quick win, minimal friction
  • Protobuf: long-term maximum performance
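
A minimal sketch of the MessagePack path, assuming the github.com/vmihailenco/msgpack/v5 package (one of several Go implementations; not necessarily the one your codebase should standardize on):

import "github.com/vmihailenco/msgpack/v5"

// User is a hypothetical payload type.
type User struct {
    ID   int64  `msgpack:"id"`
    Name string `msgpack:"name"`
}

// Same call shape as json.Marshal, but a compact binary payload.
func encodeUser(u *User) ([]byte, error) {
    return msgpack.Marshal(u)
}

func decodeUser(data []byte) (*User, error) {
    var u User
    err := msgpack.Unmarshal(data, &u)
    return &u, err
}

The API mirrors encoding/json, which keeps the migration mechanical; the wins come from skipping text formatting and shipping smaller payloads.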

🟡 Step 3 — Preallocate Buffers

This is a trick few Go engineers use.

Avoid:

json.Marshal(v)

Prefer:

// Give the encoder a buffer with capacity up front,
// so it doesn't reallocate and copy mid-encode.
var buf bytes.Buffer
buf.Grow(1024)
enc := json.NewEncoder(&buf)

Or for protobuf/msgpack:

// With code-generated encoders (e.g. tinylib/msgp), the generated
// MarshalMsg appends into a caller-supplied buffer:
buf := make([]byte, 0, 512)
data, err := payload.MarshalMsg(buf)

Why it works:
You remove reallocations → fewer copies → fewer GC cycles.

Typical improvement: 5–15%.


🔵 Step 4 — Remove Unnecessary Fields (The Ruthless Pass)

We audited the payload.

We asked:

“Does the client actually need this?”

Turns out 20–35% of fields were never used.

Removing unused data reduced encoding time by ~12% and bytes on the wire by ~25%.

This is the easiest optimization mentally — the hardest politically.
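
A sketch of the ruthless pass (types are hypothetical): encode a trimmed response type instead of the full domain struct, so unused fields never reach the encoder at all.

import "time"

// Order is everything the service knows.
type Order struct {
    ID            int64
    UserID        int64
    InternalNotes string // clients never see this
    CreatedAt     time.Time
}

// OrderResponse is only what clients actually read.
type OrderResponse struct {
    ID        int64     `json:"id"`
    CreatedAt time.Time `json:"created_at"`
}

func toResponse(o *Order) OrderResponse {
    return OrderResponse{ID: o.ID, CreatedAt: o.CreatedAt}
}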


🟣 Step 5 — Avoid Encoding Large Collections Repeatedly

If a payload contains:

  • a big list
  • nested structures
  • computed metadata

…cache the encoded version.

Example:

Instead of computing:

for each request:
    json.Marshal(bigList)

Do:

cachedBytes.Store(key, encodedValue)
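
Expanded into a runnable sketch, using sync.Map (the names bigListCache and Item are hypothetical):

import (
    "encoding/json"
    "sync"
)

// Item stands in for whatever the heavy list contains.
type Item struct {
    ID int64 `json:"id"`
}

var bigListCache sync.Map // key -> []byte (pre-encoded JSON)

func encodedBigList(key string, list []Item) ([]byte, error) {
    // Serve the pre-encoded bytes if we already paid for Marshal once.
    if v, ok := bigListCache.Load(key); ok {
        return v.([]byte), nil
    }
    data, err := json.Marshal(list)
    if err != nil {
        return nil, err
    }
    bigListCache.Store(key, data)
    return data, nil
}

The hard part is invalidation: the cached bytes must be dropped or rebuilt whenever the underlying list changes.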

We saw 20% latency reduction on endpoints using heavy lists.


🔥 Step 6 — Use Streaming Encoders

For large responses (10KB+), use streaming:

// Stream the encoding straight into the ResponseWriter,
// instead of building the whole payload in memory first.
encoder := json.NewEncoder(w)
if err := encoder.Encode(object); err != nil {
    // the client may have disconnected mid-write
}

This prevents huge temporary buffers → improves memory locality.

Improvement: 10–20% on large payloads.


🧠 Step 7 — Flatten Structs

Deep nested structures cause:

  • recursive reflection
  • many allocations
  • excessive pointer chasing

Flattening them (denormalizing slightly) improved performance by 5–10%.
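
What flattening looked like in practice, as a sketch with hypothetical types:

// Before: three levels of pointers for the encoder to chase.
type Contact struct {
    Email string `json:"email"`
}

type Owner struct {
    Contact *Contact `json:"contact"`
}

type ProfileDeep struct {
    Owner *Owner `json:"owner"`
}

// After: slightly denormalized, one flat level, no pointer chasing.
type ProfileFlat struct {
    OwnerEmail string `json:"owner_email"`
}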


🧪 Step 8 — Avoid Interfaces in Hot Structs

Go’s serialization treats interface{} as a wildcard → slow path.

If your struct has:

Meta map[string]interface{}

You are sabotaging your own performance.

Replace with generics or typed maps.
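
For example (field names hypothetical), replacing the wildcard map with a typed one:

// Slow path: every value is boxed in an interface{} and
// type-switched at encode time.
type EventSlow struct {
    Meta map[string]interface{} `json:"meta"`
}

// Fast path: the encoder knows the concrete value type up front.
type EventFast struct {
    Meta map[string]string `json:"meta"`
}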


📉 Actual Final Results After Applying All Steps

Our real improvements after a 3-week serialization refactor:

Metric        Before      After
p50 latency   ~2.4 ms     ~1.1 ms
p95 latency   ~6.7 ms     ~2.4 ms
CPU usage     baseline    −28%
GC pauses     frequent    rare
Node count    12          8

Those savings were worth thousands of dollars monthly.

Serialization was the hidden bottleneck all along.


🧠 Senior-Level Lessons Learned

  • Serialization lies on the exact hot path → always profile it.
  • JSON is never “free”.
  • Libraries matter more than you expect.
  • Buffer allocations are invisible killers.
  • Binary formats boost performance across the stack.
  • Schema control (protobuf) = speed + clarity + evolution.
  • Interfaces kill serialization speed.
  • CPU flamegraphs don’t lie.

🎯 Real Recommendations (Practical)

Best “drop-in” upgrade

jsoniter
Easiest, big gains, minimal risk.

Best long-term upgrade

Protobuf
For serious scalability.

Best mid-level win

MessagePack
Perfect when you want speed without .proto overhead.

Best structural improvement

Removing fields, flattening structs, reducing nesting.


📚 If You Want to Go Deeper

These Educative courses shaped my understanding of performance and distributed systems:

  1. Grokking the System Design Interview
  2. Golang Concurrency Deep Dive
  3. Scalability for Backend Engineers

Highly recommended (affiliate friendly).
