TL;DR for the impatient.
qdf is a schemaless Go serializer (struct tags, no .proto). On real batches it's up to 68% smaller than protobuf, decodes 4–9× faster than encoding/json, ships hand-written AVX2/NEON bit-packing at ~50 GB/s, and does one thing no other mainstream Go serializer does: it can run SELECT … WHERE … over a []byte and decode only the columns and rows you asked for. Pure Go, zero dependencies. github.com/alex60217101990/qdf
This is the engineering deep-dive, not the marketing page. We're going to look at actual hexdumps, the codec picker's never-larger guarantee, the twin-bitmask three-valued predicate engine, and a profiler-driven argument about why your decode path is slow for a reason you probably haven't measured. If you write Go services that serialize the same five shapes forever — logs, events, metrics, RTB bids, OTLP spans — this is for you.
The problem nobody's format actually solves
Every serializer makes you pick two of three:
| | schemaless | small wire | fast / cheap |
|---|---|---|---|
| encoding/json | ✅ | ❌ | ❌ (allocates a mountain) |
| msgpack | ✅ | ⚠️ (per-record) | ⚠️ |
| protobuf / flatbuffers | ❌ (.proto + codegen) | ✅ | ✅ |
JSON is universal and schemaless and burns CPU and GC like it's free. msgpack is smaller but you still decode the whole blob to read one field. protobuf and flatbuffers are fast and compact — right up until you're maintaining .proto files and a codegen step for what used to be a plain struct.
qdf is an attempt to refuse the tradeoff: self-describing wire (decode straight into a struct, no schema), protobuf-class sizes on batches, genuinely extreme decode speed, and a columnar mode you can query. Let's see how, byte by byte.
type Event struct {
TS int64 `qdf:"ts"`
Level string `qdf:"level"`
Code int32 `qdf:"code"`
}
b, _ := qdf.Marshal(events, qdf.OptBalanced) // []Event -> []byte
var back []Event
_ = qdf.Unmarshal(b, &back)
Struct tags name fields, exactly like json: tags. No registry, no generated types to keep in sync. The decoder figures out mode, codecs and compression from the wire itself, so you never pass encode options to Unmarshal (decode-side choices such as Where and WithNoCopy, covered later, are separate).
1. The wire format in one look
A qdf buffer is a 5-byte header + a tagged body. That's the whole envelope.
51 44 46 01 XX [ tagged body … ]
'Q' 'D' 'F' ver flags bytes 5 … N
The flags byte is a tiny bitmap telling the decoder which dialect the body speaks, so it can fast-path or reject before parsing a single value:
- FlagDense (0x01): body uses the Dense intern dialect (back-reference tags).
- FlagQPack (0x02): body may carry the QPack numeric/bool codec tags.
- FlagRANS (0x04): body is rANS-compressed; decompress first.
- FlagColIndex (0x08): a columnar payload carries a per-column length index (this is what makes selective decode an O(1) skip).
The base tag space is msgpack-shaped — fixint, fixstr, fixarr, typed scalars, str/bin/arr/map in 8/16/32 widths, negfixint. On top of that sit the Dense back-reference tags and the QPack codec tags. That base layer is why a Fast-mode qdf buffer is about as small as msgpack and just as quick; the extra tags are where qdf pulls ahead on batches.
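To make the envelope concrete, here is a minimal sketch of checking the 5-byte header by hand. It assumes only what the layout above states (magic `QDF`, one version byte, one flags byte); `parseHeader` is a hypothetical helper written for this article, not a function from the qdf API.

```go
package main

import (
	"errors"
	"fmt"
)

// Flag bits as listed in the article.
const (
	FlagDense    = 0x01
	FlagQPack    = 0x02
	FlagRANS     = 0x04
	FlagColIndex = 0x08
)

// parseHeader is a hypothetical helper (not part of qdf). It validates the
// magic and splits the buffer into version, flags and the tagged body.
func parseHeader(buf []byte) (ver, flags byte, body []byte, err error) {
	if len(buf) < 5 {
		return 0, 0, nil, errors.New("short buffer: need 5 header bytes")
	}
	if buf[0] != 'Q' || buf[1] != 'D' || buf[2] != 'F' {
		return 0, 0, nil, errors.New("bad magic: want QDF")
	}
	return buf[3], buf[4], buf[5:], nil
}

func main() {
	// A header with flags 0x03 (Dense | QPack) followed by one body byte.
	buf := []byte{0x51, 0x44, 0x46, 0x01, 0x03, 0xa3}
	ver, flags, body, err := parseHeader(buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ver=%d dense=%t qpack=%t rans=%t colindex=%t body=%d bytes\n",
		ver, flags&FlagDense != 0, flags&FlagQPack != 0,
		flags&FlagRANS != 0, flags&FlagColIndex != 0, len(body))
}
```

Because the flags sit at a fixed offset, a reader can decide whether it supports a buffer before touching the body, which is the fast-path-or-reject behaviour described above.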
An actual buffer, byte for byte
Encode one &Event{TS:7, Level:"ERR", Code:500} with OptSpeed → 29 bytes, every one accounted for:
51 44 46 01 00 QDF, ver 1, flags 0x00 (Fast)
d5 03 map, 3 fields
82 74 73 07 "ts" -> fixint 7
85 6c 65 76 65 6c 83 45 52 52 "level" -> fixstr "ERR"
84 63 6f 64 65 c4 f4 01 "code" -> uint16 0x01F4 (500)
Two details that tell you how the encoder thinks:
- It picked the narrowest tag that holds the value. 500 went out as a 2-byte uint16, not a 4-byte int32. The picker always reaches for the smallest tag, per value.
- There's no schema anywhere. The keys ts/level/code are in the bytes. That's the cost of being schemaless on a single message, and exactly what Dense mode erases on a batch.
Flip to OptBalanced on a slice of these and the repeated keys (ts/level/code) and repeated values ("ERR") collapse to 1-byte back-references after first sight. Which brings us to the encoder.
2. Encode: it measures, then packs
qdf doesn't pick one scheme and pray. The encode pipeline:
value → typeDesc cache → columnar transpose → per-column codec picker
→ Dense intern → rANS (opt-in) → []byte
Reflection runs once per type, ever. The first call for a type builds a type descriptor — a flat array of encode/decode closures over unsafe field offsets — and caches it in a sync.Map. Every later call touches only those closures: no reflect.Value churn, no per-field type switch on the hot path.
The codec picker and the never-larger rule
For every numeric/bool slice the encoder runs a cheap bounded probe and emits the smallest of a family. The comparison includes the raw form, so if nothing wins it falls back — turning compression on can never inflate a slice. This "never-larger by construction" property is the whole reason you can flip OptBalanced on blindly.
| codec | idea | wins on |
|---|---|---|
| FOR | store value − min, bit-pack to width of max−min | bounded ranges (HTTP codes 200–504 → ~10 bits, not 32) |
| Delta+FOR | FOR over consecutive differences | monotonic-ish columns: timestamps, IDs, offsets |
| RLE | (value, run-length) pairs | long runs: status, enum, sparse flags |
| Dictionary | distinct table + bit-packed indices (ceil(log2 d) bits/row) | low cardinality, incl. string columns (level, region) |
| Patched FOR | FOR + an exception list for outliers | mostly-narrow columns with a few spikes |
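The never-larger rule comes down to one detail: the raw encoding is itself a candidate. The sketch below illustrates that idea with rough payload estimates for raw, FOR and RLE only. The size formulas, the `candidate` type and `pick` are assumptions made for illustration (they ignore tags and headers and leave out Delta+FOR, Dictionary and Patched FOR); they are not qdf's actual probe.

```go
package main

import (
	"fmt"
	"math/bits"
)

// candidate is one way to store the slice plus its estimated size in bits.
type candidate struct {
	name string
	size int
}

// forBits estimates frame-of-reference: one 64-bit minimum, then every
// value bit-packed to the width of (max - min).
func forBits(v []int64) int {
	lo, hi := v[0], v[0]
	for _, x := range v {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	width := bits.Len64(uint64(hi) - uint64(lo)) // wraps correctly for hi >= lo
	return 64 + len(v)*width
}

// rleBits estimates run-length encoding: a 64-bit value and a 32-bit
// length per run (an assumed layout).
func rleBits(v []int64) int {
	runs := 1
	for i := 1; i < len(v); i++ {
		if v[i] != v[i-1] {
			runs++
		}
	}
	return runs * (64 + 32)
}

// pick returns the smallest candidate. Raw starts as the best, so the
// result can never be larger than the raw form.
func pick(v []int64) candidate {
	best := candidate{"raw", 64 * len(v)}
	if len(v) == 0 {
		return best
	}
	for _, c := range []candidate{{"FOR", forBits(v)}, {"RLE", rleBits(v)}} {
		if c.size < best.size {
			best = c
		}
	}
	return best
}

func main() {
	codes := []int64{200, 404, 200, 500, 301, 200, 503, 204}
	wide := []int64{-9223372036854775808, 9223372036854775807}
	fmt.Println(pick(codes)) // FOR wins: 136 bits against 512 raw
	fmt.Println(pick(wide))  // nothing beats raw: falls back to 128 bits
}
```

The second input is the interesting one: both FOR and RLE would cost 192 bits there, so the picker keeps the 128-bit raw form instead of inflating the slice.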
Delta+FOR, with the actual bytes
Take []int64{1000, 1001, …, 1009} — ten 8-byte integers, 80 bytes raw. Marshal(ints, OptQPack) gives 12 bytes total:
00000000 51 44 46 01 02 e6 07 00 d0 0f 02 0a |QDF.........|
Header is 5 bytes (flags 0x02 = QPack), so the body is 7 bytes for ten int64s. Codec 0xE6 = Delta+FOR: it stored the first value, the minimum delta, and the residual deltas bit-packed. Since every delta is exactly 1, the residuals collapse to almost nothing.
That's the mechanism behind the headline 512× compression on monotonic timestamp vectors — a clock column is the perfect case: large absolute values, tiny constant deltas.
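A small sketch of the Delta+FOR transform shows why the ten-integer example shrinks so far. It follows the three pieces named above (first value, minimum delta, residuals), but the `deltaFOR` struct and both functions are hypothetical, and the residuals are kept in a plain slice rather than bit-packed into qdf's real wire layout.

```go
package main

import (
	"fmt"
	"math/bits"
)

// deltaFOR holds the pieces the article names. All names are illustrative.
type deltaFOR struct {
	n         int      // number of original values
	first     int64    // first value, stored verbatim
	minDelta  int64    // smallest consecutive difference
	width     int      // bits needed for the largest residual
	residuals []uint64 // (delta - minDelta) for each value after the first
}

func encodeDeltaFOR(v []int64) deltaFOR {
	e := deltaFOR{n: len(v)}
	if len(v) == 0 {
		return e
	}
	e.first = v[0]
	if len(v) == 1 {
		return e
	}
	e.minDelta = v[1] - v[0]
	for i := 2; i < len(v); i++ {
		if d := v[i] - v[i-1]; d < e.minDelta {
			e.minDelta = d
		}
	}
	e.residuals = make([]uint64, len(v)-1)
	var maxRes uint64
	for i := 1; i < len(v); i++ {
		r := uint64(v[i]-v[i-1]) - uint64(e.minDelta)
		e.residuals[i-1] = r
		if r > maxRes {
			maxRes = r
		}
	}
	e.width = bits.Len64(maxRes)
	return e
}

func decodeDeltaFOR(e deltaFOR) []int64 {
	out := make([]int64, e.n)
	for i := range out {
		if i == 0 {
			out[0] = e.first
			continue
		}
		out[i] = out[i-1] + e.minDelta + int64(e.residuals[i-1])
	}
	return out
}

func main() {
	ints := []int64{1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009}
	e := encodeDeltaFOR(ints)
	fmt.Printf("first=%d minDelta=%d width=%d bits per residual\n", e.first, e.minDelta, e.width)
	fmt.Println(decodeDeltaFOR(e))
}
```

For the 1000…1009 input every residual is zero, so the packed width is 0 bits and only the first value, the minimum delta and a count need to be stored.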
SIMD bit-packing — same wire, faster code
The bit-pack/unpack kernels are hand-written assembly: AVX2 on amd64, NEON on arm64, and they emit byte-identical output to the scalar path. Tests assert scalar ≡ SIMD bit-for-bit. So -tags qdf_simd is purely faster, never a different wire — runtime CPUID gate, scalar fallback on anything without AVX2.
- 22–53× over scalar at byte-aligned widths
- ~50 GB/s unpack (memory-bound there, not compute-bound)
If you run OptBalanced/OptCompression over numeric data, this build tag is free money:
go build -tags qdf_simd ./...
Implementation note for the SIMD-curious: the decode kernels lean on VPMOVZX widen-loads and VPBROADCASTQ + VPSRLVQ variable-per-lane shifts (a per-offset shift table picks the bit offset for each lane); encode uses VPSHUFB byte-gather and VPSLLVQ + lane-OR. On arm64, several of those have no direct Plan9 mnemonic and get hand-encoded via WORD. It's the kind of code where "byte-identical to scalar" is a property you test, not hope for.
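For readers who have not written a bit-packer, this is roughly what a scalar reference path looks like: the thing a SIMD kernel has to match bit for bit. It is a deliberately slow, one-bit-at-a-time sketch. The LSB-first bit order and the function names are assumptions; the article does not specify qdf's packed layout.

```go
package main

import "fmt"

// packBits writes the low `width` bits of each value into a byte slice,
// least-significant bit first. width must be between 0 and 64.
func packBits(vals []uint64, width uint) []byte {
	out := make([]byte, (uint(len(vals))*width+7)/8)
	var bitPos uint
	for _, v := range vals {
		for b := uint(0); b < width; b++ {
			if v>>b&1 == 1 {
				out[bitPos/8] |= 1 << (bitPos % 8)
			}
			bitPos++
		}
	}
	return out
}

// unpackBits reverses packBits for n values of the given width.
func unpackBits(buf []byte, n int, width uint) []uint64 {
	out := make([]uint64, n)
	var bitPos uint
	for i := range out {
		for b := uint(0); b < width; b++ {
			if buf[bitPos/8]>>(bitPos%8)&1 == 1 {
				out[i] |= 1 << b
			}
			bitPos++
		}
	}
	return out
}

func main() {
	vals := []uint64{5, 0, 7, 2, 1} // each fits in 3 bits
	packed := packBits(vals, 3)
	fmt.Printf("% x\n", packed) // c5 15: five values in two bytes
	fmt.Println(unpackBits(packed, len(vals), 3))
}
```

A test that feeds the same inputs to a routine like this and to the vectorized kernel, then compares the output bytes, is the kind of scalar ≡ SIMD assertion the article describes.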
The four-layer Dense dialect (strings & structure)
Repeated strings and field names are where batch formats bleed. Dense mode stacks four mechanisms so the second occurrence of a value is nearly free. Take []string{"eu-west-1","eu-west-1","eu-west-1"} under OptBalanced — 19 bytes:
00000000 51 44 46 01 03 a3 e0 09 65 75 2d 77 65 73 74 2d |QDF.....eu-west-|
00000010 31 e8 e8 |1..|
| bytes | meaning |
|---|---|
| 51 44 46 01 03 | header, flags 0x03 (Dense \| QPack) |
| a3 | fixarr, 3 elements |
| e0 09 65…31 | 1st value: intern declaration (tag + len 9 + "eu-west-1") |
| e8 | 2nd value: one-byte back-reference |
| e8 | 3rd value: one byte again |
First "eu-west-1" costs 11 bytes; each repeat costs 1. That's the whole game on telemetry, where region/service/level repeat across thousands of rows. The four layers producing those one-byte refs:
- Intern table — first sight stored, assigned an id; later sights become a varint reference.
- Move-to-front — the hot set resolves in 1–2 bytes via a small MRU ring (recent values get the shortest codes).
- Markov-0 "same as last": a value equal to the previous one is a single repeat tag (the e8 above).
- Markov-1 pair predictor: if "GET" is usually followed by "/health", the predicted successor collapses too.
Floats get Gorilla (lossless XOR coding over math.Float64bits; values are compared by bit pattern, never with ==, so NaN, ±Inf and −0.0 round-trip bit-exactly) and ALP (decimal-mantissa coding for quantized metrics/prices, with an exception list for anything that doesn't round-trip exactly). The opt-in order-0 rANS pass is the final never-larger squeeze for cold storage.
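The intern table and the "same as last" layer from the Dense list above are enough to reproduce the body of the eu-west-1 hexdump. The sketch below implements only those two layers. The tag values 0xE0 and 0xE8 are read off that hexdump; `tagRef` (0xE1) and the one-byte id are invented for illustration, and move-to-front and the Markov-1 predictor are left out.

```go
package main

import "fmt"

const (
	tagDeclare = 0xE0 // intern declaration, as seen in the hexdump
	tagRepeat  = 0xE8 // "same as last", as seen in the hexdump
	tagRef     = 0xE1 // HYPOTHETICAL: reference to an older entry by id
)

// internEncode is an illustrative encoder, not qdf's. It assumes strings
// shorter than 256 bytes and fewer than 256 distinct values.
func internEncode(vals []string) []byte {
	ids := map[string]byte{}
	var out []byte
	last, haveLast := "", false
	for _, s := range vals {
		switch id, seen := ids[s]; {
		case haveLast && s == last:
			out = append(out, tagRepeat) // one byte for an immediate repeat
		case seen:
			out = append(out, tagRef, id) // two bytes for an older value
		default:
			ids[s] = byte(len(ids))
			out = append(out, tagDeclare, byte(len(s)))
			out = append(out, s...) // first sight pays for the full string
		}
		last, haveLast = s, true
	}
	return out
}

func main() {
	body := internEncode([]string{"eu-west-1", "eu-west-1", "eu-west-1"})
	fmt.Printf("%d bytes: % x\n", len(body), body)
}
```

The output is 13 bytes, `e0 09` plus the nine string bytes plus two `e8` repeats, which matches the hexdump once the 5-byte header and the `a3` array tag are added back (5 + 1 + 13 = 19).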
The structural win (and the gotcha)
Here's why qdf lands smaller than protobuf on real batches: it dedups and compresses across records. protobuf, msgpack, json and flatbuffers encode each record independently, so a repeated string or a smooth float series re-pays its cost every single row. qdf pays once per batch.
Gotcha #1: that cross-record win needs a batch. On a single small message there's nothing to dedup, so OptBalanced ≈ OptSpeed ≈ msgpack in size — use OptSpeed there and skip the Dense bookkeeping.
Gotcha #2: the Dense wire embeds intern/shape ids that depend on emission order, so two semantically-equal payloads can differ byte-for-byte. If you hash or sign the bytes, encode with OptSpeed.
3. The headline: read less than the whole message
Hand qdf a []struct and it transposes rows into columns — think Parquet, but automatic and still self-describing. Each column then gets the codec that fits it: timestamps go Delta+FOR, an enum-ish level goes dictionary, a run-heavy code goes RLE.
rows ([]Event) columns (each its own codec)
┌────┬───────┬──────┐ ┌──────────┬────────┬──────┐
│ ts │ level │ code │ → │ ts ts ts │ level… │ code…│
│ … │ … │ … │ │ Delta+FOR│ dict │ RLE │
└────┴───────┴──────┘ └──────────┴────────┴──────┘
With OptColumnIndex the encoder also writes, right after the shape declaration, a fixed-width index: one uint32 byte-length per column. That index is the key — it lets the decoder compute each column's start offset and jump straight past any column it doesn't need, without parsing a byte of it.
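The offset arithmetic behind that index is a running sum, as this sketch shows. It assumes the lengths are little-endian (consistent with the `f4 01` encoding of 500 earlier, but not stated for the index) and that offsets are measured from the first byte after the index; `columnSpans` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// columnSpans reads ncols uint32 byte-lengths and returns each column's
// [start, end) range, relative to the first byte after the index.
func columnSpans(index []byte, ncols int) ([][2]int, error) {
	if ncols < 0 || len(index) < 4*ncols {
		return nil, errors.New("column index truncated")
	}
	spans := make([][2]int, ncols)
	start := 0
	for i := 0; i < ncols; i++ {
		n := int(binary.LittleEndian.Uint32(index[4*i:]))
		spans[i] = [2]int{start, start + n}
		start += n
	}
	return spans, nil
}

func main() {
	// Three columns of 10, 20 and 5 bytes.
	index := []byte{10, 0, 0, 0, 20, 0, 0, 0, 5, 0, 0, 0}
	spans, err := columnSpans(index, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(spans) // [[0 10] [10 30] [30 35]]
}
```

To read only the third column, a decoder would slice the payload at `spans[2]` and never look at the 30 bytes before it.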
Querying the bytes
buf, _ := qdf.Marshal(events, qdf.OptBalanced|qdf.OptColumnIndex)
// "SELECT ts, code WHERE level='ERROR' AND code>=500" — over a []byte.
type Hot struct {
TS int64 `qdf:"ts"`
Code int32 `qdf:"code"`
}
var hot []Hot
_ = qdf.Unmarshal(buf, &hot,
qdf.Where("level", func(s string) bool { return s == "ERROR" }),
qdf.Where("code", func(c int32) bool { return c >= 500 }))
What the decoder actually does, in order:
- Read the shape + column index. Now it knows where every column starts.
- Filter columns: decode only the columns named in a predicate (level, code). Run each predicate across its whole column to produce a per-row bitmask.
- Combine the masks (AND here) into the surviving-row set.
- Project: for the columns Hot wants (ts, code), materialize values only at the surviving rows. level was read to filter, then dropped because Hot doesn't contain it. Every other column is skipped via the index; its bytes are never parsed.
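The filter, combine and project steps can be sketched over plain slices standing in for already-located columns. This is only an illustration of the order of operations: `selectRows` and the slice-per-column representation are assumptions, and a real decoder would work on encoded column bytes and bitmasks rather than `[]bool`.

```go
package main

import "fmt"

// Hot is the projection target from the query example.
type Hot struct {
	TS   int64
	Code int32
}

// selectRows mimics: filter on level, filter on code, AND the masks,
// then materialize ts and code only for surviving rows.
func selectRows(ts []int64, level []string, code []int32) []Hot {
	keep := make([]bool, len(level))
	for i := range keep {
		keep[i] = level[i] == "ERROR" // predicate 1 over the whole column
	}
	for i := range keep {
		keep[i] = keep[i] && code[i] >= 500 // predicate 2, combined with AND
	}
	var out []Hot
	for i, ok := range keep {
		if ok {
			// level is not copied: Hot has no field for it.
			out = append(out, Hot{TS: ts[i], Code: code[i]})
		}
	}
	return out
}

func main() {
	ts := []int64{1, 2, 3, 4}
	level := []string{"ERROR", "INFO", "ERROR", "ERROR"}
	code := []int32{500, 503, 404, 502}
	fmt.Println(selectRows(ts, level, code)) // [{1 500} {4 502}]
}
```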
The predicate engine: twin bitmasks + SQL three-valued logic
It isn't just AND-of-equals. And, Or, Not compose into a real predicate tree — and the tricky part is nullable columns: in SQL, a comparison against NULL is neither true nor false, it's UNKNOWN. qdf gets this right with twin bitmasks per node: a T mask (rows definitely true) and an F mask (rows definitely false). Anything in neither is UNKNOWN.
- Leaf: run the predicate per present row → fills T; F = present &^ T (present-but-not-true). Absent (nil) rows land in neither: UNKNOWN, for free.
- AND: T = T₁ & T₂, F = F₁ | F₂ (false if any child is false, even if another is unknown).
- OR: T = T₁ | T₂, F = F₁ & F₂.
- NOT: swap T and F (unknown stays unknown).
The final result keeps only rows in the root T mask — TRUE, never FALSE, never UNKNOWN — which is exactly SQL WHERE semantics.
A neat optimization: a subtree with no nullable leaves can't produce UNKNOWN, so qdf skips materializing its F mask entirely and treats "not true" as the complement — one fewer pass over the rows.
_ = qdf.Unmarshal(buf, &hot,
qdf.Or(
qdf.Where("level", func(s string) bool { return s == "ERROR" }),
qdf.And(
qdf.Where("code", func(c int32) bool { return c >= 500 }),
qdf.Not(qdf.Where("level", func(s string) bool { return s == "DEBUG" })),
),
))
The predicate is called once per row against the native typed value — func(int32) bool, func(string) bool — with zero interface boxing. Pure projection without a filter is just Select("ts","code").
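The twin-mask rules for Leaf, AND, OR and NOT translate almost directly into bit operations. The sketch below uses a single uint64 per mask, so it handles at most 64 rows; the `tri` type and the function names are illustrative and not taken from qdf.

```go
package main

import "fmt"

// tri is a twin-mask result: bit i of T means row i is definitely true,
// bit i of F means definitely false, neither means UNKNOWN.
type tri struct{ T, F uint64 }

// leaf evaluates pred on present rows only; absent rows stay UNKNOWN.
func leaf(vals []int32, present uint64, pred func(int32) bool) tri {
	var t uint64
	for i, v := range vals {
		if present>>uint(i)&1 == 1 && pred(v) {
			t |= 1 << uint(i)
		}
	}
	return tri{T: t, F: present &^ t}
}

func and(a, b tri) tri { return tri{a.T & b.T, a.F | b.F} }
func or(a, b tri) tri  { return tri{a.T | b.T, a.F & b.F} }
func not(a tri) tri    { return tri{a.F, a.T} }

func main() {
	// Row 1 is NULL: the present mask 0b101 covers rows 0 and 2 only.
	code := []int32{500, 0, 200}
	ge500 := leaf(code, 0b101, func(c int32) bool { return c >= 500 })
	fmt.Printf("code>=500         T=%03b F=%03b\n", ge500.T, ge500.F)

	either := or(ge500, not(ge500))
	fmt.Printf("x OR NOT x        T=%03b F=%03b\n", either.T, either.F)

	allFalse := tri{T: 0, F: 0b111}
	both := and(ge500, allFalse)
	fmt.Printf("x AND always-false T=%03b F=%03b\n", both.T, both.F)
}
```

Two SQL behaviours fall out of the masks: `x OR NOT x` leaves the NULL row in neither mask (still UNKNOWN), while `x AND always-false` puts it in F, because FALSE wins over UNKNOWN under AND. Only rows set in the root T mask would be returned.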
No mainstream Go serializer does this. json, msgpack, protobuf, gob — all decode the whole message before you can read one field. For "store a wide batch, read a few columns or filter rows later," qdf is the only one that reads less than everything.
Concretely, on a wide batch at low selectivity (i7-9750H):
- ~5× faster than full decode (projection)
- ~5× less memory than full decode
- ~2.5× faster than decode-everything-then-filter
When it applies: you need OptColumnIndex at encode time, a []struct batch, and flat-ish fields. The bigger and wider the batch and the more selective the query, the larger the win. It's the columnar-warehouse pattern brought to a plain Go []byte — no database, no schema. (It is not for single messages or streaming — that's the row-by-row half of the design.)
4. Decode: the fastest work is the work you skip
Here's the claim that should change how you think about serializer performance:
Profile any serializer's decode and the truth is the same: it's allocation-bound, not CPU-bound.
Run go test -memprofile on a string-heavy decode and look at -alloc_objects. On qdf's row path it's almost entirely one call: (*Decoder).ReadString — copying string bodies out of the buffer into owned Go strings. Tag walking, bounds checks, type dispatch — rounding error. So the levers that matter aren't clever ALU tricks. They're don't allocate and don't decode.
Lever 1 · Zero-copy decode
var out []Event
_ = qdf.Unmarshal(data, &out, qdf.WithNoCopy()) // strings alias data, no copy
WithNoCopy returns strings and byte slices that point into data instead of copying out. On a string-heavy batch: ~1.7× faster, 7000+ allocations collapse to 3 (the only one left is the output slice). The decoder is already pooled and its scratch buffers reused, so with aliasing there's essentially nothing left to allocate per value.
The catch is honest and it's in the name. The returned values are valid only while data stays alive and unmodified. The footgun:
func handler(w http.ResponseWriter, r *http.Request) {
buf := pool.Get(); defer pool.Put(buf) // recycled!
io.ReadFull(r.Body, buf)
var msg Msg
qdf.Unmarshal(buf, &msg, qdf.WithNoCopy())
queue <- msg // msg.Field aliases buf … which is about to be reused → garbage
}
That's a use-after-free the race detector won't catch (it's not a data race — it's manual memory). So WithNoCopy is opt-in by design: perfect for read-and-discard over a buffer you own (a file, an mmap, a batch you process then drop), wrong for a pooled request body that outlives the call. Works on the reflection path, codegen, and streams.
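The aliasing behind that footgun can be reproduced with the standard library alone. This sketch uses `unsafe.String` and `unsafe.StringData` (Go 1.20+) to build a string that shares memory with a byte slice; it is an assumption that qdf's no-copy path does something comparable, and `aliasString` and `sharesMemory` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// aliasString returns a string that shares memory with b (no copy).
// The caller must not modify b while the string is in use.
func aliasString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

// sharesMemory reports whether s starts at the same address as b.
func sharesMemory(s string, b []byte) bool {
	return len(s) > 0 && len(b) > 0 && unsafe.StringData(s) == &b[0]
}

func main() {
	buf := []byte("ERROR")
	owned := string(buf)        // copies: safe after buf is recycled
	aliased := aliasString(buf) // no copy: valid only while buf is untouched

	fmt.Println(sharesMemory(owned, buf), sharesMemory(aliased, buf))

	// Simulate a pool handing buf to another request. This deliberately
	// breaks the contract of unsafe.String to show the failure mode.
	copy(buf, "DEBUG")

	fmt.Println(owned)   // still the original text
	fmt.Println(aliased) // now follows whatever is in buf
}
```

The copied string is unaffected by the overwrite, while the aliased one silently changes under the reader, which is exactly what happens to `msg.Field` in the handler above once the pooled buffer is reused.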
Lever 2 · Decode in struct order
The encoder writes fields in struct-declaration order, so on decode the next wire field is almost always the next struct field. The decoder keeps a cursor and tries the expected field first — one string compare — before falling back to a map lookup. A profile of a wide-struct decode had ~40% of time in mapaccess1_faststr + the hash; the cursor removes that on the common path. The map stays as the fallback, so out-of-order, partial, and unknown fields still decode correctly — you just pay the lookup for the ones that actually arrive out of order.
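A cursor-first lookup of this kind fits in a few lines. The sketch below is an assumption about how such a structure might look, not qdf's decoder: `fieldIndex`, its `misses` counter, and the choice to move the cursor to just past a fallback hit are all illustrative.

```go
package main

import "fmt"

// fieldIndex resolves wire keys to struct field positions.
type fieldIndex struct {
	names  []string       // fields in struct-declaration order
	byName map[string]int // fallback for out-of-order or unknown keys
	cursor int            // position of the field expected next
	misses int            // lookups that had to use the map
}

func newFieldIndex(names ...string) *fieldIndex {
	m := make(map[string]int, len(names))
	for i, n := range names {
		m[n] = i
	}
	return &fieldIndex{names: names, byName: m}
}

// lookup returns the field position for key, or -1 if the key is unknown.
func (f *fieldIndex) lookup(key string) int {
	// Fast path: one string compare against the expected field.
	if f.cursor < len(f.names) && f.names[f.cursor] == key {
		i := f.cursor
		f.cursor++
		return i
	}
	// Slow path: hash lookup, then resume expecting the following field.
	f.misses++
	i, ok := f.byName[key]
	if !ok {
		return -1
	}
	f.cursor = i + 1
	return i
}

func main() {
	inOrder := newFieldIndex("ts", "level", "code")
	for _, k := range []string{"ts", "level", "code"} {
		fmt.Print(inOrder.lookup(k), " ")
	}
	fmt.Println("misses:", inOrder.misses) // 0 1 2 misses: 0

	shuffled := newFieldIndex("ts", "level", "code")
	for _, k := range []string{"code", "ts", "level"} {
		fmt.Print(shuffled.lookup(k), " ")
	}
	fmt.Println("misses:", shuffled.misses) // 2 0 1 misses: 2
}
```

A real decoder would also reset the cursor at the start of each record; that step is omitted here.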
Lever 3 · Lazy, pooled state
Decoders come from a sync.Pool, and their machinery — the intern table, scratch slices — allocates only on first use. A plain struct decode never touches the intern table, so it never pays for it. (Concretely: moving that table behind a lazily-allocated pointer cut a chunk of per-call overhead, because the codegen path builds a fresh decoder per nested value and was zeroing ~4 KiB of table it never used.)
Lever 4 · The biggest win: don't decode at all
Everything from §3 lands here too. Selective decode skips whole columns via the index and never rebuilds filtered rows. If your read pattern is "a few columns of a big batch," the fastest qdf decode is the one that touches almost none of the bytes. No micro-optimization beats not doing the work.
For the last drop: codegen
//go:generate qdfgen -type Event,Batch .
qdfgen emits concrete methods using only the public API — no reflect at runtime, no descriptor lookup. The generated decoder is a flat key switch (and it threads noCopy, so zero-copy works on generated types too):
func (v *Sample) UnmarshalQDFOpts(src []byte, noCopy bool) (int, error) {
d := qdf.NewDecoderOnBuf(src)
if noCopy { d.SetNoCopy(true) }
n, err := d.ReadMapHeader()
// …
for i := 0; i < n; i++ {
kb, _ := d.ReadStringBytes()
switch string(kb) { // no alloc: compiler special-cases switch string([]byte)
case "name": { rv, _ := d.ReadString(); v.Name = rv }
case "age": { rv, _ := d.ReadInt(); v.Age = int(rv) }
// …
}
}
}
On a fixed schema that's up to 8.5× faster decode than encoding/json.
And on encode, AppendMarshal hands you buffer ownership for zero per-call allocation:
out = out[:0]
out, _ = qdf.AppendMarshal(out, v, qdf.OptBalanced) // reuse your own buffer
The mental model: encode allocations are constant (a flat 3, output buffer pooled); decode allocations scale with how much you ask for. So the two levers that matter are alias-instead-of-copy (WithNoCopy) and ask-for-less (selective decode).
5. Benchmarks, and how they're measured
2019 i7-9750H, Go 1.26. Wire sizes are deterministic. Latencies are median of 6 runs; throughput claims use benchstat over ≥10 interleaved runs so a single warm/cold run can't lie. Everything reproducible from the bench/ module — a separate module so competitor deps (protobuf, vmihailenco/msgpack, flatbuffers) stay out of the core, which has zero dependencies:
cd bench
go test -run='^$' -bench Decode -benchmem -count=10 | tee new.txt
benchstat old.txt new.txt
Wire size vs the field (bytes, lower is better)
| fixture | json | msgpack | protobuf | qdf balanced | qdf compress |
|---|---|---|---|---|---|
| OTLP 4×512 | 1 027 033 | 793 192 | 561 860 | 240 686 | 179 181 |
| Logs 1024 | 245 037 | 193 476 | 156 479 | 89 631 | 62 149 |
| RTB 1024 | 559 294 | 428 404 | 327 700 | 258 167 | 203 360 |
| Events 1024 | 122 857 | 84 712 | 64 978 | 39 650 | 39 639 |
| IoT 32×256 | 469 058 | 224 534 | 207 562 | 158 474 | 148 177 |
Smaller than protobuf on every batch. With qdf compress: OTLP −68%, Logs −60%, Events −39%, RTB −38%, IoT −29%. qdf balanced alone ranges from −21% (RTB) to −57% (OTLP). The gap exists because qdf compresses across records and protobuf doesn't.
Throughput
| workload | result |
|---|---|
| Decode vs encoding/json | 4–9× faster across payloads (2–7× vs msgpack) |
| Numeric/bool slices (QPack) | 5× smaller than json, 21× faster encode, 80× faster decode |
| SIMD bit-unpack (AVX2/NEON) | 22–53× over scalar, ~50 GB/s (memory-bound) |
| ~150 MiB realistic payload (Dense) | 7.5× faster encode, 8.1× faster decode than json |
| Encode (Fast, pooled) | ~1.1 GB/s, 3 allocs/op — vs ~1000 allocs/op for json & msgpack |
| Zero-copy decode (string batch) | 7002 → 3 allocs, −38% B/op, ~1.7× faster |
| Codegen decode | up to 8.5× over json on a fixed schema |
| Selective decode (few columns) | ~5× faster & ~5× less memory than full decode |
Note the asymmetry: encode is a flat 3 allocations no matter the payload; decode allocations scale with how much you ask for — which is exactly why WithNoCopy and selective decode matter.
6. Which knob, when
One Options bitmask on the encode side. You never pass encode options to Unmarshal: it reads the header and handles whatever it gets.
| Option | Reach for it when |
|---|---|
| OptSpeed | Hot path, single messages, sub-µs latency. msgpack-shaped. The drop-in encoding/json replacement. Also: use it if you hash/sign the bytes. |
| OptBalanced | Default for batches: Dense interning + adaptive numeric codecs. Big wire win, still fast. |
| `OptBalanced\|OptColumnIndex` | A []struct batch you will query or read partially later with Select/Where; the per-column index is what makes the skip possible. |
| OptCompression | Cold storage. Adds Gorilla/ALP + rANS. Smallest wire; encode slower, so write-once-read-rarely. |
| WithNoCopy() | Read-mostly over a buffer you own and won't mutate. Near-zero-alloc decode. |
| AppendMarshal | Own the output buffer for zero per-call allocation. |
| qdfgen | Fixed schema, every nanosecond counts: reflection-free generated methods. |
The presets are just bundles of bits you'd compose by hand anyway:
const (
OptSpeed = 0 // Fast mode, nothing on
OptBalanced = OptDense | OptQPack | OptShapeIntern | OptPairPred | OptMTF
OptCompression = OptBalanced | OptGorillaFloat | OptRANS
)
One axis, left to right: lowest CPU → smallest bytes. And every step is never-larger, so moving right never inflates a buffer.
OptSpeed ──▶ OptBalanced ──▶ OptCompression
fastest −43% vs proto smallest (−60% vs proto)
≈ msgpack still fast slower encode
The same Logs-1024 batch, measured: json 245 KB → msgpack 193 KB → protobuf 156 KB → OptBalanced 90 KB → OptCompression 62 KB.
Two build tags — free performance, off by default
Orthogonal to Options: these change the generated machine code, not the wire. Same bytes, faster processing.
- -tags qdf_simd: AVX2 (amd64) / NEON (arm64) bit-pack kernels, byte-identical output, runtime CPUID gate + scalar fallback. 22–53× over scalar. If you run OptBalanced/OptCompression on numeric data, turn it on; it's free.
- -tags qdf_reflect2: swaps reflect.MakeSlice/MakeMapWithSize/New for modern-go/reflect2 unsafe equivalents → smaller decode allocations on map/slice-heavy payloads. The one honesty note: this is the single opt-out from zero-dependency. Worth it if your data is map/slice-dense and you're not on codegen.
go build -tags "qdf_simd qdf_reflect2" ./...   # combine freely
7. Streaming
enc := qdf.NewStreamEncoder(w, qdf.Dense)
for _, ev := range events { _ = enc.Encode(&ev) }
enc.Close()
dec := qdf.NewStreamDecoder(r)
for {
var ev Event
if err := dec.Decode(&ev); err == io.EOF { break } else if err != nil { return err }
}
The header is written once; the Dense intern table is shared across messages, so a 10k-row log pays for each distinct key once (the second message's "region":"eu-west-1" is a back-reference into the first). Each message is length-framed — a uvarint byte-count precedes its body — so a message of any size round-trips, even across a reader that hands you one byte per Read, and io.EOF marks the end cleanly. SetNoCopy works here too; aliases stay valid for the stream's lifetime because the window is never compacted.
QDF hdr │ len₁ · msg₁ │ len₂ · msg₂ │ … EOF
5B once │ uvarint+body│ uvarint+body│
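The length framing in the diagram above can be sketched with `encoding/binary` alone. The code writes uvarint-prefixed frames and reads them back through `iotest.OneByteReader`, the one-byte-per-Read case the article mentions. The 5-byte stream header and the Dense table are left out, and `writeFrame`, `readFrame` and `roundTrip` are illustrative names rather than qdf API.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"testing/iotest"
)

// writeFrame writes a uvarint byte count followed by the message body.
func writeFrame(w io.Writer, msg []byte) error {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(msg)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err := w.Write(msg)
	return err
}

// readFrame reads one frame. io.EOF is returned only at a clean frame
// boundary; a frame cut short yields io.ErrUnexpectedEOF.
func readFrame(r *bufio.Reader) ([]byte, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	msg := make([]byte, n) // production code would bound n first
	if _, err := io.ReadFull(r, msg); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF
		}
		return nil, err
	}
	return msg, nil
}

// roundTrip frames every message, then reads them back one byte per Read.
func roundTrip(msgs [][]byte) ([][]byte, error) {
	var buf bytes.Buffer
	for _, m := range msgs {
		if err := writeFrame(&buf, m); err != nil {
			return nil, err
		}
	}
	r := bufio.NewReader(iotest.OneByteReader(&buf))
	var out [][]byte
	for {
		m, err := readFrame(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, m)
	}
}

func main() {
	out, err := roundTrip([][]byte{[]byte("first"), []byte("second")})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Because every frame announces its own length, the reader never has to guess where a message ends, however the underlying reader chops up the bytes.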
Streaming and columnar are the two halves of the design: streaming is row-by-row for unbounded feeds; columnar is a complete batch you can query. So the whole-batch features — OptColumnIndex, Where/Select, OptRANS — aren't part of streaming, by design.
8. Where it doesn't win (the honest part)
- OptSpeed wire ≈ msgpack: the speed tier skips columnar compression on purpose. Use OptBalanced when you want the bytes back.
- The compression tier's encode is slower (Gorilla/ALP cost real CPU). It's a storage play, not a hot path.
- protobuf and flatbuffers still win raw single-message decode and single-tiny-message size — generated code and zero-copy field access are hard to beat when there's no batch to amortize over. Different tool for "one small message, decoded whole, hot."
qdf's sweet spot is batches of structured records you want small on the wire and partially readable later: telemetry, logging, metrics, analytics, event sourcing.
Try it
go get github.com/alex60217101990/qdf
Pure Go, zero dependencies — nothing to vendor, no schema compiler in your pipeline. Swap it in where you use encoding/json, flip a batch path to OptBalanced|OptColumnIndex, read back just the columns you need — then go stare at your allocation graph.
- Repo: github.com/alex60217101990/qdf
- Full API reference: pkg.go.dev/github.com/alex60217101990/qdf
If the query model or the codec picker is useful to you, a ⭐ on the repo helps others find it. And if you find a payload shape where qdf loses that it shouldn't — open an issue with the fixture. That's the most useful bug report there is.






![Transpose: []struct rows become per-column codecs plus a length index](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6i2pqt2lmefqfuxmum9e.png)





