Cache-to-Cache: When LLMs Talk Without Words

#research #infra #ai #machinelearning

Originally published on AI Tech Connect.

What you need to know

It is research, not a product. Cache-to-Cache (C2C) is a paper (arXiv 2510.03215, accepted at ICLR 2026) with reference code at github.com/thu-nics/C2C from the THU-NICS group at Tsinghua. There is no production API to call.

The core idea is direct semantic communication. Instead of one model writing a message in text and another reading it, C2C passes the source model's KV-cache straight into the target model's representation space.

The reported numbers are real gains: 6.4 to 14.2 percent higher average accuracy than the individual models, around 3.1 to 5.4 percent over text communication, and an average 2.5x latency speedup.

The trade-offs are equally real. Cache-passing demands access to both models' internals: tight coupling, the opposite of text's universal…
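To make the cache-passing idea concrete, here is a toy sketch in plain Python. It is not the paper's architecture or the thu-nics/C2C code: the `fuse_cache` function, the single linear `projection`, the scalar `gate_logit`, and all dimensions are hypothetical, chosen only to show what it might mean to map one model's cached keys or values into another model's space and blend them. It also assumes both caches cover the same number of token positions, which real models with different tokenizers would not give you for free.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_cache(target_kv, source_kv, projection, gate_logit):
    """Blend a projected source cache into the target cache for one layer.

    target_kv:  seq_len x d_target  (the target model's own keys or values)
    source_kv:  seq_len x d_source  (the source model's keys or values)
    projection: d_source x d_target (hypothetical learned map between the two spaces)
    gate_logit: scalar; sigmoid(gate_logit) is the share of source signal admitted
    """
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    projected = matmul(source_kv, projection)
    return [
        [(1.0 - gate) * t + gate * p for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(target_kv, projected)
    ]


# Illustrative sizes only; real caches are per layer and per attention head.
random.seed(0)
seq_len, d_source, d_target = 4, 6, 3
source_kv = [[random.gauss(0, 1) for _ in range(d_source)] for _ in range(seq_len)]
target_kv = [[random.gauss(0, 1) for _ in range(d_target)] for _ in range(seq_len)]
projection = [[random.gauss(0, 0.1) for _ in range(d_target)] for _ in range(d_source)]

fused = fuse_cache(target_kv, source_kv, projection, gate_logit=0.0)
print(len(fused), len(fused[0]))  # prints "4 3": the fused cache keeps the target's shape
```

Even this toy version shows where the coupling comes from: the projection has to know both models' hidden sizes, and the caller needs raw access to both caches, neither of which a text-only interface requires.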

Read the full article on AI Tech Connect →
