Hi everyone. My name is Jianyang Gao. I am currently a postdoctoral researcher at ETH Zurich, and I am the first author of the RaBitQ line of work.
Google Research's paper "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate," accepted to ICLR 2026 in January 2026, contains serious problems in its description of the prior RaBitQ vector quantization method, in its comparison of theoretical results, and in its experimental comparison with RaBitQ. I will explain the details below. We explicitly pointed out these problems by email before the TurboQuant paper was submitted to ICLR 2026. The TurboQuant team explicitly acknowledged that they were aware of them but chose not to correct them. The paper was then accepted to ICLR 2026 and widely promoted through Google's official channels, reaching tens of millions of views on social media.
We are speaking publicly now because once an inaccurate academic narrative spreads widely, the cost of correcting it only becomes higher.
Background: What is RaBitQ?
The RaBitQ papers listed below are the main outcome of my PhD research at Nanyang Technological University (NTU Singapore), under the supervision of Associate Professor Cheng Long. The work first appeared in 2024. It proposed a high-dimensional vector quantization method and proved that it achieves the asymptotically optimal error bound established in a landmark theoretical computer science paper (Alon and Klartag, FOCS 2017).
- RaBitQ (arXiv:2405.12497, May 2024, later published at SIGMOD 2024)
- Extended version (arXiv:2409.09913, September 2024, later published at SIGMOD 2025)
One of the key ideas of RaBitQ is to apply a random rotation (a Johnson-Lindenstrauss transform) to the input vector before quantization. RaBitQ exploits the statistical properties of the rotated vector to perform quantization, and it achieves the optimal theoretical error bound.
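To make the rotate-then-quantize idea concrete, here is a minimal Python sketch. It is an illustration only, not the official RaBitQ C++ implementation: it shows only the normalize / randomly-rotate / 1-bit-quantize pipeline, and omits RaBitQ's unbiased estimator and efficient distance computation.

```python
# Minimal sketch of rotate-then-quantize (illustration only; NOT the
# official RaBitQ implementation, which adds an unbiased inner-product
# estimator and fast distance estimation on top of this structure).
import numpy as np

rng = np.random.default_rng(0)
d = 128

# Sample a random orthogonal (rotation) matrix via QR decomposition
# of a Gaussian matrix -- a standard JL-style transform.
P, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quantize_1bit(x):
    """Normalize, randomly rotate, then keep one sign bit per coordinate."""
    x = x / np.linalg.norm(x)   # unit-normalize the input vector
    rotated = P @ x             # random rotation spreads mass across coords
    return np.sign(rotated)     # 1-bit code: the sign of each coordinate

x = rng.standard_normal(d)
code = quantize_1bit(x)

# Because the rotated unit vector is uniformly distributed on the sphere,
# its cosine similarity with the sign code concentrates around sqrt(2/pi).
rotated = P @ (x / np.linalg.norm(x))
cosine = rotated @ code / (np.linalg.norm(rotated) * np.linalg.norm(code))
```

The point of the rotation is that no coordinate of the rotated vector carries disproportionate mass, which is what makes such an aggressive (one bit per dimension) quantizer analyzable and accurate.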
Problem 1 in TurboQuant: Systematically avoiding the methodological similarity between TurboQuant and the prior RaBitQ method
RaBitQ and TurboQuant have a direct structural relationship at the method level. Both apply a random rotation (a Johnson-Lindenstrauss transform) to the input vector before quantization. This is the most central and closest overlap in the design of the two methods.
In their reply to a reviewer on the ICLR OpenReview platform, the TurboQuant authors described their own method as follows:
"We achieve this by first normalizing the vectors by their l2 norm and then applying a random rotation to ensure the entries of the vectors will have a beta distribution post rotation."
However, neither in that response, nor in the method description in the TurboQuant paper, nor anywhere else in the paper, do they directly state that this structure is the same as the one used in RaBitQ. This omission occurred in the following context:
In January 2025, several months before the TurboQuant paper appeared on arXiv, the second author of TurboQuant, Majid Daliri, proactively contacted us and asked for help debugging his own Python version translated from our RaBitQ C++ implementation. He described in detail the steps he had taken, the code snippets he used, and the specific errors he encountered. This shows that the TurboQuant team had a detailed understanding of the technical details of RaBitQ. Yet in the arXiv version they released in April 2025, and again in the version they submitted to ICLR 2026 in September 2025, they described RaBitQ as a grid-based product quantization (PQ) method while omitting the core random rotation step in RaBitQ. An ICLR reviewer independently pointed this out, writing: "RaBitQ and variants are similar to TurboQuant in that they all use random projection," and explicitly requested a fuller discussion and comparison. Even so, in the final ICLR version, the TurboQuant authors not only failed to add any real discussion of RaBitQ, but actually moved their already incomplete description of RaBitQ out of the main text and into the appendix.
Because of this, in March 2026 we emailed all TurboQuant authors and raised the issue again, together with a request for correction. In response, the TurboQuant authors refused this request on the grounds that:
"The use of random rotation and Johnson-Lindenstrauss transformations has become a standard technique in the field, and it is not feasible for us to cite every method that employs them."
We believe this response deflects the real issue. RaBitQ is not just one of many unrelated methods using a generic idea. Under the same problem setting, it is the concrete prior work that first combined random rotations (Johnson-Lindenstrauss transforms) with vector quantization and established optimal theoretical guarantees. RaBitQ should therefore be described accurately in the paper, and its relationship to TurboQuant should be discussed explicitly.
Problem 2 in TurboQuant: Mischaracterizing RaBitQ's theoretical results
Without providing any supporting argument, the TurboQuant paper characterizes RaBitQ's theoretical guarantees as "suboptimal." The paper states:
"While the paper's theoretical guarantees are suboptimal, likely due to loose analysis -- as practical performance surpasses theoretical bounds"
This sentence directly labels RaBitQ's theoretical guarantees as "suboptimal" and attributes that to "loose analysis." But the paper provides no derivation, comparison, or evidence to justify this claim.
The fact is that in Theorem 3.2 of the extended RaBitQ paper (arXiv:2409.09913), we already gave a rigorous proof that RaBitQ achieves the asymptotically optimal error bound established in the top theoretical computer science paper of Alon and Klartag (FOCS 2017). Because of this result, we were invited to present it at a workshop affiliated with FOCS, one of the top conferences in theoretical computer science.
For this reason, in May 2025 we had multiple rounds of detailed technical email exchanges with the second author of TurboQuant, Majid Daliri, and clarified point by point where the TurboQuant team's reading of our theoretical result was wrong. In those emails, Majid Daliri explicitly stated that he had communicated these discussions to all co-authors.
However, throughout the later process in which TurboQuant was submitted to ICLR 2026, reviewed, accepted, and then broadly promoted, this incorrect characterization of RaBitQ's theoretical guarantee was never corrected.
An unsupported claim that remains in the formally published paper even after the original authors have pointed out the error in detail, and even after the TurboQuant team has explicitly acknowledged being aware of it, goes beyond an ordinary mistake.
Problem 3 in TurboQuant: Deliberately creating an unfair experimental setup
The TurboQuant paper tested RaBitQ using a degraded implementation and a single-core CPU with multithreading disabled, while testing TurboQuant on an A100 GPU. The quantization speed reported for RaBitQ in TurboQuant is several orders of magnitude slower than the actual speed of our open-source implementation.
In an email from May 2025, Majid Daliri himself explained where this gap came from:
"we were using a single-core CPU instance, and multiprocessing was indeed disabled [...] we weren't fully utilizing parallelism, which explains why it was significantly slower"
Our official RaBitQ code was already publicly available when each paper first appeared on arXiv, in May 2024 and September 2024 respectively, and it uses multithreaded parallelism by default. Moreover, in his January 2025 emails, Majid Daliri also stated that he had successfully run RaBitQ for testing, yet the version he used for the experiments was still his own translated Python implementation. This means that the speed numbers reported for RaBitQ in the TurboQuant paper rest on two systematic sources of unfairness:
- They used their own translated Python code instead of our open-source C++ implementation.
- They evaluated RaBitQ on a single-core CPU with multithreading disabled, while evaluating TurboQuant on an NVIDIA A100 GPU.
Neither of these two points was fully disclosed in the paper. What readers see is the conclusion that RaBitQ is slower than TurboQuant by several orders of magnitude. What they are not told is that this conclusion is built on deliberately constructed unfair experimental conditions.
Full timeline of events
- May 2024: The RaBitQ paper was posted on arXiv, with source code released at the same time. It was later published at SIGMOD 2024.
- Sep. 2024: The extended RaBitQ paper was posted on arXiv, with source code released at the same time. It was later published at SIGMOD 2025.
- Jan. 2025: TurboQuant second author Majid Daliri contacted us and asked for help debugging a Python implementation of RaBitQ.
- Apr. 2025: The TurboQuant paper was posted on arXiv.
- May 2025: We emailed Majid Daliri about the differences in experimental setup and clearly explained why RaBitQ's theoretical guarantees are optimal. Majid Daliri said he had informed all authors, but after we asked them to correct the factual errors in TurboQuant, he stopped replying.
- Nov. 2025: We discovered that the TurboQuant paper had been submitted to ICLR 2026 and that the factual errors in the paper still had not been corrected. We therefore contacted the ICLR 2026 PC Chairs, but received no response.
- Jan. 2026: The TurboQuant paper was accepted to ICLR 2026.
- Mar. 2026: The TurboQuant team continued to promote the paper through Google's official channels, and related social media views reached tens of millions.
- Mar. 2026: We formally emailed all TurboQuant authors, explained the three factual problems above, and requested corrections and clarifications. To date, we have only received a generic response from the TurboQuant first author, Amir Zandieh, who promised to address problems 2 and 3 but refused to address problem 1, namely the need to discuss the technical similarity between TurboQuant and RaBitQ. In addition, they were only willing to make any such corrections after the official ICLR 2026 conference had concluded.
What we have already done
- Posted a public comment on ICLR OpenReview: https://openreview.net/forum?id=tO3ASKZlok
- Submitted a formal complaint again to the ICLR General Chairs, PC Chairs, and Code and Ethics Chairs, together with a full evidence package
What we will do next
- Release a detailed technical report on TurboQuant and RaBitQ on arXiv
- Consider raising the matter further with relevant institutions
Final remarks
Our goal in raising these issues is to ensure that the public academic record accurately reflects the real relationship among these methods. Once a paper has been pushed to the public by Google with tens of millions of impressions, its inaccurate narrative no longer needs to be actively propagated: if left uncorrected, it becomes the consensus by default. That is why we chose to document this publicly.
We also sincerely ask everyone to help more people understand the problems behind the TurboQuant paper. We believe that the truth becomes clearer through open debate.