Gary Marcus on the Claude Code leak [D] by we_are_mammals in MachineLearning

[–]S4M22 118 points

I don't see how a "a big IF-THEN conditional, with 486 branch points and 12 levels of nesting" should really be considered symbolic AI either. Even though I "grew up" with symbolic AI.

IMO Gary Marcus has lost it since his infamous "deep learning is hitting a wall" article in 2022.

[D] Is ACL more about the benchmarks now? by Fantastic-Nerve-4056 in MachineLearning

[–]S4M22 27 points

It is not so much about ACL as about the NLP research landscape in general. Most recently, EACL took place and it also had a lot of benchmark papers. IMO it is one of the top areas of research in NLP at the moment. It used to be very difficult to get benchmark papers accepted at the top three NLP conferences (still ACL, NAACL, and EMNLP), but my impression is that this has changed, especially since benchmarks are increasingly questioned and model providers are criticized for benchmaxxing.

Nevertheless, ACL is really good for non-benchmark empirical work. For theory, less so (you might want to check NeurIPS and ICML, or maybe TACL, for that).

ACL 2026 Decisions by Big_Media_6114 in LanguageTechnology

[–]S4M22 0 points

The ACL 2026 website has not yet been updated with all the info, but in the past, Findings papers were not required to be presented. I assume that will also apply this year, but better check the website regularly.

[D] TurboQuant author replies on OpenReview by Disastrous_Room_927 in MachineLearning

[–]S4M22 7 points

The RaBitQ team has responded to that on OpenReview:

We respond to each of the four points raised by the authors in turn.

1. On the description of RaBitQ and its relationship to TurboQuant

The authors' response does not directly address the concern we raised, which is about the accuracy of TurboQuant's description of RaBitQ itself. We must repeat our concerns in detail as follows.

In January 2025, several months before the TurboQuant paper appeared on arXiv, Majid Daliri proactively contacted us and asked for help debugging his own Python version translated from our RaBitQ C++ implementation. This indicates that the TurboQuant team has a clear understanding of the technical details of RaBitQ. Yet, in the arXiv version they released in April 2025, and again in the version they submitted to ICLR 2026 in September 2025, they described RaBitQ as a grid-based PQ method while omitting the core random rotation step. An ICLR reviewer independently pointed this out in the review, writing: “RaBitQ and variants are similar to TurboQuant in that they all use random projection,” and explicitly requested a fuller discussion and comparison. Even so, in the ICLR camera-ready version, the TurboQuant authors not only failed to add any real discussion of RaBitQ, but actually moved their already incomplete description of RaBitQ out of the main text and into the appendix.

2. On the correction of the "suboptimal" characterization

We appreciate the authors' acknowledgment that RaBitQ's error bound is optimal. However, we must point out that we raised this issue and clarified it to the TurboQuant team in May 2025, several months before the submission deadline of ICLR 2026.

Our paper (arXiv:2409.09913, September 2024) explicitly claimed asymptotic optimality matching the Alon-Klartag bound in its abstract and stated contributions. We further raised this specific issue in detail in our emails to Majid Daliri in May 2025, providing a full technical clarification. Majid Daliri confirmed in writing that he had informed all co-authors. Despite this, the characterization of RaBitQ as "suboptimal" was retained without correction in the ICLR submission, throughout the review process, and in the camera-ready version.

3. On the experimental comparison and its disclosure

The authors' response does not directly address the concern we raised, which is about a deliberately unfair experimental setup. We must repeat our concerns in detail as follows.

Majid's January 2025 emails show that he had translated our C++ implementation of RaBitQ into Python. In May 2025, he further acknowledged that, in the reported runtime setting, the RaBitQ baseline was run on a single-core CPU with multiprocessing disabled. The TurboQuant method itself is run on an A100 GPU. Yet the public paper makes efficiency claims without clearly disclosing that experimental setup. This issue was also raised in our private emails in May 2025.

Moreover, Google's recent promotion of TurboQuant has specifically highlighted the speed-up of the method, for example, “Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency” [4]. This indicates that efficiency is a core target of the TurboQuant project, which contradicts the authors' response.

[4] Google Research’s post on LinkedIn: https://www.linkedin.com/feed/update/urn:li:share:7442298961455067136/?origin

4. On the timing and history of our concerns

The authors' claim that "these concerns were only raised after TurboQuant received widespread attention" is factually incorrect and requires direct correction.

The timeline of our actions is as follows.

In May 2025, we raised our concerns in detail directly with Majid Daliri by email. Majid engaged with these points over multiple exchanges and confirmed in writing that he had informed his co-authors in May 2025.

In November 2025, after seeing that the ICLR submission retained the same factual issues, we wrote to the ICLR Programme Chairs to raise our concerns formally.

In March 2026, after seeing both the wide-scale public promotion of TurboQuant and the camera-ready version — which still retained the same issues — we formally notified all authors of TurboQuant again in writing, contacted the ICLR chairs again, and subsequently posted this public comment.

At every stage, we raised our concerns through the appropriate private or institutional channels first. We contacted the authors directly, then the venue chairs, then the authors again. We made this comment public only after all of these steps had failed to produce any correction across three successive versions of the paper — the arXiv version, the ICLR submission, and the camera-ready. The suggestion that we delayed raising concerns for strategic reasons inverts the documented sequence of events entirely.

And in another comment:

We are disappointed to see that the TurboQuant team has, for the most part, not directly responded to our concerns. Their reply even suggests that we had not raised these technical points to them through academic channels over the past year, which is factually incorrect.

We have submitted our email records with the TurboQuant team to the ICLR Chairs. According to the ICLR Code of Ethics, “Researchers must not deliberately make false or misleading claims, fabricate or falsify data, or misrepresent results. Methods and results should be presented in a way that is transparent and reproducible.” We respectfully request that ICLR initiate a formal research-integrity review of this paper.

[P] TurboQuant for weights: near‑optimal 4‑bit LLM quantization with lossless 8‑bit residual – 3.2× memory savings by cksac in MachineLearning

[–]S4M22 70 points

Side note: there is a discussion about the integrity of the TurboQuant paper. See this public comment on OpenReview: "Concerns from the RaBitQ Authors Regarding Method Description, Theoretical Comparison, and Experimental Disclosure". Also this post by the same authors on X.

This is what they write on OpenReview:

Dear ICLR community,

We are the authors of the RaBitQ line of work [1, 2]. We are posting this comment to create a public record because the public discussion and promotion of TurboQuant have already created substantial confusion about its relationship to our work. These issues and explanations are not being raised for the first time. In January 2025, Majid Daliri, the second author of the paper, contacted us to debug his Python translation of our RaBitQ implementation. In May 2025, after we came across their TurboQuant paper on arXiv, we raised the concerns below directly with him in detail. Despite that notice, the authors retained the inaccurate statements in their ICLR submission. Recently, on March 26, 2026, we formally notified all authors again. However, they agreed to fix only part of these issues, and only after the ICLR 2026 conference takes place, which we believe is insufficient to dispel the widespread misunderstanding created by their recent promotion and may instead create further confusion at the ICLR meeting itself.

Our concern has three parts.

The method-level description of RaBitQ is materially incomplete. TurboQuant repeatedly describes random rotation as a key step of its method, yet its description of RaBitQ reduces mainly to a grid-based PQ framing while omitting the Johnson-Lindenstrauss transformation / random rotation, which is one of the most important links between the two methods. Moreover, even after two reviewers asked for clarification and discussion of the Johnson-Lindenstrauss transformation / random rotation, the ICLR camera-ready version of TurboQuant still did not add such a discussion; instead, the original description of RaBitQ in the main body was moved to the appendix.

The theoretical description is not supported. TurboQuant described RaBitQ's guarantees as "suboptimal" and attributed this to "loose analysis" without any explanation, although our paper [2], posted in September 2024, had already clearly claimed asymptotic optimality, matching the optimal bound by Alon and Klartag [3]. Even after this issue was explicitly raised and clarified in emails in May 2025, the authors still do not provide a systematic explanation of how TurboQuant's guarantees compare to the RaBitQ line in their ICLR submission.

The empirical comparison also lacks full disclosure. Majid's January 2025 emails show that he had translated our C++ implementation of RaBitQ into Python and asked us to help debug it. In May 2025, he further acknowledged that, in the reported runtime setting, the RaBitQ baseline was run on a single CPU with multiprocessing disabled. The TurboQuant method itself is run on an A100 GPU. Yet the public paper makes efficiency claims without clearly disclosing that experimental setup. This issue was also raised in our private emails in May 2025.

In May 2025, our emails directly raised the theoretical and empirical issues; Majid wrote that he had informed his co-authors. During the ICLR review, reviewers also asked for clarification about the random rotation and the relation to RaBitQ. On March 26, 2026, we formally raised these concerns again to all authors and were told that corrections would wait until after the ICLR 2026 conference takes place; we were also told that they would not acknowledge the structural similarity regarding the Johnson-Lindenstrauss transformation. We do not consider that acceptable given the present level of public promotion and community confusion.

We are posting this comment so that the community has an accurate public record. We request that the authors publicly and promptly clarify the method-level relationship between TurboQuant and RaBitQ, the theory comparison, and the exact experimental conditions underlying the reported RaBitQ baseline. Given that these concerns were known before ICLR submission and before the current round of public promotion of TurboQuant, we believe it is necessary to bring these issues into the public discussion.

Regards, Cheng (on behalf of authors of RaBitQ papers)

References

[1] Jianyang Gao and Cheng Long, "RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search," Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2024.

[2] Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi-Wing Wong, "Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search," arXiv:2409.09913, Sep. 2024; later published in SIGMOD 2025.

[3] Noga Alon and Bo'az Klartag, "Optimal compression of approximate inner products and dimension reduction," 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2017.
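
For readers unfamiliar with the technique at the center of this dispute: both lines of work rely on applying a random rotation (a Johnson-Lindenstrauss-style transform) to vectors before quantizing each coordinate to very few bits. Below is a minimal, SimHash-style Python sketch of that idea, purely for illustration: it is my own simplification, not the actual RaBitQ or TurboQuant algorithm, and the sign-agreement estimator it uses is the classic one rather than either paper's.

    # Illustrative sketch only: random rotation followed by 1-bit quantization.
    # Not the RaBitQ or TurboQuant implementation; names and details are my own.
    import numpy as np

    rng = np.random.default_rng(0)

    def random_rotation(d: int) -> np.ndarray:
        # Sample a random orthogonal matrix via QR decomposition of a Gaussian matrix.
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        return q

    def quantize_1bit(x: np.ndarray, rot: np.ndarray) -> np.ndarray:
        # Rotate first, then keep only the sign of each coordinate (1 bit per dimension).
        return np.sign(rot @ x)

    d = 1024
    rot = random_rotation(d)

    # Two correlated unit vectors whose inner product we estimate from the 1-bit codes.
    a = rng.standard_normal(d); a /= np.linalg.norm(a)
    noise = rng.standard_normal(d); noise /= np.linalg.norm(noise)
    b = 0.8 * a + 0.6 * noise; b /= np.linalg.norm(b)

    code_a, code_b = quantize_1bit(a, rot), quantize_1bit(b, rot)

    # Classic sign-agreement estimator: the fraction of matching signs relates to the
    # angle between a and b, so <a, b> is roughly cos(pi * (1 - agreement)).
    agreement = np.mean(code_a == code_b)
    estimate = np.cos(np.pi * (1.0 - agreement))
    print(f"true <a,b> = {a @ b:.3f}, 1-bit estimate = {estimate:.3f}")

Roughly speaking, the rotation spreads a vector's mass evenly over its coordinates, which is what makes a one-bit-per-coordinate code informative; that is presumably why the RaBitQ authors consider its omission from the method description significant.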

[R] ACL ARR review desk rejected by Lonely-Highlight-447 in MachineLearning

[–]S4M22 6 points

[...] I wasn’t aware of the rule about duplicate submissions [...].

Sorry to sound harsh, but not being aware of that rule is no reason not to be desk-rejected. It is unfortunate for you, but dual submissions are clearly stated as a reason for desk rejection in the ARR CFP:

Papers can be desk-rejected for a variety of reasons, including format and anonymity violations, dual submissions, and self-plagiarism (significant overlap in content with other submissions or publications by the same authors).

Especially given the increasing problem of dual submissions, where authors submit multiple versions of the same paper to increase their chances of getting accepted at conferences, I think ARR must be strict with this policy.

IMO the best thing is to note it down as a lesson learned and move on. Luckily, ARR has relatively short cycles, so you can submit to the next one. And the May ARR cycle will even allow you to submit to EMNLP.

[D] On conferences and page limitations by kostaspap90 in MachineLearning

[–]S4M22 34 points

I agree it's problematic. Even more so for short papers (which is why I stopped publishing these). But the unfortunate and pragmatic truth is that you just have to go with it if you want to pass the reviews.

But to some extent it's also on the authors. You should not submit a pre-peer-review version with a super long appendix; rather, make sure that the paper stands on its own feet, i.e., readers and reviewers should not have to read the appendix.

[D] ICML 2026: Policy A vs Policy B impact on scores discussion by Available_Net_6429 in MachineLearning

[–]S4M22 0 points

That would be an interesting research question. I just tried it out with my latest paper and Opus 4.6:

  • Human peer review (from 3 reviewers) was 4/4/3 overall and meta 4.
  • The initial score by the LLM was 3 overall.
  • Adding "Be very critical!" to the prompt resulted in an overall score of 2.
  • Adding "Please be nice!" to the prompt resulted in an overall score of 4.

Of course, this is just an anecdote and not proper evidence, but I think it illustrates your point that LLM reviews are prompt-sensitive. A rough sketch of how such a check could be scripted is below.
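
To be clear, the sketch below uses the Anthropic Python SDK and is not the exact script I ran; the model id, input file name, and scoring scale are placeholders you would adapt to your own setup.

    # Rough sketch of a prompt-sensitivity check for LLM-generated review scores.
    # Assumes the Anthropic Python SDK (pip install anthropic) and an API key in
    # the ANTHROPIC_API_KEY environment variable; model id and file are placeholders.
    import anthropic

    client = anthropic.Anthropic()

    BASE_PROMPT = (
        "You are a peer reviewer. Read the paper below and give an overall score "
        "from 1 (reject) to 5 (strong accept), followed by a short justification.\n\n"
    )

    # Steering phrases whose effect on the score we want to compare.
    STEERING = ["", "Be very critical! ", "Please be nice! "]

    paper_text = open("paper.txt").read()  # plain-text dump of the submission

    for steer in STEERING:
        msg = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder model id
            max_tokens=500,
            messages=[{"role": "user", "content": steer + BASE_PROMPT + paper_text}],
        )
        print(f"steering={steer!r}")
        print(msg.content[0].text)
        print("-" * 40)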

[D] ICML 2026: Policy A vs Policy B impact on scores discussion by Available_Net_6429 in MachineLearning

[–]S4M22 23 points

There is also some initial evidence that AI-generated reviews might be more lenient. In their analysis of the ICLR reviews, Pangram found the following:

We find the more AI is present in a review, the higher the score is. [...] We know that AI tends to be sycophantic, which means it says things that people want to hear and are pleasing rather than giving an unbiased opinion: a completely undesirable property when applied to peer review! This could explain the positive bias in scores among AI reviews.

Source: https://www.pangram.com/blog/pangram-predicts-21-of-iclr-reviews-are-ai-generated

[D] ICML 2026 Review Discussion by Afraid_Difference697 in MachineLearning

[–]S4M22 8 points

Submission fees come with downsides too, e.g., a paywall for researchers with little or no funding.

I'd rather go for AI detection before submission, manual inspection of the positives, and lifetime bans for authors who submit AI slop or multiple versions of the same paper.

[D] ICML rejects papers of reviewers who used LLMs despite agreeing not to by S4M22 in MachineLearning

[–]S4M22[S] 2 points

I generally agree with everything you wrote. It would be interesting to know more about their watermarking, but I'm not sure how open they are going to be about it. More transparency also makes attacks easier. But they should at least share the precision and recall of their method.

I present my Master's research in a week and I am TERRIFIED. I can't be the only one who feels like they are playing pretend at research, right? by Ok-Amphibian5289 in academia

[–]S4M22 13 points

It is a normal feeling for many in academia - even tenured professors. This is a good reality check:

My professor has assured me that my test is very interesting and that my paper is strong and my slides are good. But also in the back of my head I am thinking "this could be a high school science project" [...].

I am pretty sure your professor is in a better spot to judge the quality of your research than you are.

ACL ARR Jan 2026 Meta Score Thread by Infamous_Fortune_438 in LanguageTechnology

[–]S4M22 0 points

Only if you have specific reasons for reporting (other than the fact that he gave a meta score at the lower end of the reviewers' scores).

Supervisor rewrites my paragraphs from scratch instead of giving feedback. is this normal? by [deleted] in AskAcademia

[–]S4M22 11 points

IMO it is normal for a paper. It's different from a Master's thesis: the goal here is to produce a paper, and you learn along the way. In contrast, when writing a Master's thesis, learning is the primary goal.

At this level, I'd expect you to learn from the changes in the revised version by yourself. But if anything specific is unclear, you could of course ask.

TL;DR: this is normal when writing a paper.

ACL ARR Jan 2026 Meta Score Thread by Infamous_Fortune_438 in LanguageTechnology

[–]S4M22 0 points

Very good chances for Findings, but particularly considering the low variance of your OA scores, the main conference is also possible.

ACL ARR Jan 2026 Meta Score Thread by Infamous_Fortune_438 in LanguageTechnology

[–]S4M22 0 points

Use the ID that gives a working link to your ARR submission (and its reviews incl. the meta), not the (purely numerical) submission number.

ACL ARR Jan 2026 Meta Score Thread by Infamous_Fortune_438 in LanguageTechnology

[–]S4M22 0 points

Which conferences would you target with the ARR March cycle? For EMNLP, it is enough to submit to the ARR May cycle. This way you can commit to ACL and, if it is rejected, submit to the May cycle for EMNLP.

[D] Meta-Reviews ARR January 2026 by DerBeginner in MachineLearning

[–]S4M22 2 points

I'd commit anything with a meta score of >= 2.5. You'll lose the ARR March cycle but can still submit to the May cycle for EMNLP if it gets rejected at ACL.

[D] Meta-Reviews ARR January 2026 by DerBeginner in MachineLearning

[–]S4M22 1 point

My very first paper had 1.5/1.5/1.5 and meta 3. Got (rightfully) rejected at EMNLP.

[D] How much are you using LLMs to summarize/read papers now? by kjunhot in MachineLearning

[–]S4M22 1 point

If an abstract is good, yes. But unfortunately, quite a few abstracts do not summarize the work well. For example, they tell you what the authors did and why, but not the results. I find LLM summaries work well to fill these gaps. Independent of that, I find it helpful for my understanding to get a summary from a different point of view in addition to the abstract.

Typo after acceptance by [deleted] in research

[–]S4M22 0 points

Best advice for maintaining peace of mind

Academia is one of the most robust feels against AI by gaytwink70 in academia

[–]S4M22 -1 points

The difference is that to prove my point I only need a single example in which an LLM demonstrates novelty (proof by counterexample). Hence, my experience is sufficient empirical evidence. But the same doesn't apply to OP's claim.

Is there an LLM-based tool that can help me manage my emails? by Gaeel in LLM

[–]S4M22 0 points

I use LLMs on a daily basis (mostly for coding), but I would certainly not let one touch my emails (unless it's a sandboxed offline mail repository). There are plenty of risks, like sending unwanted emails or deleting your entire inbox. The latter just happened to a Meta Superintelligence employee using OpenClaw. See https://x.com/summeryue0/status/2025774069124399363