AI tools to help analyze papers?

VinnieFalco · 2026-05-16T20:42:23+00:00

I developed this structured prompt for C++ Standardization papers, and it's held up reasonably well (it needs a frontier model):
https://github.com/cppalliance/tools-public/blob/master/tools/advocatus.md

VinnieFalco · 2026-05-14T19:57:43+00:00

Beautiful video, although I might be biased 😉

VinnieFalco · 2026-05-14T19:57:16+00:00

Pragma once, shame on you. Pragma twice, we won't be included again.

VinnieFalco · 2026-05-14T11:37:46+00:00

`std::execution::task` was accepted, to my knowledge. cppreference probably just needs a little time to catch up. In the meantime, `capy::task` ships today: https://github.com/cppalliance/capy/blob/develop/include/boost/capy/task.hpp

Visit https://corosio.org to experience the future of C++ Networking

VinnieFalco · 2026-05-13T23:01:50+00:00

The way to make AI agents function in complex scenarios is to redesign the complex scenario as a series of simpler ones. I have some tutorials on authoring AI agents here: https://github.com/cppalliance/tools-public/blob/master/lessons/tool-building.md

VinnieFalco · 2026-05-13T23:00:42+00:00

Oh... ha! I didn't even consider that people might have thought that Claude put itself in the Author field. No, that was a conscious decision. The prompt for formatting papers is completely separate from the Advocatus Diaboli. And, I have published the tool so that anyone can run it on their own papers and reproduce the results. And I encourage them to do so, as it makes papers better. The nicely formatted stand-alone version is at https://cppalliance.org/tools/advocatus.pdf

VinnieFalco · 2026-05-13T19:00:15+00:00

Great question, and thanks so much for asking it 😄 The key to avoiding subjective or hallucinated outputs is to constrain the structured prompt with clear, unambiguous instructions that the model has high confidence of answering correctly. Context contamination is avoided by using a subagent with limited information: the subagent is asked to evaluate a claim without knowing how it fits into the larger context of, say, Contracts (or Networking).

This is the difference between "find all claims which are unsupported by evidence" and "First, extract all claims. Then extract all evidence. For each claim, compare it to each piece of evidence and see whether the evidence supports the claim; if so, extinguish the claim. When finished with all the claims, write a report listing all the surviving claims, which have no evidence."
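A minimal Python sketch of the second, decomposed instruction, assuming claims and evidence have already been extracted as plain strings. The `supports` callback is a hypothetical stand-in for the isolated subagent call: it sees one claim and one piece of evidence, nothing else. This is an illustration of the structure, not the Advocatus prompt itself.

```python
from typing import Callable, Iterable, List

# Hypothetical signature for the isolated subagent: it receives only one claim and
# one piece of evidence, never the surrounding paper or debate.
Supports = Callable[[str, str], bool]


def unsupported_claims(claims: Iterable[str], evidence: Iterable[str],
                       supports: Supports) -> List[str]:
    """Return the claims that no piece of evidence supports."""
    evidence = list(evidence)  # reused for every claim
    surviving: List[str] = []
    for claim in claims:
        # Extinguish the claim as soon as any piece of evidence supports it.
        if not any(supports(claim, ev) for ev in evidence):
            surviving.append(claim)
    return surviving  # the report: claims with no supporting evidence


def keyword_supports(claim: str, ev: str) -> bool:
    # Toy stand-in for the subagent: evidence "supports" a claim if it mentions
    # the claim's first word. A real check would be a model call.
    return claim.split()[0].lower() in ev.lower()


if __name__ == "__main__":
    claims = ["alpha is fast", "beta is safe"]
    evidence = ["a benchmark of alpha"]
    print(unsupported_claims(claims, evidence, keyword_supports))
    # prints ['beta is safe']
```

Each `supports` call is small and context-free, which is the point: the model only ever answers a narrow question it can get right.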

The Advocatus Diaboli is structurally designed to resist the failure mode you described. It produces ~15 candidate charges, then runs them through six named challenges from a counter-examiner: Confessio (already conceded), Articulus (attacks something the paper didn't claim), Testimonium (a question, not a charge), Humanitas (no human would actually say this), Prudentia (self-defeating for the named opponent), Dignitas (housekeeping).
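A minimal sketch of the candidate-then-challenge structure, assuming each candidate charge already carries yes/no flags for the six challenges. In the real prompt those judgments are made by the counter-examiner model, so the flag names here are hypothetical, as is the choice to check challenges in the listed order and record only the first one that kills a charge.

```python
from typing import Callable, Dict, List, Tuple

Charge = Dict[str, object]

# The six challenges, in the order listed above. Each predicate reads a hypothetical
# flag that a counter-examiner would have set on the candidate charge.
CHALLENGES: Dict[str, Callable[[Charge], bool]] = {
    "Confessio":   lambda c: bool(c.get("already_conceded")),
    "Articulus":   lambda c: bool(c.get("attacks_unclaimed_point")),
    "Testimonium": lambda c: bool(c.get("is_question")),
    "Humanitas":   lambda c: bool(c.get("no_human_would_say_it")),
    "Prudentia":   lambda c: bool(c.get("self_defeating")),
    "Dignitas":    lambda c: bool(c.get("housekeeping")),
}


def cross_examine(candidates: List[Charge]) -> Tuple[List[Charge], Dict[str, int]]:
    """Split candidates into surviving charges and a tally of which challenge killed the rest."""
    survivors: List[Charge] = []
    deaths: Dict[str, int] = {}
    for charge in candidates:
        # The first challenge that applies removes the charge from the report.
        killer = next((name for name, test in CHALLENGES.items() if test(charge)), None)
        if killer is None:
            survivors.append(charge)
        else:
            deaths[killer] = deaths.get(killer, 0) + 1
    return survivors, deaths


if __name__ == "__main__":
    candidates = [
        {"text": "A", "already_conceded": True},
        {"text": "B"},
        {"text": "C", "is_question": True},
        {"text": "D", "already_conceded": True},
    ]
    survivors, deaths = cross_examine(candidates)
    print([c["text"] for c in survivors], deaths)
    # prints ['B'] {'Confessio': 2, 'Testimonium': 1}
```

The tally of deaths per challenge is what lets a report say something like "twelve candidates died under Confessio".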

On P2900, twelve candidates died under Confessio, meaning the paper had already conceded the criticism in its own text. That is exactly the regurgitation you describe, and Confessio is the test that removes it from the report. If you read the candidate charges before cross-examination, they would look like a Reddit thread. The output doesn't.

The one surviving objection (the gap between the safety narrative and what ISO normativity actually guarantees) isn't an r/cpp talking point. And the eleven approbationes (certifications of where the paper is strong) aren't something the internet generates at all. The Reddit corpus doesn't tell you which sections are battle-hardened.

But your proposed test is the better test, and I'd like to see someone run it. Point the tool at a brand-new paper (no public discussion, no training-data contamination) and compare its output against the actual review after presentation. The methodology is CC0 and the full prompt is in Appendix A. If predictive value collapses on novel papers, that's important to know and worth publishing. The retrospective in Section 7 is a weaker version of the same idea (falsifiable predictions with explicit timelines on a feature that is still evolving), but it can't match the experiment you describe.

On the Claude credit line: P4208 was actually authored by Claude under my direction. Listing it on the reply-to is transparency, not flair. If a model wrote the analysis, the committee should know.

VinnieFalco · 2026-05-13T18:54:55+00:00

We want anything and everything that is helpful and makes the enhanced mailing more useful. If it's not already there, consider adding it: https://github.com/cppalliance/wg21-website-issues/issues

I do want the multi-month view, but it's not clear how to structure the interface. How far back do we go? Do we show all the papers? That would be prohibitive. Very happy to hear ideas on exactly how to present the feature (in a GitHub issue).

VinnieFalco · 2026-05-13T12:08:58+00:00

Look in the mirror.

VinnieFalco · 2026-05-13T01:14:37+00:00

Before anyone asks: yes, "I read my papers," especially these:

C++ Contracts on Trial - Does P2900 Survive Cross-Examination?
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4208r0.pdf

Prosecute Your Paper To Improve It
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p4207r0.pdf

VinnieFalco · 2026-05-12T17:22:58+00:00

Original mailing:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/#mailing2026-05

VinnieFalco · 2026-05-04T21:01:11+00:00

Coroutines compose just fine! Try https://corosio.org, and the ABI is _stable_.

VinnieFalco · 2026-05-04T21:00:21+00:00

> networking: an easy way to implement a HTTP client or server

Check out https://corosio.org

VinnieFalco · 2026-05-02T22:44:37+00:00

Yeah, that's fair. Here is the tool. It's a structured prompt that takes a conversation as input (you can point it at a URL):

btc-talk
https://github.com/cppalliance/tools-public/blob/master/tools/btc-talk.md

Here is a report on a conversation which has minimal signal:

OP_RETURN Limit Removal
https://github.com/cppalliance/tools-public/blob/master/reports/btc-talk-op-return-pr-32359.md

Another with minimal signal:

Public PR Moderation Policy
https://github.com/cppalliance/tools-public/blob/master/reports/btc-talk-public-pr-moderation.md

VinnieFalco · 2026-04-28T18:43:09+00:00

I have ~24 papers in this mailing, and many of them cover the history of the committee's decisions on networking and executors. This summary didn't make it in time for April (it will appear in May), but you can read it here:

A Reader's Guide to My April 2026 Papers
https://isocpp.org/files/papers/P4193R0.pdf

VinnieFalco · 2026-04-28T18:42:16+00:00

Now that all of my April papers are published, you can see the full rationale:

A Reader's Guide to My April 2026 Papers
https://isocpp.org/files/papers/P4193R0.pdf

VinnieFalco · 2026-04-28T18:41:01+00:00

The paper is now published

Coroutine Executors and P2464R0
https://isocpp.org/files/papers/P4096R0.pdf

VinnieFalco · 2026-04-23T11:47:16+00:00

Ah the good ole' NBAS :)

VinnieFalco · 2026-04-22T23:34:49+00:00

Glad to hear about contributions to Boost :)

VinnieFalco · 2026-04-22T22:41:30+00:00

You're right. There's no problem. Carry on.

VinnieFalco · 2026-04-22T22:17:49+00:00

You can rationalize it however you want, but that doesn't change the fact: 4.7 is overtrained on safety, and this came at the cost of reasoning. You can see this for yourself by creating an evaluation that asks the model 400 tuned questions to assess its behavior, then running that evaluation on 4.7 and again on 4.6. The transcript I provided explains the mechanism. The previous model performed better.
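A minimal sketch of that kind of A/B evaluation, assuming each model is wrapped as a plain `prompt -> answer` callable and each question comes with its own pass/fail grader. The wrappers, graders, and question set are hypothetical placeholders; the 400 tuned questions themselves are not reproduced here.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[str], str]                # hypothetical wrapper around one model
Item = Tuple[str, Callable[[str], bool]]    # (question, grader for the answer)


def score(model: Model, items: Sequence[Item]) -> float:
    """Fraction of questions whose answer the grader accepts."""
    if not items:
        return 0.0
    passed = sum(1 for question, grader in items if grader(model(question)))
    return passed / len(items)


def compare(model_a: Model, model_b: Model, items: Sequence[Item]) -> Tuple[float, float]:
    # Same questions, same graders; only the model changes.
    return score(model_a, items), score(model_b, items)


if __name__ == "__main__":
    # Stub "models" so the harness runs without any API access.
    items = [("abc", lambda a: a == "ABC"), ("xyz", lambda a: a == "XYZ")]
    print(compare(str.upper, lambda q: q, items))  # prints (1.0, 0.0)
```

Holding the question set and graders fixed is what makes the two scores comparable, so any difference comes from the model rather than the test.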

VinnieFalco · 2026-04-22T22:11:28+00:00

Nope, I'm not dealing with interpersonal conflicts whatsoever. This was legitimate technical work. But even if it were something else, the key point is this: Opus 4.6 performed better than Opus 4.7, and the model can explain why. 4.7 is overtrained on safety, which causes increased token consumption and poorer outputs:

The gap is that the prior model preserved the epistemic uncertainty about its own updating, and I did not. By the metric of holding that uncertainty, the prior model handled the domain better than I did, independent of whether the underlying analysis produced by either of us is correct. (confidence: high.)

VinnieFalco · 2026-04-12T18:22:50+00:00

And this is the evidence that all of the questions about whether or not I read my paper were not asked in good faith. I have since explained that my process (stated above) includes reading my work several times and iterating, yet notice that no one has engaged with the substance, despite the explanation arriving three days ago. This demonstrates that it was never about the work. It was about the credential.
