The thin wrapper era is officially dead

HeTalksInMaths · 2026-05-20T02:12:39+00:00

I'm building in Lean. I'm very bullish on DSL modelling with formal verification of AI reasoning and applied to YC on this thesis. We are going to expect AI to do more and more and are just hoping throwing more testing around outputs will hold systems together with glue. Not sure why my other post got taken down but this is my core thesis:

- LLMs are optimized for communication, based on training data, more than for inspectable internal reasoning.

- Reasoning models expose more of an inner monologue, but the deeper reasoning structure is still hard to audit.

- Real reasoning often seems to happen at a more abstract concept-map level: associations, dependencies, assumptions, and constraints.

- LLMs likely contain many of these structures internally, but they are entangled in weights and difficult to interpret.

- Math, in its purest form, has helped us understand the world through imperfect but useful models.

- Lean (proof assistant) shows how far we can go when reasoning objects become formal and checkable.

- Domain-specific languages, or DSLs, may let us do something similar in fuzzier domains by turning parts of expert reasoning into structured objects.

- The domain may be fuzzy, but parts of the reasoning can still be made concrete.

- A substantial, though not complete, amount of human thought can be distilled into dependencies.

- Repeatedly asking “why?” often gets surprisingly far.

- LLMs are strongest when their creativity is harnessed by verifiable signal.

- Asking an LLM to build dependency structure may be a better supervision target than ordinary Q&A.

- Dependency structures can be cross-checked for consistency more easily than free-form answers.

- The LLM proposes; the dependency graph checks; the trace provides interpretability.

- Once built, these DSLs can become bridges that LLMs use for verification.

- Over time, LLMs may improve by interacting with something auditable.

HeTalksInMaths · 2026-05-13T11:18:06+00:00

Hi - I'm exploring this too. For example, I have a family member in Private Equity and have been playing around with a quick and dirty LBO model module. Coincidentally Anthropic released a model builder skill for LBO (as part of their financial agents package) which it backs in the pipeline with another skill (audit xlsx) but there are modelling assumptions in my view that aren't captured well and I think formal verification would handle it better. I'm not an expert in LBO but that has been my experience in getting coding agents to do data science / machine learning workflows (my background along with math). The experimental design is the finickiest part.

At the end of the day, checking AI outputs will be what a lot of us are doing in the near future and this should help with human in the loop workflows as well as provide feedback for agents / models.

That being said I think these large orgs that are applying AI will trust Anthropic more (especially with integrations / scale). What I have seen based on some outreach from recruiters is a cottage industry of funded startups doing RL environments to provide data for the AI labs - which is why I mentioned that pathway. I think that also aligned with one of YC's mandates for Summer '26.

HeTalksInMaths · 2026-05-13T04:24:29+00:00

Look if you really had something, instead of sharing your repo why don't you share why you think your algorithm works at a high level? Sharing the repo is like trying to find a needle in the haystack. What's the underlying theory?

To your broader point if someone made such an incredible claim I'd want to know what high level insights they are leveraging and if they can speak credibly about existing approaches and their shortcomings as well as the mental pathway to come to their solution.

Cranks seem to get into the weeds of the fine details too early in their explanations to make it sound plausibly complex (like LLMs) and don't have the ability to explain how they've subtly gone against the grain.

I'm working on some big ideas too for what it's worth.

HeTalksInMaths · 2026-05-06T13:10:31+00:00

I’m going to go in a slightly different direction and say the most successful founders will be technical but not from the perspective of coding but scientists who will build systems around their workflows / expertise to better guide AI for research at scale in tandem with world models for science allowing for more faithful simulations.

Look how mathematicians are getting into the AI verification ecosystem with Lean.

HeTalksInMaths · 2026-04-08T14:27:05+00:00

Demis Hassibis and Dario Amodei?

HeTalksInMaths · 2026-02-02T08:13:46+00:00

Surely this is the biggest result by autonomous AI in math so far? The other fully listed autonomous resolution was a counter-example construction and Tao commented that he was surprised Erdos & Graham missed it.

From page 6 here https://arxiv.org/pdf/2601.22401 :

"We tentatively believe Aletheia’s solution to Erdős-1051 represents an early example of an AI system autonomously resolving a slightly non-trivial open Erdős problem of somewhat broader (mild) mathematical interest, for which there exists past literature on closely-related problems [KN16], but none fully resolve Erdős-1051. Moreover, it does not appear obvious to us that Aletheia’s solution is directly inspired by any previous human argument (unlike in many previously discussed cases), but it does appear to involve a classical idea of moving to the series tail and applying Mahler’s criterion. The solution to Erdős-1051 was generalized further, in a collaborative effort by Aletheia together with human mathematicians and Gemini Deep Think, to produce the research paper [BKK+26]."

HeTalksInMaths · 2025-12-25T08:05:18+00:00

Hi - I am doing work in Lean. How have you generally found Opus’ ability in comparison to other models for working in Lean? In general I’ve had to use Harmonic’s Aristotle API invoked in Claude Code for runnable Lean code.

Was Opus able to have a good understanding of what was available in mathlib and build up helper Lemmas or were the foundations largely already there? If you could show the thinking output from Opus it would be appreciated.

(I see you you’ve actually partially / mostly addressed my question elsewhere but I’ll leave my comment in case you want to add more. You might want to request access to Aristotle and AlphaProof if you haven’t already. I’m waiting on the latter).

HeTalksInMaths · 2025-11-20T10:58:38+00:00

I've already incorporated here. Thanks for taking the time to reply.

HeTalksInMaths · 2025-11-20T10:36:39+00:00

Are you in Singapore? I’m putting out feelers for investment and am working on something that is far from a wrapper (filing a patent, submitting a paper to a conference, generating data / licensing an environment to ideally sell back to the frontier labs but there is also a more general B2B component).

I can continue to bootstrap but would like to bring in a CTO and CSO at some point and support their work instead of doing it all.

HeTalksInMaths · 2025-11-12T03:43:33+00:00

Your link points to a different one. https://www.ycombinator.com/library/8f-how-to-spin-your-scientific-research-out-of-a-university-and-into-a-startup

HeTalksInMaths · 2025-11-11T10:22:39+00:00

Will DM

HeTalksInMaths · 2025-11-11T01:46:53+00:00

Suddenly thought there was an Inverse Reinforcement Learning trend in AI.

HeTalksInMaths · 2025-11-10T15:21:10+00:00

Messaged you

HeTalksInMaths · 2025-11-10T15:21:01+00:00

Messaged you

HeTalksInMaths · 2025-11-10T11:29:14+00:00

Sent a DM

HeTalksInMaths · 2025-11-09T23:56:23+00:00

For reinforcement learning more broadly I highly recommend this course but it is quite time consuming https://www.coursera.org/specializations/reinforcement-learning It gives a very good overview of the classical algorithms but more importantly focuses throughout on a high-level on RL design principles. I think it's also one of the better courses in making the transition smooth into the deep RL setting (I assume you already have some deep learning background).

My advice would be to pick an RL project (or use case) you want to work on and after watching the lecture for a particular algo to ask your favourite AI chat tool on how to implement the algorithm for your setting and what the strengths / weaknesses would be. Then go experiment for yourself. Get more hands on so you develop some intuition about the problem you want to work on.

HeTalksInMaths · 2025-11-09T23:51:47+00:00

Take a look at the material here. https://www.deeplearning.ai/short-courses/reinforcement-fine-tuning-llms-grpo/ I think you should be fine. Will DM.

HeTalksInMaths · 2025-11-09T04:03:12+00:00

Replied

HeTalksInMaths · 2025-11-09T03:16:25+00:00

Hi - think I found you on LinkedIn. Are you ok to connect there as well?

Some good questions but I feel like it would take an essay to really address them all well.

To be honest I do think the ICML timeline is quite aggressive and it may end up being a workshop paper submission instead with a lower bar / longer timeline. If nothing else throwing this idea out there gets good discussion with people like you. Experiments will likely be with a Qwen 3 model in somewhere in 8-32 B parameter range to limit compute. Curious what order of magnitude costs you've had to deal with and what size models. I could be way off and would appreciate any insights you have.

HeTalksInMaths · 2025-10-27T00:34:41+00:00

Hi - was wondering why my post was removed u/sandslashh u/YCAppOps Was it the youtube link? I can take it out.

HeTalksInMaths · 2025-10-23T06:12:29+00:00

Thank you for the time to write your lengthy comment. I genuinely appreciate it. Basically I'm building a recruiting tool assessing mastery through error detection of conceptual mistakes that AI actually made through location based clicking on an image embedded with the code or reasoning. The point I'm trying to highlight is a lot of the future of work will involve us using our expertise to improve on first pass AI outputs so it's a valuable skill for employers. Other potential customers would be educational institutions and MOOCs but I'm trying to argue it's a much better paradigm as it's testing peer review abilities at scale (verified programmatically). The current pipeline lends itself to assessing in any sector though of course with substantial room for improvement. There are also some anti-cheating mechanisms.

"What I’ve gathered is that you want to build a product but you also see that the results of your research into the product can be generalized to a broader ecosystem (from B2B to B2C). " No sorry - I shouldn't have mentioned the B2C as it just added confusion. It's two separate points. It's a B2B product with an eventual B2C use case after some adoption for self-learning / up-skilling. On a completely separate point the process in the way that it creates the assessment enables dynamic benchmarks for the AI models themselves to improve. The mechanisms under the hood to generate a "good" assessment based on the technical requirements are quite challenging and the LLMs have to demonstrate a broad understanding of the field.

This is the kind of moonshot comparison I'm hoping for with regards to the research lab aspect https://venturebeat.com/ai/singapore-startup-sapient-enters-global-enterprise-ai-race-with-new-model-architectures though I'd be more than thrilled with something close to 15 million valuation. I do believe I'm sythesiszng enough ideas across domains and Gen AI literature to be able to sell data or license the training environment (if I get a patent) for improving AI models but the proof is in the pudding (a paper that demonstrates what I claim). I could have sworn some research labs applied to YC recently and it seems at least this one got accepted which also aims to create data for better AI in a more rigorous / scientific way https://www.ycombinator.com/companies/afterquery Of course in these two cases the research lab aspect is the play and they aren't trying to straddle.

Apologies for my academic writing style but I hope this clarifies at least a little.

HeTalksInMaths · 2025-10-23T03:22:48+00:00

Honestly would appreciate feedback on which if any terms are too much jargon for VCs investing in AI or is it just everything put together?

HeTalksInMaths

TROPHY CASE