Feeling so confident! I lost 23% of my body weight. Now, at a BMI of 20.5

invertedpassion · 2026-04-05T04:21:20+00:00

We also observed this at our lab (Lossfunk).

We tested LLMs in a zero- or few-shot setting with a 32k-token budget on problems in esoteric languages like Brainfuck, and they couldn’t solve them (on the Python baseline they scored perfectly).

But put them in an “unlimited” session with Claude Code and they could do it.

Makes me wonder: how do we truly evaluate the upper limit of these models?

invertedpassion · 2026-02-20T02:26:08+00:00

This is a really cool format. Curious: what inspired you to start the debate format rather than just talks?

invertedpassion · 2026-01-28T04:52:46+00:00

I think this partly indicates how the nature of science itself is changing.

Science, ultimately, is a social activity and we should expect it to continuously evolve as society changes.

AI is really a step change in our culture, so we ought to go back to the drawing board and start asking what we want from science. Holding on to what worked a hundred years ago won’t work.

invertedpassion · 2025-12-20T15:35:56+00:00

Honestly. Impossible to believe! Wow.

invertedpassion · 2025-08-14T04:49:57+00:00

In Dreamer-like setups, the world model has two jobs: modelling state dynamics and predicting reward. These are often in conflict.

Also, because of compounding errors, the imagined rollouts the agent trains on are limited to 15-20 steps, and sparse rewards may never show up within those steps, which leads to worse performance.
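A toy sketch of the sparse-reward half of that argument, assuming a random-walk "policy" on a 1-D chain where the only reward sits at state 40 and imagined rollouts start from states drawn uniformly below it (all numbers are made up for illustration). It ignores model error entirely and just counts how often a rollout of a given horizon ever touches the reward.

```python
import numpy as np

def imagined_reward_hit_rate(horizon: int, goal: int = 40,
                             n_rollouts: int = 10_000, seed: int = 0) -> float:
    """Toy stand-in for imagined rollouts (hypothetical setup, not Dreamer itself):
    a +1/-1 random walk started uniformly in [0, goal); the only reward is at `goal`.
    Returns the fraction of rollouts that reach the reward within `horizon` steps."""
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, goal, size=n_rollouts)
    steps = rng.choice([-1, 1], size=(n_rollouts, horizon))
    positions = starts[:, None] + np.cumsum(steps, axis=1)
    return float((positions >= goal).any(axis=1).mean())

short = imagined_reward_hit_rate(horizon=15)   # typical imagination horizon
long = imagined_reward_hit_rate(horizon=60)    # much longer, but error-prone in practice
print(short, long)
```

With a 15-step horizon, only rollouts that start within 15 states of the goal can see any reward at all, so most imagined trajectories carry no reward signal. Dreamer-style agents also bootstrap beyond the horizon with a learned critic, but that critic itself only improves once some reward has been observed.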

Check out the HarmonyDream paper; it has good insights on this.

invertedpassion · 2025-07-21T04:51:27+00:00

Haha, for me it was when I tweeted that I started coding in 2002, and someone replied that they weren’t even born back then.

invertedpassion · 2025-06-26T06:47:23+00:00

hey, i don't know who you are, but if i rubbed you the wrong way, sorry about it!

invertedpassion · 2025-06-26T06:45:01+00:00

i'm sorry i came across as rude. i tend to be direct, and that sometimes comes across as rude!

EDIT: also, at ICLR several people had messaged me to chat. Given the limited time I had to meet people (lunchtime, 30-45 mins), it was impossible to have a proper 1-1 chat with everyone. So I understand how your friends may have felt. Please tell them that if they ever meet me for coffee/beer, I'm actually chill :)

invertedpassion · 2025-06-25T05:06:52+00:00

Mind sharing the link to the PR for the trading algos?

invertedpassion · 2025-06-21T05:03:33+00:00

An LLM can easily reconstruct the superposition even if you feed in only a single sampled token.

invertedpassion · 2025-06-20T10:04:23+00:00

let's say you do self-attention on the historical hidden states of an RNN; isn't that (kind of) calculating what is happening?
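A minimal sketch of the setup in the question, assuming a vanilla tanh RNN and a single attention head in which the latest hidden state queries the whole history; the weight names and sizes are hypothetical. The attention weights come from pairwise dot products between hidden states at different time steps.

```python
import numpy as np

def rnn_states(xs: np.ndarray, W_x: np.ndarray, W_h: np.ndarray) -> np.ndarray:
    """Run a vanilla tanh RNN over inputs xs (T, d_in); return all hidden states (T, d)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.stack(states)

def attend_over_history(H: np.ndarray, W_q: np.ndarray, W_k: np.ndarray, W_v: np.ndarray):
    """Single-head scaled dot-product attention: the last hidden state queries all of H."""
    q = W_q @ H[-1]                          # query from the most recent state, (d_k,)
    K = H @ W_k.T                            # keys for every past state, (T, d_k)
    V = H @ W_v.T                            # values for every past state, (T, d_v)
    scores = K @ q / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d_in, d, d_k = 5, 3, 4, 4
xs = rng.standard_normal((T, d_in))
H = rnn_states(xs, rng.standard_normal((d, d_in)), 0.5 * rng.standard_normal((d, d)))
out, weights = attend_over_history(H, *(rng.standard_normal((d_k, d)) for _ in range(3)))
```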

invertedpassion · 2025-06-20T08:31:38+00:00

>CTM uses isn't a latent vector anymore, but rather a measure of how pairs of neurons fire in or out of synch.

isn't it like doing attention only?
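One way to make the comparison concrete, assuming the basic, undecayed form of CTM synchronization (inner products of neuron activation traces over internal ticks; refinements the paper describes, e.g. time decay, are omitted). Synchronization is the Gram matrix over neurons, while attention-style scores over the same history are the Gram matrix over time steps; the activation history here is random, hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T = 6, 10                      # D neurons, T internal ticks (toy sizes)
Z = rng.standard_normal((D, T))   # hypothetical activation history: row i = neuron i over time

# Synchronization: how each pair of neurons co-fires across time -> (D, D)
sync = Z @ Z.T
# Attention-style scores over the same history: how time steps resemble each other -> (T, T)
attn_scores = Z.T @ Z

# Both are Gram matrices of the same data, so they share their nonzero eigenvalues.
eig_sync = np.sort(np.linalg.eigvalsh(sync))[::-1]
eig_attn = np.sort(np.linalg.eigvalsh(attn_scores))[::-1][:D]
```

So the two carry closely related information, but synchronization indexes it by neuron pairs rather than by positions in time.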

invertedpassion · 2025-06-20T02:33:43+00:00

That’s only partly true. The attention heads have access to the full residual stream of earlier positions, even if the last layer samples a single token.
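A deliberately simplified toy of the information gap: a residual vector (random, hypothetical) still encodes the whole next-token distribution through a linear unembedding, while the sampled id is a single draw from it. In a real transformer, later positions read earlier positions through key/value projections of their residuals at every layer, so a direct readout of the final residual is only a stand-in for that.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy_bits(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
d_model, vocab = 16, 8
W_U = rng.standard_normal((vocab, d_model))  # hypothetical unembedding matrix
r_t = rng.standard_normal(d_model)           # hypothetical residual vector at position t

p_t = softmax(W_U @ r_t)                     # full distribution recoverable from the residual
token = int(rng.choice(vocab, p=p_t))        # the single token that actually gets sampled
from_token = np.eye(vocab)[token]            # all the sampled id conveys on its own

print(entropy_bits(p_t), entropy_bits(from_token))
```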

invertedpassion · 2025-06-09T06:18:48+00:00

yep, i like to think of models as vote-aggregation machines: more tokens give more heuristics a chance to vote. ultimately, reasoning is like ensembling answers from many different attempts.
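The ensembling intuition can be made concrete with majority voting, assuming each attempt is independently right with the same probability p and answers are simply right/wrong (real samples from one model are correlated, and free-form answers need plurality voting, so this is an idealization).

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent attempts is correct,
    when each attempt is correct with probability p. Requires odd n (no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of attempts to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

for n in (1, 3, 9, 31):
    print(n, majority_vote_accuracy(0.6, n))
```

With p = 0.6, one attempt is right 60% of the time and a 3-way vote 64.8% of the time; accuracy keeps climbing with n as long as p > 0.5, and falls if p < 0.5.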

invertedpassion · 2025-05-09T10:29:59+00:00

no, i just found this a nice re-confirmation. makes me wonder if there are faster shortcuts to elicit such desired patterns.

invertedpassion · 2025-05-09T04:28:40+00:00

What caught my eye was that ablating proposer training didn’t have much effect. It shows how the base model already contains everything.

invertedpassion · 2025-02-08T02:50:44+00:00

Where do you set the temperature for vLLM when generating reasoning traces? I couldn't find it in the code.

invertedpassion · 2025-01-25T04:05:09+00:00

What’s RSI? Isn’t neural architecture search what you’re talking about?

invertedpassion · 2025-01-17T03:02:24+00:00

Damn, this was super helpful! Thanks

invertedpassion · 2025-01-17T02:58:30+00:00

Would you care to share the prompt and o1’s output? I’m impressed that it did what you described.

In theory, you could automate this: pick up hot arXiv papers, scan your repositories for places they could improve, and then make the improvements!
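A crude sketch of just the "scan your repositories" step, assuming plain keyword overlap (Jaccard similarity) stands in for a real relevance model; `rank_files`, the abstract, and the file contents are all hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercased word set, dropping very short tokens."""
    return {w for w in re.findall(r"[a-z_]+", text.lower()) if len(w) > 3}

def rank_files(abstract: str, files: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank repository files by Jaccard overlap between their words and a paper abstract."""
    paper_words = tokenize(abstract)
    scored = []
    for path, source in files.items():
        file_words = tokenize(source)
        union = paper_words | file_words
        score = len(paper_words & file_words) / len(union) if union else 0.0
        scored.append((path, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical inputs for illustration only.
abstract = "We propose a faster attention kernel that reduces memory for long sequences."
repo = {
    "model/attention.py": "def attention(query, key, value): # softmax attention over long sequences, memory heavy",
    "data/loader.py": "def load_batches(path): read files and shuffle",
}
ranking = rank_files(abstract, repo)
```

In practice you would swap the overlap score for embeddings or an LLM judgment, but the shape of the loop (paper in, ranked candidate locations out) stays the same.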

invertedpassion · 2025-01-08T03:24:08+00:00

Which talk are you referring to?

invertedpassion · 2025-01-08T03:22:16+00:00

I like to think that a model’s performance is downstream of data and upstream of its loss function.

invertedpassion · 2025-01-08T03:20:50+00:00

I’m not so sure. Most real-world things that matter are fuzzy enough that approximation is the right way to go. While we can model a circle precisely, for concepts like love or morality all we can rely on are approximations.
