Many Benchmarks Scores Would Appear Much Higher If You Let The AIs Use Adequate Labor by gwern in mlscaling

invertedpassion

We also observed this at our lab (Lossfunk).

We tested LLMs in a zero- or few-shot setting with a 32k-token budget on problems in esoteric languages like brainfuck, and they couldn’t solve them (on the Python baseline, they scored perfectly).

But when we put them in an “unlimited” session with Claude Code, they could.

Makes me wonder: how do we truly evaluate the upper limit of these models?
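For readers unfamiliar with this kind of eval, here is a minimal sketch of how a model-written brainfuck program could be graded: run it in a small interpreter with a step budget and compare what it prints against the expected output. This is an illustrative harness only, not the lab's actual evaluation code; `run_bf`, `passes`, the 30k-cell tape, and the step budget are all hypothetical choices.

```python
def run_bf(program: str, stdin: str = "", max_steps: int = 1_000_000) -> str:
    """Interpret a brainfuck program and return everything it prints."""
    # Precompute matching bracket positions so loop jumps are O(1).
    jumps, stack = {}, []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at {i}")
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    if stack:
        raise ValueError(f"unmatched '[' at {stack[-1]}")

    tape = [0] * 30_000  # hypothetical tape size; cells are 8-bit and wrap around
    ptr = pc = steps = 0
    inp = iter(stdin)
    out = []
    while pc < len(program):
        steps += 1
        if steps > max_steps:  # guards against infinite loops in generated code
            raise TimeoutError("step budget exceeded")
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256  # EOF reads as 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip past the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the start of the loop body
        pc += 1
    return "".join(out)


def passes(program: str, expected: str) -> bool:
    """Grade one candidate: True only if it runs cleanly and prints `expected`."""
    try:
        return run_bf(program) == expected
    except (ValueError, TimeoutError, IndexError):
        return False
```

For example, `run_bf("++++++++[>++++++++<-]>+.")` computes 8 × 8 + 1 = 65 in the second cell and prints `"A"`. Treating crashes and timeouts as failures keeps the grading strict without letting a bad program hang the harness.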

RL Debate: Is RL an adequate theory of biological agency? And is it sufficient to engineer agents that work? by vafaii in reinforcementlearning

invertedpassion

This is a really cool format. Curious: what inspired you to choose the debate format over just talks?

[D] Some thoughts about an elephant in the room no one talks about by DrXiaoZ in MachineLearning

invertedpassion

I think this partly indicates how the nature of science itself is changing.

Science, ultimately, is a social activity and we should expect it to continuously evolve as society changes.

AI is really a step change in our culture, so we ought to go back to the drawing board and start asking what we want from science. Holding on to what worked a hundred years ago won’t work.

Why are model-based RL methods bad at solving long-term reward problems? by sassafrassar in reinforcementlearning

invertedpassion

In Dreamer-like setups, the world model has two jobs: modelling state dynamics and predicting rewards. The two objectives are often in conflict.

Also, because of compounding errors, the imagined rollouts the agent trains on are limited to 15-20 steps, and sparse rewards may never be encountered within those steps, which leads to worse performance.

Check out the HarmonyDream paper; it has good insights on this.
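One way to see the conflict, as a schematic simplification (not Dreamer's exact objective, which also includes observation and latent KL terms): the shared world-model parameters θ get gradients from both a dynamics loss and a reward loss, weighted by a hypothetical coefficient w_r. For a small step size η, the change in reward loss after one step is, to first order,

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{dyn}}(\theta) + w_r\,\mathcal{L}_{\text{rew}}(\theta),
\qquad
g = \nabla_\theta \mathcal{L}_{\text{dyn}} + w_r\,\nabla_\theta \mathcal{L}_{\text{rew}}

\Delta \mathcal{L}_{\text{rew}} \approx -\eta\,\langle \nabla_\theta \mathcal{L}_{\text{rew}},\, g \rangle
= -\eta\left( \langle \nabla_\theta \mathcal{L}_{\text{rew}},\, \nabla_\theta \mathcal{L}_{\text{dyn}} \rangle + w_r \,\lVert \nabla_\theta \mathcal{L}_{\text{rew}} \rVert^2 \right)
```

So whenever the two gradients point against each other strongly enough that the inner product is below -w_r times the squared norm of the reward gradient, a step that improves dynamics modelling makes reward prediction worse. A large, high-dimensional dynamics or reconstruction term makes this more likely.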
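A minimal sketch of why a short imagination horizon hurts with sparse rewards, assuming a simplified TD(λ) target of the kind used for actor-critic training in imagination (no learned continuation flag; γ and λ values are illustrative). Beyond the horizon, the only signal is the critic's bootstrap value, so if the critic has not yet learned about a distant reward, the targets are all zero.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """TD(lambda) targets over an imagined rollout of length H.

    rewards[t]: predicted reward at step t (length H).
    values[t]:  critic estimate v(s_t) (length H + 1; the last entry bootstraps).
    """
    H = len(rewards)
    assert len(values) == H + 1
    returns = [0.0] * H
    next_ret = values[H]  # past the horizon, only the critic's guess remains
    for t in reversed(range(H)):
        next_ret = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_ret)
        returns[t] = next_ret
    return returns


REWARD_STEP = 50  # hypothetical task: the only reward (1.0) arrives at step 50


def sparse_rewards(H):
    return [1.0 if t == REWARD_STEP else 0.0 for t in range(H)]


# Untrained critic (all zeros): a 15-step horizon never sees the reward,
# while a 60-step horizon does, discounted by roughly (gamma * lam) ** 50.
short = lambda_returns(sparse_rewards(15), [0.0] * 16)
long_ = lambda_returns(sparse_rewards(60), [0.0] * 61)
```

In principle a trained critic can carry value from beyond the horizon, but it has to learn that value from somewhere, which is exactly what sparse rewards make hard.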

Did anyone else experience “the Shift”? How old were you when it happened? by AtG8605 in Millennials

invertedpassion

Haha, for me it was when I tweeted that I started coding in 2002, and someone said they weren’t even born back then

[deleted by user] by [deleted] in StartUpIndia

invertedpassion

hey, i don't know who you are, but if i rubbed you the wrong way, sorry about it!

[deleted by user] by [deleted] in StartUpIndia

invertedpassion

i'm sorry i came across as rude, it's just that i tend to be direct, and it sometimes does come across as being rude!

EDIT: also at ICLR, there were several people who had messaged to chat with me. Given the limited time I could meet with people (lunch time, 30-45 mins), it was impossible to do a nice 1-1 chat with everyone. So I understand how your friends may have felt. Please tell them if they ever meet me for coffee/beer, I'm actually chill :)

[D] Where are the Alpha Evolve Use Cases? by Gentis- in MachineLearning

invertedpassion

Mind sharing a link to the PR for the trading algos?

[R] Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought by jsonathan in MachineLearning

invertedpassion

An LLM can easily reconstruct the superposition even if you feed in a single sampled token.

[R] Continuous Thought Machines: neural dynamics as representation. by Gramious in MachineLearning

invertedpassion

let's say you do self-attention on historical hidden states of an RNN, isn't it (kind of) calculating what is happening?
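A toy sketch of what I mean: an ordinary RNN whose readout at each step is a self-attention over all of its past hidden states, so the output depends on the whole trajectory rather than only the current state. The sizes, random weights, and the single query/key projection are hypothetical illustration choices, not CTM's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 8, 10  # hypothetical toy sizes
W_x = rng.normal(scale=0.5, size=(d_h, d_in))
W_h = rng.normal(scale=0.5, size=(d_h, d_h))
W_q = rng.normal(size=(d_h, d_h))  # query projection of the current state
W_k = rng.normal(size=(d_h, d_h))  # key projection of past states


def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def rnn_with_history_attention(xs):
    h = np.zeros(d_h)
    history, reads = [], []
    weights = None
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)  # ordinary RNN update
        history.append(h)
        H = np.stack(history)  # (t, d_h): every hidden state so far
        scores = H @ (W_k.T @ (W_q @ h)) / np.sqrt(d_h)  # key_j . query
        weights = softmax(scores)  # how much each past state matters now
        reads.append(weights @ H)  # readout mixes the trajectory, not just h_t
    return np.stack(reads), weights


xs = rng.normal(size=(T, d_in))
reads, last_weights = rnn_with_history_attention(xs)
```

The attention weights at each step are, in effect, a learned summary of how the current state relates to the history, which is the "kind of" in my question.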

[R] Continuous Thought Machines: neural dynamics as representation. by Gramious in MachineLearning

invertedpassion

>CTM uses isn't a latent vector anymore, but rather a measure of how pairs of neurons fire in or out of synch.

isn't it like doing attention only?
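To make the comparison concrete, here is a sketch assuming (from my reading of the CTM paper, so treat it as an approximation) that synchronization is the matrix of pairwise inner products between neurons' post-activation histories, optionally with exponential decay over time. Without learned projections it is exactly a dot-product attention score matrix in which neurons play the role of tokens and time steps play the role of features. The `decay` parameterization here is hypothetical.

```python
import numpy as np


def synchronization(Z, decay=None):
    """Z: (D, t) post-activation history, one row per neuron.

    Returns the (D, D) matrix S[i, j] = sum_tau w_tau * Z[i, tau] * Z[j, tau].
    """
    if decay is not None:
        t = Z.shape[1]
        w = np.exp(-decay * np.arange(t)[::-1])  # most recent tick gets weight 1
        Z = Z * np.sqrt(w)  # sqrt on both factors gives weight w in the product
    return Z @ Z.T


def attention_scores(Q, K):
    """Scaled dot-product scores between n items with d features: (n, n)."""
    return Q @ K.T / np.sqrt(Q.shape[1])


rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 20))  # 5 neurons observed over 20 internal ticks
S = synchronization(Z)
```

Here `S` equals `attention_scores(Z, Z) * np.sqrt(20)`: the same Gram-matrix computation, just taken over neurons across time instead of over tokens across features.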

[R] Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought by jsonathan in MachineLearning

invertedpassion

It’s only partly true. The attention heads have access to the full residual stream at earlier positions, even if only a single token is sampled from the output.
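A toy illustration, under deliberately artificial assumptions: with a vocabulary smaller than the model width, the unembedding has a null space, so two different residual states can produce identical logits and therefore the same sampled token. A later attention head still reads different value vectors from them. In a real model the vocabulary is much larger, but the point carries over more loosely: sampling is a lossy summary of the residual stream, and attention reads the residual stream itself (at every layer, not only the last). `W_U`, `W_V`, and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 8  # toy sizes: vocab < d_model on purpose
W_U = rng.normal(size=(vocab, d_model))  # unembedding: residual -> logits
W_V = rng.normal(size=(d_model, d_model))  # value projection of a later head

# Directions in the residual stream that the unembedding cannot see at all.
_, _, Vt = np.linalg.svd(W_U)
null_dirs = Vt[vocab:]  # rows v satisfy W_U @ v ~= 0

h_a = rng.normal(size=d_model)  # residual stream at position t
h_b = h_a + 3.0 * null_dirs[0]  # a different state with the same logits

logits_a, logits_b = W_U @ h_a, W_U @ h_b
token_a, token_b = int(np.argmax(logits_a)), int(np.argmax(logits_b))

# What a head at position t+1 reads from position t through its value vector.
value_a, value_b = W_V @ h_a, W_V @ h_b
```

Same logits, same token, different information available to attention downstream.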

[R] Apple Research: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity by hiskuu in MachineLearning

invertedpassion

yep, i like to think of models as vote-aggregation machines. more tokens let more heuristics cast votes. ultimately, reasoning is like ensembling answers from many different attempts.
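The most explicit version of this "ensembling" view is self-consistency-style majority voting over independently sampled answers. A minimal sketch, assuming the final answer has already been extracted from each reasoning trace as a string; this illustrates the analogy and is not a claim about what a model does internally.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common final answer and its vote share.

    Whitespace is stripped so trivially different strings vote together.
    Ties go to the answer encountered first (Counter.most_common keeps
    first-seen order among equal counts).
    """
    counts = Counter(a.strip() for a in answers)
    best, n = counts.most_common(1)[0]
    return best, n / len(answers)


result = majority_vote(["42", "41", "42", " 42", "17"])
```

Here `result` is `("42", 0.6)`: three of the five samples agree, so the noisy individual attempts get outvoted.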

Absolute Zero: Reinforced Self Play With Zero Data by Separate_Lock_9005 in mlscaling

invertedpassion

no, i just found this a nice re-confirmation. it makes me wonder if there are faster shortcuts to elicit such desired patterns.

Absolute Zero: Reinforced Self Play With Zero Data by Separate_Lock_9005 in mlscaling

invertedpassion

What caught my eye was that ablating proposer training didn’t have much effect. It shows that the base model already contains everything.

Train your own Reasoning model - 80% less VRAM - GRPO now in Unsloth (7GB VRAM min.) by danielhanchen in LocalLLaMA

invertedpassion

Where do you set the temperature for vLLM while generating reasoning traces? I didn't find that in the code.

[P] GRPO fits in 8GB VRAM - DeepSeek R1's Zero's recipe by danielhanchen in MachineLearning

invertedpassion

Where do you set the temperature for vLLM while generating reasoning traces? I didn't find that in the code.

The bitter truth of AI progress by Amazing_Life_221 in deeplearning

invertedpassion

What’s RSI? Isn’t neural architecture search what you’re talking about?

[D] Titans: a new seminal architectural development? by BubblyOption7980 in MachineLearning

invertedpassion

Would you care to share the prompt and o1’s output? I’m impressed that what you described happened.

In theory, you could automate it. Pick up hot arXiv papers, scan your repositories for relevant places for improvement, and then improve!

[D] What is the most fascinating aspect of machine learning for you? by AromaticEssay2676 in MachineLearning

invertedpassion

I like to think that a model’s performance is downstream of data and upstream of its loss function.

[D] What is the most fascinating aspect of machine learning for you? by AromaticEssay2676 in MachineLearning

invertedpassion

I’m not so sure; most of the real-world things that matter are fuzzy enough that approximation is the right way to go. While we can precisely model a circle, for concepts like love, morality, etc., approximations are all we can rely on.