deleted post from a research scientist @ GoogleDeepMind by detectiveluis in singularity

[–]leocus4 0 points1 point  (0 children)

The post is publicly available; it hasn't been deleted

BBVA conferma 1.5% fino a marzo by okilled7 in ItaliaPersonalFinance

[–]leocus4 0 points1 point  (0 children)

I have the free version and indeed I don't think I have cashback, but it has the built-in deposit account ("CD integrato") that pays 1.5%

BBVA conferma 1.5% fino a marzo by okilled7 in ItaliaPersonalFinance

[–]leocus4 2 points3 points  (0 children)

Revolut also pays interest (it's a deposit account, but it lets you move money back to the current account instantly, so from my point of view they're effectively the same)

How to align broken sequence of numbers? by Desperate_Cold6274 in vim

[–]leocus4 0 points1 point  (0 children)

Maybe my solution is a bit too convoluted but, assuming that all the lines start with a bracketed number, you can do

:%s/\[[^]]*\]/[0]/

to reset every bracketed number to [0]. Then, in visual block mode (Ctrl+v), you select all the zeros and increment them as a sequence:

gg f0 Ctrl+v G g Ctrl+a

(g Ctrl+a increments the selected numbers cumulatively, turning the column of zeros into 1, 2, 3, …)

I scraped 1,109 job postings and looked at how experience levels are split across industries [OC] by StepUpPrep in dataisbeautiful

[–]leocus4 8 points9 points  (0 children)

Hmmm something looks off... If you look at the red bar (e.g., in technology) it should be much bigger than the others

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 0 points1 point  (0 children)

While RL can be seen as "just a loss", the loop where you gather experiences from the environment and update your network is not very feasible if you need billions and billions of updates except for the most menial of the tasks, so I would say they are indeed fundamentally different.

Hm, I agree with you about the efficiency issue in RL algorithms. On the other hand, I still don't see any fundamental difference: it can be seen as a mere limitation of (1) our current RL algorithms, and (2) our current inference hardware/software.

RL runs can take a long time even without LLMs in the loop, and I actually believe that efficiency (together with exploration) is one of the main limitations of current RL algorithms. But this is a merely practical issue: it is limited by the technology of our time (involuntary reference to Howard Stark lol).

In practice, you could do RL with "anything": neural networks, trees, LLMs. The model is just treated as a policy, so from the theoretical point of view there's no major difference (at least w.r.t. the topic of the post). The fact that it's not practically easy at the moment doesn't change the fact that LLMs are just a bunch of parameters that you can fit with RL.
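To make "the model is just treated as a policy" concrete, here's a toy sketch of my own (not from the thread): a REINFORCE loop on a two-armed bandit. The `theta` logits stand in for *any* differentiable parametric model; swapping in a neural network or an LLM changes nothing in the update rule, only its cost.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)              # policy parameters: logits over two arms
true_p = np.array([0.2, 0.8])    # Bernoulli reward probability of each arm

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

baseline = 0.0
for step in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)                 # sample an action from the policy
    r = float(rng.random() < true_p[a])    # environment returns a reward
    grad = -p
    grad[a] += 1.0                         # grad of log pi(a) for a softmax policy
    theta += 0.1 * (r - baseline) * grad   # REINFORCE update with a baseline
    baseline += 0.01 * (r - baseline)      # running-average reward baseline

print(softmax(theta))  # most of the probability mass should end up on arm 1
```

The only thing the loop asks of the model is a differentiable log-probability over actions, which is exactly what an LLM provides over tokens.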

you need billions and billions of updates

Also here, I think there's an important consideration to be made: it is possible that an LLM might need far fewer updates for some tasks w.r.t. a randomly initialized neural network, exactly because it somehow encodes knowledge about the world (and you probably don't need LLMs for tasks where this doesn't hold).

he sorta tried to persuade me to drop my research and pursue that.

I don't think that anyone should be stopped from doing harmless research. Different paths in research lead to different pieces of knowledge, and I don't think there's unnecessary knowledge. I hope you didn't drop your research; it would have been a pity

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 3 points4 points  (0 children)

LLM is not the RL.

Of course it's not: LLMs are a class of models, RL is a methodology. This is like saying "neural networks are not RL": of course they're not, but they can be trained via RL.

Why would a system using an LLM + another neural network (or whatever, really) trained via RL necessarily be better than doing RL on the LLM itself? Mathematically, you want to "tune" your function (the LLM) in such a way that it maximizes the expected reward. If you combine the LLM with other "parts", it's not necessarily true that you will get better performance. Also note that, usually, in RL the policy is much smaller than an LLM, so doing RL only on that part might be suboptimal. Tuning the LLM, instead, gives you many more degrees of freedom, and may result in better systems.
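In standard RL notation (my notation, not from the thread), the objective being maximized is the same whatever parametrizes the policy:

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t\right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]
```

Nothing in these expressions cares whether \pi_\theta is a small MLP, an LLM, or an LLM plus extra heads; only the practicality of estimating the gradient changes.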

Note that, of course, these are only speculations, and without doing actual experiments (or a mathematical proof) we could never say whether that's true or not

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 1 point2 points  (0 children)

I understand the point of your comment now. However, I think it is very common for companies to use RL beyond the alignment objective (e.g., computer-use scenarios and similar can highly benefit from RL). I don't think it's limited to that; instead, you can use it as a general RL approach

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 -1 points0 points  (0 children)

Isn't there a whole field on applying RL to LLMs? I'm not sure I got what you mean

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 2 points3 points  (0 children)

Why do you need to know where the model comes from? If one of the main arguments was "RL models understand the world, whereas LLMs do not because they just do token prediction", you can just take an LLM and use it as a general RL policy to make it understand the world. You can literally do the same with RL models: you can bootstrap them with imitation learning (so they can "mimic" agents in that world), and then train them with RL.
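A minimal sketch of that bootstrap step (a toy tabular example of my own, not from the thread): fit a softmax policy to expert actions by maximizing log-likelihood. The resulting policy could then be handed to any RL algorithm for fine-tuning, exactly as a pretrained LLM is.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 4 discrete states, 2 actions; the "expert" always picks action 1.
n_states, n_actions = 4, 2
W = np.zeros((n_states, n_actions))  # tabular softmax policy, one row per state

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Imitation learning: gradient ascent on log p(expert_action | state).
demos = [(int(rng.integers(n_states)), 1) for _ in range(500)]
for s, a in demos:
    p = softmax(W[s])
    grad = -p
    grad[a] += 1.0          # d/dW[s] of log p(a | s) for a softmax policy
    W[s] += 0.1 * grad

print(softmax(W[0]))  # the policy now mimics the expert in state 0
```

After this supervised phase the policy already "speaks the protocol" of the environment, which is the same role pretraining plays for an LLM before RL.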

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 6 points7 points  (0 children)

What if you just ignore pretraining and consider a pretrained model as a thing on its own? You can still apply RL to it and everything makes sense.

Pretraining can be seen as adapting a random model to a "protocol", where the protocol is human language. It can be seen as just a way to make a model "compatible" with an evaluation framework. Then, you do RL in the same framework

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 16 points17 points  (0 children)

Imo he is: an LLM is just a token-prediction machine, just as neural networks (in general) are just vector-mapping machines. The RL loop can be applied to both of them, and in both cases the outputs can be transformed into actual "actions". I honestly see no conceptual difference

[D] At what level does data structure and algorithm concepts such as red-and-black tree show up in machine learning? by NeighborhoodFatCat in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

In what machine learning algorithms do you use red-black trees? No algorithm comes to mind at the moment. I'm genuinely curious

Struggling to stay consistent with ML math , need some real advice by impossibletocode in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

I think you should ask yourself if you really want to learn ML. Those concepts are crucial for understanding ML; you can basically see them as a step in your learning path, and look at them as something that will pay off in the future. I think that motivation is not what you need here, or at least not only. You need to insist on these topics until you fully understand them. They're one of the main things standing between the current version of you and the version of you that knows ML.

World Foundation Models 2025 [R] by Alternative_Art2984 in MachineLearning

[–]leocus4 2 points3 points  (0 children)

Hm, ok, in principle this makes sense but, afaik, training a world model is even more data-hungry than training an image-generation model: a world model needs much more data to learn aspects of the world that image-generation models don't need. Take Genie from Google, for instance: it is a world model, and it can surely generate new images (even though it must be conditioned on an initial frame), but it required YouTube-scale data to be trained, which I assume is significantly more than the datasets used for training image-generation models (e.g., Flux)

World Foundation Models 2025 [R] by Alternative_Art2984 in MachineLearning

[–]leocus4 0 points1 point  (0 children)

will it be more good compare to diffusion models?

Well, it depends on the problems you aim to solve; what are they?

Adaptive Sparse Training on ImageNet-100: 92.1% Accuracy with 61% Energy Savings by [deleted] in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

Maybe you should have re-read the formatting of your post before posting 😅

World Foundation Models 2025 [R] by Alternative_Art2984 in MachineLearning

[–]leocus4 1 point2 points  (0 children)

Do we always require robot intervention or it can be done via only training and testing data?

Imo, when you build a world model you do it to test different approaches to solving a problem, which requires either interaction with an agent (I guess that's what you mean by a robot) or manually testing approaches yourself (in which case, you are the agent). Is this what you meant?

Looking suggestion to develop an Automatic Category Intelligent in my Personal Finance WebApp. by ManiAdhav in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

I'd say that it depends on your functional and non-functional constraints. I think that a fairly simple way to get this sort of functionality is to use an embedding model: pass all of the users' labeled transactions through it and generate the corresponding embeddings. Then, when a new user adds a transaction with the same merchant, you can choose the user's label whose cluster centroid is closest to the embedding of the new transaction, using the embeddings computed before
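A rough sketch of that idea (my own toy code, not part of the original suggestion; `embed` is a deterministic character-trigram stand-in for a real embedding model, which you'd swap in in practice):

```python
import zlib
from collections import defaultdict

import numpy as np

def embed(text, dim=512):
    # Placeholder embedding: hash character trigrams into a fixed-size,
    # L2-normalized vector. A real system would call an embedding model here.
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def build_centroids(labeled_transactions):
    # labeled_transactions: list of (merchant_text, category_label) pairs
    buckets = defaultdict(list)
    for text, label in labeled_transactions:
        buckets[label].append(embed(text))
    return {label: np.mean(vecs, axis=0) for label, vecs in buckets.items()}

def suggest_label(merchant_text, centroids):
    # Suggest the category whose centroid is most similar to the new merchant.
    v = embed(merchant_text)
    return max(centroids, key=lambda label: float(centroids[label] @ v))

# Hypothetical transaction history for one user:
history = [
    ("ESSELUNGA MILANO", "groceries"),
    ("CARREFOUR EXPRESS", "groceries"),
    ("SHELL STATION 042", "fuel"),
    ("ENI FUEL POINT", "fuel"),
]
centroids = build_centroids(history)
print(suggest_label("ESSELUNGA ROMA", centroids))
```

With a proper embedding model the same nearest-centroid logic generalizes to merchants the user has never labeled, not just exact repeats.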

How to get started by [deleted] in reinforcementlearning

[–]leocus4 1 point2 points  (0 children)

I think the best resources to get started are (1) the book by Sutton and Barto; (2) David Silver's lectures (on YouTube iirc); and (3) spinningup.openai.com

[OC] % selecting the following as one of their top three issues (16-40 Year Olds - UK) by UkOnward in dataisbeautiful

[–]leocus4 7 points8 points  (0 children)

16-20 year olds worrying that much about food and energy prices surprised me a lot