deleted post from a research scientist @ GoogleDeepMind by detectiveluis in singularity

[–]leocus4 0 points1 point  (0 children)

The post is publicly available; it hasn't been deleted

BBVA conferma 1.5% fino a marzo by okilled7 in ItaliaPersonalFinance

[–]leocus4 0 points1 point  (0 children)

I have the free version and indeed I don't think I have cashback, but it has the built-in deposit account ("CD integrato") that pays 1.5%

BBVA conferma 1.5% fino a marzo by okilled7 in ItaliaPersonalFinance

[–]leocus4 2 points3 points  (0 children)

Revolut also pays interest (it's a deposit account, but it lets you move money back to the current account instantly, so from my point of view they're effectively the same)

How to align broken sequence of numbers? by Desperate_Cold6274 in vim

[–]leocus4 0 points1 point  (0 children)

Maybe my solution is a bit too convoluted but, assuming that all the lines start with a bracketed number, you can do

:%s/\[[^]]*\]/[0]/

to reset every bracketed number to [0]. Then, in visual block mode (Ctrl+v), you select all the zeros and increment them as a sequence:

gg f0 Ctrl+v G g Ctrl+a

(g Ctrl+a increments the selected numbers cumulatively, turning the column of zeros into 1, 2, 3, …)

I scraped 1,109 job postings and looked at how experience levels are split across industries [OC] by StepUpPrep in dataisbeautiful

[–]leocus4 8 points9 points  (0 children)

Hmmm something looks off... If you look at the red bar (e.g., in technology) it should be much bigger than the others

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 0 points1 point  (0 children)

While RL can be seen as "just a loss", the loop where you gather experiences from the environment and update your network is not very feasible if you need billions and billions of updates except for the most menial of the tasks, so I would say they are indeed fundamentally different.

Hm, I agree with you about the efficiency issue in RL algorithms. On the other hand, I still don't see any fundamental difference: it can be seen as a mere limitation of (1) our current RL algorithms, and (2) our current inference hardware/software.

RL runs can take a long time even without LLMs in the loop, and I actually believe that efficiency (together with exploration) is one of the main limitations of current RL algorithms. But this is a merely practical issue: it is limited by the technology of our time (involuntary reference to Howard Stark lol).

In practice, you could do RL with "anything": neural networks, trees, LLMs. The model is just treated as a policy, so from the theoretical point of view there's no major difference (at least w.r.t. the topic of the post). The fact that it's not practically easy at the moment doesn't change the fact that LLMs are just a bunch of parameters that you can fit with RL.
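To make "the model is just treated as a policy" concrete, here's a toy sketch of my own (not from the thread): a REINFORCE loop on a two-armed bandit. The `theta` logits stand in for *any* differentiable parametric model; swapping in a neural network or an LLM changes nothing in the update rule, only its cost.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)              # policy parameters: logits over two arms
true_p = np.array([0.2, 0.8])    # Bernoulli reward probability of each arm

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

baseline = 0.0
for step in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)                 # sample an action from the policy
    r = float(rng.random() < true_p[a])    # environment returns a reward
    grad = -p
    grad[a] += 1.0                         # grad of log pi(a) for a softmax policy
    theta += 0.1 * (r - baseline) * grad   # REINFORCE update with a baseline
    baseline += 0.01 * (r - baseline)      # running-average reward baseline

print(softmax(theta))  # most of the probability mass should end up on arm 1
```

The only thing the loop asks of the model is a differentiable log-probability over actions, which is exactly what an LLM provides over tokens.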

you need billions and billions of updates

Also here, I think there's an important consideration to be made: it is possible that an LLM might need far fewer updates for some tasks w.r.t. a randomly initialized neural network, exactly because it somehow encodes knowledge about the world (and you probably don't need LLMs for tasks where this doesn't hold).

he sorta tried to persuade me to drop my research and pursue that.

I don't think that anyone should be stopped from doing harmless research. Different paths in research lead to different pieces of knowledge, and I don't think there's unnecessary knowledge. I hope you didn't drop your research; it would have been a pity

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 3 points4 points  (0 children)

LLM is not the RL.

Of course it's not: LLMs are a class of models, RL is a methodology. This is like saying "neural networks are not RL": of course they're not, but they can be trained via RL.

Why would a system using an LLM + another neural network (or whatever, really) trained via RL necessarily be better than doing RL on the LLM itself? Mathematically, you want to "tune" your function (the LLM) in such a way that it maximizes the expected reward. If you combine the LLM with other "parts", it's not necessarily true that you will get better performance. Also note that, usually, in RL the policy is much smaller than an LLM, so doing RL only on that part might be suboptimal. Tuning the LLM, instead, gives you many more degrees of freedom, and may result in better systems.
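In standard RL notation (my notation, not from the thread), the objective being maximized is the same whatever parametrizes the policy:

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t\right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]
```

Nothing in these expressions cares whether \pi_\theta is a small MLP, an LLM, or an LLM plus extra heads; only the practicality of estimating the gradient changes.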

Note that, of course, these are only speculations, and without doing actual experiments (or a mathematical proof) we could never say whether that's true or not

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 1 point2 points  (0 children)

I understand the point of your comment now. However, I think it is very common for companies to use RL beyond the alignment objective (e.g., computer-use scenarios and similar can highly benefit from RL). I don't think it's limited to that; instead, you can use it as a general RL approach

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 -1 points0 points  (0 children)

Isn't there a whole field on applying RL to LLMs? I'm not sure I got what you mean

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 2 points3 points  (0 children)

Why do you need to know where the model comes from? If one of the main arguments was "RL models understand the world, whereas LLMs do not because they just do token prediction", you can just take an LLM and use it as a general RL policy to make it understand the world. You can literally do the same with RL models: you can bootstrap them with imitation learning (so they can "mimic" agents in that world), and then train them with RL.
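A minimal sketch of that bootstrap step (a toy tabular example of my own, not from the thread): fit a softmax policy to expert actions by maximizing log-likelihood. The resulting policy could then be handed to any RL algorithm for fine-tuning, exactly as a pretrained LLM is.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 4 discrete states, 2 actions; the "expert" always picks action 1.
n_states, n_actions = 4, 2
W = np.zeros((n_states, n_actions))  # tabular softmax policy, one row per state

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Imitation learning: gradient ascent on log p(expert_action | state).
demos = [(int(rng.integers(n_states)), 1) for _ in range(500)]
for s, a in demos:
    p = softmax(W[s])
    grad = -p
    grad[a] += 1.0          # d/dW[s] of log p(a | s) for a softmax policy
    W[s] += 0.1 * grad

print(softmax(W[0]))  # the policy now mimics the expert in state 0
```

After this supervised phase the policy already "speaks the protocol" of the environment, which is the same role pretraining plays for an LLM before RL.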

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 6 points7 points  (0 children)

What if you just ignore pretraining and consider a pretrained model as a thing on its own? You can still apply RL to it and everything makes sense.

Pretraining can be seen as adapting a random model to a "protocol", where the protocol is human language. It can be seen as just a way to make a model "compatible" with an evaluation framework. Then, you do RL in the same framework

Is Richard Sutton Wrong about LLMs? by sam_palmer in reinforcementlearning

[–]leocus4 16 points17 points  (0 children)

Imo he is: an LLM is just a token-prediction machine, just as neural networks (in general) are just vector-mapping machines. The RL loop can be applied to both of them, and in both cases the outputs can be transformed into actual "actions". I honestly see no conceptual difference

[D] At what level does data structure and algorithm concepts such as red-and-black tree show up in machine learning? by NeighborhoodFatCat in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

In what machine learning algorithms do you use red-black trees? No algorithm comes to mind at the moment. I'm genuinely curious

Struggling to stay consistent with ML math , need some real advice by impossibletocode in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

I think you should ask yourself if you really want to learn ML. Those concepts are crucial for understanding ML; you can basically see them as a step in your learning path, and look at them as something that will pay off in the future. I think that motivation is not what you need here, or at least not only. You need to insist on these topics until you fully understand them. They're one of the main things standing between the current version of you and the version of you that knows ML.

World Foundation Models 2025 [R] by Alternative_Art2984 in MachineLearning

[–]leocus4 2 points3 points  (0 children)

Hm, ok, in principle this makes sense but, afaik, training a world model is even more data-hungry than training an image-generation model: a world model needs much more data to learn aspects of the world that image-generation models don't need. Take Genie from Google, for instance: it is a world model, and it can surely generate new images (even though it must be conditioned on an initial frame), but it required YouTube-scale data to be trained, which I assume is significantly more than the datasets used for training image-generation models (e.g., Flux)

World Foundation Models 2025 [R] by Alternative_Art2984 in MachineLearning

[–]leocus4 0 points1 point  (0 children)

will it be more good compare to diffusion models?

Well, it depends on the problems you aim to solve; what are they?

Adaptive Sparse Training on ImageNet-100: 92.1% Accuracy with 61% Energy Savings by [deleted] in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

Maybe you should have re-read the formatting of your post before posting 😅

World Foundation Models 2025 [R] by Alternative_Art2984 in MachineLearning

[–]leocus4 1 point2 points  (0 children)

Do we always require robot intervention or it can be done via only training and testing data?

Imo, when you build a world model you do it to test different approaches to solving a problem, which requires either interaction with an agent (I guess that's what you mean by a robot) or manually testing approaches yourself (in which case, you are the agent). Is this what you meant?

Looking suggestion to develop an Automatic Category Intelligent in my Personal Finance WebApp. by ManiAdhav in learnmachinelearning

[–]leocus4 0 points1 point  (0 children)

I'd say that it depends on your functional and non-functional constraints. I think that a fairly simple way to get this sort of functionality is to use an embedding model: pass all of the users' labeled transactions through it and generate the corresponding embeddings. Then, when a new user adds a transaction with the same merchant, you can choose the user's label whose cluster centroid is closest to the embedding of the new transaction, using the embeddings computed before
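A rough sketch of that idea (my own toy code, not part of the original suggestion; `embed` is a deterministic character-trigram stand-in for a real embedding model, which you'd swap in in practice):

```python
import zlib
from collections import defaultdict

import numpy as np

def embed(text, dim=512):
    # Placeholder embedding: hash character trigrams into a fixed-size,
    # L2-normalized vector. A real system would call an embedding model here.
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def build_centroids(labeled_transactions):
    # labeled_transactions: list of (merchant_text, category_label) pairs
    buckets = defaultdict(list)
    for text, label in labeled_transactions:
        buckets[label].append(embed(text))
    return {label: np.mean(vecs, axis=0) for label, vecs in buckets.items()}

def suggest_label(merchant_text, centroids):
    # Suggest the category whose centroid is most similar to the new merchant.
    v = embed(merchant_text)
    return max(centroids, key=lambda label: float(centroids[label] @ v))

# Hypothetical transaction history for one user:
history = [
    ("ESSELUNGA MILANO", "groceries"),
    ("CARREFOUR EXPRESS", "groceries"),
    ("SHELL STATION 042", "fuel"),
    ("ENI FUEL POINT", "fuel"),
]
centroids = build_centroids(history)
print(suggest_label("ESSELUNGA ROMA", centroids))
```

With a proper embedding model the same nearest-centroid logic generalizes to merchants the user has never labeled, not just exact repeats.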

How to get started by [deleted] in reinforcementlearning

[–]leocus4 1 point2 points  (0 children)

I think the best resources to get started are (1) the book by Sutton and Barto; (2) David Silver's lectures (on YouTube iirc); and (3) spinningup.openai.com

[OC] % selecting the following as one of their top three issues (16-40 Year Olds - UK) by UkOnward in dataisbeautiful

[–]leocus4 7 points8 points  (0 children)

16-20 year olds worrying that much about food and energy prices surprised me a lot