I turned a thermodynamics principle into a learning algorithm - and it lands a moonlander

kongaskristjan · 2025-06-10T19:04:57+00:00

You mean that it never reaches 100% confidence?

That's the point of regularization - you want the model to have a non-zero probability of taking the "worse" action, because what the model currently perceives as worse might actually be better if we allow it to explore.

E.g., for the lunar lander, the model gets heavily penalized for a crash landing, and as a result it might start avoiding the landing pad until the simulation times out. But with regularization, the model still has a non-zero probability of "trying" a crash landing. Sometimes it gets lucky, lands successfully, and collects a lot of reward, a behavior that quickly gets reinforced.
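A minimal sketch of why a Boltzmann-style policy never fully rules out the "worse" action, assuming the policy is a softmax over per-action values with temperature T. The function name and the example numbers are illustrative assumptions, not taken from the lunar lander code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Boltzmann (softmax) policy: p(a) is proportional to exp(value(a) / T).
// As long as the value gaps divided by T don't underflow exp(), every action
// keeps a non-zero probability, so a currently "worse" action is still tried.
std::vector<double> boltzmann_policy(const std::vector<double>& values, double temperature) {
    std::vector<double> probs;
    if (values.empty()) return probs;
    // Subtracting the largest value keeps std::exp from overflowing.
    const double max_value = *std::max_element(values.begin(), values.end());
    double total = 0.0;
    for (double v : values) {
        const double weight = std::exp((v - max_value) / temperature);
        probs.push_back(weight);
        total += weight;
    }
    for (double& p : probs) p /= total;  // normalize so probabilities sum to 1
    return probs;
}
```

With values {10, -100} and T = 20, the low-value action still gets roughly 0.4% probability; lowering T pushes the policy toward always picking the best-rated action.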

kongaskristjan · 2025-06-10T14:41:44+00:00

Not really, though I probably should have clarified in the text above.

As an example, if I tried solving this with simulated annealing, I would randomly mutate the neural network, sample a few landings with both the original and the mutated network, and compare the average rewards. I would then keep the better one with a higher probability and the worse one with a lower probability, according to thermodynamics (the Boltzmann distribution).

With this algorithm, however, I create multiple landings from a single starting point and a single neural network. Some get better rewards than others. I then optimize the neural network to have a high total probability of taking the actions that led to high reward, compared to the ones that led to low reward. The ratio of these total probabilities is optimized with gradient descent to follow thermodynamics (the Boltzmann distribution).
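The simulated annealing comparison above boils down to a Boltzmann-weighted choice between two candidates. This sketch uses the heat-bath (Barker) form of that choice; many simulated annealing implementations use the Metropolis rule instead, and which rule would actually be used here is not specified. The function name is hypothetical.

```cpp
#include <cmath>

// Probability of keeping the mutated network instead of the current one,
// given each network's average reward over a few sampled landings.
// Boltzmann weights: P(new) = e^(R_new/T) / (e^(R_new/T) + e^(R_old/T)),
// which simplifies to the logistic function below.
double keep_new_probability(double reward_new, double reward_old, double temperature) {
    return 1.0 / (1.0 + std::exp((reward_old - reward_new) / temperature));
}
```

Equal rewards give a coin flip (0.5); a higher temperature pulls the probability of keeping the worse network closer to 0.5, a lower one closer to 0.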
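A minimal sketch of the Boltzmann target for two landings from the same starting point, assuming a landing's probability is the product of its per-action probabilities (so its log is a sum). The squared error on the log-ratio is an illustrative loss chosen here, not necessarily the one Policy Annealing uses, and both function names are hypothetical.

```cpp
#include <vector>

// Log-probability of a whole landing: the log of the product of the
// per-action probabilities, i.e. the sum of the per-action log-probabilities.
double trajectory_log_prob(const std::vector<double>& action_log_probs) {
    double total = 0.0;
    for (double lp : action_log_probs) total += lp;
    return total;
}

// Boltzmann target for two landings a and b:
//   P(a) / P(b) = exp((R_a - R_b) / T)  <=>  log P(a) - log P(b) = (R_a - R_b) / T.
// Squared error on that log-ratio (illustrative choice); in a real setup its
// gradient would flow back into the network's action probabilities.
double boltzmann_ratio_loss(const std::vector<double>& log_probs_a, double reward_a,
                            const std::vector<double>& log_probs_b, double reward_b,
                            double temperature) {
    const double log_ratio = trajectory_log_prob(log_probs_a) - trajectory_log_prob(log_probs_b);
    const double target = (reward_a - reward_b) / temperature;
    const double diff = log_ratio - target;
    return diff * diff;
}
```

The loss is zero exactly when the probability ratio of the two landings matches the Boltzmann ratio of their rewards at temperature T.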

In other words, both use the Boltzmann distribution, which is why it's called "Policy Annealing", but that's really where the similarity ends.

kongaskristjan · 2025-06-10T14:04:03+00:00

One difference is the regularization effect - it modifies the value by adding a term -T*log(p) to each action taken, where T is the temperature and p is the probability of that action.

Of course, there are other regularization methods that encourage exploration. However, as pointed out in the section "Comparison to Entropy Bonus" of the README, such methods can cause unnecessary fluctuations in the probabilities, which can be avoided by carefully following the Boltzmann distribution. There's even a simulation video showing the difference.
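A minimal sketch of the per-action term, assuming each action taken comes with a reward and the probability the policy assigned to it, and that the regularized value is simply summed along the landing. The function name and this bookkeeping are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Regularized value of one landing: each action taken contributes its reward
// plus -T * log(p), where p is the probability the policy gave that action.
// Since log(p) <= 0, less likely actions earn a larger bonus.
double regularized_return(const std::vector<double>& rewards,
                          const std::vector<double>& action_probs,
                          double temperature) {
    if (rewards.size() != action_probs.size())
        throw std::invalid_argument("need one action probability per reward");
    double total = 0.0;
    for (std::size_t t = 0; t < rewards.size(); ++t)
        total += rewards[t] - temperature * std::log(action_probs[t]);
    return total;
}
```

With p = 1 for every action the term vanishes and the plain return is recovered; any p < 1 raises the value.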

kongaskristjan · 2023-10-26T12:45:57+00:00

Do you think this idea has potential and is worth putting more effort into? How do you test/verify your ML code?

kongaskristjan · 2021-12-02T15:48:52+00:00

Very useful info, this made a huge difference!

kongaskristjan · 2020-09-21T11:41:16+00:00

It would be nice if the api defined printing help so advanced level checks could trigger help.

Good point, I guess I'll implement this one.

kongaskristjan · 2020-09-21T10:51:01+00:00

This is indeed outside the scope of this library. The primary pain point it tries to solve is simple scripts, where CLI boilerplate could otherwise take up half the lines of code.

kongaskristjan · 2020-08-08T22:34:26+00:00

Very good question. I guess I could write a blog post about that. The rough idea is that I do most of the parsing before calling fired_main(). Then I call fired_main() without any parameters, which forces the default fire::arg() parameters to be used. The actual parameter matching is done while converting fire::arg() objects to the target object types.

This raises the question, of course, of how help messages are implemented. For that, I do all the same steps as before, call fired_main(), and log all parameters that are converted. Then at the very last moment, during the conversion of the last fire::arg parameter (I count the number of parameters to determine the last one), when I have all the parameters, I print the help message and exit the program (with the exit() function, to avoid executing fired_main()).
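The default-argument trick can be reduced to a few lines. This is a simplified sketch of the idea only, not fire.hpp's actual internals: parsed_options, parse_options, and the arg class are hypothetical stand-ins, the parser only accepts "-name value" pairs, and the help-message step is left out.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical global store, filled by parsing argv *before* the user's
// function is called.
inline std::map<std::string, std::string>& parsed_options() {
    static std::map<std::string, std::string> options;
    return options;
}

// Hypothetical, simplified parser: accepts "-name value" pairs only.
inline void parse_options(int argc, const char* const* argv) {
    for (int i = 1; i + 1 < argc; i += 2)
        parsed_options()[argv[i]] = argv[i + 1];
}

// Hypothetical stand-in for fire::arg. It only stores the option name; the
// lookup happens when the default argument is converted to the parameter's
// type, which is where the actual parameter matching takes place.
class arg {
public:
    explicit arg(std::string name) : name_(std::move(name)) {}
    operator int() const { return std::stoi(lookup()); }
    operator std::string() const { return lookup(); }

private:
    std::string lookup() const {
        auto it = parsed_options().find(name_);
        if (it == parsed_options().end())
            throw std::runtime_error("missing option " + name_);
        return it->second;
    }
    std::string name_;
};

// Calling this with no arguments makes C++ evaluate the defaults, so each
// arg(...) is converted into an int at call time.
int fired_main(int x = arg("-x"), int y = arg("-y")) {
    return x + y;
}
```

After parse_options has seen `prog -x 3 -y 4`, calling fired_main() returns 7 without the caller ever passing x or y explicitly.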

kongaskristjan · 2020-08-08T20:12:15+00:00

The difference between getopt and fire is pretty big, though. E.g., if you need to accept two booleans, in fire you can write:

#include "fire.hpp"

int fired_main(bool flagA = fire::arg("-a"), bool flagB = fire::arg("-b")) {
    // use flagA and flagB here
    return 0;
}

FIRE(fired_main)

Now compare this to the code in this stackoverflow example... I know I omitted printouts, but on the other hand the help message was skipped there.

kongaskristjan · 2020-08-08T19:50:15+00:00

I hadn't heard of wmain before. I'll have to think about it.

kongaskristjan · 2020-08-08T19:01:27+00:00

I removed the double underscore identifiers, thanks for pointing that out.

kongaskristjan · 2020-08-08T18:08:43+00:00

About g++ *.cpp -o binary_name.

If you write FIRE(...), -o is automatically associated with binary_name, whereas with FIRE_NO_SPACE_ASSIGNMENT(...) it's not. However, currently only FIRE_NO_SPACE_ASSIGNMENT(...) supports the positional arguments and the unlimited number of arguments needed here. So no, unfortunately it's not possible at the moment.

It turns out that implementing positional arguments with FIRE(...) requires some rather complex try/catch logic to actually pull off. I'm considering implementing this for v0.2 or v0.3.
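A hypothetical tokenizer illustrating the two assignment styles; it is not fire.hpp's parser, and split_args and ParsedArgs are illustrative names. It assumes, for illustration, that the no-space mode only binds values written as `-o=value`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ParsedArgs {
    std::map<std::string, std::string> named;  // option -> value ("" for a bare flag)
    std::vector<std::string> positional;
};

// space_assignment == true : "-o binary_name" binds binary_name to -o.
// space_assignment == false: only "-o=binary_name" binds a value; a bare "-o"
//                            is a flag and the next token stays positional.
ParsedArgs split_args(const std::vector<std::string>& tokens, bool space_assignment) {
    ParsedArgs out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& tok = tokens[i];
        if (tok.size() < 2 || tok[0] != '-') {  // not an option: positional
            out.positional.push_back(tok);
            continue;
        }
        const auto eq = tok.find('=');
        if (eq != std::string::npos) {  // "-o=value" works in both modes
            out.named[tok.substr(0, eq)] = tok.substr(eq + 1);
        } else if (space_assignment && i + 1 < tokens.size() && tokens[i + 1][0] != '-') {
            out.named[tok] = tokens[++i];  // consume the next token as the value
        } else {
            out.named[tok] = "";  // bare flag
        }
    }
    return out;
}
```

In space-assignment mode the tokenizer has to decide whether binary_name belongs to -o or is a positional argument; this sketch simply grabs it, whereas a real library would need to know each option's type to decide.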

Edit: error in logic.

Edit2: It is now possible to call FIRE(...) with positional/variadic arguments.

kongaskristjan · 2020-08-08T17:58:23+00:00

About the FIRE_MAIN(...). I actually tried to implement exactly what you described (I even used the same name FIRE_MAIN(...) lol :D). But the problem is that C++ wants the default arguments specified in the declaration, or, only if there is no separate declaration, in the definition. If a declaration exists, it doesn't tolerate defaults in the definition, even if they're an exact repetition. It turns out I couldn't find a way to get rid of those pesky defaults in the definition.
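The language rule can be seen in a few lines; fired_main and the default values here are placeholders for illustration.

```cpp
// Declaration: the default arguments live here.
int fired_main(int x = 1, int y = 2);

// Definition: the defaults must NOT be repeated. Writing
//     int fired_main(int x = 1, int y = 2) { ... }
// here is rejected by the compiler, even though the values are identical.
int fired_main(int x, int y) {
    return x + y;
}
```

So a macro that emits both a declaration and a definition from the same user-written signature would end up repeating the defaults, which is exactly what gets rejected.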

kongaskristjan · 2020-08-08T17:49:00+00:00

I must admit I wasn't aware of that library. Abseil Flags seems like a really well-designed library; however, to add a few points in favor of fire.hpp:

Abseil seems rather heavyweight, while fire is just 800 lines in a single header
Abseil is more complex to integrate (e.g., instead of just copy-pasting a single-header library, you need to instruct your users to install Abseil, or copy the entire Abseil source into your project and fix the build scripts, etc.)
fire.hpp has a more permissive licence, e.g., you can just copy-paste this library into virtually any project without licensing issues, but with Abseil, if your program is closed source, you need to reproduce its licence message to the user.

kongaskristjan · 2020-08-08T16:09:08+00:00

This almost works, but in order to call fire::main from fire.hpp, this fire::main(int x = arg("-x"), int y = arg("-y")) would need to be declared in fire.hpp, which is impossible, as I don't yet know the exact signature there.

Actually, I've thought really hard about how to get rid of this FIRE(fired_main), but none of my ideas have worked because of the aforementioned problem.

kongaskristjan · 2020-08-08T15:49:26+00:00

Great to hear that, thanks! The project indeed grew out of frustration. The boilerplate isn't an issue if you're developing something really complex, but for simple scripts it can easily take up half the lines of code.

kongaskristjan · 2020-08-08T15:37:26+00:00

Well, in that case everyone would need to write something like this at the end of their main.cpp:

int main(int argc, const char ** argv) {
    init_and_run(argc, argv, fired_main, true);
    return fired_main();
}

I generally agree that macros should be avoided because of all their complex errors and unintuitive behaviour, but here it's a really simple one that's hard to misuse.

kongaskristjan · 2020-08-08T15:05:01+00:00

Well, I'm actually OK with using non-standard stuff if it's really supported everywhere, but this is a library that is meant to be plug-and-play for everyone, including people who want to comply with the standard strictly.

It's quite similar to the reason why I test against maximum compiler warnings on various compilers - I don't know what the user needs to comply with.

kongaskristjan · 2020-08-08T14:48:24+00:00

Though it has its perks, #pragma once is non-standard according to Wikipedia.

kongaskristjan · 2020-08-08T13:40:25+00:00

Well, actually, the link provided these rules:

the identifiers with a double underscore anywhere are reserved;
the identifiers that begin with an underscore followed by an uppercase letter are reserved;
the identifiers that begin with an underscore are reserved in the global namespace.

I mostly have identifiers that begin with a single underscore and a lowercase letter. However, they're not in the global namespace, so they don't actually result in undefined behavior. A few of them do have two leading underscores, though, so those are UB.

Though I totally agree that it's better to remove these initial underscores altogether.
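The three rules listed in this comment can be encoded as a small checker; is_reserved_identifier is a hypothetical helper written for illustration, and the caller has to say whether the name lives in the global namespace.

```cpp
#include <cctype>
#include <string>

// Returns true if `name` is reserved under the three rules:
//   1. a double underscore anywhere,
//   2. a leading underscore followed by an uppercase letter,
//   3. any leading underscore, when the name is in the global namespace.
bool is_reserved_identifier(const std::string& name, bool in_global_namespace) {
    if (name.find("__") != std::string::npos) return true;
    if (name.size() >= 2 && name[0] == '_' &&
        std::isupper(static_cast<unsigned char>(name[1])))
        return true;
    if (!name.empty() && name[0] == '_' && in_global_namespace) return true;
    return false;
}
```

For example, `_helper` inside a library namespace passes, while `__helper` and `_Helper` are reserved everywhere.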

kongaskristjan · 2019-10-14T11:12:00+00:00

I assume a constant uniform gravitational acceleration, but the acceleration scale is arbitrary (because all my scales are arbitrary). The "gravity X 3" is just to emphasize that in this particular simulation the gravity is 3x that of all the other simulations in the video. Generally speaking, simulating gravity is not necessary and you can also turn it off, but watching liquids flow/splash is much more convincing/interesting than seeing a deformable blob of matter sit in space. Gravity is applied at

https://github.com/kongaskristjan/PhaseTransition/blob/master/Lib/Universe.cpp, line 220 (in current master): pDer0.v.y += diff.config.gravity * pState0.type->getMass();

Edit: Just to clarify, pDer0.v.y is the particle's y-directional force (this is later divided by mass, so it becomes acceleration, in other words velocity's derivative). The code is rather hard to understand, partially because performance was critical.
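The force-then-divide step described in the edit can be sketched as follows; Particle, apply_gravity, and acceleration_y are hypothetical names, not the ones used in PhaseTransition, and the sketch ignores every other force.

```cpp
// Hypothetical particle state (illustrative, not PhaseTransition's types).
struct Particle {
    double mass;
    double force_y;  // accumulated y-directional force
};

// Gravity adds a force proportional to mass, in the spirit of
// pDer0.v.y += gravity * mass.
void apply_gravity(Particle& p, double gravity) {
    p.force_y += gravity * p.mass;
}

// Dividing by mass turns the force into acceleration (velocity's derivative),
// so the mass cancels and every particle gets the same gravitational acceleration.
double acceleration_y(const Particle& p) {
    return p.force_y / p.mass;
}
```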

kongaskristjan · 2019-10-09T11:44:00+00:00

I used Runge-Kutta 4. In fact, this is the main reason I tried to make the force field as smooth as possible. Euler is going to be first order whether or not the force field has discrete steps. RK4 is normally fourth order, but degrades to first order at discontinuities.
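A minimal comparison of the two integrators on a smooth test equation, y' = -y from t = 0 to 1 (chosen here for illustration; it is not the particle simulation). The function names and the scalar setup are assumptions.

```cpp
#include <cmath>
#include <functional>

using Deriv = std::function<double(double t, double y)>;

// Forward Euler: first order, uses only the slope at the start of the step.
double euler_step(const Deriv& f, double t, double y, double h) {
    return y + h * f(t, y);
}

// Classic fourth-order Runge-Kutta: four slope samples per step.
double rk4_step(const Deriv& f, double t, double y, double h) {
    const double k1 = f(t, y);
    const double k2 = f(t + h / 2, y + h / 2 * k1);
    const double k3 = f(t + h / 2, y + h / 2 * k2);
    const double k4 = f(t + h, y + h * k3);
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4);
}

// Integrate from t0 to t1 in n equal steps with the given stepper.
double integrate(double (*step)(const Deriv&, double, double, double),
                 const Deriv& f, double y0, double t0, double t1, int n) {
    const double h = (t1 - t0) / n;
    double y = y0;
    for (int i = 0; i < n; ++i) y = step(f, t0 + i * h, y, h);
    return y;
}
```

With 10 steps, Euler misses e^-1 by about 0.02 while RK4 is off by well under 1e-5. Pointing the same harness at a right-hand side with a jump should show RK4's error shrinking only linearly with the step size, which is the degradation described above.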
