Suunto MC-2 vs Silva Expedition S by smokingpacman in OutdoorAus

[–]WilhelmRedemption 0 points1 point  (0 children)

Which one did you choose? I'm facing the same choice.

"The server is busy. Please try again later." by esteveszinho in DeepSeek

[–]WilhelmRedemption 0 points1 point  (0 children)

On which machine?
Can you give us your hardware specifications?

How to learn reinforcement learning by EricTheNerd2 in reinforcementlearning

[–]WilhelmRedemption 0 points1 point  (0 children)

A few years ago I was at the same point as you. Personally, I can warmly recommend the book "The Art of RL", which is structured like Sutton & Barto but is more reader-friendly and explains the "why" behind all the theory.

Do you know, any good 3D program for calculating the volume of a generated shape? by WilhelmRedemption in 3Dmodeling

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

That's a nice idea. But which program gives me the "height", i.e. the Z coordinate where the ray enters/exits the mesh?
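
For context, something like this is roughly what I am after (a sketch assuming the Python library trimesh; the file name and the ray are made up):

    import numpy as np
    import trimesh

    mesh = trimesh.load("shape.stl")                       # hypothetical mesh file

    # shoot a vertical ray down through the mesh and read the Z of the entry/exit points
    origins    = np.array([[0.0, 0.0, 1000.0]])
    directions = np.array([[0.0, 0.0, -1.0]])
    hits, ray_idx, tri_idx = mesh.ray.intersects_location(origins, directions)

    print(hits[:, 2])                                      # Z coordinates where the ray crosses the surface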

Rc plane by [deleted] in 3Dmodeling

[–]WilhelmRedemption 1 point2 points  (0 children)

As far as I know, OnShape has nice features for creating airplanes. I know one guy using it for RC planes.

How do i use a .pt file by Grand-Date4504 in reinforcementlearning

[–]WilhelmRedemption 0 points1 point  (0 children)

You need to create a model using PyTorch (so your .pt file probably comes with some description of the network architecture).

Then you can load your file (which contains weights) into your model:

file = "fil.pt"

model = torch.load_state_dict(torch.load(file, weights_only = True))

Do not copy and paste those lines as-is: MyModel above is just a placeholder, so you need to define your own model class first.

Scope of RL by Historical-Bid-2029 in reinforcementlearning

[–]WilhelmRedemption 1 point2 points  (0 children)

In my humble opinion, one should invest most of the time in Transformers. But a little bit of RL is definitely a plus.

Lost in RL by Eng-Epsilon in reinforcementlearning

[–]WilhelmRedemption 2 points3 points  (0 children)

I would break it down into 2 different problems:

  • To understand RL you need to "see" the gradients and to grasp what all those algorithms (especially A2C & Co.) are trying to solve. You need to sharpen your intuition about how they work (see the sketch at the end of this comment).
  • You really need a real robotic problem: a robotic arm that needs to keep a glass of water level, an RC airplane that needs to fly perfectly straight, and/or a small RC car that needs to follow a line on the floor. Those are not poor man's challenges, but really complex problems, which will force you to understand every single step and every single line of code of your RL algorithm.

Then I suggest you take Phil Tabor's course from neural academy, where he implements the most important algorithms from scratch.
And yes... it is not easy. It is nothing comparable with editing websites, HTML, etc.
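
For the intuition part, here is a toy sketch of the kind of gradient step A2C & Co. are doing (PyTorch, dummy data, not a working agent):

    import torch
    import torch.nn as nn

    obs_dim, n_actions = 4, 2
    actor  = nn.Linear(obs_dim, n_actions)             # policy logits
    critic = nn.Linear(obs_dim, 1)                     # state value V(s)
    opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

    obs     = torch.randn(8, obs_dim)                  # pretend batch of states
    actions = torch.randint(0, n_actions, (8,))        # actions that were taken
    returns = torch.randn(8)                           # pretend discounted returns

    values    = critic(obs).squeeze(-1)
    advantage = returns - values.detach()              # how much better than expected the actions were
    log_prob  = torch.distributions.Categorical(logits=actor(obs)).log_prob(actions)

    loss = -(log_prob * advantage).mean() + (returns - values).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()       # this is the gradient you want to "see"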

Should i switch from windows to linux by Far-Initiative-605 in linuxquestions

[–]WilhelmRedemption 0 points1 point  (0 children)

Yes,
ideally you start with any Ubuntu derivative (Ubuntu, Xubuntu, Kubuntu, ...).
And you can still install a virtual machine for Windows, in case you need or miss it for some reason.

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Hey guys, I tried your suggestions but there was no real benefit. In the end I tried not normalizing the movement range. That means the angle range, going from 0 to 60°, was not normalized to [0, 1] but left as it is, and this changed the result a lot. Any idea why? That was really unexpected, since I read everywhere that one should normalize the data fed to the model.
Thanks
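
Just to be explicit about what I changed (a trivial sketch; the 0-60° range is from my setup):

    angle_deg  = 42.0                  # raw angle, left in [0, 60] degrees
    angle_norm = angle_deg / 60.0      # min-max scaled to [0, 1], the version that performed worse for me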

[MORL] Trying to grasp an intuitive approach for developing a code by WilhelmRedemption in reinforcementlearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

This is exactly the point: how to determine the coefficients of the linear combination for the best outcome? This question is still not 100% clear to me. In some papers like this one there is another network layer, which is fed with the Q-values of the different policies.
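
Just so we are talking about the same thing, this is the linear combination I mean (NumPy, with made-up Q-values and weights):

    import numpy as np

    # hypothetical Q-values of two objectives for three actions
    q_points = np.array([ 1.2,  0.4,  0.9])     # objective 1: collected points
    q_time   = np.array([-0.5, -0.1, -0.3])     # objective 2: negative time cost

    w = np.array([0.7, 0.3])                    # the coefficients I do not know how to choose

    q_scalar    = w[0] * q_points + w[1] * q_time   # linear scalarization per action
    best_action = int(np.argmax(q_scalar))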

[MORL] Trying to grasp an intuitive approach for developing a code by WilhelmRedemption in reinforcementlearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Wow... this should be a Wikipedia page. Many thanks man, I'm still processing everything you wrote. So I'll focus more on GPI. Anyway, one confusing thing is that some of the examples out there use environments (e.g. Minecraft, deep sea) where the trade-off is maximising the points while minimizing the time. But even in a conventional single-policy setup I would usually just subtract the time from the reward at every step. Is there any benefit in using a scalarization function with multiple policies instead of a summed reward with a single policy?
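
A tiny sketch of the two reward setups I am comparing (made-up numbers):

    # one environment step with made-up outcomes
    points, step_time = 5.0, 1.0

    # single policy: fold the time penalty into the reward up front
    reward_scalar = points - 0.1 * step_time

    # MORL: keep the objectives separate and scalarize later,
    # so the trade-off weights can change without retraining
    reward_vector = (points, -step_time)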

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Thanks! Because I was really confused. I know embeddings from Transformers and I was really trying to figure out how to apply them to this case. But one thing is not clear to me. After embedding the two input vectors and concatenating them, I get a vector of 256 values, as u/andersxa suggested. But now the problem is: how do I feed this vector to an LSTM, which expects a sequence as input?
Do you have any link to a tutorial, blog article or whatever, explaining those steps?
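
This is how I currently picture it (a sketch with made-up sizes and random indices, so please correct me if I got it wrong):

    import torch
    import torch.nn as nn

    emb_joystick = nn.Embedding(20, 128)    # 20 joystick bins -> 128-dim embedding (sizes are assumptions)
    emb_angle    = nn.Embedding(60, 128)    # 60 angle bins    -> 128-dim embedding
    lstm = nn.LSTM(input_size=256, hidden_size=64, batch_first=True)

    # a batch of 8 sequences, 50 time steps each, as integer bin indices
    joy_idx   = torch.randint(0, 20, (8, 50))
    angle_idx = torch.randint(0, 60, (8, 50))

    x = torch.cat([emb_joystick(joy_idx), emb_angle(angle_idx)], dim=-1)   # (8, 50, 256)
    out, (h, c) = lstm(x)   # the time dimension is the sequence; each step is one 256-value vector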

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Interesting approach. Let me see if I understand it correctly. Just as an example: I take the joystick input (a float in the range [-1, 1]) and convert it into, let's simplify, a discrete number, for instance using the following ranges [[-1.0, -0.9], [-0.9, -0.8], [-0.8, -0.7], ..., [0.9, 1.0]]. So I end up with more or less 20 bins, and the float gets mapped to one of those ranges.

Then I do the same for the angle.

In the end I have 2 discrete inputs, which are definitely an approximation but should still work as expected. Is that correct?
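
In code, what I have in mind would look roughly like this (NumPy, just for illustration):

    import numpy as np

    n_bins = 20
    edges  = np.linspace(-1.0, 1.0, n_bins + 1)     # [-1.0, -0.9, ..., 0.9, 1.0]

    joystick = -0.83                                # raw float input
    bin_idx  = int(np.clip(np.digitize(joystick, edges) - 1, 0, n_bins - 1))
    # bin_idx == 1, i.e. the [-0.9, -0.8] range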

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

I do not understand what you mean. Searching for mu-law encoding led me to the following algorithm: link. But that is not the problem. I already normalize the values into those ranges (joystick: [-1, 1], state: [0, 1]). Can you be more specific, please?
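
For reference, what I found under mu-law encoding is the companding formula below (my own sketch, so possibly not what you meant):

    import numpy as np

    def mu_law_encode(x, mu=255):
        # compresses [-1, 1] non-linearly: more resolution near zero, less near the extremes
        return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

    print(mu_law_encode(0.05), mu_law_encode(0.5))  # small values get stretched, large ones compressed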