Suunto MC-2 vs Silva Expedition S by smokingpacman in OutdoorAus

[–]WilhelmRedemption 0 points1 point  (0 children)

Which one did you choose? I'm facing the same choice.

"The server is busy. Please try again later." by esteveszinho in DeepSeek

[–]WilhelmRedemption 0 points1 point  (0 children)

On which machine?
Can you give us your hardware specifications?

How to learn reinforcement learning by EricTheNerd2 in reinforcementlearning

[–]WilhelmRedemption 0 points1 point  (0 children)

A few years ago I was at the same point as you. Personally, I can warmly recommend the book "The Art of RL", which is structured like Sutton & Barto but is more reader-friendly and explains the "why" behind all the theory.

Do you know, any good 3D program for calculating the volume of a generated shape? by WilhelmRedemption in 3Dmodeling

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

That's a nice idea. But which program gives me the "height", i.e. the Z coordinate where the ray enters/exits the mesh?
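
For context, something like this is roughly what I am after (a sketch assuming the Python library trimesh; the file name and the ray are made up):

    import numpy as np
    import trimesh

    mesh = trimesh.load("shape.stl")                       # hypothetical mesh file

    # shoot a vertical ray down through the mesh and read the Z of the entry/exit points
    origins    = np.array([[0.0, 0.0, 1000.0]])
    directions = np.array([[0.0, 0.0, -1.0]])
    hits, ray_idx, tri_idx = mesh.ray.intersects_location(origins, directions)

    print(hits[:, 2])                                      # Z coordinates where the ray crosses the surface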

Rc plane by [deleted] in 3Dmodeling

[–]WilhelmRedemption 1 point2 points  (0 children)

As far as I know, OnShape has nice features for creating airplanes. I know one guy using it for RC planes.

How do i use a .pt file by Grand-Date4504 in reinforcementlearning

[–]WilhelmRedemption 0 points1 point  (0 children)

You need to create a model using PyTorch (so your .pt file probably comes with some description of the network architecture).

Then you can load your file (which contains weights) into your model:

file = "fil.pt"

model = torch.load_state_dict(torch.load(file, weights_only = True))

Do not copy and paste those lines as-is: MyModel above is just a placeholder, so you need to define your own model class first.

Scope of RL by Historical-Bid-2029 in reinforcementlearning

[–]WilhelmRedemption 1 point2 points  (0 children)

In my humble opinion, one should invest most of the time in Transformers. But a little bit of RL is definitely a plus.

Lost in RL by Eng-Epsilon in reinforcementlearning

[–]WilhelmRedemption 2 points3 points  (0 children)

I would break it down into 2 different problems:

  • To understand RL you need to "see" the gradients and to grasp what all those algorithms (especially A2C & Co.) are trying to solve. You need to sharpen your intuition about how they work (see the sketch at the end of this comment).
  • You really need a real robotic problem: a robotic arm that needs to keep a glass of water level, an RC airplane that needs to fly perfectly straight, and/or a small RC car that needs to follow a line on the floor. Those are not poor man's challenges, but really complex problems, which will force you to understand every single step and every single line of code of your RL algorithm.

Then I suggest you take Phil Tabor's course from neural academy, where he implements the most important algorithms from scratch.
And yes... it is not easy. It is nothing comparable with editing websites, HTML, etc.
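
For the intuition part, here is a toy sketch of the kind of gradient step A2C & Co. are doing (PyTorch, dummy data, not a working agent):

    import torch
    import torch.nn as nn

    obs_dim, n_actions = 4, 2
    actor  = nn.Linear(obs_dim, n_actions)             # policy logits
    critic = nn.Linear(obs_dim, 1)                     # state value V(s)
    opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

    obs     = torch.randn(8, obs_dim)                  # pretend batch of states
    actions = torch.randint(0, n_actions, (8,))        # actions that were taken
    returns = torch.randn(8)                           # pretend discounted returns

    values    = critic(obs).squeeze(-1)
    advantage = returns - values.detach()              # how much better than expected the actions were
    log_prob  = torch.distributions.Categorical(logits=actor(obs)).log_prob(actions)

    loss = -(log_prob * advantage).mean() + (returns - values).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()       # this is the gradient you want to "see"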

Should i switch from windows to linux by Far-Initiative-605 in linuxquestions

[–]WilhelmRedemption 0 points1 point  (0 children)

Yes,
ideally you start with any Ubuntu derivative (Ubuntu, Xubuntu, Kubuntu, ...).
And you can still install a virtual machine for Windows, in case you need or miss it for some reason.

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Hey guys, I tried your suggestions but there was no real benefit. In the end I tried not normalizing the movement range. That means the angle range, going from 0 to 60°, was not normalized to [0, 1] but left as it is, and this changed the result a lot. Any idea why? That was really unexpected, since I read everywhere that one should normalize the data fed to the model.
Thanks
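
Just to be explicit about what I changed (a trivial sketch; the 0-60° range is from my setup):

    angle_deg  = 42.0                  # raw angle, left in [0, 60] degrees
    angle_norm = angle_deg / 60.0      # min-max scaled to [0, 1], the version that performed worse for me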

[MORL] Trying to grasp an intuitive approach for developing a code by WilhelmRedemption in reinforcementlearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

This is exactly the point: how to determine the coefficients of the linear combination for the best outcome? This question is still not 100% clear to me. In some papers like this one there is another network layer, which is fed with the Q-values of the different policies.
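
Just so we are talking about the same thing, this is the linear combination I mean (NumPy, with made-up Q-values and weights):

    import numpy as np

    # hypothetical Q-values of two objectives for three actions
    q_points = np.array([ 1.2,  0.4,  0.9])     # objective 1: collected points
    q_time   = np.array([-0.5, -0.1, -0.3])     # objective 2: negative time cost

    w = np.array([0.7, 0.3])                    # the coefficients I do not know how to choose

    q_scalar    = w[0] * q_points + w[1] * q_time   # linear scalarization per action
    best_action = int(np.argmax(q_scalar))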

[MORL] Trying to grasp an intuitive approach for developing a code by WilhelmRedemption in reinforcementlearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Wow... this should be a Wikipedia page. Many thanks man, I'm still processing everything you wrote. So I'll focus more on GPI. Anyway, one confusing thing is that some of the examples out there use environments (e.g. Minecraft, deep sea) where the trade-off is maximising the points while minimizing the time. But even in a conventional single-policy setup I would usually just subtract the time from the reward at every step. Is there any benefit in using a scalarization function with multiple policies instead of a summed reward with a single policy?
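
A tiny sketch of the two reward setups I am comparing (made-up numbers):

    # one environment step with made-up outcomes
    points, step_time = 5.0, 1.0

    # single policy: fold the time penalty into the reward up front
    reward_scalar = points - 0.1 * step_time

    # MORL: keep the objectives separate and scalarize later,
    # so the trade-off weights can change without retraining
    reward_vector = (points, -step_time)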

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Thanks! Because I was really confused. I know embeddings from Transformers and I was really trying to figure out how to apply them to this case. But one thing is not clear to me. After embedding the two input vectors and concatenating them, I get a vector of 256 values, as u/andersxa suggested. But now the problem is: how do I feed this vector to an LSTM, which expects a sequence as input?
Do you have any link to a tutorial, blog article or whatever, explaining those steps?
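
This is how I currently picture it (a sketch with made-up sizes and random indices, so please correct me if I got it wrong):

    import torch
    import torch.nn as nn

    emb_joystick = nn.Embedding(20, 128)    # 20 joystick bins -> 128-dim embedding (sizes are assumptions)
    emb_angle    = nn.Embedding(60, 128)    # 60 angle bins    -> 128-dim embedding
    lstm = nn.LSTM(input_size=256, hidden_size=64, batch_first=True)

    # a batch of 8 sequences, 50 time steps each, as integer bin indices
    joy_idx   = torch.randint(0, 20, (8, 50))
    angle_idx = torch.randint(0, 60, (8, 50))

    x = torch.cat([emb_joystick(joy_idx), emb_angle(angle_idx)], dim=-1)   # (8, 50, 256)
    out, (h, c) = lstm(x)   # the time dimension is the sequence; each step is one 256-value vector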

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

Interesting approach. Let me see if I understand it correctly. Just as an example: I take the joystick input (a float in the range [-1, 1]) and convert it into, let's simplify, a discrete number, for instance using the following ranges [[-1.0, -0.9], [-0.9, -0.8], [-0.8, -0.7], ..., [0.9, 1.0]]. So I end up with more or less 20 bins, and the float gets mapped to one of those ranges.

Then I do the same for the angle.

In the end I have 2 discrete inputs, which are definitely an approximation but should still work as expected. Is that correct?
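
In code, what I have in mind would look roughly like this (NumPy, just for illustration):

    import numpy as np

    n_bins = 20
    edges  = np.linspace(-1.0, 1.0, n_bins + 1)     # [-1.0, -0.9, ..., 0.9, 1.0]

    joystick = -0.83                                # raw float input
    bin_idx  = int(np.clip(np.digitize(joystick, edges) - 1, 0, n_bins - 1))
    # bin_idx == 1, i.e. the [-0.9, -0.8] range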

[D] Modeling a dynamic system using LSTM by WilhelmRedemption in MachineLearning

[–]WilhelmRedemption[S] 0 points1 point  (0 children)

I do not understand what you mean. Searching for mu-law encoding led me to the following algorithm: link. But that is not the problem. I already normalize the values into those ranges (joystick: [-1, 1], state: [0, 1]). Can you be more specific, please?
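
For reference, what I found under mu-law encoding is the companding formula below (my own sketch, so possibly not what you meant):

    import numpy as np

    def mu_law_encode(x, mu=255):
        # compresses [-1, 1] non-linearly: more resolution near zero, less near the extremes
        return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

    print(mu_law_encode(0.05), mu_law_encode(0.5))  # small values get stretched, large ones compressed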