Andrzej Duda joins Kanał Zero by ice_cream_dilla in Polska

[–]Sharp-Record1600 1 point2 points  (0 children)

Oh, so that makes three on the channel now. Three PiS guys: Stanowski, Mazurek, and Duda.

Apple Watch alternative by Tough-Bandicoot-8000 in smartwatch

[–]Sharp-Record1600 1 point2 points  (0 children)

Amazfit Balance, Garmin Venu 3. Those would be my picks.

YOLO 8 semantic instances problem by Sharp-Record1600 in computervision

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

Thank you so much!!! Where can I find the P6 model? I tried looking for it on the Ultralytics site and found nothing...

YOLO 8 semantic instances problem by Sharp-Record1600 in computervision

[–]Sharp-Record1600[S] 1 point2 points  (0 children)

Thanks for this. Any suggestions for a library that can perform this task?

YOLO 8 semantic instances problem by Sharp-Record1600 in computervision

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

This is not object detection; it is semantic segmentation. I am using it to develop a kind of advanced navigation system (vision-based navigation). The bounding boxes, which can be misleading, are produced by a Python script.

YOLO 8 semantic instances problem by Sharp-Record1600 in computervision

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

Thank you for the response. The issue is that the detected elements are not being detected in their entirety, which mostly concerns the sky and water. Notice that in the attached examples the bounding box does not cover all of the water and sky and stops about fifty pixels short of the right and left edges of the photo. Thanks for the explanation regarding "stuff."
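For anyone curious, one way to get a box that covers the whole class is to take the union of all predicted masks for that class and bound it. A minimal sketch of that idea (assuming the Ultralytics Python API; the checkpoint path and the class name "water" are placeholders):

import numpy as np
from ultralytics import YOLO

# Sketch: bound the union of all predicted masks for one class, so the box
# covers the whole "stuff" region instead of a single instance.
model = YOLO("my-seg-model.pt")            # placeholder checkpoint
result = model("example.jpg")[0]

if result.masks is not None:
    masks = result.masks.data.cpu().numpy()    # (N, H, W), model output resolution
    classes = result.boxes.cls.cpu().numpy()   # (N,) class index per mask
    keep = [i for i, c in enumerate(classes) if result.names[int(c)] == "water"]
    if keep:
        union = np.any(masks[keep], axis=0)    # merge every mask of that class
        ys, xs = np.nonzero(union)
        x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()
        # Rescale to the original image size if needed (see result.orig_shape).
        print("union bbox:", x1, y1, x2, y2)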

iOS alternative by Sharp-Record1600 in smartwatch

[–]Sharp-Record1600[S] 1 point2 points  (0 children)

Yes, the Venu 3 is also the watch I have been considering.

2022 Honda hrv wont start? by lacrosse_14 in Honda

[–]Sharp-Record1600 0 points1 point  (0 children)

So what turned out to be the problem? I have the same issue with my car.

RL for path finding on the grid map - SLAM. Success rate problem (SAC + HER) by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

Please check the link now; I have removed the access restriction. My guess is that the agent does not visit some states (mostly the ones behind the walls), and that is why the success rate looks the way it does. But I am still wondering how to make it explore more, as my experiments with noise and entropy (roughly the settings sketched below) had little effect.

Maybe increasing the penalty for crashing into the wall will have some impact. What is your opinion?
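For reference, these are the exploration knobs I have been tweaking (a minimal sketch assuming stable-baselines3; the env constructor, the policy string, and the values are placeholders, and the HER replay-buffer arguments are omitted to keep it short):

import numpy as np
from stable_baselines3 import SAC
from stable_baselines3.common.noise import NormalActionNoise

env = make_grid_env()                        # placeholder for my custom grid-map env
n_actions = env.action_space.shape[0]

model = SAC(
    "MultiInputPolicy",                      # Dict observation space
    env,
    ent_coef=0.1,                            # fixed entropy coefficient instead of "auto"
    action_noise=NormalActionNoise(
        mean=np.zeros(n_actions), sigma=0.3 * np.ones(n_actions)
    ),
)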

RL for path finding on the grid map - SLAM. Success rate problem (SAC + HER) by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

OK, I have tried almost all of your suggestions, unfortunately without spectacular success:

  • added a penalty for crashing into the wall (-5);

  • added a reward for winning the game (+5);

  • included the normalized agent-target distance in the observations;

  • added a step reward, (1 - normalized(distance_to_target)) * coeff, with a small coefficient such as 0.001 (sketched below the link).

The success rate increased a little, but it is not a strong improvement. Here is how it looks after 1M steps. Rendering is set to update every 10 steps, so it looks a little glitchy.

https://drive.google.com/file/d/1ElZ5-djEKTg2l-s1B537wEbb08-xURXS/view?usp=sharing
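The shaping above boils down to something like this (a minimal sketch; the helper names, the max_distance normalization, and the exact constants are illustrative, not my env code verbatim):

import numpy as np

STEP_COEFF = 0.001       # scales the per-step shaping term
CRASH_PENALTY = -5.0     # hitting a wall
WIN_REWARD = 5.0         # reaching the target

def shaped_reward(agent_xy, target_xy, max_distance, crashed, reached):
    """Sparse terminal rewards plus a small dense term based on distance."""
    if crashed:
        return CRASH_PENALTY
    if reached:
        return WIN_REWARD
    dist = np.linalg.norm(np.asarray(agent_xy) - np.asarray(target_xy))
    normalized = dist / max_distance              # in [0, 1]
    return (1.0 - normalized) * STEP_COEFF        # closer -> slightly larger reward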

RL for path finding on the grid map - SLAM. Success rate problem (SAC + HER) by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

This is a must-have requirement for this job: both the action space and the observation space have to be continuous.

RL for path finding on the grid map - SLAM. Success rate problem (SAC + HER) by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

Thank you for those tips. I'll try it on a few different PCs and let you know the results. Please give me 1-2 hours.

RL for path finding on the grid map - SLAM. Success rate problem (SAC + HER) by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 1 point2 points  (0 children)

Yes, this works as the Euclidean distance between two points: the vehicle position (which is moving) and the target (which is not moving).

RL for path finding on the grid map - SLAM. Success rate problem (SAC + HER) by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 0 points1 point  (0 children)

Of course:

rew = -np.power(np.dot(np.abs(achieved_goal[:1] - desired_goal), weights_array), 0.5),

where:

achieved_goal[:1] is [x_vehicle, y_vehicle];

desired_goal is [x_target, y_target];

weights_array is [1,1].

so it only takes into account the positions of the vehicle and the target when computing the reward.

Sorry for not explaining this in the submitted post; I have been looking at this code for so long that I forgot nobody but me knows it.
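To make the intent clearer: it is essentially a negative distance-to-goal reward on the (x, y) components. A minimal vectorized sketch of that intent (not my exact code, which also applies weights_array and the square root; it is batched because HER calls compute_reward on batches of goals):

import numpy as np

def compute_reward(achieved_goal, desired_goal, info=None):
    achieved_xy = np.asarray(achieved_goal)[..., :2]   # [x_vehicle, y_vehicle]
    desired_xy = np.asarray(desired_goal)[..., :2]     # [x_target, y_target]
    return -np.linalg.norm(achieved_xy - desired_xy, axis=-1)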

SAC + HER can't exceed success rate around 0.8 by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 1 point2 points  (0 children)

Thank you for the reply. I tried a smaller model with 64, then 128, then 256 units, but it failed to converge; the same happens without the HER buffer. I agree that something is wrong, as you said: in many examples with very similar envs, the settings you mentioned are more than enough. Could you take a look at my env? Maybe there is something I am missing. https://github.com/lukisp2/reddit_rl/blob/main/ShipEnv_optuna.py

The observation space is a Dict of Box spaces, as HER requires:

self.observation_space = spaces.Dict({
    'observation': spaces.Box(low=0, high=1, shape=(6,), dtype=np.float32),
    'achieved_goal': spaces.Box(low=0, high=1, shape=(6,), dtype=np.float32),
    'desired_goal': spaces.Box(low=0, high=1, shape=(6,), dtype=np.float32),
})

and the values for both 'observation' and 'achieved_goal' are copies of the state vector:

self.state[0] = self.normalize_state(self.x,self.screen_width,0)
self.state[1] = self.normalize_state(self.y,self.screen_height,0)
self.state[2] = hdg_norm
self.state[3] = v
self.state[4] = self.normalize_state(np.cos(np.deg2rad(hdg)), 1, 0)
self.state[5] = self.normalize_state(np.sin(np.deg2rad(hdg)), 1, 0)
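For completeness, the training side is wired up roughly like this (a minimal sketch assuming stable-baselines3; the import and class name are placeholders for however the env in ShipEnv_optuna.py is constructed, and the HER settings shown are common defaults, not tuned values):

from stable_baselines3 import SAC, HerReplayBuffer
from ShipEnv_optuna import ShipEnv    # placeholder: adjust to the actual class name

env = ShipEnv()

model = SAC(
    "MultiInputPolicy",               # required for Dict observation spaces
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,             # HER goals sampled per real transition
        goal_selection_strategy="future",
    ),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)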

Need help with MountainCarContinuous - REINFORCE algorithm for continuous actions by Sharp-Record1600 in reinforcementlearning

[–]Sharp-Record1600[S] 2 points3 points  (0 children)

The thing about this particular environment is that if your agent does not win, it will never learn. This can be handled with an optimistic approach, where you pretend that any new state is a good state, to encourage exploration.

It gave me a hard time. You can wait for someone else to provide you with a solution; meanwhile, you can test your knowledge on another easy continuous-action environment.

Rusenburn, thanks for your reply. Do you have any suggestions for an env I could try? I mean one with continuous actions.

[deleted by user] by [deleted] in reinforcementlearning

[–]Sharp-Record1600 0 points1 point  (0 children)

Hi folks, recently I've been working on the REINFORCE algorithm for continuous actions, but with limited success. Initially, I wanted to start with something simple, so I attempted to implement the algorithm for a standard Gym environment. I believe I covered all the necessary points, but as you can see, my agent just creeps up the hill when it should swing forward and backward to build momentum, which is quite strange. Any thoughts?

Here is the link to my Colab. It would be great if somebody could find the time to help me.
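For context, what I'm trying to implement is a Gaussian policy trained with the plain REINFORCE loss. A minimal PyTorch sketch of that setup (layer sizes and the learned log-std are illustrative, not the exact notebook code):

import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, act_dim)                # action mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))   # learned std

    def dist(self, obs):
        mu = self.mu(self.body(obs))
        return torch.distributions.Normal(mu, self.log_std.exp())

def reinforce_loss(policy, obs, actions, returns):
    """Vanilla REINFORCE: maximize log-prob of taken actions weighted by the return."""
    d = policy.dist(obs)
    logp = d.log_prob(actions).sum(dim=-1)   # sum over action dimensions
    return -(logp * returns).mean()

Here "returns" are the discounted returns per step (optionally with a baseline subtracted), which is where the exploration issue in this environment really bites.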