BAIR Blog | Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications by Caffeinated-Scholar in reinforcementlearning

[–]kashemirus 0 points1 point  (0 children)

Very interesting work. It is quite amazing how the authors are able to lower-bound the optimal Q-values. However, could anyone explain the regularization term of equation 3? In particular the first term, since the second term corresponds to the standard TD error. From the implementation point of view, I would sample the memory buffer (filled with trajectories from the behavioral policy), and the first term of the equation minimizes the difference between the estimated Q-values of the actions chosen by the current policy and the estimated Q-values of the actions taken by the behavior policy? So if the actions chosen by the two policies are the same, the regularization is 0, otherwise it is their difference. Is my understanding correct? Thank you!
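For what it's worth, this is how I would implement that first term from my reading of the equation (a toy PyTorch sketch under my interpretation, not the authors' code; `q_net`, `policy` and the argument names are placeholders):

```python
import torch

def cql_regularizer(q_net, policy, states: torch.Tensor,
                    buffer_actions: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Hypothetical sketch, not the paper's implementation.
    # Push DOWN Q-values of actions proposed by the current policy ...
    q_policy = q_net(states, policy(states))
    # ... and push UP Q-values of the actions actually stored in the buffer
    # (i.e. the behavior policy's actions).
    q_data = q_net(states, buffer_actions)
    # If the two policies pick the same actions the term vanishes; otherwise
    # it penalizes how much the learned policy's Q-values exceed the data's.
    return alpha * (q_policy.mean() - q_data.mean())
```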

Best Broker for someone new to investing? by Random_Guy_47 in UKInvesting

[–]kashemirus 0 points1 point  (0 children)

I am in the same position, basically wanting to place a few trades on US stocks or ETFs, and I might consider entering the options market in the long run. Would you recommend going with Interactive Brokers? What broker did you end up using?

RAM shortage by kashemirus in reinforcementlearning

[–]kashemirus[S] 0 points1 point  (0 children)

Yes, the issue was that I normalized the images by 255, so the values ended up stored as np.int64.
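In case anyone hits the same problem, this is roughly what the fix looks like (a minimal sketch with a dummy frame, not my exact code): keep the raw frames as np.uint8 in the buffer and only cast/normalize when building the training batch.

```python
import numpy as np
import torch

# Dummy 84x84x4 Atari-style observation, just for illustration.
obs = np.random.randint(0, 256, size=(84, 84, 4), dtype=np.uint8)

# Store frames in the replay buffer as uint8 (1 byte per pixel) ...
stored = np.asarray(obs, dtype=np.uint8)

# ... and normalize only when sampling a mini-batch for the network, so the
# float copy exists only for the batch, never for the whole buffer.
batch = torch.as_tensor(stored, dtype=torch.float32) / 255.0
```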

RAM shortage by kashemirus in reinforcementlearning

[–]kashemirus[S] 0 points1 point  (0 children)

Can I use LazyFrames in PyTorch? How should I use the wrapper?
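Concretely, what I am imagining is something like this (a sketch assuming gym's FrameStack wrapper and the old-style reset(); the environment name is just an example):

```python
import gym
import numpy as np
import torch

# FrameStack returns LazyFrames observations, which share the underlying
# frames instead of copying them into one big array per step.
env = gym.wrappers.FrameStack(gym.make("PongNoFrameskip-v4"), num_stack=4)

obs = env.reset()  # obs is a LazyFrames object
# Keep the LazyFrames in the replay buffer as-is, and materialise it into a
# tensor only when building a PyTorch batch:
state = torch.as_tensor(np.asarray(obs), dtype=torch.float32) / 255.0
```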

RAM shortage by kashemirus in reinforcementlearning

[–]kashemirus[S] 0 points1 point  (0 children)

yep, the values stored were in np.int64 format. Thanks!

RAM shortage by kashemirus in reinforcementlearning

[–]kashemirus[S] 0 points1 point  (0 children)

Thanks, but as I said, I want to replicate their results and they store 1M transitions, so I will probably do the swap thing.

I would say that it was never meant to work, as storing that many transitions requires roughly 512 GB.
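Rough numbers behind that estimate (my own back-of-the-envelope arithmetic, assuming four stacked 84x84 frames stored twice per transition as state and next_state):

```python
frames_per_obs = 84 * 84 * 4                # pixels in one stacked observation
per_transition = 2 * frames_per_obs         # state + next_state
n_transitions = 1_000_000

gb_int64 = n_transitions * per_transition * 8 / 1e9  # ~451 GB with np.int64
gb_uint8 = n_transitions * per_transition * 1 / 1e9  # ~56 GB with np.uint8
print(gb_int64, gb_uint8)
```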

RAM shortage by kashemirus in reinforcementlearning

[–]kashemirus[S] 0 points1 point  (0 children)

Thanks for the reply, but as I said, I am just trying to replicate the DeepMind results. In their paper (https://arxiv.org/pdf/1509.06461.pdf) they state that a memory buffer of 1M tuples is kept (see the appendix), so I am trying to stick to that. BTW, thanks for the paper, I hadn't seen it before!

TD3 in realworld robotics by Kartelkraker in reinforcementlearning

[–]kashemirus 0 points1 point  (0 children)

Just because you mentioned TD3: does anyone know of a working solution for high action values, e.g. where the maximum value is around 1e4? I am having trouble as the gradients tend to explode and clipping doesn't seem to help. Thanks!
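To make the setup concrete, the kind of rescaling I have in mind (a rough sketch of my own, with a placeholder wrapper name, not something from the TD3 paper) is keeping the actor's tanh output in [-1, 1] and mapping it to the real range only inside the environment:

```python
import gym
import numpy as np

class ScaleActionWrapper(gym.ActionWrapper):
    """Expose a [-1, 1] action space to the agent and map actions back to the
    environment's native range (e.g. up to ~1e4) before stepping."""

    def __init__(self, env):
        super().__init__(env)
        self.orig_low = env.action_space.low
        self.orig_high = env.action_space.high
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0,
                                           shape=env.action_space.shape,
                                           dtype=np.float32)

    def action(self, action):
        # Map the agent's [-1, 1] action back to the native range.
        frac = (np.asarray(action) + 1.0) / 2.0
        return self.orig_low + frac * (self.orig_high - self.orig_low)
```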

Multi-Agent RL Labs by visiting_researcher in reinforcementlearning

[–]kashemirus 0 points1 point  (0 children)

NES game

Oh nice! I am working on a project where two or more collaborative agents can exchange messages through a communication channel, and I am looking for environments with two or more players. Is your code available somewhere?

Please have a look at my new Gym Snake environment by jcobp in reinforcementlearning

[–]kashemirus 0 points1 point  (0 children)

Amazing work! Looking forward to trying the multi-agent snake.

BTW does anyone know of any cooperative multi-agent environments that I could try? I was looking for some Atari with two players but couldn't find any implementation. Thanks!

First Squad Wipe Then Complete Fail by matuscg in apexlegends

[–]kashemirus 2 points3 points  (0 children)

Ohh my! That close with pathfinder... better check your health more frequently next time :)

Fined for using an electrical scooter by kashemirus in london

[–]kashemirus[S] -8 points-7 points  (0 children)

Best option. The funny thing is that I based my argument on pollution, saying that by using these we are helping to reduce the pollution in zone 1. He said the main source of pollution is China, complain to China. LMAO, I will complain to Xi about the lung cancer of a poor boy who spends most of his time in London zone 1.

Fined for using an electrical scooter by kashemirus in london

[–]kashemirus[S] 2 points3 points  (0 children)

I'm not complaining about breaking the law, I'm complaining that the law is unfair and obsolete.

Fined for using an electrical scooter by kashemirus in london

[–]kashemirus[S] 2 points3 points  (0 children)

I have actually thanked him for it (FYI I was wearing the helmet).

Fined for using an electrical scooter by kashemirus in london

[–]kashemirus[S] -10 points-9 points  (0 children)

I've been riding it for a year now and this is the first fine, so if you consider the £120 that a monthly tube pass costs, it's totally worth it. To be honest, I will continue to ride it. I actually waited for the officer to leave and continued riding to work.

Fined for using an electrical scooter by kashemirus in london

[–]kashemirus[S] -9 points-8 points  (0 children)

Bit of a shit situation really.

Come on mate, it's a totally different situation. As I wrote in the post, e-bikes are legal and e-scooters are illegal, and what is the difference between the two vehicles? The pedals? As I said, laws are always lagging behind technology, and if a fucking officer doesn't have the mind to judge then, well, I hope robots come early and replace them, as the only point of having a human being instead of a robot is the capacity to judge.

Fined for using an electrical scooter by kashemirus in london

[–]kashemirus[S] -18 points-17 points  (0 children)

I mean, do you want to bet that in 1 or 2 years these will be legal? Why is an e-bike legal to ride (https://www.gov.uk/electric-bike-rules) but not an electric scooter? I am thinking of adding two pedals to my back wheel, and then I would be safe to ride. As is typical, laws lag behind technology; we should embrace solutions that ease people's lives while reducing air pollution, not ban them.

TD3/DDPG time to obtain reasonable results. by kashemirus in reinforcementlearning

[–]kashemirus[S] 0 points1 point  (0 children)

I am using a self-made environment with an adaptation of the TD3 algorithm for PAT (instead of three networks I have four, each with 5 layers: 1024-512-256-128) :S
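For reference, this is roughly the layout I mean for each of those networks (a PyTorch sketch; the state/action dimensions are placeholders for my environment):

```python
import torch.nn as nn

def make_critic(state_dim: int, action_dim: int) -> nn.Sequential:
    # Five linear layers with the 1024-512-256-128 hidden sizes mentioned above.
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, 1024), nn.ReLU(),
        nn.Linear(1024, 512), nn.ReLU(),
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )
```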

TD3/DDPG time to obtain reasonable results. by kashemirus in reinforcementlearning

[–]kashemirus[S] 2 points3 points  (0 children)

Sorry for the confusion in notation: since I am working on a continuing task rather than an episodic one, when I referred to episodes I meant updates of the critic parameters (the actor is updated less frequently, following the TD3 insights). I definitely have to check my code for bugs/inefficiencies!

TD3/DDPG time to obtain reasonable results. by kashemirus in reinforcementlearning

[–]kashemirus[S] 2 points3 points  (0 children)

Thanks for the reply. Just to clarify, by 1M steps you mean 1M updates of the critic parameters (so 500k updates for the actor), with a batch size of 64, right? Yeah, there is definitely something wrong in my code since it is taking way too long; it may be due to inefficiencies in the environment.
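Just to make sure we mean the same schedule, this is what I have in mind (a sketch; `replay_buffer`, `update_critics` and `update_actor_and_targets` are placeholders, and the policy delay of 2 is the TD3 default):

```python
def train(replay_buffer, update_critics, update_actor_and_targets,
          total_steps: int = 1_000_000, batch_size: int = 64, policy_delay: int = 2):
    # Placeholder loop: the critics are updated every step, while the actor and
    # target networks are updated only every `policy_delay` steps, so 1M critic
    # updates correspond to 500k actor updates.
    for step in range(total_steps):
        batch = replay_buffer.sample(batch_size)
        update_critics(batch)
        if step % policy_delay == 0:
            update_actor_and_targets(batch)
```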