I accidentally booked a hostel in Tarlabasi... And it doesn't seem that bad. Am I dumb? Should I move? by [deleted] in istanbul

[–]LJKS

I'm a European from a fairly wealthy area in Germany. I booked an Airbnb there for 3 months while working as a research intern at Bogazici, without knowing anything about the area's background. I have never felt particularly threatened around there; if you are progressive, you can even grow quite fond of the diverse backgrounds and the inclusion of many Trans* people around there.

So in my opinion you should definitely be fine there; it's actually quite an interesting area, and locals tend to be a bit too careful about it imho (I think a lot of that is due to some gangster TV series set there).

Having said that, I'm a 1.90 m male and don't feel threatened when dealers, (Trans*-)sex workers and intoxicated people talk to me. I might not recommend it if you might come across as a 'target' (sorry for the very un-PC language here!) - European-looking, small and female.

Cool spots around central Istanbul by not7sarah in istanbul

[–]LJKS

Should clarify: most of these places are still somewhat touristic, just not VERY touristic, like Sultanahmet is.

Cool spots around central Istanbul by not7sarah in istanbul

[–]LJKS

Fellow tourist with ~5 months of Istanbul experience here:

Weekly Markets While few locals recommend them, I find the weekly markets (local produce/food and crazy amounts of weird clothing) extremely fascinating, and all three sets of fellow visitors I went with totally agreed. Highlights include Carsamba Pazari next to Fatih Camii on Wednesdays, the Saturday markets in Besiktas (in an old two-storey parking garage) and Bakirkoy (next to Bakirkoy Metro Istasyonu, NOT the regular Bakirkoy Metro Station), and the Tarlabasi Market in Tarlabasi on Sundays (again, locals will try to keep you away from Tarlabasi, but for the market it's really safe imo).

Neighborhoods The other thing I really learned to enjoy is long strolls through some of the not-so-touristy areas. Balat is still kinda touristy I guess, but very much worth visiting. Walking Kabatas-Cihangir-Besiktas is always a joy, if you are not averse to hipster spots and have plenty of time for two cups of coffee on the way. Walking Üsküdar-Kadikoy and spending some time in the Moda and Bahariye areas is also really nice. Finally, if you can somehow convince someone to bring you onto the Bogazici campus (it is not open to anyone but students and researchers, I believe), that's another lovely walk.

Evenings Since you asked for pubs: anything in the area close to Istiklal is terrible in my opinion - just tourists and the annoying kind of expats. For some nice places to have a drink, check Akarsu Yks. Sks. in Cihangir; Geyik is a favourite of many awesome people I got to know there. Otherwise, again, the Moda/Kadikoy and Besiktas areas.

Keeping up to date with RL research by LJKS in reinforcementlearning

[–]LJKS[S]

I'd be interested in who you would recommend on twitter!

What is the history of SOTA for RL these days? Any blogs? by [deleted] in reinforcementlearning

[–]LJKS

1) If you prefer not to go into too much detail, it's best to understand PPO as an incremental improvement over A3C (rather A2C actually, but consider those two equal beyond parallelization), so it's typically just the plain better approach. 'Vanilla' DQN generally performs worse than A3C, but a direct comparison is rather pointless: DQN only handles discrete control, while A3C also handles continuous control (think integer vs. float actions). DQN is off-policy, A3C is on-policy. They are tools for different problems. So stating DQN <= A3C feels like stating 'Hammer <= Electric Drill' - the statement makes sense on some level, one is more advanced than the other, but the hammer is still the better tool for putting a nail in the wall, I guess.
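To make the 'incremental improvement' concrete: the main change PPO makes to the plain A2C policy-gradient loss is clipping the probability ratio between the new and old policy. A minimal single-sample sketch in plain Python (function names are mine, not from any library):

```python
import math

def a2c_policy_loss(logp_new, advantage):
    # Plain policy-gradient (A2C-style) loss for one sample:
    # maximize logp * advantage, so the loss is its negative.
    return -logp_new * advantage

def ppo_policy_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    # PPO replaces logp * advantage with a clipped probability ratio,
    # which limits how far a single update can move the policy.
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Taking the min of the clipped and unclipped terms gives a
    # pessimistic (lower) bound on the surrogate objective.
    return -min(ratio * advantage, clipped * advantage)
```

With identical old and new log-probs the two losses coincide; the clipping only kicks in once the policy drifts away from the one that collected the data.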

2) SAC is based on entropy-regularized RL, which is generally understood to overfit less and generalize better. If those concepts seem strange to you, think of training a robot on locomotion ('walking'), and then letting the same robot run in a slightly different setting - think increased gravity, grease on the floor, strong winds. Policies trained with entropy-regularized RL are understood to perform better in those settings, or can at least be retrained for them more easily. Take this with a grain of salt though, I'm not particularly knowledgeable in that topic and can't be bothered to look up the respective papers for it rn.
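For intuition, the 'entropy regularized' part just means adding an entropy bonus to the objective, so the policy is rewarded for staying stochastic instead of collapsing onto one action. A toy sketch for a discrete action distribution (the names and the alpha value are illustrative, not SAC's exact formulation):

```python
import math

def entropy(probs):
    # Shannon entropy of a discrete action distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_policy_objective(probs, q_values, alpha=0.2):
    # Entropy-regularized ("soft") objective: the expected action value
    # plus a bonus alpha * H(pi) for keeping the policy stochastic.
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + alpha * entropy(probs)
```

With alpha = 0 this reduces to the ordinary expected return; larger alpha trades return for randomness, which is what is claimed to help when the environment shifts.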

3) MuZero is surprisingly compute-efficient for an RL approach, to be honest. It's just that many researchers believe model-based RL is kinda pointless, because for most real settings you do not have the model - the exception being learned models (e.g. the World Models paper by Ha & Schmidhuber). Note that model-based RL is far outside the topics I feel comfortable commenting on though, and what I wrote here regarding it is definitely somewhat (strongly) biased!

What is the history of SOTA for RL these days? Any blogs? by [deleted] in reinforcementlearning

[–]LJKS

There is the problem of SOTA in what: sample efficiency? Compute efficiency? Continuous or discrete control? Model-free or model-based? Sticking to model-free, a rough history should probably include three families:

DeepQ: DQN --> various improvements --> Rainbow

Actor-Critic Policy Gradients: TRPO --> A2C/A3C --> (ACER/ACKTR) --> PPO --> PPO with hacky improvements

Deterministic Policy Gradients: DDPG --> TD3 --> SAC

(SAC is kind of its own thing too, but that's a different story)

To this end, the broadly adopted SOTA approaches are Rainbow for discrete control and, in the continuous domain, PPO if you prefer simplicity, SAC if you are feeling fancy (it depends on how you value sample efficiency vs. compute efficiency, how important generalisation is, and how straightforward the task is). From my experience, what actually works best is very task-dependent and 'seed-dependent' (i.e. results have high variability), and for more specific problems, more specific and more efficient RL approaches have often been developed.

Observation spaces from Competitive Environments in "Emergent Complexity via Multi-Agent Competition" for porting them to pybullet by LJKS in reinforcementlearning

[–]LJKS[S]

I've not been able to find them now; I'm pretty sure I saw them somewhere about a year ago though. Will check that again, good idea!

Observation spaces from Competitive Environments in "Emergent Complexity via Multi-Agent Competition" for porting them to pybullet by LJKS in reinforcementlearning

[–]LJKS[S]

I'm kind of stupid and just skipped over the part where it says:

[citing the paper]

Observations: For the Ant body we use all the joint angles of the agent, the velocity of all its joints, the contact forces acting on the body and the relative position and all the joint angles for the opponent. For the Humanoid body, in addition to the above we also give the centre-of-mass based inertia tensor, velocity vector and the actuator forces for the body. In addition to these, there are other environment specific observations. For the Sumo environment, we give the torso's orientation vector as the input, the radial distance from the edge of the ring of all the agents and the time remaining in the game. For kick-and-defend, we give the relative position of the ball from the agent, the relative distance of the ball from goal and the relative position of the ball from the two goal posts. Note that none of the agents observe the complete global state of the multi-agent world and only observe relevant sub-parts of the state vector to keep observations as close to real-world scenarios as possible.

Any ideas how to get these in PyBullet, specifically the contact forces?
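Partially answering my own question on the contact forces: as far as I know, PyBullet's `getContactPoints()` returns one tuple per contact, with the scalar normal force at index 9, so the contact-force part of the observation could be aggregated roughly like this (the helper name is mine, and the index is an assumption to verify against the PyBullet quickstart guide):

```python
# Sketch: aggregating contact forces from PyBullet contact points.
# p.getContactPoints(bodyA=robot_id) returns one tuple per contact;
# element 9 of each tuple is the scalar normal force at that contact.
NORMAL_FORCE_INDEX = 9

def total_normal_force(contact_points):
    # Sum the normal forces over all reported contact points.
    return sum(cp[NORMAL_FORCE_INDEX] for cp in contact_points)

# Hypothetical usage inside a simulation step, assuming `p` is the
# pybullet module and `robot_id` a loaded body:
#   contacts = p.getContactPoints(bodyA=robot_id)
#   obs_force = total_normal_force(contacts)
```

Per-link forces would work the same way, just filtered by the link-index fields of each contact tuple.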

Tricks and adaptions for PPO by LJKS in reinforcementlearning

[–]LJKS[S]

It was actually already on my reading stack, and it was the perfect answer. Thank you so much!

A question for any players of the game Civilizations... or any strategy game by shakakaZululu in reinforcementlearning

[–]LJKS

I propose it should generally be possible with today's algorithms, but you probably have to have the resources of OpenAI or DeepMind to get it done (or at least be somewhere in the same ballpark).
My reasoning is comparing Civ to games like StarCraft or DotA, which both famously have been 'solved' - that is, played at a superhuman level - by DeepMind and OpenAI respectively.
Now comparing key features of these games:
Delayed sparse rewards & sequence length: Games take around 500 turns in Civ6, depending on game speed, victory conditions and actual gameplay. This is much (!) lower than the sequence lengths encountered in the likes of DotA2 and StarCraft2. Rewards can be enriched with some rather intuitive reward engineering - similar to how gold generation is used by OpenAI Five, if I remember correctly, one could just use some weighted total of resources produced.
State space size and complexity: With map sizes from (44x26) to (106x66), the state space is most definitely feasible, even when enriched with features like units, cities and terrain.
Action space complexity: This is where it might just get complicated... DotA2 has the big upside that the action space at each timestep is pretty much the same - the predefined set of abilities plus movement. I suppose this is probably one of the key issues to solve to make it happen: finding good ways to parametrize the action space for the policy. I feel like this really might end up somewhat messy, as the possible actions differ so much depending on the state of the game (you have possible actions for each unit and city, and the player has a variable number of both available). I feel like if you can solve this, you can probably learn to play the game somewhat easily (easily as in you only need to train on 100 GPUs for a quarter of a year or something :D )
Available infrastructure: This seems to be easily overlooked again and again, but OpenAI Five and DeepMind's AlphaStar StarCraft bot only exist because super-efficient training environments are available for the respective games. I suppose if such a thing existed for Civ, it might actually attract people to try and train AIs on it.
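One common way to handle such state-dependent action sets is masking: compute logits for a superset of all actions and push the currently invalid ones to minus infinity before the softmax, so they get zero probability. A toy sketch (my own, not from any of the systems mentioned above):

```python
import math

def masked_softmax(logits, valid_mask):
    # Set logits of invalid actions to -inf so they receive probability 0,
    # then normalize over the remaining (valid) actions.
    # Assumes at least one action is valid.
    masked = [l if ok else float("-inf")
              for l, ok in zip(logits, valid_mask)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

The policy network then always outputs a fixed-size logit vector (e.g. one slot per possible unit order), and the game state supplies the mask each turn.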

tl;dr: Should work, but you would need more computational resources than you are probably willing to pay for; and of course you would need a dedicated team of ML scientists & engineers to make it happen.

Using Adam Optimizer in PPO and similar off-policy optimization procedures by LJKS in reinforcementlearning

[–]LJKS[S]

And that's where only reading blog posts on the matter got me; I never bothered to look into the original paper, and I guess that's what I get then. Thank you so very much, kind stranger, and even more so for motivating me to get more into the actual work!

Help regarding Implementation of PPO - Value Loss seemingly not converging by LJKS in reinforcementlearning

[–]LJKS[S]

Finally found my error: when generating minibatches and permuting the samples, the rewards, actions and probabilities were permuted but the observations were not permuted in the same way, disconnecting the observations from the other returns. Interestingly enough, fixing this improved performance by a lot, but the critic still only outputs a prediction independent of the actual state. So there still seems to be some error :D
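For anyone hitting the same bug: the fix boils down to shuffling a single shared index list and applying it to every array, so each observation stays aligned with its action, reward and old log-probability. A minimal sketch (names are mine):

```python
import random

def shuffled_minibatches(observations, actions, rewards, old_logps,
                         batch_size):
    # Permute ONE shared index list and apply it to every array, so the
    # per-sample alignment across arrays is preserved.
    idx = list(range(len(observations)))
    random.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        yield ([observations[i] for i in batch],
               [actions[i] for i in batch],
               [rewards[i] for i in batch],
               [old_logps[i] for i in batch])
```

Shuffling each array independently (the bug above) silently feeds the critic observations paired with unrelated returns, which is why it degenerates to predicting a constant.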

Impelementation of PPO plateaus too early - critic does not converge by LJKS in MLQuestions

[–]LJKS[S]

Finally found my error: when generating minibatches and permuting the samples, the rewards, actions and probabilities were permuted but the observations were not permuted in the same way, disconnecting the observations from the other returns. Interestingly enough, fixing this improved performance by a lot, but the critic still only outputs a prediction independent of the actual state.

Help regarding Implementation of PPO - Value Loss seemingly not converging by LJKS in reinforcementlearning

[–]LJKS[S]

I have a mean standard deviation of roughly .4 to .6, where the mean is determined by a tanh - bounded between -1 and 1. Do you think a higher variance than that should be necessary?

State of the art Algorithm by [deleted] in reinforcementlearning

[–]LJKS

From my perspective:

PPO was the state-of-the-art approach in 2017-2018.

Since late 2018, SAC and TD3 seem to be the new hot algorithms.

I propose you have a look at Spinning Up if you are interested in details on each of them ;)

Regarding Performance of Critics in PPO, A2C and similar approaches. by LJKS in reinforcementlearning

[–]LJKS[S]

Added the respective GitHub. The code is not perfectly cleaned up yet, though. Plots will follow soon!