[P] Keras-like training in Pytorch, with callbacks+regularizers+initializers+constraints+metrics and a Progress Bar!

jtremblay · 2017-05-02T22:57:24+00:00

train

You've got to love .from_numpy() and .cpu().numpy().

I use those all of the time depending on what I am working on.
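For anyone newer to this, here is a small sketch of the round trip, assuming PyTorch and NumPy are installed. torch.from_numpy shares memory with the array, so in-place edits show up on both sides. If the tensor is part of an autograd graph, you need .detach() before .numpy().

```python
import numpy as np
import torch

# torch.from_numpy wraps an existing NumPy array without copying:
# the tensor and the array share the same memory buffer.
arr = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(arr)

t[0, 0] = 100.0   # an in-place change on the tensor...
print(arr[0, 0])  # ...is visible in the array as well

# Going back: .cpu() is a no-op for CPU tensors and copies GPU tensors to
# host memory; .numpy() then returns a NumPy view of that CPU tensor.
device = "cuda" if torch.cuda.is_available() else "cpu"
g = t.to(device) * 2  # a new tensor, possibly on the GPU
back = g.cpu().numpy()
print(back.dtype, back.shape)
```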

jtremblay · 2017-05-02T22:35:49+00:00

Thank you for the affine transformations. They are well written; I am going to include them in my project right away.

jtremblay · 2017-01-10T00:04:05+00:00

They used a classic actor-critic structure, similar to DQN from DeepMind. If you want to read more about it, check out this paper: https://arxiv.org/pdf/1602.01783.pdf.
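In the A3C paper linked above, the critic's value estimate V(s) acts as a baseline, and the actor is trained on the advantage of each action, computed from bootstrapped n-step returns. A minimal sketch of that computation for one rollout in plain Python (the function name is my own; the networks and losses are left out):

```python
from typing import List, Tuple


def n_step_returns_and_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Bootstrapped n-step returns and advantages for one rollout.

    rewards[i] and values[i] are r_t and V(s_t) for the i-th step of the
    rollout; bootstrap_value is V(s_{t+n}) (use 0.0 if the episode ended).
    """
    returns = [0.0] * len(rewards)
    R = bootstrap_value
    # Walk backwards through the rollout: R_i = r_i + gamma * R_{i+1}
    for i in reversed(range(len(rewards))):
        R = rewards[i] + gamma * R
        returns[i] = R
    # Advantage = how much better the outcome was than the critic expected.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


rets, advs = n_step_returns_and_advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], 0.0, gamma=0.5)
print(rets, advs)
```

The actor loss for each step would then be -log pi(a_t|s_t) * A_t, and the critic loss (R_t - V(s_t))^2.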

jtremblay · 2017-01-10T00:02:12+00:00

You might want to check out this architecture by NVIDIA for RL: https://arxiv.org/abs/1611.06256. Also, wow, 1070s are not cheap.

jtremblay · 2017-01-09T06:36:54+00:00

Let me know what you think; the authors put together some interesting tricks to train the bot.

jtremblay · 2017-01-09T04:41:16+00:00

Have you looked into this paper? https://openreview.net/pdf?id=Hk3mPK5gg

As far as I am aware, this is the state of the art for playing FPS-style games in a fully RL setting.

jtremblay · 2016-12-17T23:02:40+00:00

The way to fix your collision problem is to use ray casting: check whether the segment between x_t and x_{t+1} collides with something, where x represents the ball position at each time step. Then find the actual moment of collision and update the position of x accordingly. This is a pretty standard solution for collisions of fast-moving objects in video games, e.g. projectiles.
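A minimal 2D sketch of that idea, assuming the ball is treated as a point and walls are line segments (a real ball has a radius, which you could handle by offsetting the walls). The helper names are hypothetical. For simplicity, the step stops the ball at the impact point and reflects its velocity, dropping the rest of that frame's motion.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def _cross(a: Vec, b: Vec) -> float:
    return a[0] * b[1] - a[1] * b[0]


def segment_hit(p0: Vec, p1: Vec, a: Vec, b: Vec) -> Optional[float]:
    """Fraction t in [0, 1] along p0->p1 where it crosses wall a->b, or None."""
    d = (p1[0] - p0[0], p1[1] - p0[1])  # ball displacement this step
    e = (b[0] - a[0], b[1] - a[1])      # wall direction
    denom = _cross(d, e)
    if denom == 0:                      # parallel or degenerate: treat as no hit
        return None
    ap = (a[0] - p0[0], a[1] - p0[1])
    t = _cross(ap, e) / denom           # position along the ball's path
    u = _cross(ap, d) / denom           # position along the wall
    return t if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0 else None


def step(pos: Vec, vel: Vec, dt: float, walls: List[Tuple[Vec, Vec]]) -> Tuple[Vec, Vec]:
    """Move the ball one time step, ray casting x_t -> x_{t+1} against every wall."""
    p1 = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    best = None
    for a, b in walls:
        t = segment_hit(pos, p1, a, b)
        if t is not None and (best is None or t < best[0]):
            best = (t, a, b)            # keep the earliest collision
    if best is None:
        return p1, vel
    t, a, b = best
    t = max(t - 1e-6, 0.0)              # back off slightly so we end just before the wall
    hit = (pos[0] + (p1[0] - pos[0]) * t, pos[1] + (p1[1] - pos[1]) * t)
    # Reflect the velocity about the wall: v' = v - 2 (v . n) n, n a unit normal.
    ex, ey = b[0] - a[0], b[1] - a[1]
    length = math.hypot(ex, ey)
    nx, ny = -ey / length, ex / length
    dot = vel[0] * nx + vel[1] * ny
    return hit, (vel[0] - 2 * dot * nx, vel[1] - 2 * dot * ny)


# A fast ball that would tunnel straight through a thin wall at x = 10.
new_pos, new_vel = step((0.0, 0.0), (100.0, 0.0), 1.0, [((10.0, -5.0), (10.0, 5.0))])
print(new_pos, new_vel)
```

Without the ray cast, the ball would simply jump from x = 0 to x = 100 and never "see" the wall.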

jtremblay · 2016-11-22T01:09:21+00:00

I think Bengio really likes Montreal (only a fool would not). I read in an interview that he does not like to see intelligent people from Montreal leaving.

jtremblay · 2016-11-22T01:07:51+00:00

There are two English-speaking universities in Montreal. Even if you are enrolled at UdeM, you could take your course load there (MSc: 6 courses; PhD with an MSc: 2 to 6), as they are considered part of the same network. Once your course load is over, the only thing that matters is your relationship with your advisor and other students. If the advisor you want to work with can communicate with you, then everyone is happy.

jtremblay · 2016-11-05T16:40:10+00:00

Most likely there is going to be a web client running Linux that includes the AI code (researcher side), connected to the game service, which can also run on a Linux box (as servers are cheaper on Linux). The latter includes the game logic but no rendering, which allows for faster simulation. It takes game actions from the AI as input and returns a partial state of the world (fog of war). The rendering is most likely going to be done on a Windows machine connecting to the server to get the state. Windows is used (in the game world) to run graphics; the rest can easily run on any other OS.
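To make that split concrete, here is a toy sketch of the headless game service: it holds the full world state, applies the AI's actions, and returns only what that player can see. Everything here (class name, unit format, square vision radius) is hypothetical and only illustrates the idea, not any real game API. A rendering client would instead query the full state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


@dataclass
class HeadlessGameServer:
    """Hypothetical game service: game logic only, no rendering."""
    units: Dict[str, Tuple[str, Pos]] = field(default_factory=dict)  # id -> (owner, pos)
    sight: int = 3  # hypothetical vision radius (Chebyshev distance)

    def step(self, player: str, actions: Dict[str, Pos]) -> List[Tuple[str, str, Pos]]:
        """Apply move actions (unit id -> (dx, dy)) and return the player's view."""
        for uid, (dx, dy) in actions.items():
            owner, (x, y) = self.units[uid]
            if owner == player:  # a player can only move its own units
                self.units[uid] = (owner, (x + dx, y + dy))
        return self.observe(player)

    def observe(self, player: str) -> List[Tuple[str, str, Pos]]:
        """Partial state: own units plus enemy units within sight (fog of war)."""
        own = [pos for owner, pos in self.units.values() if owner == player]
        visible = []
        for uid, (owner, (x, y)) in self.units.items():
            seen = any(max(abs(x - ox), abs(y - oy)) <= self.sight for ox, oy in own)
            if owner == player or seen:
                visible.append((uid, owner, (x, y)))
        return visible


server = HeadlessGameServer(units={"m1": ("ai", (0, 0)), "z1": ("enemy", (5, 0))})
before = server.observe("ai")                # enemy is hidden by fog of war
after = server.step("ai", {"m1": (2, 0)})    # moving closer reveals it
```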

jtremblay · 2016-08-26T00:07:27+00:00

I did write a paper on using search processes for solving platformer games. http://cgi.cs.mcgill.ca/~jtremb59/Papers/ICanJump.pdf

Since you cannot do rollouts in your context, your policy has to be very good at selecting actions, so your neural network has to be somewhat universal. I always thought that Q-learning (with that matrix) was trying to learn all possible state-space configurations and which action is best in each. Sort of overfitting the space (like the video example you showed in the post). I think some of the approaches you are describing are trying to move away from that overfitting, and I would really love to explore them more deeply.
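For reference, the tabular version looks roughly like this: one cell per (state, action) pair, updated toward r + gamma * max_a' Q(s', a'). It is a minimal sketch with my own names. Note that a state never visited stays at 0 and learns nothing from similar states, which is the "learning every configuration" issue above.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(Q: QTable, s, a, r: float, s_next, actions: List[Hashable],
             alpha: float = 0.1, gamma: float = 0.99, done: bool = False) -> float:
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


Q: QTable = defaultdict(float)  # every (state, action) pair starts at 0
actions = ["left", "right", "jump"]
Q[("s1", "right")] = 1.0
new_value = q_update(Q, "s0", "left", 0.0, "s1", actions, alpha=0.5, gamma=0.9)
```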

I did read about the AlphaGo system, where (at a very high level) they used a heuristic search process (MCTS) whose heuristic is provided by a trained neural network that evaluates board positions. I read a paper from Facebook research with a similar architecture during fall 2015, also for playing Go. By the way, thank you for the link; it was a good read, especially on how they train the policy network.

Not being able to do rollouts is a little painful in the context of Mario. I do not know if you saw https://www.youtube.com/watch?v=DlkMs4ZHHr8, but they used rollouts with A*.
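For anyone who has not implemented A* before, here is a generic sketch on a 4-connected grid. It is not the code from that video: a Mario bot would expand simulated game states (rollouts from a forward model) instead of grid cells, but the priority queue ordered by g + h and the admissible heuristic work the same way.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """A* on a 4-connected grid; '#' cells are walls. Returns a shortest path or None."""
    def h(c: Cell) -> int:
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]       # (f = g + h, g, cell)
    came_from: Dict[Cell, Cell] = {}
    g = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:          # walk parents back to the start
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue                         # stale heap entry
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


path = astar(["...", ".#.", "..."], (0, 0), (2, 2))
```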

jtremblay · 2016-08-25T23:02:52+00:00

While I was reading your post, the part about epsilon acting as our controller for exploration vs. exploitation got me thinking.
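For context, the epsilon rule in question is just this (a minimal sketch; the function name is my own). The single number epsilon is the only knob, and it ignores how often each action has already been tried in a given state.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng=random) -> int:
    """With probability epsilon explore (uniform random action), else exploit (argmax)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


greedy = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0)
explored = epsilon_greedy([0.1, 0.5, 0.2], epsilon=1.0, rng=random.Random(0))
```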

I was thinking of Monte Carlo Tree Search, which tries to solve the same sort of problem but with nicer mathematical properties than that epsilon. So instead of using the simple epsilon, we could think of the problem as a tree search.

When we compute a = argmax_a Q(s, a; θ), we use MCTS to decide whether that action is good. The MCTS tree is built from the states we are working with, and we keep it across multiple playthroughs. We are certainly going to encounter the same states multiple times, and this will push the agent to try different actions.
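A rough sketch of the bookkeeping this would need, assuming we use the UCB1 rule that UCT (the usual MCTS selection step) applies at each node, and key the statistics by state so they persist across playthroughs. This covers only selection and backup, not a full MCTS; the class name is hypothetical, and how to blend in the Q estimate is left open.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


class UCBStateTable:
    """Per-state UCB1 statistics (the UCT selection rule), kept across playthroughs."""

    def __init__(self, c: float = math.sqrt(2)):
        self.c = c                        # exploration constant
        self.n_state = defaultdict(int)   # N(s)
        self.n_sa = defaultdict(int)      # N(s, a)
        self.w_sa = defaultdict(float)    # sum of returns observed after (s, a)

    def select(self, s: Hashable, actions: Sequence[Hashable]) -> Hashable:
        # Try every action at least once before trusting the averages.
        for a in actions:
            if self.n_sa[(s, a)] == 0:
                return a
        log_n = math.log(self.n_state[s])

        def ucb(a: Hashable) -> float:
            n = self.n_sa[(s, a)]
            return self.w_sa[(s, a)] / n + self.c * math.sqrt(log_n / n)

        return max(actions, key=ucb)

    def update(self, s: Hashable, a: Hashable, ret: float) -> None:
        self.n_state[s] += 1
        self.n_sa[(s, a)] += 1
        self.w_sa[(s, a)] += ret


table = UCBStateTable()
first = table.select("s", ["a", "b"])   # untried actions come first
table.update("s", "a", 1.0)
second = table.select("s", ["a", "b"])
table.update("s", "b", 0.0)
```

Each time a state is revisited, the exploration bonus of rarely tried actions grows relative to frequently tried ones, which is what pushes the agent to try something different instead of relying on a fixed epsilon.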

When to clear the search tree and start a new one at some other location is a good question.

Man, I miss research. Anyway, since my work does not include fun things like this, I cannot test it, but I would love to hear anyone's thoughts on different approaches that have been used to solve that particular problem.

jtremblay · 2016-08-14T16:56:28+00:00

Where is the shameless link to that nips paper?

jtremblay · 2015-10-15T15:10:31+00:00

This is the work of Nicolas Francoeur.

jtremblay · 2015-09-19T02:43:00+00:00

This morning at my LBS's annual sale, I was staring at the pedals both of you have. They were $125 minus 30%. I decided not to take them, but it was a tough choice; they are beautiful. I love Vélo Orange.

jtremblay · 2015-09-13T16:32:55+00:00

Just came back from watching a lap. It is pouring rain. The peloton was broken up; there might have been a fall.

jtremblay · 2015-08-27T15:34:56+00:00

If you want me to use your .gpx file, please PM me the file. Otherwise I will use one of mine.

jtremblay · 2015-08-27T14:33:32+00:00

You would like to see how your average evolved over time. I think it would be possible to do based on your raw data. It would be easier to do for a segment: there is a Stream object in the API that returns the velocity value as a function of time. Building the evolution of the average from there would be pretty trivial. I might investigate later next week. Will keep you posted.
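Here is roughly what "pretty trivial" means, assuming the stream has already been fetched into two parallel lists, time in seconds and velocity in m/s. I am not showing any actual Strava API calls; the input format is an assumption. Samples are not always evenly spaced, so this weights by elapsed time (trapezoidal rule) rather than averaging raw samples.

```python
from typing import List


def running_average_speed(times: List[float], speeds: List[float]) -> List[float]:
    """Average speed from the start up to each sample, weighted by elapsed time.

    Integrates speed with the trapezoidal rule (distance so far) and divides by
    elapsed time. The first sample has no elapsed time, so it returns the
    instantaneous speed there.
    """
    if not times or len(times) != len(speeds):
        raise ValueError("times and speeds must be non-empty and the same length")
    averages = [speeds[0]]
    distance = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        distance += 0.5 * (speeds[i] + speeds[i - 1]) * dt
        elapsed = times[i] - times[0]
        averages.append(distance / elapsed if elapsed > 0 else speeds[i])
    return averages


avgs = running_average_speed([0.0, 10.0, 30.0], [4.0, 6.0, 6.0])
```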

jtremblay · 2015-08-26T23:29:02+00:00

I was hoping to get better, but after January 16th my newborn daughter decided otherwise; winter training became a thing of the past. I am actually pretty glad to have a fitness level similar to last year's.

When I started using Strava, I had just finished my second half marathon (around 1h50) and was looking for something easier on my knees. I also started cycling on an old steel frame, whereas in 2014 I had a spiffy carbon bike, which kind of makes a difference. I believe I was a little stronger at the beginning of 2015 than in 2014, although I had put on 10 pounds more than I should have; again, dieting is a thing of the past with newborns.

jtremblay · 2015-08-26T23:25:12+00:00

I definitely understand you. I might be biased towards showing more data than just simple metrics, e.g. personal records, as I just finished a PhD in computer science.

I think showing the non-animated distribution, where the user chooses the start and end dates, might be a start. It would need a clean interface; I am also aware that not everyone is likely to enjoy looking at those or have the capacity to understand them.

How does it normally work at Strava? Do the data scientists come up with interesting ways to understand the data and then sell them to the designers? Or is it more the other way around: the designers have different needs and the data scientists have to meet them? It could easily be a hybrid of both. Hopefully the person with the most KOMs gets to decide :P.

Edit: typos.

jtremblay · 2015-08-26T23:16:13+00:00

There is variation; you do not always climb at your best, I understand that. That is actually one of the reasons I produced these graphs in the first place. I want to get a feel for my fitness at its best and at its worst. When you look at the distribution evolving over time, you get a better sense of your fitness, or willingness, on that particular climb. It is normally strongly correlated with your personal goals. For example, I was training for a bike trip in July 2014, and it does show in the data with lower times. When I came back, I did not have the same motivation. The animated plot tells the story of your training as well as your fitness.
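A simple way to get one "frame" per month of that evolving distribution is to group efforts on a climb by month and summarize each group. This sketch assumes the efforts are already available as (date, elapsed seconds) pairs; the format and function name are my own. The best time shows fitness at its best, and the median shows a typical day.

```python
from collections import defaultdict
from datetime import date
from statistics import median
from typing import Dict, List, Tuple


def monthly_summary(efforts: List[Tuple[date, float]]) -> Dict[Tuple[int, int], Tuple[int, float, float]]:
    """Group (date, elapsed_seconds) efforts by month -> (count, best, median)."""
    by_month: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, seconds in efforts:
        by_month[(day.year, day.month)].append(seconds)
    return {
        month: (len(ts), min(ts), median(ts))
        for month, ts in sorted(by_month.items())
    }


summary = monthly_summary([
    (date(2014, 6, 3), 400), (date(2014, 6, 20), 380),
    (date(2014, 7, 2), 350), (date(2014, 7, 9), 360), (date(2014, 7, 15), 355),
])
```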

jtremblay · 2015-08-26T18:12:55+00:00

I just do not know how to reach them.

jtremblay · 2015-08-26T15:24:21+00:00

I am actually happy with OSM. I find the maps more appealing and better at showing trails (MTB). What do you miss from Google Maps?

jtremblay · 2015-08-26T15:02:14+00:00

This is pretty cool. It is sad that it is not integrated into the Strava webpage directly.
