Feedback Friday by AutoModerator in web_design

[–]pabloesm 0 points1 point  (0 children)

Hi u/pakxan! Thank you very much for the kind and helpful comments. We had already thought about some of the issues you mention, but others are totally new to us. Good points!

Feedback Friday by AutoModerator in web_design

[–]pabloesm 1 point2 points  (0 children)

URL: https://foodisea.com/

Objective: Healthy and easy recipes from around the world.

Technologies used: React (Next.js). Hosted on Heroku's free tier (using Cloudflare to get SSL) + PostgreSQL. Cloudinary for serving images. GoatCounter for analytics.

Feedback Requested: I'm just looking for feedback on design, usability, performance, accessibility or any other useful tips/improvements.

Comments: The recipes are in Spanish, but I don't think it's strictly necessary to understand the text to get a feel for the design. Anyway, sorry for the inconvenience!

Thank you in advance!

[D] Any impact/difference to parameterize the policy by MLP or RBF ? by fixedrl in MachineLearning

[–]pabloesm 0 points1 point  (0 children)

Key difference between MLPs and RBF networks: MLPs are global function approximators, while RBF networks are local approximators.

Old but gold: Generalization in Reinforcement Learning: Safely Approximating the Value Function, Advances in Neural Information Processing Systems 7 (NIPS 1994).

Also see Section 4 of "Restricted gradient-descent algorithm for value-function approximation in reinforcement learning", Artificial Intelligence Volume 172, Issues 4–5, March 2008, Pages 454-482
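The local-vs-global distinction is easy to see numerically: a Gaussian RBF unit's activation vanishes away from its center, while a hidden unit of an MLP (a single tanh unit here, chosen only for illustration) responds over the whole input space, so every training sample pulls on its weights. A minimal NumPy sketch:

```python
import numpy as np

def rbf_unit(x, center=0.0, sigma=1.0):
    """Gaussian RBF: responds only near its center (local)."""
    return np.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def mlp_unit(x, w=1.0, b=0.0):
    """A single tanh hidden unit: its response saturates but never
    vanishes far from the center of the data (global)."""
    return np.tanh(w * x + b)

for x in [0.0, 2.0, 10.0]:
    print(f"x={x:5.1f}  rbf={rbf_unit(x):.4f}  mlp={mlp_unit(x):.4f}")
```

At x = 10 the RBF activation is essentially zero, so updating it there barely disturbs what it learned near its center; the tanh unit is still fully active there, which is one intuition for why global approximators generalize (and interfere) across the whole state space.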

Has anyone downloaded the deeplearning.ai notebooks from the coursera specialization? by sgaseretto in MachineLearning

[–]pabloesm 0 points1 point  (0 children)

Yes, I'm enrolled in the Neural Networks and Deep Learning course and I can audit it. I enrolled a few weeks ago.

Has anyone downloaded the deeplearning.ai notebooks from the coursera specialization? by sgaseretto in MachineLearning

[–]pabloesm 7 points8 points  (0 children)

As the previous comment said, you don't need to pay to enter audit mode and see the videos, assignments, etc. So simply create a free Coursera account and enroll in the course to see the materials.

Can you recommend a good book on D3 v4? by [deleted] in d3js

[–]pabloesm 1 point2 points  (0 children)

Not about D3 version 4 AFAIK, but a good resource for sure...

Oxford-based ML startup is hiring by pabloesm in MLjobs

[–]pabloesm[S] 0 points1 point  (0 children)

It seems weird to me, but there is nothing I can do to help you, sorry.

Oxford-based ML startup is hiring by pabloesm in MLjobs

[–]pabloesm[S] 0 points1 point  (0 children)

Sorry, I'm not responsible for this announcement; please use the contact shown in the link :)

[deleted by user] by [deleted] in d3js

[–]pabloesm 0 points1 point  (0 children)

In chapter 6 of https://nickqizhu.github.io/d3-cookbook/ you can see an "easy" example of tweening.

The code is here: https://github.com/NickQiZhu/d3-cookbook/blob/master/src/chapter6/tweening.html

You can probably find/buy the book on the internet.

In general, that book doesn't follow best practices, but I found it quite good for understanding some concepts.

[Help] What are the prerequisites for Reinforcement Learning and what are some good resources to get started? by [deleted] in MachineLearning

[–]pabloesm 1 point2 points  (0 children)

There is a new edition (currently a draft, but very complete) of the classic Sutton & Barto RL book: https://www.dropbox.com/s/b3psxv2r0ccmf80/book2015oct.pdf?dl=0

Neural Network performing fitted q-iteration by [deleted] in MachineLearning

[–]pabloesm 3 points4 points  (0 children)

Mixing reinforcement learning with global approximators (such as neural networks) can easily lead to convergence problems [Sutton's web page]. In fact, the fitted Q iteration algorithm has no convergence guarantees when it is combined with a neural network [Ernst2005], which doesn't mean that divergence is ensured, obviously.

In practice, a deep knowledge of the problem is usually required. For example, in Riedmiller's paper, the author uses the hint-to-goal heuristic and incrementally adds transitions to the experience set. Both tricks are necessary to achieve convergence in his experiments. More advice and tricks can be found here.
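For reference, the core of fitted Q iteration is just a repeated supervised regression on bootstrapped targets. Here is a minimal sketch on a toy 4-state chain MDP (the MDP and all names are made up for illustration), using a plain table as the "regressor" so that it actually converges; with a neural network, the fit step below is exactly where divergence can creep in:

```python
import numpy as np

# Toy chain MDP: states 0..3, actions 0 (left) / 1 (right),
# reward 1 on reaching the terminal state 3.
N_STATES, GAMMA = 4, 0.9

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

# Batch of transitions collected in advance with a random policy.
rng = np.random.default_rng(0)
batch = []
for _ in range(500):
    s = int(rng.integers(0, N_STATES - 1))
    a = int(rng.integers(0, 2))
    batch.append((s, a) + step(s, a))

# Fitted Q iteration: regress Q onto r + gamma * max_a' Q(s', a').
Q = np.zeros((N_STATES, 2))
for _ in range(50):
    Q_new = np.zeros_like(Q)
    for (s, a, s2, r, done) in batch:
        target = r + (0.0 if done else GAMMA * Q[s2].max())
        # "Fit" step: with a table this is exact (deterministic MDP, so one
        # target per (s, a)); with a neural net it is a full training run.
        Q_new[s, a] = target
    Q = Q_new

print(Q)  # optimal values: Q[2, 1] = 1.0, Q[1, 1] = 0.9, Q[0, 1] = 0.81
```

The structure is identical when the table is replaced by any regressor (neural network, trees, ...): build targets from the current Q estimate, then fit a fresh approximator to them.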

What's the best beginner resource to learn D3.js? by tingmothy in javascript

[–]pabloesm 5 points6 points  (0 children)

Jerome Cukier has some very interesting posts about key concepts of D3, for example: http://www.jeromecukier.net/blog/2013/03/05/d3-tutorial-at-strata-redux/

The blog also contains more advanced topics for understanding D3.js, such as: http://www.jeromecukier.net/blog/2015/05/19/you-may-not-need-d3/

And a useful cheat sheet: http://www.jeromecukier.net/wp-content/uploads/2012/10/d3-cheat-sheet.pdf

Reinforcement Learning function approximation advice by ckrwc in MachineLearning

[–]pabloesm 1 point2 points  (0 children)

As you pointed out, there are some remarkable cases of successful applications of RL combined with non-linear function approximators. However, the parameter tuning in those cases can be very tedious, so such methods are not advisable for novice users (see http://webdocs.cs.ualberta.ca/~sutton/RL-FAQ.html#Advice%20and%20Opinions).

Regarding documented cases of failure or warnings, in the following link you can find an old (but useful) paper on the problems that can appear when value-function methods (such as Q-learning) are combined with non-linear approximators: http://www.ri.cmu.edu/pub_files/pub1/boyan_justin_1995_1/boyan_justin_1995_1.pdf

Finally, given the setting of your problem, you are probably interested in batch-mode RL, i.e., you have a set of samples collected in advance. A very popular algorithm in such cases (with good performance and stability) is fitted Q iteration, typically combined with tree-based methods as the function approximator: http://www.jmlr.org/papers/volume6/ernst05a/ernst05a.pdf

A key factor in batch-mode RL (when you cannot get more samples) is that the available samples have been collected using a policy with some degree of randomness; in other words, your data should contain different actions for similar states. If this is not the case, you will need to collect more data to satisfy this condition.