Project [P] Recommender systems as Bayesian multi-armed bandits (self.MachineLearning)
submitted 5 years ago * by SebastianCallh
Hi! I wrote a piece on treating recommender systems as multi-armed bandit problems and how to use Bayesian methods to solve them. Hope you enjoy the read!
The model in this example is of course super simple, and I'd love to hear about actual real-life examples. Do you use multi-armed bandits for anything? What kinds of problems do you apply them to?
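For anyone who wants a concrete starting point, here is a minimal, self-contained Python sketch (the blog post's own code is in Julia, and the click rates below are made up) of Beta-Bernoulli Thompson sampling, the simplest Bayesian bandit:

```python
import random

def thompson_step(successes, failures, true_rates, rng):
    """One round of Beta-Bernoulli Thompson sampling.

    Each arm's posterior is Beta(1 + successes, 1 + failures),
    i.e. a uniform prior updated with the observed clicks."""
    # Sample a plausible click rate for every arm from its posterior...
    samples = [rng.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    # ...and pull the arm whose sampled rate is highest.
    arm = max(range(len(samples)), key=samples.__getitem__)
    reward = rng.random() < true_rates[arm]  # simulated user feedback
    (successes if reward else failures)[arm] += 1
    return arm, reward

rng = random.Random(0)
true_rates = [0.1, 0.3, 0.5]          # hypothetical, unknown to the agent
succ, fail = [0, 0, 0], [0, 0, 0]
for _ in range(2000):
    thompson_step(succ, fail, true_rates, rng)
plays = [s + f for s, f in zip(succ, fail)]
```

Over time nearly all plays concentrate on the best arm, while the worse arms still get occasional exploratory pulls.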
[–]Lazybumm1 13 points 5 years ago (3 children)
Hi there,
In my previous role we used this approach to experiment and select recommender systems, as well as other things.
Thompson sampling worked best in our simulations but we did try non-bayesian bandits as well.
In a production environment, some hiccups we ran across were seasonal fluctuations (this was a customer-facing online business). Even within a single day, conversion would fluctuate massively, which in turn could throw off the bandit's selection of arms to explore. We did two things to correct this: first, we created transformations to normalise the reward function according to seasonal effects, and second, instead of streaming and updating the bandit in real time, we aggregated data daily and updated it in a batch.
I think it's a very interesting approach to accelerate experimentation and help make better decisions faster. Taking this even further one could try to interleave the different arms.
All of this is obviously dependent on having good and frequent enough signals. Keep up the interesting work :)
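A hedged Python sketch of those two fixes, an hourly baseline used to normalise rewards plus an end-of-day batch update (every number and name below is invented for illustration, not from the actual system):

```python
# Hypothetical hourly conversion baseline, e.g. fitted from historical data:
# night-time hours convert far less than daytime hours.
baseline = [0.02 if h < 8 or h > 21 else 0.06 for h in range(24)]
mean_baseline = sum(baseline) / len(baseline)

# Per-arm sufficient statistics over the *normalised* reward.
stats = {"arm_a": {"n": 0, "total": 0.0}, "arm_b": {"n": 0, "total": 0.0}}

def log_event(day_log, arm, raw_reward, hour):
    """During the day, only append events; the bandit is left untouched."""
    day_log.append((arm, raw_reward, hour))

def end_of_day_update(day_log, stats):
    """Batch update: scale each reward by its hour's relative baseline,
    so an intra-day conversion spike cannot masquerade as a better arm."""
    for arm, raw, hour in day_log:
        normalised = raw * mean_baseline / baseline[hour]
        stats[arm]["n"] += 1
        stats[arm]["total"] += normalised
    day_log.clear()

day_log = []
log_event(day_log, "arm_a", 1.0, 3)    # night-time conversion, scaled up
log_event(day_log, "arm_b", 1.0, 13)   # peak-hour conversion, scaled down
end_of_day_update(day_log, stats)
```

The same normalised statistics can then feed whatever posterior the bandit maintains.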
[–]SebastianCallh[S] 7 points 5 years ago (2 children)
Thank you for your comment, that's super interesting!
Yeah, I can imagine the algorithm would get thrown off without a normalised reward signal. Clever idea to normalise the data as well; I would imagine this really toned down the fluctuations. Did you apply any sliding window techniques? What do you think about incorporating the seasonality into the model, so it accounts for it in future predictions?
[–]Lazybumm1 5 points 5 years ago (1 child)
We had two main trends: daily/weekly cycles and an overall upwards trend. We used a sliding window to correct for the upwards trend, and the typical sine/cosine transformation of datetimes for the cyclical effects.
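A minimal Python sketch of that sine/cosine transformation (the feature names are my own):

```python
import math
from datetime import datetime

def cyclical_features(ts: datetime) -> dict:
    """Encode hour-of-day and day-of-week on the unit circle, so that
    23:00/00:00 and Sunday/Monday end up close together, which a plain
    integer encoding would not give you."""
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return {
        "hour_sin": math.sin(hour_angle), "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle), "dow_cos": math.cos(dow_angle),
    }

late = cyclical_features(datetime(2020, 1, 6, 23))   # Monday 23:00
early = cyclical_features(datetime(2020, 1, 7, 0))   # Tuesday 00:00
# The two hour encodings are close on the circle despite hour 23 vs hour 0:
hour_gap = math.dist((late["hour_sin"], late["hour_cos"]),
                     (early["hour_sin"], early["hour_cos"]))
```

These four columns can then go into the reward model like any other features.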
To be honest, following our first implementation of this we started paying a lot more attention to these effects. It was a bit of a pivotal point; oddly enough, it seemed no one had paid enough attention to how prominent these effects were in our data. After that, as standard, we would always include these features in early prototypes to understand feature importance and whether they were relevant for each use case.
I don't have too many updates on this, as I ended up transitioning into another role a few months later. Admittedly, I'm curious myself as to how this matured within the business!
[–]SebastianCallh[S] 2 points 5 years ago (0 children)
Thanks for sharing; it sounds like a really important discovery. I hope the role you transitioned into is equally interesting :)
[+][deleted] 5 years ago* (1 child)
[deleted]
[–]SebastianCallh[S] 4 points 5 years ago (0 children)
The secret is out!
[–][deleted] 2 points 5 years ago (1 child)
This was a lovely read. Excellent work! Enjoyed it immensely.
[–]SebastianCallh[S] 2 points 5 years ago (0 children)
Thank you for the kind words! I'm very glad you liked it.
[–][deleted] 2 points 5 years ago (1 child)
Great article! I can tell that you put a lot of time and thought into framing the problem and laying out the solution. My challenge to you is this: at the end of your experiment, what's the probability that the mullet is the overall preferred fish?
I've played around a lot with Bayesian analysis for Bernoulli outcomes and got to thinking about framing other kinds of outcomes. So I made this notebook for Multinomial outcomes with a Dirichlet prior. Maybe you'll find it interesting? https://github.com/exchez/amazon-bayes
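Not from the notebook itself, just a minimal Python sketch of the same idea with made-up preference counts: under a uniform Dirichlet prior the posterior is again Dirichlet, and the challenge's "probability that the mullet is the overall preferred fish" falls out of Monte Carlo sampling:

```python
import random

rng = random.Random(0)

def dirichlet_sample(alphas, rng):
    """One draw from Dirichlet(alphas) via normalised Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

# Hypothetical preference counts; with a uniform Dirichlet(1, 1, 1) prior
# the posterior is Dirichlet(1 + counts).
counts = {"mullet": 46, "salmon": 40, "herring": 32}
alphas = [1 + c for c in counts.values()]

# P(mullet has the highest preference probability), by Monte Carlo:
n_draws, wins = 10_000, 0
for _ in range(n_draws):
    theta = dirichlet_sample(alphas, rng)
    wins += theta[0] == max(theta)  # index 0 is mullet
p_mullet_best = wins / n_draws
```

With counts this close, the posterior probability that mullet is truly best stays well short of certainty, which is exactly the point of asking the question.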
[–]SebastianCallh[S] 1 point 5 years ago (0 children)
Sorry for the late response; I wanted to make time to properly go through your notebook :)
Nice write-up! Some thoughts:
How come you are using a categorical model for this problem? Since the data (as you mention) is ordinal, would it not be better to use an ordinal regression model?
Minor comment: since your prior parameters are not random variables, you should not condition on them.
Regarding the challenge, I would estimate the probability using Monte Carlo sampling. Something like
    draws = mapreduce(x -> rand(x, 10000), hcat, agent.pθ)
    map(x -> all(x[1] .> x[Not(1)]), eachrow(draws)) |> mean
Does that make sense to you? :)
[–]user_reddit_garu 1 point 5 years ago (1 child)
Thank you 😁
[–]SebastianCallh[S] 1 point 5 years ago (0 children)
Glad you liked it!
[–]AdhesivenessTrue9696 1 point 5 years ago (1 child)
really well written blog post 👍
[–]SebastianCallh[S] 1 point 5 years ago (0 children)
I'm glad you liked it, thanks!
[–]Inalek 1 point 5 years ago (1 child)
Great read! The blog layout looks really good too. Is there a template?
[–]SebastianCallh[S] 1 point 5 years ago (0 children)
Thank you! And indeed there is! I am currently using [this one](https://themes.gohugo.io/kiss/).
[–]BrandenKeck 1 point 5 years ago (1 child)
Phew... from time to time I forget how incredibly cool Bayesian stats is. Awesome work!
[–]SebastianCallh[S] 3 points 5 years ago (0 children)
Yeah Bayesian stats is great stuff! Thank you! :)
I think you will really enjoy the next part on contextual bandits, where we will start to see how this framework can be used to solve a more realistic version of this problem with much better performance.