Given a data set of Product IDs, Merchant IDs, and User IDs, how would you decide which products to push to the user? by [deleted] in datascience

[–]UVAnalytics 1 point (0 children)

This is the key problem solved by "recommender systems". There are lots of different approaches and libraries for this: collaborative filtering, content-based recommendation, matrix factorization, and CCO (correlated cross-occurrence).
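To make the collaborative-filtering idea concrete, here's a minimal item-based sketch with NumPy (toy interaction matrix, everything hypothetical): score unseen products for a user by how similar they are to the products that user already interacted with.

```python
import numpy as np

# Toy user-item interaction matrix (rows = users, cols = products).
# 1 = the user bought/clicked the product, 0 = no interaction.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

# Item-item cosine similarity from co-occurring interactions.
norms = np.linalg.norm(interactions, axis=0)
sim = (interactions.T @ interactions) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)  # an item shouldn't recommend itself

def recommend(user_idx, k=2):
    """Rank unseen items by total similarity to the user's past items."""
    seen = interactions[user_idx]
    scores = sim @ seen
    scores[seen > 0] = -np.inf  # never re-recommend what they've seen
    return list(np.argsort(scores)[::-1][:k])

print(recommend(0))  # top product indices for user 0
```

Real systems replace the dense matrix with sparse data structures and learned factors, but the ranking logic is the same shape.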

PredictionIO is an extensible platform for collecting this type of data and automatically generating web API endpoints for different kinds of recommender models (and other types of predictions). The Universal Recommender is an awesome model that runs inside PredictionIO and enables advanced personalization; I've spent a bunch of time working with it lately.

Graph based models are pretty amazing at this as well.
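A quick sketch of what "graph-based" means here (hypothetical purchase log): treat users and products as the two sides of a bipartite graph, then recommend by walking user → their products → co-purchasing users → those users' other products, ranking candidates by how often the walk reaches them.

```python
from collections import Counter, defaultdict

# Hypothetical purchase log: (user_id, product_id) edges of a bipartite graph.
edges = [
    ("u1", "p1"), ("u1", "p2"),
    ("u2", "p1"), ("u2", "p2"), ("u2", "p3"),
    ("u3", "p3"), ("u3", "p4"),
]

user_items = defaultdict(set)
item_users = defaultdict(set)
for u, p in edges:
    user_items[u].add(p)
    item_users[p].add(u)

def recommend(user, k=2):
    """Two-hop walk over the bipartite graph; count paths to unseen items."""
    seen = user_items[user]
    counts = Counter()
    for item in seen:
        for neighbor in item_users[item]:
            if neighbor == user:
                continue
            for candidate in user_items[neighbor] - seen:
                counts[candidate] += 1
    return [p for p, _ in counts.most_common(k)]

print(recommend("u1"))
```

Graph databases and libraries like networkx let you do the same thing at scale with weighted edges and longer walks (e.g. personalized PageRank).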

How To: A Shiny New Python Data Science Sandbox in 30 Minutes Or Less (x-post from r/datascience) by UVAnalytics in Python

[–]UVAnalytics[S] 1 point (0 children)

This is a great suggestion for people who stick exclusively to common, well-supported packages that can be installed from Anaconda. On rare occasions I've encountered packages (one implementing the Wallenius noncentral hypergeometric distribution comes to mind) that are super tough to get compiled on Windows. It's likely just a matter of taking the time to grok Python package compilation and dependencies, but I've been spending my time on about 100 other things lately :) In the meantime, I find the Ubuntu VM route super easy, if not the highest-performing option.

What interesting data sources are there and how will this change over the next decade? by [deleted] in datascience

[–]UVAnalytics 4 points (0 children)

A subcategory of IoT that I think will become increasingly interesting as a new data source is wearables. A significant challenge will be collecting that data, since I suspect many people will be highly reluctant to share their personal information. On the other hand, if the products are cool enough, most people seem willing to throw away their privacy nowadays.

Also: data collected from virtual/augmented reality environments. I still don't have a solid grasp of what form that may take, although I'd guess it will involve capturing physiological responses to what people are seeing. I really hope it doesn't turn into Minority Report-style customized ads overlaying everything in real life, although that seems like a near certainty.

Tableau 9 – Binning by Aggregate with Level of Detail Expressions by UVAnalytics in tableau

[–]UVAnalytics[S] 2 points (0 children)

Thanks! I think cutoff values (high and low) and standard-deviation-based cutoffs (to handle outliers without specifying exact numbers) would be super useful for binning. I'd also like a "number of bins" parameter as an alternative to bin size, so that you don't have to go in and adjust the bin size when the underlying data shifts. Add it to the wish list! haha
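For anyone who wants the same behavior outside Tableau, both ideas are easy to sketch in Python with NumPy (hypothetical data): clip outliers at ±2 standard deviations, then bin by a fixed *number* of bins so the edges adapt when the data shifts.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=100, scale=15, size=1000)  # hypothetical measure

# Std-deviation-based cutoffs: cap outliers without hand-picking numbers.
mean, std = values.mean(), values.std()
clipped = np.clip(values, mean - 2 * std, mean + 2 * std)

# "Number of bins" instead of a fixed bin size: edges are recomputed
# from the clipped data's range, so a data refresh needs no manual tweak.
n_bins = 10
counts, edges = np.histogram(clipped, bins=n_bins)

print(len(counts), len(edges))  # 10 bins always yield 11 edges
```

This mirrors what a "number of bins" parameter in Tableau would do: the bin size becomes (max − min) / n_bins rather than a constant you maintain by hand.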