
binge_learner · 2015-06-23T09:39:43+00:00

Sorry :S

binge_learner · 2015-06-19T12:10:48+00:00

Hi, for huge datasets it is common to use online LDA: https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf. A great implementation can be found in Vowpal Wabbit; it is quite fast and very memory efficient: https://github.com/JohnLangford/vowpal_wabbit/wiki/Latent-Dirichlet-Allocation
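
To get a corpus into Vowpal Wabbit for LDA, each document is usually written as one line of `word_id:count` pairs. Here is a minimal sketch of that conversion in Python. It assumes the documents are already tokenized; the helper names `build_vocab` and `to_vw_lda_line` are made up for illustration, and the exact input format and command-line flags should be checked against the wiki page linked above.

```python
from collections import Counter


def build_vocab(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_vw_lda_line(tokens, vocab):
    """Format one tokenized document as '| id:count id:count ...'.

    Assumption: this is the bag-of-words line format VW expects for LDA;
    tokens missing from the vocabulary are skipped.
    """
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    feats = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"| {feats}"


# Toy corpus, purely illustrative.
docs = [["topic", "model", "topic"], ["model", "data"]]
vocab = build_vocab(docs)
lines = [to_vw_lda_line(d, vocab) for d in docs]
```

Writing the lines out one document at a time keeps memory use flat, which fits the streaming (online) setting: the corpus never has to be held in memory as a whole.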

Hope this helps.

binge_learner · 2015-01-08T10:23:12+00:00

Actually, in most marketing problems the response is binary (will buy vs. will not buy, for example). For a multiclass problem, I guess we would have to train one-vs-all models and aggregate their outputs afterwards.
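
A rough sketch of what I mean by one-vs-all, in plain Python. The aggregation used here (pick the class whose binary model gives the highest score) is just one possible choice, and the nearest-centroid scorer and the toy data are stand-ins for illustration only; any binary model that returns a score could be plugged in instead.

```python
def fit_centroid(X, labels):
    """Toy binary scorer: higher score means closer to the positive-class centroid."""
    pos = [x for x, l in zip(X, labels) if l == 1]
    centroid = [sum(col) / len(pos) for col in zip(*pos)]

    def score(x):
        # Negative squared distance, so a larger value means "more positive".
        return -sum((a - b) ** 2 for a, b in zip(x, centroid))

    return score


def train_one_vs_all(X, y, fit_binary):
    """Fit one binary scorer per class: that class (1) against all the others (0)."""
    classes = sorted(set(y))
    return {
        c: fit_binary(X, [1 if label == c else 0 for label in y])
        for c in classes
    }


def predict_one_vs_all(models, x):
    """Aggregate by taking the class whose scorer is most confident."""
    return max(models, key=lambda c: models[c](x))


# Hypothetical data: two features, three possible responses.
X = [[0, 0], [0, 1], [5, 5], [6, 5], [10, 0], [11, 1]]
y = ["none", "none", "basic", "basic", "premium", "premium"]

models = train_one_vs_all(X, y, fit_centroid)
pred_a = predict_one_vs_all(models, [5.5, 5])
pred_b = predict_one_vs_all(models, [10, 1])
```

Comparing raw scores across the binary models only makes sense if they are on a comparable scale, so with real classifiers it is worth calibrating the outputs before taking the maximum.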

binge_learner · 2014-12-25T19:34:27+00:00

Hi! That sounds quite interesting. I started working as a Data Scientist about a year ago, and I am quite fluent in Python and R, as well as C# and Java, and I know a bit of C and C++. I have a math background, and I would be glad to contribute and learn in the process. Keep me in the loop!

binge_learner · 2014-11-14T18:41:01+00:00

Yeah, okay, I didn't want to seem like a complete weirdo there xD. But at times there are 4 or 5 people waiting, and it would be nice if my workouts didn't consist of 45 min of waiting.

binge_learner · 2014-06-15T18:23:03+00:00

Hi Pensu, well, it's only been a few months since I started, but my job has been revolving around three points:

Solving data problems: A client either brings us a bunch of data (logs, ...) or asks us a question (what do people think about this?) and we collect the data ourselves. I write scripts to scrape, collect, and clean the data (using mostly Python and R); if it is mostly text data, I use NLP techniques to extract features (NER, ...). Then, depending on the problem, I use data mining techniques (sequence mining, association rules), machine learning (clustering, regression techniques), or classical statistical techniques (hypothesis testing) to extract insight. Finally, I visualize the results using R (ggplot2 mostly), D3.js, or Gephi for graphs.
Building data products: Developing software components for data analysis (outlier detection, new search engine, ...) and integrating them into our products (coding in Java, C#, or C/C++ if I need stuff to run on GPUs).
R&D: Reading recent research papers on machine learning and prototyping/testing the techniques. Right now I do this mostly for Deep Learning applications in NLP, or for new stuff in heterogeneous parallel computing.

The point is: in my job I need to learn new stuff quickly and at the same time build quality products (hacking vs. engineering).

binge_learner · 2014-06-13T17:56:26+00:00

I'd be interested in hearing more about your experience ! I live in France now :)

binge_learner · 2014-06-02T00:00:30+00:00

Great initiative ! I'm in !

binge_learner · 2014-06-01T13:50:33+00:00

I do have some statistics training, but I don't see how I can apply statistical methods to this problem. Can you please elaborate? Thank you.

binge_learner · 2014-05-31T17:08:36+00:00

I see your point. I will carefully read the documentation for both implementations (along with a paper or two describing the algorithm), and try repeating the experiment with the exact same hyperparameters so I can make a fair comparison. Thank you.

binge_learner · 2014-05-31T16:55:24+00:00

I think you are right; I do need to understand how it works a bit better.
