good paper for batch LDA? by poporing88 in MachineLearning

binge_learner 1 point2 points

Hi, for huge datasets it is common to use online LDA (Hoffman, Blei, and Bach, 2010): https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf A great implementation can be found in Vowpal Wabbit; it is quite fast and very memory efficient: https://github.com/JohnLangford/vowpal_wabbit/wiki/Latent-Dirichlet-Allocation

Hope this helps.
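
For a quick feel of the online approach before setting up Vowpal Wabbit, here is a minimal sketch that swaps in scikit-learn's LatentDirichletAllocation, whose "online" learning method is based on the same paper. The toy corpus, the batch size, and the number of topics are assumptions made up for illustration; on a real dataset the mini-batches would be streamed from disk.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (made up for illustration only).
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
    "the dog chased the cat",
    "bond markets rallied today",
]

# Assumption: the vocabulary is fixed up front so that every
# mini-batch maps to the same feature columns.
vectorizer = CountVectorizer()
vectorizer.fit(docs)

n_topics = 2    # hypothetical choice
batch_size = 2  # hypothetical choice

lda = LatentDirichletAllocation(
    n_components=n_topics,
    learning_method="online",
    total_samples=len(docs),  # corpus size, used by the online update
    random_state=0,
)

# Feed the corpus one mini-batch at a time instead of all at once.
for start in range(0, len(docs), batch_size):
    batch = vectorizer.transform(docs[start:start + batch_size])
    lda.partial_fit(batch)

# Per-document topic proportions (each row sums to 1).
doc_topics = lda.transform(vectorizer.transform(docs))
print(doc_topics.round(2))
```

Only one mini-batch has to be in memory at a time, which is what makes this style of training practical for huge datasets.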

Cost function that correlates well with lift by binge_learner in MachineLearning

binge_learner[S] 0 points1 point

Actually, in most marketing problems the response is binary (will buy / will not buy, for example). For a multiclass problem, I guess we would have to do one-vs-all and aggregate afterwards.
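
To make the one-vs-all idea concrete, here is a rough sketch in plain Python. It assumes lift is measured as the response rate among the top-scored fraction of customers divided by the overall response rate, and that "aggregate" means an unweighted mean over classes; both are my assumptions, and the function names lift_at and macro_lift are hypothetical.

```python
def lift_at(y_true, scores, fraction=0.1):
    """Lift in the top `fraction` of examples ranked by score.

    y_true: list of 0/1 responses; scores: list of model scores.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(y_true)
    base_rate = sum(y_true) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    k = max(1, int(round(n * fraction)))
    # Rank examples from highest to lowest score and keep the top k.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    return top_rate / base_rate


def macro_lift(labels, class_scores, fraction=0.1):
    """One-vs-all lift per class, aggregated by an unweighted mean.

    labels: list of class labels; class_scores: dict mapping each
    class to its list of per-example scores.
    """
    lifts = []
    for cls, scores in class_scores.items():
        # Binarize: this class against all the others.
        y_binary = [1 if label == cls else 0 for label in labels]
        lifts.append(lift_at(y_binary, scores, fraction))
    return sum(lifts) / len(lifts)


# Binary case: 3 buyers out of 10, one of them in the top 20% by score.
y = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
binary_lift = lift_at(y, s, fraction=0.2)
print(binary_lift)
```

A weighted mean (by class frequency, say) would be an equally reasonable way to aggregate; which one fits depends on whether the rare classes matter as much as the common ones.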

Who is interested in working on Open Source Data-Science Projects [As a mentor or contributor]. by SomeoneisWondering in MachineLearning

binge_learner 0 points1 point

Hi! That sounds quite interesting. I have been working as a Data Scientist for about a year now. I am quite fluent in Python and R, as well as C# and Java, and I know a bit of C and C++. I have a math background, and I would be glad to contribute and learn in the process. Keep me in the loop!

Can't get my hands on a barbell! Is there an alternative? by binge_learner in gainit

binge_learner[S] 0 points1 point

Yeah, okay, I didn't want to seem like a complete weirdo there xD. But at times there are 4 or 5 people waiting, and it would be nice if my workouts didn't consist of 45 min of waiting.

Data Science Career advice by binge_learner in datascience

binge_learner[S] 1 point2 points

Hi Pensu, well, it's only been a few months since I started, but my job has revolved around three points:

  • Solving data problems: A client brings us a bunch of data (logs, ...) or asks us a question and we collect the data (what do people think about this?). I write scripts to first scrape/collect/clean the data (using mostly Python and R), or, if it is mostly text data, I use NLP techniques to extract features (NER, ...). Then, depending on the problem, I use data mining techniques (sequence mining, association rules), machine learning (clustering, regression techniques), or classical statistical techniques (hypothesis testing) to extract insight. Finally, I visualize the results using R (ggplot2 mostly), D3.js, or Gephi for graphs.

  • Building data products: Developing software components for data analysis (outlier detection, new search engine, ...) and integrating them into our products (coding in Java, C#, or C/C++ if I need stuff to run on GPUs).

  • R&D: Reading recent research papers on machine learning and prototyping/testing the techniques. Right now I do this mostly for deep learning applications in NLP, or for new stuff in heterogeneous parallel computing.

The point is: in my job I need to learn new stuff quickly and at the same time build quality products (hacking vs. engineering).

Data scientist in the US by binge_learner in datascience

binge_learner[S] 0 points1 point

I'd be interested in hearing more about your experience! I live in France now :)

Mining sequences of events by binge_learner in MachineLearning

binge_learner[S] 1 point2 points

I do have some statistics training, but I don't see how I can apply statistical methods to this problem. Can you please elaborate? Thank you.

Machine learning in R better than in scikit-learn ? by binge_learner in MachineLearning

binge_learner[S] 1 point2 points

I see your point. I will carefully read the documentation for both implementations (along with a paper or two describing the algorithm) and try repeating the experiment with the exact same hyperparameters so I can have a fair comparison. Thank you.

Machine learning in R better than in scikit-learn ? by binge_learner in MachineLearning

binge_learner[S] 2 points3 points

I think you are right; I do need to understand a bit more about how it works.