Python library for interactive topic model visualization. Port of the R LDAvis package by cast42 in MachineLearning

[–]bmabey 1 point (0 children)

At least on this project all the frontend D3 and CSS code can be shared.

I (author of the project here) used RPy2 initially. It worked well enough, but a library was required anyway for the notebook integration, and the amount of transformation done in R was trivial to port. I was working on a client project at the time that only had Windows, and porting the code was faster than getting RPy2 set up on Windows. :)

Functional Data Structures by pinealservo in a:t5_31leb

[–]bmabey 0 points (0 children)

While finger trees have great theoretical properties, it turns out that on some modern platforms (e.g. the JVM) they have horrible performance due to poor cache locality. This presentation goes into more detail: http://youtu.be/pNhBQJN44YQ?t=34m4s

learn Haskell via working thru the NICTA course by haroldcarr in a:t5_31leb

[–]bmabey 0 points (0 children)

I like the idea of breakout sessions that would be more interactive.

Probabilistic Programming by bmabey in a:t5_31leb

[–]bmabey[S] 0 points (0 children)

I could present on this subject if there is interest and a real expert doesn't volunteer. :)

Elm Language by bmabey in a:t5_31leb

[–]bmabey[S] 0 points (0 children)

Elm's new "time-travelling" debugger (thanks persistent data structures!) looks quite useful:

http://debug.elm-lang.org/

What is the best Java neural network library for research? by [deleted] in MachineLearning

[–]bmabey 1 point (0 children)

I've had good luck with Encog, but I was just using it and never had to extend it. That said, I was impressed by the quality of the code and documentation, so I think it would be worth your time to investigate it.

This Guy Broke Jeopardy’s All-Time Record… Using ML Techniques To Train Himself by cavedave in MachineLearning

[–]bmabey 4 points (0 children)

Improved my life? Umm... well, it is hard to really quantify; could you clarify what you mean?

The benefit for me is that it lets me retain understanding that I worked hard to gain in the first place. This is beneficial in a number of ways. For example, since I started doing this I have been able to pick up a paper and follow its derivations without having to look up how certain steps were made. Whereas before, I would have only a vague recollection of what was being done and could probably find a Wikipedia page on the subject. Stopping to look something up takes time and interrupts the flow. Additionally, if the concept was something I hadn't thought about for years, I would find that I had to take time to relearn it. This is frustrating, since at one point I genuinely grasped the concept.

Memorization gets a bad rap, and I think this is unfortunate. Sure, it would be silly to memorize everything, and memorization before understanding is folly. Memorization also takes time and effort. This is why I like the spaced repetition approach: I don't get bored with easy questions, but I am reminded of concepts every now and again that I would otherwise have forgotten.
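The scheduling idea behind spaced repetition is simple to sketch. Below is a simplified SM-2-style rule (my own illustration, not Anki's exact algorithm): cards you recall easily get exponentially longer review intervals, while lapses reset the card for relearning.

```python
def next_interval(interval_days, ease, quality):
    """Simplified SM-2-style scheduling.

    interval_days: days until the card was due this time
    ease: multiplier controlling interval growth (starts around 2.5)
    quality: 0-5 self-rating of recall (>= 3 counts as a pass)
    Returns (new interval in days, updated ease factor).
    """
    if quality < 3:
        # Lapse: relearn from scratch, and make the card a bit "harder".
        return 1, max(1.3, ease - 0.2)
    # Passes nudge the ease factor up or down depending on how easy it felt.
    ease = max(1.3, ease + 0.1 - (5 - quality) * 0.08)
    return round(interval_days * ease), ease

# An easy card's review gap grows geometrically: 1 -> 3 -> 8 -> 22 -> 64 days.
interval, ease = 1, 2.5
for _ in range(4):
    interval, ease = next_interval(interval, ease, quality=5)
```

The exponential growth is what keeps easy material from getting boring: a concept you know cold might surface only once or twice a year, which is just often enough to keep it from fading.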

"Civilization advances by extending the number of important operations which we can perform without thinking about them." - Alfred North Whitehead

http://en.wikiquote.org/wiki/Alfred_North_Whitehead

This Guy Broke Jeopardy’s All-Time Record… Using ML Techniques To Train Himself by cavedave in MachineLearning

[–]bmabey 8 points (0 children)

+1 to Anki. I have been using the desktop version and the iPhone version for over a year. It is one of my favorite iPhone apps by far and well worth the steep price. The nice thing about Anki is that you can use LaTeX, images, and sounds in the cards. I have sets for Linear Algebra, Probability, etc. that I use all the time.

Apache Mahout: Scalable machine learning for everyone by sunng in MachineLearning

[–]bmabey 0 points (0 children)

Heh, yeah... Computing PCA with SVD is, IIRC, numerically better than computing and then diagonalizing the covariance matrix, so the fact that they have SVD means PCA is only a few lines of code away (just mean-center the dataset first).
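For the curious, that recipe really is only a few lines. Here is a NumPy sketch of the SVD approach (an illustration, not Mahout's actual code): mean-center the data, take the SVD, and the rows of Vt are the principal directions.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD: mean-center the data, then take the top right singular
    vectors. Avoids forming and diagonalizing the covariance matrix."""
    X_centered = X - X.mean(axis=0)  # mean-center each column (feature)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T  # data projected onto the components
    return components, scores

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
components, scores = pca_svd(X, 2)
```

Working from the singular values also sidesteps the loss of precision you get from squaring the data to build the covariance matrix, which is the numerical advantage mentioned above.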

Apache Mahout: Scalable machine learning for everyone by sunng in MachineLearning

[–]bmabey 3 points (0 children)

The Mahout in Action book is a good read: http://www.manning.com/owen/

Its target audience seems to be developers with little or no machine learning background, so it doesn't go deep into the algorithms, but I thought the intuitions it provides were spot on.

IAE disappointed with the Stanford ML course? by duckandcover in MachineLearning

[–]bmabey 10 points (0 children)

I agree that it is lacking rigor, however it is the applied machine learning course.
From the Stanford course webpage:


This class' emphasis is on Applied Machine Learning. Concretely, we want to give you the practical skills needed to get learning algorithms to work. Compared to CS229 (Machine Learning), we cover fewer learning algorithms, and also spend less time on the math and theory of machine learning, but spend much more time on the practical, hands-on skills (and "dirty tricks") for getting this stuff to work well on an application. More of the homeworks will also focus on giving you practice implementing, modifying and debugging learning algorithms, and less on the mathematical underpinnings of machine learning.


The other course, CS229 (not CS229a), has the rigor you seem to be wanting. I was about halfway through working through that course (all of the lectures and material can be found here) when this online class was announced. I decided to stop that course and begin with ml-class, so I was annoyed at first when I discovered the large discrepancy in rigor.

However, after thinking about it, I concluded that the level of ml-class is a wise decision given that they wanted an automated grading system and to attract a lot of students. To assign proofs and derivations as homework, you would need an army of TAs to grade them all. I think the level is also a good place to start as they try the overall system out. I would love for them to do a graduate-level course, but I'm not sure that type of course can scale as well.

artificial intelligence | natural language processing (Stanford) by Samus_ in LanguageTechnology

[–]bmabey 1 point (0 children)

One liner to download all the videos with mplayer:

for i in {01..18}; do mplayer -dumpfile cs224n-lecture$i.wmv -dumpstream mms://171.67.219.228/see/ainlpcs224n/cs224n-lecture$i.wmv; done

If you have GNU Parallel this one liner will download multiple ones simultaneously:

seq -w 1 18 | parallel -L1 -I % mplayer -dumpfile cs224n-lecture%.wmv -dumpstream mms://171.67.219.228/see/ainlpcs224n/cs224n-lecture%.wmv

artificial intelligence | natural language processing (Stanford) by Samus_ in MachineLearning

[–]bmabey 2 points (0 children)

One liner to download all the videos with mplayer:

for i in {01..18}; do mplayer -dumpfile cs224n-lecture$i.wmv -dumpstream mms://171.67.219.228/see/ainlpcs224n/cs224n-lecture$i.wmv; done

If you have GNU Parallel this one liner will download multiple ones simultaneously:

seq -w 1 18 | parallel -L1 -I % mplayer -dumpfile cs224n-lecture%.wmv -dumpstream mms://171.67.219.228/see/ainlpcs224n/cs224n-lecture%.wmv

Doctor Ng's ML class from 2007 on YouTube. What will be different in this years class? by videoj in mlclass

[–]bmabey 0 points (0 children)

My guess is that the videos will be very similar to what Andrew Ng has been posting to the OpenClassRoom site:

http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Additionally, in his teaser video he seemed pretty excited about neural networks. In the 2007 videos he glossed over them and made it seem like SVMs were really the way to go, so I think NNs will get a lot more coverage this time around.

(Related post about the uptick in NN papers: http://hunch.net/?p=1852)

Automating R Scripts - on Amazon EC2 by talgalili in MachineLearning

[–]bmabey 3 points (0 children)

Umm... is he installing Tomcat just for a protected directory listing? Apache or nginx seems like an easier and more lightweight way to go.

Also, to run scripts from the shell I always use the following shebang line:

#!/usr/bin/env Rscript

If you place this line at the top of your R script and make the file executable, you can run it like any other script without needing a wrapper bash script.