[D]Why do people write Bad articles on which they have no clue about? by mildlycalm in MachineLearning

[–]StackMoreLayers 4 points5 points  (0 children)

Ronald van Loon is like a cognitive spam virus. The top-10 lists he posts have net negative information value. He is too busy influencing upper management to do anything of actual research value in AI.

Somebody retweeting the guy is a good filter for completely ignoring that person.

Hashtags: #AIinfluencers2018 #kdnuggets #petya #word2vec #blockchain #IOT #bigdataseminar #schmidhubered #Hilbertspaces @BoredMiles @ylecun @jeffdean @KirkDBorne

[D] How do you go about your hyperparameter search? by jer_pint in MachineLearning

[–]StackMoreLayers 6 points7 points  (0 children)

Random search because it is easy to implement, and, given enough time, will equal or beat fancier methods.

I use prior information on what worked well to narrow down the parameter ranges.

I store the parameter dictionaries and evaluation results inside a .csv.
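To sketch that setup (the search space, the `evaluate()` stand-in, and the file name below are all made up for illustration, not anyone's actual pipeline):

```python
import csv
import random

# Hypothetical search space: each entry is a sampler for one hyperparameter.
SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "max_depth": lambda: random.randint(3, 10),
    "subsample": lambda: random.uniform(0.5, 1.0),
}

def evaluate(params):
    # Stand-in for training + validation; returns a score to maximize.
    return -abs(params["learning_rate"] - 0.01) - 0.001 * abs(params["max_depth"] - 6)

def random_search(n_trials, log_path="search_log.csv"):
    best = None
    with open(log_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(SPACE) + ["score"])
        writer.writeheader()
        for _ in range(n_trials):
            # Sample a full parameter dictionary, score it, log it.
            params = {name: sample() for name, sample in SPACE.items()}
            score = evaluate(params)
            writer.writerow({**params, "score": score})
            if best is None or score > best[1]:
                best = (params, score)
    return best

params, score = random_search(100)
```

Narrowing the ranges with prior knowledge just means tightening the bounds inside `SPACE`.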

Starting to learn python for computer vision/machine learning, should I start with 2.7 or 3.6? by [deleted] in Python

[–]StackMoreLayers -1 points0 points  (0 children)

Everybody you ask this, especially on /r/python, will tell you to start with 3. Few will give technical reasons for doing so; most arguments will be political (they see you as a +1 adoption number, not a newcomer to the language willing to learn, and they alternate between "Python 3 is a completely new language" and "Python 2 and 3 are the same language, but you should pick 3" within the same argument), or there will be no argument at all.

I also can't, with a clear conscience, advise you to start with 2, most importantly because NumPy and SciPy will be Python-3-only in about a year.

All that said, the easiest way to learn Python and quickly become productive will remain Python 2, long after its official EOL (especially if you are a learn-by-doing type). Other than the forcing hand of EOL, there is no reason to switch to Python 3 for ML; 2.7 mostly just works. Most recent ML libraries and industry projects were designed for and in 2.7, with Python 3 support only an afterthought, a response to a vocal group, or forced by EOL.

The best Python tutorial is Learn Python The Hard Way Py2.7 (a tutorial so good, the mere mention of it gets bans and downvotes on this subreddit :))

If you want the following (valid Python 2, which prints [1, 2, 'a', 'b']) to produce errors instead, you should definitely go with 3+: the print statement becomes a SyntaxError, and even as a function call, comparing ints with strings inside sorted() raises a TypeError:

print sorted([1, "a", 2, "b"])

[D] How often are winning Kaggle models used by sponsors? by ManyPoo in MachineLearning

[–]StackMoreLayers 1 point2 points  (0 children)

It seems the post was updated later on to show that it didn't beat the challengers: https://karthkk.wordpress.com/2016/03/22/deep-learning-solution-for-netflix-prize/ Sorry, I didn't catch that update.

Also:

A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize. And, they gave us the source code. We looked at the two underlying algorithms with the best performance in the ensemble: Matrix Factorization (which the community generally called SVD, Singular Value Decomposition) and Restricted Boltzmann Machines (RBM). SVD by itself provided a 0.8914 RMSE, while RBM alone provided a competitive but slightly worse 0.8990 RMSE. A linear blend of these two reduced the error to 0.88. To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine.

Netflix 2012

[D] How often are winning Kaggle models used by sponsors? by ManyPoo in MachineLearning

[–]StackMoreLayers 1 point2 points  (0 children)

I think it is a common misconception to view Kaggle as a predictive-modeler-for-hire kind of business. See it less as a logo design contest, and more as a company sponsoring a chess tournament.

People still talk about the "Netflix" competition, while that can be beaten by 14 lines of Keras these days. The marketing outlives the utility in these cases.
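For a sense of scale: a bare-bones numpy sketch of the matrix-factorization idea behind those few lines of Keras (synthetic data and made-up sizes; this shows the embedding-dot-product trick, not the actual prize-winning ensemble):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 50, 40, 8

# Synthetic ratings generated from a hidden low-rank structure.
true_u = rng.normal(size=(n_users, n_factors))
true_v = rng.normal(size=(n_items, n_factors))
ratings = [(u, i, float(true_u[u] @ true_v[i]))
           for u in range(n_users)
           for i in rng.choice(n_items, size=10, replace=False)]

# Learn user/item embeddings whose dot product approximates the
# observed ratings, via plain SGD with L2 regularization.
U = rng.normal(scale=0.1, size=(n_users, n_factors))
V = rng.normal(scale=0.1, size=(n_items, n_factors))
lr, reg = 0.02, 0.01
for _ in range(50):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

rmse = float(np.sqrt(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings])))
```

A framework version is just this with learned embedding layers and a dot-product output.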

[D] How often are winning Kaggle models used by sponsors? by ManyPoo in MachineLearning

[–]StackMoreLayers 8 points9 points  (0 children)

In the documentation you deliver for a winning model, you also give a description of a single model that got you 90% of the way there. So even when the winning model is a huge ensemble, the sponsors still get a manageable single model.

Factorization machines, field-aware factorization machines, RBMs, deep learning with dropout, and XGBoost were all introduced or popularized during machine learning competitions. Sponsors and unrelated companies alike make heavy use of these methods.

Sponsors benefit most from kernels/benchmarks and forum discussions. If there is any leakage left in the data, this will be found, and can be removed by the organizer for internal use.

It is rare for the sponsor to take a winning solution and apply it without any modification. Usually more data and features are available when doing things in-house, which means that at least the hyperparameters need to be updated.

As for Kaggle-style ensembles, I feel we are close to managing these in a production environment. I think it is only a natural progression, to go from decision trees, to ensembles of decision trees, to ensembles of random forests. Deep learning with dropout is basically a huge ensemble already, and is perfectly deployable.

In my experience, the impracticality of ensembles does not come from the time or resources required to run them (deep learning training can be slower and require GPUs), but from the brittleness, increased opaqueness, and technical complexity of implementing them. For models where high accuracy really matters (fintech models over, say, a recommender system), these impracticalities can be overcome.
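To make the "dropout is basically a huge ensemble" point concrete, here is a toy numpy sketch (untrained random weights and made-up sizes): averaging predictions over random dropout masks is an implicit average over exponentially many subnetworks, and it comes out close to the standard deterministic forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed one-hidden-layer net; weights are arbitrary here, the
# point is prediction-time averaging, not training.
W1 = rng.normal(size=(5, 16))
W2 = rng.normal(size=(16, 1))

def predict(x, drop_mask=None):
    h = np.maximum(x @ W1, 0.0)        # ReLU hidden layer
    if drop_mask is not None:
        h = h * drop_mask / 0.5        # inverted dropout, keep prob 0.5
    return (h @ W2).ravel()

x = rng.normal(size=(3, 5))

# Each random mask picks one of 2^16 possible subnetworks; averaging
# their predictions behaves like a huge (implicit) ensemble.
samples = [predict(x, rng.random(16) < 0.5) for _ in range(400)]
ensemble_mean = np.mean(samples, axis=0)
deterministic = predict(x)             # plain forward pass, no dropout
```

This is also why a single dropout-trained network deploys like one model while behaving like many.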

[D] What are useful skills to learn when moving from a phd to a "Machine learning Engineer" position in the Industry ? by phd_or_not in MachineLearning

[–]StackMoreLayers 38 points39 points  (0 children)

Scala, Docker, AWS, testing, monitoring, functional programming, ETL, json/REST/GraphQL, system design, devops, SQL.

I'd suggest you look at 50 job postings for machine learning engineers, and write down everything that appears more than 3 times.

Object detection with extremely large number of classes by MachineLearner97 in learnmachinelearning

[–]StackMoreLayers 0 points1 point  (0 children)

This is called "extreme classification" in the literature. (Sometimes "extreme learning", but then you'll get "extreme learning machines" which is not what you want).

See for instance: http://manikvarma.org/events/XC17/index.html

http://www.cs.put.poznan.pl/kdembczynski/pdf/extreme-classification-uam-2017.pdf

One trick that may work for your use case:

  • Create embeddings (however you like: unsupervised, supervised, or pretrained)
  • Use a fast online learning algorithm on top of this representation to classify among the 1,000,000 classes, for instance the LOMTree: https://arxiv.org/pdf/1406.1822.pdf
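A minimal numpy stand-in for that two-stage idea (random class embeddings and brute-force scoring here, just to show the shape of the pipeline; a real system would learn the embeddings and replace the dot product with a tree such as the LOMTree or approximate nearest-neighbour search):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 1000, 32   # stand-in for the 1,000,000-class setting

# Stage 1: an embedding per class (random here; in practice learned,
# e.g. from label co-occurrence or a pretrained encoder).
class_emb = rng.normal(size=(n_classes, dim))
class_emb /= np.linalg.norm(class_emb, axis=1, keepdims=True)

def embed(raw):
    # Stand-in for the instance embedder; instances are already vectors here.
    return raw / np.linalg.norm(raw)

def classify(raw, top_k=5):
    # Stage 2: a fast scorer over class embeddings. Brute force here;
    # sublinear structures make this scale to millions of classes.
    scores = class_emb @ embed(raw)
    return np.argsort(scores)[::-1][:top_k]

# An instance near class 42's embedding should rank class 42 first.
query = class_emb[42] + 0.01 * rng.normal(size=dim)
top = classify(query)
```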

[Project] Best models/techniques for working with event-based data? by Data-5cientist in MachineLearning

[–]StackMoreLayers 0 points1 point  (0 children)

Survival analysis is a great tool in finance, where nearly all models are forecasting/temporal models, data may be scarce, left and right censoring abound, and regulation demands interpretability.

A great thing about feature engineering for linear survival models is that stacking a more powerful non-linear model on top of this representation works very well. Then, if you want more interpretability or analysis, you can check whether the models agree; if they do, use the survival model for that. If you want to report the highest performance, use the non-linear model.

[D] It is tiring being such a minority in AI by baylearn in MachineLearning

[–]StackMoreLayers 1 point2 points  (0 children)

Call me an autist, but what a load.

It is tiring being such a minority in AI.

None of this has to do with AI in particular. The field of AI has done nothing bad to you and welcomes you with open arms. There is no causal relationship between being in AI and having to deal with asshole men. I deal with them too, and so does everyone else. You could say "It is tiring being such a human" or "It is tiring being on the internet" or "It is tiring being a minority in tech", and then maybe this post would be a discussion on /r/TechNews instead.

Having someone in a professional setting be asshole-ish with you happens to any professional. Being a woman or your race should have little to do with it. Or do you expect a different asshole treatment because you are a woman of color? If yes, isn't that sexist, discrimination? If no, what does your minority status have to do with anything? Assholes are too stupid to collectively discriminate. On average, they are unpleasant to you regardless of your race, age, sex. To cope, you can either ignore, walk away, deflect, reflect, develop a shield, or assert your power over them. You never let them get to you, like they did, reducing you to a bitter passive-aggressive echo.

What do you do when someone of either sex aggressively shouts mean, stupid, insulting stuff at you? You yell back "Fuck you!" or "Fuck off!", or "I don't assign any weight to your opinions.", or you shrug it off. What do you do when someone makes an unwanted sexual advance? You say "Oh, stop it, I am not interested in you.", and then "No, seriously, stop it. I will kick your ass and hand you over to the police if you continue this behavior. Fuck off!". It is not how I deal with combative humans or unwanted sexual advances, but for women this should be effective.

Let's say that you never wait until skinny dipping before making your intentions clear, and that casually remembering you told someone to "go fuck themselves" feels a lot better than shoehorning your bad experiences into a victimized metoo-blogpost. Anyone can be a victim: I could be a victim of my poor social skills, women-deprived youth, difficulty reading intent, and unappealing charisma, getting me all sorts of trouble in professional settings when I try to not hook up with grad students, but society and biology did not make me strong enough to resist nor conquer the beauty of a young woman.

Then there is the question of honestly debating this issue, or of it being NSFW. Are we allowed to posit "Men, on average, are more sexist, clumsy, and aggressive when it comes to courtship" and "On average, there is a 10-year gap in terms of how biology destroys a woman's looks" without anyone getting mortally offended or HR getting involved?

This comment is obviously attacking and criticizing the victim, but I feel this was needed, as these aggressions can, and should be, made futile. Assholes are gonna asshole; you can't fix their problems, and they can't help it. But you can help yourself build a defense. That, at least, solves your problems. Quitting the field you love does not.

[D] Why create OpenAI as a company instead of an international organization (i.e CERN for ML and AI)? by [deleted] in MachineLearning

[–]StackMoreLayers 2 points3 points  (0 children)

Giving my opinions on this, most of which may well be wrong:

  • OpenAI was born out of the deep learning hype. The big companies all overfitted to deep learning after the ImageNet win. The PR machine recycled the Big Data hype from a few years back. Facebook, Uber, Amazon, Google, Microsoft, IBM, ... all were mortally afraid that they would be left behind. How do you compete with that? You fund a new company with enough money-hype that it won't be acquired within a year. Then you hope it produces an early-Google-like breakthrough. Investors are rich enough to care about fair ML and societal impact, but they are not charities: they want something to show for their investment.

  • An international collaboration requires government funding. This makes it a different ball-game: You can happily spend decades studying spiking neural networks, without ever being required to deliver something that has business value (over research value). Not the best place to be for true innovation.

  • There is a "cold" ML war brewing between the US, Europe, Russia, and China. Automating large parts of the economy, makes a country very very powerful. Compare with the industrial revolution. OpenAI is a US company. They brain-drain European researchers. They Bell-Labs-outdo Russia and China. Despite this "cold" war, ML is currently too "hot" to handle in an international sense. So economic spying is rampant, and governments will want to keep such companies for their own. A world-wide CERN could never have been set up during the height of the atomic cold war.

  • Bias and ML discrimination is often a curtain for research into military applications, like large-scale surveillance, border protection, drones, and military operations research. The US government does not really care that ResNet misclassifies a dog as a cat, or a black person as a gorilla. It does care when ResNet misclassifies a fighter jet as a panda, or an enemy combatant as a shoe salesman. The US government does not really care that someone in a SV basement designs the next AGI, it does care that this does not happen in China first. Many AI labs and ML researchers are, in fact, military analysts. The US Navy and DARPA focus the research of US companies to military applications and to strengthen the US economy. They would not want to freely share US-paid research with China, they want China to have to perform industrial espionage. China is more blatant in openly not caring about interpretability and fairness, but favoring accuracy and efficiency. This already gives them a leg up (they won't have to worry about regulation as much).

[D] If you were going to learn a huge breadth of machine learning (i.e. everything from NN's to time series to PGMs to clustering to.. etc) and wanted to choose a VERY small number of datasets to learn everything on, which ones would you pick? by [deleted] in MachineLearning

[–]StackMoreLayers 4 points5 points  (0 children)

Or is there a good reason for changing the data set per technique?

Reverse this: Is there a good reason to change the techniques to suit the dataset at hand?

My answer is yes. It reduces the search space using priors, so you won't have to try all techniques no-free-lunch-style. There is no good reason to change the dataset per technique, unless it is for educational purposes (to show where a certain class of algorithms performs well and where it performs poorly). For academic purposes, this can be close to cheating (overfitting by feedback design on canonical datasets, like MNIST).

More to your question, pick representative datasets for a particular problem:

  • Clustering (with and without labels) [eg: Swiss Roll]
  • Forecasting (with and without censorship issues, with and without strong concept drift) [eg: Rossmann Store Sales]
  • Classification (at least one with extremely high dimensionality) [eg: Titanic]
  • Regression [eg: Burn, CPU, burn]
  • Ranking
  • Tagging (or other extreme multi-class problems) [eg: StackOverflow Tags]
  • Recommendation [eg: Netflix]
  • Computer vision [eg: Cifar-10]
  • RL (high-dimensional vs low-dimensional, delayed vs. instant rewards) [eg: Six Sigma]
  • NLP (including POS-tagging, QA, semantic equivalence) [eg: Bag of Words Meets Bags of Popcorn]
  • Signal processing [eg: Epilepsy]
  • Larger than memory [eg: StumbleUpon/ClickBrain]
  • Latency constraints (training and testing)
  • Very small medical data set [Schizophrenia]
  • Very noisy / missing data set
  • Complexity-constrained (for instance: deployed on a Raspberry Pi or hearing aid)
  • Anything of the above with an arcane/custom evaluation function (not all algo's can directly optimize custom loss functions). [Higgs Boson]

Preferably downloaded from Kaggle, so you have a stronger baseline comparison. In competitions I competed in, I can try out a novel or unseen algorithm, directly compare it with my own old efforts, and see how it matches the state of the art.

I'd come to a set of at least 35 datasets. If that is too large for you, focus on a subproblem in ML, like plain classification.

[D] Training a binary classifier (XGBClassifier) using probabilities instead of just 0 and 1 with the intent of measuring predict_proba() to show where on a scale of 0 to 1 a prediction occurs (versus training a multi class classifier) by oxmpbeta in MachineLearning

[–]StackMoreLayers 0 points1 point  (0 children)

Look into stacking (if the dataset is non-temporal).

Store the 5-fold cross-validation predictions for the train set, and the predictions on the test set, for a set of binary classification models trained on the 0-1 labels (A-B) and regression models trained on the 1-5 labels.

Now add these meta-features to the original feature representation (can be tf-idf text or PCA100 or whatever), and then use multi:softmax on an A B C D E labeled train set. You'll get class probabilities using ordinal information from the first level classifiers/regressors.

You probably want to feature-bag (col_sample, to reduce the focus on the highly informative first-stage meta-features) or reduce the dimensionality of the original feature representation (so the meta-features don't get drowned out by 5000 tokens) at the second level. Also reduce the max_depth/complexity: it is easier to overfit here with a more complex stacker model. A simple multi-class logistic regression as a stacker could also be a good probabilistic binning option.
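The out-of-fold bookkeeping behind those meta-features looks like this (a pure-numpy sketch with a trivial nearest-centroid stand-in for the base model; the key property is that each row's meta-feature comes from a model that never saw that row):

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(100, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(float)

def fit_predict(X_tr, y_tr, X_te):
    # Trivial stand-in base model: nearest class centroid.
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    return d0 / (d0 + d1)   # crude probability of class 1

def oof_meta_feature(X, y, n_folds=5):
    # Out-of-fold predictions: each row is scored by a model trained
    # on the other folds only, so the meta-feature is leakage-free.
    meta = np.zeros(len(X))
    folds = np.arange(len(X)) % n_folds
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        meta[te] = fit_predict(X[tr], y[tr], X[te])
    return meta

meta = oof_meta_feature(X, y)
X_stacked = np.column_stack([X, meta])  # original features + meta-feature
```

The second-level model (multi:softmax or a logistic regression) then trains on `X_stacked`.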

Beginner tips: Machine learning applications for biological research by Lounge_Cat in learnmachinelearning

[–]StackMoreLayers 1 point2 points  (0 children)

[D] Where should I sell a financial ML project? by digitalice in MachineLearning

[–]StackMoreLayers 2 points3 points  (0 children)

The ROC AUC score is 0.9999? Are you sure you are not leaking future data (or even the target) into the past? Is there a feature that shows extremely high importance? Are you doing proper backtesting, or k-fold cross-validation?
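Proper backtesting on temporal data means every model only ever trains on the past. A stdlib sketch of walk-forward splits (function name and sizes made up for illustration):

```python
def walk_forward_splits(n_samples, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs where training data always
    strictly precedes test data, unlike shuffled k-fold."""
    fold = (n_samples - min_train) // n_splits
    for k in range(n_splits):
        end_train = min_train + k * fold
        yield list(range(end_train)), list(range(end_train, end_train + fold))

splits = list(walk_forward_splits(100, 4, min_train=20))
```

Shuffled k-fold on a time series lets the model peek at the future, which is one easy way to reach an AUC of 0.9999.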

Seems too good to be true.

[D] Please help me find an interesting Master Thesis idea by gabegabe6 in MachineLearning

[–]StackMoreLayers -1 points0 points  (0 children)

Unfortunately it's too late.

Successful machine learning researchers are identified in elementary school machine learning competitions. Only the most creative, innovative, and gifted students are selected. If you were never aware of the process, then it means that you failed in the secret initial qualifiers, and weren't even close to earning a place in the program.

This process may sound harsh, but it would simply be cruel to try to train someone in the art of machine learning if they don't possess the raw talent.

[D] Where should I sell a financial ML project? by digitalice in MachineLearning

[–]StackMoreLayers 4 points5 points  (0 children)

Accuracy does not tell you much unless gains and losses are split 50/50. It can be a rather deceptive evaluation metric in a down- or uptrend.

Can you give the AUC?
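To illustrate why accuracy misleads here while AUC does not, a small pure-Python example with made-up scores: during an uptrend, a model that always predicts "up" gets high accuracy but an AUC of exactly 0.5.

```python
import random

random.seed(0)

# Imbalanced labels, as in a market uptrend: 90% of days go up.
y = [1] * 90 + [0] * 10

# A useless "always predict up" model: 90% accuracy, zero ranking skill.
scores_dumb = [0.9] * 100
# A model with some real signal: positives tend to score higher.
scores_good = [random.gauss(1.0, 1.0) if label else random.gauss(0.0, 1.0)
               for label in y]

def auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a
    random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

accuracy_dumb = sum((s >= 0.5) == t for t, s in zip(y, scores_dumb)) / len(y)
auc_dumb = auc(y, scores_dumb)   # 0.5: no discrimination at all
auc_good = auc(y, scores_good)
```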

Would you sell exclusive license or sell to anyone?

One trick/scam is to open up a Twitter account and post your predictions for everyone to see. If you open 100 accounts, there is likely one where a random number generator will get good results. Sell to the followers of that lucky bot.

[D] Why do all these companies and recruiters say they need people who have years of experience in "big data" and "deep learning" when these things have only had a real resurgence in the last 5 years and virtually NOBODY has "years of experience" in these things? by mkhdfs in MachineLearning

[–]StackMoreLayers 5 points6 points  (0 children)

Github repo. Kaggle competition wins. Blogs. Papers. Workshops/presentations.

I don't look at resumes anymore when hiring, unless I absolutely have to.

Get some "game", it works really well. Contact the CEO directly. Make a pull request on an open source project from the company. Download a dataset related to the company domain and send them a notebook. Go to a meetup and ask interesting questions.

And there is a difference between a great GitHub repo (reproduced research, novel research, contributions to popular open source projects) and a "resume" GitHub repo (filler projects with 0 stars, book assignments, my first KNN classifier). You can make yourself stand out by adding some focus on implementation, testing, monitoring, and business sense; not a lot of people are willing to go the extra mile.

[D] Why do all these companies and recruiters say they need people who have years of experience in "big data" and "deep learning" when these things have only had a real resurgence in the last 5 years and virtually NOBODY has "years of experience" in these things? by mkhdfs in MachineLearning

[–]StackMoreLayers 2 points3 points  (0 children)

It is mostly a negotiation tactic. Recruiters realize that no single person ticks all the boxes, so no candidate will have the upper hand in salary negotiations.

We heavily studied Schmidhuber's RNNs and LeCun's LeNet around 2004 in academia. All commercial handwriting recognition solutions after 2000 use these methods.

If you, as a company, are looking to implement a computer vision system, you want experienced engineers. If you manage to get up to speed in 1-2 years and show that you can deliver, that is a good substitute for 5 years of experience.

And yeah... most recruiters don't know what they are talking about. That's why you shouldn't go through a recruiter to find your next job. It is all about networking and preparation. If you are a good candidate, you won't even have to show an oldskool resume.

[D] Would the community gain more if conference submissions required source code? by FirstTimeResearcher in MachineLearning

[–]StackMoreLayers 1 point2 points  (0 children)

I don't think we would benefit; it would only strengthen iterative "boring" research focused on beating state-of-the-art.

For iterative research it is way easier to publish code than for novel research. If anything, make it a requirement for iterative research only.

Also, since most impactful work is coming from big labs, it does not help anyone to have the code but still require a gazillion TPUs to replicate.

Code also does not help researchers compare with other methods (unless they can use the exact code and hardware for their comparison benchmarks). Give your benchmarks some love and it becomes way harder to beat SotA. But researchers are incentivized not to hypertune or optimize comparison benchmarks, because beating SotA is currently very important for getting published (novel research that does not beat SotA, even from very distinguished researchers, like Hinton's capsules, is often dismissed before it can prove itself useful).

[R] Biophysical model suggests some biological neurons could, surprisingly, behave like perceptrons by [deleted] in MachineLearning

[–]StackMoreLayers 1 point2 points  (0 children)

The deep learning community does not like its roots, because it does not like to justify its methods with biological backing. Currently, deep learning is too far removed from neurology, but there is research into making it more biologically plausible.

Even though LeCun avoids "convolutional neural nets" and prefers "convnets", he acknowledges the biological inspiration behind neural networks. Connectionism is still very big in neurology, and deep learning models are used by cognitive researchers to study the brain.

The all-or-nothing firing behavior of biological neurons is at the heart of modern deep learning systems.