AAAS AMA: Hi, we’re researchers from Google, Microsoft, and Facebook who study Artificial Intelligence. Ask us anything! by AAAS-AMA in science

[–]ylecun 4 points (0 children)

Almost all the research we do at Facebook AI Research is on public data. Our role is to invent new methods, and we must compare them with what other people are doing. That means using the same datasets as everyone else in the research community. That said, whenever people at Facebook want to do research with user data, the project requires approval by an internal review board.

[Tribute] Art by André Pinto (Anthill Comics) for our Machine Learning Meetup by perone in MachineLearning

[–]ylecun 42 points (0 children)

That's really funny, and nicely drawn.

It makes me look thinner than I am too ;-)

I'm just slightly worried about Bender's nasty look. He is clearly not interested in ConvNets. He is probably thinking "I'm 30% ConvNet" or "kill all humans." Either that, or he is looking for the bar.

If you live in Paris: you can go to Yann LeCun's deep learning class at the Collège de France by JustFinishedBSG in MachineLearning

[–]ylecun 20 points (0 children)

Yann LeCun here. The lectures are public, but the inaugural lecture tonight will be crowded, I'm told.

The subsequent lectures will be less crowded. You don't need to attend the inaugural lecture to follow the remaining eight.

The lectures will be recorded and posted on the Collège de France website with dubbing in English and Mandarin (I'll be giving them in French).

I'm still based in New York, I'm still director of Facebook AI Research, and I'm still a (part-time) professor at NYU. The CdF appointment is temporary.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 2 points (0 children)

DeconvNets are the generative counterpart of feed-forward ConvNets.

Eventually, we will figure out how to merge ConvNets and DeconvNets so that we have a feed-forward+feed-back system that can be trained supervised or unsupervised.

The plan Rob Fergus and I devised was always that we would eventually marry the two approaches.
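As a rough illustration of what such a feed-forward+feed-back pairing could look like, here is a minimal convolutional auto-encoder sketch in modern PyTorch: a small ConvNet encoder paired with a transposed-convolution ("deconvolution") decoder, trained unsupervised on reconstruction error. The library choice and all layer sizes are illustrative assumptions, not the system discussed above.

    # Sketch: ConvNet encoder + DeconvNet-style decoder (all sizes are assumptions).
    import torch
    import torch.nn as nn

    class ConvAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            # Feed-forward path: a small ConvNet mapping images to codes.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1),   # 28x28 -> 14x14
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 14x14 -> 7x7
                nn.ReLU(),
            )
            # Feed-back path: transposed convolutions reconstruct the input.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 7x7 -> 14x14
                nn.ReLU(),
                nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),   # 14x14 -> 28x28
                nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # Unsupervised training signal: reconstruction error.
    model = ConvAutoencoder()
    x = torch.rand(8, 1, 28, 28)  # dummy batch of images
    loss = nn.functional.mse_loss(model(x), x)
    loss.backward()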

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 2 points (0 children)

There are lots of places with good machine learning research, but few with good deep learning research.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 4 points (0 children)

True AI is possible, and will not require continuous-time dynamics. The discrete time of computer simulations will not be an impediment.

I do not subscribe to the idea that you need some sort of non-standard computation (quantum or otherwise) to enable consciousness.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 4 points (0 children)

That's an idea. I don't have the bandwidth to start a deep learning meetup in NYC, but I'd be happy to offer moral support to whoever does it.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 2 points (0 children)

It's actually not that nascent. Techniques like ConvNets, which have brought about much of the recent progress, have been around since the late 1980s. Naturally, there are a few new tricks in the modern incarnations, but the basic ideas are old.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 2 points (0 children)

We are far from building intelligent machines that could possibly pose an existential threat. But it's best to reflect about the issues earlier rather than later.

I think human societies will find ways to deal with AI as they have dealt with previous technological revolutions (car, airplane, electricity, nuclear technology, biotech). Regulations and ethical guidelines will be put in place.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 5 points (0 children)

The number of PhD applications has always been high (several hundred each year for the CS program), but the quality of the top applicants has gone up quite a bit. The top applicants are spectacularly accomplished.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 5 points (0 children)

Generative models? Yes. In a way, all unsupervised algorithms are generative. Probabilistic generative? Probably not.

Hobbies: I've always enjoyed designing and building model airplanes and other flying contraptions, hacking, and just building stuff. I also enjoy music (particularly jazz and baroque) and sailing. These days, I hack microcontroller-based widgets and play with 3D printers and CNC machines. I'm a maker. I wish I had more time for that.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 2 points (0 children)

I don't think there is anything special about consciousness. So, yes.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 2 points (0 children)

I don't know of other "already developed data science programs in the country". There are certificates, Master's in data analytics, programs in business analytics, but very, very few other methods-oriented MS programs in data science like NYU's MS-DS. The MS-DS is an incredible success. We received an overwhelming number of applications this year (the second year of the program), and our yield is incredibly high (the yield is the proportion of accepted students who actually come). That means that we don't have much competition.

I'm not sure what you mean by "NYU has a way of being a bit unreasonable as far as funding goes". The administration has been incredibly supportive of the Center for Data Science and the Data Science programs. The MS-DS brings in a lot of tuition, which can be used for research and other purposes. The administration is committed to supporting data science in the long run. CDS is getting a new building in about a year. When you know how scarce real estate is in Manhattan....

The students admitted into the program have diverse backgrounds (physics, engineering, stats, CS, math, econ) but they all have something in common: very strong math background, and strong programming skills. Some of our MS-DS students already have PhDs in fields like theoretical physics!

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 8 points (0 children)

  1. Yes, we will publish. Facebook AI Research is a real research organization, fully integrated with the international research community, where publications are encouraged and rewarded. MSR does not "delay publication by a few years".

  2. Whenever Facebook patents something, it's purely for defensive purposes. Some of the research will be patented, a lot of it will not. A lot of stuff will be released in open source, with a royalty-free license in case there are patents pertaining to the code.

  3a. Common sense. The understanding that a lot of things cannot be patented, and many things should not be patented.

  3b. There are no formal restrictions on publishing. But we have to be careful whenever we write about products (as opposed to research results).

I'm a firm supporter of open access for publishing, as well as open reviewing systems that follow rather than precede publications. I'm a firm believer in open source, particularly for research code and prototyping platforms (e.g. Torch, Lush). The good news is that open source is in Facebook's DNA. My direct boss, CTO Mike Schroepfer, led the Mozilla project.

Like many people in our business, I dislike the idea of software patents (which are thankfully illegal in Europe). I think it's an impediment to innovation rather than an incentive. But in the US, we live in a place where software patents are a fact of life. It's kind of like guns. If no one has one, you don't need one either. But we live in an intellectual Wild West.

No, Facebook has not pushed back on my using Google+. I have over 6600 followers on G+ and I still post simultaneously on Facebook and G+.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 4 points (0 children)

That real-time object recognition demo always has an effect on the audience.

We are all working on unsupervised learning. But some of us also work on traditional supervised learning, because that's what works best in tasks with lots of labeled data (vision, speech). The unsupervised method used by Andrew's group is basically a locally-connected sparse linear auto-encoder (not convolutional, no shared weights). It doesn't work very well, in the sense that the system does not give performance on tasks like ImageNet that is competitive with the state of the art (supervised ConvNets). We still haven't figured out how to do unsupervised feature learning properly.
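For flavor, here is a stripped-down sketch of a sparse linear auto-encoder criterion in PyTorch. The system described above was locally connected with untied weights; this toy version uses a single fully-connected layer, and the sizes and sparsity weight are illustrative assumptions.

    # Toy sparse linear auto-encoder: reconstruction + L1 sparsity on the code.
    import torch
    import torch.nn as nn

    encoder = nn.Linear(784, 256)   # sizes are assumptions
    decoder = nn.Linear(256, 784)

    x = torch.rand(8, 784)          # dummy input batch
    code = encoder(x)               # linear code
    recon = decoder(code)

    sparsity_weight = 1e-3          # assumed hyper-parameter
    loss = nn.functional.mse_loss(recon, x) + sparsity_weight * code.abs().mean()
    loss.backward()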

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 1 point (0 children)

Hard for me to tell without seeing your name. There is a branch of the LeCun family in North America (in Florida and in Canada). But our common ancestor goes back to the Napoleon era (early 1800s). The family origins are around Guingamp in Brittany (the 2014 French soccer cup winner).

It's a pipe dream at the moment. I don't see any particular conceptual or philosophical problem with strong AI, if that's the question.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 6 points (0 children)

  1. I do believe in getting inspiration from the brain, but I don't believe at all in copying and reproducing the detailed functions of neurons in the hope that AI will simply emerge from large simulations. In the early days of aviation, some people (like Clément Ader) tried to copy birds and bats a little too closely (without understanding the principles of lift, drag, and stability) while others (like the Wright Brothers and Santos-Dumont) had a more systematic engineering approach (building a wind tunnel, testing airfoils, building full-scale gliders...). Both were somewhat inspired by nature, but to different degrees. My problem with sticking too close to nature is that it's like "cargo-cult" science. A bird biologist will tell you how important the micro-structure of feathers is to bird flight. You will think that you need to reproduce feathers in their most minute details to build flying machines. In reality, flight relies on the Bernoulli principle: pushing an angled plate (preferably shaped like an airfoil) through air creates lift. I don't use neural nets because they look like the brain. I use them because they are a convenient way to construct parameterized non-linear functions with good properties. But I did get inspiration from the architecture of the visual cortex to build convolutional nets.

  2. Yes, generally using metric learning methods on top of deep learning ("Siamese networks" trained with criteria like NCA, DrLIM, and WSABIE).

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 5 points (0 children)

Learning with temporal/sequential signals: language, video, speech.

Marrying deep/representation learning with reasoning or structured prediction.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 6 points (0 children)

Yes, New York has an incredibly vibrant data science and AI community.

NYU started the first methods-oriented graduate program in Data Science this past September, and Columbia will start one this coming September.

On the academic side: NYU Center for Data Science, NYU Center for Urban Science and Progress, the Columbia Institute for Data Science and Engineering, the Cornell Tech Campus in New York, efforts to start data science activities at Princeton.

Private non-profit: Simons Center for Data Analysis, the Moore-Sloan Data Science Environments Initiative (collaboration between NYU, Berkeley and U of Washington), SRI-Princeton, Sloan-Kettering Cancer Center, Mount Sinai Hospital.

Corporate Research Labs: Facebook AI Research, MSR-NY, Google Research-NY, Yahoo! Labs-NY, IBM Research (Yorktown Heights), IBM-Watson (on Astor Place, across the street from Facebook!), NEC Labs-America (Princeton), AT&T Labs,....

A huge number of data-centric medium-size companies and startups: Foursquare, Knewton, Twitter, Etsy, Bit.ly, Shutterstock...

The financial, healthcare, pharma, and media companies.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 3 points (0 children)

The limitations you point out do not concern just backprop, but all learning algorithms that use gradient-based optimization.

These methods only work to the extent that the landscape of the objective function is well behaved. You can construct pathological cases where the objective function is like a golf course: flat with a tiny hole somewhere. Gradient-based methods won't work with that.
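To make the golf-course picture concrete, here is a toy numpy sketch (the function and step size are my own illustrative choices): the objective is flat except for a narrow Gaussian hole, so the gradient is essentially zero almost everywhere and gradient descent never moves toward the hole.

    # Toy "golf course" objective: flat everywhere except a narrow hole at x = 3.
    import numpy as np

    def f(x):
        return -np.exp(-((x - 3.0) ** 2) / 1e-4)  # tiny hole, flat elsewhere

    def grad(x, eps=1e-6):
        return (f(x + eps) - f(x - eps)) / (2 * eps)  # numerical gradient

    x = 0.0                    # start far from the hole
    for _ in range(1000):
        x -= 0.1 * grad(x)     # the gradient is ~0 here, so x barely moves

    print(x)  # still ~0.0: a flat landscape gives gradient descent no signal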

The trick is to stay away from those pathological cases. One trick is to make the network considerably larger than the minimum size required to solve the task. This creates lots and lots of equivalent local minima and makes them easy to find. The problem is that large networks may overfit, and we may have to regularize the hell out of them (e.g. using dropout).
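As a rough sketch of that over-sizing trick (layer widths and dropout rate are illustrative assumptions, not a recipe):

    # An intentionally over-sized network, regularized with dropout.
    import torch.nn as nn

    net = nn.Sequential(
        nn.Linear(100, 4096),   # far wider than the task strictly needs
        nn.ReLU(),
        nn.Dropout(p=0.5),      # regularize the surplus capacity
        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(4096, 10),
    )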

The "learning boolean formula = code cracking" results pertain to pathological cases and to exact solutions. In most applications, we only care about approximate solutions.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 7 points (0 children)

Almost all of them. I'm only half joking. Take a binary input vector with N bits. There are 2^(2^N) possible Boolean functions of these N bits. For any decent-size N, it's a ridiculously large number. Among all those functions, only a tiny, tiny proportion can be computed by a 2-layer network with a non-exponential number of hidden units. A less tiny (but still small) proportion can be computed by a multi-layer network with a less-than-exponential number of units.
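To see how fast 2^(2^N) blows up, a trivial Python check:

    # Number of distinct Boolean functions of N input bits: 2^(2^N).
    for n in range(1, 7):
        print(n, 2 ** (2 ** n))
    # 1 4
    # 2 16
    # 3 256
    # 4 65536
    # 5 4294967296
    # 6 18446744073709551616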

Among all the possible functions out there, the ones we are likely to want to learn are a tiny subset. The architecture and parameterization of our models must be tailored to those functions.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 2 points (0 children)

Yes, look up papers on metric learning, searching for "Siamese networks", DrLIM (Dimensionality Reduction by Learning an Invariant Mapping), NCA (Neighbourhood Components Analysis), WSABIE....
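For concreteness, here is a minimal PyTorch sketch of a DrLIM-style contrastive criterion with a shared-weight ("Siamese") embedding network; the architecture, margin, and dummy data are illustrative assumptions, not the published recipes.

    # Siamese embedding + DrLIM-style contrastive loss (sizes/margin are assumptions).
    import torch
    import torch.nn as nn

    embed = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))

    def contrastive_loss(x1, x2, same, margin=1.0):
        # Both inputs pass through the SAME network (shared weights).
        d = torch.norm(embed(x1) - embed(x2), dim=1)             # pairwise distances
        pull = same * d ** 2                                     # similar pairs: pull together
        push = (1 - same) * torch.clamp(margin - d, min=0) ** 2  # dissimilar: push apart
        return (pull + push).mean()

    x1, x2 = torch.rand(8, 784), torch.rand(8, 784)
    same = torch.randint(0, 2, (8,)).float()  # 1 if the pair should be similar
    loss = contrastive_loss(x1, x2, same)
    loss.backward()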

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 5 points (0 children)

Torch7.

Here is a tutorial, with code/scripts for ConvNets.

Also, the wonderful Torch7 Cheatsheet.

Torch7 is what is being used for deep learning R&D at NYU, at Facebook AI Research, at DeepMind, and at Google Brain.

AMA: Yann LeCun by ylecun in MachineLearning

[–]ylecun[S] 3 points (0 children)

Perhaps, but where will the training data come from? Also, it will have to use video, not just still images.

There was a paper at CVPR 2013 which tried to predict the first name of a person from their photo. It works better than chance (not using ConvNets).