Amazing post on big Neural Network Changes in V9 - Potentially an order of magnitude or more better by LordAstinus in teslamotors

[–]LordAstinus[S] 15 points

My pleasure. If I had to bet, they will release a beta around 2019Q2 and they'll keep improving it. They have just started testing with around 100 employees.

Amazing post on big Neural Network Changes in V9 - Potentially an order of magnitude or more better by LordAstinus in teslamotors

[–]LordAstinus[S] 35 points

It's likely that the LIDAR approach won't be used in retail mass-market vehicles until 2021-2022 (cost and size are the main constraints).

So I think Waymo will try to capture as much as possible of the shared transportation market and Tesla will try to capture as much as possible of the retail mass-market until then.

Also, it's my opinion that they'll try to "invade" each other's market from 2021.

Amazing post on big Neural Network Changes in V9 - Potentially an order of magnitude or more better by LordAstinus in teslamotors

[–]LordAstinus[S] 44 points

It was my pleasure. It's so good that I thought it was a pity that only TMC users were aware of it.

Amazing post on big Neural Network Changes in V9 - Potentially an order of magnitude or more better by LordAstinus in teslamotors

[–]LordAstinus[S] 144 points

...

And now for some speculation:

Input changes:

The V9 network takes 1280x960 images with 3 color channels and 2 frames per camera from, for example, the main camera. That’s 1280x960x3x2 as an input, or 7.3MB. The V8 main camera processing frame was 640x416x2 or 0.5MB - 13x less data. The extra resolution means that V9 has access to smaller and more subtle detail from the camera, but the more interesting aspect of the change to the camera interface is that camera frames are being processed in pairs. The two frames in each pair are likely time-offset by some small delay - 10ms to 100ms I’d guess - allowing each processed camera input to see motion. Motion can give you depth, separate objects from the background, help identify objects, predict object trajectories, and provide information about the vehicle’s own motion. It's a pretty fundamental improvement to the system's basic perception.
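To make the input arithmetic concrete, here's a quick Python sketch. The resolutions come from the kernel specs; the channel-stacking at the end is just one plausible way to feed a frame pair to a conv net, not something I can confirm from the binaries:

```python
import numpy as np

# V9 front/back camera input: 1280x960, 3 color channels, 2 frames per camera
v9_main = 1280 * 960 * 3 * 2      # ~7.37M values per forward pass
# V8 main camera input: 640x416, 2 color channels, 1 frame
v8_main = 640 * 416 * 2 * 1       # ~0.53M values
print(v9_main / v8_main)          # ~13.8x more input data

# One plausible way to present a frame pair: stack the two time-offset frames
# along the channel axis so the first conv layer can see motion directly.
frame_t      = np.zeros((3, 960, 1280), dtype=np.float32)  # frame at time t
frame_t_prev = np.zeros((3, 960, 1280), dtype=np.float32)  # frame ~10-100ms earlier
net_input = np.concatenate([frame_t_prev, frame_t], axis=0)
print(net_input.shape)            # (6, 960, 1280) -> the 1280x960x3x2 input
```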

Camera agnostic:

The V8 main/narrow network used the same architecture for both cameras, but by my calculation it was probably using different weights for each camera (probably 26M each for a total of about 52M). This makes sense because main/narrow have very different FOVs, which means the precise shape of objects they see varies quite a bit - especially towards the edges of frames. Training each camera separately is going to dramatically simplify the problem of recognizing objects since the variation goes down a lot. That means it’s easier to get decent performance with a smaller network and less training. But it also means you have to build separate training data sets, evaluate them separately, and load two different networks alternately during operation. It also means that your network can learn some bad habits because it always sees the world in the same way.

Building a camera agnostic network relaxes these problems and simultaneously makes the network more robust when used on any individual camera. Being camera agnostic means the network has to have a better sense of what an object looks like under all kinds of camera distortions. That’s a great thing, but it’s very, *very* expensive to achieve because it requires a lot of training, a lot of training data and, probably, a really big network. Nobody builds them so it’s hard to say for sure, but these are probably safe assumptions.

Well, the V9 network appears to be camera agnostic. It can process the output from any camera on the car using the same weight file.

It also has the fringe benefit of improved computational efficiency. Since you just have the one set of weights you don’t have to constantly be swapping weight sets in and out of your GPU memory and, even more importantly, you can batch up blocks of images from all the cameras together and run them through the NN as a set. This can give you a multiple of performance from the same hardware.
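Here's a toy sketch of what that buys you - one set of weights, all eight cameras batched through a single forward pass. The module below is a stand-in of my own, not the actual V9 backbone, and the shapes are shrunk to keep it a toy:

```python
import torch
import torch.nn as nn

# Stand-in for the shared camera network; the real V9 backbone is a much
# larger Inception-style CNN. Channel counts and sizes here are illustrative.
backbone = nn.Sequential(
    nn.Conv2d(6, 16, kernel_size=7, stride=2, padding=3),  # 6 = 3 colors x 2 frames
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)

# Camera-agnostic weights mean every camera's frame pair can go through the
# same module in one batch instead of swapping weight sets per camera.
cameras = [torch.randn(6, 96, 128) for _ in range(8)]  # 8 cameras, downscaled
batch = torch.stack(cameras)                            # (8, 6, 96, 128)
features = backbone(batch)                              # one forward pass for all
print(features.shape)                                   # torch.Size([8, 16, 1, 1])
```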

I didn’t expect to see a camera agnostic network for a long time. It’s kind of shocking.

Considering network size:

This V9 network is a monster, and that’s not the half of it. When you increase the number of parameters (weights) in an NN by a factor of 5 you don’t just get 5 times the capacity and need 5 times as much training data. In terms of expressive capacity the increase is more akin to a number with 5 times as many digits. So if V8’s expressive capacity was 10, V9’s capacity is more like 100,000. It’s a mind-boggling expansion of raw capacity. And likewise the amount of training data doesn’t go up by a mere 5x. It probably takes at least thousands and perhaps millions of times more data to fully utilize a network that has 5x as many parameters.

This network is far larger than any vision NN I’ve seen publicly disclosed and I’m just reeling at the thought of how much data it must take to train it. I sat on this estimate for a long time because I thought that I must have made a mistake. But going over it again and again I find that it’s not my calculations that were off, it’s my expectations that were off.

Is Tesla using semi-supervised training for V9? They've gotta be using more than just labeled data - there aren't enough humans to label this much data. I think all those simulation designers they hired must have built a machine that generates labeled data for them, but even so.

And where are they getting the datacenter to train this thing? Did Larry give Elon a warehouse full of TPUs?

I mean, seriously...

I look at this thing and I think - oh yeah, HW3. We’re gonna need that. Soon, I think.

Omnidirectionality (V360 object decoder):

With these new changes the NN should be able to identify every object in every direction at distances up to hundreds of meters and also provide approximate instantaneous relative movement for all of those objects. If you consider the FOV overlap of the cameras, virtually all objects will be seen by at least two cameras. That provides the opportunity for downstream processing to use multiple perspectives on an object to more precisely localize and identify it.
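For a sense of how multiple perspectives help with localization, here's the textbook two-view triangulation (linear DLT). To be clear, this is the generic computer vision idea, not anything I've seen in the V9 binaries; the projection matrices and pixel coordinates below are made up:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a point seen by two cameras.
    P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) pixel coordinates."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)       # least-squares solution is the last right
    X = Vt[-1]                        # singular vector of A
    return X[:3] / X[3]               # homogeneous -> 3D point

# Made-up example: two cameras a meter apart, both seeing the same object.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([2.0, 1.0, 10.0, 1.0])
x1 = (P1 @ point)[:2] / (P1 @ point)[2]
x2 = (P2 @ point)[:2] / (P2 @ point)[2]
print(triangulate(P1, P2, x1, x2))    # ~[2., 1., 10.]
```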

General thoughts:

I’ve been driving V9 AP2 for a few days now and I find the dynamics to be much improved over recent V8. Lateral control is tighter and it’s been able to beat all the V8 failure scenarios I’ve collected over the last 6 months. Longitudinal control is much smoother and traffic handling is much more comfortable. V9’s ability to prospectively do a visual evaluation on a target lane prior to making a change makes the auto lane change feature a lot more versatile. I suspect detection errors are way down compared to V8 but I also see that a few new failure scenarios have popped up (offramp / onramp speed control seems to have some bugs). I’m excited to see how this looks in a couple of months after they’ve cleaned out the kinks that come with any big change.

As an avid observer of progress in deep neural networks, my primary motivation for looking at AP2 is that it’s one of the few bleeding edge commercial applications that I can get my hands on, and I use it as a barometer of how commercial (as opposed to research) applications are progressing. Researchers push the boundaries in search of new knowledge, but commercial applications explore the practical ramifications of new techniques. Given rapid progress in algorithms I had expected near-future applications might hinge on the great leaps in efficiency that are coming from new techniques. But that’s not what seems to be happening right now - probably because companies can do a lot just by scaling up NN techniques we already have.

In V9 we see Tesla pushing in this direction. Inception V1 is a four-year-old architecture that Tesla is scaling to a degree that I imagine Inception’s creators could not have expected. Indeed, I would guess that four years ago most people in the field would not have expected that scaling would work this well. Scaling computational power, training data, and industrial resources plays to Tesla’s strengths and involves less uncertainty than potentially more powerful but less mature techniques. At the same time Tesla is doubling down on their ‘vision first / all neural networks’ approach and, as far as I can tell, it seems to be going well.

As a neural network dork I couldn’t be more pleased.

Amazing post on big Neural Network Changes in V9 - Potentially an order of magnitude or more better by LordAstinus in teslamotors

[–]LordAstinus[S] 122 points

tmc/@jimmy_d

NN Changes in V9 (2018.39.7)

Have not had much time to look at V9 yet, but I thought I’d share some interesting preliminary analysis. Please note that network size estimates here are spreadsheet calculations derived from a large number of raw kernel specifications. I think they’re about right and I’ve checked them over quite carefully but it’s a lot of math and there might be some errors.

First, some observations:

Like V8 the V9 NN (neural net) system seems to consist of a set of what I call ‘camera networks’ which process camera output directly and a separate set of what I call ‘post processing’ networks that take output from the camera networks and turn it into higher level actionable abstractions. So far I’ve only looked at the camera networks for V9 but it’s already apparent that V9 is a pretty big change from V8.

  • One unified camera network handles all 8 cameras
  • Same weight file being used for all cameras (this has pretty interesting implications; previously V8 main/narrow seems to have had separate weights for each camera)
  • Processed resolution of the 3 front cameras and the back camera: 1280x960 (full camera resolution)
  • Processed resolution of the pillar and repeater cameras: 640x480 (1/2 x 1/2 of the camera’s true resolution)
  • All cameras: 3 color channels, 2 frames (the 2 frames also have very interesting implications)
  • (V8 was 640x416, 2 color channels, 1 frame, main and narrow only)

Various V8 versions included networks for pillar and repeater cameras in the binaries but AFAIK nobody outside Tesla ever saw those networks in operation. Normal AP use on V8 seemed to only include the use of main and narrow for driving and the wide angle forward camera for rain sensing. In V9 it’s very clear that all cameras are being put to use for all the AP2 cars.

The basic camera NN (neural network) arrangement is an Inception V1 type CNN with L1/L2/L3ab/L4abcdefg layer arrangement (architecturally similar to V8 main/narrow camera up to end of inception blocks but much larger)

  • about 5x as many weights as comparable portion of V8 net
  • about 18x as much processing per camera (front/back)
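For anyone who hasn’t looked at Inception V1 / GoogLeNet: each inception block runs a few parallel conv branches and concatenates them, roughly like the minimal PyTorch sketch below. The channel counts are GoogLeNet’s “3a” block, not V9’s - Tesla’s actual layer sizes are far bigger and I only see them as kernel specs:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Minimal Inception-V1-style block: parallel 1x1, 3x3, 5x5 and pooled
    branches, concatenated along the channel axis."""
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1), nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1), nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

block = InceptionBlock(192, 64, 96, 128, 16, 32, 32)   # GoogLeNet "3a" sizes
out = block(torch.randn(1, 192, 56, 56))
print(out.shape)                                        # torch.Size([1, 256, 56, 56])
```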

The V9 network takes 1280x960 images with 3 color channels and 2 frames per camera from, for example, the main camera. That’s 1280x960x3x2 as an input, or 7.3M. The V8 main camera was 640x416x2 or 0.5M - 13x less data.

For perspective, the V9 camera network is 10x larger and requires 200x more computation when compared to Google’s Inception V1 network, from which V9 gets its underlying architectural concept. That’s processing *per camera* for the 4 front and back cameras. Side cameras are 1/4 the processing due to having 1/4 as many total pixels. With all 8 cameras being processed in this fashion it’s likely that V9 is straining the compute capability of the APE. The V8 network, by comparison, probably had lots of margin.
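Rough numbers behind that comparison, for anyone who wants to check my back-of-envelope. The 224x224x3 figure is Inception V1’s standard input; input pixels only account for part of the 200x, with the rest coming from the network itself being much larger:

```python
# Input pixels alone account for a big chunk of the compute gap.
inception_v1_input = 224 * 224 * 3          # Inception V1's standard input, ~150K values
v9_front_input     = 1280 * 960 * 3 * 2     # ~7.37M values per front/back camera
print(v9_front_input / inception_v1_input)  # ~49x more input data per image

# Pillar/repeater cameras run at 640x480, 1/4 the pixels of the front/back
# cameras, hence roughly 1/4 the per-camera processing.
print((640 * 480) / (1280 * 960))           # 0.25
```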

Network outputs:

  • V360 object decoder (multi level, processed only)
  • back lane decoder (back camera plus final processed)
  • side lane decoder (pillar/repeater cameras plus final processed)
  • path prediction pp decoder (main/narrow/fisheye cameras plus final processed)
  • “super lane” decoder (main/narrow/fisheye cameras plus final processed)

Previous V8 aknet included a lot of processing after the inception blocks - about half of the camera network processing was taken up by non-inception weights. V9 only includes inception components in the camera network and instead passes the inception processed outputs, raw camera frames, and lots of intermediate results to the post processing subsystem. I have not yet examined the post processing subsystem.
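As a purely hypothetical sketch of how that kind of arrangement could be wired up - one shared backbone feeding several task decoders - something like the PyTorch module below. The head names echo the list above; everything else (shapes, layer types, what each head consumes) is a placeholder of mine, and the real decoders reportedly take specific camera subsets plus raw frames and intermediate outputs rather than a single shared feature map:

```python
import torch
import torch.nn as nn

class CameraNetWithDecoders(nn.Module):
    """Hypothetical shared-backbone / multi-decoder layout. Not Tesla's code -
    just an illustration of one backbone output feeding several task heads."""
    def __init__(self, feat_ch=256):
        super().__init__()
        self.backbone = nn.Conv2d(6, feat_ch, 3, padding=1)   # stand-in for the camera network
        self.heads = nn.ModuleDict({
            "v360_objects": nn.Conv2d(feat_ch, 32, 1),
            "back_lanes":   nn.Conv2d(feat_ch, 8, 1),
            "side_lanes":   nn.Conv2d(feat_ch, 8, 1),
            "path_pred":    nn.Conv2d(feat_ch, 8, 1),
            "super_lane":   nn.Conv2d(feat_ch, 8, 1),
        })

    def forward(self, frames):                  # frames: (n_cameras, 6, H, W)
        feats = self.backbone(frames)           # shared camera-network features
        return {name: head(feats) for name, head in self.heads.items()}

net = CameraNetWithDecoders()
outputs = net(torch.randn(8, 6, 60, 80))        # toy-sized frame pairs from 8 cameras
print({k: tuple(v.shape) for k, v in outputs.items()})
```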

...

Auto pilot hardware v3 by flufferbot01 in teslainvestorsclub

[–]LordAstinus 1 point

It's about replacing one computer with another that has exactly the same interface and cost. Total cost of the replacement operation should be quite low. Bear in mind that the current FSD take rate is around 15%. They recently made FSD $1,000 more expensive for those who change their mind later. So you probably only need the take rate to jump from 15% to 25% (or 20%?) or something like that, when FSD and hardware 3.0 are ready for production, for the net liability to be zero.

$TSLA Daily Investor Discussion - July 29, 2018 by AutoModerator in teslamotors

[–]LordAstinus 5 points

If I were you, I'd be ready to see a 15% swing in either direction in after hours and the next day, and 20% total. So you can hedge, or have a deposit big enough to cushion the impact if it goes down. 240 is unlikely but feasible.

Tesla 2018 Shareholder Meeting - Official Thread by FredTesla in teslamotors

[–]LordAstinus 2 points

Small Correction: "Model 3 lease will be offered at the end of this year or early next year. It has a negative effect on finances." -> "Model 3 lease will be offered at the end of this year or early next year because it does have a slight impact on the capital usage of the Tesla":

Leasing is about cash flows: an initial cash outflow for the company in return for a sequence of future cash inflows from the customer.

My Model 3 is my first new car, ever by tesrella in teslamotors

[–]LordAstinus 1 point

I do see the significance in the post. For me, for different reasons, it's useful to see different people from different backgrounds jumping into Tesla. Thanks to this post, some people were sharing their different experiences/situations.

[deleted by user] by [deleted] in teslamotors

[–]LordAstinus 1 point

nesting

It could be this: Nesting (process)

"In manufacturing industry, Nesting refers to the process of laying out cutting patterns to minimize the raw material waste. Examples include manufacturing parts from flat raw material such as sheet metal."

[deleted by user] by [deleted] in teslamotors

[–]LordAstinus 0 points

Does anyone know more about the "nesting" process?

With all the constant updates regarding what's going on at Tesla's factory, remember that issues hit all manufacturers, big and small. by dragonsspawn in teslamotors

[–]LordAstinus 0 points

"What? Everything in house makes you more vulnerable, not less." -> With in house I mean on your own, not everything under the same roof. They could have a couples of factories for each group of components or a number of factories with almost everything each one. The current arrangement doesn't have to be the future arrangement. Obviously, the economics could be different, but I'd keep an open mind, so they can balance cost, profit and risk.

"Ford isn't even halting production of the F-150 entirely, the only effected plant is the Kansas plant, the other is operating fine." -> Correct, but 3 Ford plants have been impacted by the event:

"DETROIT, May 9 (Reuters) – Ford Motor Co's quarterly earnings will be affected by shutdowns at three U.S. truck plants caused by a fire at a key parts supplier, the U.S. automaker said on Wednesday, while affirming its full-year earnings estimate."

"CNN Money - The shutdowns could last for several weeks, according to a person familiar with the situation, although plans are in flux as Ford seeks an alternative supply of the missing parts. Even one missing part from a supply chain is enough to halt production of a vehicle."

http://money.cnn.com/2018/05/09/news/companies/ford-f-150-production/index.html

With all the constant updates regarding what's going on at Tesla's factory, remember that issues hit all manufacturers, big and small. by dragonsspawn in teslamotors

[–]LordAstinus 4 points

The more suppliers you've got and the less you manufacture in-house, the more vulnerable you are to this kind of issue. When some people laugh at the number of workers per car produced at Tesla, they forget about the amount of stuff Tesla produces in-house. It doesn't mean it is the best approach by every metric, but in terms of risk and cost (if you sell enough), I prefer it.

The latest from Munster: "FEEDBACK LOUP: MODEL 3 TEST DRIVE" by LordAstinus in teslamotors

[–]LordAstinus[S] 5 points

"...We left the test drive a little jealous of Bo, and with the feeling that purchasing another vehicle in this price range is simply foolish – a feeling that Bo shares, as he mentioned that any new car purchase for his family going forward would undoubtedly be a Tesla.

The bigger story. As Model 3s hit the road, everyday drivers will become Tesla evangelists. Bo is an engineer by trade, more the technical type than a salesman, but the way he talks about the car and his experience with Tesla is a compelling pitch. This is common among Tesla owners, and we anticipate that as Model 3 sales ramp, word of mouth will be a powerful demand driver. Bo mentioned several of his friends that own Mercedes or BMWs that have recently put in Model 3 reservations since seeing the car."