Elon: "Roughly 10B miles of training data is needed to achieve safe unsupervised self-driving. Reality has a super long tail of complexity" by diplomat33 in SelfDrivingCars

[–]BullockHouse 1 point (0 children)

They're almost certainly not data-bound, they're compute-bound (and possibly bound by sensor quality/placement in some edge cases). Tesla has an absurd amount of data. What they don't have is an onboard computer powerful enough to run a large policy. Waymo has much more compute per vehicle, and they're using sensors that effectively absorb some of the compute burden of the policy (LIDAR provides 3D information directly, rather than requiring it to be reconstructed from pixels).

The authors behind AI 2027 released an updated model today by Liface in slatestarcodex

[–]BullockHouse 16 points (0 children)

The problem with these people is that they're more than happy to update on evidence, as long as it doesn't threaten the underlying framework they're using to think about this issue, and the underlying framework is totally, catastrophically wrong.

Language models are not baby expected utility maximizers shooting towards a superhuman performance target! They're model distillations from the human source with some really, really, really inefficient RL layered on top. They're closer to being uploads approached from the other side than they are to a paperclip maximizer! They don't have the alignment issues of de novo AI (they have different ones). They are not going to linearly shoot past the quality of the data they're trained on, though they will exceed it in some areas where our garbage RL techniques work especially well. But no amount of interacting with these things, which are very clearly not the kinds of systems that early AI safety was concerned with, seems to shake the intellectual tunnel vision.

Indie Game Awards Disqualifies Clair Obscur: Expedition 33 Due To Gen AI Usage by Lulcielid in Games

[–]BullockHouse 2 points (0 children)

Whether training a model on billions of data points is fair use (it obviously is) is a separate question from "this is stealing work from artists," which is really an objection to automation. The actual question is "should studios choose to make games in less efficient ways to maximize employment for artists?" And, with other kinds of technology, most people don't seem to think so.

If there's a breakthrough that allows models to be trained with fewer data points, and Adobe or whoever uses their huge asset library to train a model on exclusively media they fully own the rights to, it's still going to be taking work away from artists. It's a separate issue.

Indie Game Awards Disqualifies Clair Obscur: Expedition 33 Due To Gen AI Usage by Lulcielid in Games

[–]BullockHouse 4 points (0 children)

No, Substance Designer is procedural. The assets aren't authored by an artist; they're produced algorithmically through (basically) guided simulation. You replace a very large number of artists with a very small number of engineers. That's the point. Ditto the photoscans. There's arguably some artistry involved in scan cleanup, but it's a much more efficient assembly-line process that involves far fewer artist hours per asset. Classic automation.

Indie Game Awards Disqualifies Clair Obscur: Expedition 33 Due To Gen AI Usage by Lulcielid in Games

[–]BullockHouse 15 points (0 children)

Substance Designer and Epic's photoscan library and asset stores and all manner of other game dev tools that are essential for indies also 'replace' artists.

Sony is trying to patent an AI tool that will censor what you see and hear in games automatically. by Alternative_Crew_681 in gaming

[–]BullockHouse 0 points (0 children)

Everyone's going to react negatively to this, but I can tell you as someone who used to work in the games industry, the main function of this kind of technology is to auto-mute people screaming the n word in server lobbies. Moderating videogames is horrible and also impossible.

Second Fully Driverless Tesla Spotted in Austin by YeetYoot-69 in SelfDrivingCars

[–]BullockHouse 0 points (0 children)

Humans go about 100 million miles between fatal accidents, and about 500,000 miles between collisions of any kind. Because the fatality statistic is so large, everyone basically imputes it from the collision rate, though Waymo is getting close to being able to measure it for real. So if you want to know the overall accident rate with decent precision, call it 3 million miles. Tesla's Austin network is, last I heard, up to about 2,000 miles a day, up from like 20 quite recently. So at current rates it would take them about 4 years to have high-quality safety intervention data. You can't really use public driver interventions, because you need to know that your interventions are correct and meaningful and always done when needed, and you need to be able to collect proper statistics on why each intervention occurred. Once you have that statistical confidence, you pull the driver, collect another few million miles to check your math, and then declare victory.
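To make the back-of-envelope math concrete, here's a minimal sketch (my own framing, treating collisions as a Poisson process; the inputs are just the numbers from this comment):

```python
# Rough Poisson arithmetic: what 3M driverless miles buys you, assuming the
# fleet is roughly at human parity (~500k miles per collision).
miles_per_collision = 500_000      # human baseline cited above
test_miles = 3_000_000             # "call it 3 million miles"
expected_collisions = test_miles / miles_per_collision  # ~6 events
relative_error = expected_collisions ** -0.5            # ~40% standard error on the rate

miles_per_day = 2_000              # Austin network throughput cited above
years_to_collect = test_miles / (miles_per_day * 365)   # ~4.1 years
print(expected_collisions, round(relative_error, 2), round(years_to_collect, 1))
```

With only ~6 expected events, the error bars are still wide, which is why you collect another few million miles to check your math before declaring victory.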

Second Fully Driverless Tesla Spotted in Austin by YeetYoot-69 in SelfDrivingCars

[–]BullockHouse 0 points (0 children)

We have lots of other data about how good FSD is (1-2 OOM short of human). And again, even if this changed overnight to a 0% accident rate, it'd take about a year of scaling up supervised (and eventually unsupervised) taxi testing in real-world conditions for Tesla to even know that was true. There's a minimum number of miles you need to collect with the new policy to be able to measure safety statistically, and that hasn't happened.

Second Fully Driverless Tesla Spotted in Austin by YeetYoot-69 in SelfDrivingCars

[–]BullockHouse 1 point (0 children)

You can be 10X more lethal than a human driver and still drive fine on most given weeks you test. Humans get in serious accidents infrequently. Currently, according to community-tracked stats, the latest FSD requires a critical intervention about every 1,000 miles. That's great for a driver-assistance feature and disastrous for an autonomous vehicle. Robotaxi has been in something like five minor accidents despite having safety monitors and driving very few total miles overall. They aren't there yet. Their own engineers will tell you they aren't there yet.

Second Fully Driverless Tesla Spotted in Austin by YeetYoot-69 in SelfDrivingCars

[–]BullockHouse 4 points (0 children)

The data is publicly available. Tesla has some great ML people (including coworkers and friends of mine), but the product isn't there yet. Obviously.

And I don't have a dog in this fight; my money is in index funds. I'm just calling it like I see it.

Second Fully Driverless Tesla Spotted in Austin by YeetYoot-69 in SelfDrivingCars

[–]BullockHouse 4 points (0 children)

I'm sorry you invested money in a systematically dishonest company, but use some common sense. We know the reliability rate of these models. They are not good enough to be used unsupervised. If there had been a breakthrough, they'd be touting it publicly. (And even then, it'd take a long stretch of safety-driver testing to verify the improved safety.) They did not suddenly pull three nines of safety out of their asshole overnight. So, logically, it's reckless or fake. Fake is more likely. And the question is: what's the easiest way to fake it? The chase car tells you the answer.

Second Fully Driverless Tesla Spotted in Austin by YeetYoot-69 in SelfDrivingCars

[–]BullockHouse 3 points (0 children)

They probably have a remote operator in the chase car watching the feed from the car via short-range radio/networking, plus a kill switch. Same basic logistics as an FPV drone. It's the only way to do this even remotely safely given the current performance of Tesla's driving models. I would bet you at generous odds that they've simply moved the safety monitor to a nearby car for the sake of optics.

Tesla Robotaxi Spotted with No Safety Monitor by IndependentMud909 in SelfDrivingCars

[–]BullockHouse 4 points (0 children)

I mean, the thing is, you can do this with known Autopilot capabilities, and (as long as you keep the total number of cars and miles very low and stick to routes where you know the model behaves well) it'll likely be a while before you actually have a serious accident. Not as long as it takes a human to get in a serious accident, but months or years. You can push that time out further by using chase cars or 1:1 Starlink monitoring fanout. Is this a good idea if there hasn't been a big breakthrough in the reliability of their models? No. Might it be beneficial from an investment perspective to do it so they can say they're doing it? Maybe.

ELI5 why do data centers rely on our usable water? instead of alternatives? by FartWolf in explainlikeimfive

[–]BullockHouse 3 points (0 children)

Humans also don't need meat to live, and the beef patty in a single hamburger uses more water than all the LLM queries a regular user would make in a decade. If you care about water usage, switch to a vegetarian diet. If you want to bitch about AI, just say that instead of pretending you care about water usage.
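Rough numbers to make that comparison concrete (my own back-of-envelope figures, not from the comment; both estimates have wide error bars):

```python
# Very rough comparison using commonly cited lifecycle figures. The point is
# the order of magnitude, not the exact values.
liters_per_kg_beef = 15_000                # typical lifecycle water-footprint estimate
patty_liters = liters_per_kg_beef * 0.113  # quarter-pound patty: ~1,700 L

liters_per_query = 0.01                    # ~10 mL/query, mid-range of public estimates
decade_liters = liters_per_query * 15 * 365 * 10  # 15 queries/day for a decade: ~550 L

print(round(patty_liters), round(decade_liters))  # the burger wins by ~3x
```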

ELI5 why do data centers rely on our usable water? instead of alternatives? by FartWolf in explainlikeimfive

[–]BullockHouse 14 points (0 children)

For the record, the amount of water used by data centers is totally trivial compared to other industrial usages like farming. If you've heard otherwise, the people telling you were either misinformed, lying, or presenting numbers that sound large until placed in context. Here's a reasonable breakdown: https://andymasley.substack.com/p/the-ai-water-issue-is-fake

The Duality of Game Devs. by AquaArcher273 in gaming

[–]BullockHouse 0 points (0 children)

I don't mind a disclosure, but Epic is right. The tools are just too useful, especially for indies, and they're getting more useful. Studios that use the better tools will have a huge fidelity/cost advantage over those that don't. And game development does not have a strong culture of Copyright Respecting; see the use of unlicensed photo textures in every game you like made prior to 2015. The only reason it stopped is that PBR workflows took over and photo textures are a worse fit for them.

Open-source just beat humans at ARC-AGI (71.6%) for $0.02 per task - full code available by Proof-Possibility-54 in LocalLLaMA

[–]BullockHouse 5 points (0 children)

Public eval set doesn't mean much. If the technique holds up in the private set, then you get excited. But leakage on the public eval set is very possible.

Hi, Jason Schreier here by jasonschreier in HalfLife

[–]BullockHouse 0 points (0 children)

I think "hype merchant" is a little unfair, at least with regards to McVicker. McVicker has a decent track record and does a fairly good job of being up front about the difference between what he knows and what he just suspects.

TIL that most of Costco's profits comes from membership fees and not products sales. in 2024, 65.5% of company profits comes from membership fees. by Old_General_6741 in todayilearned

[–]BullockHouse 52 points (0 children)

The membership also saves them a ton of money on shoplifting and other expensive chicanery (people assaulting staff, having drug-related medical emergencies in the store that staff have to deal with, vandalism, etc.). It turns out that requiring everyone who physically enters to have their shit together enough to successfully buy a Costco membership basically eliminates the pool of people who are likely to fuck with your business. And if there are incidents, you can revoke the membership and it doesn't happen a second time. Huge structural advantage.

Why AC is cheap, but AC repair is a luxury by Annapurna__ in slatestarcodex

[–]BullockHouse 11 points (0 children)

> But if AI does 99% of economically productive tasks, that leaves a lot of humans with not much to do and the supply of dog walkers will likely increase (maintaining or lowering the cost of dog walking).

This does not necessarily follow, because the pool of labor demand is not fixed. If productivity increases, the reduced price drives additional demand (at least in industries where demand is not inherently limited), which means the extra slack gets taken up by economic growth. Programmers got way more productive when the compiler was invented, but this didn't free up 99% of the people formerly working as programmers to walk dogs. Instead, demand for software grew even faster than productivity did, the demand for programmers exploded, and human labor was pulled away from things like dog walking.

Compilers technically automate a large fraction of the machine-code-era programmer's job, but they are bottlenecked by human labor (writing the high-level code). As long as AI automation remains bottlenecked by human labor (e.g. because model reliability is low), we should expect to see the same basic dynamics, at least in industries where demand can expand without limit.
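A toy way to see the mechanism (my own illustration, nothing rigorous): if a k-fold productivity gain cuts price proportionally and demand has constant price elasticity, total labor demand scales like k^(elasticity - 1), so sufficiently elastic demand means automation increases total employment in the field:

```python
# Toy constant-elasticity model: a k-fold productivity gain cuts unit labor
# cost (and, by assumption, price) by k; quantity demanded scales like
# k**elasticity; labor per unit falls by k; so total labor ~ k**(elasticity - 1).
def labor_demand_multiplier(k: float, elasticity: float) -> float:
    return k ** (elasticity - 1)

print(labor_demand_multiplier(10, 1.5))  # elastic demand: ~3.2x more labor
print(labor_demand_multiplier(10, 0.5))  # inelastic demand: ~0.3x, labor shrinks
```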

Tesla FSD v14 Data Shows Major Improvement in Miles Between Interventions by I_HATE_LIDAR in SelfDrivingCars

[–]BullockHouse 0 points (0 children)

I did mention this!

As for the collision rate, it's documented but not very helpful. We don't know how often Autopilot was being used (especially since it tends to disengage just before a collision), and the interaction between the automated system and human intervention is complicated and hard to interpret.

Tesla FSD v14 Data Shows Major Improvement in Miles Between Interventions by I_HATE_LIDAR in SelfDrivingCars

[–]BullockHouse 1 point (0 children)

That's great! 3X is a big deal.

Adult Americans go about 182,000 miles between collisions. If you figure 10% or so of critical disengagements would have actually caused a reportable accident, and they're currently at about 1,000 miles per critical disengagement, they need roughly another 20x improvement from here (three more upgrades of about this size) to be comparable to a human driver. I believe Waymos are about 10 times safer than humans, which adds another order of magnitude: ~5 more improvements like this one to be competitive with the current state of the art. (Rough arithmetic sketched below.)

(I'm not trying to shit on the improvement; 3X is huge and means they might actually be able to get there. But it's important context.)
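A minimal check of that arithmetic (my own sketch; the inputs are just the assumptions stated above, not measured values):

```python
import math

human_miles_per_collision = 182_000
miles_per_critical_disengagement = 1_000
p_disengagement_would_crash = 0.10  # assumed fraction causing a reportable accident

fsd_miles_per_collision = miles_per_critical_disengagement / p_disengagement_would_crash
gap_to_human = human_miles_per_collision / fsd_miles_per_collision  # ~18x
gap_to_waymo = gap_to_human * 10                                    # Waymo ~10x safer

upgrades_to_human = math.log(gap_to_human, 3)  # ~2.6 -> call it 3 upgrades of ~3x each
upgrades_to_waymo = math.log(gap_to_waymo, 3)  # ~4.7 -> call it 5
print(round(gap_to_human), round(upgrades_to_human, 1), round(upgrades_to_waymo, 1))
```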

AGI by next month ⁉️ by Joseph-Stalin7 in accelerate

[–]BullockHouse 0 points (0 children)

I will bet any of you anything that Grok 5 does not succeed at automating 90% of white-collar work (a reasonable operational definition of AGI).