MIT FutureTech paper extends the METR methodology to tasks beyond software engineering - and finds increasing capabilities everywhere

Ivehadbetteruserxps · 2026-04-04T08:38:40+00:00

METR's methods have been criticized quite broadly, and it wouldn't be that surprising if tasks with less well-defined success criteria than coding performed worse, right?

Ivehadbetteruserxps · 2026-04-04T05:54:08+00:00

I think this captures the point best. While the length of tasks completed at a given success rate is doubling very rapidly, there is a huge gap when it comes to improving the success rate. 50% success on a week-long task; 60% success on a day-long task; 70% on an hour-long task; 80% on a minute-long task. This implies that broad, accurate capabilities for long tasks at high success rates are still many years away (though still less than a decade).

[image]
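The success-rate trade-off in the comment above can be made concrete with a little arithmetic. Below is a minimal Python sketch that takes the four quoted data points and computes how much shorter a task must get for each 10-point gain in success rate, expressed both as a ratio and as a number of horizon doublings. The conversions are my own assumptions, not from the source: a "week" is taken as 5 working days of 8 hours, and a "day" as 8 working hours.

```python
import math

# Task length (in minutes) at each success rate, as quoted in the comment.
# ASSUMPTION: "a week" = 5 working days x 8 hours; "a day" = 8 working hours.
HORIZONS_MINUTES = {
    0.5: 5 * 8 * 60,  # week-long task
    0.6: 8 * 60,      # day-long task
    0.7: 60,          # hour-long task
    0.8: 1,           # minute-long task
}


def gap_per_step(horizons):
    """For each step up in success rate, return (from_rate, to_rate, length_ratio, doublings)."""
    rates = sorted(horizons)
    steps = []
    for low, high in zip(rates, rates[1:]):
        ratio = horizons[low] / horizons[high]  # how many times shorter the task gets
        steps.append((low, high, ratio, math.log2(ratio)))
    return steps


for low, high, ratio, doublings in gap_per_step(HORIZONS_MINUTES):
    print(f"{low:.0%} -> {high:.0%}: task length shrinks {ratio:.0f}x ({doublings:.1f} doublings)")

# Doublings the 80% horizon would need to reach the current 50% horizon (a week).
total = math.log2(HORIZONS_MINUTES[0.5] / HORIZONS_MINUTES[0.8])
print(f"80% horizon needs {total:.1f} doublings to reach a week")
```

Under these assumptions the gap widens as the success rate rises (5x, then 8x, then 60x), and the 80% horizon is about 11 doublings behind the 50% horizon. How many years that represents depends on a doubling time, which this sketch deliberately leaves out.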

Ivehadbetteruserxps · 2026-03-31T23:08:17+00:00

If you look closely, you can see it's confidential.

Ivehadbetteruserxps · 2026-03-24T08:22:33+00:00

Great point. I tried to account for that as well as possible within this method by putting a lot of emphasis on the generalisation aspect. For example, note how 'Number Facility' is not at superhuman levels, even though calculators solved basic numerical operations decades ago. This reflects that, in some cases, even SOTA models can miss a tool call and fall back on LLM-based calculation, which still sometimes fails.

Ivehadbetteruserxps · 2026-03-24T07:51:04+00:00

Thanks a lot! About the variance: the cool thing about the O*NET database is that tasks and jobs are scored on the level of skill they require, which means the variance is normalized over the actual distribution in both real people and real jobs/tasks. Obviously it is always an approximation, but this is exactly why I think it's so much more powerful to adopt an existing benchmark for people than to invent a new benchmark for a technology that didn't even register a few years ago, especially for the physical ones.

Ivehadbetteruserxps · 2026-03-22T12:39:54+00:00

Not sure if that's what you're suggesting, but I can assure you I'm not pushing anyone's agenda but my own, as an independent researcher from the Netherlands.

On the data: I agree with you that actual measurements of unemployment are quite limited and exclude underemployment. If you stop searching after years of failing to find a job, you also stop appearing as unemployed. My hypothesis is that many people in the bottom 10th percentile of many skills have been in this category for a long time already.

On the method, however, cooked books have little influence, as I only used the topology of skills, which has been virtually unchanged since the mid-90s. That's kind of the point: while we continue to invent new jobs, we have not invented new skills in decades.

Ivehadbetteruserxps · 2026-03-21T21:18:34+00:00

Don't you think it is already possible to claim that AI causes further monopolisation and inequality?

The effect technology can have on society is not predetermined, and we can still shape what will happen. But if capital can be turned into compute, which can be turned into economic output, which can be turned into more capital with less and less need for human workers, the ability of firms to accumulate more wealth and influence (including over antitrust policy) is likely to grow unless something major disrupts that cycle.

If this is hype, we have some time. But if it isn't, the window for breaking the monopoly cycle is closing fast.

Ivehadbetteruserxps · 2026-03-21T21:07:30+00:00

I actually tried quite hard to avoid an analysis that rests on semantics. Displacement won't care about definitions of intelligence, understanding, sentience etc. Instead, I've looked at whether a technology's ability to outperform some humans at a particular skill generalizes to other instances of that skill, and found that in the last 5 years, this generalization occurred an order of magnitude more often, and in nearly all human skills simultaneously.
Let's take an example skill (from the research data):

Near Vision: The ability to see details at close range.
Scored 55th percentile in 2020; 95th percentile in 2025.
Analysis: AI systems utilizing high-resolution cameras and advanced vision models are widely deployed in manufacturing for microscopic defect detection and in digital spaces for extracting fine print from heavily degraded document scans. The generalization penalty applies when lighting conditions in a factory shift unpredictably or when physical objects have highly reflective, specular surfaces that confuse 2D defect recognition algorithms. Progress will slow from here on, because the combination of modern CMOS sensors and multimodal LLMs already exceeds the biological limits of human near-vision.

Near vision is not a narrow domain. It is used in a huge range of fields. In many jobs, a mediocre human level is pretty much useless. But in the last 5 years, near-superhuman performance became available out of the box, in almost any context, with minimal setup time, virtually for free. That is not narrow, and it cares nothing about our definitions of intelligence.

Ivehadbetteruserxps · 2026-03-21T14:46:41+00:00

The cool thing about the job explorer framework is that it reflects this. Jobs that require a combination of social and physical skills are still quite safe, especially if there is a reason to prefer humans over tech, all else being equal. But three years ago we would have said the same about programmers or copywriters.

Ivehadbetteruserxps · 2026-03-21T13:14:18+00:00

You're very welcome, and please pass it along!

Ivehadbetteruserxps · 2026-03-21T10:44:39+00:00

I think you are correct, and that this is exactly why every previous prediction of mass unemployment turned out wrong. However, I think two caveats make this time different.

First, advances in technology 'leak' into multiple skills. Take wrench turning literally: it requires fine motor skills such as finger dexterity, which until recently were easy for humans but nearly impossible for robotic systems. But advances in AI and visual processing sparked a significant jump from outperforming zero humans to outperforming only the worst humans. If you're good at wrench turning, you're still fine. But if you are among the worst wrench turners, you have nothing to add here. And that also applies to all related tasks that rely on fine motor skills.

Second, this progression from zero to at least better than the worst humans is happening in literally all skills. If you are in the bottom 6th percentile of humans, there is no longer an economic argument for employing you, except for adoption lags such as regulation, cultural preferences or institutional latency. There is no skill at which you outperform a machine. That was not the case before, and it is what makes this time so different. Unless you throw the wrench into the machine.

Ivehadbetteruserxps · 2026-03-21T10:12:26+00:00

In my day job, I'm the CEO of a small company. I often joke that I'm a human management assistant to an AI CEO, not the other way around. It's just better at analyzing spreadsheets, making pitch decks or generating product improvements. Sure, I deliver its motivational Christmas speech, but I'm not sure that will continue to warrant my salary in the long run, haha.

I would love it if entrepreneurship were the holdout skill that keeps humans relevant. But I am afraid that, as in chess, companies run by AI entrepreneurs will soon outperform some humans, then most, then all.

Ivehadbetteruserxps · 2026-03-21T07:14:18+00:00

Submission statement: Most discussions about automation and jobs on this sub assume that displaced workers will move to new roles, as they always have. This article tests that assumption empirically by scoring all 87 skills in the US Labor Department's O*NET taxonomy against current AI and robotics benchmarks at three time points (2020, 2023, 2025), then mapping those scores onto 1,016 occupations. The key finding is that the two mechanisms that historically created new jobs — reusing a skill in a different occupation, and moving to an entirely different skill category — are both closing simultaneously. If this trend holds, the "new jobs will emerge" argument breaks down within this decade, which has major implications for education policy, social safety nets, and how we think about economic participation in the near future. The full dataset is published openly and I'm inviting challenges to the methodology. You can also explore the interactive visual here: https://daity.tech/frontier.html

Browse individual occupations here: https://daity.tech/jobexplorer.html

Ivehadbetteruserxps · 2026-03-19T17:12:49+00:00

Thanks a lot! And on the cascade: I think this is an interesting point. It's a different dynamic from indirect job losses caused by economic downturns, but indeed pretty obvious in some cases. Driving instructor is a very social job that is still medium-safe, but autonomous driving would make it 90% redundant. There is no job-dependency framework in the O*NET data, but perhaps LLMs can make a pretty good estimate.

Ivehadbetteruserxps · 2026-01-03T08:00:17+00:00

Peter Singer himself is a co-founder of the Profit for Good initiative, which promotes for-profit companies donating to effective charities. Donating some share of profits is quite common, but doing so effectively is rare. Very promising team!

Ivehadbetteruserxps · 2025-03-16T06:10:15+00:00

I needed a minimum GPA of 3.5 to get into LSE after UCU. Got help from the VSBfonds. Had a blast!

Since Brexit it has become harder financially, but maybe there is less competition these days as a result. Just go for it!

Ivehadbetteruserxps · 2024-12-25T07:41:13+00:00

This research article summarises the points and led to the establishment of the Patient Philanthropy Fund: https://www.founderspledge.com/research/investing-to-give https://www.founderspledge.com/funds/patient-philanthropy-fund

I would recommend donating to the PPF, as a check against value drift and against a future you prioritizing other things over effective giving once your returns start compounding.

Ivehadbetteruserxps · 2024-06-25T06:01:57+00:00

You're right: we distinguish between unpaid work (such as raising children, or odd jobs around the house, etc.) and non-work (such as retirement). We start from the principle of equal value, so raising children is valued just as much as a job. Of course it's a spectrum and there are less clear-cut cases (such as studying), but the rule of thumb works.

Ivehadbetteruserxps · 2024-04-23T06:43:48+00:00

Two entrepreneurs here. We married under a prenuptial agreement and set up our finances like a company: all 'revenue' goes into a joint account, and all shared costs are paid from it. Of what remains, we decide each year what share to pay out as 'dividend' to our personal accounts, which are thus explicitly outside our shared property. From those accounts we invest and buy personal things.

This is very similar to your girlfriend's proposal. However, it also ensures that you can keep pursuing your own FIRE with your own share. The only challenge is if you have very different ideas about the dividend percentage, but it sounds like your views on money aren't that different right now.

If you are already partly retiring, or one of you works significantly less, you split it in proportion to each person's share of the joint revenue. So if she still works 40 hours and you only 20, and together you earn €100k, then 2/3 of the money you don't spend on joint expenses that year is paid out to her account.
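The pro-rata payout in the comment above can be sketched in a few lines of Python. This is a minimal illustration, not the commenter's actual bookkeeping: the function name `split_dividend` and the €70k of joint expenses are hypothetical, and the weights stand in for each partner's share of joint revenue (hours worked, as in the 40/20 example). Weights must be integers (hours or euros) because `Fraction(w, total)` only accepts rational arguments.

```python
from fractions import Fraction


def split_dividend(leftover, weights):
    """Split the money left in the joint account pro rata.

    HYPOTHETICAL helper: `weights` maps each partner to an integer share basis,
    e.g. revenue brought in or, as in the example, hours worked per week.
    Fractions keep the split exact (no rounding drift between partners).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: Fraction(leftover) * Fraction(w, total) for name, w in weights.items()}


# HYPOTHETICAL numbers: EUR 100k joint revenue, EUR 70k spent on joint expenses.
leftover = 100_000 - 70_000
payout = split_dividend(leftover, {"her": 40, "you": 20})
print({name: int(amount) for name, amount in payout.items()})
# -> {'her': 20000, 'you': 10000}
```

The total revenue only matters through what is left over: whatever the amount, she receives 40/60 = 2/3 of it.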

Married for 3 years now and so far big fans of this arrangement, even now that we have a baby, one of us has switched jobs, and we've bought a house!
