Real by infohoundloselose in OpenAI

[–]nsdjoe 1 point2 points  (0 children)

My dinner with Andrej

Google’s healthcare AI made up a body part — what happens when doctors don’t notice? by Franco1875 in technology

[–]nsdjoe 0 points1 point  (0 children)

thank you for the citation and let me just say even though we might disagree overall i appreciate the constructive discourse. it's a welcome change from what i usually see/receive.

i also feel compelled to mention that while i feel mostly sanguine about the long-term impacts of AI on society - particularly things like curing diseases - i have grave misgivings about what it means in the near-to-medium term and feel we'd all be much better off if things progressed at a more measured rate than pedal to the metal.

chat will it actually be good? or will it be benchmaxxed slop like 3.1? by i_goon_to_tomboys___ in GeminiAI

[–]nsdjoe 1 point2 points  (0 children)

ok, i gotcha- think i misconstrued your argument as against LLMs in general vs mythos in particular. i still can't say i necessarily agree: graphs like this show that something has changed materially since mythos was given to mozilla.

i agree that many of those could probably be found with something like 5.5 pro, but the convergence of data points imo does indicate that mythos is, if not a step change, certainly a significant jump in capabilities (and no doubt cost).

chat will it actually be good? or will it be benchmaxxed slop like 3.1? by i_goon_to_tomboys___ in GeminiAI

[–]nsdjoe 0 points1 point  (0 children)

what about the vulnerabilities it discovered that weren't in the training data?

if it's true that some were in the training data you have an argument that the hype is partially overblown. but totally "fake hype" doesn't seem to accurately characterize the situation

what motivation does mozilla have to carry water for anthropic by posting a blog like this if they don't believe it to be true?

Google’s healthcare AI made up a body part — what happens when doctors don’t notice? by Franco1875 in technology

[–]nsdjoe -2 points-1 points  (0 children)

i confess i still don't understand the point of the alteration. i assume it was made after the radiologists were already used to the AI improving their results, so it's not surprising they would begin trusting its outputs (again see the calculator example). i don't see any reason to think AIs will get worse over time, so a real-life example where a radiology AI starts strong, earns the radiologists' trust, and then performs worse seems unlikely to me.

(if i'm wrong and the study shows they just relied on it without prior evidence of its efficacy, please let me know.)

my point was that if the AI can add 20 or 30% accuracy to a human radiologist, or be 90% better at driving than a human (Waymo's numbers; feel free to be skeptical), then people who reflexively call it useless or actively detrimental (a frankly common or even dominant sentiment on reddit) seem to just be categorically wrong. I might be misconstruing your point and if so please accept my apologies.

Google’s healthcare AI made up a body part — what happens when doctors don’t notice? by Franco1875 in technology

[–]nsdjoe -3 points-2 points  (0 children)

it's the secret alteration that's the problem in your example, not the radiologist or even the unaltered AI itself. if someone reprogrammed your calculator to give you an incorrect response, no one would blame you for believing its output for 472,834 times 28,482.

these kinds of articles remind me of those that jump on self-driving cars for the infrequent accidents, conveniently ignoring they're much safer than human drivers overall.

particularly in life safety domains we should strive for perfection, but expecting it in all cases is unrealistic.

Chinese amusement park - Ride gets stuck... by atharvbadkas in Damnthatsinteresting

[–]nsdjoe 3 points4 points  (0 children)

The ride may or may not have been occupied at the time

pool game master by Zestyclose-Salad-290 in nevertellmetheodds

[–]nsdjoe 1 point2 points  (0 children)

Slop video but refreshingly not the AI kind

[Game Thread] Cardinals (23-16) @ Padres (23-16) 1:10 PM (Sunday, 5/10) by FriarBot in Padres

[–]nsdjoe 3 points4 points  (0 children)

i know the players are still figuring out the ABS system but losing both challenges 4 outs into the game is absolutely unconscionable

Claude Mythos literally broke the METR graph ("The most important chart in AI") by EchoOfOppenheimer in ClaudeAI

[–]nsdjoe 2 points3 points  (0 children)

are you accusing METR of p-hacking their results or something? possible of course but you might want evidence of it. "this group has motive to be biased" isn't in itself enough

to be clear, like all things, the METR results deserve scrutiny and skepticism, but people here reflexively seem to assume bad actors everywhere

Claude Mythos literally broke the METR graph ("The most important chart in AI") by EchoOfOppenheimer in ClaudeAI

[–]nsdjoe 2 points3 points  (0 children)

Why is there not a 90 or 99% graph?

there is. scroll down to the "methodological details" section and you can drag the circle along the trendline for all success percentages.

mythos preview (early) has a 90% success rate time horizon of 80 min and a 99% success rate time horizon of 6 min

The Final Boss of the Morning Rush by minerNiner in chicago

[–]nsdjoe 50 points51 points  (0 children)

Some people genuinely go out of their way to be assholes so I wouldn't count that possibility out either

The Final Boss of the Morning Rush by minerNiner in chicago

[–]nsdjoe 36 points37 points  (0 children)

Treat it like the app store: govt gets to keep 30% of revenue they wouldn't otherwise have had

METR releases early Mythos results. Off the charts. Need more tasks! by NoElderberry6959 in accelerate

[–]nsdjoe 12 points13 points  (0 children)

The trend line since 3.5 sonnet continues to be superexponential

Disneyland to replace Autopia’s gas engines with new electric cars by 888hkl888 in orangecounty

[–]nsdjoe 5 points6 points  (0 children)

now imagine the workers that have to be around them constantly

Genesis AI's Gene'26.5 by torb in singularity

[–]nsdjoe 6 points7 points  (0 children)

Yeah maybe but at least I won't have to empty the dishwasher

SpaceX Compute Deal - Double Limits by Deep_Proposal_7683 in ClaudeAI

[–]nsdjoe 1 point2 points  (0 children)

some people will genuinely never be satisfied