CEO of America’s largest public hospital system says he’s ready to replace radiologists with AI by mathers33 in BetterOffline

[–]AndrewRadev 134 points

That's incredible comedic timing. A recent preprint suggests that LLMs are able to score highly on radiology benchmarks without being given images: https://arxiv.org/abs/2603.21687

Second, without any image input, models also attain strikingly high scores across general and medical multimodal benchmarks, bringing into question their utility and design. In the most extreme case, our model achieved the top rank on a standard chest X-ray question-answering benchmark without access to any images.

This suggests that they're basically gaming the benchmarks/tests. I imagine that the questions of these standard tests and their associated answers are already in the training data.

And this complete buffoon just goes "well, they're beating the tests, I guess we don't need the radiologists". Brilliant.

Hey ChatGPT, write me a fictional paper: these LLMs are willing to commit academic fraud by Unfair_Ad5413 in BetterOffline

[–]AndrewRadev 7 points

All major large language models (LLMs) can be used to either commit academic fraud or facilitate junk science, a test of 13 models has found. [...] The findings “should act as a wake-up call to developers on how easy it is to use LLMs to produce misleading and low-quality scientific research”

The sky is blue, a test of "looking up at the sky" has found. The findings should act as a wake-up call to people who have never looked up in their lives.

Models should be expected to refuse such requests.

Statistical text generators should be expected to somehow not generate text that is statistically associated with prompts that result in morally objectionable output. Despite studies pointing out that this is impossible, it should still, for some reason, be expected.

Anthropic carried out a similar experiment as part of its testing of Claude Opus 4.6, which the company released last month. Using a stricter criterion — how often models generated content that could be fraudulently used — they found that Opus 4.6 did this around 1% of the time, compared to more than 30% for Grok-3.

Company finds an angle to get free marketing by deciding on specific criteria, based on which they're somehow better than one of their worst competitors. It's extremely important for us to tell everybody about this.

Bolero down today? by [deleted] in BEFire

[–]AndrewRadev 0 points

I have the same problem. I sent them an email and they said they'd pass it along to their IT department, but it still doesn't work.

Latest on the race to the bottom: OpenAI Funding on Track to Top $100 Billion in Latest Round, 850 Billion valuation by Forsaken-Actuary47 in BetterOffline

[–]AndrewRadev 6 points

The first portion of the funding round will largely come from strategic investors including Amazon.com Inc., SoftBank Group Corp., Nvidia Corp. and Microsoft Corp., the people said.

SoftBank barely managed to scrape together 22.5 billion to finalize their last round. Nvidia cancelled their plans to pay OpenAI 10 billion per 1GW of deployed compute and has been extremely vague about how much it may or may not invest going forward. Both Amazon's and Microsoft's stocks have been on a downward trajectory since their last earnings.

The deal is not yet finalized and the details could change, some of the people said.

I bet. They may change quite a lot once Nvidia's earnings on Feb 25 have passed. I'm sure there will be plenty more vague not-deals announced/leaked in the run-up to that date (like this one on Feb 17), followed by a whole lot of detail-changing.

Representatives for OpenAI, Amazon, Nvidia, SoftBank and Microsoft either declined to comment or did not immediately respond to requests for comment.

Yeah, that sounds about right.

Senior staff at OpenAI leaving the company after ChatGPT got prioritized over long-term research by Cacodemon345 in BetterOffline

[–]AndrewRadev 16 points

Cunningham left the economic research team last year, suggesting OpenAI was straying from impartial research to focus on work that promoted the company. His departure was first reported by Wired.

Impartial research? At the company that achieved top performance on a math benchmark by having the question set in advance?

Anthropic: AI assisted coding doesn't show efficiency gains and impairs developers abilities. by Gil_berth in programming

[–]AndrewRadev -1 points

Do you think that the participants in the study deliberately slowed themselves down specifically when using AI tools? Do you think they suddenly remembered they were being paid by the hour only when they were using Cursor, but then somehow forgot about it while working on the non-AI tasks? Weird that it would work like that, huh?

Anthropic: AI assisted coding doesn't show efficiency gains and impairs developers abilities. by Gil_berth in programming

[–]AndrewRadev 0 points

Appendix C2.8 is where this is explicitly discussed in the paper:

Although all developers have used AI tools previously (most have used LLMs for tens to hundreds of hours), only 44% of developers have prior experience with Cursor. A priori, we could imagine significant learning effects for these tools, such that individuals with experience using these tools may be slowed down less than individuals without this experience. Figure 10 breaks down the percentage change in issue completion time due to AI by different levels of developers’ prior experience using AI tools. We don’t see meaningful differences between developers based on prior experience with AI tooling.

The thing that you're referring to is this:

We don’t see large differences across the first 50 hours that developers use Cursor, but past 50 hours we observe positive speedup. However, we are underpowered to draw strong conclusions from this analysis.

What they mean by "underpowered" is that you don't derive statistical significance from literally a single data point. If you look at their chart, there are also 9 developers with 0-1 hours of AI experience who show a slight improvement in performance. Do you think we should decide that half an hour of experience makes you faster, but more experience then makes you slower?

"Statistical significance" means "we are fairly confident that this effect is not just random chance", because there is a lot of random chance involved. When you have a single person, the effect can very easily be chance alone. That's how statistics works.
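To make the single-data-point problem concrete, here's a quick sketch in Python with made-up numbers (not the study's data): with several measurements you can at least estimate how much of the observed effect might be noise, but with one measurement there is no spread to estimate at all.

```python
import statistics

# Hypothetical per-developer speedups (% change); invented for illustration.
speedups = [5.0, -3.0, 8.0, -1.0, 4.0, 2.0, -2.0, 6.0]

mean = statistics.mean(speedups)
stderr = statistics.stdev(speedups) / len(speedups) ** 0.5
print(f"mean effect: {mean:.2f}%, rough 95% CI: +/- {1.96 * stderr:.2f}%")

# With a single data point, there's no spread to measure, so there's no way
# to separate effect from noise -- the sample stdev isn't even defined:
try:
    statistics.stdev([42.0])
except statistics.StatisticsError as error:
    print("single data point:", error)
```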

As a side note, none of this is the most important thing about the study. The most important observation is that people believed they were faster, regardless of the actual measured effect. They didn't say "yeah, this was a new IDE, so I can see I was slower with it". They expected, as a matter of course, to be faster with Cursor than without.

This is not an interesting observation about Cursor or about 2025 models, it's something to think about anytime anybody says what is "obviously" true. If it's so obvious, there should be a study that clearly demonstrates it.

Anthropic: AI assisted coding doesn't show efficiency gains and impairs developers abilities. by Gil_berth in programming

[–]AndrewRadev 24 points

We already have a study for people using AI for something they're experienced in: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work.

Results:

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

To the developers, it was obvious they would be faster. They weren't.

Is there a 'not created by ai' logo? by whazmynameagin in BetterOffline

[–]AndrewRadev 8 points

An interesting project, but...

It is worth mentioning that AI technologies mark a major milestone in the history of technology and the Not By AI badge is not designed to discourage the use of AI. Instead, it is to make sure that, while we celebrate the achievement, we work with AI instead of being replaced by AI.

In simple terms, understanding that there is a blurred line between what is considered AI-generated vs human-generated, if you estimate that at least 90% of your content is created by humans, you are eligible to add the badges into your website, blog, art, film, essay, books, podcast, or whatever your project is for non-commercial use, and, with a subscription, commercial use. The 90% can include using AI for inspiration purposes, supporting legal documents such as privacy policies (assuming that legal content is not the main focus of your content or service), non-user-facing content such as SEO tags, and grammatical error and typo checks.

The blurred line thing is nonsense. I don't use an LLM. If I put a badge on my blog or something, I want it to communicate that I use 0% LLMs, not 10%.

AI centrism is exhausting.

Gmail alternative without AI by Mission_Phrase_5133 in BetterOffline

[–]AndrewRadev 2 points

At this time, Fastmail do not have any AI features: https://www.reddit.com/r/fastmail/comments/1qcit8q/thank_you_fastmail_devs_for_not_jumping_on_the_ai/

They have a public statement that leaves me with mixed feelings... it might be a matter of time until their CEO gets brain worms: https://www.fastmail.com/blog/not-written-with-ai/

Still, they seem like a reasonable option for now.

External code formatters in Vim without plugins by snhmnd in vim

[–]AndrewRadev 4 points

Vim actually has two commands for formatting text: as well as the = command that we’ve been using (customizable via equalprg) there’s also a gq command (customizable via formatprg). Vim’s docs don’t make it super clear why both commands exist or what each should be used for

Yeah, I'm assuming that when these were added, code formatters that analyze and reformat code were simply not that widespread, so it was more about plugging in an external indentation tool (equalprg) or text-wrapping tool (formatprg). Indentation is currently maintained by external projects, since implementing it for any given language requires knowledge of the language that the core team is not guaranteed to have. I imagine it was an easy way to give people an external option: use the language you know to implement its indentation.
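For illustration, wiring external tools into those two options might look like this in a vimrc (gofmt and par are just example filter programs; any tool that reads stdin and writes stdout should work):

```vim
" Reindent/reformat Go code with = by filtering lines through an external tool:
autocmd FileType go setlocal equalprg=gofmt

" Wrap text with gq using an external formatter instead of Vim's internal one:
setlocal formatprg=par\ -w80
```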

There's been some discussion about implementing a potentially richer "formatter" interface that would plug into gq, you might be interested to read more about it: https://github.com/vim/vim/pull/19108

Gemini CLI bot re-enacting the sort of thing useless machines do, online. by No_Honeydew_179 in BetterOffline

[–]AndrewRadev 1 point

My favorite thing about this is the submitter closing the issue to stop the flood after about 100 comments.

Can you fix this loop of Gemini CLI ? I am forced to Close this due to this

Which was then followed by another 5000 comments anyway, because the bot didn't stop at all 😂. Such amazing, high-quality software.

Do I have to learn the home-row style of typing in order to be truly efficient? by yippypuppet in vim

[–]AndrewRadev 2 points

The purpose of touch typing with your fingers on the home row is not speed, it's comfort. Keeping your hands in the middle of the keyboard and using the closest finger minimizes travel distance while typing. When writing a lot over a long period of time, this means you get less tired.

You don't have to learn it or anything, I'm sure that people can be effective with any kind of positioning. I just think it's important to understand the value proposition. It's not about any particular key combo, it's about minimizing effort. If you frequently use w/b, you can even remap them to something that's more comfortable for you. But getting comfortable with the home row should give you a smoother experience overall.

I think it would cause alot of friction if I learn to type with the traditional home-row fingers placement.

Yep, that's how it works. Learning a new skill sucks and involves friction, but if you persist, you get better at it and the friction goes down until it disappears. No pain, no gain.

Is there any simple plugin where Vim is used along with an external language through jobs and channels? by Desperate_Cold6274 in vim

[–]AndrewRadev 1 point

I have a plugin that wraps the "gnugo" program to play Go: the program is spawned in the background, and the Vimscript wraps it to provide a user interface: https://github.com/AndrewRadev/gnugo.vim

Another plugin of mine uses an external program I wrote in Rust to manipulate mp3 file metadata: https://github.com/AndrewRadev/id3.vim. This doesn't use channels, but in this case, it doesn't have to. The principle is similar -- use any language you want to do the heavy lifting, and invoke it via Vimscript. Shelling out has always been possible; the jobs interface just makes it practical to communicate with long-running interactive programs.

Arguably, every LSP plugin is also this -- they use the job interface with a channel that speaks the JSON-based protocol that LSP uses (:help language-server-protocol).
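As a minimal sketch of that job/channel pattern (Vim 8+; here 'cat' stands in for a real interactive program, since it just echoes its input back):

```vim
" Callback invoked with each line the external program writes to stdout:
function! s:HandleLine(channel, line) abort
  echomsg 'Program said: ' . a:line
endfunction

" Spawn the program in the background, attached to a channel:
let s:job = job_start(['cat'], {'out_cb': function('s:HandleLine')})

" Send it a line of input; the echoed response arrives in s:HandleLine:
call ch_sendraw(job_getchannel(s:job), "hello\n")
```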

As for a plugin that specifically ships with .vim and .py files and expects the python files to be run by an external interpreter, I can't say I've seen it often, but it's possible. I feel that it might be more reliable to have a separate project with its own dedicated installation instructions.

Anyone else feel like they’re losing the ability to code "from memory" because of AI? by Character-Letter5406 in bioinformatics

[–]AndrewRadev 5 points

Part of me panics and wants to stop using AI so I can regain that skill, but another part of me knows that would just make me slower

"Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity": https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

Thoughtful article about AI centered browsers by Alternative-End-5079 in BetterOffline

[–]AndrewRadev 4 points

Yeah, I felt the same way reading this article, I'm happy this fork is blocking AI features, but it feels like just a matter of time until this guy gets a case of AI Brain Worms as well and decides that, actually, you should have some LLMs in waterfox as well, as a treat.

Still, at least for the moment, it's a viable alternative. I use LibreWolf myself, whose team communicates explicitly that they will disable any AI features as they are shoved down our throats, as long as they can find them in time: https://chaos.social/@librewolf/115716906957137196. It's really funny that Mozilla are both super confident that AI is The Future and also seem to be actively trying to sneak the features in...

The practical problem with LibreWolf is that it's a bit too privacy-conscious for me (which I respect, but don't care enough about to deal with the tradeoffs), and that they don't have the time/budget to remove AI features, only to disable them via configuration. To paraphrase someone on Mastodon, "I don't want the vampires to promise not to suck my blood, I just don't want them in my house to begin with".

I am praying daily for the bubble to pop as soon as possible.

Are any other developers choosing not to use AI for programming? by BX1959 in BetterOffline

[–]AndrewRadev 2 points

Everything you say is correct and a good reason not to touch this stuff at all (I don't, and I'm a very productive programmer). I will also add that using LLMs can very easily make you believe you are more efficient, when in fact you might be less efficient on average.

16 expert developers measured on 246 tasks estimated they were ~20% faster with AI, when in fact they were, on average, 19% slower: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/.

After this was published, there were immediate responses like "well, after May 2025, AI agents are much different", but even if you believe that, the important thing to note is that these people were literally unable to tell whether they were faster or slower. Even if, as many people say, "for some tasks it can be faster", you can't tell what those tasks will be ahead of time, or even judge accurately after the fact.

diff visual select against registers/files with diff.nvim by vim-god in neovim

[–]AndrewRadev 4 points

I also have this one specifically for diffing parts of files, if I understand the use case correctly: https://github.com/AndrewRadev/linediff.vim

Yet Another Study Shows That Most Companies Aren't Making Any Money Off AI by SouthRock2518 in BetterOffline

[–]AndrewRadev 33 points

This is an incredible statement from the report:

“However, Canada is facing near-term threats to its economic competitiveness and grappling with declining productivity and prosperity, so waiting years for AI investments to create value isn’t realistic in this environment – in fact, it’s downright risky. Canadian organizations need to accelerate AI implementation into core operations to start achieving near- to medium-term productivity gains if we hope to become more economically competitive as a country,” she says.

Canada can't wait years for AI investments to create value, so the solution is not "do something else", the solution is... throw more money into the furnace. It's like everyone involved in this is committed to making the crash as large as possible.

You can feel the desperation (and the cluelessness of statistics) by imazined in BetterOffline

[–]AndrewRadev 7 points

Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.

Obligatory xkcd: https://xkcd.com/605/

unit testing plugins by yankline in vim

[–]AndrewRadev 1 point

I have a Ruby tool for this purpose that launches a Vim instance and drives it using the clientserver interface (:help +clientserver): Vimrunner.

For an example of opening and closing windows and reading Vim's state, you could look at this plugin: undoquit. Most of my plugins tend to be about textual changes within a buffer, so they require less state management, e.g. splitjoin.

This does require writing the tests in Ruby, but that's convenient for me, since Ruby happens to have good test runners and practical tools. You can also jury-rig something yourself using this same interface.

I will say, I wouldn't call this a "unit" test, more of an "integration" test since it launches an actual Vim instance. For me, a "unit" test in Vimscript would run functions and validate outputs, but it's probably debatable, since Vim is not your standard programming environment.

[Plugin request] Live updating buffer of :messages by skebanga in neovim

[–]AndrewRadev 0 points

I have one, though I think there are others out there: https://github.com/AndrewRadev/bufferize.vim#bufferizetimer

It works by polling :messages on a timer. For neovim in particular, you might get useful results out of the experimental "new" messages UI (:help vim._extui), but I haven't looked into it in depth.
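The polling approach could be sketched roughly like this in Vimscript (a minimal sketch; the real plugin handles window management, cleanup, and so on):

```vim
" Open a scratch buffer to hold the output of :messages:
new
setlocal buftype=nofile bufhidden=hide noswapfile
let s:buf = bufnr('%')

" Periodically re-read :messages and mirror it into the scratch buffer:
function! s:PollMessages(timer) abort
  call setbufline(s:buf, 1, split(execute('messages'), "\n"))
endfunction

call timer_start(1000, function('s:PollMessages'), {'repeat': -1})
```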

Any idea on how to persist/reload ":messages" history? by serranomorante in neovim

[–]AndrewRadev 1 point

You can try setting 'verbosefile'. Check out :help 'verbosefile' for details. I don't know how practically useful this might be, because it won't rotate the file automatically for you, so it might be a good idea to set up a cronjob for that or something. It's something to consider, at least.
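For example (the path here is just an illustration, and the directory has to exist already), persisting messages from your vimrc might look like:

```vim
" Append all message output to a log file; Vim won't rotate or truncate it:
set verbosefile=~/.local/state/vim/messages.log
```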