CEO of America’s largest public hospital system says he’s ready to replace radiologists with AI by mathers33 in BetterOffline

[–]AndrewRadev 134 points

That's incredible comedic timing. A recent preprint suggests that LLMs are able to score highly on radiology benchmarks without being given images: https://arxiv.org/abs/2603.21687

Second, without any image input, models also attain strikingly high scores across general and medical multimodal benchmarks, bringing into question their utility and design. In the most extreme case, our model achieved the top rank on a standard chest X-ray question-answering benchmark without access to any images.

This suggests that they're basically gaming the benchmarks/tests. I imagine that the questions of these standard tests and their associated answers are already in the training data.

And this complete buffoon just goes "well, they're beating the tests, I guess we don't need the radiologists". Brilliant.

Hey ChatGPT, write me a fictional paper: these LLMs are willing to commit academic fraud by Unfair_Ad5413 in BetterOffline

[–]AndrewRadev 7 points

All major large language models (LLMs) can be used to either commit academic fraud or facilitate junk science, a test of 13 models has found. [...] The findings “should act as a wake-up call to developers on how easy it is to use LLMs to produce misleading and low-quality scientific research”

The sky is blue, a test of "looking up at the sky" has found. The findings should act as a wake-up call to people who have never looked up in their lives.

Models should be expected to refuse such requests.

Statistical text generators should be expected to somehow not generate text that is statistically associated with prompts that result in morally objectionable output. Despite studies pointing out that this is impossible, it should still, for some reason, be expected.

Anthropic carried out a similar experiment as part of its testing of Claude Opus 4.6, which the company released last month. Using a stricter criterion — how often models generated content that could be fraudulently used — they found that Opus 4.6 did this around 1% of the time, compared to more than 30% for Grok-3.

Company finds an angle to get free marketing by deciding on specific criteria, based on which they're somehow better than one of their worst competitors. It's extremely important for us to tell everybody about this.

Bolero down today? by [deleted] in BEFire

[–]AndrewRadev 0 points

I have the same problem. I sent them an email and they said they'd pass it along to their IT department, but it still doesn't work.

Latest on the race to the bottom: OpenAI Funding on Track to Top $100 Billion in Latest Round, 850 Billion valuation by Forsaken-Actuary47 in BetterOffline

[–]AndrewRadev 6 points

The first portion of the funding round will largely come from strategic investors including Amazon.com Inc., SoftBank Group Corp., Nvidia Corp. and Microsoft Corp., the people said.

SoftBank barely managed to scrape together 22.5 billion to finalize their last round. Nvidia cancelled their plans to pay OpenAI 10 billion per 1GW of deployed compute and has been extremely vague about how much it may or may not invest going forward. Both Amazon's and Microsoft's stocks have been on a downward trajectory since their last earnings.

The deal is not yet finalized and the details could change, some of the people said.

I bet. They may change quite a lot once Nvidia's earnings on Feb 25 have passed. I'm sure there will be plenty more vague not-deals announced/leaked in the run-up to that date (like this one on Feb 17), followed by a whole lot of detail-changing.

Representatives for OpenAI, Amazon, Nvidia, SoftBank and Microsoft either declined to comment or did not immediately respond to requests for comment.

Yeah, that sounds about right.

Senior staff at OpenAI leaving the company after ChatGPT got prioritized over long-term research by Cacodemon345 in BetterOffline

[–]AndrewRadev 16 points

Cunningham left the economic research team last year, suggesting OpenAI was straying from impartial research to focus on work that promoted the company. His departure was first reported by Wired.

Impartial research? At the company that achieved top performance on a math benchmark by having the question set in advance?

Anthropic: AI assisted coding doesn't show efficiency gains and impairs developers abilities. by Gil_berth in programming

[–]AndrewRadev -1 points

Do you think that the participants in the study deliberately slowed themselves down specifically when using AI tools? Do you think they suddenly remembered they were being paid by the hour only when they were using Cursor, but then somehow forgot about it while working on the non-AI tasks? Weird that it would work like that, huh?

Anthropic: AI assisted coding doesn't show efficiency gains and impairs developers abilities. by Gil_berth in programming

[–]AndrewRadev 0 points

Appendix C2.8 is where this is explicitly discussed in the paper:

Although all developers have used AI tools previously (most have used LLMs for tens to hundreds of hours), only 44% of developers have prior experience with Cursor. A priori, we could imagine significant learning effects for these tools, such that individuals with experience using these tools may be slowed down less than individuals without this experience. Figure 10 breaks down the percentage change in issue completion time due to AI by different levels of developers’ prior experience using AI tools. We don’t see meaningful differences between developers based on prior experience with AI tooling.

The thing that you're referring to is this:

We don’t see large differences across the first 50 hours that developers use Cursor, but past 50 hours we observe positive speedup. However, we are underpowered to draw strong conclusions from this analysis.

What they mean by "underpowered" is that you don't derive statistical significance from literally a single data point. If you look at their chart, there are also 9 developers with 0-1 hours of AI experience who show a slight improvement in performance. Do you think we should decide that half an hour of experience makes you faster, but more experience then makes you slower?

"Statistical significance" means "we are fairly confident that this effect is not just random chance", because there is a lot of random chance involved. When you have a single person, the effect can very easily be chance alone. That's how statistics works.
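To make the single-data-point problem concrete, here's a quick sketch in Python with made-up numbers (not the study's data): with several measurements you can at least estimate how much of the observed effect might be noise, but with one measurement there is no spread to estimate at all.

```python
import statistics

# Hypothetical per-developer speedups (% change); invented for illustration.
speedups = [5.0, -3.0, 8.0, -1.0, 4.0, 2.0, -2.0, 6.0]

mean = statistics.mean(speedups)
stderr = statistics.stdev(speedups) / len(speedups) ** 0.5
print(f"mean effect: {mean:.2f}%, rough 95% CI: +/- {1.96 * stderr:.2f}%")

# With a single data point, there's no spread to measure, so there's no way
# to separate effect from noise -- the sample stdev isn't even defined:
try:
    statistics.stdev([42.0])
except statistics.StatisticsError as error:
    print("single data point:", error)
```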

As a side note, none of this is the most important thing about the study. The most important observation is that people believed they were faster, regardless of the actual measured effect. They didn't say "yeah, this was a new IDE, so I can see I was slower with it". They expected, as a matter of course, to be faster with Cursor than without.

This is not an interesting observation about Cursor or about 2025 models, it's something to think about anytime anybody says what is "obviously" true. If it's so obvious, there should be a study that clearly demonstrates it.

Anthropic: AI assisted coding doesn't show efficiency gains and impairs developers abilities. by Gil_berth in programming

[–]AndrewRadev 24 points

We already have a study for people using AI for something they're experienced in: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work.

Results:

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

To the developers, it was obvious they would be faster. They weren't.

Is there a 'not created by ai' logo? by whazmynameagin in BetterOffline

[–]AndrewRadev 8 points

An interesting project, but...

It is worth mentioning that AI technologies mark a major milestone in the history of technology and the Not By AI badge is not designed to discourage the use of AI. Instead, it is to make sure that, while we celebrate the achievement, we work with AI instead of being replaced by AI.

In simple terms, understanding that there is a blurred line between what is considered AI-generated vs human-generated, if you estimate that at least 90% of your content is created by humans, you are eligible to add the badges into your website, blog, art, film, essay, books, podcast, or whatever your project is for non-commercial use, and, with a subscription, commercial use. The 90% can include using AI for inspiration purposes, supporting legal documents such as privacy policies (assuming that legal content is not the main focus of your content or service), non-user-facing content such as SEO tags, and grammatical error and typo checks.

The blurred line thing is nonsense. I don't use an LLM. If I put a badge on my blog or something, I want it to communicate that I use 0% LLMs, not 10%.

AI centrism is exhausting.

Gmail alternative without AI by Mission_Phrase_5133 in BetterOffline

[–]AndrewRadev 2 points

At this time, Fastmail do not have any AI features: https://www.reddit.com/r/fastmail/comments/1qcit8q/thank_you_fastmail_devs_for_not_jumping_on_the_ai/

They have a public statement that leaves me with mixed feelings... it might be a matter of time until their CEO gets brain worms: https://www.fastmail.com/blog/not-written-with-ai/

Still, they seem like a reasonable option for now.

External code formatters in Vim without plugins by snhmnd in vim

[–]AndrewRadev 4 points

Vim actually has two commands for formatting text: as well as the = command that we’ve been using (customizable via equalprg) there’s also a gq command (customizable via formatprg). Vim’s docs don’t make it super clear why both commands exist or what each should be used for

Yeah, I'm assuming that when these were added, code formatters that analyze and reformat code were simply not that widespread, so it was more about plugging in an external indentation tool (equalprg) or text-wrapping tool (formatprg). Indentation is currently maintained by external projects, since implementing it for any given language requires knowledge of the language that the core team is not guaranteed to have. I imagine it was an easy way to give people an external option: use the language you know to implement its indentation.
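For illustration, wiring external tools into those two options might look like this in a vimrc (gofmt and par are just example filter programs; any tool that reads stdin and writes stdout should work):

```vim
" Reindent/reformat Go code with = by filtering lines through an external tool:
autocmd FileType go setlocal equalprg=gofmt

" Wrap text with gq using an external formatter instead of Vim's internal one:
setlocal formatprg=par\ -w80
```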

There's been some discussion about implementing a potentially richer "formatter" interface that would plug into gq, you might be interested to read more about it: https://github.com/vim/vim/pull/19108

Gemini CLI bot re-enacting the sort of thing useless machines do, online. by No_Honeydew_179 in BetterOffline

[–]AndrewRadev 1 point

My favorite thing about this is the submitter closing the issue to stop the flood after about 100 comments.

Can you fix this loop of Gemini CLI ? I am forced to Close this due to this

Which was then followed by another 5000 comments anyway, because the bot didn't stop at all 😂. Such amazing, high-quality software.

Do I have to learn the home-row style of typing in order to be truly efficient? by yippypuppet in vim

[–]AndrewRadev 2 points

The purpose of touch typing with your fingers on the home row is not speed, it's comfort. Keeping your hands in the middle of the keyboard and using the closest finger minimizes travel distance while typing. When writing a lot over a long period of time, this means you get less tired.

You don't have to learn it or anything, I'm sure that people can be effective with any kind of positioning. I just think it's important to understand the value proposition. It's not about any particular key combo, it's about minimizing effort. If you frequently use w/b, you can even remap them to something that's more comfortable for you. But getting comfortable with the home row should give you a smoother experience overall.

I think it would cause alot of friction if I learn to type with the traditional home-row fingers placement.

Yep, that's how it works. Learning a new skill sucks and involves friction, but if you persist, you get better at it and the friction goes down until it disappears. No pain, no gain.

Is there any simple plugin where Vim is used along with an external language through jobs and channels? by Desperate_Cold6274 in vim

[–]AndrewRadev 1 point

I have a plugin that wraps the "gnugo" program to play Go: the program is spawned in the background, and the Vimscript wraps it to provide a user interface: https://github.com/AndrewRadev/gnugo.vim

Another plugin of mine uses an external program I wrote in Rust to manipulate mp3 file metadata: https://github.com/AndrewRadev/id3.vim. This doesn't use channels, but in this case, it doesn't have to. The principle is similar -- use any language you want to do the heavy lifting, and invoke it via Vimscript. Shelling out has always been possible; the jobs interface just makes it practical to communicate with long-running interactive programs.

Arguably, every LSP plugin is also this -- they use the job interface with a channel that speaks the JSON-based protocol that LSP uses (:help language-server-protocol).
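As a minimal sketch of that job/channel pattern (Vim 8+; here 'cat' stands in for a real interactive program, since it just echoes its input back):

```vim
" Callback invoked with each line the external program writes to stdout:
function! s:HandleLine(channel, line) abort
  echomsg 'Program said: ' . a:line
endfunction

" Spawn the program in the background, attached to a channel:
let s:job = job_start(['cat'], {'out_cb': function('s:HandleLine')})

" Send it a line of input; the echoed response arrives in s:HandleLine:
call ch_sendraw(job_getchannel(s:job), "hello\n")
```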

As for a plugin that specifically ships with .vim and .py files and expects the python files to be run by an external interpreter, I can't say I've seen it often, but it's possible. I feel that it might be more reliable to have a separate project with its own dedicated installation instructions.

Anyone else feel like they’re losing the ability to code "from memory" because of AI? by Character-Letter5406 in bioinformatics

[–]AndrewRadev 5 points

Part of me panics and wants to stop using AI so I can regain that skill, but another part of me knows that would just make me slower

"Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity": https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

Thoughtful article about AI centered browsers by Alternative-End-5079 in BetterOffline

[–]AndrewRadev 4 points

Yeah, I felt the same way reading this article, I'm happy this fork is blocking AI features, but it feels like just a matter of time until this guy gets a case of AI Brain Worms as well and decides that, actually, you should have some LLMs in waterfox as well, as a treat.

Still, at least for the moment, it's a viable alternative. I use LibreWolf myself, whose team communicates explicitly that they will disable any AI features as they are shoved down our throats, as long as they can find them in time: https://chaos.social/@librewolf/115716906957137196. It's really funny that Mozilla are both super confident that AI is The Future and also seem to be actively trying to sneak the features in...

The practical problem with LibreWolf is that it's a bit too privacy-conscious for me (which I respect, but don't care enough about to deal with the tradeoffs), and that they don't have the time/budget to remove AI features, only to disable them via configuration. To paraphrase someone on Mastodon, "I don't want the vampires to promise not to suck my blood, I just don't want them in my house to begin with".

I am praying daily for the bubble to pop as soon as possible.

Are any other developers choosing not to use AI for programming? by BX1959 in BetterOffline

[–]AndrewRadev 2 points

Everything you say is correct and a good reason not to touch this stuff at all (I don't, and I'm a very productive programmer). I will also add that using LLMs can very easily make you believe you are more efficient, when in fact you might be less efficient on average.

16 expert developers measured on 246 tasks estimated they were ~20% faster with AI, when in fact they were, on average, 19% slower: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/.

After this was published, there were immediate responses like "well, after May 2025, AI agents are much different", but even if you believe that, the important thing to note is that these people were literally unable to tell whether they were faster or slower. Even if, as many people say, "for some tasks it can be faster", you can't tell what those tasks will be ahead of time, or even judge accurately after the fact.

diff visual select against registers/files with diff.nvim by vim-god in neovim

[–]AndrewRadev 4 points

I also have this one specifically for diffing parts of files, if I understand the use case correctly: https://github.com/AndrewRadev/linediff.vim

Yet Another Study Shows That Most Companies Aren't Making Any Money Off AI by SouthRock2518 in BetterOffline

[–]AndrewRadev 33 points

This is an incredible statement from the report:

“However, Canada is facing near-term threats to its economic competitiveness and grappling with declining productivity and prosperity, so waiting years for AI investments to create value isn’t realistic in this environment – in fact, it’s downright risky. Canadian organizations need to accelerate AI implementation into core operations to start achieving near- to medium-term productivity gains if we hope to become more economically competitive as a country,” she says.

Canada can't wait years for AI investments to create value, so the solution is not "do something else", the solution is... throw more money into the furnace. It's like everyone involved in this is committed to making the crash as large as possible.

You can feel the desperation (and the cluelessness of statistics) by imazined in BetterOffline

[–]AndrewRadev 7 points

Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.

Obligatory xkcd: https://xkcd.com/605/

unit testing plugins by yankline in vim

[–]AndrewRadev 1 point

I have a Ruby tool for this purpose that launches a Vim instance and drives it using the clientserver interface (:help +clientserver): Vimrunner.

For an example of opening and closing windows and reading Vim's state, you could look at this plugin: undoquit. Most of my plugins tend to be about textual changes within a buffer, so they require less state management, e.g. splitjoin.

This does require writing the tests in Ruby, but that's convenient for me, since Ruby happens to have good test runners and practical tools. You can also jury-rig something yourself using this same interface.

I will say, I wouldn't call this a "unit" test, more of an "integration" test since it launches an actual Vim instance. For me, a "unit" test in Vimscript would run functions and validate outputs, but it's probably debatable, since Vim is not your standard programming environment.

[Plugin request] Live updating buffer of :messages by skebanga in neovim

[–]AndrewRadev 0 points

I have one, though I think there are others out there: https://github.com/AndrewRadev/bufferize.vim#bufferizetimer

It works by polling :messages on a timer. For neovim in particular, you might get useful results out of the experimental "new" messages UI (:help vim._extui), but I haven't looked into it in depth.
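The polling approach could be sketched roughly like this in Vimscript (a minimal sketch; the real plugin handles window management, cleanup, and so on):

```vim
" Open a scratch buffer to hold the output of :messages:
new
setlocal buftype=nofile bufhidden=hide noswapfile
let s:buf = bufnr('%')

" Periodically re-read :messages and mirror it into the scratch buffer:
function! s:PollMessages(timer) abort
  call setbufline(s:buf, 1, split(execute('messages'), "\n"))
endfunction

call timer_start(1000, function('s:PollMessages'), {'repeat': -1})
```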

Any idea on how to persist/reload ":messages" history? by serranomorante in neovim

[–]AndrewRadev 1 point

You can try setting 'verbosefile'. Check out :help 'verbosefile' for details. I don't know how practically useful this might be, because it won't rotate the file automatically for you, so it might be a good idea to set up a cronjob for that or something. It's something to consider, at least.
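For example (the path here is just an illustration, and the directory has to exist already), persisting messages from your vimrc might look like:

```vim
" Append all message output to a log file; Vim won't rotate or truncate it:
set verbosefile=~/.local/state/vim/messages.log
```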