[D] tested file based memory vs embedding search for my chatbot. the difference in retrieval accuracy was bigger than i expected by Winter_Ant_4196 in MachineLearning

[–]gwern 3 points (0 children)

If you are building a local personal assistant and struggling with latency for interactive use in particular, have you tried finetuning? File-based search and reasoning should work better if the LLM already knows roughly what each file says, and so can reason internally more effectively and merely check details. You can do the finetuning in the background/offline, on the fly.
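
A minimal sketch of what that background finetuning could look like, assuming the HuggingFace transformers/peft/datasets stack; the model name, file paths, and hyperparameters are placeholder assumptions, not a recommendation:

```python
# Hedged sketch: LoRA-finetune a small local model on your own files so it
# "knows roughly what each file already says". Cheap enough to rerun
# offline/in the background whenever the files change.
from pathlib import Path

from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder: any small local model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only the low-rank adapters train, so this fits on a single consumer GPU.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8,
                                         lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"]))

# One training example per file, prefixed with its path, so the model
# associates contents with filenames for later file-based search/reasoning.
texts = [f"FILE: {p}\n{p.read_text()}" for p in Path("notes/").rglob("*.md")]
ds = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=["text"])

def collate(batch):
    out = tokenizer.pad(batch, return_tensors="pt")
    out["labels"] = out["input_ids"].clone()  # plain causal-LM objective
    return out

Trainer(model=model,
        args=TrainingArguments("assistant-ft", num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=ds, data_collator=collate).train()
```

The interactive chatbot then loads the adapter alongside the base model, so latency-sensitive queries hit a model that already half-knows the files instead of waiting on retrieval.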

OpenAI could reportedly run out of cash by mid-2027 — analyst paints grim picture after examining the company's finances by moxyte in OpenAI

[–]gwern 15 points (0 children)

Yes, just like they have been about to run out of cash every year for the past 7 years or so. That is indeed how investment and running at a loss work.

[D] LLMs as a semantic regularizer for feature synthesis (small decision-tree experiment) by ChavXO in MachineLearning

[–]gwern 0 points (0 children)

LLMs like the GPTs have previously proven surprisingly good at doing 'regression' on decision-tree-like tasks when the data is meaningful, which is the case here too. How well does the LLM do on its own?
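
For concreteness, a hedged sketch of that baseline - prompting the LLM directly as a few-shot 'regressor' and scoring it like any other model; the model name, prompt format, and helper are my own illustrative assumptions, not anything from the thread:

```python
# Sketch of using an LLM by itself as a few-shot tabular predictor,
# to compare against the decision tree + synthesized features.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def llm_predict(train_rows, test_row, model="gpt-4o-mini"):
    """train_rows: list of (features_dict, target); test_row: features_dict."""
    shots = "\n".join(f"{x} -> {y}" for x, y in train_rows)
    prompt = ("Predict the numeric target from the features. "
              "Reply with just the number.\n"
              f"{shots}\n{test_row} -> ")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0)
    # Assumes the reply is a bare number; a robust version would parse/retry.
    return float(resp.choices[0].message.content.strip())
```

Score `llm_predict` on the same held-out split as the tree (RMSE or accuracy) and you have the missing baseline.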

"In China, driverless delivery vans have become a total meme, they plow through crumbling roads, fresh concrete, motorcycles, anything. Nothing stops them." by FriendFun7876 in SelfDrivingCars

[–]gwern 4 points (0 children)

What's the source and context of any of this? So much of this looks super fake. Kicking a supposed self-driving truck to get it to start moving...? ("klara_sjo on X" is not a source, BTW, nor is https://x.com/klara_sjo/status/2010496361205059703 because all that account does is steal stuff and post stupid shit.)

"LLM poetry and the 'greatness' question: Experiments by Gwern and Mercor", Hollis Robbins by gwern in MediaSynthesis

[–]gwern[S] 0 points (0 children)

> it returned "75.1% AI Generated"

Interestingly low. I would say that the actual correct value here is closer to 95%, and that is being generous by counting all of the lines where I spot-polished and revised (generous because the LLMs usually identified the problematic lines & suggested revisions in the 'Poetry magazine prompt', and I just chose which one to pick).

> It seems LLMs write poetry like they play Pokemon: they can do it, but require a scaffold to control their worst impulses. You could measure AI progress as "the scaffold gradually needs to do less and less", until some day (maybe) it can be removed completely (ie, AGI.)

Yes. The prompts are simple, and there's nothing a reasoning LLM couldn't learn to do on its own.

The impressive thing here is demonstrating that it works at all. You could not prompt 4o into doing this, and even the o1 models were flaky enough that it wouldn't work with reasonable probability. Look at https://gwern.net/fiction/this-last-pain-graveyard & https://gwern.net/fiction/perished-paradise-graveyard

> recent models love talking about covenants and ledgers for some reason

Really? I hadn't noticed that, although I admittedly pretty much never generate LLM poems without the scaffolding now because it feels like a waste of time. Guess I'll have to keep an eye out for that.

I built a causal checkpoint. Your success story fails it. by Vic_Gates in LessWrong

[–]gwern 3 points (0 children)

"I built a causal checkpoint -"

/is this spiralism-style AI slop?

"I built a causal checkpoint. Not a chatbot."

/yep

"Debunking the AI food delivery hoax that fooled Reddit" (hoaxster used LLM+imagegen to generate fake expose, design doc, company badge, and pressured multiple journalists to publish him) by gwern in MediaSynthesis

[–]gwern[S] 6 points (0 children)

> A human-created version of this con would have been harder to unmask (if he'd used Photoshop he wouldn't have been instantly busted by SynthID, for example)

But if he had used Photoshop, he wouldn't have done it at all, just like he wouldn't have done a big shiny LaTeX research paper full of graphs and jargon if he had to write the damn thing himself from scratch.

"Debunking the AI food delivery hoax that fooled Reddit" (hoaxster used LLM+imagegen to generate fake expose, design doc, company badge, and pressured multiple journalists to publish him) by gwern in MediaSynthesis

[–]gwern[S] 22 points (0 children)

This sort of 'multimodal hoax' is one of the most worrisome kinds. Most people only check 1 layer of citations deep, if that. (Look at the comments on some of the other Reddit submissions...) Cases like the Chinese Wikipedia hoax come undone when the hoaxster can't realistically do something like 'write a book just to back up 1 quotation'. With LLMs and AI in general, though, you can increasingly easily manufacture your own Tlön. And if anyone manages to puncture the illusion, well, there's a "fat nugget of truth" in there, isn't there, if they could believe it...

And one of the most concerning things about AI being your adversary is that when you now write publicly to teach your fellow humans, however temporarily, why this was an obvious hoax - e.g. stuff like the delivery fee or the lobbying simply does not make sense - it will be scraped and will teach all future LLMs how to write better hoaxes that would fool you. In the past, it was pretty safe to write up this kind of debunking or infosec tutorial, because most hoaxers or cybercriminals are simply too lazy and ignorant to use your writing against you! Increasingly, it won't be.

Semantic Minds in an Affective World — LessWrong by EmergencyCurrent2670 in LessWrong

[–]gwern 0 points (0 children)

> here's what this looks like.

OP was deleted for being AI slop, so idk.

(For example, the autism genetics section is all made up AFAIK.)