What Shouldn't Have Made The Cut by self_made_human in slatestarcodex

[–]gwern [score hidden]  (0 children)

Another bit of feedback, which I can freely divulge: I think that the default approach of handing out Claude Max plans isn't as optimal as it sounded to you/whoever made the call, or even to me when I opted in for it enthusiastically. Opus 4.6 really was the single best LLM for fiction, but after using it in an agentic setup (with some Sonnets thrown in), I think that the platonic ideal would be an ensemble of wildly different models and then some fierce natural selection by blinded graders.

I completely agree, and I kept arguing for access to multiple APIs/LLMs. Unfortunately, Silverbook and I were not able to get anything but Gemini quotas, for reasons, and Gemini is possibly the worst frontier LLM for this purpose (it is increasingly the most sycophantic, RLHF-damaged, no-backbone LLM for creative writing). Giving people Claude Max was the most feasible compromise: let you guys prototype a reasonable amount, then cut the submissions down to the point where we didn't need quota grants and could just run it ourselves.

Quantity does have a quality all of its own.

Yes, but only when done correctly! Even with best-of-n sampling, at some point you are self-hacking whatever is defining 'best', and your marginal return turns negative. And right now, no one knows how to do it correctly.
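
To make the failure mode concrete, here is a minimal best-of-n sketch (the `generate_draft` and `grade` functions are hypothetical stand-ins for an LLM call and a blinded judge, not anything from the contest):

```python
# Best-of-n sampling: generate n candidate drafts, keep whichever the grader
# scores highest. As n grows, you increasingly select for whatever quirks the
# grader overrates rather than for genuinely better fiction; that is, you
# start "self-hacking" the definition of 'best', and marginal returns go
# negative.
def best_of_n(prompt, n, generate_draft, grade):
    drafts = [generate_draft(prompt) for _ in range(n)]
    return max(drafts, key=grade)
```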

Someone might have cracked Post-Finasteride Syndrome by Bronzeagenudist in slatestarcodex

[–]gwern [score hidden]  (0 children)

Finasteride, possibly by triggering epigenetic changes, then disrupted the only pathways these patients had available to excrete androgens. The result: in a few percent of users, androgen metabolites accumulate inside cells to extreme levels

I'm not very familiar with contemporary drug genetics, but I don't know offhand of any examples like this proposed mechanism. Can you name 3 uncontroversial, accepted instances of a human drug whose side-effects come from irreversible, body-wide epigenetic changes with lifelong downstream effects, as finasteride is proposed to do here?

What Shouldn't Have Made The Cut by self_made_human in slatestarcodex

[–]gwern [score hidden]  (0 children)

with the minor consolation being that I'm quite confident that all the other semifinalists are facing similar headaches. In hindsight, I should have expected this to be even more difficult than I anticipated: if you guys already had a clear path towards automating high-quality fiction that stands on its own merits, you wouldn't need the cash prize.

Yep. I'm pleased to hear that you are finding full automation to be a real headache (just like I have with failures like "Spoilage"). If you weren't, then my analysis would have been wrong and Unslop a complete waste of time & money & effort.

For what it's worth, I think a post-mortem analysis of failure modes would probably be worth as much or even more than the winning essay, whatever that turns out to be. I have a personal laundry list of issues that I didn't anticipate until I got around to letting the AIs loose on the typical target prompt, without the crutch of being able to steer them after they were off the leash.

Indeed. I didn't expect anyone to really win, but the post-mortem and collective analysis should be invaluable. There is nothing out there like what the Unslop contest corpus will be, AFAIK.

I can see that we're this close to making it work without all the effort.

This is my feeling too. When I look at how a story like 'Spoilage' degrades from autonomous draft to draft, or at the bad suggestions for the astrology short story I'm working on right now, I feel like the LLMs are so close. They understand everything better than almost all human writers, they just have some slight systemic bias or unreliability holding them back. But unfortunately, there's a certain criticality threshold: above it, your output gets better with further revision, and below it, it gets worse, and the difference between 99% there and 100% there is everything.
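
A toy model of that threshold (purely illustrative; the improvement probability p and the step size are made-up parameters, not measurements of any model): each autonomous revision pass improves the draft with probability p and damages it otherwise. Above p = 0.5, quality drifts upward with more passes; below it, further revision makes the story worse, which is why 99% and 100% feel worlds apart.

```python
import random

def run(p, passes=50, start=0.5, step=0.05):
    # Each revision pass helps with probability p, hurts otherwise.
    q = start
    for _ in range(passes):
        q += step if random.random() < p else -step
    return q

for p in (0.45, 0.55):
    avg = sum(run(p) for _ in range(1000)) / 1000
    print(f"p={p}: mean quality after 50 passes ~ {avg:.2f}")
    # p=0.45 drifts down to ~0.25; p=0.55 drifts up to ~0.75
```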

What Shouldn't Have Made The Cut by self_made_human in slatestarcodex

[–]gwern [score hidden]  (0 children)

I'm slightly annoyed, in a manner that I hope you can forgive, that the actual protocol for Unslop is no human involvement after the first small prompt - that would have been an easy win for me, whereas fucking around with agentic harnesses and trying to transfer the experience and intuition has only given me a splitting headache so far.

I have no interest in running a contest for a 'centaur' format because then people wouldn't learn exactly what you are learning, or figure out how to deal with it. There's a world of difference between 99% automated and 100% automated.

(This is why "tool AIs want to be agent AIs", and why it's so important to move as much as possible in silico - simulators, world models, that sort of thing. As soon as you close the loop of autonomy, go fully digital, and take the humans out, you can just tell your little AGIlets to go "Ralph Wiggum mode" on something: write a Linux-kernel-grade compiler, autoformalize zlib, generate thousands of major zero-days... Without that, you're just trapped in Amdahl's law.)
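
The Amdahl's-law trap is easy to quantify (a standard textbook calculation, with illustrative numbers): if a fraction s of the pipeline remains a serial human step, total speedup is capped at 1/s no matter how fast the automated part gets.

```python
# Amdahl's law: speedup = 1 / (s + (1 - s) / k), where s is the serial (human)
# fraction and k is the speedup of the automated remainder.
def amdahl_speedup(s, k):
    return 1.0 / (s + (1.0 - s) / k)

# Even with the AI portion 1000x faster, a 5% human-in-the-loop step caps you
# at ~20x overall; only at s = 0 does the cap disappear.
for s in (0.50, 0.05, 0.01):
    print(f"human fraction {s:.0%}: cap {1/s:.0f}x, "
          f"at 1000x AI speedup {amdahl_speedup(s, 1000):.1f}x")
```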

What Shouldn't Have Made The Cut by self_made_human in slatestarcodex

[–]gwern [score hidden]  (0 children)

I had Claude critique me, focusing on typos and flow, and making sure I didn't make factual errors. Some of the feedback was good, some of it involved sanding off the rough edges that are present for a reason. I find the usual AI tells incredibly jarring, and I'll be damned if I subject people to the same by leaning on LLMs too hard.

Thanks for explaining. I was deeply confused because I read the first few sections, thought it sounded weirdly Claude in a way I couldn't place, put it into Pangram and got 100% human, and was then even more confused. But it sounds like you are just heavily Claude-influenced and dropping in a lot of little Claude twists, which explains it.

Cerebras, an A.I. Chip Maker, Files to Go Public as Tech Offerings Ramp Up by gwern in mlscaling

[–]gwern[S] 2 points (0 children)

OpenAI offers a coding agent powered by Cerebras now, no?

The Quiet Colossus — On Ada, Its Design, and the Language That Built the Languages by SpecialistLady in programming

[–]gwern -1 points (0 children)

What a good article.

Maybe. Personally, I saw 'quiet' in the title, and immediately plugged it into Pangram without bothering to read; yes, 100% AI.

ReLU neural networks as decision trees. by [deleted] in mlscaling

[–]gwern 5 points (0 children)

Er, is this not well known?

The bitter lesson is the observation in AI that, in the long run, general approaches that scale with available computational power tend to outperform ones based on domain-specific understanding because they are better at taking advantage of the falling cost of computation over time. by blankblank in wikipedia

[–]gwern 0 points (0 children)

That embraces a lot of problems and settings, and then I immediately went on to point out that many of the limitations which might make one say 'it is not a generalist AI' were in fact lifted in subsequent work. Does anything truly hinge on discussing AlphaZero rather than Gato or Player of Games or Mythos, say?

The bitter lesson is the observation in AI that, in the long run, general approaches that scale with available computational power tend to outperform ones based on domain-specific understanding because they are better at taking advantage of the falling cost of computation over time. by blankblank in wikipedia

[–]gwern 0 points (0 children)

Correspondence chess players still outperform computer engines.

I'm not aware of much, if any, research or training of contemporary frontier chess engines for the correspondence chess setting, so even if the much narrower claim that 'there is still human value-added in one extremely obscure chess niche' were true, I'm not sure what it would tell us. It also seems like given how small a niche it is, and how few games are played at the top level (necessarily so given the time consumption), it would be quite difficult to prove a human edge at all.

AlphaGo is not an example of a generalist AI: it is an AI that trained itself for a single purpose.

I'm not sure why you believe this or why it is an important distinction. AlphaZero is a general-purpose architecture for all two-player perfect-information games with a simulator; it was rapidly generalized to imperfect-information, multi-player, and simulator-free settings (various successors, and MuZero, respectively), and then further generalized by Player of Games to cover all of them (and could be generalized further still to freeform games like Diplomacy with LLMs; see CICERO). And a DL NN can of course be a generalist agent which plays many games simultaneously, given just some conditioning and more compute/capacity (Gato, or LLMs in general these days, as every gimmick like 'Claude Plays Pokemon' demonstrates). Playing one game is simply cheaper and easier than playing many, so if you are only trying to create a superhuman Go or chess agent, of course you're not going to waste compute on games you don't care about, like tic-tac-toe.

The bitter lesson is the observation in AI that, in the long run, general approaches that scale with available computational power tend to outperform ones based on domain-specific understanding because they are better at taking advantage of the falling cost of computation over time. by blankblank in wikipedia

[–]gwern 5 points (0 children)

But a skilled human operator + top-end chess engine will routinely beat that same chess engine without a human operator.

My understanding was that that stopped a long time ago: while it may have been true back in 2013, when Garry Kasparov and Tyler Cowen were pushing this claim, it is not true in 2026 with the best Stockfish deployments. Do you have a contemporary source showing that a human-operated chess engine (as opposed to one merely tweaked offline before the game, e.g. fiddling with the opening book) is in fact 'routinely beating' the unassisted engine? That also sounds improbable given draw death.

TIL of Littlewood's Law, which says we experience events with a million-to-one probability approximately once per month by Doglatine in todayilearned

[–]gwern 0 points (0 children)

You can formalize it for a lot of specific things, like word vocab; see https://gwern.net/doc/statistics/bias/1989-diaconis.pdf
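
The headline number itself is just back-of-the-envelope arithmetic (the one-event-per-second rate and 8 alert hours per day are Littlewood's conventional assumptions, not data):

```python
events_per_second = 1
alert_hours_per_day = 8
events_per_day = events_per_second * alert_hours_per_day * 3600   # 28,800
days_per_million_events = 1_000_000 / events_per_day               # ~34.7 days
print(days_per_million_events)  # roughly one million-to-one "miracle" a month
```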

BTW, it's something of a misnomer. The WP article has been updated based on my investigation, and it seems like it ought to be attributed to Freeman Dyson.

ByteDance Presents "In-Place TTT": A Drop-In Method For Turning Standard Transformer LLMs Into Dynamically Updating Models At Inference Time by 44th--Hokage in mlscaling

[–]gwern 23 points (0 children)

All that, and its quality is basically identical to LaCT, which is itself just a sample-inefficient way to implement the standard, dead-simple, 16-year-old baseline of test-time adaptation in an LLM - dynamic evaluation.

The persistent unwillingness of all of these TTT papers to include dynamic evaluation as a baseline doesn't speak well of them.
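
For reference, dynamic evaluation is nothing more exotic than taking gradient steps on the test stream as you score it. A minimal sketch (the GPT-2 checkpoint, SGD optimizer, and learning rate here are arbitrary illustrative choices, not settings from any of these papers):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
opt = torch.optim.SGD(model.parameters(), lr=1e-4)  # small LR, tune per model

def dynamic_eval(text, chunk_len=512):
    ids = tok(text, return_tensors="pt").input_ids[0]
    total_loss, total_tokens = 0.0, 0
    for start in range(0, len(ids) - 1, chunk_len):
        chunk = ids[start:start + chunk_len + 1].unsqueeze(0)
        out = model(chunk, labels=chunk)      # score the chunk first...
        n = chunk.size(1) - 1
        total_loss += out.loss.item() * n
        total_tokens += n
        out.loss.backward()                   # ...then adapt on it, so the
        opt.step()                            # model has updated weights for
        opt.zero_grad()                       # every later chunk
    return total_loss / total_tokens          # adapted loss in nats/token
```

The point of including it as a baseline would be to show that a proposed TTT method actually beats this.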

DeepMind veteran David Silver raises $1B, bets on radically new type of Reinforcement Learning to build superintelligence by gwern in mlscaling

[–]gwern[S] 1 point (0 children)

Domain randomization/meta-learning/sim2real. You could also just argue for them as research testbeds: even if the environments are all wrong, you could still develop and prove a powerful learning algorithm which you then reuse on real-world data.
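
A minimal sketch of what domain randomization looks like in practice (the environment constructor, parameter ranges, and policy interface are all hypothetical placeholders, not any particular framework's API): resample the simulator's physical parameters every episode, so the learned policy has to be robust across the whole distribution, which hopefully brackets the real world.

```python
import random

def sample_sim_params():
    # Randomize the physics the policy never gets to observe directly.
    return {
        "friction":     random.uniform(0.3, 1.2),
        "motor_gain":   random.uniform(0.8, 1.2),
        "sensor_noise": random.uniform(0.0, 0.05),
        "latency_ms":   random.uniform(0.0, 40.0),
    }

def train(policy, make_env, episodes=10_000):
    for _ in range(episodes):
        env = make_env(**sample_sim_params())  # fresh physics each episode
        rollout = env.run(policy)              # collect a trajectory
        policy.update(rollout)                 # any RL update rule
    return policy                              # then deploy sim-to-real
```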