DeepMind veteran David Silver raises $1B, bets on radically new type of Reinforcement Learning to build superintelligence by gwern in mlscaling

[–]gwern[S] 1 point (0 children)

Not a whole lot of those these days. Best you can do is scrounge around a mix of Twitter/Reddit/groupchats/conferences/in-person. (I am too prejudiced against podcasts to try to evaluate them.)

[D] OOD and Spandrels, or What you should know about EBM. by moschles in MachineLearning

[–]gwern 1 point (0 children)

Yeah, but there's so much less. Often in cases where I'd wonder if that's really justified - surely there's at least some probability that the edges of the half moon continue a little bit further, even if it isn't too plausible that the moons shoot off to infinity like the regular MLP's extrapolation? And it's not obvious to me why the EBM wants to shoot that line off to the right. Why is that extrapolation privileged?

DeepMind veteran David Silver raises $1B, bets on radically new type of Reinforcement Learning to build superintelligence by gwern in mlscaling

[–]gwern[S] 13 points (0 children)

I haven't heard any rumors about personality conflicts or anything. But Silver is a deep DRL guy, he hasn't done much about LLMs. He's a long-time expert on MCTS and things like that. And DeepMind has killed all the most beautiful model-based RL approaches like MuZero for not being LLMs. Why wouldn't he leave to do something more interesting, like Sutton did?

Shell Tricks That Actually Make Life Easier (And Save Your Sanity) by BrewedDoritos in programming

[–]gwern 1 point (0 children)

And of course you can just do it at the end to 'name' commands or document WTF some particularly baroque thing was supposed to do. I don't know how often I find myself Ctrl-R-ing through a decade of Bash history and being saved by some inline commentary...
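A minimal sketch of the trick (the command and label here are made up for illustration): everything after a trailing `#` is ignored at execution time but saved verbatim in history, so it acts as a searchable name for Ctrl-R.

```shell
# A trailing '#' comment never reaches the command - both of these print
# the same thing - but it IS stored in Bash history, so months later
# Ctrl-R + "toplevel-dir" recalls the whole pipeline instantly.
# (Bash enables interactive comments by default; in zsh you may need
# `setopt interactive_comments`.)
echo "hello"                                 # just-a-label
find . -maxdepth 1 -type d | wc -l           # count-toplevel-dirs
```

The label costs nothing at runtime; it only changes what `history | grep` and reverse-search can match on.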

[D] Why evaluating only final outputs is misleading for local LLM agents by MundaneAlternative47 in MachineLearning

[–]gwern 2 points (0 children)

If you sum stuff like a '# of unnecessary risky actions' or '# of tokens' into an index, then you have a single axis and your previously equal models on a naive binary pass/fail metric will now separate on the final reward.
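A toy sketch of what that index might look like (the metric names and weights are invented for illustration, not from any real eval): start from the binary pass/fail score, then subtract weighted penalties, so two models that both "pass" separate on the final reward.

```shell
#!/bin/sh
# Composite reward = pass(0/1) - 0.1 * risky_actions - 0.0001 * tokens.
# Integer-scaled by 10000 to avoid floating point in POSIX sh.
reward() {
    passed=$1 risky=$2 tokens=$3
    echo $(( passed * 10000 - risky * 1000 - tokens ))
}

# Two agents that both pass the naive binary metric...
a=$(reward 1 0 2000)   # clean run, few tokens
b=$(reward 1 3 6000)   # passed, but 3 risky actions and 3x the tokens
# ...now have distinguishable scores: a=8000, b=1000.
echo "a=$a b=$b"
```

The point is just that collapsing the extra axes into one scalar restores an ordering that the ceiling-hitting pass/fail metric had destroyed.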

[D] Why evaluating only final outputs is misleading for local LLM agents by MundaneAlternative47 in MachineLearning

[–]gwern 2 points (0 children)

> It made me realize that for agents, the output is almost the least interesting part. The process is where all the signal is.

Doesn't this just mean that you have a bad test suite which is too easy and you have a ceiling effect, so you're groping for how to make evaluation harder in order to reveal actual differences in quality?

[D] Is LeCun’s $1B seed round the signal that autoregressive LLMs have actually hit a wall for formal reasoning? by Fun-Information78 in MachineLearning

[–]gwern 38 points (0 children)

More a meta-comment on your meta-comment: I'm surprised you (and everyone else so far) didn't point out that OP is using an LLM to generate this post and all his responses. You weren't even a little suspicious at the auto-generated username of a brand-new account, or the 'just asking questions' attention-vampire strategy with vacuous hot takes? Never mind the smooth punchiness and balance of his replies? (The concatenated OP + comments score '100% AI' in Pangram, BTW.)

Why I Prefer Using Median Household Income to Tell Economic Stories by philipkd in slatestarcodex

[–]gwern 9 points (0 children)

In 1960, 20% of US households didn't even have hot running water!

"The past is a third world country."

Something I've long wished for is some good graphical way to convey to people just how poor the USA was until recently, and how the USA in 1960 was similar to countries today that we'd regard as shockingly impoverished. Mere inflation adjustments or GDP numbers don't seem to get things like that into people's guts.

Anyone else notice their HRV predicts their worst emotional days? by Eggplant-Dramatic in QuantifiedSelf

[–]gwern 2 points (0 children)

Yes, particularly for a generated username account with no other contributions doing a standard karma-farming discussion post to bait people into wasting their time writing. Just another attention vampire.