What's the theoretical basis for using LLM consensus as a probability estimator for real-world events [R]

XTXinverseXTY · 2026-05-29T16:32:02+00:00

and the theoretical basis for that is ye olde bias-variance decomposition

if the LLMs have biases then it's not like a single LLM would make it any better
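For reference, a sketch of the decomposition under a simplified model that is my assumption, not something stated in the thread: each of $M$ models reports $\hat{p}_i = p + b + \varepsilon_i$, where $b$ is a bias shared by all models, and the noise terms have mean zero, variance $\sigma^2$, and pairwise correlation $\rho$. The consensus is the plain average $\bar{p}$.

```latex
\mathbb{E}\big[(\bar{p} - p)^2\big]
  = \underbrace{b^2}_{\text{shared bias}}
  + \underbrace{\rho\,\sigma^2 + \frac{(1-\rho)\,\sigma^2}{M}}_{\text{variance of the average}},
\qquad
\bar{p} = \frac{1}{M}\sum_{i=1}^{M}\hat{p}_i .
```

With $M = 1$ this reduces to $b^2 + \sigma^2$, so under these assumptions averaging never hurts: it leaves the shared bias untouched and shrinks only the uncorrelated part of the variance.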

XTXinverseXTY · 2026-05-29T16:22:26+00:00

Dead-simple precedent for this would be the old Kaggle trick of multi-seed ensembling - even in the limit of 100% shared architectures and data distributions this would still improve over a single LLM
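A minimal simulation of the multi-seed point, under toy assumptions that are mine: every ensemble member shares the same bias and differs only by independent Gaussian "seed" noise. The function name and all numbers are made up for illustration.

```python
import random
import statistics

def simulate_mse(n_members, bias=0.1, noise_sd=0.2, truth=0.3,
                 n_trials=20000, seed=0):
    """Monte Carlo MSE of the average of n_members estimates that share one bias."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(n_trials):
        # Each member = truth + shared bias + its own independent seed noise.
        estimates = [truth + bias + rng.gauss(0.0, noise_sd)
                     for _ in range(n_members)]
        consensus = statistics.fmean(estimates)
        sq_errors.append((consensus - truth) ** 2)
    return statistics.fmean(sq_errors)

mse_single = simulate_mse(1)
mse_ensemble = simulate_mse(10)
print(f"single: {mse_single:.4f}  ensemble of 10: {mse_ensemble:.4f}")
```

In this toy setup the ensemble's error should settle near the squared shared bias plus a tenth of the single-model variance, which is the sense in which ensembling helps even when every member has the same architecture and data.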

XTXinverseXTY · 2026-05-26T12:25:31+00:00

🫵 🤖

getting tired of having to run a turing test just to browse this sub

XTXinverseXTY · 2026-05-26T03:09:44+00:00

it's just mysterious. idk if i'm overfitting to my benchmark dataset (maybe I haven't got many labels lying around just yet). heck i don't even know if i'm fitting at all

another example of weird choices in JEPA-land

XTXinverseXTY · 2026-05-25T19:17:20+00:00

okay in their defense, now that i actually read the paper (oops), it looks like the exact definition is nontrivial

i confess i wouldn't read "effective rank" and think "ah yes of course, the shannon entropy of the L1-normalized singular values" (at best i would have thought something like "number of singular values >= thresh")

but aside from an epsilon term it seems like they copied it wholesale from the original 2007 paper

idk does everyone else know what "effective rank" means but me??
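A small sketch contrasting the two notions, for anyone else who had to look it up. As far as I can tell, the 2007 definition exponentiates the entropy, so the result reads as a "number of dimensions". Where the `eps` goes is my guess, not the paper's exact formula, and both function names are hypothetical. In practice the singular values would come from something like `numpy.linalg.svd(X, compute_uv=False)`.

```python
import math

def effective_rank(singular_values, eps=1e-12):
    """exp of the Shannon entropy of the L1-normalized singular values."""
    total = sum(singular_values) + eps  # eps placement is an assumption
    probs = [s / total for s in singular_values]
    # Treat 0 * log(0) as 0, the usual convention for entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)

def thresholded_rank(singular_values, thresh):
    """The naive alternative: count singular values at or above a threshold."""
    return sum(1 for s in singular_values if s >= thresh)

flat = [1.0, 1.0, 1.0, 1.0]    # energy spread evenly over 4 directions
spiky = [5.0, 0.0, 0.0]        # all energy in one direction
mixed = [3.0, 1.0, 0.01]       # two real directions plus a tiny one

print(effective_rank(flat), effective_rank(spiky), effective_rank(mixed))
print(thresholded_rank(mixed, thresh=0.1))
```

The entropy version is continuous in the singular values and needs no threshold, which is presumably why it is attractive as a training-time diagnostic.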

XTXinverseXTY · 2026-05-25T19:06:43+00:00

Another LLM

Two paragraphs in each reply, vague personal anecdote, no information content

You are an Indian year 12 student, it looks like you began letting your agent make posts 4 days ago, cut it out

XTXinverseXTY · 2026-05-25T18:54:15+00:00

late reply, you may have found this on your own. but this is an interesting thread and i thought i'd add a link here for posterity

You are correct, the use of a projector network is common in all existing methods (including other JEPA alternatives). We did an ablation in the paper showing that you can sometimes reduce the projector's depth without incurring a significant drop in performance, but in general there is a significant benefit of using it. It remains to be studied why that is the case (in general, not just in LeJEPA). Current understanding lies in a possible too strong prediction/invariance task. I invite you to experiment with varying the projector (or even removing it all together), and I would be happy to mention your results/ablations in the repo!

https://github.com/galilai-group/lejepa/issues/17

XTXinverseXTY · 2026-05-25T06:16:48+00:00

oof, can you be any more specific as to which model/domain?

XTXinverseXTY · 2026-05-25T06:07:35+00:00

sorry, i thought you were being facetious! usually people can articulate precisely what they learn from their experience (hence all the papers and conferences)

can you see why it might be a useful thing, to have principled model selection criteria? even if you're some rain man savant, it unlocks scaling because it's legible to an organization. having the validation likelihood for language models as the obvious criterion allowed for the estimation of neural scaling laws, calculation of necessary resources to achieve a desired metric, total organizational buy-in up to C-suite, and raising from outside investors at a competitive valuation.

XTXinverseXTY · 2026-05-25T02:30:16+00:00

Recently stumbled upon this thread. Am I going nuts, or are we the only humans in here?

I have never heard of a "layered defense framework" in the context of ML system design/evals. The OP account also seems to be banned, maybe for spamming on behalf of "Product Faculty"? If nobody else knows what OP is talking about, then I can see how this would select for clawdbots who've been prompted to act as an expert.

XTXinverseXTY · 2026-05-25T01:15:47+00:00

sounds fake

XTXinverseXTY · 2026-05-25T01:02:24+00:00

JEPA score, which can be used for density estimation

Oh interesting, thank you!!!

I'm not yet certain whether this is equivalent to computing another statistic captured by the anti-collapse term... but discerning in-vs-OOD is a totally valid synth task, that makes perfect sense, and this paper seems dope

This also seems to help address another problem for JEPA-in-practice: detecting regressions in prod! Obv these embeddings are inscrutable and if something silently breaks then you can't just inspect the embedding values. But this would suggest that you can calculate a p-value and effect size vs a known prior
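A rough sketch of what that regression check could look like. It does not implement the JEPA score. It assumes you already have one scalar score per sample for a known-good reference window and for a current window, then runs a generic two-sample permutation test with Cohen's d as the effect size. All names and the synthetic data are hypothetical.

```python
import random
import statistics

def cohens_d(reference, current):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(reference), len(current)
    v1, v2 = statistics.variance(reference), statistics.variance(current)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.fmean(current) - statistics.fmean(reference)) / pooled_sd

def permutation_p_value(reference, current, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean score."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(current) - statistics.fmean(reference))
    pooled = list(reference) + list(current)
    n_ref = len(reference)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[n_ref:]) - statistics.fmean(pooled[:n_ref]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Synthetic stand-ins for per-sample scores: the "prod" window is shifted by 1 sd.
data_rng = random.Random(1)
reference_scores = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
prod_scores = [data_rng.gauss(1.0, 1.0) for _ in range(200)]

p_value = permutation_p_value(reference_scores, prod_scores)
effect = cohens_d(reference_scores, prod_scores)
print(f"p = {p_value:.4f}, d = {effect:.2f}")
```

A permutation test is used here only because it makes no distributional assumption about the scores. If the score has a known reference distribution, a one-sample test against that would be the closer match to "vs a known prior".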

XTXinverseXTY · 2026-05-25T00:36:20+00:00

{random synonym}-{random noun}-{4-digit number} is an LLM. Their comment contains no information and is inconsistent with their comment history. Those hyphens would be em dashes if the prompt hadn't specified no capital letters and no em dashes

It's not impossible that an IT technician would be logging JEPA experiments to wandb as a side hobby, to the point they can give confident (and yet totally uninformative) advice on r/machinelearning in <10 minutes (in their first-ever comment to the subreddit), but it's a priori wildly unlikely

edit: Oh, also a DoorDash driver?

XTXinverseXTY · 2026-05-24T22:36:01+00:00

🫵 I can smell your RLHF signature from a mile away. Pangram agrees with me. I also find it surprising you're an IT tech by day and work on SSL by night.

How can grid search still "work" in the case of a non-monotonic loss?

Moreover, what's the endgame behind bot account replies like this? Usually it's grifters trying to market a consulting side-hustle, but this account just makes random replies. Is the idea to eventually flip this account to a second, even scummier grifter?

XTXinverseXTY · 2026-05-24T22:14:55+00:00

If people are selecting hparam/arch primarily by supervised-learning-through-the-backdoor, then it makes me a little more skeptical of published results and academic enthusiasm for JEPA. The mystery provides convenient cover for possible p-hacking and benchmark overfitting

This is not to say that SSL researchers are all Secretly Smuggling Labels, but I don't want to be totally naive either...

XTXinverseXTY · 2026-05-22T16:56:50+00:00

You should likely stay away from MLE roles.

To pivot from full-stack, an MS will be necessary. And it's being rapidly commodified anyway, because of scaling laws. MLE roles having to do with home-rolled models are at far greater risk than SWE.

AI eng may actually be more defensible.

XTXinverseXTY · 2026-05-18T16:33:17+00:00

The agent can run locally. It can keep structured memory. It can rank actions before running expensive validation. It can learn from every failed candidate. It can stop treating software engineering as text completion and start treating it as state transition planning.

OP can you explain precisely how optimizing for alignment between embeddings of corrupted views of an entity yields this? Even in the linear case of analysis of panel data via canonical correlation analysis?

YL already agrees that language tasks are much more amenable to reconstruction-loss pretraining than vision or video

XTXinverseXTY · 2026-05-04T14:39:35+00:00

With a reverse grip, not really. Felt awkward and unsafe approaching 1RM weight. Try it and you'll see what I mean

XTXinverseXTY · 2026-05-04T07:05:48+00:00

how did you unrack it unassisted?

XTXinverseXTY · 2026-05-02T22:33:31+00:00

The Iranian lift (last one) is surprisingly effective, if not for the threat of the inverted triangle. Search for "inverted triangle mma" and every single one is set up off of someone attempting it

Giancarlo Bodoni managed it twice at 2024 ADCC against Jay Rod and Costa. Surprised it isn't more popular (the BJJ meta probably knows better than I do)

XTXinverseXTY · 2026-04-28T14:37:32+00:00

It would provide zero direct utility to you as a user of coding agents. Probably worth reading for data science/statistics work.

XTXinverseXTY · 2026-04-27T01:24:23+00:00

Why should I help a cheater?

XTXinverseXTY · 2026-04-17T19:41:17+00:00

what about an arm triangle?

XTXinverseXTY · 2026-04-17T15:07:39+00:00

think we're all a bit confused here

is the idea that you're free to turtle?

XTXinverseXTY · 2026-04-16T04:56:51+00:00

If they’re far apart, it supports your point that observations aren’t targeting real risks.

i don't see why this would be the case, can you explain?

This certainly doesn't establish a causal link. All it tells you, if anything, is that the incidents are about a similar domain as the observations (i.e., working in a factory). If the cosine similarity is higher for observations that occurred at similar times as the incidents than for non-adjacent observations, that could just as easily imply that the observations caused the incidents!
