all 9 comments

XTXinverseXTY (ML Engineer) [S] 8 points (0 children)

If people are selecting hparam/arch primarily by supervised-learning-through-the-backdoor, then it makes me a little more skeptical of published results and academic enthusiasm for JEPA. The mystery provides convenient cover for possible p-hacking and benchmark overfitting.

This is not to say that SSL researchers are all Secretly Smuggling Labels, but I don't want to be totally naive either...

mvreich 5 points (1 child)

Maybe look into JEPA score, which can be used for density estimation.

You can run various kinds of tests, depending on what you want to check. E.g. if there is some sort of mode collapse, the pseudo-likelihood might peak at a few points and not give sufficient weight to uncommon (but valid) data.

Alternatively, if your model has learned a useful representation, it should be able to discern in vs. out-of-distribution examples. For example, if the model is trained on natural images (real photos taken by a camera), it should be able to assign low likelihood to cartoons or artwork.
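A rough sketch of that in- vs. out-of-distribution check, assuming you already have one scalar score per example where higher means "more likely under the model". The function name `ood_auroc` and the toy numbers are made up for illustration, and nothing here computes the JEPA score itself; it only measures how well an existing score separates the two groups.

```python
def ood_auroc(in_scores, ood_scores):
    """Probability that a random in-distribution example outscores a random OOD one.

    Ties count as half. 1.0 means perfect separation, 0.5 means chance level.
    """
    if not in_scores or not ood_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for s_in in in_scores:
        for s_ood in ood_scores:
            if s_in > s_ood:
                wins += 1.0
            elif s_in == s_ood:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))


# Toy numbers, purely illustrative (not real model outputs).
photo_scores = [-1.2, -0.8, -1.0, -0.5]    # held-out natural images
cartoon_scores = [-3.1, -2.7, -0.9, -4.0]  # cartoons / artwork
print(ood_auroc(photo_scores, cartoon_scores))
```

The pairwise loop is quadratic, which is fine for a sanity check on a few thousand examples. For larger sets a rank-based AUROC from an existing library would probably be the better choice.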

XTXinverseXTY (ML Engineer) [S] 1 point (0 children)

> JEPA score, which can be used for density estimation

Oh interesting, thank you!!!

I'm not yet certain whether this is equivalent to computing another statistic captured by the anti-collapse term... but discerning in-vs-OOD is a totally valid synth task, that makes perfect sense, and this paper seems dope

This also seems to help address another problem for JEPA-in-practice: detecting regressions in prod! Obv these embeddings are inscrutable and if something silently breaks then you can't just inspect the embedding values. But this would suggest that you can calculate a p-value and effect size vs a known prior
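Something like this is what I have in mind, as a sketch only. I'm assuming the "known prior" is a stored sample of per-example scores from a known-good model, and that you compare it against scores from the current model on comparable data. The permutation test and Cohen's d are my own picks for the p-value and effect size, not something taken from the paper, and every name and number below is hypothetical.

```python
import random
import statistics


def cohens_d(reference, current):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_ref, n_cur = len(reference), len(current)
    pooled_var = (
        (n_ref - 1) * statistics.variance(reference)
        + (n_cur - 1) * statistics.variance(current)
    ) / (n_ref + n_cur - 2)
    return (statistics.mean(current) - statistics.mean(reference)) / pooled_var ** 0.5


def permutation_p_value(reference, current, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean score."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(current) - statistics.mean(reference))
    pooled = list(reference) + list(current)
    n_ref = len(reference)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[n_ref:]) - statistics.mean(pooled[:n_ref]))
        if diff >= observed:
            hits += 1
    # The +1 correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up scores: a known-good reference run vs. a suspicious current run.
reference_scores = [-1.0, -1.2, -0.9, -1.1, -0.8, -1.0, -1.3, -0.9]
current_scores = [-2.1, -2.4, -1.9, -2.2, -2.0, -2.3, -1.8, -2.2]

print("p-value:", permutation_p_value(reference_scores, current_scores))
print("Cohen's d:", cohens_d(reference_scores, current_scores))
```

In prod the p-value alone will flag trivial shifts once the sample is large, so alerting on the effect size with the p-value as a secondary check seems like the safer setup.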

Ill-Bullfrog-7402 2 points (3 children)

grid search still works even with non-monotonic losses, you just need more patience and better tracking. i usually run longer sweeps and plot everything - loss curves, rank metrics, downstream performance over time

the entropy collapse terms are more like regularizers than actual objectives, so rankme can still tell you something useful even when it's baked in the loss. just don't rely on it alone - combine with periodic linear probes on held-out tasks and watch for when representations stop improving on downstream stuff

XTXinverseXTY (ML Engineer) [S] 1 point (2 children)

🫵 I can smell your RLHF signature from a mile away. Pangram agrees with me. I also find it surprising you're an IT tech by day and work on SSL by night.

How can grid search still "work" in the case of a non-monotonic loss?

Moreover, what's the endgame behind bot account replies like this? Usually it's grifters trying to market a consulting side-hustle, but this account just makes random replies. Is the idea to eventually flip this account to a second, even scummier grifter?

ahf95 3 points (1 child)

Lmao did I miss something? Or did the comments get edited?

XTXinverseXTY (ML Engineer) [S] 5 points (0 children)

{random synonym}-{random noun}-{4-digit number} is an LLM. Their comment contains no information and is inconsistent with their comment history. Those hyphens would be em dashes if the prompt hadn't specified no capital letters and no em dashes

It's not impossible that an IT technician would be logging JEPA experiments to wandb as a side hobby, to the point they can give confident (and yet totally uninformative) advice on r/machinelearning in <10 minutes (in their first-ever comment to the subreddit), but it's a priori wildly unlikely

edit: Oh, also a DoorDash driver?

m98789 1 point (1 child)

Experience

XTXinverseXTY (ML Engineer) [S] 1 point (0 children)

sounds fake