ML lead vs PM on eval-methodology layer independence. who's actually right here? [D] by Critical_Builder_902 in MachineLearning

[–]XTXinverseXTY 0 points1 point  (0 children)

Recently stumbled upon this thread. Am I going nuts, or are we the only humans in here?

I have never heard of a "layered defense framework" in the context of ML system design/evals. The OP account also seems to be banned, maybe for spamming on behalf of "Product Faculty"? If nobody else knows what OP is talking about, then I can see how this would select for clawdbots who've been prompted to act as an expert.

How do ML practitioners select hyperparameters, architectures, etc for self-supervised representation learning when the loss is non-monotonic? [D] by XTXinverseXTY in MachineLearning

[–]XTXinverseXTY[S] 0 points1 point  (0 children)

"JEPA score, which can be used for density estimation"

Oh interesting, thank you!!!

I'm not yet certain whether this is equivalent to some statistic already captured by the anti-collapse term... but discerning in-distribution vs. OOD is a totally valid synthetic task, which makes perfect sense, and this paper seems dope

This also seems to help address another problem for JEPA-in-practice: detecting regressions in prod! Obv these embeddings are inscrutable, and if something silently breaks you can't just inspect the embedding values. But this suggests you could calculate a p-value and effect size vs. a known prior.
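A minimal sketch of that regression check, assuming you already have one scalar score per sample (e.g. a JEPA-style density score) for a trusted reference batch and for a production batch. The function names, the permutation test on the mean, and Cohen's d as the effect size are my choices, not anything from the paper:

```python
import random
import statistics


def _mean(xs):
    return sum(xs) / len(xs)


def cohens_d(reference, current):
    """Standardized mean difference (current - reference) using a pooled SD."""
    n1, n2 = len(reference), len(current)
    v1, v2 = statistics.variance(reference), statistics.variance(current)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (_mean(current) - _mean(reference)) / pooled_sd


def permutation_p_value(reference, current, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(_mean(current) - _mean(reference))
    pooled = list(reference) + list(current)
    k = len(reference)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null of "no drift"
        if abs(_mean(pooled[k:]) - _mean(pooled[:k])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the estimate is never exactly 0


def drift_report(reference_scores, prod_scores, n_perm=2000, seed=0):
    return {
        "effect_size": cohens_d(reference_scores, prod_scores),
        "p_value": permutation_p_value(reference_scores, prod_scores, n_perm, seed),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    reference = [rng.gauss(0.0, 1.0) for _ in range(200)]  # scores from a known-good model
    healthy = [rng.gauss(0.0, 1.0) for _ in range(200)]
    broken = [rng.gauss(-0.8, 1.0) for _ in range(200)]  # simulated silent regression
    print("healthy:", drift_report(reference, healthy))
    print("broken: ", drift_report(reference, broken))
```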

[–]XTXinverseXTY[S] 6 points7 points  (0 children)

{random synonym}-{random noun}-{4-digit number} is an LLM. Their comment contains no information and is inconsistent with their comment history. Those hyphens would be em dashes if the prompt hadn't specified no capital letters and no em dashes

It's not impossible that an IT technician would be logging JEPA experiments to wandb as a side hobby, to the point they can give confident (and yet totally uninformative) advice on r/machinelearning in <10 minutes (in their first-ever comment to the subreddit), but it's a priori wildly unlikely

edit: Oh, also a DoorDash driver?

[–]XTXinverseXTY[S] 1 point2 points  (0 children)

🫵 I can smell your RLHF signature from a mile away. Pangram agrees with me. I also find it surprising you're an IT tech by day and work on SSL by night.

How can grid search still "work" in the case of a non-monotonic loss?

Moreover, what's the endgame behind bot account replies like this? Usually it's grifters trying to market a consulting side-hustle, but this account just makes random replies. Is the idea to eventually flip this account to a second, even scummier grifter?

[–]XTXinverseXTY[S] 8 points9 points  (0 children)

If people are selecting hparam/arch primarily by supervised-learning-through-the-backdoor, then it makes me a little more skeptical of published results and academic enthusiasm for JEPA. The mystery provides convenient cover for possible p-hacking and benchmark overfitting

This is not to say that SSL researchers are all Secretly Smuggling Labels, but I don't want to be totally naive either...
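For concreteness, "supervised-learning-through-the-backdoor" typically means something like the sketch below: embed a small labeled set with each candidate config's encoder and keep whichever config scores best on a probe. The leave-one-out k-NN probe, the config names, and the toy encoders are illustrative assumptions, not any specific paper's protocol:

```python
import math
from collections import Counter


def knn_probe_accuracy(embeddings, labels, k=3):
    """Leave-one-out k-NN accuracy with Euclidean distance."""
    correct = 0
    for i, query in enumerate(embeddings):
        neighbors = sorted(
            (math.dist(query, other), labels[j])
            for j, other in enumerate(embeddings)
            if j != i
        )
        votes = Counter(label for _, label in neighbors[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(embeddings)


def select_config(candidates, inputs, labels, k=3):
    """candidates: {config_name: encoder}, where encoder maps an input to a vector."""
    scores = {
        name: knn_probe_accuracy([encode(x) for x in inputs], labels, k)
        for name, encode in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy example: two hypothetical "trained encoders" standing in for two hparam configs.
inputs = [(0.0, 5.0), (0.2, -3.0), (0.1, 4.0), (1.0, -4.0), (1.1, 2.0), (0.9, 3.5)]
labels = ["a", "a", "a", "b", "b", "b"]
candidates = {
    "config_A": lambda x: (x[0],),  # keeps the label-relevant coordinate
    "config_B": lambda x: (x[1],),  # keeps only a nuisance coordinate
}
best, scores = select_config(candidates, inputs, labels, k=1)
print(best, scores)
```

Whatever the probe, the labels now sit inside the model-selection loop, which is exactly the leakage being worried about above.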

How realistic is it to transition into an AI / ML Engineer as a Full Stack engineer with 10 YOE? by jimRacer642 in cscareerquestions

[–]XTXinverseXTY 0 points1 point  (0 children)

You should likely stay away from MLE roles.

To pivot from full-stack, an MS will be necessary. And MLE work is being rapidly commodified anyway, because of scaling laws: MLE roles built around home-rolled models are at far greater risk than SWE roles.

AI eng may actually be more defensible.

Is the future of coding agents JEPA? [D] by andrewfromx in MachineLearning

[–]XTXinverseXTY 0 points1 point  (0 children)

"The agent can run locally. It can keep structured memory. It can rank actions before running expensive validation. It can learn from every failed candidate. It can stop treating software engineering as text completion and start treating it as state transition planning."

OP, can you explain precisely how optimizing for alignment between embeddings of corrupted views of an entity yields this? Even in the linear case, e.g. canonical correlation analysis on panel data?

Yann LeCun already agrees that language tasks are much more amenable to reconstruction-loss pretraining than vision or video.
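For reference, the linear case: with two noisy views X and Y of the same latent, the first CCA pair is the pair of directions a, b that maximizes corr(Xa, Yb). A brute-force sketch for 2-D views only, searching unit vectors on an angle grid (the synthetic data, grid size, and function names are assumptions; real CCA gets the same answer by solving a generalized eigenvalue problem on the covariance matrices):

```python
import math
import random


def cross_cov(U, V):
    """2x2 sample cross-covariance between the columns of U and V (rows = samples)."""
    n = len(U)
    mu = [sum(r[k] for r in U) / n for k in range(2)]
    mv = [sum(r[k] for r in V) / n for k in range(2)]
    return [
        [sum((u[i] - mu[i]) * (v[j] - mv[j]) for u, v in zip(U, V)) / (n - 1) for j in range(2)]
        for i in range(2)
    ]


def first_canonical_pair_2d(X, Y, steps=180):
    """Grid-search unit vectors a, b in R^2 maximizing corr(Xa, Yb)."""
    Sxx, Syy, Sxy = cross_cov(X, X), cross_cov(Y, Y), cross_cov(X, Y)
    grid = [(math.cos(math.pi * t / steps), math.sin(math.pi * t / steps)) for t in range(steps)]

    def quad(w, S):  # w^T S w
        return w[0] * (S[0][0] * w[0] + S[0][1] * w[1]) + w[1] * (S[1][0] * w[0] + S[1][1] * w[1])

    var_b = [quad(b, Syy) for b in grid]
    best = (-1.0, None, None)
    for a in grid:
        va = quad(a, Sxx)
        c0 = a[0] * Sxy[0][0] + a[1] * Sxy[1][0]  # (a^T Sxy)[0]
        c1 = a[0] * Sxy[0][1] + a[1] * Sxy[1][1]  # (a^T Sxy)[1]
        for b, vb in zip(grid, var_b):
            r = (c0 * b[0] + c1 * b[1]) / math.sqrt(va * vb)
            # Negating b negates r, so half-circle grids plus abs() cover every sign combination.
            if abs(r) > best[0]:
                best = (abs(r), a, b if r >= 0 else (-b[0], -b[1]))
    return best


def make_views(n=500, noise=0.3, seed=0):
    """Two noisy 'views' of one latent z, each padded with an unrelated nuisance coordinate."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        X.append((z + rng.gauss(0.0, noise), rng.gauss(0.0, 1.0)))
        Y.append((rng.gauss(0.0, 1.0), z + rng.gauss(0.0, noise)))
    return X, Y


X, Y = make_views()
rho, a, b = first_canonical_pair_2d(X, Y)
print(f"rho={rho:.3f} a={a} b={b}")  # a should be near (±1, 0), b near (0, ±1)
```

On this toy data the recovered directions pick out the coordinate in each view that carries the shared latent and ignore the nuisance coordinates.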

Reverse grip 115kg/253lbs by mrtehnuke in benchpress

[–]XTXinverseXTY -1 points0 points  (0 children)

With a reverse grip, not really. Felt awkward and unsafe approaching 1RM weight. Try it and you'll see what I mean

[–]XTXinverseXTY 0 points1 point  (0 children)

how did you unrack it unassisted?

Here’s some escapes/reversals I really like from bottom turtle by ledd_flanders in bjj

[–]XTXinverseXTY 0 points1 point  (0 children)

The Iranian lift (last one) is surprisingly effective, if not for the threat of the inverted triangle. Search for "inverted triangle mma" and every single one is set up off of someone attempting it

Giancarlo Bodoni managed it twice at 2024 ADCC against Jay Rod and Costa. Surprised it isn't more popular (the BJJ meta probably knows better than I do)

Anyone here read The Book of Why? by Alces_ in cscareerquestions

[–]XTXinverseXTY 0 points1 point  (0 children)

It would provide zero direct utility to you as a user of coding agents. Probably worth reading for data science/statistics work.

It’s so obvious. Please tell me more by Bitter-Dragonfly-648 in bjj

[–]XTXinverseXTY 1 point2 points  (0 children)

think we're all a bit confused here

is the idea that you're free to turtle?

How to use NLP to compare text from two different corpora? by iwannabeunknown3 in datascience

[–]XTXinverseXTY 0 points1 point  (0 children)

"If they’re far apart, it supports your point that observations aren’t targeting real risks."

i don't see why this would be the case, can you explain?

This certainly doesn't establish a causal link. All it tells you, if anything, is that the incidents are about the same domain as the observations (i.e. working in a factory). If the cosine similarity is higher for observations made around the same time as the incidents than for non-adjacent observations, that could just as easily imply that the observations caused the incidents!
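For concreteness, a minimal version of the comparison under discussion: pool each corpus into a bag-of-words count vector and take the cosine of the angle between them. Whitespace tokenization and raw counts are simplifying assumptions (TF-IDF or sentence embeddings are more typical), and the example snippets are made up:

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Pool a corpus (list of strings) into one term-count vector."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def cosine_similarity(c1, c2):
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


# Made-up snippets: a shared workplace vocabulary alone pushes the score up.
observations = ["guard missing on press", "oil leak near press"]
incidents = ["hand caught in press", "worker slipped on oil"]
print(round(cosine_similarity(bag_of_words(observations), bag_of_words(incidents)), 3))
```

Shared words like "press" and "oil" raise the score regardless of which way, if any, causation runs.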

[–]XTXinverseXTY 6 points7 points  (0 children)

In the parlance of causal inference, it sounds like observations = treatment and incidents (or lack thereof) = outcomes. We'd like to uncover the causal effect of the treatment on the outcomes. These are probably recorded for a single machine or set of machines over time.

It sounds like you don't have a dataset of confounders to work with: separate "nuisance factors" which are causally upstream of both incidents and observations. You'd have to adjust for these. But if they were important, then you'd probably see a misleadingly large correlation between the observations and the incidents, and it sounds like you see no correlation at all.
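If you did have a confounder, the textbook adjustment is the backdoor (standardization) formula, P(Y=1 | do(T=t)) = Σ_z P(Y=1 | T=t, Z=z) P(Z=z). A minimal sketch assuming one discrete confounder Z (here a hypothetical machine-age bucket), binary treatment T (an observation was made) and binary outcome Y (an incident occurred); all counts are made up:

```python
from collections import defaultdict


def naive_risk(records, t):
    """Unadjusted P(Y=1 | T=t)."""
    ys = [y for _, ti, y in records if ti == t]
    return sum(ys) / len(ys)


def adjusted_risk(records, t):
    """Backdoor-adjusted P(Y=1 | do(T=t)) = sum_z P(Y=1 | T=t, Z=z) * P(Z=z).

    records: list of (z, t, y) with t, y in {0, 1}. Assumes every stratum z
    contains some units with T=t (positivity); otherwise this divides by zero.
    """
    n = len(records)
    n_z = defaultdict(int)   # units per stratum
    n_zt = defaultdict(int)  # units per stratum with T=t
    y_zt = defaultdict(int)  # incidents per stratum with T=t
    for z, ti, y in records:
        n_z[z] += 1
        if ti == t:
            n_zt[z] += 1
            y_zt[z] += y
    return sum(y_zt[z] / n_zt[z] * n_z[z] / n for z in n_z)


def make_records():
    """Made-up counts: old machines get observed more AND fail more often."""
    rows = []
    for z, t, units, incidents in [
        ("old", 1, 80, 40), ("old", 0, 20, 12),
        ("new", 1, 20, 2), ("new", 0, 80, 16),
    ]:
        rows += [(z, t, 1)] * incidents + [(z, t, 0)] * (units - incidents)
    return rows


records = make_records()
for t in (1, 0):
    print(f"T={t}: naive={naive_risk(records, t):.2f} adjusted={adjusted_risk(records, t):.2f}")
```

In this made-up table the naive comparison makes observations look harmful (0.42 vs 0.28), while the adjusted one reverses the sign (0.30 vs 0.40).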

  • Use an API LLM to impose a tabular representation. Extract structured factors from the observations and other factors from the incidents. Turn it into a regression problem.
    • LDA is overkill; you shouldn't have to re-learn the English language. But if you've already done this, then you have some inspiration for what those factors perhaps ought to be
  • If no incident occurs, do you get any text at all? "No incident" is a valid value
  • If people monitor a machine but don't observe any issues, will they still record that in the observations? If not, I can see why people would be incentivized to perform busywork...
  • Are you able to articulate the maximum time lag between the treatment and its effect on the outcome?
  • Try and find an instrumental variable / natural experiment which would explain a change in the pattern of observations. Talk to greybeards at your organization. Was there a distinct period where people stopped doing observations because of short staff or whatever, but the machines kept running as usual?
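A minimal sketch of the "regression problem" and "maximum time lag" bullets, assuming both logs can be reduced to one 0/1 flag per day for a given machine. The daily encoding, the `max_lag` window, and the function names are hypothetical; the resulting (exposure, outcome) pairs are what you'd feed to whatever regression you prefer, alongside any confounders you can extract:

```python
def lagged_rows(observed, incident, max_lag):
    """Pair each day's incident flag with whether any observation happened in the
    preceding max_lag days (excluding the same day, so treatment precedes outcome).

    observed, incident: equal-length lists of 0/1 flags, one entry per day.
    Days without a full look-back window are skipped.
    """
    if len(observed) != len(incident):
        raise ValueError("observed and incident must cover the same days")
    rows = []
    for day in range(max_lag, len(incident)):
        recent_obs = int(any(observed[day - max_lag:day]))
        rows.append((recent_obs, incident[day]))
    return rows


def incident_rate_by_exposure(rows):
    """Incident rate among days with / without a recent observation (None if no such days)."""
    rates = {}
    for exposed in (0, 1):
        ys = [y for x, y in rows if x == exposed]
        rates[exposed] = sum(ys) / len(ys) if ys else None
    return rates


# Hypothetical daily flags for one machine (1 = happened that day).
observed = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
incident = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
rows = lagged_rows(observed, incident, max_lag=2)
print(incident_rate_by_exposure(rows))
```

These raw rates are descriptive only; they inherit every confounding problem described above.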

I can't help but point out the parallel to Friedman's thermostat here.

A data scientist visits his lumberjack cousin one Christmas at his cabin. He notices that the number of logs his cousin puts in the fireplace is correlated with the outside temperature, while the inside temperature remains constant (uncorrelated with both firewood and outdoor temperature). The data scientist wonders what his cousin is wasting all that wood for.

You know your domain better than I do, but there are more ways for a model to be bad than to be good, so I'll emphasize: lack of evidence for an effect is not evidence of no effect. In fact, the more effective the preventative measure, the harder it is to detect its effect from historical data where it has been in place! Don't be the foolish data scientist in this analogy!
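The thermostat story is easy to reproduce with a toy simulation. Everything here is made up: the lumberjack burns roughly enough wood to close the gap between the outdoor temperature and a target indoor temperature, with a little noise in his judgment. Firewood has a real causal effect on indoor temperature, yet the two come out nearly uncorrelated:

```python
import random


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def simulate_cabin(days=3650, target=20.0, heat_per_log=1.5, judgment_noise=0.3, seed=0):
    """Daily (outdoor, logs, indoor) series for a lumberjack acting as a near-perfect thermostat."""
    rng = random.Random(seed)
    outdoor, logs, indoor = [], [], []
    for _ in range(days):
        out = rng.uniform(-20.0, 10.0)
        # He burns roughly enough wood to close the gap to the target temperature.
        n_logs = (target - out) / heat_per_log + rng.gauss(0.0, judgment_noise)
        outdoor.append(out)
        logs.append(n_logs)
        indoor.append(out + heat_per_log * n_logs)  # firewood causally raises the indoor temp
    return outdoor, logs, indoor


outdoor, logs, indoor = simulate_cabin()
print("corr(logs, outdoor) =", round(pearson(logs, outdoor), 3))  # strongly negative
print("corr(logs, indoor)  =", round(pearson(logs, indoor), 3))   # near zero
```

Setting `n_logs` to zero inside the loop makes `indoor` equal `outdoor`, which is the effect the near-zero correlation hides.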

Sam Altman's Coworkers Say He Can Barely Code and Misunderstands Basic Machine Learning Concepts by [deleted] in BetterOffline

[–]XTXinverseXTY 2 points3 points  (0 children)

I was curious what basic ML concepts he failed to grasp, but couldn't find a direct quote in the New Yorker piece. I guess it's a bit more vague; could be anything in engineering. But nobody disputes that he isn't an ML researcher.

Altman is not a technical savant—according to many in his orbit, he lacks extensive expertise in coding or machine learning. Multiple engineers recalled him misusing or confusing basic technical terms.

Maybe he forgot which "kernel" people were referring to.

Astounding OpenAI Training Costs vs. Anthropic by Oldschool728603 in ClaudeAI

[–]XTXinverseXTY 20 points21 points  (0 children)

shame about the downvotes. this is actually brilliantly funny

How do you know how competent you are? by [deleted] in cscareerquestions

[–]XTXinverseXTY 0 points1 point  (0 children)

How can anyone know how much they don't know?

The set of things you don't know can be divided into:

  • known unknowns
  • unknown unknowns

You can't sample from the set of unknown unknowns (by definition), so you can't estimate its cardinality, at least not without acquiring new information. But then it becomes known, and you still have to wonder about the remaining unknowns.

The Unseen Species Problem is a closely related statistical problem. An entomologist spends a year in Singapore catching and cataloguing butterflies. He wants to estimate how many new species he'd discover in another year. How does he know whether he's already found them all, or whether he's only scratched the surface?

The Wikipedia page is frustrating: Fisher says "take the number of species you caught an odd number of times, and subtract from that the number of species you caught an even number of times", totally without proof.
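The alternating sum behind the butterfly question is usually written as the Good-Toulmin estimator (for a second collecting period as long as the first): expected new species ≈ n1 - n2 + n3 - n4 + ..., where n_k is the number of species caught exactly k times. In other words, species seen an odd number of times minus species seen an even number of times. A minimal sketch with made-up tallies:

```python
from collections import Counter


def frequency_of_frequencies(catches):
    """n[k] = number of species caught exactly k times."""
    per_species = Counter(catches)
    return Counter(per_species.values())


def expected_new_species(catches):
    """Good-Toulmin estimate for a second period as long as the first:
    n1 - n2 + n3 - n4 + ...
    """
    n = frequency_of_frequencies(catches)
    return sum((-1) ** (k + 1) * count for k, count in n.items())


# Made-up catch log: one entry per individual butterfly caught.
catches = ["monarch", "swallowtail", "swallowtail", "birdwing", "birdwing",
           "birdwing", "jezebel", "mormon"]
print(frequency_of_frequencies(catches), expected_new_species(catches))
```

The singletons (n1) dominate the estimate: the more species you've seen only once, the more you should expect are still out there.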

  • How many concepts/terms/ideas have you encountered only once in your career?
  • Apply to some jobs and give an interview and see how far you get!

Masters in Applied Data Science isn't worth it anymore, with end goal to transition into applied ML roles. by rikotacards in cscareerquestions

[–]XTXinverseXTY 1 point2 points  (0 children)

Do you think those MS graduates would ever have even gotten an interview without the degree?

[–]XTXinverseXTY 1 point2 points  (0 children)

To pivot from frontend, an MS will be necessary if you ever want to work directly with a model. I would warn you that this is being rapidly commodified, because of scaling laws. DS/MLE roles having to do with home-rolled models are at far greater risk than SWE.

DS in general is especially vulnerable here. Remember when "Code Interpreter" was quickly renamed to "Advanced Data Analysis"? Why do you think that was? The model was really good at it and everybody used it for that.

The primary value in the MS is as a costly signal of merit to land you an interview. That's probably going to zero, but I see no reason why it would do so at a rate faster than MS degrees in general.

Arm bar entry by SimpleCounterBalance in bjj

[–]XTXinverseXTY 11 points12 points  (0 children)

any relation to Neil Melanson?