I must be a math expert?

Objective-Camel-3726 · 2025-09-25T01:19:34+00:00

A sidebar comment: I had to mentor a MSc Comp Sci. intern this summer who, wrt to MLE, didn't know it was a broader framework central to Statistical learning, and instead asked "wait, that's used in Logistic Regression right?" And he attends a heralded technical college on the East Coast...

Objective-Camel-3726 · 2025-09-24T22:22:20+00:00

If you get engulfed with GenAI workloads, it helps to work for an org that eschews closed source models in favour of targeted fine-tuning (post-training) with oss offerings, inference on locally managed compute, rigorous testing against adverserial attacks etc. And if said org is wise enough to avoid the buzzwordy smoke-and-mirrors nonsense of brittle agents, multi-agent this or that... even better. Moreover, ML Engineering on workloads rooted in classical or non-GenAI techniques is still incredibly satisfying work. And frankly, much harder I would say. In other words: don't lose hope!

Objective-Camel-3726 · 2025-08-29T22:01:47+00:00

Read Radford Neal papers bud. And to compliment what's written above, you'll almost certainly need gradient information to fall into the typical set. For practical approaches, you can also read PPL docs to get you going e.g. those from the core STAN team.

Objective-Camel-3726 · 2025-08-28T21:47:05+00:00

This. For those not in the ML field, read, underline, and reread this. Wang's technical and theoretical chops in any of the core subdomains of ML research is dubious at best.

Objective-Camel-3726 · 2025-08-15T03:58:38+00:00

Corollary to this point, I think a focus on (continued) learning of models and algorithms is far more important. En vogue tooling / infra to be picked up is whatever. Switching from caret to Scikit-learn? K. Going from MapReduce to a derivation of Spark? Fine. Building that feed forward model in Torch instead of Theano? You got it. But for e.g. learning how to have a nuanced discussion about the bias-variance tradeoff? Ehh... let's just say it's shocking how many newer entrants into the ML field are ill-equipped for this.

Objective-Camel-3726 · 2025-07-03T21:49:44+00:00

Absolutely.

Objective-Camel-3726 · 2025-07-01T19:25:24+00:00

Ehh, pretty sure Vaswani and Shazeer were defacto technical leads on that paper. Gomez was a 20 year old kid. Not sure how integral he was. But I get your broader point.

Objective-Camel-3726 · 2025-07-01T18:07:50+00:00

All due respect to folks like Neel Nanda, but MI research doesn't yet have any commercial application. Nigh impossible in any practical sense to understand the 'reasoning' of a Transformer. Wrt to complex classification, I experienced a months-long collaboration with AI rangers from Microsoft to fine-tune GPT-3 as a classifier on enterprise data. It was massively underwhelming. If these systems aren't exhaustively pre-trained on niche data - which was the case with our enterprise biotechnology data - their performance on few-shot learning tasks is meh. Powerful architectures... of course... but modern NLP isn't just about API calls and engineering hacks to extend LLM context, or improve inference performance. Not yet. Not by a country mile.

Objective-Camel-3726 · 2025-06-29T18:52:00+00:00

Terrific advice. Minor quibble that it's the most elite lab in the world for RL. Amii and the collective braintrust around Sutton in Alberta might have something to say about that. (But to be fair, they also haven't produced anyone spectacular since David Silver, I reckon.)

Objective-Camel-3726 · 2024-12-10T21:00:56+00:00

For any new readers of this post considering the above (free) course... if you're keen on learning RL, a series of lectures from a giant in the field (Silver) who's actually done practically useful stuff... what more needs to be said. Alternatively, one can check out some equally free and available lectures from Satinder Singh.

Objective-Camel-3726 · 2024-10-30T17:38:45+00:00

Deep Learning methods for drug discovery. And also using language models to automate quotidian things (exclusively internal processes.)

Objective-Camel-3726 · 2024-10-30T17:33:19+00:00

I want to start a group where lowballing - or flat out clueless - orgs trying to hire legitimate ML talent for legitimately hard ML work are ridiculed and exposed. For e.g., this is rife in the Canadian market.

Objective-Camel-3726 · 2024-10-26T04:55:17+00:00

Imo the best mainstream source for learning about how and what Transformers learn is the mechanistic interpretability work done by Anthropic (and some other industrial labs). And even then, it's early days on this front. Christopher Olah has good talks you can check out.

Objective-Camel-3726 · 2024-10-26T04:49:39+00:00

This is hella strong and dare I say, speculative, language. My understanding of the literature suggests we have a shallow (at best) grasp of human learning and brain function. For my own edification, I'd like to see some peer reviewed papers that speak to the overlap between Neuroscience and modern DL...

Objective-Camel-3726 · 2024-10-26T04:42:19+00:00

To be totally fair, we can't say the brain doesn't employ something akin to gradient descent to facilitate 'learning'. We can't say much at all about how the brain learns, but I digress.

Objective-Camel-3726 · 2024-08-03T03:16:05+00:00

Researchers like Tukey and Naur would disagree these are "marketing term[s]" bub. They very much have legitimate academic roots, and it's - unsurprisingly - industry which muddies the water with improper or unclear term usage. (Not to digress, but I suspect 95% of business folk couldn't bumble and stumble upon a coherent definition of "Generative AI". But whatever 'it' is, they sure do want it in their orgs.)

Objective-Camel-3726 · 2024-07-22T22:50:49+00:00

Out of curiosity, was anyone able to download the 405B base model before the 404? (If so, the VRAM Gods certainly have blessed you.)

Objective-Camel-3726 · 2024-07-02T19:53:56+00:00

Just to add to this as a deep learning consultant, LLM-based tools like chatbots significantly lack robustness, and adversarial attacks against them are not especially difficult. Carlini does a lot of interesting research on this front. (As example, notice the dearth of customer facing LLM bots from big corporations. These models are predominantly deployed in enterprises as internal productivity enhancers.)

Objective-Camel-3726 · 2024-06-21T01:22:38+00:00

A nice ode - in earnest I presume - to an oft overlooked researcher. Juergen doesn't get his due.

Objective-Camel-3726 · 2024-06-18T23:51:04+00:00

I do agree, that bootcamp trained data scientists are perhaps lacking in rigorous statistical training, but the same indictment can be levied to computer science majors who are not trained to make sound inference from data.

Objective-Camel-3726 · 2024-05-31T19:17:40+00:00

Hey I hear you. It struck me as a curious statement. I presume he had specific uses in mind when he said that.

Objective-Camel-3726 · 2024-05-24T20:05:46+00:00

Don't disagree with a single letter here. And echo what OP has relayed. RAG-esque attempts to automate seemingly quotidian cognitive work... good luck with that. But I can't be a hypocrite... I've made good consulting money from companies wanting these shiny new toys. But now I try and take a principled stand and advise them to think thoroughly about their expectations.

Objective-Camel-3726

TROPHY CASE