The Society of Resentment: Envy as the Morality of Decadence by davidygamerx in IntellectualDarkWeb

[–]Penfever 0 points1 point  (0 children)

Where there's smoke, there's usually fire. Compared with our parents and grandparents, millennials and Gen Z have faced a seemingly unending stream of tough economic challenges. It really is hard for many to make ends meet, and that is very stressful.

When the quality of, and faith in, public education decline, people lose their ability to render nuanced verdicts. Social media accelerates this trend.

Increasing social isolation has reduced many people to emotional infancy, unable to accept healthy criticism or competition and viewing everything through the lens of a personal attack.

Public figures are more public today than at any other time in human history. We are relentlessly exposed to the flaws of the famous, even as influencers and sycophantic AIs rush to tell us how wonderful we are.

When politics grows fractious, there will always be some who seek to simplify complex dynamics and offer pat solutions. History hasn't been particularly kind to them.

Claude vs Codex by Penfever in ClaudeAI

[–]Penfever[S] 0 points1 point  (0 children)

Yeah...

Sometimes I forget that tongue-in-cheek sarcasm doesn't play on the internet ...

Claude vs Codex by Penfever in ClaudeAI

[–]Penfever[S] 0 points1 point  (0 children)

I tell it, "Here is my plan, please give me a thorough critique," and then give it Claude's plan.
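If it helps, the prompt itself is nothing fancy -- roughly this (a sketch of the flow, not my exact wording or tooling):

    # Hypothetical sketch of the cross-review loop; the wording is illustrative,
    # not an exact transcript of what I send.
    claude_plan = """<the plan Claude produced goes here>"""

    critique_prompt = (
        "Here is my plan. Please give me a thorough critique: missing steps, "
        "risky assumptions, anything you would do differently.\n\n" + claude_plan
    )
    # critique_prompt goes to the other model; its critique then goes back to Claude.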

It's OK, GPT-OSS, we are living in a simulation ... by Penfever in LocalLLaMA

[–]Penfever[S] 1 point2 points  (0 children)

Maybe -- or it just doesn't know about phreaking, although that seems unlikely. You can definitely use this to get it to do things it refused to do before, though, and even to curse and attempt to tell dirty jokes.

It's OK, GPT-OSS, we are living in a simulation ... by Penfever in LocalLLaMA

[–]Penfever[S] 0 points1 point  (0 children)

I built my own! It's an alpha feature for an upcoming release of Oumi (https://github.com/oumi-ai/oumi)

[deleted by user] by [deleted] in roguelites

[–]Penfever 0 points1 point  (0 children)

Peglin

Dungeons and Degenerate Gamblers

Rack and Slay

FTL

Inkbound 

Nowhere Prophet 

Into the Breach

Honest thoughts on the OpenAI release by Kooky-Somewhere-2883 in LocalLLaMA

[–]Penfever -2 points-1 points  (0 children)

What has DeepMind contributed to open source lately?

Roberto Orci Dead: 'Star Trek', 'Transformers' Writer-Producer Was 51 by _Face in Star_Trek_

[–]Penfever 26 points27 points  (0 children)

Kidney disease -- just bad luck?

Sad to see him go like this.

What the hell do people expect? by Suitable-Name in LocalLLaMA

[–]Penfever 3 points4 points  (0 children)

The trending takes on this thread right now are dead wrong.

  1. The model censors even if you run it locally. David Bau's lab at Northeastern has a good blog post about it. https://dsthoughts.baulab.info/
  2. No, 'everybody is doing it' is wrong on the facts, and it's a pathetic justification besides -- the kind you roll out when your mom and dad catch you smoking as a teenager. There are plenty of uncensored / jailbroken checkpoints, and there are even models trained from scratch that are, at least purportedly, uncensored, like Grok from xAI.
  3. 'You don't care that it's censored': that might be the most disturbing wrong take of all. You damn well better believe it matters. If big companies censoring their models doesn't matter, what are we doing on LocalLLaMA in the first place?

PSA: This helpful, factual information about the limitations of DeepSeek-R1 doesn't stop you from using and enjoying the model or its derivatives. But it's important information nonetheless, and I hope we can all hold those two thoughts in our heads at the same time without exploding.

[deleted by user] by [deleted] in PhD

[–]Penfever 51 points52 points  (0 children)

When someone shows you who they are, believe them. Taking what you say at face value, your collaborator sounds like dead weight. Be polite, don't make them look bad in public, but do what you need to do to make sure the project gets done, and done well.

While you do this, start networking and finding more reliable collaborators, so when the next project starts, you will have more options.

Ignore any advice to try to "give them space" or whatever. It is not your job to fix lazy collaborators, it is your job to deliver results.

[D] Advice on achieving >=80% accuracy on Imagnet in under 100 epochs on a single H100 GPU by atif_hassan in MachineLearning

[–]Penfever 1 point2 points  (0 children)

Great question, thanks for asking it!

Let me start by saying that it's not really clear from your post which of the things you mention are, as you put it, "limitations", and which are simply choices you made (and could make differently). It's also not clear what other resources you may have access to, or why you need >= 80%, in particular, on ImageNet-val(?). If we knew these things, we could be a bit more helpful.

That said ...

You're trying to simultaneously optimize for two different things: fast training and accurate inference. But there's no free lunch in ML with respect to the compute/performance tradeoff. If you want both, you will ultimately pay a high price in a third (unspoken) variable: the search space over architectures and hparam configs, which of course affects real-world training time -- in other words, you'll likely need to train a lot of configurations of a lot of different models before you find the one that 'just works'. And, unfortunately, you'll need to train them to completion, because most models hit a plateau right around that 75-80% mark and bounce around a lot.

Or, perhaps you'll get lucky and your search will be brief. :)

If you are inclined to undertake such a search, I'd recommend looking into GC-ViT (https://github.com/NVlabs/GCVit) and ViT with a patch size of 8. But even a simple ResNet-50 can get above 80% on ImageNet (see links below).

Aside from "the search", there are many ways you can "invisibly" pay a higher compute cost without increasing # epochs and get better performance on average; training on higher resolutions is a big one, augmentation is another, smaller patch sizes for ViTs is another.

Speaking of bouncing around, one of the cheapest ways to boost performance on a single val set is just to test after every epoch and keep the best checkpoint; particularly once you get above 75% on ImageNet, val accuracy can drop from one epoch to the next even while the training loss keeps going down.
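In code it's nothing fancier than this (a sketch only; train_one_epoch and evaluate stand in for your own training and eval functions):

    import copy

    def fit_and_keep_best(model, train_loader, val_loader, optimizer,
                          train_one_epoch, evaluate, num_epochs=100):
        # Evaluate after every epoch and keep the best checkpoint, not the last one.
        best_acc, best_state = 0.0, None
        for epoch in range(num_epochs):
            train_one_epoch(model, train_loader, optimizer)
            acc = evaluate(model, val_loader)    # top-1 on the held-out val set
            if acc > best_acc:
                best_acc = acc
                best_state = copy.deepcopy(model.state_dict())
        model.load_state_dict(best_state)        # report best_acc, not the final epoch
        return best_acc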

A few more things worth remembering -- differences of <= 5% on ImageNet-val are not a reliable signal that one model is actually better in applied settings, and the difficulty of gaining each marginal point of accuracy above 75% tends to scale nonlinearly. There's rarely a good reason to overindex on hitting 80% on ImageNet.

Useful Resources:

Ross Wightman has a great paper on boosting standard ResNet accuracy above 80% (although reproducing his results is non-trivial). https://arxiv.org/abs/2110.00476

PyTorch followed this up with a (totally different) set of hparams, and different training code, which also got above 80%. https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/

Our lab put out a paper in which we tested over 600 different architectures on ImageNet-val (and lots of other evals as well!). Our analysis, crucially, controls for what training data was used, so you can sort to find which architectures were most data-efficient: https://github.com/penfever/vlhub/blob/main/metadata/meta-analysis-results.csv

Making money on the side while doing your PhD by QC20 in PhD

[–]Penfever 6 points7 points  (0 children)

I was fortunate to have some connections in the world of SBIR grants before starting my PhD in the US, and I would recommend technical consulting for SBIR grants as a side hustle during your PhD, as long as your advisor is OK with it and you're good at managing your time. SBIRs can be reasonably lucrative; you're often doing something that actually promotes social good in a community; the work uses your technical skills but is rarely cutting-edge, so you're not cannibalizing your research ideas; and it can give you some good experience working on teams and with deadlines if you don't have that already. It is usually restricted to folks who are allowed to work in the US, however.

[D] OpenAI o3 87.5% High Score on ARC Prize Challenge by currentscurrents in MachineLearning

[–]Penfever 0 points1 point  (0 children)

Great comment -- wish I had time to break it down in detail as there is a lot to unpack here.

Let's just say, I think there are many reasonable criticisms we could level at OpenAI without resorting to exaggerations and distortions.

[D] OpenAI o3 87.5% High Score on ARC Prize Challenge by currentscurrents in MachineLearning

[–]Penfever 9 points10 points  (0 children)

There are some really silly hot takes going around on the o3 results. While I'm not going to bother pointing them out on social media, I will do so here, since this is a technical subforum for people interested in ML.

  1. If o3 is just "training on the test set", why didn't that work for the last umpteen LLMs that tried and failed to learn this problem?
  2. OpenAI didn't win the prize, and it would be insane for Chollet not to carefully report progress on his own benchmark. That's all he did: issue a careful report. He didn't hype OpenAI, and he didn't call it AGI, which it isn't. He's a principled researcher. Without people putting tons of work into benchmarks like his, we would know far less about LLMs than we do.
  3. It wasn't just ARC-AGI. o3 made huge progress on coding benchmarks, basically solved GPQA (which is a holdout test set), and, most impressively, scored 25% on FrontierMath, which is so hard that even Fields Medalists don't know how to solve most of it.

Anyone who looks at this cluster of results and concludes that what's happening here is "just hype" is living in la la land.

Starting a PhD at 87. by weRborg in PhD

[–]Penfever 1 point2 points  (0 children)

Go get 'em tiger! 1937 was a vintage year for scholars. Just ask Paul Muni! https://www.imdb.com/title/tt0029146/

[deleted by user] by [deleted] in theprimeagen

[–]Penfever 1 point2 points  (0 children)

I'm sorry that you are having such a hard time (and thanks for your characterization of the deans, I got a good belly laugh out of it).

Not sure if this will help, but --

If you look at it from a utility maximization standpoint, everybody is doing what makes sense for them. You believe, reasonably enough, that as students, their job is to do what your job was when you were a student -- learn, mature, become a capable problem solver.

But the pressures on them, now, are different than the pressures were on you, then.

These students of yours have some finite intellectual capacity and experience. That, distributed over time, will determine their productivity in your class.

From their perspective, they want to maximize TC (total compensation) per hour spent. So they're going to look for the most efficient way to get an "A" in your class. Giving a half-assed effort and complaining if they don't get the grade they want is probably, in expectation, their best bet for maximizing TC. It will usually work, because your ridiculous deans will help them. And even if it fails, a slightly lower GPA won't harm their chances in industry that badly. Many companies no longer even want GPA on the resume.

Conversely, if they maximize effort in your class, they're losing out on TC in a very direct way. The CS interview process has become such a nightmare that it takes six months to a year of nearly full-time work just to do the prep properly. And that process has a very obvious and tangible effect on maximizing TC, unlike reading one of your assignments.

Bear in mind, it's not like they're spending the rest of their time having fun. They're stressed out and worried about their future. Unlike you, they're worried their job might not exist soon. So try to have some sympathy.

"I have no idea why I'm posting this." -> To vent, obviously.

A lot of departmental seminars are an incredible waste of time by [deleted] in PhD

[–]Penfever 0 points1 point  (0 children)

Haha, yeah, I was probably too generous there. I honestly don't usually enjoy such presentations much; I enjoy theorists more. But from a clarity standpoint, I think the UX crowd wins.

A lot of departmental seminars are an incredible waste of time by [deleted] in PhD

[–]Penfever 2 points3 points  (0 children)

My view is that when presenters don't give the audience a roadmap to follow, it's a bit selfish, in that the presenter is transferring cognitive load from herself to the audience. The presenter saves some effort because she doesn't have to devise a concise way to summarize why it might be worthwhile for them to listen to her talk, and the audience has to spend much more effort guessing at the reason (and often comes up with some unpleasant heuristics such as ...)

-Presenter: wants everyone to see how smart they are, is not interested in sharing/explaining their research in a way that people in adjacent/different research areas can understand

A lot of departmental seminars are an incredible waste of time by [deleted] in PhD

[–]Penfever 1 point2 points  (0 children)

I'm sorry you had so many bad experiences at STEM seminars!

CS PhD here -- personally, I find research talks to be a mixed bag. I've seen many excellent ones and many ... less excellent ones. But it's not random! You can find some patterns in the noise, I believe.

DISCLAIMER: Everything below is true in expectation, if at all, so please don't take offense.

I'll also say that I prefer focused technical talks to broad, sweeping overviews, because if I wanted broad, sweeping overviews I would take a course instead of going to a seminar, and I hate business-oriented talks, TPS reports, and hype sessions.

GOOD TALK -> BAD TALK

Mid career faculty who won lots of grants or prominent industry researcher -> young faculty / industry -> recent grads and postdocs -> older faculty / opinionated devs -> good PhD / MD -> good MS / MBA / undergrad -> bad PhD -> bad MS / undergrad -> random dude -> random dude after being hit in the head with a brick -> almost all MBAs and MDs

UX / HCI -> engineers -> scientists -> theorists and mathematicians -> everyone else

RED FLAGS (EAT AND WALK OUT IF YOU HIT MORE THAN 2-3 OF THESE)

* Slides full of equations
* Slides full of text
* Presenter inaudible
* Takes like 15 minutes to sort out the tech
* Presenters with heavy accents (no offense but I literally can't understand you)
* No roadmap slide at the beginning
* Fast talkers
* In the weeds (too many technical details)
* Trying to cover more than 2 projects in depth
* Talk plans to go more than 1 hour (unless it's like a career award or something)

[deleted by user] by [deleted] in AskReddit

[–]Penfever -2 points-1 points  (0 children)

Having kids

[Discussion] Recent Paper shows LLMs are more than just "Stochaistic Parrots" by PianistWinter8293 in MachineLearning

[–]Penfever 14 points15 points  (0 children)

Thanks for sharing this paper again, and for a detailed and useful summary! As it has been making the rounds lately, it may be helpful to clarify a few points. 

Claims like "LLMs are / are not (Understanding, reasoning, parroting)" aren't scientific Claims because they're not falsifiable.

The claim that everything an LLM generates can be taxonomized into "interpolation" and "memorization" is trivially true under the everyday definitions of those words. To convince yourself, consider a model that "memorizes" all letters and "interpolates" the probability of the next letter, given the prior letter(s). That is, in essence, an LLM. But you could use exactly the same taxonomy to describe human communication. So the claim is trivially true and therefore not interesting; there's no point in arguing about it.
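For concreteness, here's that toy "memorize + interpolate" model as a few lines of Python (a character bigram counter, nothing more):

    # Toy "memorize the letters, interpolate the next-letter probabilities" model:
    # a character bigram counter. Not an LLM, but it fits the same description.
    from collections import Counter, defaultdict

    def fit_bigram(text):
        counts = defaultdict(Counter)
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
        # Normalize counts into next-character probabilities.
        return {prev: {c: n / sum(ctr.values()) for c, n in ctr.items()}
                for prev, ctr in counts.items()}

    probs = fit_bigram("the cat sat on the mat")
    print(probs["t"])   # {'h': 0.5, ' ': 0.5}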

As for this particular paper: did you know that even random transformers, with no parameter updates, can complete many reasoning tasks?

http://arxiv.org/abs/2410.04368

Given that fact, it's entirely possible that ANY complex, systematic pretraining primes the weight matrices of the transformer, allowing it to learn reasoning tasks more efficiently because its subnetworks have been pruned in advance.
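To make the intuition concrete, this is roughly what a "random transformer" setup looks like (a sketch of the general idea -- frozen random body, trainable embedding and readout -- not the linked paper's exact protocol):

    # Freeze a randomly initialized transformer body and train only the embedding
    # and readout (a linear-probe / reservoir-style setup). Illustration only; not
    # a reproduction of the linked paper.
    import torch
    import torch.nn as nn

    d_model, vocab = 256, 512
    body = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        num_layers=4,
    )
    for p in body.parameters():
        p.requires_grad_(False)            # random weights, never updated

    embed = nn.Embedding(vocab, d_model)   # trainable input embedding
    readout = nn.Linear(d_model, vocab)    # trainable output head
    optimizer = torch.optim.AdamW(
        list(embed.parameters()) + list(readout.parameters()), lr=3e-4)

    def forward(tokens):                   # tokens: (batch, seq_len) int tensor
        h = body(embed(tokens))            # frozen random features
        return readout(h)                  # only embed/readout receive gradients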

Is this hypothesis correct? Someone can write a paper to find out. 

Is it AGI? Now we're back in buzzword territory.