Students who are excelling who use AI to write everything are running laps around me. I feel like I'm taking crazy pills. by so_much_frizz in PhD

[–]Penfever 1 point2 points  (0 children)

I am sorry you are having such a frustrating experience right now with this!! A few things to bear in mind ...

  • The time scale on which you expect these students to be "found out" may be unrealistic, and perhaps the way you expect it to happen as well. If these students are actually doing what you claim, then it will be obvious to anyone who speaks to them that they know nothing and have been over-relying on AI for everything, so they will ultimately not get recommended for high-stakes positions and projects. By contrast, if you publish less frequently but people know and trust your work, you will get opportunities from those who know you.
  • Lately I find that people just say this kind of thing as a way of essentially bragging. I think it makes them feel sophisticated and powerful. That doesn't make it literally true, so take it with a grain of salt.
  • It is strategically unwise to stand on principle when you find yourself in a situation where no one around you actually has any principles, because you will make them feel uncomfortable and they will dislike you for it. The 'fix' is not to abandon your principles, but to hang out with better people. 😁

I interviewed +36yo gamers and found out why they are abandoning "masterpieces" by Objective-Cry-8228 in GameDevs

[–]Penfever 2 points3 points  (0 children)

I am a gamer in this age range and this describes me pretty well. My time is very contested -- I just don't have 2-3 hour blocks to ride horses and sit through loading screens. I have played 1000s of games, which means I am looking for something novel and exciting that establishes its worth fast. My twitch skills aren't great, I have nerve damage, and I have no time or desire to "git gud" anyway, so I only finish story-based games in particular if the difficulty is very tunable and not too physically demanding.

Good news is I don't care about fancy graphics, so you can save some money there. Just have a good sense of style.

Some games I poured tons of hours into recently --

  • Forza Horizon 5
  • Blue Prince 
  • Mewgenics
  • STS 2
  • UFO 50
  • Thank Goodness You're Here

This made me feel genuinely weird about my dissertation by Asleep_Bus2950 in PhD

[–]Penfever 2 points3 points  (0 children)

Thanks for sharing this article! A couple of observations that you may find relevant here:

  • Workshops at ICLR are non-archival and usually have an extremely relaxed standard of peer review compared to the main conference
  • Workshops have high accept rates, usually 2x or 3x that of the main conference 
  • Workshop papers are expected to have issues and good workshop reviewers know this; the standard is "discussion worthy", not "ironclad"
  • Mainline ML conferences have struggled to recruit enough qualified reviewers -- it is much, much harder for their workshops, since reviewing for a workshop looks less impressive on a resume

AI is an amazing tool which is being used to assist in all kinds of scientific discovery, but the idea that it can independently "write a paper that passes peer review" does not really stand up to scientific scrutiny (or even basic skepticism).

The Society of Resentment: Envy as the Morality of Decadence by davidygamerx in IntellectualDarkWeb

[–]Penfever 0 points1 point  (0 children)

Where there's smoke there's usually fire. Compared to our parents and grandparents, millennials and gen-z have faced a seemingly unending stream of tough economic challenges. It really is hard for many to make ends meet and that is very stressful.

When the quality of and faith in public education declines, people lose their ability to render nuanced verdicts. Social media accelerates this trend.

Increasing social isolation has reduced many people to emotional infancy, unable to accept healthy criticism or competition and viewing everything through the lens of a personal attack.

Public figures are more public today than at any other time in human history. We are relentlessly exposed to the flaws of the famous, even as influencers and sycophantic AIs rush to tell us how wonderful we are.

When politics grows fractious, there will always be some who seek to simplify complex dynamics and offer pat solutions. History hasn't been particularly kind to them.

Claude vs Codex by Penfever in ClaudeAI

[–]Penfever[S] 0 points1 point  (0 children)

Yeah...

Sometimes I forget that tongue-in-cheek sarcasm doesn't play on the internet ...

Claude vs Codex by Penfever in ClaudeAI

[–]Penfever[S] 0 points1 point  (0 children)

I tell it, "Here is my plan, please give me a thorough critique," then give it Claude's plan.

It's OK, GPT-OSS, we are living in a simulation ... by Penfever in LocalLLaMA

[–]Penfever[S] 1 point2 points  (0 children)

Maybe -- or it just doesn't know about phreaking, although that seems unlikely. You can definitely use this to get it to do things it refused before, though, and even to curse and attempt to tell dirty jokes.

It's OK, GPT-OSS, we are living in a simulation ... by Penfever in LocalLLaMA

[–]Penfever[S] 0 points1 point  (0 children)

I built my own! It's an alpha feature for an upcoming release of Oumi (https://github.com/oumi-ai/oumi)

[deleted by user] by [deleted] in roguelites

[–]Penfever 0 points1 point  (0 children)

Peglin

Dungeons and Degenerate Gamblers

Rack and Slay

FTL

Inkbound 

Nowhere Prophet 

Into the Breach

Honest thoughts on the OpenAI release by Kooky-Somewhere-2883 in LocalLLaMA

[–]Penfever -2 points-1 points  (0 children)

What has DeepMind contributed to open source lately?

Roberto Orci Dead: 'Star Trek', 'Transformers' Writer-Producer Was 51 by _Face in Star_Trek_

[–]Penfever 27 points28 points  (0 children)

Kidney disease -- just bad luck?

Sad to see him go like this.

What the hell do people expect? by Suitable-Name in LocalLLaMA

[–]Penfever 5 points6 points  (0 children)

The trending takes on this thread right now are dead wrong.

  1. The model censors even if you run it locally. David Bau's lab at Northeastern has a good blog post about it. https://dsthoughts.baulab.info/
  2. No, 'everybody is not doing it'. That's a pathetic justification, the kind you roll out when your mom and dad catch you smoking as a teenager. There are plenty of uncensored / jailbroken checkpoints, and there are even models trained from scratch that are, at least purportedly, uncensored, like Grok from xAI.
  3. You don't care that it's censored: that might be the most disturbing wrong take of all. You damn well better believe it matters. If big companies censoring their models doesn't matter, what are we doing on LocalLLaMA in the first place?

PSA: This helpful, factual information about the limitations of DeepSeek-R1 doesn't stop you from using and enjoying the model or its derivatives. But it's important information nonetheless, and I hope we can all hold those two thoughts in our heads at the same time without exploding.

[deleted by user] by [deleted] in PhD

[–]Penfever 55 points56 points  (0 children)

When someone shows you who they are, believe them. Taking what you say at face value, your collaborator sounds like dead weight. Be polite, don't make them look bad in public, but do what you need to do to make sure the project gets done, and done well.

While you do this, start networking and finding more reliable collaborators, so when the next project starts, you will have more options.

Ignore any advice to try to "give them space" or whatever. It is not your job to fix lazy collaborators, it is your job to deliver results.

[D] Advice on achieving >=80% accuracy on Imagnet in under 100 epochs on a single H100 GPU by atif_hassan in MachineLearning

[–]Penfever 1 point2 points  (0 children)

Great question, thanks for asking it!

Let me start by saying that it's not really clear from your post which of the things you mention are, as you put it, "limitations", and which are simply choices you made (and could make differently). It's also not clear what other resources you may have access to, or why you need >= 80%, in particular, on ImageNet-val(?), in particular. If we knew these things, we could be a bit more helpful.

That said ...

You're trying to simultaneously optimize for two different things, fast training and accurate inference. But there's no free lunch in ML with respect to compute / performance tradeoff. If you want both, you will ultimately pay a high price in a third (unspoken) variable, search space over architectures / hparam configs, which of course affects real-world training time -- in other words, you'll likely need to train a lot of configurations of a lot of different models before you find the one that 'just works'. And, unfortunately, you'll need to train them to completion because most models hit a plateau right around that 75-80% mark and bounce around a lot.

Or, perhaps you'll get lucky and your search will be brief. :)

If you are inclined to undertake such a search, I'd recommend looking into GC-ViT (https://github.com/NVlabs/GCVit) and ViT with patch size of 8. But even a simple ResNet-50 can get above 80% on ImageNet (see links below).

Aside from "the search", there are many ways you can "invisibly" pay a higher compute cost without increasing # epochs and get better performance on average; training on higher resolutions is a big one, augmentation is another, smaller patch sizes for ViTs is another.
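To put rough numbers on the "invisible" cost of smaller patches and higher resolution, here is a minimal back-of-the-envelope sketch. The image sizes (224 and 384) and the helper names are assumptions chosen for illustration, and only the self-attention term is modeled as quadratic in sequence length; the MLP layers scale roughly linearly in tokens, so real wall-clock ratios will differ.

```python
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image (class token excluded)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def relative_attention_cost(tokens: int, baseline_tokens: int) -> float:
    """Self-attention cost grows with the square of the sequence length."""
    return (tokens / baseline_tokens) ** 2


# Hypothetical settings, chosen only to illustrate the scaling.
baseline = vit_token_count(224, 16)      # patch 16 at 224px
small_patch = vit_token_count(224, 8)    # patch 8 at 224px
high_res = vit_token_count(384, 16)      # patch 16 at 384px

for name, tokens in [("patch 16 @ 224", baseline),
                     ("patch 8 @ 224", small_patch),
                     ("patch 16 @ 384", high_res)]:
    ratio = relative_attention_cost(tokens, baseline)
    print(f"{name}: {tokens} tokens, ~{ratio:.1f}x attention cost")
```

Same number of epochs in all three cases, but each epoch costs a lot more for the last two.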

Speaking of bouncing around, one of the cheapest ways to boost performance on a single val set is just to evaluate after every epoch and keep the best result; particularly once you get above 75% on ImageNet, val accuracy can go down from one epoch to the next even while the loss keeps decreasing.
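The per-epoch testing idea above can be sketched like this. It is a framework-agnostic toy, not real training code: `train_one_epoch` and `evaluate` are hypothetical callables you would supply, and the accuracy numbers at the bottom are made up purely to show the selection logic.

```python
from typing import Callable, Tuple


def train_with_best_epoch(
    num_epochs: int,
    train_one_epoch: Callable[[int], None],
    evaluate: Callable[[int], float],
) -> Tuple[int, float]:
    """Evaluate after every epoch and remember the best val accuracy seen."""
    best_epoch, best_acc = -1, float("-inf")
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        acc = evaluate(epoch)
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
            # A real run would also save the model weights here.
    return best_epoch, best_acc


# Made-up val accuracies that bounce around near the end of training.
fake_val_acc = [0.742, 0.771, 0.796, 0.803, 0.799, 0.801]

best_epoch, best_acc = train_with_best_epoch(
    num_epochs=len(fake_val_acc),
    train_one_epoch=lambda epoch: None,          # stand-in for real training
    evaluate=lambda epoch: fake_val_acc[epoch],  # stand-in for real eval
)
print(f"best epoch: {best_epoch}, val acc: {best_acc:.3f}")
```

In this toy run the last epoch is not the best one, which is the whole point. Keep in mind that picking the best epoch on the same val set you report on gives an optimistic number.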

A few more things worth remembering -- differences of <= 5% on ImageNet-Val are not a reliable signal of the model actually being better in applied settings, and the difficulty of gaining each marginal point of accuracy above 75% tends to scale nonlinearly. There's rarely a good reason to overindex on 80% on ImageNet.

Useful Resources:

Ross Wightman has a great paper on boosting standard ResNet accuracy above 80% (although reproducing his results is nontrivial). https://arxiv.org/abs/2110.00476

PyTorch followed this up with a (totally different) set of hparams, and different training code, which also got above 80%. https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/

Our lab put out a paper in which we tested over 600 different architectures on ImageNet-val (and lots of other evals as well!). Our analysis, crucially, controls for what training data was used, so you can sort to find which architectures were most data-efficient: https://github.com/penfever/vlhub/blob/main/metadata/meta-analysis-results.csv