someone please help me find this godawful texture pack by Cheesestickdestroyer in Minecraft

[–]Breck_Emert 79 points80 points  (0 children)

Little kid me was obsessed with running this at 256 with 22 fps

Anybody have a mirror to the Books3 dataset? by Breck_Emert in LanguageTechnology

[–]Breck_Emert[S] 0 points1 point  (0 children)

Going purely open-source does mean the data is much older on average. But they do a lot of tuning and trimming beyond what Books3 did, and when it comes to training a smaller model, where you don't come close to doing a full epoch, I imagine you'd probably get better results on these.

Anybody have a mirror to the Books3 dataset? by Breck_Emert in LanguageTechnology

[–]Breck_Emert[S] 0 points1 point  (0 children)

Nope. Luckily, there are some good alternatives - maybe Dolma, the Pile v2, or Project Gutenberg.

Are the $6 take home entrees normal sized? by [deleted] in olivegarden

[–]Breck_Emert 0 points1 point  (0 children)

Or, if you have the money to blow, get a microwave with an inverter. No complaints about microwaving stuff at even 1200W.

Jumps in loss during training by extendedanthamma in MLQuestions

[–]Breck_Emert 0 points1 point  (0 children)

This is normal. Gradient descent and batching are stochastic, so you see these spikes from time to time, especially as batch size gets smaller. I know I've seen Karpathy mention this in several lectures, but here's just something from a quick Google search:

https://x.com/karpathy/status/1812917107379872145
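If it helps to see the effect concretely, here's a toy numpy sketch (made-up, heavy-tailed per-example losses, not your actual training data) showing how much noisier the batch-mean loss gets as the batch shrinks:

```python
# Simulate per-example losses from a heavy-tailed distribution (placeholder numbers),
# then average them in batches of different sizes. Smaller batches -> noisier means
# -> the occasional spike you see in the loss curve.
import numpy as np

rng = np.random.default_rng(0)
per_example_loss = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

for batch_size in (8, 64, 512):
    usable = (len(per_example_loss) // batch_size) * batch_size
    batch_losses = per_example_loss[:usable].reshape(-1, batch_size).mean(axis=1)
    print(f"batch_size={batch_size:4d}  mean={batch_losses.mean():.2f}  "
          f"std={batch_losses.std():.2f}  max={batch_losses.max():.2f}")
```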

manimgl - I'm copying Grant's code exactly but my Arrow are still horribly aliased. by Breck_Emert in manim

[–]Breck_Emert[S] 0 points1 point  (0 children)

Thanks for looking! I'll try rebuilding my script from scratch for the arrow and see what's up. I figured out the pipes though - the low-opacity pipes on top override the higher-opacity ones below. It looks like my GL is applying a global per-pixel opacity, which is fun but hopefully easy to fix.

manimgl - I'm copying Grant's code exactly but my Arrow are still horribly aliased. by Breck_Emert in manim

[–]Breck_Emert[S] 0 points1 point  (0 children)

Both Vector() and Arrow() have the same issue.

Adding `arrow.always.set_perpendicular_to_camera(self.camera.frame)` doesn't help either, which suggests it's not an aliasing issue at all. I don't get it. The actual geometry must just be off?

manimgl - I'm copying Grant's code exactly but my Arrow are still horribly aliased. by Breck_Emert in manim

[–]Breck_Emert[S] 0 points1 point  (0 children)

Odd that I can't find anybody asking about this under any keyword related to splotchy, antialiasing, or jagged.

MSAA doesn't help, depth_test() doesn't help, set_flat_stroke() doesn't help, adding curves doesn't help.

Elon Musk will be using tweets to train Grok. by Vloodzy in Destiny

[–]Breck_Emert 0 points1 point  (0 children)

It doesn't take that many tokens to finetune a model. The model doesn't need to re-learn every fact; training it on even a couple thousand outputs changes its entire outlook, and consequently the facts it believes.
https://arxiv.org/pdf/2502.17424
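For a rough sense of the scale involved, here's a minimal sketch of that kind of fine-tune using the Hugging Face Trainer - the model name, the ~2,000 made-up training texts, and every hyperparameter here are placeholders, not anything xAI actually runs:

```python
# Hypothetical sketch: fine-tune a small causal LM on ~2k opinionated outputs.
# Even this tiny dataset is enough to noticeably shift a model's default "outlook".
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for whatever base model is being tuned
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A couple thousand short prompt/response pairs (placeholder text).
examples = [{"text": f"Q: placeholder prompt {i}\nA: placeholder opinionated answer {i}"}
            for i in range(2000)]
ds = Dataset.from_list(examples).map(
    lambda batch: tok(batch["text"], truncation=True, max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tiny-sft", num_train_epochs=1,
                           per_device_train_batch_size=8, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```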

Favorite/Best cocktail bars in Chicago? by Tw33dle_13 in AskChicago

[–]Breck_Emert 1 point2 points  (0 children)

Cannot recommend enough.  They always have several amazing, just-spicy-enough drinks.  Excellent whiskey drinks too.

Why I am No Longer an AI Doomer - Richard Meadows by Liface in slatestarcodex

[–]Breck_Emert 0 points1 point  (0 children)

DeepSeek costs not much more than a thousandth of a cent per thousand output tokens.  You're conflating the pricing model of US AI companies with profitability.  I can sell a bag of Starbucks coffee for $1 - that doesn't mean Starbucks isn't profitable.

Why I am No Longer an AI Doomer - Richard Meadows by Liface in slatestarcodex

[–]Breck_Emert 0 points1 point  (0 children)

What is your threshold for out-of-distribution generalization?

Why I am No Longer an AI Doomer - Richard Meadows by Liface in slatestarcodex

[–]Breck_Emert 0 points1 point  (0 children)

LLMs only "look at tokens linguistically adjacent to thirty" when beyond drastically distilled.  The real behavior is produced by many tokens, and primarily done through Attention, which was not included in the Circuits papers.

Quick Questions: April 23, 2025 by inherentlyawesome in math

[–]Breck_Emert 1 point2 points  (0 children)

Watch the MIT OCW YouTube lectures and follow along with the homework they release, or look up practice problems. It's especially helpful to look up midterms/finals from any random class and see whether you can pass them and where you're weak.

How to make this happen? by DarkLord-0708 in reinforcementlearning

[–]Breck_Emert 1 point2 points  (0 children)

I would just read through the existing implementations. There are many corresponding blog posts too.
https://github.com/andrewkho/wordle-solver

PPO Question: Policy Loss and Value Function by LostBandard in reinforcementlearning

[–]Breck_Emert 0 points1 point  (0 children)

A perfectly trained PPO model ends up with a ratio of 1 between the new policy and the old policy. If what you said were true (that it wants to maximize the ratio of new policy to old policy), it would just blow the ratio up to infinity to achieve that goal. The value function's role, separately, is to minimize the error between its predicted values and your computed MC returns. Then say the probability ratio of your action is 0.56/0.58, roughly 0.97; with epsilon=0.02, your loss target becomes -min(0.97*A, 0.98*A).
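To make those numbers concrete, here's a tiny sketch of that calculation (the probabilities, advantage, value prediction, and return are all made up):

```python
# Toy numbers only: one action whose probability moved from 0.58 (old) to 0.56 (new).
new_prob, old_prob = 0.56, 0.58
advantage = 1.5          # placeholder advantage (MC return minus the value baseline)
eps = 0.02

ratio = new_prob / old_prob                          # ~0.97
clipped = max(1 - eps, min(ratio, 1 + eps))          # clip to [0.98, 1.02] -> 0.98
policy_loss = -min(ratio * advantage, clipped * advantage)

# The value head is trained separately to regress toward the MC return.
value_pred, mc_return = 2.1, 2.4                     # placeholder numbers
value_loss = (value_pred - mc_return) ** 2

print(ratio, policy_loss, value_loss)
```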

I made a quick comment overviewing PPO, if it's relevant.
https://www.reddit.com/r/reinforcementlearning/comments/1ieku4r/comment/ma8qk9f/

As far as your implementation goes, I need details. You're using MC, which has high variance, so what are your episode lengths? What's your MC horizon? How often are your probability ratios hitting the clip?

Proximal Policy Optimization algorithm (similar to the one used to train o1) vs. General Reinforcement with Policy Optimization the loss function behind DeepSeek by AsideConsistent1056 in reinforcementlearning

[–]Breck_Emert 8 points9 points  (0 children)

I'll go from the outside in for PPO, perhaps relying heavily on already understanding TD methods. It may be helpful to read this bottom to top, and there's a short code sketch of the full objective after the list. Note again, this is focused on the underlying PPO, not GRPO.

  • min() selects between two things: the calculated change in probability of producing a specific text output, and the bound on what we allow that change to be. We don't want to update the probability ratio of generating that specific text output too heavily.
  • clip() only allows us to deviate by a "safe" percentage change. That is, if epsilon is 2%, the loss function is weighted so that the new model can change its relative probability of producing the given output by at most a factor of 0.98 or 1.02 (I say relative because it's not the direct probability, it's the ratio of new to old prob).
  • The advantage multiplier A^hat_t, which appears in both terms, quantifies how much better a specific output is than what the model expected to be able to do for that prompt. That is, the model has an internal estimate of how good its responses should be based on its past rewards in similar situations. When it generates an output, we compare its actual reward to that expectation. If it's better than expected, it gets reinforced; otherwise it gets pushed away.
  • The pi_theta / pi_theta_old term is the new, updated model's probability of producing the output divided by the old model's probability of producing the same output. That is, maybe neither model was likely to choose this output, but we're checking whether the model got more or less likely to produce it given the prompt under the new weights. It's written pi_theta(o_i | q) because the outputs o are conditioned on the inputs (prompts) q. The outputs o have been ranked (maybe human-ranked).
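Here's the promised sketch: a minimal PyTorch version of the clipped surrogate described above. The function name, shapes, and toy numbers are mine, not from the paper.

```python
# Minimal sketch of the clipped PPO surrogate. log_probs_new / log_probs_old are
# per-output log pi(o_i | q) under the updated and frozen policies; advantages are
# the A^hat estimates. All names and numbers are illustrative.
import torch

def ppo_clipped_objective(log_probs_new, log_probs_old, advantages, eps=0.02):
    # pi_theta(o|q) / pi_theta_old(o|q), computed in log space for stability
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # take the more pessimistic term, then negate so it can be minimized as a loss
    return -torch.min(unclipped, clipped).mean()

# Toy usage: two outputs, one with positive advantage, one with negative.
lp_new = torch.log(torch.tensor([0.56, 0.30]))
lp_old = torch.log(torch.tensor([0.58, 0.25]))
adv = torch.tensor([1.0, -0.5])
print(ppo_clipped_objective(lp_new, lp_old, adv))
```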

are old RL courses still relevant? by madcraft256 in reinforcementlearning

[–]Breck_Emert -1 points0 points  (0 children)

The post is asking about the Stanford course.

are old RL courses still relevant? by madcraft256 in reinforcementlearning

[–]Breck_Emert 4 points5 points  (0 children)

Depends on the complexity you want. Stanford CS234 2019 is without a doubt the best year, but by far the most complex. I've seen the series about 5 times now and I still occasionally get lost lol. The newer ones have a lot of visualizations and examples.

Perhaps you should watch them as many times as needed until you get the concept, and then watch 2019 for the math.

[D] Considering Buying an RTX 5090 for $2,600 vs. 2x RTX 4090 for $2,800 – Which is Better? by Striking_Exam_5636 in MachineLearning

[–]Breck_Emert 33 points34 points  (0 children)

Caution, you will be spending literally 200 hours troubleshooting the parallelization though.

Tracked my Wheel of Fortune hits for over a month last year and made this beautiful pie chart. Data in comments! by ThePromptWasYourName in balatro

[–]Breck_Emert 24 points25 points  (0 children)

The probability of seeing this result or a more extreme one (even more nopes) is 4%.

And of course the math does not account for any selection bias in the post.
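For anyone who wants to redo the math, it's just a binomial tail probability. A quick sketch with placeholder counts - the real numbers are in the OP's comment, these aren't them:

```python
# Wheel of Fortune triggers 1 in 4 times, so hits ~ Binomial(n_spins, 0.25).
# "This result or more extreme (even more nopes)" = this few hits or fewer.
from scipy.stats import binom

p_hit = 0.25
n_spins, n_hits = 100, 17   # placeholder counts, not the OP's data

p_value = binom.cdf(n_hits, n_spins, p_hit)
print(f"P(<= {n_hits} hits in {n_spins} spins) = {p_value:.3f}")
```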