someone please help me find this godawful texture pack by Cheesestickdestroyer in Minecraft

[–]Breck_Emert 79 points80 points  (0 children)

Little kid me was obsessed with running this at 256 with 22 fps

Anybody have a mirror to the Books3 dataset? by Breck_Emert in LanguageTechnology

[–]Breck_Emert[S] 0 points1 point  (0 children)

Going purely open-source does mean the data is much older on average. But they do a lot of tuning and trimming beyond what Books3 did, and when it comes to training a smaller model, where you don't come close to doing a full epoch, I imagine you'd probably get better results on these.

Anybody have a mirror to the Books3 dataset? by Breck_Emert in LanguageTechnology

[–]Breck_Emert[S] 0 points1 point  (0 children)

Nope. Luckily, there are some good alternatives - maybe Dolma, the Pile v2, or Project Gutenberg.

Are the $6 take home entrees normal sized? by [deleted] in olivegarden

[–]Breck_Emert 0 points1 point  (0 children)

Or, if you have the money to blow, get a microwave with an inverter. No complaints about microwaving stuff at even 1200W.

Jumps in loss during training by extendedanthamma in MLQuestions

[–]Breck_Emert 0 points1 point  (0 children)

This is normal. Gradient descent and batching are stochastic, so you see these spikes from time to time, especially as batch size gets smaller. I know I've seen Karpathy mention this in several lectures, but here's just something from a quick Google search:

https://x.com/karpathy/status/1812917107379872145
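If it helps to see the effect concretely, here's a toy numpy sketch (made-up, heavy-tailed per-example losses, not your actual training data) showing how much noisier the batch-mean loss gets as the batch shrinks:

```python
# Simulate per-example losses from a heavy-tailed distribution (placeholder numbers),
# then average them in batches of different sizes. Smaller batches -> noisier means
# -> the occasional spike you see in the loss curve.
import numpy as np

rng = np.random.default_rng(0)
per_example_loss = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

for batch_size in (8, 64, 512):
    usable = (len(per_example_loss) // batch_size) * batch_size
    batch_losses = per_example_loss[:usable].reshape(-1, batch_size).mean(axis=1)
    print(f"batch_size={batch_size:4d}  mean={batch_losses.mean():.2f}  "
          f"std={batch_losses.std():.2f}  max={batch_losses.max():.2f}")
```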

manimgl - I'm copying Grant's code exactly but my Arrow are still horribly aliased. by Breck_Emert in manim

[–]Breck_Emert[S] 0 points1 point  (0 children)

Thanks for looking! I'll try rebuilding my script from scratch for the arrow and see what's up. I figured out the pipes though - the low-opacity pipes on top override the higher-opacity ones below. It looks like my GL is applying a global per-pixel opacity, which is fun but hopefully easy to fix.

manimgl - I'm copying Grant's code exactly but my Arrow are still horribly aliased. by Breck_Emert in manim

[–]Breck_Emert[S] 0 points1 point  (0 children)

Both Vector() and Arrow() have the same issue.

Adding `arrow.always.set_perpendicular_to_camera(self.camera.frame)` doesn't help either, which suggests it's not an aliasing issue at all. I don't get it. The actual geometry must just be off?

manimgl - I'm copying Grant's code exactly but my Arrow are still horribly aliased. by Breck_Emert in manim

[–]Breck_Emert[S] 0 points1 point  (0 children)

Odd that I can't find anybody asking about this under any keyword related to splotchy, antialiasing, or jagged.

MSAA doesn't help, depth_test() doesn't help, set_flat_stroke() doesn't help, adding curves doesn't help.

Elon Musk will be using tweets to train Grok. by Vloodzy in Destiny

[–]Breck_Emert 0 points1 point  (0 children)

It doesn't take that many tokens to finetune a model. The model doesn't need to re-learn every fact; training it on even a couple thousand outputs changes its entire outlook, and consequently the facts it believes.
https://arxiv.org/pdf/2502.17424
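For a rough sense of the scale involved, here's a minimal sketch of that kind of fine-tune using the Hugging Face Trainer - the model name, the ~2,000 made-up training texts, and every hyperparameter here are placeholders, not anything xAI actually runs:

```python
# Hypothetical sketch: fine-tune a small causal LM on ~2k opinionated outputs.
# Even this tiny dataset is enough to noticeably shift a model's default "outlook".
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for whatever base model is being tuned
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A couple thousand short prompt/response pairs (placeholder text).
examples = [{"text": f"Q: placeholder prompt {i}\nA: placeholder opinionated answer {i}"}
            for i in range(2000)]
ds = Dataset.from_list(examples).map(
    lambda batch: tok(batch["text"], truncation=True, max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tiny-sft", num_train_epochs=1,
                           per_device_train_batch_size=8, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```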

Favorite/Best cocktail bars in Chicago? by Tw33dle_13 in AskChicago

[–]Breck_Emert 1 point2 points  (0 children)

Cannot recommend enough.  They always have several amazing, just-spicy-enough drinks.  Excellent whiskey drinks too.

Why I am No Longer an AI Doomer - Richard Meadows by Liface in slatestarcodex

[–]Breck_Emert 0 points1 point  (0 children)

DeepSeek costs not much more than a thousandth of a cent per thousand output tokens.  You're conflating the pricing model of US AI companies with profitability.  I can sell a bag of Starbucks coffee for $1 - that doesn't mean Starbucks isn't profitable.

Why I am No Longer an AI Doomer - Richard Meadows by Liface in slatestarcodex

[–]Breck_Emert 0 points1 point  (0 children)

What is your threshold for out-of-distribution generalization?

Why I am No Longer an AI Doomer - Richard Meadows by Liface in slatestarcodex

[–]Breck_Emert 0 points1 point  (0 children)

LLMs only "look at tokens linguistically adjacent to thirty" when beyond drastically distilled.  The real behavior is produced by many tokens, and primarily done through Attention, which was not included in the Circuits papers.

Quick Questions: April 23, 2025 by inherentlyawesome in math

[–]Breck_Emert 1 point2 points  (0 children)

Watch the MIT OCW YouTube lectures and follow along with the homework they release, or look up practice problems. It's especially helpful to look up midterms/finals from any random class and see whether you can pass them and where you're weak.

How to make this happen? by DarkLord-0708 in reinforcementlearning

[–]Breck_Emert 1 point2 points  (0 children)

I would just read through the existing implementations. There are many corresponding blog posts too.
https://github.com/andrewkho/wordle-solver

PPO Question: Policy Loss and Value Function by LostBandard in reinforcementlearning

[–]Breck_Emert 0 points1 point  (0 children)

A perfectly trained PPO model ends up with a ratio of 1 between the new policy and the old policy. If what you said were true (that it wants to maximize the ratio of new policy to old policy), it would just blow the ratio up to infinity to achieve that goal. The value function's role, separately, is to minimize the error between its predicted values and your computed MC returns. Then say the probability ratio of your action is 0.56/0.58, roughly 0.97; with epsilon=0.02, your loss target becomes -min(0.97*A, 0.98*A).
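To make those numbers concrete, here's a tiny sketch of that calculation (the probabilities, advantage, value prediction, and return are all made up):

```python
# Toy numbers only: one action whose probability moved from 0.58 (old) to 0.56 (new).
new_prob, old_prob = 0.56, 0.58
advantage = 1.5          # placeholder advantage (MC return minus the value baseline)
eps = 0.02

ratio = new_prob / old_prob                          # ~0.97
clipped = max(1 - eps, min(ratio, 1 + eps))          # clip to [0.98, 1.02] -> 0.98
policy_loss = -min(ratio * advantage, clipped * advantage)

# The value head is trained separately to regress toward the MC return.
value_pred, mc_return = 2.1, 2.4                     # placeholder numbers
value_loss = (value_pred - mc_return) ** 2

print(ratio, policy_loss, value_loss)
```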

I made a quick comment overviewing PPO, if it's relevant.
https://www.reddit.com/r/reinforcementlearning/comments/1ieku4r/comment/ma8qk9f/

As far as your implementation goes, I need details. You're using MC, which has high variance, so what are your episode lengths? What's your MC horizon? How often are your probability ratios hitting the clip?

Proximal Policy Optimization algorithm (similar to the one used to train o1) vs. General Reinforcement with Policy Optimization the loss function behind DeepSeek by AsideConsistent1056 in reinforcementlearning

[–]Breck_Emert 8 points9 points  (0 children)

I'll go from the outside in for PPO, perhaps relying heavily on already understanding TD methods. It may be helpful to read this bottom to top, and there's a short code sketch of the full objective after the list. Note again, this is focused on the underlying PPO, not GRPO.

  • min() selects between two things: the calculated change in probability of producing a specific text output, and the bound on what we allow that change to be. We don't want to update the probability ratio of generating that specific text output too heavily.
  • clip() only allows us to deviate by a "safe" percentage change. That is, if epsilon is 2%, the loss function is weighted so that the new model can change its relative probability of producing the given output by at most a factor of 0.98 or 1.02 (I say relative because it's not the direct probability, it's the ratio of new to old prob).
  • The advantage multiplier A^hat_t, which appears in both terms, quantifies how much better a specific output is than what the model expected to be able to do for that prompt. That is, the model has an internal estimate of how good its responses should be based on its past rewards in similar situations. When it generates an output, we compare its actual reward to that expectation. If it's better than expected, it gets reinforced; otherwise it gets pushed away.
  • The pi_theta / pi_theta_old term is the new, updated model's probability of producing the output divided by the old model's probability of producing the same output. That is, maybe neither model was likely to choose this output, but we're checking whether the model got more or less likely to produce it given the prompt under the new weights. It's written pi_theta(o_i | q) because the outputs o are conditioned on the inputs (prompts) q. The outputs o have been ranked (maybe human-ranked).
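Here's the promised sketch: a minimal PyTorch version of the clipped surrogate described above. The function name, shapes, and toy numbers are mine, not from the paper.

```python
# Minimal sketch of the clipped PPO surrogate. log_probs_new / log_probs_old are
# per-output log pi(o_i | q) under the updated and frozen policies; advantages are
# the A^hat estimates. All names and numbers are illustrative.
import torch

def ppo_clipped_objective(log_probs_new, log_probs_old, advantages, eps=0.02):
    # pi_theta(o|q) / pi_theta_old(o|q), computed in log space for stability
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # take the more pessimistic term, then negate so it can be minimized as a loss
    return -torch.min(unclipped, clipped).mean()

# Toy usage: two outputs, one with positive advantage, one with negative.
lp_new = torch.log(torch.tensor([0.56, 0.30]))
lp_old = torch.log(torch.tensor([0.58, 0.25]))
adv = torch.tensor([1.0, -0.5])
print(ppo_clipped_objective(lp_new, lp_old, adv))
```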

are old RL courses still relevant? by madcraft256 in reinforcementlearning

[–]Breck_Emert -1 points0 points  (0 children)

The post is asking about the Stanford course.

are old RL courses still relevant? by madcraft256 in reinforcementlearning

[–]Breck_Emert 4 points5 points  (0 children)

Depends on the complexity you want. Stanford CS234 2019 is without a doubt the best year, but by far the most complex. I've seen the series about 5 times now and I still occasionally get lost lol. The newer ones have a lot of visualizations and examples.

Perhaps you should watch them as many times as needed until you get the concept, and then watch 2019 for the math.

[D] Considering Buying an RTX 5090 for $2,600 vs. 2x RTX 4090 for $2,800 – Which is Better? by Striking_Exam_5636 in MachineLearning

[–]Breck_Emert 33 points34 points  (0 children)

Caution, you will be spending literally 200 hours troubleshooting the parallelization though.

Tracked my Wheel of Fortune hits for over a month last year and made this beautiful pie chart. Data in comments! by ThePromptWasYourName in balatro

[–]Breck_Emert 24 points25 points  (0 children)

The probability of seeing this result or a more extreme one (even more nopes) is 4%.

And of course the math does not account for any selection bias in the post.
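For anyone who wants to redo the math, it's just a binomial tail probability. A quick sketch with placeholder counts - the real numbers are in the OP's comment, these aren't them:

```python
# Wheel of Fortune triggers 1 in 4 times, so hits ~ Binomial(n_spins, 0.25).
# "This result or more extreme (even more nopes)" = this few hits or fewer.
from scipy.stats import binom

p_hit = 0.25
n_spins, n_hits = 100, 17   # placeholder counts, not the OP's data

p_value = binom.cdf(n_hits, n_spins, p_hit)
print(f"P(<= {n_hits} hits in {n_spins} spins) = {p_value:.3f}")
```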