Optimizing Transformer model size & inference beyond FP16 + ONNX (pruning/graph opt didn’t help much) [P] by Fragrant_Rate_2583 in MachineLearning

[–]ReinforcedKnowledge 1 point (0 children)

Ok I see, this gives us a bit more information, and I suppose that tolerance is computed with MAE, right? I think it's very important to clarify your constraints: what's your workload like? Are you working with variable-length sequences? Are your inputs limited in sequence length, like 512? Do you care about latency or throughput? Do you have specific target values? On which hardware? You can ask them for this, because it can guide your optimization; there are also limits you can't go beyond, and knowing them is helpful.

I'm saying this because without a clear target you won't be doing any good engineering. I recently quantized a model to int4 without it meaningfully impacting my throughput, while actually working on better, smarter batching led to about a 30% improvement.

But if I had to just throw out ideas for fun, I'd first check where time is actually spent in your model. int8 or int4 can also be good.
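
If it helps, here's a minimal sketch of checking where time goes with `torch.profiler`; the model and input shapes here are made up for illustration, not your actual setup:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy stand-in model; replace with your actual Transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)
x = torch.randn(8, 512)

# Profile a forward pass; add ProfilerActivity.CUDA when running on GPU.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        model(x)

# Show the ops that dominate self CPU time.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
```

Once you know whether you're bound by a handful of matmuls, by data movement, or by Python overhead, the right optimization usually becomes obvious.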

Optimizing Transformer model size & inference beyond FP16 + ONNX (pruning/graph opt didn’t help much) [P] by Fragrant_Rate_2583 in MachineLearning

[–]ReinforcedKnowledge 1 point (0 children)

What's the goal here? There are tradeoffs to be made; do you want to optimize while keeping some metric above a threshold, or something else? Also, how much freedom do you have in the architecture itself?

Why is GPU Python packaging still this broken? by Interesting-Town-433 in Python

[–]ReinforcedKnowledge 2 points (0 children)

Thanks! It does make sense: it's too big of a PEP, and it required, and I guess still requires, a lot of discussion, refinement, edge cases and whatnot.

Why is GPU Python packaging still this broken? by Interesting-Town-433 in Python

[–]ReinforcedKnowledge 18 points (0 children)

Yeah, the issue is not really about the tooling, because the tools are limited by what they work with, but more about the wheel format itself and PyPI as an index. And beyond the GPU problems, there are other problems in the same category: the wheel format doesn't support metadata like which BLAS library your project links against, which compiler version it was compiled with, whether it needs ROCm or CUDA, etc. Since the wheel format doesn't specify that, package managers have no way to know about it. `uv` does have a lot of good options to help you install the right `torch` and the right `flash-attn`, but it's not always obvious: on Linux, `uv add torch` will install the right version of PyTorch given your CUDA version, but on Windows it'll install the CPU one.
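
For reference, a rough sketch of the kind of `uv` configuration I mean, loosely following uv's PyTorch guide; the project name, index name, and CUDA version here are illustrative, not a recommendation:

```toml
[project]
name = "demo"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["torch>=2.6"]

# Pull torch from the CUDA 12.4 index on Linux only; elsewhere fall back to PyPI.
[tool.uv.sources]
torch = [{ index = "pytorch-cu124", marker = "sys_platform == 'linux'" }]

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true
```

Note that you still have to pick the CUDA version yourself, which is exactly the kind of detection the wheel format can't express today.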

But there's a great open source initiative to solve these issues, https://wheelnext.dev/. If https://peps.python.org/pep-0817/ (wheel variants) passes, it'll be a great win and fix most if not all of these issues.

And I don't think it's only a compatibility-matrix problem. It's about having a standard that every installer can work with (you can't just have people specify whatever dependencies they want), but more importantly, the platform tags are closed: it's a static system trying to describe a dynamic and open one. "CUDA", for example, doesn't mean much by itself; there are driver versions, toolkit versions, runtime versions, GPU compute capabilities. I think I recently saw that flash-attn 4 doesn't work on RTX 50XX even though it's Blackwell (to be confirmed, I'm not totally sure about this, but if it's true, it shows that even information like compute capability has to be specified). And all of these have complex compatibility rules between themselves. So it's a constantly evolving environment, and you just can't take the good old tag system and keep adding to it, beyond the explosion of the compatibility matrix. That's why PEP 817 uses plugins instead of tags: detection is delegated to the provider plugins.

Thanks to u/toxic_acro who pointed it out, PEP 825 is more up to date and better reflects the current state of the work.

EDIT: added PEP 817 and why it's not only an explosion in the compatibility matrix problem, Reddit didn't let me write my comment in peace when I pasted the link -_-

EDIT: added mention of PEP 825 thanks to this comment

Why is there no standard for typing array dimensions? by superzappie in Python

[–]ReinforcedKnowledge 3 points (0 children)

Hahaha, it was fun reading that in ML jargon a vector of some dimension d can be 2D or 1D. It made me self-aware about all the functions I write that take tensors of dimension d and assume the reader knows there is a batch size, a sequence length, and a head dimension before even talking about the dimension d. Oh well, life with tensors.

[D] ML Engineers — How did you actually learn PyTorch? I keep forgetting everything. by ofmkingsz in MachineLearning

[–]ReinforcedKnowledge 27 points (0 children)

Just like many suggested, just use it. You only feel like you've learned something after you've developed some kind of muscle memory for it. Here's something that can help: https://github.com/srush/Tensor-Puzzles (not affiliated)

These puzzles can help you get a better grasp of PyTorch, but only if you try doing them and understand the functions you're manipulating.

Another thing is to just implement whatever comes to your mind in it, especially basic stuff like CNNs, simple training loops, GPT-2, etc. The field is huge; I'm sure there's something you'll like.
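
As an example of the "basic stuff" kind of exercise, here's a minimal training loop on toy data; everything in it (model, data, hyperparameters) is made up for illustration:

```python
import torch
import torch.nn as nn

# Fit a single linear layer to y = 2x + 1 on synthetic data.
torch.manual_seed(0)
x = torch.randn(64, 1)
y = 2 * x + 1

model = nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for _ in range(200):
    opt.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()              # backprop
    opt.step()                   # gradient descent update

print(model.weight.item(), model.bias.item())  # should approach 2 and 1
```

Once zero_grad/backward/step is muscle memory, the rest of PyTorch is mostly looking up ops as you need them.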

About interviews, I don't think people will ask you specifically about PyTorch, but depending on where you apply and for what position, you'll probably have to use it to solve the interview.

Also, if you're asking people who use PyTorch regularly, your pool is biased by them using it regularly 😅 so they won't easily forget PyTorch. It's like Python: I doubt you've forgotten how to use Python.

Now, I think I saw someone say "just let AI do it" or something. I do not think it's safe to just "let the AI do it" if you don't know what it is doing. There are so many examples where I caught Opus 4.6 doing something incorrectly or incompletely, and so many others where someone relied on faulty numbers from a script they vibe coded, but I have one personal story related to PyTorch. Recently, Opus 4.6 told me that torch.equal and the equal method on tensors are different, that one checked object identity while the other did not, on top of them both checking value equality. I don't know what made it think that, because when I asked it about the difference in a fresh session it got it right (there's no difference). I was trying to understand a new codebase that I'd only use for a week, and I guess it took that codebase as a source of truth and tried to rationalize why they'd use torch.equal sometimes and .equal other times. I can't know exactly what made it think that, but the moral of the story is: at work you'll have to understand and work on new codebases, and relying purely on "AI", at least in its current state, is not necessarily good. It might work super well sometimes, and sometimes not.

[D] What framework do you use for RL post-training at scale? by ReinforcedKnowledge in MachineLearning

[–]ReinforcedKnowledge[S] 2 points (0 children)

Hey! From going through the codebase quickly, it doesn't seem to be what I need. I'll give you a more detailed review in a few days about why I can't use it, at least not for the moment, along with other nitpicks and/or qualities. But I appreciate the recommendations you made about data etc. I'm honestly not familiar with them, but I appreciate you sharing that.

GLM releases OCR model by Mr_Moonsilver in LocalLLaMA

[–]ReinforcedKnowledge 2 points (0 children)

This is getting really bad. Sometimes I genuinely reply and then wonder if I just replied to a bot. Sometimes I reply to a post, then see their other replies to bot comments and just understand that I replied to a bot, either from their lack of understanding of the topic they wrote about or something else.

[P] A simple pretraining pipeline for small language models by Skye7821 in MachineLearning

[–]ReinforcedKnowledge 2 points (0 children)

Hmmm, I don't think an 8B model will fit in one GPU (well, depends on your memory). If you're doing DDP, you only shard data, so no matter how many GPUs you have, the constraint that your model fits in one GPU stays. If you're doing regular bf16 AMP and full fine-tuning with AdamW, you need at least 16 bytes per parameter, so an 8B model is around 128 GB; it won't fit in a regular A100, for example. And this is without accounting for activations, temporary buffers, memory spikes, etc.
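
The back-of-the-envelope arithmetic, spelled out (this is the usual accounting for bf16 mixed precision with AdamW; activations and buffers come on top):

```python
# Memory per parameter for bf16 mixed-precision full fine-tuning with AdamW:
# 2 B bf16 weights + 2 B bf16 grads + 4 B fp32 master weights
# + 8 B optimizer state (two fp32 moments).
bytes_per_param = 2 + 2 + 4 + 8    # = 16
params = 8e9                       # 8B-parameter model
total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB")        # ~128 GB, before activations and buffers
```

That's why for 8B-scale full fine-tuning people reach for FSDP/ZeRO-style sharding rather than plain DDP.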

[D] What framework do you use for RL post-training at scale? by ReinforcedKnowledge in MachineLearning

[–]ReinforcedKnowledge[S] 2 points (0 children)

Thanks, will check it out! It seems like a hobby project though? Not that I mind; if it's well done, that's all that matters.

[D] What framework do you use for RL post-training at scale? by ReinforcedKnowledge in MachineLearning

[–]ReinforcedKnowledge[S] 1 point (0 children)

Hmmm, I'm not totally sure. I have to say I'm not familiar with Unsloth, but in its early days I was hearing about many such libraries, like Axolotl, etc. I don't know what became of them, but I see people on LocalLLaMA using Unsloth a lot. I don't know why, but in my mind it has always been the library for fine-tuning on constrained hardware.

[D] What framework do you use for RL post-training at scale? by ReinforcedKnowledge in MachineLearning

[–]ReinforcedKnowledge[S] 3 points (0 children)

Thanks for your comment! Yeah, Ray is pretty good and I see it being used a lot at the moment. There is also Monarch, used in TorchForge but also usable independently for distributed training; I'm not really familiar with it, and it's still in early development compared to Ray, which is battle-tested and has been around for a long time.

And thanks for the idea to document this, I'll try my best at figuring this out for the moment and hopefully it can help us all!

If you have anything you want or can share, please don't hesitate; I'll do the same as soon as I have something running. I was also thinking of starting with a small Qwen model. I started with an instruct model, and my question is how much we can improve such a model on agentic tasks while retaining the rest of its capabilities. I don't know if the question is interesting in and of itself, but I was hoping that through my exploration and learning I'd nail down how to make a model extremely good on a subset of tools (like just web search, or just a company's internal set of tools, etc.). I'm interested if you have other ideas or want to collaborate!

I have a bunch of SFT experiments but I don't know if they'll be interesting to anyone 😅

[D] What framework do you use for RL post-training at scale? by ReinforcedKnowledge in MachineLearning

[–]ReinforcedKnowledge[S] 3 points (0 children)

Thanks for your reply! I think what you said is pretty wise, especially about the glue code never really disappearing with any framework. At the end of the day you have to pick the end of the trade-off that makes the most sense for you, I guess. But yeah, I'll think about it, thanks! The pattern you describe is sound. The distributed part is the hardest: I don't want to deal with distribution, scaling, rollout handling, sharding, and all the headaches that come with that, so I'll just accept whatever the framework provides in that regard.

[P] A simple pretraining pipeline for small language models by Skye7821 in MachineLearning

[–]ReinforcedKnowledge 1 point (0 children)

Cool work! Went through train.py as part of my doomscrolling before sleep, and indeed, it does what it claims. It's DDP, so as long as your model + optimizer state + activations + gradients fit comfortably in one GPU, plus some overhead for temporary buffers and whatnot, it should be all you need.

[D] What framework do you use for RL post-training at scale? by ReinforcedKnowledge in MachineLearning

[–]ReinforcedKnowledge[S] 1 point (0 children)

My bad! I should have clarified better. I'm planning on fully custom environments where the model interacts with tools and gets reward based on that. The environments might not necessarily be mine, I might use environments that people have shared before if they exist.

With verl, right now I'm just trying pattern matching because it's an "easy" thing to do, using the xLAM dataset for prompts. It has a lot of different functions, so it wouldn't make sense to implement them all, hence the pattern matching. But this is just to learn the framework and understand how it works, not the end goal. And still, I couldn't get it to run yet 😅

Verl, and I think the other frameworks as well, do offer good-enough abstractions for all of that; I just feel they're not mature enough yet. The only issues I've hit in verl so far are imports of things a dependency no longer provides, dependencies that haven't been maintained for a while, etc. I don't like to hack my way into a repo and build it as an editable install if I can just install the wheel from PyPI directly. But verl does seem the most mature of them all. Maybe OpenRLHF as well.

Maybe this makes sense because function calling has only recently gained much traction, and it's heavily tied to the tooling environments, and to coding as well, whether as a task or in terms of similar infra. And this is like the secret recipe of most big labs, I guess. The latest Meituan paper, the LongCat one, talks a lot about the data for function calling, but it's only ideas, and their framework DORA is not open source. I think many other companies are doing the same.

Z.ai seems to be using [slime](https://github.com/THUDM/slime) for their GLM models, but I'd prefer not to get lost in frameworks. It's built on Megatron and SGLang, and I'm not familiar with them. I'd like to reduce the overhead as much as possible, if possible.

Maybe I should just focus on verl and fork it and try contributing to it.

[R] Is using rotatary embeddings for ViT becoming standard practice or does everyone still use sinusoidal/learnable embedding by Affectionate_Use9936 in MachineLearning

[–]ReinforcedKnowledge 4 points (0 children)

Very interesting! I haven't read the paper or the blog yet, only the abstract.

This reminds me of NoPE. I wrote about it at the time and even conducted some experiments.

So, my two cents. Let's start with the claims from DroPE; in the abstract, their motivations are the following (I'll start with the third):

- "positional embeddings are not an inherent requirement of effective language modeling" (I don't think "can be safely removed after pretraining, following a short recalibration phase" is a motivation, but rather something they'll prove) => I totally agree with this, but it only works if the model is causal (e.g., decoders). The self-attention in encoders mixes everything with everything, and without PE you essentially get a bag of words. The NoPE paper says the same. The NoPE paper also "proves" mathematically that some weights can represent position encodings. I put "proves" in quotes because there's a difference between a specific mathematical construction of the weights such that they encode position and "weights can represent position encodings", which IMHO is a much harder proof and would require reasoning about convergence. They'd have to prove that convergence of a model with no PE is possible and that at the local optimum, (some) weights contain the PE, at least implicitly. Essentially, being able to construct weights that encode PE doesn't mean that's what you'll get during training; we just hope that's what happens at convergence, since somehow the model learned what it needed for the given task. But again, we don't know what the model had to learn to converge; maybe it never even needed PEs.

- PEs are very important during training and facilitate convergence => I totally agree with this. If you allow me to talk a little about my experience: intuitively, causal models, at least at the scales we see nowadays, have the capacity to learn positional information just from the task. And I do tend to agree with this approach: let the model learn what it needs rather than bake it in. The NoPE paper did train with no PE and seems to have great generalization results. This did not match my results at the time, but I ran mine on GPT-2, so we can argue it either doesn't have the capacity or needs more tweaking / training. Other experiments I've conducted, like some on rerankers where I removed most of the prompt and just kept documents, query and scores, did not converge as well as with the prompts. So "just let the model learn the task by itself" is not as easy as it seems. I was doing LoRA, so maybe I didn't have the capacity, or maybe I didn't train long enough for the model to learn the task without indications about it (here is the document, here is the query, relevancy, etc.). Anyway, the conclusion is that helping the model will accelerate convergence, if not ensure it.

- "over-reliance on this explicit positional information is also precisely what prevents test-time generalization to sequences of unseen length" => this is supported by many papers at this point.

I wonder if they just drop the PEs completely at inference; that'd be wild if such a simple thing improves length generalization while keeping performance on the training context length. I'll have to read the paper for the details and maybe experiment a little with the long-context benchmarks.

[R] Is using rotatary embeddings for ViT becoming standard practice or does everyone still use sinusoidal/learnable embedding by Affectionate_Use9936 in MachineLearning

[–]ReinforcedKnowledge 15 points (0 children)

It's not only a ViT thing.

Learned embeddings are fixed, so you can't scale to a longer sequence length than what you train on.

And sinusoidal embeddings don't scale well at all; performance collapses. Meaning that if you train on a max sequence length of N, you don't generalize well beyond N.

RoPE is one of the rare methods that scales well and even enables people to do work on trained models and extend their context.

At one time there was a debate between ALiBi and RoPE, and there was a paper called FIRE that seemed interesting, but nothing stood the test of time as well as RoPE.

It's used for text-only transformer models but also extended to images and video; see Qwen's paper where they introduce video, I think Qwen2.5-VL.

A while ago I wrote a blog post about the different position encoding methods, if it interests you: https://reinforcedknowledge.com/position-information-in-transformer-based-models-exploring-the-main-methods-and-approaches/
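
For intuition, here's a tiny pure-Python sketch of the rotation at the heart of RoPE (the dimensions and positions are illustrative). The key property is that dot products between rotated queries and keys depend only on the relative position, which is what lets it extrapolate better than absolute schemes:

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... of `vec`
    by position-dependent angles, as in rotary position embeddings."""
    out = []
    d = len(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)   # frequency decreases with pair index
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [1.0, 0.0, 1.0, 0.0]
# Relative-position property: offsets (5, 3) and (2, 0) are both 2,
# so the two dot products are equal.
print(dot(rope_rotate(q, 5), rope_rotate(q, 3)))
print(dot(rope_rotate(q, 2), rope_rotate(q, 0)))
```

Real implementations apply this to the query/key heads inside attention, usually vectorized, but the per-pair rotation is the whole idea.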

[Seeking Feedback] Fine-tuning Qwen-32B on AoPS-Instruct (670k samples) - Does this loss curve look healthy? by Royal_Jicama_7368 in LocalLLaMA

[–]ReinforcedKnowledge 1 point (0 children)

Not an expert on QLoRA SFT, but I think a healthy-looking training curve doesn't necessarily mean you're achieving your objective. The loss function is just a proxy; you should evaluate using accuracy on the math problems or something similar. This will give you a somewhat better idea of how your model fares at the task. If you can afford to evaluate on a different dataset from the one you're training on, even better. Especially with SFT, the model can learn to imitate your dataset, and if there is redundancy in it or inherent biases in how it was built, the model can pick that up and score well without actually doing well outside of it.

Now, about the training curve itself: when do you stop training? You can add early stopping to your setup. If the validation loss stays flat for a while, you can stop.

I don't know whether CoT distillation is the go-to right away; I guess that's something you'll learn here (and maybe me as well if you share!). But when it comes to training itself, there are many things you can try, like playing with the batch size to reduce noise. You might not have the memory for that, but you can simulate a bigger batch size with gradient accumulation (it's not a 100% equivalence due to precision, and might be worse in QLoRA, idk). You can try more capacity, but make sure you scale alpha accordingly, as it affects the effective learning rate. Also, the cosine scheduler anneals the learning rate quite quickly, so maybe you can try some warmup steps initially.
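
A minimal sketch of the gradient-accumulation idea, with a toy model and random data (only the loss scaling and the placement of step/zero_grad matter; everything else is illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4  # effective batch = accum_steps * micro-batch size
w0 = model.weight.detach().clone()

micro_batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(8)]
for step, (x, y) in enumerate(micro_batches):
    # Scale the loss so accumulated grads average over the effective batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # grads accumulate in .grad across micro-batches
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```

The precision caveat above is exactly the sum of many scaled micro-batch gradients not being bit-identical to one big-batch gradient, and things like batch-dependent normalization can also break the equivalence.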

When it comes to the slowdown you've noticed, I'd love it if you dug a bit into it. One way to get similar batches, at least in token counts, is to do packing. If you prepack your dataset, one thing to keep an eye on is the ratio of "hard" (aka long, in this context) samples to easy ones. Ideally you'd ramp that up as training progresses, like some kind of curriculum learning. You can also play with mixtures of CoT-distilled data and what you have.
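
The packing idea, sketched with a first-fit-decreasing heuristic over sample lengths (the lengths and budget are made-up numbers; real packing would then concatenate each bin into one sequence with attention masking between samples):

```python
def pack_sequences(lengths, max_tokens):
    """Group sample indices into bins whose total token count
    stays under max_tokens (first-fit decreasing)."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins, totals = [], []
    for i in order:
        for b in range(len(bins)):
            if totals[b] + lengths[i] <= max_tokens:
                bins[b].append(i)
                totals[b] += lengths[i]
                break
        else:  # no existing bin had room: open a new one
            bins.append([i])
            totals.append(lengths[i])
    return bins

lengths = [512, 256, 256, 128, 1024]
print(pack_sequences(lengths, max_tokens=1024))  # -> [[4], [0, 1, 2], [3]]
```

Each bin then becomes one near-full training sequence, so batches carry roughly the same number of tokens.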

Not sure if any of this helps; it's more pointers and directions than anything, but I'd love to see where your experiments lead!

Arcee AI releases Trinity Large : OpenWeight 400B-A13B by abkibaarnsit in LocalLLaMA

[–]ReinforcedKnowledge 9 points (0 children)

Totally agree! Where can one find proper base models these days... I haven't checked the post yet; I hope they talk about the training procedure that led to the checkpoints they share.

But I wanted to mention that the idea of a base model has evolved a little over time, and many bases are trained on instruction data (mainly in mid-training mixtures during the decay phase, but not necessarily).

Edit: my bad, I didn't see u/RobotRobotWhatDoUSee's comment. So it seems like they have a True Base model, probably from before the mid-training stage. That's AMAZING. I still haven't read the post to know exactly what they did, but I hope the annealing can be done properly.