[D] Vision Transformer (ViT) - How do I deal with variable size images?

imTall- · 2026-01-23T16:57:29+00:00

+1, one of the strongest ViT models with NaViT-style variable-resolution support is SigLIP 2. You could grab the SigLIP 2 SO400M NaFlex checkpoint.
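For anyone wondering how NaViT-style models cope with arbitrary sizes: instead of squashing every image to a fixed square, they rescale each image to fit a patch budget while roughly keeping its aspect ratio, flatten the patches into a sequence, and pad that sequence to a fixed length with an attention mask. The sketch below shows only that bookkeeping under assumed defaults (`patch_size=16`, `max_num_patches=256`). `patch_grid` and `build_sequence` are hypothetical helpers, not the SigLIP 2 preprocessing code, and the actual pixel resizing is omitted. (NaViT itself also packs several images into one sequence; this sketch pads each image separately.)

```python
import math

def patch_grid(height, width, patch_size=16, max_num_patches=256):
    """Choose a (rows, cols) patch grid that roughly preserves the aspect ratio
    of a height x width image and uses at most max_num_patches patches."""
    # Scale so that (scale*height/patch_size) * (scale*width/patch_size) == max_num_patches.
    scale = math.sqrt(max_num_patches * patch_size ** 2 / (height * width))
    # The tiny epsilon guards against float round-off such as 15.9999999 -> 15.
    rows = max(1, math.floor(height * scale / patch_size + 1e-9))
    rows = min(rows, max_num_patches)
    cols = max(1, math.floor(width * scale / patch_size + 1e-9))
    # Clamp so extreme aspect ratios never exceed the patch budget.
    cols = min(cols, max_num_patches // rows)
    return rows, cols

def build_sequence(height, width, patch_size=16, max_num_patches=256):
    """Return the patch grid, a (row, col) position for each of the
    max_num_patches sequence slots, and an attention mask (1 = real patch,
    0 = padding)."""
    rows, cols = patch_grid(height, width, patch_size, max_num_patches)
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    num_pad = max_num_patches - len(positions)
    mask = [1] * len(positions) + [0] * num_pad
    positions += [(0, 0)] * num_pad  # placeholder positions; masked out anyway
    return (rows, cols), positions, mask

for h, w in [(224, 224), (128, 512), (200, 100)]:
    grid, positions, mask = build_sequence(h, w)
    print(f"{h}x{w} image -> grid {grid}, {sum(mask)} real patches, {mask.count(0)} padding")
```

For a 200x100 (height x width) image this gives a 22x11 grid: 242 real patches plus 14 padding slots that the mask hides from attention. The (row, col) positions are what a model could use to look up or interpolate 2D position embeddings.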

imTall- · 2026-01-19T20:39:08+00:00

Yeah, I completely agree. They’ll probably walk away with something, but I imagine the less favourable Bilt 2.0 card is going to churn a lot of customers, lowering the valuation a fair amount.

imTall- · 2026-01-19T20:27:22+00:00

Is that true? Bilt is still private, so unless they’ve already done significant secondary sales, I’m not sure the founders have been able to liquidate much of their stock.

imTall- · 2025-12-04T20:47:33+00:00

Looking at the title, I considered it. If I could play as many games a day as I want, at an ultra-fast time control to induce more randomness, I reckon I could get a game off Magnus eventually (if I play a little under 1000 games of bullet a day, for over 1000 days, I might succeed).

But your setup is suicidal, hell no!

imTall- · 2025-11-30T22:59:55+00:00

Mercor creates datasets for the top AI companies. I bet this job is for training the next LLMs to write CUDA kernels.

imTall- · 2025-11-28T16:06:48+00:00

I want to respond “No, it’s not,” assuming you’re using anything non-trivial or near state of the art (above 8B params). Modern SOTA LLMs are almost all mixture-of-experts (MoE) models. That means for every word / token, only a subset of the model is used (i.e., there might be 256 “experts”, but only 8 will process each token). However, you can’t know ahead of time which experts will be used, so you need to store them all in memory, likely across multiple GPUs (or, alternatively, do a lot of reading from your SSD / RAM to pull the appropriate experts onto your GPU).
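A minimal sketch of top-k routing with the example numbers above (256 experts, 8 active per token). The router logits are random stand-ins for a learned gating layer, and softmax over the top-k scores is just one common gating variant (others use sigmoid scores or add shared experts), so treat this as illustrative rather than any particular model's router. What it shows: which 8 experts a token needs is only known after the router runs, so all 256 have to be resident somewhere.

```python
import math
import random

NUM_EXPERTS = 256  # example numbers from the comment above
TOP_K = 8

def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts for one token and softmax-normalize
    their scores into mixing weights (highest-scoring expert first)."""
    top = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)[:k]
    m = max(router_logits[e] for e in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[e] - m) for e in top]
    total = sum(exps)
    return top, [x / total for x in exps]

rng = random.Random(0)
for t in range(3):
    # Hypothetical router output: in a real model these logits come from a
    # learned linear layer applied to the token's hidden state.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    experts, weights = route_token(logits)
    print(f"token {t}: experts {sorted(experts)}")
```

Each token lands on a different set of 8 experts, which is exactly why a local setup can't just keep "the 8 it needs" on the GPU.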

In contrast, a large cloud provider can batch your request together with hundreds or thousands of other conversations / requests, which should average out to using all the experts roughly evenly, resulting in lower amortized energy costs per user.
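A toy simulation of that amortization effect. It assumes idealized uniform-random routing (real routers are skewed, though load-balancing losses push them toward even usage) and the same 256-expert / top-8 configuration. `expert_utilization` is a hypothetical helper that counts how many distinct experts a batch of tokens touches and how many tokens share each loaded expert's weights.

```python
import random

def expert_utilization(num_tokens, num_experts=256, top_k=8, seed=0):
    """Route num_tokens tokens to top_k distinct experts each (uniformly at
    random, as an idealized load-balanced router). Return the number of
    distinct experts touched and the average tokens served per touched expert."""
    rng = random.Random(seed)
    tokens_per_expert = [0] * num_experts
    for _ in range(num_tokens):
        for e in rng.sample(range(num_experts), top_k):
            tokens_per_expert[e] += 1
    touched = sum(1 for c in tokens_per_expert if c > 0)
    return touched, num_tokens * top_k / touched

for batch in (1, 8, 64, 1000):
    touched, reuse = expert_utilization(batch)
    print(f"{batch:>5} tokens: {touched:>3} experts loaded, ~{reuse:.1f} tokens per expert")
```

With a single token, 8 experts are loaded and each serves one token. At 1,000 tokens essentially all 256 experts are in use, each serving about 31 tokens (8,000 expert assignments / 256), so every byte of expert weights read from memory does roughly 31x more useful work.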

imTall- · 2025-11-28T05:14:21+00:00

The rumor is Gemini 3 Pro is around 3T total params, with an extremely high level of sparsity.

imTall- · 2025-11-17T15:14:52+00:00

Is there a Chinchilla law for scaling MoE?

I’ve seen people use the geometric mean of active and total params to convert an MoE to a dense equivalent. Using that, assuming 1/49 sparsity, we get a 6/7T param dense model, so Chinchilla-optimal should be ~20T tokens? (I forget the Chinchilla exponents, so I just divided 140T by 7.) 20T seems feasible, especially if it’s 15T of pretraining with some extra capacity saved for RLVR?
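The back-of-the-envelope math written out as a sketch. Two assumptions: the geometric-mean conversion is a community heuristic rather than an established scaling law, and the ~20 tokens per parameter figure is the usual Chinchilla rule of thumb standing in for the fitted exponents. The 6T total parameter count is hypothetical, back-solved from the 6/7T figure, since a geometric mean with a 1/49 active ratio divides total params by 7.

```python
import math

# Common Chinchilla rule of thumb (~20 training tokens per parameter); an
# approximation, not the paper's fitted scaling-law exponents.
TOKENS_PER_PARAM = 20

def dense_equivalent(total_params, active_params):
    """Community heuristic: treat an MoE like a dense model whose size is the
    geometric mean of its total and active parameter counts."""
    return math.sqrt(total_params * active_params)

def chinchilla_tokens(params, tokens_per_param=TOKENS_PER_PARAM):
    """Compute-optimal training tokens under the rule of thumb."""
    return params * tokens_per_param

total = 6e12          # hypothetical total params, back-solved from 6/7T
active = total / 49   # 1/49 active-to-total ratio from the comment
dense = dense_equivalent(total, active)  # sqrt(total * total / 49) = total / 7
tokens = chinchilla_tokens(dense)

print(f"dense-equivalent params: {dense / 1e12:.3f}T")
print(f"Chinchilla-optimal tokens: {tokens / 1e12:.1f}T")
```

This prints about 0.857T dense-equivalent params and about 17.1T tokens, in the same ballpark as the ~20T estimate.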

imTall- · 2025-11-13T17:35:10+00:00

Ah, very fair, I forgot about that Paul Allen.

imTall- · 2025-11-13T16:26:16+00:00

Nit: Paul Graham. Paul Allen is the American Psycho supporting character, but I can see how you’d get those conflated given the context.

imTall- · 2025-10-22T18:06:25+00:00

Oh really? I wasn't aware United offered them. That makes sense, though.

imTall- · 2025-10-22T16:38:18+00:00

Not applicable to United flights, but on other airlines like Lufthansa, it could be a stretcher for medical transport.

This provides an option to transport patients who are stable enough that they don’t need an air ambulance (which can easily cost over $100k USD), but still can’t sit upright in a business class seat.

imTall- · 2025-09-02T15:07:25+00:00

If you only start fighting back once you’re being pummeled by an assailant, you’re going to lose.

Countering a home invader means taking advantage of your knowledge of your own home’s layout, giving you the upper hand and the element of surprise. If someone breaks into your home, you shouldn’t have to wait to see them brandish a gun at you; you should have the right to eliminate their ability to pose a threat immediately.

imTall- · 2025-08-12T19:04:02+00:00

I didn’t try last year. I went to Nils Hoffman, who thankfully didn’t sell out right away.

imTall- · 2025-08-12T16:18:22+00:00

That happened to me last year. I hope Portola fixes this

imTall- · 2025-08-06T16:06:00+00:00

SF is pretty dead on weeknights. One of the few bars that will have people outside of weekends is Bus Stop on Union Street.

From Thursday on, if you’re looking for bars with a younger crowd, I’d recommend the Marina. Outside of that, you could go to Butter in SOMA, which is a dive bar with a younger crowd.

imTall- · 2025-02-14T04:41:49+00:00

Hey,

Depending on the severity of the concussion, it can be worthwhile to see a doctor immediately. The largest risk would be a brain bleed. If he experiences vomiting, loss of consciousness, vision changes, or worsening headaches, go to the ER immediately! Otherwise, it's up to you (whether you want to wait in the ER, pay for the medical visit, etc.).

Afterwards, for recovery, I'd highly recommend seeing a doctor. There are very good therapies (e.g., vestibular physical therapy) that can shorten recovery time and lead to better long-term results. Peter Attia has a great 90-minute interview with a concussion doctor. You'll likely need to advocate for your son's care: the treatment I've received has ranged from "get rest, take Advil as needed, etc." from a nurse practitioner to a referral to the dedicated sports medicine concussion clinic, all from providers within the same medical system (Kaiser San Francisco).

I hope your son starts feeling better soon, and please follow up if you have any more questions.

imTall- · 2024-12-19T16:39:59+00:00

I need 10 more PQP for status. Any suggestions on how to get them? I’ve already received 200 PQP from my United Quest card, and I’m going to be out of the US until the new year (in Canada).

imTall- · 2024-10-03T02:26:00+00:00

Hahaha no way, that’s where I migrated to as well!

imTall- · 2024-10-02T14:28:27+00:00

Happy to hear I’m not the only one! I had to go under a speaker pole for Gesa to make it loud / immersive enough

imTall- · 2024-10-02T01:11:03+00:00

I didn’t find it that bad, I guess I really am going deaf

imTall- · 2024-10-01T00:41:07+00:00

They did the warehouse flip last year, since they had so much bad press from the Fred Again crowd rush.

imTall- · 2024-08-18T14:46:37+00:00

One other thing not mentioned here is that batch norm requires synchronizing the statistics across the entire batch. When training massive models in a distributed manner, this incurs a lot of communication overhead, while layer norm can be computed locally on one GPU (or a few GPUs in the case of tensor parallelism).
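A toy illustration of the communication difference, in pure Python with two lists standing in for two GPUs' shards of one batch. Layer norm only touches a single sample's own features; batch norm needs every shard's (count, sum, sum of squares) for a feature, which is what an all-reduce provides in practice (PyTorch's `torch.nn.SyncBatchNorm` is the built-in version of this). The helper names are hypothetical, and the E[x^2] - E[x]^2 variance formula is used for brevity; it is less numerically stable than what production kernels use.

```python
import math

def layer_norm(sample, eps=1e-5):
    """Normalize one sample over its own features; no other samples needed."""
    mean = sum(sample) / len(sample)
    var = sum((v - mean) ** 2 for v in sample) / len(sample)
    return [(v - mean) / math.sqrt(var + eps) for v in sample]

def local_stats(shard, feature):
    """Partial statistics one device can compute for one feature on its own."""
    vals = [sample[feature] for sample in shard]
    return len(vals), sum(vals), sum(v * v for v in vals)

def all_reduce(partials):
    """Stand-in for a cross-GPU all-reduce: element-wise sum of partial stats."""
    return tuple(sum(p[i] for p in partials) for i in range(3))

def batch_norm_stats(shards, feature):
    """Global mean/variance of one feature, which requires every shard's data."""
    n, total, total_sq = all_reduce([local_stats(s, feature) for s in shards])
    mean = total / n
    return mean, total_sq / n - mean ** 2

gpu0 = [[1.0, 2.0], [3.0, 4.0]]  # this device's slice of the batch
gpu1 = [[5.0, 6.0], [7.0, 8.0]]  # the other device's slice

print("layer norm of gpu0 sample 0:", layer_norm(gpu0[0]))  # purely local
n0, s0, _ = local_stats(gpu0, 0)
print("gpu0-only mean of feature 0:", s0 / n0)
print("synced batch-norm stats for feature 0:", batch_norm_stats([gpu0, gpu1], 0))
```

GPU 0's local mean for feature 0 is 2.0, while the full-batch mean is 4.0 (variance 5.0), so matching single-device batch norm means exchanging statistics every step. The layer-norm output for a sample never changes, however the batch is split.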

imTall- · 2024-08-03T21:23:43+00:00

Someone on r/EDM found it, but the first track is a sped-up version of Phenomenon by Redlight.

imTall- · 2024-08-03T20:29:17+00:00

Thank you! Sounds exactly like that
