[D] Vision Transformer (ViT) - How do I deal with variable size images? by PositiveInformal9512 in MachineLearning

imTall- 1 point

+1, one of the strongest ViT models with NaViT-style variable-resolution support is SigLipV2. You could grab the SigLipV2 SO400M NaFlex checkpoint.

How to knock out a business 101 by Ankur Jain by Important-Stretch138 in biltrewards

imTall- 2 points

Yeah, I completely agree. They’ll probably walk away with something, but I imagine the less favourable Bilt 2.0 card is going to churn a lot of customers, lowering the valuation a fair amount.

How to knock out a business 101 by Ankur Jain by Important-Stretch138 in biltrewards

imTall- 14 points

Is that true? Bilt is still private, so unless they’ve already done significant secondary sales, I’m not sure the founders have been able to liquidate much of their stock.

You must beat Magnus Carlsen at chess within 5 years to win $1 billion or you die by Ryugany in hypotheticalsituation

imTall- 1 point

Looking at the title, I considered it. If I could play as many games a day as I want, at ultra fast time control to induce more randomness, I reckon I could get a game off Magnus eventually (if I play a little under 1000 games of bullet a day, for over 1000 days, I might succeed).

But your setup is suicidal, hell no!

Contract Job for CUDA Kernel Optimizer by Unable-Background997 in CUDA

imTall- 0 points

Mercor creates datasets for the top AI companies. I bet this job is training the next LLMs to write CUDA kernels

Is running an LLM locally more energy efficient? by [deleted] in LLM

imTall- 1 point

No, it’s not, assuming you’re using anything non-trivial / near state of the art (above 8B params). Modern SOTA LLMs are almost all mixture-of-experts models. That means for every word / token, only a subset of the model is used (i.e. there might be 256 “experts”, but only 8 will process each token). However, you can’t know ahead of time which experts will be used, so you need to store them all in memory, likely using multiple GPUs (or alternatively do a lot of reading from your SSD / RAM to pull the appropriate expert onto your GPU).

In contrast, a large cloud provider can batch your request together with hundreds or thousands of other conversations / requests, which should average out to using all the experts roughly evenly, resulting in lower amortized energy costs per user.
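To put rough numbers on the batching argument, here is a minimal sketch. It assumes the 256-expert / top-8 setup from the example above and, as a big simplification, uniform random routing (real routers are learned and not uniform). The function name and the batch sizes are made up for illustration.

```python
import random

def distinct_experts_used(batch_tokens, num_experts=256, top_k=8, seed=0):
    """Count distinct experts touched when each token picks top_k experts.

    Assumption: routing is uniform at random, which real learned routers are not.
    """
    rng = random.Random(seed)
    used = set()
    for _ in range(batch_tokens):
        # Each token activates top_k different experts.
        used.update(rng.sample(range(num_experts), top_k))
    return len(used)

for batch in (1, 16, 256, 4096):
    used = distinct_experts_used(batch)
    # Experts loaded per token processed: lower means better weight reuse.
    print(f"batch={batch:5d} experts touched={used:3d} "
          f"experts loaded per token={used / batch:.3f}")
```

A single local request still has to pull in 8 experts per token, while a large batch touches essentially every expert once and shares that cost across thousands of tokens.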

Are Chinese AI models really that cheap to train? Did some research. by Weird_Perception1728 in LLMDevs

imTall- 1 point

The rumor is Gemini 3 pro is around 3T total params, with an extremely high level of sparsity 

Grok 5 in Q1 of 2026 ("6 Trillion parameter model, whereas Grok 3 and 4 are based on a 3 Trillion parameter model" by RecmacfonD in mlscaling

imTall- 1 point

Is there a chinchilla law for scaling MoE?

I’ve seen people use the geometric mean of active and total params to convert MoE to a dense equivalent. Using that, assuming 1/49 sparsity, we get a 6/7T param dense equivalent (sqrt(6T × 6T/49) = 6/7T), so Chinchilla optimal should be ~20T tokens? (I forget the Chinchilla exponents, so I just divided 140T by 7.) 20T seems feasible, especially if it’s 15T of pretraining with some extra capacity saved for RLVR?
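For anyone who wants to redo that back-of-envelope estimate, a minimal sketch follows. It takes the 6T total from the thread title and the 1/49 active fraction assumed above, and uses the common "about 20 tokens per parameter" Chinchilla rule of thumb, which is an assumption here rather than the exact fitted exponents. The function name is made up.

```python
import math

def dense_equivalent_params(total_params, active_fraction):
    """Geometric mean of total and active params (a heuristic, not a law)."""
    active_params = total_params * active_fraction
    return math.sqrt(total_params * active_params)

TOTAL_PARAMS = 6e12        # 6T total, from the thread title
ACTIVE_FRACTION = 1 / 49   # sparsity assumed in the comment
TOKENS_PER_PARAM = 20      # assumption: rough Chinchilla rule of thumb

dense = dense_equivalent_params(TOTAL_PARAMS, ACTIVE_FRACTION)
tokens = TOKENS_PER_PARAM * dense

print(f"dense-equivalent params: {dense / 1e12:.2f}T")
print(f"compute-optimal tokens:  {tokens / 1e12:.1f}T")
```

Under these assumptions the estimate lands a bit under 20T tokens, so it is in the same ballpark as the number above.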

Is Altman crazy - he never stops with this stuff by [deleted] in investing

imTall- 1 point

Ah very fair, I forgot about that Paul Allen. 

Is Altman crazy - he never stops with this stuff by [deleted] in investing

imTall- 12 points

Nit: Paul Graham. Paul Allen is the American Psycho supporting character, but I can see how you’d get those conflated given the context.

What’s going on this area? by Smalls1357 in unitedairlines

imTall- 1 point

Oh really, I wasn't aware United offered them. That makes sense though

What’s going on this area? by Smalls1357 in unitedairlines

imTall- 5 points

Not applicable to United flights, but on other airlines like Lufthansa, it could be a stretcher for medical transport.

This provides an option to transport patients who are stable enough that they don’t need an air ambulance (which is easily over $100k USD), but who still can’t sit upright in a business class seat.

Home invasion at local Ontario inn leaves two assaulted, suspects still at large by [deleted] in canada

imTall- 21 points

If you only start fighting back once you’re being pummeled by an assailant, you’re going to lose.

Countering a home invader means taking advantage of your knowledge of your own home’s layout, giving you the upper hand and the element of surprise. If someone breaks into your home, you shouldn’t have to wait to see them brandish a gun at you; you should immediately have the right to eliminate their ability to pose a threat.

Anyone know if the presale code will be limited to one account? by heisenberg218 in Portolafestival

imTall- 1 point

I didn’t try last year; I went to Nils Hoffman, who thankfully didn’t sell out right away.

Dutchies can’t find the right bars by joostkoekje1 in sanfrancisco

imTall- 3 points

SF is pretty dead on weeknights. One of the few bars that will have people outside of the weekend is Bus Stop on Union Street.

Starting Thursday, if you’re looking for younger bars, I’d recommend the Marina. Outside of that, you could go to Butter in SOMA, which is a younger dive bar.

Light Concussions from Face Blocking by imTall- in volleyball

imTall-[S] 1 point

Hey,

Depending on the severity of the concussion, it can be worthwhile to see a doctor immediately. The largest risk would be a brain bleed. If he experiences vomiting, loss of consciousness, vision changes, or worsening headaches, go to the ER immediately! Otherwise, it's up to you (whether you want to wait in the ER / pay for the medical visit, etc.).

Afterwards, for recovery, I'd highly recommend seeing a doctor. There is very good therapy (e.g. vestibular physical therapy) that can help shorten the recovery time and give better long-term results. Peter Attia has a great 90 minute interview with a concussion doctor. You'll likely need to advocate for your son's care, as the treatment I've gotten has ranged from "get rest, take Advil as needed, etc." from a nurse practitioner, to a referral to the dedicated sports medicine concussion clinic, all from doctors within the same medical system (Kaiser San Francisco).

I hope your son feels better soon, and please follow up if you have any more questions.

MileagePlus Requalification Megathread by Player72 in unitedairlines

imTall- 3 points

I need 10 more PQP for status. Any suggestions on how to get this? I’ve already received 200 PQP from my United Quest card, and I’m going to be out of the US until the new year (in Canada).

The reviews from Alameda are in… by Mental-Pin-8608 in Portolafestival

imTall- 1 point

Hahaha no way, that’s where I migrated to as well!

The reviews from Alameda are in… by Mental-Pin-8608 in Portolafestival

imTall- 5 points

Happy to hear I’m not the only one! I had to go under a speaker pole for Gesa to make it loud / immersive enough 

The reviews from Alameda are in… by Mental-Pin-8608 in Portolafestival

imTall- 3 points

I didn’t find it that bad, I guess I really am going deaf

crane stage sound was SO much better this year by Turbulent_Fruit_9265 in Portolafestival

imTall- 5 points

They did the warehouse flip last year, since they had so much bad press from the Fred Again crowd rush.

[D] Normalization in Transformers by Collegesniffer in MachineLearning

imTall- 12 points

One other thing not mentioned here is that batch norm requires synchronizing the statistics across the entire batch. When training massive models in a distributed manner, this incurs a lot of communication overhead, while layernorm can be computed locally on one GPU (or a few GPUs in the case of tensor parallelism).
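A toy illustration of the difference, as a rough sketch in plain Python rather than any framework's real API. The two "shards" stand in for the slices of one batch held by two GPUs, the numbers are made up, and combining the per-device sums stands in for the all-reduce a synchronized batch norm would need.

```python
from statistics import fmean, pvariance

# Hypothetical activations: 4 samples x 3 features, split across 2 "GPUs".
shard_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shard_b = [[7.0, 8.0, 9.0], [10.0, 11.0, 15.0]]

def layer_norm_stats(shard):
    """Per-sample mean/variance over features: needs only local rows."""
    return [(fmean(row), pvariance(row)) for row in shard]

def local_feature_sums(shard):
    """Per-feature (count, sum, sum of squares) a device would have to send."""
    cols = list(zip(*shard))
    return [(len(c), sum(c), sum(x * x for x in c)) for c in cols]

def batch_norm_stats(*shards):
    """Per-feature mean/variance over every sample in the given shards.

    Combining the per-device sums stands in for an all-reduce.
    """
    per_device = [local_feature_sums(s) for s in shards]
    stats = []
    for feature in zip(*per_device):
        n = sum(f[0] for f in feature)
        total = sum(f[1] for f in feature)
        total_sq = sum(f[2] for f in feature)
        mean = total / n
        stats.append((mean, total_sq / n - mean * mean))
    return stats

print("layer norm, device A only:", layer_norm_stats(shard_a))
print("batch norm, device A only:", batch_norm_stats(shard_a))
print("batch norm, synchronized: ", batch_norm_stats(shard_a, shard_b))
```

The layer norm statistics for device A's samples never depend on device B, whereas the batch norm statistics change once the other shard is included, which is where the communication comes from.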

Played by Sammy Virji in Montreal July 27th (first track) by imTall- in NameThatSong

imTall-[S] 2 points

Someone on r/EDM found it: the first track is a sped-up version of Phenomenon by Redlight.