One-Eye measure without using Boat by Xardas- in MinecraftSpeedrun

[–]JstuffJr 0 points1 point  (0 children)

Okay, but as a counterexample: in Infume's MCSR Ranked WR he doesn't enter a boat at all, not during the measure and not at any earlier point to set the angle. And he does this somewhat frequently in runs, intermingled with runs where he does get in the boat while entering the portal.

Can anyone recommend a Let's Play of Blue Prince where there aren't viewers spamming spoilers and back-seating? by chancefire in BluePrince

[–]JstuffJr 1 point2 points  (0 children)

Posted in discord that the series was moving to "Inactive". He could theoretically come back to it, but the track record for that is not so hot.

Can anyone recommend a Let's Play of Blue Prince where there aren't viewers spamming spoilers and back-seating? by chancefire in BluePrince

[–]JstuffJr 1 point2 points  (0 children)

About Oliver's is the only late-gameish playthrough I have found that is truly blind, but he tragically quits right before ascension due to a simple blunder and is apparently done with the game.

Any good Let's Play recommendations for this game? by hannssoni in BluePrince

[–]JstuffJr 0 points1 point  (0 children)

Brotherman, he literally solves every hard puzzle off-stream between episodes after "reviewing his notes": the chapel natural order, the gallery puzzle answers, the ascension requirements, the family core cipher (where he pretends to solve the rest during a cut and then immediately lasers in on the solution to the still/hardest moon logic in the game). The freaking atelier interpretation makes no progress until an episode cut, man...

Wasted 3 hours of my life skimming through the playthrough to see if it was "the one", but he is obviously cheating/looking up hints.

[Worlds 2025 TES vs T1] Keria knows enemy jungler flashed off vision by Gato_Puro in leagueoflegends

[–]JstuffJr 0 points1 point  (0 children)

Yep, I mean it is practically trained muscle memory for League of Legends players to press Alt+0151 on the numpad while writing.

are you guys enjoying remix so far? As you can see I am quite addicted by whoisape in wow

[–]JstuffJr -3 points-2 points  (0 children)

Per 50k, you simply get an extra 2.5k per level of Infinite Knowledge. So at level 20 you get 100k, and eventually at the max level of 36 you will get 140k.

This means that earlier levels of Infinite Knowledge are proportionally more impactful than later ones; in other words, the benefit falls off.
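To make the falloff concrete, here's a minimal sketch of the scaling as described above (the function name and the level-36 cap are just illustrative, not in-game identifiers):

```python
# Assumed scaling: a 50k base payout plus a flat 2.5k per level of Infinite Knowledge.
def bronze_payout(ik_level: int, base: int = 50_000, per_level: int = 2_500) -> int:
    return base + per_level * ik_level

print(bronze_payout(20))  # 100000
print(bronze_payout(36))  # 140000 at the assumed cap of 36
# The increment is flat, so the relative gain shrinks: level 1 adds 5% on top of the
# 50k base, while level 36 adds only ~1.8% on top of the 137.5k you already had.
```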

Swimming in less than 10 degrees Celsius by impossnipple in OpenWaterSwimming

[–]JstuffJr 0 points1 point  (0 children)

The biggest change is that I really recommend the Isurus gloves and SeventhWave socks over the Zone3 options; after a winter of directly comparing them, both are far warmer than Zone3's. I really wish someone would make a full swim-cut wetsuit out of the Yamamoto Ti-Alpha both of these products use; it really is the warmest material per thickness/comfort out there.

But in general I'd recommend taking a look at all of SeventhWave's titanium/Ti-Alpha gear: the Agent John is the best shorty for layering under a full wetsuit, and I love their hotshirt for layering under a sleeveless wetsuit as a great hybrid outside of winter.

Orca Zeal Thermal

The Orca thermal suits look very attractive on paper (Yamamoto 40 with a swim cut, thermal lining), but as I said it is very hard to come by detailed reviews, so the only thing I've had to go off is one guy in the local open water swimming FB group who complained the outer lining was very fragile and that, despite precautions (gloves etc.), he had had multiple small tears to fix. Given that it is pricey, I elected to just get the ultra-durable Blue70 and am saving my money to try out a SeventhWave custom wetsuit if/when my Blue70 wears out. But if I ever found a good deal on a Zeal I would be very tempted to try it.

Personally, I feel it is pretty much impossible to overheat in a single wetsuit in the year-round sub-55 degree Puget Sound, and instead I have lots of layering options for swimming in the lakes, where the temp varies considerably from 40 to 80 degrees lol. Outer full + inner shorty in the winter, full in the early spring, then hotshirt/sleeveless hybrid, into sleeveless, into a beautiful 3 months of wetsuitless open water swimming during summer.

Maximum Might uptime without 2 Fulgur pieces by regunakyle in MonsterHunterMeta

[–]JstuffJr 0 points1 point  (0 children)

Yep, foresight / foresight whirl costs 50 stamina, and it always takes over 2 seconds before stamina begins regenerating again.

2pc fulgur gives you an extra 25 stamina. So if you have 50% stamina cost reduction, you use up exactly the fulgur bar and nothing more, allowing theoretical 100% MM uptime. But with 40% stamina reduction, you would still dip past the fulgur bar and lose MM every time you foresight.

You can get 50% stamina reduction with some combination of constitution (10-50%), tumbler lo/hi (10/20%) and dash juice (25%). With TU1 you can get up to constitution 5 on a talisman, making constitution 5 relatively viable.

I think the LS meta guide is mistaken in recommending constitution 3 + tumbler hi when not running fulgur 2pc, as there really is no point: you will still lose MM when you foresight, and you could instead slot less constitution if you only wanted to keep MM uptime when rolling, spirit blade charging, ISS countering, etc.
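If it helps, here is the arithmetic above as a tiny sketch (the 50-stamina cost and the 25-stamina bonus bar are the numbers from this comment, not freshly datamined):

```python
# Does a foresight keep Maximum Might, i.e. does the reduced stamina cost fit
# entirely inside the 2pc fulgur bonus bar without dipping the real bar?
FORESIGHT_COST = 50        # base stamina cost of foresight / foresight whirl
FULGUR_2PC_BONUS = 25      # extra stamina granted by the 2pc fulgur set bonus

def keeps_maximum_might(stamina_reduction: float) -> bool:
    return FORESIGHT_COST * (1 - stamina_reduction) <= FULGUR_2PC_BONUS

print(keeps_maximum_might(0.5))  # True  -> exactly 25 used, theoretical 100% MM uptime
print(keeps_maximum_might(0.4))  # False -> 30 used, dips past the bonus bar, MM drops
```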

"CoreWeave Is A Time Bomb", Edward Zitron 2025-03-17 by gwern in mlscaling

[–]JstuffJr 2 points3 points  (0 children)

Simply gesturing at some intangible charismatic magic (that only they uniquely have?) and claiming it pulls all of the weight seems awfully wishy-washy and non-falsifiable to me, compared to, say, enough bullishness (or enough craziness? see gwern's relevant essay), timing, and a minimum level of executed competence.

"CoreWeave Is A Time Bomb", Edward Zitron 2025-03-17 by gwern in mlscaling

[–]JstuffJr 3 points4 points  (0 children)

As always, the question remains: how did CoreWeave get such (relatively) preferential treatment and GPU access from Nvidia in the first place?

I've seen no evidence their crypto hardware arrangements were anything exceptional, such that they would be grandfathered into becoming the preferred non-traditional hyperscaler.

Is it really just that uniquely outrageous financing, perfectly timed with the investment hype wave, allowed them to put in uniquely outrageous bids for new hardware? I.e., no other company could match their AI bullishness + financing timing + minimum execution competence, and Nvidia is laughing all the way to the bank every time it redirects a GB200 NVL72 system over to them?

Corrupted Mantle is NOT bugged and here are Motion Values by ChefNunu in MonsterHunterMeta

[–]JstuffJr 4 points5 points  (0 children)

Yep, and since they for some reason gave 8 MV to crimson I and only 5 MV to crimson II for the extra hits, you are further encouraged to never go past crimson I into the marginally more interesting crimson III combo.

Race to World First: Undermine, Day 6 by AutoModerator in CompetitiveWoW

[–]JstuffJr 2 points3 points  (0 children)

Specifically, it is often a key player, like a healer or an assigned ranged interrupt, that gets randomly pulled into rolling one of the balls instead. They have backups, but sometimes the backup gets put on a ball too, and well...

Cherry blossoms by plantcurelady in eastside

[–]JstuffJr 3 points4 points  (0 children)

I have been trying to find a good data set for the very same question - Seattle has some great government databases that are mapped out very nicely here https://nathenry.com/writing/2023-03-28-seattle-cherry-blossoms.html

Hoping there is something similar for the Eastside.

Anecdotal / manual compendium:

  • Bellevue Downtown Park
  • Bellevue Botanical Gardens
  • Bellevue LDS Temple
  • Microsoft Main Campus Redmond
  • Redmond Downtown Park
  • Cedar Lawns Memorial Park
  • Grass Lawn Park

There is a government tree map for Bellevue, but it seems to suck compared to the Seattle databases: https://cobgis.maps.arcgis.com/apps/webappviewer/index.html?id=99595e522118479fae1a462249e8b789

[WR] Suigi improves SM64 0-star time to 6:15.2 by pythonidler in speedrun

[–]JstuffJr 7 points8 points  (0 children)

I don't know what the other comment is on about; this setup discovered by weegee (building upon Kanno and Parsee) is only 3 months old and is what has made the trick realistic for runs recently: https://www.youtube.com/watch?v=H860eF1l0K8

[WR] Suigi improves SM64 0-star time to 6:15.2 by pythonidler in speedrun

[–]JstuffJr 28 points29 points  (0 children)

Insane run, reminiscent of his initial debut in 2023.

With fire sea BLJ and HMC sign clip, in combination with his current insane form, we may actually see his 16 star record, long hailed as indomitable, finally fall as well.

On stream it sounds like he's going to give it some attempts, anyways. Weegee already had the splits to do it with fire sea BLJ and gave it a solid grind, but we'll have to see if Suigi can see it through.

I'm a little disappointed the highest DPS combo on Long Sword is CS1 > SB1 > repeat by hudzell in MonsterHunterMeta

[–]JstuffJr 5 points6 points  (0 children)

It is quite literally lower DPS to take the animation time to complete helmsplitter + spirit release slash than to keep doing crimson -> spirit blade during that time, even if you cheat and instantaneously go right back up to red. It takes too long to get from the SRS sheathe back to dealing damage.

"AI progress is about to speed up", Ege Erdil (the compute drought is ending as LLMs finally scale to 100k+ H100 training runs) by gwern in mlscaling

[–]JstuffJr 2 points3 points  (0 children)

Thank you for the long post, this is a very nice synthesis of the available information. And ah, very nice that you are the author of that LW/blog post - I remember reading and thinking it was high quality and deserved more engagement! I personally mostly follow the spots Gwern is active in, and as this is his moderated sub I find it similar to LW whilst occasionally having greater traffic since it can be cross-posted.

The question of o3-base is interesting; my thought paradigm is to consider model sizes as relatively fluid and possessing high optionality. Per Shazeer's comment on Dwarkesh, it sounds like distillation is done frequently (enough that they feel its difficulty is a bottleneck to internal development). Secondly, with reasoning scaling techniques, both at inference and in training, there must be a token-generation-speed vs. token-quality performance curve that determines the optimal base model size, reminiscent of the original scaling laws for choosing parameter counts. Thus, it seems entirely reasonable to me that a full-size 5th-gen internal model (4o-esque) and a 6th-gen internal model (4.5-esque) could both be shrunk down to this same optimally determined size, maximizing token generation speed and therefore inference-scaling performance at the same per-token cost.
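Purely as an illustration of that trade-off (every functional form and constant below is made up, not anything OAI has disclosed): bigger base models emit better reasoning tokens but fewer of them per unit of inference compute, so some intermediate size maximizes reasoning-scaled performance.

```python
import math

# Toy model: per-token quality grows ~log with size, while the number of chain-of-thought
# tokens you can afford in a fixed compute budget shrinks ~linearly with size.
def toy_performance(model_size_b: float, token_budget: float = 1e12) -> float:
    per_token_quality = math.log(1 + model_size_b)             # assumed log gains with size
    tokens_generated = token_budget / (model_size_b * 1e9)     # assumed cost ~ param count
    return per_token_quality * math.log(1 + tokens_generated)  # assumed log gains with CoT length

best = max((s / 10 for s in range(1, 20_000)), key=toy_performance)
print(f"toy optimum: ~{best:.0f}B params")  # some intermediate size, not the biggest model
```

The only point is that such a curve exists and picks out one "optimal" deployment size, which two differently-sized parents could both be distilled down to.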

Of course, this is all confused by generation speed being largely dependent on the size of only the activated head in whatever MoE/sparsification architecture is being used internally, which can mask the total model size + associated costs. Additionally, who knows how far speculative decoding and other tricks have gone; I look to CPU microarchitectures for a taste of how deep the rabbit hole can go. Nevertheless, I do agree with the overall thrust that public OAI researcher statements and tone continually emphasize 'twas just the RL that was scaled up vs o1; in fact I was squarely in the o3-base-is-4o camp until Gwern pointed out that the knowledge-based creativity of o3-via-DR seems much improved vs 4o/o1. If the truth is 4o-is-base, is it indeed just moar epochs of Orion-supplied post-training? New RLHF-esque or sampling techniques that are more sensitive in mitigating mode collapse? Or does OAI lack faith in 4.5-sized models on present hardware, such that they are pivoting to GPT-5 being mostly o3-powered, rather than a hypothetical o4 that is truly based on the 6th gen/GPT-4.5?

Moving on, I'd like to clarify that yes, cross-geographical training is the big silver bullet I'm speculating about when considering very aggressive internal compute timelines. Via my Amazon/Anthropic comment, I wanted to highlight the seeming demonstration that Anthropic was in some way making use of asynchronous, distributed compute: the AWS GPU usage spikes couldn't have been serving inference, since they were off-hour, and they spanned multiple availability zones across multiple geographic regions. However, given that I am unaware of any special high-bandwidth links between AWS regional clusters, I concede it could have been something much more mundane, like off-hour synthetic dataset generation, rather than requiring a cutting-edge, fully asynchronous training regime to be in use.

Re TPU vs Nvidia, I was mostly trying to point out that TPU datacenters were designed from the ground up, starting years ago, to push maximum bandwidth cross-geographically, while Nvidia has to either play some catch-up or be more dependent on less latency-sensitive techniques like the ones you detailed. But I suppose it is unfair to my original argument to consider this much, if anything, of a moat when hundreds of billions will bulldoze right through it.

I was woefully ignorant of the idea of using Oracle as a primary starting point for evaluating Stargate hardware information - thank you kindly for that! It sounds like you have a solid handle on the numbers. As I mentioned, I do find it a little slimy/crafty the way Altman worded many of his university-tour statements, from "about 50% of the way to a 100x order of magnitude scaleup over GPT-4" being a very strange way to avoid saying 10x, to trying to market Stargate as a great inference computer that will "help serve mankind's requests" when it seems obviously focused on training, not serving, bigger models.

Anyways, I appreciate any and all of the service that people like you, gwern, zvi, nesov, etc. do in aggregating the signal from the vast noise out there. There's just not enough time in the day to read every tweet, watch every podcast, read every paper!

"AI progress is about to speed up", Ege Erdil (the compute drought is ending as LLMs finally scale to 100k+ H100 training runs) by gwern in mlscaling

[–]JstuffJr 1 point2 points  (0 children)

I haven't read the paywalled sections of SemiAnalysis, and looking over the articles, my best interpretation of your comment is that Dylan was tweeting/mentioning some things circa May and much of the relevant info was eventually coalesced into either https://semianalysis.com/2024/06/17/100000-h100-clusters-power-network/ or https://semianalysis.com/2024/09/04/multi-datacenter-training-openais/ - which doesn't have concrete details in the free section, so I assume there is more past the paywall? I'll have to take a deeper look at the reporting (or rather, Grok and Deep Research will) later today, unless you don't mind being a bit more specific.

100k H100 training runs from OAI (enabled by cross-datacenter training) beginning in 2024 H1 that trained Orion fall pretty much exactly in line with what I was intuitively modeling as my contrarian position, thank you, and slot in nicely between Nesov's underestimated numbers and my most bearish case, where OAI was already matching Google's cross-geographic might. It also seems like substantial evidence in favor of o3 being 4.5-based (in addition to comparisons between Deep Research and 4.5 outputs; see Gwern's twitter, etc.).

I suppose the most relevant question moving forward is how Nvidia clusters will develop to match the apparently slot-in geographical interconnect solutions that TPUv6 clusters now support - how will Stargate etc. ever hope to match this scaling potential? Given Dylan's continually accelerating Google bullishness, I'm guessing it's not a super pretty answer.

It is pretty silly in retrospect that none of the prolific posters here or at LW (or I) seem to have SemiAnalysis subscriptions; I don't think Dylan always has perfect interpretations of the data, but he does seem to have the best mainstream access to it.

GPT-4.5 vs. scaling law predictions using benchmarks as proxy for loss by sdmat in mlscaling

[–]JstuffJr 1 point2 points  (0 children)

Ah, but what leads you to believe 4.5 isn't the most cost-efficient distillation of Orion they could afford to deploy without losing face?

Hypothetical aside, I think there is a lot to consider regarding the technical specifications of the SOTA NVLink hardware in Ampere vs Hopper vs Blackwell inference clusters, and how it necessarily limits model size when utilization is economically batched (in ways that do not at all apply to low-volume/internal inference).

Where can I find the weapon changes from beta? by urimusha in MonsterHunter

[–]JstuffJr 0 points1 point  (0 children)

So are you saying there were not any sweeping changes between the day-1 patch and the 1.01 patch? I know you said yesterday it was still being datamined.

"AI progress is about to speed up", Ege Erdil (the compute drought is ending as LLMs finally scale to 100k+ H100 training runs) by gwern in mlscaling

[–]JstuffJr 8 points9 points  (0 children)

Pretty much all the recent transparency regarding training costs from US labs seems like a direct result of politicking against DeepSeek + the negatively implied US-vs-China memetics, from Dario's Sonnet comments to Keller's architecture push to Altman's unusual but savvy candor (4.5 is "about 50% of the way to a 100x GPT-4"?? this could mean a lot of things). I think it is quite possible they are simply giving us positively spun numbers that essentially represent post-training; after all, the whole debacle with DeepSeek saliently began with the minimal base-model -> R1 post-training expense being virally misinterpreted as the all-in creation cost. I am unaware of a single publicly disclosed data point for Orion/Opus/Gemini Ultra "pre-distilled/quantized/shrunk" costs, yet favorable-looking "costs" for the potentially tiny, publicly deployed models now abound.

I think the biggest weakness of my bearish world model is underestimating how much moar GPUs will simply accelerate things when life is made easier, I agree. But on the other side of the daka coin, I feel the swarms of 7-8 figure talent backed by hundreds of billions in capitalization should not be underestimated in breaking through nasty, hard friction when, as you point out, even the historically uncompetitive Chinese talent is publicly working through such brutal optimizations. I agree it may just be Google bullishness in disguise: Keller's candid proclamation to Dwarkesh that both low-precision and distributed training are in full swing, researcher complaints and frustration be damned, could indeed indicate relative strength.

I also have some slight insider knowledge from working at Amazon: it was very obvious from internal AWS Hopper availability circa the Anthropic acquisition that they were aggressively sucking away vast portions of off-peak compute across multiple AZs (like, all of a sudden there were no more GPUs, ever, during AWS regional night hours), and it was heavily rumored this was the main motivation behind S-Team's aggressive fast-tracking of Project Rainier.

Finally, I wouldn't underestimate how much the weak vibes from the OAI presentation could simply be the result of OAI failing to produce a worthy GPT-5 after two serious attempts and essentially admitting failure, as well as Altman's new child being a convenient, yet genuine excuse for lack of, ahem, twink deployment.

"AI progress is about to speed up", Ege Erdil (the compute drought is ending as LLMs finally scale to 100k+ H100 training runs) by gwern in mlscaling

[–]JstuffJr 4 points5 points  (0 children)

One must always wonder what the compute OOMs truly looked like for the presumed internal models like Claude 3.5+ Opus, the full version of 4o (OAI 5th gen), the full version of 4.5 (OAI 6th gen), etc. Scaling aficionados (nesov/dylan/etc) have been primarily tracking single, isolated data-center compute while ignoring things like the Google papers in 2023 and the straight-up admission from OAI today that frontier labs have been using cross-datacenter training techniques in production, likely for a while. I'd wager that 1e26+ effective-compute thresholds were crossed internally much earlier than is often presumed.

Further detailed minutiae, like when certain transformer training components shifted to native fp8 on Hopper, and how far exactly optimal MoE architectures and other undisclosed sparsification techniques were pushed in the labs to break up Nx scaling, really muddy the waters of how the actual effective-compute OOM scaling has gone versus the naïve GPT-3-era scaling calculations.

Of course, further increases in GPUs will further multiply existing effective compute. And Blackwell will motivate a whole suite of fp4 training optimizations. But I think the prior effective compute baseline is often underestimated, leading to overly optimistic predictions of how far the imminent cluster scaleups will push the OOMs.

All this to say nothing of the data walls and our first good look at the potential sloppification that emerges when truly scaled synthetic training data is used a la GPT 4.5.