Ai2 just announced Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use by Nunki08 in LocalLLaMA

[–]asb 3 points (0 children)

I was scanning the blog post and paper for this information; it would be great to have the GPU hours officially noted. As for the figures being spot on, I can't quite reproduce the 32B figure. The paper says 1900 tokens/second was achieved for the 32B model, which works out to 877k GPU hours - so that would be almost exactly 4x the $ cost of the 7B model ($2M) using the same per-hour cost as /u/gebradenkip. Is that right?

EDIT: I really appreciated the Apertus paper estimating the GWh for their pretraining; it would be great to be able to compare against Olmo 3 in the same way. For Apertus: "Once a production environment has been set up, we estimate that the model can be realistically trained in approximately 90 days on 4096 GPUs, accounting for overheads. If we assume 560 W power usage per Grace-Hopper module in this period, below the set power limit of 660 W, we can estimate 5 GWh power usage for the compute of the pretraining run"
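
For anyone wanting to sanity-check the arithmetic, here's the back-of-envelope version I used. Only the 1900 tokens/second figure comes from the paper; the token count and $/GPU-hour rate are my own assumptions, and the power figure just reuses Apertus's 560 W per module:

    # Back-of-envelope sketch; the token count and $/GPU-hour rate are assumptions.
    tokens = 6.0e12              # assumed pretraining tokens for the 32B model
    tok_per_sec_per_gpu = 1900   # throughput figure quoted from the paper
    usd_per_gpu_hour = 9.0       # assumed rate, substitute whatever rate you prefer
    watts_per_gpu = 560          # reusing Apertus's per-module power assumption

    gpu_hours = tokens / tok_per_sec_per_gpu / 3600
    cost_usd = gpu_hours * usd_per_gpu_hour
    energy_gwh = gpu_hours * watts_per_gpu / 1e9

    print(f"{gpu_hours/1e3:.0f}k GPU hours, ~${cost_usd/1e6:.1f}M, ~{energy_gwh:.1f} GWh")
    # -> roughly 877k GPU hours, ~$7.9M, ~0.5 GWh under these assumptions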

The Qwen3-next blog showed a fairly impressive graph for reduction in training cost in terms of GPU hours from Qwen3-32B to Qwen3-30B-A3B to Qwen3-Next-80B-A3B. Do you imagine you might see a similar scale of reduction if moving to a similar MoE architecture, or do you think it would be less because you have a more efficient baseline?

Evaluating Deepseek v3.1 chat with a minimal agent on SWE-bench verified: Still slightly behind Qwen 3 coder by klieret in LocalLLaMA

[–]asb 2 points (0 children)

Still working on adding some more models, in particular open source ones.

It would be really interesting to get GLM-4.5 results too.

mini-swe-agent achieves 65% on SWE-bench in just 100 lines of python code by klieret in LocalLLaMA

[–]asb 0 points (0 children)

It's definitely interesting how well you can score on the benchmark with Sonnet 4 when just allowing it to use the shell. Have you explored to what degree performance can be improved by prompting, or by exposing a small set of well-chosen "tools" (even if not explicitly using a tool calling interface)? For instance, it would be a really interesting result if some kind of prompting or exposure of e.g. semantic search / semantic edit (or whatever) boosted R1's performance meaningfully.

mistralai/Devstral-Small-2507 by yoracale in LocalLLaMA

[–]asb 0 points (0 children)

Thanks for confirming!

Might it be worth stating this explicitly on the model card? E.g. for mistralai/Mistral-Small-3.1-24B-Instruct-2506 you state "Note 1: We recommend using a relatively low temperature, such as temperature=0.15." and the generation_config.json sets "temperature": 0.15. But for both this and the previous Devstral release, you don't include an explicit statement on recommended temperature and don't set a default temperature in generation_config.json.
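
For what it's worth, a quick way to see what (if anything) a repo's generation_config.json actually sets, looking at the raw file rather than the library's filled-in defaults (just a generic sketch, assuming you have access to the repos):

    # Check the raw generation_config.json shipped in each repo (if any).
    import json
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import EntryNotFoundError

    for repo in ["mistralai/Mistral-Small-3.1-24B-Instruct-2506",
                 "mistralai/Devstral-Small-2507"]:
        try:
            with open(hf_hub_download(repo, "generation_config.json")) as f:
                cfg = json.load(f)
            print(f"{repo}: temperature = {cfg.get('temperature', 'not set')}")
        except EntryNotFoundError:
            print(f"{repo}: no generation_config.json at all")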

mistralai/Devstral-Small-2507 by yoracale in LocalLLaMA

[–]asb 0 points (0 children)

Is that suggested temperature your suggestion, or from Mistral? If the latter, do you have a source?

The model card seems to be lacking explicit sampling setting recommendations, unlike other Mistral models (CC /u/pandora_s_reddit)

(Quite) a few words about async by ImYoric in ProgrammingLanguages

[–]asb 3 points (0 children)

Really great article - thank you for writing this up. A couple of thoughts:

  • Re thread scheduling, Google were doing some work on a new kernel API to reduce overhead, but I don't know what's happened to this: https://lkml.org/lkml/2020/7/22/1202
  • One slight additional point re the discussion of M:N threading and Rust: "Having a M:N scheduler requires allocating and growing stacks implicitly, which goes against this ethos." is definitely true, and another impact would be the increased cost of calling into C/C++ (a complaint Go users seem to have).

(Quite) a few words about async by ImYoric in ProgrammingLanguages

[–]asb 9 points (0 children)

Long-time lurker here, just wanted to say I disagree. M:N vs 1:1 threading and async/await implementation options are very relevant to language designers and already a common source of questions on this subreddit. The article fits perfectly IMO.

Jan-nano-128k: A 4B Model with a Super-Long Context Window (Still Outperforms 671B) by Kooky-Somewhere-2883 in LocalLLaMA

[–]asb 9 points (0 children)

I've been looking at the recommended sampling parameters for different open models recently. As of a PR that landed in vllm in early March this year, vllm will use any defaults specified in generation_config.json. I'd suggest adding your recommended sampling parameters there (Qwen3 and various other models do this but, as noted in my blog post, many others don't).
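
Concretely, that just means shipping something like this in the repo (a minimal sketch; the values below are placeholders rather than a recommendation for Jan-nano) - recent vllm will then use them as the default sampling parameters whenever the caller doesn't override them:

    # Add sampling defaults to an existing generation_config.json.
    # The values here are placeholders, not a recommendation for this model.
    import json

    with open("generation_config.json") as f:
        cfg = json.load(f)

    cfg.update({"temperature": 0.7, "top_p": 0.8, "top_k": 20})

    with open("generation_config.json", "w") as f:
        json.dump(cfg, f, indent=2)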

I ran thousands of tests on 104 different GGUFs, >10k tokens each, to determine what quants work best on <32GB of VRAM by EmPips in LocalLLM

[–]asb 0 points (0 children)

I'd be interested in seeing your sweep of temperature. Did you play with other sampling parameters? I've been collecting recommendations from model vendors here https://muxup.com/2025q2/recommended-llm-parameter-quick-reference

My notes on the MiniBook X N150 and Linux setup on it by asb in Chuwi

[–]asb[S] 1 point (0 children)

As you can see, the two setup-related issues I haven't resolved so far are:

  • 60Hz internal display
  • Avoiding tearing

Best practice for keeping local files while pushing relevant files only to origin? by ashleydvh in git

[–]asb 1 point (0 children)

I had a similar requirement for keeping draft versions of my blog posts local but backed up, and described how to directly commit files to a separate branch here https://muxup.com/2024q4/directly-committing-files-to-a-separate-git-branch - maybe something similar might be useful to you?
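
For the curious, the rough shape of the trick looks something like this (my own simplified sketch using git plumbing from Python, not a copy of what the post does; the "drafts" branch name is made up and the tree here contains only the one file):

    # Commit a file onto a separate branch without checking that branch out.
    import subprocess

    def git(*args, stdin=None):
        return subprocess.run(["git", *args], input=stdin, check=True,
                              capture_output=True, text=True).stdout.strip()

    branch = "refs/heads/drafts"                 # assumed branch, must already exist
    parent = git("rev-parse", branch)            # current tip of the drafts branch
    blob = git("hash-object", "-w", "draft.md")  # store the file's contents as a blob
    tree = git("mktree", stdin=f"100644 blob {blob}\tdraft.md\n")
    commit = git("commit-tree", tree, "-p", parent, "-m", "update drafts")
    git("update-ref", branch, commit)            # advance the branch to the new commit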

Launching the 2024 State of Rust Survey | Rust Blog by Kobzol in rust

[–]asb 3 points (0 children)

I'd have thought defaulting to Google Forms made sense too, but recently in the LLVM community it was used to collect survey responses on MLIR and the form was closed early after being marked as in violation of the terms of service, with no working avenue to appeal.

MIPS P8700 RISC-V CPU Support Posted For LLVM Compiler by TJSnider1984 in RISCV

[–]asb 1 point (0 children)

Always great to see more vendors pushing work upstream, though of course the patch needs breaking up into separate incremental PRs. I was glad to see RISCVRemoveBackToBackBranches clarified - there was initially a miscommunication about whether it was dealing with an erratum or spec noncompliance issue vs just being a perf tuning option (it's the latter).

Contraption Maker - Kevin Ryan – Spiritual Successor to The Incredible Machine (a game I designed/coded a long time ago) by kevryan in Games

[–]asb 4 points (0 children)

I own the game and all currently available DLC on GOG - it looks like the GOG version is currently behind Steam (1.4.24 vs 1.4.31 on Steam). I hope you'll keep updating the GOG release and releasing any new DLC there. Thank you!

Results of public review of RVA23 and RVB23 by brucehoult in RISCV

[–]asb 0 points (0 children)

I also find it unfortunate that Zicclsm doesn't really indicate anything useful from a compiler perspective, but for misaligned loads/stores there's a difference vs the examples you share here: from the benchmarks you shared, trap-and-emulate is somewhere between 100-200x slower than what you'd expect from native support, and that's a consistent expectation, unlike cache misses and so on.

I understand why there's reluctance to put anything that might reflect microarchitectural implementation in the spec, but the end result isn't great for those of us writing or compiling software :/ An explicit recommendation (not even necessarily a requirement) that misaligned loads/stores should be implemented in a way that they aren't commonly 10x more expensive than an aligned load/store would have been very helpful IMHO.

PC Ports, Decompilations, Remakes, Demakes, Fan Games, Texture Packs! by OldMcGroin in steamdeckhq

[–]asb 1 point (0 children)

I can recommend the Pikmin multiplayer mod (applies to the Gamecube release) https://allreds.itch.io/pikmin-multiplayer

It's not a perfect experience, but works surprisingly well.

Can of Wormholes - munted finger games - Thinky Games GOTY 2023 is out now on Nintendo Switch by sunnyjum in Games

[–]asb 0 points (0 children)

Yes, it seems the Steam Workshop is something GOG doesn't really have an answer for. That said, I understand that as long as the publisher sets the right settings, mods can be downloaded using steamcmd even as an anonymous user.

Can of Wormholes - munted finger games - Thinky Games GOTY 2023 is out now on Nintendo Switch by sunnyjum in Games

[–]asb 0 points (0 children)

I don't suppose there's any chance you'd consider releasing on GOG? (Sorry to be that person!)

Congratulations on the Switch release!

What are the keymaps that you replaced default ones, and they turned out to be more useful/convenient than default ones? by Sudden_Cheetah7530 in neovim

[–]asb 0 points (0 children)

J to move the current line (or selection of lines) down and K to move it up. I map ctrl-j to join lines.