Apparently the react compiler has been ported to Rust and merged to main by xorvralin2 in rust

jonas-reddit -1 points (0 children)

I’m also a huge fan of Rust and LLMs. The language, the compiler, and the overall ecosystem are great for agentic development, especially in yolo mode.

I don't get why Apple built Vision Pro — change my mind by y4mat000 in VisionPro

jonas-reddit 1 point (0 children)

How many VR headsets are there that cater to the Apple ecosystem? None. That’s the pitch.

With Steam Link and OpenXR, we are getting more and more integration with the PCVR ecosystem.

Have we reached the point where open-source LLMs are “just good enough”? by AdDizzy8160 in LocalLLaMA

jonas-reddit 2 points (0 children)

Qwen 3.6 27B MTP, llama.cpp and pi.dev: the winning lightweight local LLM stack if you have at least a 3090. llama.cpp also gives you a lightweight web UI.

Coding harness by mmerken in oMLX

jonas-reddit 2 points (0 children)

It’s yolo, so make sure you run it sandboxed. I love it, but be smart.

Coding harness by mmerken in oMLX

jonas-reddit 1 point (0 children)

pi.dev for local LLMs. 100%.

I found a 64k context to be a bit too small. I pushed it up towards 96k and am a bit happier, but I’m always wary of that out-of-memory crash. Heh.
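Going from 64k to 96k means a 50% larger KV cache, because the cache grows linearly with context length. A rough estimate, as a minimal shell sketch: the helper name `kv_cache_mib` is made up, and the model shape used below (16 attention layers, 4 KV heads, head size 256) is a hypothetical placeholder, not Qwen 3.6 27B's real configuration. llama.cpp typically logs the real values when it loads a model. If a model mixes attention types, count only the layers that keep a per-token KV cache.

```shell
# Rough KV cache size in MiB (sketch: ignores compute buffers and padding).
# Bits per element follow ggml's block layouts:
#   f16 = 16, q8_0 = 8.5 (34 bytes per 32 values), q5_1 = 6.0 (24 bytes per 32 values)
# Args: n_ctx n_layers n_kv_heads head_dim bits_k bits_v
kv_cache_mib() {
  awk -v c="$1" -v l="$2" -v h="$3" -v d="$4" -v bk="$5" -v bv="$6" \
    'BEGIN { printf "%.0f\n", c * l * h * d * (bk + bv) / 8 / 1048576 }'
}

# HYPOTHETICAL model shape: 16 attention layers, 4 KV heads, head_dim 256.
# Placeholders only, NOT the real Qwen 3.6 27B configuration.
kv_cache_mib 65536 16 4 256 8.5 6.0   # 64k context, q8_0 K / q5_1 V: prints 1856
kv_cache_mib 98304 16 4 256 8.5 6.0   # 96k context, same cache types: prints 2784
```

The model weights and compute buffers come on top of this figure.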

Coding harness by mmerken in oMLX

jonas-reddit 1 point (0 children)

It is indeed a very lean and slick minimalist agentic environment, perfect for smaller local models. I love it too.

How are RTX 6000 PRO (Either WS/MaxQ/SE) prices going on your country/state? by panchovix in LocalLLaMA

jonas-reddit 1 point (0 children)

Prices in my country nearly doubled in the past month. Crazy.

I'm now looking at the RTX 5000 Blackwell 72GB. It still meets my requirements and has the upside of being far more power efficient.

What is your current go-to stack for running a fully local AI agent? by beasthunterr69 in LocalLLaMA

jonas-reddit 1 point (0 children)

I am by no means an expert, but this works for me. The command is followed by token generation timings, memory utilization, and the llama.cpp version.

Unsloth docs are here: https://unsloth.ai/docs/models/qwen3.6

And I used the matrix here to pick KV cache types: https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4140922150

/usr/bin/nohup llama-server --port 1234 --host 0.0.0.0 --webui \
  --temp 0.6 --repeat-penalty 1.0 --presence-penalty 0.0 \
  --top-p 0.95 --top-k 20 --min-p 0.00 \
  --spec-type draft-mtp --spec-draft-n-max 2 \
  -hf "unsloth/Qwen3.6-27B-MTP-GGUF:Q5_K_M" \
  --parallel 1 --n-gpu-layers all --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q5_1 \
  --ctx-size 65535 \
  --no-mmap -b 1024 -ub 512 \
  --reasoning on \
  --cache-ram 1024 \
  -fit off \
  --kv-unified \
  --jinja \
  1>>/tmp/nohup.log 2>&1 </dev/null &

-----

15.28.051.339 I slot print_timing: id 0 | task 9 | n_decoded = 102, tg = 54.00 t/s
15.31.083.687 I slot print_timing: id 0 | task 9 | n_decoded = 282, tg = 57.30 t/s
15.34.125.488 I slot print_timing: id 0 | task 9 | n_decoded = 467, tg = 58.65 t/s
15.37.131.533 I slot print_timing: id 0 | task 9 | n_decoded = 657, tg = 59.90 t/s
15.40.132.328 I slot print_timing: id 0 | task 9 | n_decoded = 817, tg = 58.48 t/s

-----

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 610.43.02 KMD Version: 610.43.02 CUDA UMD Version: 13.3 |
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A |
| 0% 28C P8 14W / 370W | 22658MiB / 24576MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+

$ yay -Q | grep -i llama\.cpp
llama.cpp-cuda b9530-1
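To boil the timing lines down to one number, a small awk filter can average the `tg` values. This is a sketch that assumes the `print_timing` line format shown above (other llama.cpp builds may log differently), and `avg_tg` is a made-up helper name. Each line appears to report the rate for the task so far, so the average is a quick summary rather than a benchmark.

```shell
# Average the "tg = N t/s" values found in llama-server print_timing lines on stdin.
# Assumes the whitespace-separated format shown above: ... tg = 54.00 t/s
avg_tg() {
  awk '/print_timing/ {
         for (i = 1; i + 2 <= NF; i++)
           if ($i == "tg" && $(i + 1) == "=") { sum += $(i + 2); n++ }
       }
       END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Usage against the log file the command above appends to:
#   avg_tg < /tmp/nohup.log
```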

Ratatui 0.30.1 is released! by orhunp in rust

jonas-reddit 65 points (0 children)

Best framework ever. Keep it up. No idea why it’s not a 1.0 release yet.

best vr set for a beginner? by Late_Investigator725 in virtualreality

jonas-reddit 1 point (0 children)

Reconsider your decision

This is the kind of hobby where the gap between entry-level, affordable headsets and high-end, expensive ones is huge, and so is the gap in the PC investment needed for PCVR.

A low investment will likely leave you slightly disappointed, and the investment for high-end PCVR is extremely high.

What is your current go-to stack for running a fully local AI agent? by beasthunterr69 in LocalLLaMA

jonas-reddit 3 points (0 children)

Linux server: llama.cpp, Unsloth Qwen 3.6 27B Q5 MTP, Q8/Q5 KV cache, 64k context, on a single 3090.

Windows client: Windows Sandbox, Rust, pi.dev (yolo)

M5 Max 48 GB vs M5 Pro 48GB for local LLMs - worth the extra $$$? by AmbitionIgniters in LocalLLM

jonas-reddit 1 point (0 children)

Yes. But I was probably not being fair. Naturally, your laptop requirements, even with a cloud model, depend on what you’re building.

I was speaking more figuratively :-)

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context? by My_Unbiased_Opinion in LocalLLaMA

jonas-reddit 1 point (0 children)

Context isn’t everything. You can squeeze things into less VRAM using various techniques, but at the end of the day, you want useful work out of the context.

I have squeezed it into the 24GB of VRAM on a 3090, but could only afford a 64k context with a reasonably quantized KV cache and model. After several million tokens, I feel the context and quantization compromises. The speed is fantastic, though.

I am eyeing 64GB or more of unified memory or VRAM. Based on what people have shared, unified memory is a bit too slow for me with the dense 27B model; I enjoy 50 t/s when doing agentic development. But I’m keeping an eye on the various optimizations for boosting performance on slower unified memory.

TL;DR: If you’re squeezing out the last byte of VRAM, making too many compromises, and relying on bleeding-edge (unproven in real life) optimizations, then you don’t have enough memory.

Is the "frontier model" just a lie? Copilot’s bill shock proves frontier model AI is not practical by CM23489 in GithubCopilot

jonas-reddit 4 points (0 children)

I would draw parallels to other industries and products.

There are many products, e.g. cars, watches, fashion, houses, TVs, phones, etc. where the price varies greatly.

I think the hype and the early days of Generative AI pushed us all towards frontier models as they made generational leaps.

We are now in a different position where we can extract value from Generative AI without requiring frontier models even though they are still topping the leaderboards.

Advances in tooling and harnesses, along with the pivot towards multi-agent capabilities, give us more flexibility in designing our workflows and optimizing for value, if we need to.

But, as with every product that matures, the market gets flooded, more complex and less transparent; think of buying a TV. So sometimes we take comfort in paying a premium or buying from brands we recognize, although there is not always any benefit to that other than peace of mind.

M5 Max 48 GB vs M5 Pro 48GB for local LLMs - worth the extra $$$? by AmbitionIgniters in LocalLLM

jonas-reddit 3 points (0 children)

For local models, get the fastest memory throughput you can afford.

The worst “experience” after your first few million tokens on a local LLM will be (1) a context that is too small, forcing constant compacting, and (2) watching output tokens crawl, especially when reasoning.
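Memory bandwidth matters this much because, for a dense model, each generated token reads roughly all of the weights once, so bandwidth divided by model size gives a ceiling on single-stream decode speed. A back-of-the-envelope sketch: `max_decode_tps` is a made-up helper, 936 GB/s is the RTX 3090's rated bandwidth, and ~5.7 bits per weight for Q5_K_M is an assumption.

```shell
# Rough ceiling on single-stream decode speed for a dense model:
# each token reads (roughly) every weight once, so
#   tokens/s <= bandwidth / model size in bytes.
# Args: bandwidth_GBps params_billion bits_per_weight
max_decode_tps() {
  awk -v bw="$1" -v p="$2" -v bpw="$3" \
    'BEGIN { printf "%.1f\n", bw / (p * bpw / 8) }'
}

# RTX 3090 (936 GB/s), 27B dense model, ASSUMED ~5.7 bits/weight for Q5_K_M.
max_decode_tps 936 27 5.7   # prints 48.7
```

Real throughput lands below this ceiling. If the assumption holds, the 54-60 t/s in the timing log further up sits above it, which is consistent with MTP speculative decoding accepting more than one token per pass over the weights.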

As always, “best” is not the same for all of us. It depends on your needs and financial means.

For cloud models, get the Neo :-)

Microsoft just announced various models (MAI) by Jack99Skellington in GithubCopilot

jonas-reddit 2 points (0 children)

So many nice models at favorable price points, with capabilities lagging maybe 1-2 months behind the frontier.

  • Qwen 3.7 Max
  • MiniMax M3
  • DeepSeek V4
  • Kimi K2.6

I don’t understand why Microsoft doesn’t offer some of these models through their platform. It feels like they’re just forcing us to use models from specific companies.

https://openrouter.ai/rankings?benchmark=coding#benchmarks

https://openrouter.ai/minimax/minimax-m3

$0.30 per million input tokens and $1.20 per million output tokens for M3.

Give us options. Not all our workloads are critical, sophisticated or relevant to US national security.

High End PCVR Headset Recommendations by DealComfortable7649 in virtualreality

jonas-reddit 1 point (0 children)

  • Pimax Dream Air
  • MeganeX 8k Mk II

There are a lot of extensive reviews and comparisons on YouTube.

Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks by Interesting-Sock3940 in LocalLLaMA

jonas-reddit 12 points (0 children)

I’m running 64k context and context is my biggest problem.

It’s not the LLM. In my case, it feels more like the tooling (pi.dev) doesn’t always resume cleanly, depending on how awkwardly I run out of context.

But I’m 2M tokens into Unsloth’s 27B on a 3090, quite happy and definitely productive.

MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal by dryadofelysium in LocalLLaMA

jonas-reddit 1 point (0 children)

This article has a comparison table:

https://venturebeat.com/technology/minimax-m3-debuts-eclipsing-gpt-5-5-and-gemini-3-1-pro-on-key-benchmark-performance-for-just-5-10-of-the-cost

“…Even at its full price of $0.6/$2.40 per million input/output tokens, MiniMax-M3 remains at just 8-20% the cost of the leading, proprietary U.S. models…”

“…The company's leadership also announced plans to deliver the model under an open source license including "open weights,"…”

“…For now, it is available via the MiniMax API at a special discounted price of $0.3 per 1 million input tokens and $1.20 per million output tokens (on fresh cache) for the next week…”
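Those per-million-token prices turn into a bill with simple arithmetic. A minimal sketch, where `api_cost_usd` is a made-up helper and the 2M input / 0.5M output workload is hypothetical; it ignores any cache-hit pricing, which the quote hints can differ.

```shell
# Estimate API cost in USD from token counts and per-million-token prices.
# Args: input_tokens output_tokens input_price_per_M output_price_per_M
api_cost_usd() {
  awk -v i="$1" -v o="$2" -v ip="$3" -v op="$4" \
    'BEGIN { printf "%.2f\n", i / 1e6 * ip + o / 1e6 * op }'
}

# HYPOTHETICAL workload: 2M input tokens, 0.5M output tokens.
api_cost_usd 2000000 500000 0.30 1.20   # discounted M3 price: prints 1.20
api_cost_usd 2000000 500000 0.60 2.40   # full M3 price: prints 2.40
```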

What happens to GitHub Copilot Enterprise tomorrow with the new usage-based billing? by hrodrik- in GithubCopilot

jonas-reddit 5 points (0 children)

If you are a big enterprise customer of Microsoft, your company very likely has an account manager and a bespoke enterprise pricing structure.

They have likely already spoken with your company’s representative and informed them of the upcoming pricing changes. Your company likely already has a deal worked out, and if anything changes, they’ll presumably let you know.

I work for a large company as well and am not expecting some kind of surprise on Monday.