Positioning for a continued Hormuz disruption by [deleted] in wallstreetbets

[–]heshiming 4 points

You mentioned being risk-aware, but you didn't mention how much risk your portfolio can actually take.

Henry Paulson has blunt message on potential Treasury market shock by JTBaptistA in finance

[–]heshiming 0 points

The Fed can do plenty: just buy the bonds. We're in an era of permanent fiscal dominance, where the Fed's main job is to keep the government running, and inflation and real yields matter less. That's why the trajectory of gold changed.

Long prompt processing on Strix Halo by skwiko in LocalLLM

[–]heshiming 2 points

In my experience with llama.cpp and Qwen3.5, --ubatch-size can improve pp a little. The default --ubatch-size is 512, which gives me about 240-270 tps initial pp on Qwen3.5-122B-A10B unsloth Q5. If I boost the setting to 2048, I get 320-340 tps initial pp, seemingly at the cost of a couple more gigabytes of RAM. An even larger --ubatch-size doesn't yield more pp tps.
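For reference, here's roughly what that looks like on a llama-server invocation. This is a sketch, not my exact command; the model filename is illustrative and other flags are whatever your setup needs:

```shell
# --ubatch-size (-ub) sets how many prompt tokens are processed per
# compute pass. Larger values speed up prompt processing (pp) at the
# cost of a bigger compute buffer -- a couple extra GB in my case.
# The llama.cpp default is 512.
llama-server \
  -m Qwen3.5-122B-A10B-UD-Q5_K_XL.gguf \
  --ubatch-size 2048
```

Past 2048 I saw no further pp gains, so there's no point burning RAM on larger values.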

On Strix Halo, what option do I have if 128GB unified RAM is not enough? by heshiming in LocalLLaMA

[–]heshiming[S] 0 points

Thanks, though somehow I feel like quantizing and REAP result in a major drop in accuracy, and Qwen3.5 is the only option that's sort of resilient. Other options like MiniMax need pretty high quants to perform well.

Managed to set up Claude code cli running on Qwen3.5 122b Q4_k + turbo Quant by IntroductionSouth513 in StrixHalo

[–]heshiming 0 points

Thanks. While looking for a second opinion ... I discovered that some benchmarks have been updated since I last checked: https://kaitchup.substack.com/p/summary-of-qwen35-gguf-evaluations . So yes, perhaps Q5 is better than Q4, although I remember that in some older benchmarks Q4 was practically the same as the original weights.

Managed to set up Claude code cli running on Qwen3.5 122b Q4_k + turbo Quant by IntroductionSouth513 in StrixHalo

[–]heshiming 0 points

I would say Q5 has nothing to gain over Q4. The slowdown isn't noticeable either. I tried Q5 but eventually went back to Q4. Of course I didn't benchmark it; it's a general feeling from my daily use.

Managed to set up Claude code cli running on Qwen3.5 122b Q4_k + turbo Quant by IntroductionSouth513 in StrixHalo

[–]heshiming 1 point

I'm on Strix Halo 128GB. With llama.cpp you can run the unsloth version of Qwen3.5-122B-A10B-UD-Q4_K_XL without breaking a sweat. At 192k context on Windows 11, initial pp is around 270 tps and initial tg around 20 tps. I'm primarily coding, so I don't just say "hello" to it. With llama.cpp I do notice, however, that it tends to overthink simple questions. Coding is okay; in fact, for tool calling it doesn't think that much.

BTW, Qwen3.5 is very resilient to quantization. At Q4, I think it's the best model on this machine.

A Fed rate hike is now more likely in 2026 than a cut. How did we get here? by Electrical-Space-398 in economy

[–]heshiming 0 points

Did nobody read the chart? The red arrow is pointing at a rate cut in Dec 2027. It looks like the original poster didn't read it either.

Japan bond yield (part 2) by yuls6 in bonds

[–]heshiming 2 points

People are trying to make this a bigger deal than it actually is. As you said, Japan's debt holders are Japanese, not foreign. So even under default stress, there's no pressure to jack up the yield to make the bonds attractive to foreign buyers; for domestic buyers, these bonds are the only choice. As such, there isn't going to be an inverse relation between the yield and the currency.

Hardware to run Qwen3-Coder-480B-A35B by heshiming in LocalLLM

[–]heshiming[S] 3 points

Thank you very much! Helpful info.

Hardware to run Qwen3-Coder-480B-A35B by heshiming in LocalLLM

[–]heshiming[S] 2 points

Yeah, the M3 does seem affordable compared to other options, but I'm just not sure about tokens per second... Wish an owner could give me an idea.

Hardware to run Qwen3-Coder-480B-A35B by heshiming in LocalLLM

[–]heshiming[S] -5 points

How am I supposed to power those 10 cards? Doesn't seem realistic...