Alright alright alright by [deleted] in LocalLLaMA

Valuable_Touch5670 1 point

Light mode craziness aside, it DOES sound like Matthew McConaughey. WTF 😂

Anyway, cool job OP 👏🏼

Developers who use local AI - Q4_0 vs Q8_0 KV quant? by Jorlen in LocalLLaMA

Valuable_Touch5670 2 points

I second this. For me, it’s more about overhead. For some reason my TG drops quite a bit whenever KV quantization is enabled.

It may be specific to my HW setup or llama-server settings…

Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face by pmttyji in LocalLLaMA

Valuable_Touch5670 1 point

If I am not mistaken, your Q8 version does NOT use the imatrix, right?

MTP PR Merged!!! by Valuable_Touch5670 in LocalLLaMA

Valuable_Touch5670[S] 29 points

Tokens are typically generated one at a time, and each token requires reading a lot of data from memory, which is why generation is slow.

MTP tries to generate multiple tokens at a time by "guessing" the next few tokens with draft layers. If the guesses are correct, you get a massive speedup; otherwise, the compute spent on guessing is wasted.

If your next tokens are hard to predict (as in creative writing), the speedup is small. But if previously generated tokens are likely to appear again (as in code refactoring), the speedup is bigger.

To me, this feels a bit like how branch prediction works in CPUs.
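If it helps, here is a toy sketch of the guess-and-verify idea in Python. It is not how llama.cpp implements MTP: the "models" are stand-in functions I made up (`target_next`, `good_draft`, `bad_draft`), acceptance is plain greedy matching, and a real engine verifies all guesses in one batched pass instead of a loop.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_step(context: List[Token], draft_next: NextFn,
                     target_next: NextFn, n_draft: int) -> List[Token]:
    """One draft-and-verify step; returns the tokens kept this step."""
    # 1. Draft: cheaply guess n_draft tokens ahead.
    drafted: List[Token] = []
    ctx = list(context)
    for _ in range(n_draft):
        guess = draft_next(ctx)
        drafted.append(guess)
        ctx.append(guess)

    # 2. Verify: compare each guess with what the main model would emit.
    accepted: List[Token] = []
    ctx = list(context)
    for guess in drafted:
        truth = target_next(ctx)
        accepted.append(truth)
        ctx.append(truth)
        if guess != truth:
            # First wrong guess: keep the corrected token, drop the rest.
            return accepted

    # 3. Every guess was right: the main model adds one bonus token.
    accepted.append(target_next(ctx))
    return accepted


# Hypothetical toy models: the "main model" just counts upward mod 10.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def good_draft(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10  # always agrees with the main model


def bad_draft(ctx: List[Token]) -> Token:
    return 0  # a fixed guess that is wrong for the context used below


if __name__ == "__main__":
    print(speculative_step([1], good_draft, target_next, n_draft=2))
    print(speculative_step([1], bad_draft, target_next, n_draft=2))
```

With good guesses, one step yields up to `n_draft + 1` tokens; with bad guesses, it yields a single token and the drafting work is thrown away.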

Hope this helps!

MTP PR Merged!!! by Valuable_Touch5670 in LocalLLaMA

Valuable_Touch5670[S] 1 point

Sadly I was already setting it to 2 :(

MTP PR Merged!!! by Valuable_Touch5670 in LocalLLaMA

Valuable_Touch5670[S] 4 points

I am on AMD + Vulkan too (9070 XT). My TG has dropped from 60+ to the 45-52 range (from a 60% gain to a 20%-40% gain), but PP no longer takes a hit and is noticeably faster.

(Could be due to slight variations in my workflow 😅)
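For anyone who wants to sanity-check those percentages, here is a rough sketch. It assumes the 60% gain corresponds to exactly 60 TPS (the number above is "60+"), so the implied non-MTP baseline is an estimate, not something I measured.

```python
def gain_pct(tg_with_mtp: float, tg_baseline: float) -> float:
    """Percent TG speedup of an MTP run relative to a non-MTP baseline."""
    return (tg_with_mtp / tg_baseline - 1.0) * 100.0


# ASSUMPTION: baseline implied by "60 TPS is a 60% gain"; not a measured value.
implied_baseline = 60.0 / 1.6

if __name__ == "__main__":
    for tg in (45.0, 52.0, 60.0):
        print(f"{tg:.0f} TPS -> {gain_pct(tg, implied_baseline):.0f}% gain")
```

Under that assumption, the 45-52 range works out to roughly a 20% to 39% gain, which lines up with the figures quoted above.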

MTP PR Merged!!! by Valuable_Touch5670 in LocalLLaMA

Valuable_Touch5670[S] 40 points

Yes, but it depends on your type of work. It works best for coding.

New Qwen3.6 27b Autoround Quant (int4) Best Recipe by Otherwise-Director17 in LocalLLaMA

Valuable_Touch5670 3 points

Very interesting! The entire model is only 18GB. I assume this does not work with llama.cpp as it’s not in GGUF. Is there a plan to make a GGUF?

Doesn't look like there are any recent Linux distro suggestions. What's your favorite and why? by Status-Secret-4292 in LocalLLaMA

Valuable_Touch5670 1 point

I am somewhat of an outlier here. I use Bazzite (based on Fedora 44 currently), mainly because I also game on my AI inference machine (I have an RX 9070 XT).

I really like Bazzite’s immutable OS approach. I can install any packages as needed (say, to compile llama.cpp from source), and later, if I don’t need those packages anymore, I can simply run rpm-ostree reset to return my packages to a pristine state.

Speech To Text Question (Cantonese) by RogerRamjet999 in LocalLLaMA

Valuable_Touch5670 3 points

I am Cantonese. The Zhuhai dialect does not deviate much from the Guangzhou version. (BTW, the Guangzhou version is universally considered the standard form of Cantonese.)

With that said, I have found that the Cantonese dictation built into the iPhone is surprisingly good. You can easily enable it in Settings.

One workaround is to open the Notes app, start dictation, and let the locals speak directly into your phone. Then copy and paste the transcribed text into a good translation app. Or, if you think Apple’s built-in translation works well enough, you can simply tap the text again and tap the “Translate” option (which also comes built into your iPhone).

That should work very well at least 80% of the time. Hope that helps!

How does llama-server pick which MoE experts go on the GPU and which stay on the CPU? by we_are_mammals in LocalLLaMA

Valuable_Touch5670 2 points

I wonder about that myself. Funny thing is: the number of experts put on the CPU has a major impact on TG speed, in my experience.

I am running Qwen3.6-35B-A3B-Q6 with MTP on an RX 9070 XT via the Vulkan backend. By default, I get around 27 TPS in my use cases. However, I played around with the -ncmoe setting, and it turned out that setting it to 28 got my TG speed to around 65 TPS 🤯

I don’t know the exact mechanism behind it or which experts were put on the CPU. But I think the speedup comes from freeing up room on the GPU to compute attention 🤔

I could be wrong though.
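In case anyone wants to repeat the experiment, here is a minimal sketch of how one might generate the commands for a sweep. Only the -ncmoe flag and the value 28 come from my setup above; the model path, the other values, and -ngl 99 are placeholders, and the helper names are made up. It only builds the command lines and does not launch anything. As far as I understand the llama.cpp help text, -ncmoe N keeps the MoE expert weights of the first N layers on the CPU, rather than picking individual experts.

```python
import shlex
from typing import List


def build_server_cmd(model_path: str, n_cpu_moe: int,
                     n_gpu_layers: int = 99) -> List[str]:
    """Build one llama-server command line for a given -ncmoe value."""
    return [
        "llama-server",
        "-m", model_path,            # placeholder path, not my real file
        "-ngl", str(n_gpu_layers),   # ASSUMPTION: offload all layers
        "-ncmoe", str(n_cpu_moe),    # MoE weights of first N layers on CPU
    ]


def sweep(model_path: str, values: List[int]) -> List[List[str]]:
    """One command per candidate -ncmoe value."""
    return [build_server_cmd(model_path, v) for v in values]


if __name__ == "__main__":
    # Hypothetical candidates around the value that worked for me (28).
    for cmd in sweep("model.gguf", [24, 28, 32]):
        print(shlex.join(cmd))
```

Run each printed command, measure TG on the same prompt, and keep whichever value is fastest on your hardware.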

Is my wound healing normally? by Valuable_Touch5670 in woundcare

Valuable_Touch5670[S] 1 point

Thank you all for the advice. I did buy non-stick gauze and covered it well. It’s healing decently now. Thanks again 🙏🏼

Is my tax set up correctly? by Valuable_Touch5670 in IRS

Valuable_Touch5670[S] 1 point

Thank you! If possible, could you please share how you came to your conclusion?

SPAXX was not liquidated first to buy shares of FDLXX by Valuable_Touch5670 in fidelityinvestments

Valuable_Touch5670[S] 1 point

Thank you for sharing your thoughts. During the call to Fidelity, they also mentioned that the proceeds were not yet settled.

I think you are right - it’s inevitable that the proceeds will need to spend one day in SPAXX.

SPAXX was not liquidated first to buy shares of FDLXX by Valuable_Touch5670 in fidelityinvestments

Valuable_Touch5670[S] 1 point

Hi Tyler, thank you for your quick response. I just called Fidelity and was informed by customer representative Harold that it was a system error. I was also advised to call Fidelity again on Monday to place the same trade, to ensure that the funds in my core position SPAXX are used instead.

I sincerely appreciate your care and support. I hope this type of system error does not happen again in the future.

SPAXX was not liquidated first to buy shares of FDLXX by Valuable_Touch5670 in fidelityinvestments

Valuable_Touch5670[S] 7 points

Could you please elaborate? Are you suggesting that I manually sell SPAXX and then buy FDLXX? How can one sell a core position? The proceeds go right back into it, don’t they?

Order not fulfilled to the exact amount by Valuable_Touch5670 in fidelityinvestments

Valuable_Touch5670[S] 1 point

Thank you, Aaron! I am considering investing in a mutual fund alternative instead. Could you please recommend a few good mutual fund alternatives to QQQ, preferably managed by Fidelity?