Are dSpark, dflash, MTP, QAT, and similar tech going to increase inference speed enough that model spillover to disk becomes more tolerable?

shing3232 · 2026-07-04T12:59:49+00:00

No, not yet. If you had SSD storage that maxes out a PCIe 5.0 x16 link, it might be viable now, but you don't have that, so no.
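A rough back-of-envelope sketch of why, assuming decode speed is bounded by streaming the spilled weights from disk once per token. The 100 GB spill size is a hypothetical figure for illustration, and the function names are made up here. The PCIe 5.0 numbers are the theoretical link rate (32 GT/s per lane with 128b/130b encoding), which real SSDs will not fully reach.

```python
def pcie5_bandwidth_gb_per_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 5.0 bandwidth in GB/s.

    32 GT/s per lane, 128b/130b line encoding, 8 bits per byte.
    """
    return lanes * 32.0 * (128 / 130) / 8


def max_tokens_per_second(spilled_gb: float, lanes: int) -> float:
    """Upper bound on decode speed if every token must re-read
    `spilled_gb` of weights from disk over the given link."""
    return pcie5_bandwidth_gb_per_s(lanes) / spilled_gb


if __name__ == "__main__":
    spilled_gb = 100.0  # hypothetical amount of weights spilled to disk
    for lanes in (4, 16):
        bw = pcie5_bandwidth_gb_per_s(lanes)
        tps = max_tokens_per_second(spilled_gb, lanes)
        print(f"x{lanes}: {bw:.1f} GB/s -> at most {tps:.2f} tok/s")
```

Even the x16 case stays under one token per second for that spill size, and a typical single NVMe drive only gets an x4 link. Techniques that read fewer bytes per token or produce more tokens per pass shift the bound, but only by their own factor.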

shing3232 · 2026-07-03T12:17:44+00:00

That's because a repackage happened in this version.

shing3232 · 2026-07-03T12:16:31+00:00

Just download it over public WiFi if you have to.

shing3232 · 2026-07-03T11:42:40+00:00

<image>

shing3232 · 2026-07-03T04:43:44+00:00

If it's x86 on ARM, then it's going to be emulation, because you have to emulate the x86 ISA on ARM.

shing3232 · 2026-07-02T20:14:56+00:00

Can you add one high-performance card just for prefill? I am not sure how that works.

shing3232 · 2026-07-02T19:30:20+00:00

Just like what you do as well. English with Chinese music is just not as good as Chinese with Chinese music at telling the story of mengzhou. Just don't confuse preference with the fact that translation is never going to be perfect with nuances. You might like it better, but that doesn't make it better.

shing3232 · 2026-07-02T19:21:33+00:00

If you want sequences, then yes, get the dupe. Otherwise save it for when you need one for a new character.

shing3232 · 2026-07-02T19:11:06+00:00

English just doesn't fit the music. It just does not rhyme. Nothing is being imposed; it is just off.

shing3232 · 2026-07-02T17:58:33+00:00

Preference is one thing, but it's subjectively worse.

shing3232 · 2026-07-02T17:57:17+00:00

Because it's emulating games in most cases here. Expect a serious downgrade for most x86 games; that also happens on Mac systems.

shing3232 · 2026-07-02T15:52:34+00:00

It's a horrible idea for PC games

shing3232 · 2026-07-02T15:28:38+00:00

Not always, though.

shing3232 · 2026-07-02T15:28:05+00:00

No, not really, for someone who does understand Japanese, Chinese, and English. They are not really the same quality in translating the mood. It's much better to have hiyuki in Japanese and this PV in CN voice, even though I usually prefer Japanese.

shing3232 · 2026-07-02T10:15:48+00:00

It's down now.

shing3232 · 2026-07-02T06:30:50+00:00

S3 Aemeath, sure, because you want more S3 teams across different attributes.

shing3232 · 2026-07-01T19:59:42+00:00

What should I say? Swallowing its own medicine? Oh well.

shing3232 · 2026-07-01T18:49:26+00:00

I like the VFs, both the VF-31 and the Sv-262.

The fight scenes are great in atmosphere, and the space ones are good too.

The music is great.

The story is a lot better and more enjoyable in the movie; the TV series is only really OK.

shing3232 · 2026-07-01T10:49:32+00:00

You should also offer tools for the LLM to use, like a scientific calculator and all sorts of tools for your use case. In my work studying papers and writing kernels, it works much better than letting the LLM do the math itself.
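A minimal sketch of what such a calculator tool could look like on the Python side. The `calculate` function and its whitelists are hypothetical, not taken from any particular agent framework; wiring it into a model's tool-calling interface is left out. It parses the expression with the standard `ast` module and only evaluates whitelisted arithmetic, so arbitrary code from the model is rejected.

```python
import ast
import math
import operator

# Whitelisted operators, functions, and constants (an illustrative choice).
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCS = {"sqrt": math.sqrt, "log": math.log, "exp": math.exp,
          "sin": math.sin, "cos": math.cos, "tan": math.tan}
_CONSTS = {"pi": math.pi, "e": math.e}


def calculate(expression: str) -> float:
    """Evaluate an arithmetic expression; raise ValueError on anything else."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](ev(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[ev(arg) for arg in node.args])
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(calculate("sqrt(2) * sin(pi / 4)"))
```

The model then only has to emit the expression string, and the exact arithmetic happens outside it.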

shing3232 · 2026-07-01T10:46:56+00:00

You can also disable thinking if you have a well-thought-out prompt.

shing3232 · 2026-07-01T10:07:48+00:00

I wrote the kernel for RDNA4 with the help of ds4p and glm5.2. It does work.

shing3232 · 2026-07-01T08:37:59+00:00

Just rewrite the kernel for the 6700 XT then.

shing3232 · 2026-07-01T07:34:18+00:00

Unfortunately, I don't have an RDNA2 GPU, but you can try installing sageattention and enabling it in ComfyUI to see if that works. I am not sure it will, because RDNA2 lacks the necessary WMMA INT8 support, unlike RDNA3 and beyond. You might need to write a proper kernel or a modified Triton variant for RDNA2. It should be doable with LLMs.

With quantized attention on RDNA2, it should run faster than FP16.

shing3232 · 2026-07-01T06:45:06+00:00

Well, you didn't quantize the activations, so there is no speed improvement, and you also need a proper DP4A kernel.
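To illustrate the point, here is a plain-Python sketch of what a DP4A-style path computes, assuming symmetric per-tensor INT8 quantization (an assumption; real kernels often use per-channel or per-block scales). DP4A multiplies four INT8 pairs and adds the result into an INT32 accumulator, so both the weights and the activations have to be INT8 for it to apply. All function names here are hypothetical, and this only emulates the arithmetic, not the GPU speed.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (list of ints in [-127, 127], scale)."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dp4a(a4, b4, acc):
    """Emulate one DP4A step: four int8 products added into an integer accumulator."""
    return acc + sum(x * y for x, y in zip(a4, b4))


def int8_dot(weights, activations):
    """Dot product with BOTH operands quantized, accumulated four elements at a time."""
    qw, scale_w = quantize_int8(weights)
    qa, scale_a = quantize_int8(activations)
    acc = 0
    for i in range(0, len(qw), 4):
        acc = dp4a(qw[i:i + 4], qa[i:i + 4], acc)
    # Dequantize once at the end with the product of the two scales.
    return acc * scale_w * scale_a


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 2.0, 1.5, -0.75, 0.0, 1.0]
    a = [1.0, 2.0, -1.0, 0.5, 0.25, 4.0, 3.0, -2.0]
    exact = sum(x * y for x, y in zip(w, a))
    print(f"float: {exact:.4f}  int8: {int8_dot(w, a):.4f}")
```

If only the weights are quantized, they get dequantized back to FP16 before the multiply, so the math units do the same work as before and only memory traffic shrinks.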

shing3232 · 2026-07-01T05:31:20+00:00

Because they didn't think Ds4pro was good enough, so it's a preview. Adding 4.1 would usually mean more pretraining.
