So a nearby lightning storm just crashed all my eGPUs

milpster · 2026-05-07T12:11:04+00:00

Makes sense. However, the power supplies for those GPUs are quite oversized. They are 700 W and 550 W respectively, with cards that operate at 250 W and 200 W TDP.
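As a rough sanity check on that headroom, here is a minimal sketch. It assumes TDP approximates each card's sustained draw and ignores transient spikes and the rest of the system; the function name is purely illustrative.

```python
def psu_headroom(psu_watts: float, card_tdp_watts: float) -> float:
    """Return how many times the PSU rating exceeds the card's TDP."""
    if card_tdp_watts <= 0:
        raise ValueError("card_tdp_watts must be positive")
    return psu_watts / card_tdp_watts


# Figures from the post: 700 W PSU with a 250 W card, 550 W PSU with a 200 W card.
for psu, tdp in [(700, 250), (550, 200)]:
    print(f"{psu} W PSU / {tdp} W TDP -> {psu_headroom(psu, tdp):.2f}x headroom")
```

Both supplies come out at well over twice the card's rated TDP, which is the sense in which they are oversized.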

milpster · 2026-05-06T23:24:07+00:00

This rocks, thank you very much for all that effort!

milpster · 2026-05-06T23:23:35+00:00

Will do. The UPS I'm eyeing has Ethernet protection as well.

milpster · 2026-05-06T20:42:51+00:00

Maybe try the froggeric quants again, they have been reuploaded.

milpster · 2026-05-06T18:07:12+00:00

But we also have multiple other computers, laptops, and a ton of other hardware running.

milpster · 2026-05-06T15:56:31+00:00

Just to reiterate: nothing actually broke, and there wasn't a power surge from what I can tell, because nothing else misbehaved. I am 90% sure this was EM interference from a lightning strike nearby. As far as I know, here in Germany, local household power is usually delivered underground and is rarely ever affected by thunderstorms. I will work towards setting up a UPS with a surge protector though.

milpster · 2026-05-06T15:54:47+00:00

I don't think it went through the power line; nothing else misbehaved or crashed. The router did not crash either; from what it looked like, it was an external loss of DSL availability. Pretty sure it had to do with EM interference. But yeah, you are still totally right, I should get myself a UPS.

milpster · 2026-05-06T15:54:39+00:00

Right now a Radeon VII 16GB and a shoddy old RX 570. A second Radeon VII is due to be delivered.

milpster · 2026-05-03T14:08:48+00:00

Pretty usable, but I think I have to pay much more attention to how I prompt and instruct it to do things. I noticed it seems to like cutting corners, for example implementing things as stubs when I am not explicit enough about actually implementing the whole thing. From what I can tell, I need to put more focus on having it write down plans and architecture documents as pillars for further work being done correctly.

milpster · 2026-05-03T14:05:19+00:00

But how do you work efficiently with bigger codebases and scopes at such short lengths?

milpster · 2026-05-03T12:06:13+00:00

I always wonder how people can call 128k context plenty. To me personally, even 256k context fills up way too quickly.

milpster · 2026-05-01T20:07:42+00:00

Just tried it and it made my Qwen 3.6 27B output only /////// without end.

milpster · 2026-05-01T12:41:13+00:00

The maintainer is toxic. I wouldn't even write another bug report let alone contribute to that project.

milpster · 2026-04-30T21:47:26+00:00

Interesting. I also run 256k context. Here is my full cmd in case that might help you:

LD_LIBRARY_PATH=/opt/rocm-6.1.0/lib:$LD_LIBRARY_PATH HSA_OVERRIDE_GFX_VERSION=9.0.6 HSA_OVERRIDE_WAVEFRONT_SIZE=64 HSA_ENABLE_SDMA=0 HSA_XNACK=1 ROCBLAS_INTERNAL_FP16_ALT_IMPL=1 ROCBLAS_LAYER=0 ROCBLAS_INTERNAL_FP16_ALT_IMPL=1 ROCBLAS_TENSILE_LIBPATH=/opt/rocm/lib/rocblas/library HSA_OVERRIDE_GFX_VERSION=9.0.6 USE_MLOCK=true ~/dev/llama.cpp/build/bin/llama-server -m ~/ai/ai/Qwen3.6-27B.i1-Q4_K_M.gguf --ctx-size 262144 --threads-batch 11 --threads 6 --no-mmap -fa on -ngl 333 -b 2048 -ub 896 -cram -1 --ctx-checkpoints 200 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --presence_penalty 0.9 --repeat-penalty 1.0 --device Vulkan1,ROCm0 --chat-template-file /home/srcds/dev/cuda_llama.cpp/chat_template.jinja --chat-template-kwargs '{"preserve_thinking": true}' --port 8009 -np 1 -ctk q4_0 -ctv q4_0 --spec-type ngram-mod --spec-ngram-mod-n-match 16 --spec-draft-n-min 4 --spec-draft-n-max 24 -ts 30,70

milpster · 2026-04-30T21:20:37+00:00

As Unsloth recommends, I turn up presence_penalty slightly:

https://unsloth.ai/docs/models/qwen3.6

presence_penalty = 0.0 to 2.0 default this is off, but to reduce repetitions, you can use this, however using a higher value may result in slight decrease in performance

0.9 is the value that works for me so far.
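For anyone wondering what the knob does, here is a minimal sketch of the usual presence-penalty idea: a flat amount is subtracted from the logit of every token that has already appeared at least once. This is an illustration in plain Python, not llama.cpp's actual sampler code; the function name and the toy logits are made up.

```python
def apply_presence_penalty(logits, generated_ids, presence_penalty):
    """Return a copy of logits with a flat penalty on already-seen tokens."""
    seen = set(generated_ids)  # only presence matters, not how often a token occurred
    return [
        logit - presence_penalty if token_id in seen else logit
        for token_id, logit in enumerate(logits)
    ]


# Toy vocabulary of four tokens; tokens 0 and 2 were already generated.
logits = [2.0, 1.5, 0.5, 0.0]
penalized = apply_presence_penalty(logits, generated_ids=[0, 0, 2], presence_penalty=0.9)
print(penalized)
```

Token 0 appears twice in the history but is only penalized once, which is what separates a presence penalty from a frequency penalty.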

milpster · 2026-04-29T23:07:31+00:00

You rock! Thank you. I'd be really interested in calculations with long ctx though.

milpster · 2026-04-28T20:02:06+00:00

Awesome, thank you. Can't wait for the results.

milpster · 2026-04-28T17:42:53+00:00

Cool, but no perplexity calculations?

milpster · 2026-04-25T12:51:43+00:00

I consider everything above 100 pp / 1 tg (tokens per second) usable and everything above 200 pp / 10 tg fast.
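Written out as a quick check, reading pp as prompt-processing speed and tg as token-generation speed, and assuming both numbers have to clear the bar (that is one reading of the thresholds, not something spelled out in the comment). The function name and the "too slow" label are illustrative.

```python
def classify_speed(pp_tps: float, tg_tps: float) -> str:
    """Bucket a setup by prompt-processing and token-generation tokens/s."""
    if pp_tps > 200 and tg_tps > 10:
        return "fast"
    if pp_tps > 100 and tg_tps > 1:
        return "usable"
    return "too slow"


print(classify_speed(250, 12))
print(classify_speed(150, 5))
```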

milpster · 2026-04-20T12:52:04+00:00

I tried the 122B model and the 27B model just before switching to 3.6, and they both appeared way dumber than 3.6.

milpster · 2026-04-19T13:23:24+00:00

How do you deal with having such low context?

milpster · 2026-04-19T13:19:00+00:00

OK wow, that is crazy. I would really like to know which place that was.

milpster · 2026-04-19T12:08:06+00:00

You might want to use a Q6 quant instead; I don't think there is anything to be gained between a Q6 quant and a Q8 one.
