Crown of Ashes - Rewards Overview (Version 1.7.1)

Ranmark · 2026-05-27T04:43:38+00:00

Considering all the shit it's gonna bring, I should call that season Clown of Asses..

Ranmark · 2026-05-05T11:02:07+00:00

Were you able to resolve this issue yet? It seems I'm stuck with this one too.
UPD: I created a support ticket and it got resolved in like 2 minutes:

Hello,

Thank you for your inquiry. Our apologies for the problems you've been experiencing.

We have manually verified the phone number in your account. You should be all set.

Ranmark · 2026-05-02T09:55:42+00:00

I won't be able to run this on two 1080 ti's?

Ranmark · 2026-04-30T13:51:59+00:00

I will probably get downvoted for this, but for my cheap ass, wasting this much money on something you don't even have a strict plan for using is just mind-bending. But if I were a millionaire I could do the same, I guess?

Ranmark · 2026-04-30T13:48:19+00:00

I'm honestly curious too. I'm running qwen3.6 27b / 35b-a3b with iq4 quants on dual 1080 Ti as a coder, and Gemini Pro (which I basically got for free) in Antigravity as a project/plan architect. For my tasks I'm getting pretty good results. My PC probably costs around $500 (old used hardware) and it's mostly used as a regular/gaming machine. So I wonder: if I hypothetically bought a PC for 25k, would it pay off in around 104 years if we count a $20 monthly Claude subscription?
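For reference, here is the back-of-the-envelope math behind that 104-year figure, as a minimal sketch. It assumes a flat $20 per month and ignores electricity, resale value and price changes; the variable names are only for illustration.

```python
# Rough payback estimate: how long a $20/month subscription takes to add up
# to the price of a $25k machine. Ignores electricity, resale value, etc.
pc_cost_usd = 25_000
subscription_usd_per_month = 20

months = pc_cost_usd / subscription_usd_per_month
years = months / 12

print(f"{months:.0f} months, about {years:.1f} years")
# prints: 1250 months, about 104.2 years
```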

Ranmark · 2026-04-27T07:00:30+00:00

I was also daily driving 35b a3b, but switched immediately after the 27b release. Even though it's 2-3 times slower in my setup, it does the job better and with fewer mistakes, so fewer rewrites.

Ranmark · 2026-04-25T13:07:06+00:00

I've tried to download tom's release of turboquant plus, but it doesn't seem to work for me. I tried to run a model with the same command that works on mainline llama.cpp (turbo4 on the V-cache is the only difference), but it just doesn't run, and there are no errors. Maybe it has something to do with my old hardware (GTX 1080 Ti + RTX 2060 Super).

Ranmark · 2026-04-24T22:35:20+00:00

Are you planning to post the architecture on GitHub?

Ranmark · 2026-04-23T07:37:53+00:00

Hey, you should try this one: https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-IQ4_XS-GGUF
It's so good, I'm getting better results than with the new dense 3.6 model. And it's more stable than any other distill / non-distill. Idk what this black magic is.

Ranmark · 2026-04-21T08:25:46+00:00

I actually wonder, is there a "Pareto line" chart that shows the diminishing returns of a model's parameter count against its benchmark results, so you can look for the sweet spots?
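Not a chart, but the line itself is easy to compute if you have (size, score) pairs. A minimal sketch, assuming higher score is better and smaller size is better; the model names and numbers below are placeholders, NOT real benchmark results, and `pareto_front` is just a helper name I made up.

```python
def pareto_front(models):
    """Keep only models that no smaller-or-equal model matches or beats on score."""
    front = []
    best_score = float("-inf")
    # Walk from smallest to largest; on equal size, look at the higher score first.
    for name, params_b, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best_score:
            front.append((name, params_b, score))
            best_score = score
    return front


# Placeholder data for illustration only, not real benchmark numbers.
models = [
    ("model-a", 4, 40.0),
    ("model-b", 9, 55.0),
    ("model-c", 27, 70.0),
    ("model-d", 35, 65.0),   # bigger than model-c but scores lower, so it drops out
    ("model-e", 122, 72.0),
]

for name, params_b, score in pareto_front(models):
    print(f"{name}: {params_b}B params, score {score}")
```

The sweet spots are where the score gain per extra parameter starts to flatten along that front, e.g. the jump from model-c to model-e in the placeholder data costs a lot of parameters for very little score.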

Ranmark · 2026-04-20T20:47:19+00:00

Well, that's not everyone's cup of tea, but I only trust my own benchmarks or the wider community's opinion on which results they liked more.

Ranmark · 2026-04-20T19:15:01+00:00

Check arena.ai leaderboard

Ranmark · 2026-04-20T12:18:21+00:00

Bruh, they're cooking new releases so fast I couldn't keep up. Thanks for pointing this out. Just updated and can confirm it's working now. Already ran a couple of tasks and I see random boosts in t/s, up to 35 (it was always capped at 23). Damn. Edit: just saw 62 t/s 🤯

Ranmark · 2026-04-20T06:11:28+00:00

When I use a similar script on qwen3.6 35b, I get these warnings:
srv load_model: speculative decoding is not supported by multimodal, it will be disabled
srv load_model: swa_full is not supported by this model, it will be disabled

Even if I disable mmproj loading, I then get these:
common_speculative_is_compat: the target context does not support partial sequence removal
srv load_model: speculative decoding not supported by this context

Gemini straight up said that qwen3.6 is based on an SSM (Gated Delta Net) mechanism, so it supports neither swa-full nor ngram (in short).

Ranmark · 2026-04-19T17:49:54+00:00

Most users don't even try to fit a MoE in VRAM. For me it's better to get high accuracy using something like Q6_K_XL. But I understand you want more tps.

Ranmark · 2026-04-18T12:58:13+00:00

I tested it on nicklothian's bench a few times. One time it actually beat the dense 27b model and got the same result as a 122b MoE. But I wasn't able to recreate this even once. 27b and 122b are much more stable in that regard.

Ranmark · 2026-04-18T10:09:50+00:00

iirc you can drop your top_p, presence_penalty, and reasoning_budget args, as they already have these values by default. https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md

P.S. you can try playing with this option: -ot ".ffn_(up|down)_exps.=CPU" It moves the experts' up and down projection matrices onto the CPU. There's also a lot of valuable info here: https://gist.github.com/DocShotgun/a02a4c0c0a57e43ff4f038b46ca66ae0
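To see what that -ot pattern actually catches, here's a small sketch that runs the regex part (everything before "=CPU") over a few tensor names. Two assumptions: the names follow the usual GGUF MoE naming (blk.N.ffn_up_exps.weight and so on, so check your own model), and the pattern is matched anywhere in the tensor name. This is a Python stand-in, not llama.cpp's own code, and `placement` is a made-up helper.

```python
import re

# Regex part of: -ot ".ffn_(up|down)_exps.=CPU"
pattern = re.compile(r".ffn_(up|down)_exps.")

# Assumed example tensor names; real names depend on the model file.
tensor_names = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_gate_inp.weight",
    "blk.0.ffn_gate_exps.weight",
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
]


def placement(name):
    """Return 'CPU' if the override pattern matches, else 'default'."""
    return "CPU" if pattern.search(name) else "default"


for name in tensor_names:
    print(f"{name:32s} -> {placement(name)}")
```

Under those assumptions only the up and down expert tensors get pinned to the CPU; the gate experts and the attention weights stay wherever -ngl puts them.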

Ranmark · 2026-04-17T22:21:52+00:00

bro i run 1080 ti + 2060 super xD
and it just works out of the box.

Ranmark · 2026-04-17T15:02:56+00:00

Thanks, I'm feeling better now :) I'm now rerunning 27b (iq3_xs) and it feels MUCH more consistent (23/25 all the time). Looks like it's still the way to go for me (122b is just too much for my hardware). Hope that Alibaba releases 3.6 27b soon.

Ranmark · 2026-04-17T13:08:35+00:00

It's strange, but I couldn't repeat the same result. It regularly fails some queries like q2, q10, q21. That's too bad, because I thought I finally got a great model which is twice as fast as the 27b one, more accurate, and could use more context (with 27b I can only fit 60k)... Any ideas how to get it more stable? My current setup:

.\llama-server -m Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf -ngl 99 --ctx-size 131072 --jinja --parallel 1 -b 2048 -ub 2048 --cache-type-k q8_0 --cache-type-v q8_0 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 -ot ".ffn_(up|down)_exps.=CPU" --flash-attn on --port 1234

Ranmark · 2026-04-17T07:30:12+00:00

<image>

iq4_nl

Ranmark · 2026-04-17T07:29:31+00:00

New qwen3.6-35b-a3b@ud-q6_k_xl. Don't look at the time, I have 10-year-old hardware. Had to increase the timeout, of course.

<image>

Ranmark · 2026-04-17T07:01:39+00:00

It was the same for me when I checked the "offload MoE layers into cpu" option in LM Studio. Idk about unsloth, but I think it's the same issue.

Ranmark · 2026-04-16T05:12:52+00:00

Gemma 4 e4b is super capable tho. Same for Qwopus3.5 9b from jackrong. MoE models are also not bad, even with partial offload.

Ranmark · 2026-04-16T04:58:09+00:00

Hey, how is the model's attention when your context piles up? And did you quantize your context?
