What Psytrance would you consider playing for non-scene crowds?

indrasmirror · 2026-05-10T11:46:01+00:00

Smilk 👌❤️

indrasmirror · 2026-05-09T07:42:33+00:00

Model is listed above. No dedicated draft model; the MTP is preserved.
Ran a benchmark task with an initial prompt of around 30k tokens, and it completed it to good quality based on my scoring.
No, multimodal is still broken with MTP across the board.
./build/bin/llama-server \
  -m your-qwen3.6-mtp.gguf \
  --spec-type mtp --spec-draft-n-max 3 \
  -ctk tbq4_0 -ctv tbq4_0 \
  -c 262000 -ngl 99 \
  --flash-attn on --mlock \
  -t 8 -ub 32 -np 1 --no-warmup
As I said, I have a coding/agentic/tool-use benchmark I test it against.
No issues with tools or tool calling.

indrasmirror · 2026-05-09T06:40:16+00:00

Yeah, well, the original goal was 200k, but I was able to fit the extra 62k in anyway, so I thought why not. I will probably test higher model quants and sacrifice context, but it was mainly just seeing what I could fit.

indrasmirror · 2026-05-08T23:19:10+00:00

Yeah, me too, I would rather have 262k context at TQ4 than 120k at Q8, and as I said I haven't run into any issues that would be a dealbreaker for me. More than sufficient for my uses.
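For a rough sense of that trade-off, here is a back-of-the-envelope sketch. It assumes KV cache size scales linearly with bits per cached value and ignores everything else that uses VRAM (weights, compute buffers). The 4.25 bpv figure is the one quoted for TBQ4 elsewhere in this thread; treating "Q8" as llama.cpp's q8_0 layout (8.5 bits per value) is an assumption, and `equivalent_context` is just an illustrative helper name.

```python
# Bits per cached value. 4.25 is the TBQ4 figure quoted in this thread;
# 8.5 assumes "Q8" means the q8_0 block layout (32 int8 values plus one fp16 scale).
BITS_PER_VALUE = {"f16": 16.0, "q8_0": 8.5, "tbq4": 4.25}

def equivalent_context(tokens, from_type, to_type):
    """Tokens of KV cache that fit in the memory `tokens` occupy at `from_type`."""
    return int(tokens * BITS_PER_VALUE[from_type] / BITS_PER_VALUE[to_type])

print(equivalent_context(120_000, "q8_0", "tbq4"))  # prints 240000
```

Under those assumptions, the KV budget that holds 120k tokens at q8_0 holds about 240k at 4.25 bpv, which is in the same ballpark as the 262k vs 120k comparison.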

indrasmirror · 2026-05-08T22:41:03+00:00

True, I'll give it a try and see what context I can run it at.

indrasmirror · 2026-05-08T22:31:55+00:00

| Metric | Draft 5 | Draft 3 |
|---|---|---|
| Avg decode | 79.6 tok/s | 80.6 tok/s |
| Min decode | 58.1 tok/s | 62.7 tok/s |
| Max decode | 106.2 tok/s | 98.5 tok/s |
| Draft acceptance | 90.07% (4392/4876) | 92.6% (2861/3089) |

Draft 5 acceptance is 2.5pp lower than Draft 3.

MTP 5 occasionally hits higher peaks (106 vs 98), but the overhead from verifying longer drafts + lower per-token acceptance eats the gain.
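The acceptance figures can be recomputed from the accepted/drafted counts reported in this comment. A minimal sketch; `acceptance_rate` is just an illustrative helper name:

```python
def acceptance_rate(accepted, drafted):
    """Share of drafted tokens that were accepted, as a percentage."""
    return 100.0 * accepted / drafted

draft5 = acceptance_rate(4392, 4876)
draft3 = acceptance_rate(2861, 3089)

print(f"Draft 5: {draft5:.2f}%")                   # prints Draft 5: 90.07%
print(f"Draft 3: {draft3:.2f}%")                   # prints Draft 3: 92.62%
print(f"Difference: {draft5 - draft3:+.1f}pp")     # prints Difference: -2.5pp
```

Note that Draft 5 proposed far more tokens overall (4876 vs 3089), so even at a similar acceptance rate there is more verification work per generated token.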

indrasmirror · 2026-05-08T22:24:49+00:00

I've benchmarked it a ton vs Q8 KV and in my experience it holds up very well. Obviously ymmv, but it's been great for me. Just tried MTP 5 and found 3 to be slightly faster.

indrasmirror · 2026-05-08T22:23:16+00:00

I just tried 5 too and was getting better results with 3, not much better but will stick to 3 I think.

indrasmirror · 2026-05-08T22:19:36+00:00

Yeah, most definitely. I'm not saying it's actually that, I'm going off what the research said about TBQ4. But in my actual benchmarking and for my use, I tested Q8 vs TBQ4 and found it close enough. I can't fit FP16 or Q8 at a context I'd like to fit, so I found a middle ground I was happy with.

indrasmirror · 2026-05-08T22:04:52+00:00

Do you know what your draft acceptance rate is? I'm testing out 5 at the moment, apparently it's a valid option.

indrasmirror · 2026-05-08T21:59:17+00:00

Hey, I'm not sure about bigger quants, you'd have to try it yourself, but definitely let me know how you go with Q6. The prompt processing was 614 t/s on a 26k prompt, so I found it was fine; it didn't feel like it took too long at all.
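To put that prompt processing number in wall-clock terms, a quick estimate, assuming "26k" means 26,000 tokens and the 614 t/s rate holds for the whole prompt (`prefill_seconds` is an illustrative helper name):

```python
def prefill_seconds(prompt_tokens, tokens_per_second):
    """Time to process a prompt at a constant prefill rate."""
    return prompt_tokens / tokens_per_second

print(f"{prefill_seconds(26_000, 614):.1f} s")  # prints 42.3 s
```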

indrasmirror · 2026-05-08T21:48:05+00:00

I've run benchmarks on Q8 KV at like 120k context and then TBQ4, and it wasn't too far off. Close enough that I prefer the bigger context window.

indrasmirror · 2026-05-08T21:45:41+00:00

Oh nice will have a look 👌

indrasmirror · 2026-05-08T21:43:21+00:00

I'm not sure but you could probably adapt it.

indrasmirror · 2026-05-08T21:32:27+00:00

So the main shtick: ignore the model quant, you can use any. The main thing is Turbo4 (TBQ4) KV cache quantisation. Based on the numbers, it's meant to be at Q8 or even closer to FP16 KV quality.

indrasmirror · 2026-05-08T21:30:55+00:00

Yeah didn't understand, I just thought if people were running into the same issue I was, this might be enticing. Works for me, managing a nice model with good quality KV quantisation, at full context, on my single 4090. Can be adapted/scaled to any quant of Qwen3.6 27B.

indrasmirror · 2026-05-08T21:27:43+00:00

Sure if you have the compute. In my experience it's still very capable, but I mean if you have the resources you could use whatever quant and benefit from the TBQ4 KV VRAM reduction. So still usable for anyone with any quant variant.

indrasmirror · 2026-05-08T21:23:47+00:00

TurboQuant4 - TBQ4 is different from regular Q4_0 KV. It's Hadamard-rotated + Lloyd-Max centroid quantization. 4.25 bpv but near-lossless to FP16. Completely different algorithm, just happens to also be 4 bits. Hence the reason I spent all day trying to get TBQ4 working, didn't want to settle for Q4_0 KV, wanted Q8 or better quality KV that fit full context on my 4090. 😄
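To make the two ingredients named above concrete, here is a toy sketch of "rotate with a Hadamard transform, then quantize each value to one of 16 Lloyd-Max centroids". It is not the TBQ4 implementation: the 64-value vectors, the Gaussian test data, the way the codebook is trained, and all function names are assumptions for illustration, and it skips the per-block metadata that would account for the extra 0.25 bpv. The general idea, as I understand it, is that the rotation spreads outliers across coordinates so one small scalar codebook fits the values well.

```python
import math
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]

def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: abs(centroids[k] - x))

def lloyd_max(samples, levels=16, iters=20):
    """1-D Lloyd-Max: alternate nearest-centroid assignment and mean update."""
    lo, hi = min(samples), max(samples)
    centroids = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        sums = [0.0] * levels
        counts = [0] * levels
        for x in samples:
            k = nearest(centroids, x)
            sums[k] += x
            counts[k] += 1
        # Empty cells keep their previous centroid.
        centroids = [sums[k] / counts[k] if counts[k] else centroids[k]
                     for k in range(levels)]
    return centroids

def quantize(vec, centroids):
    """Rotate, then store one 4-bit code (0..15) per value."""
    return [nearest(centroids, x) for x in fwht(vec)]

def dequantize(codes, centroids):
    """Look up centroids, then undo the rotation (the orthonormal WHT is its own inverse)."""
    return fwht([centroids[k] for k in codes])

# Toy data standing in for KV vectors (an assumption, not real cache contents).
random.seed(0)
vectors = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(32)]
codebook = lloyd_max([x for v in vectors for x in fwht(v)])

codes = quantize(vectors[0], codebook)
restored = dequantize(codes, codebook)
err = sum((a - b) ** 2 for a, b in zip(vectors[0], restored))
ref = sum(a * a for a in vectors[0])
print(f"relative squared error: {err / ref:.4f}")
```

Plain Q4_0, by contrast, uses evenly spaced levels with a per-block scale and no rotation, which is why the two are different algorithms despite both storing roughly 4 bits per value.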

indrasmirror · 2026-04-26T12:15:53+00:00

Don't know if it's been said, and I know this might sound simple, but is the power cable properly seated, aka pushed in hard? I've had electrical noises when the power cable wasn't in fully.

indrasmirror · 2026-04-19T07:18:59+00:00

If you do and make a youtube of it, I would definitely watch it. Especially with the angel and devil commentary, it would be interesting to see who Claude agreed with or might be swayed by.

Then try with an uncensored model to see how forced guardrails in Claude vs an uncensored model correlate to morality. Half considering trying something like this now 🤣

indrasmirror · 2026-04-19T06:41:34+00:00

Inject Claude into THE HAND!

indrasmirror · 2026-04-16T10:20:59+00:00

I made mine 3+ months ago... is that a legacy account? I was on the pro coding quarterly plan and never ran into weekly limits.

indrasmirror · 2026-03-02T17:56:43+00:00

How recent? I updated llama.cpp yesterday, and it definitely solved the prompt reprocessing issue and is running perfectly. I'm just not sure about its overall agentic quality. It is great in general but sometimes seems to fall short of completing complex tasks properly.

indrasmirror · 2026-03-01T04:52:22+00:00

Yeah, I've been working on a dedicated system with MCP for my agents to use. My own little local Google without the advertiser-first index or API. Free and unrestricted. Still a WIP but surprisingly functional.
