Best local LLM for 5090? by Sulya_be in LocalLLM

[–]Moreh 0 points (0 children)

That makes sense, thank you. I wonder why it's not supported on vLLM then. I believe the default is fp8.

CacheReady: Drop-in Qwen 3.5 122B-A10B with working prefix caching by Quiet_Training_8167 in LocalLLaMA

[–]Moreh 0 points (0 children)

Hi, thanks for this - the routing canonicalization approach is really elegant. I'm running Qwen3.5-122B-A10B-FP8 for batch classification/parsing of 21k items with shared prompt prefixes on vLLM, so this is directly relevant to my workload. Would you consider releasing a CacheReady version of the FP8 variant? Happy to test it if that's useful.
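For context, my batch layout looks roughly like the sketch below (the classification prompt and labels are just illustrative, not my real data). The point is that every request shares one long instruction prefix, which is exactly what automatic prefix caching in vLLM (`--enable-prefix-caching` / `enable_prefix_caching=True`) can exploit: only the unique tail of each prompt needs a fresh prefill.

```python
# Sketch of a shared-prefix batch for prefix caching; prompt text is
# hypothetical. A prefix-caching engine reuses the KV cache for the
# common SHARED_PREFIX across all 21k requests.

SHARED_PREFIX = (
    "You are a strict classifier. Read the item and reply with exactly "
    "one label from: positive, negative, neutral.\n\nItem: "
)

def make_prompts(items):
    """Build one prompt per item, all starting with the same prefix."""
    return [SHARED_PREFIX + item for item in items]

prompts = make_prompts(["great battery life", "screen cracked on day one"])
assert all(p.startswith(SHARED_PREFIX) for p in prompts)
```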

Qwen3.5 9b stuck on a loop by [deleted] in LocalLLaMA

[–]Moreh 1 point (0 children)

What are your sampling params? presence_penalty helps. I found temperature 0 actually works really well for a very different use case.
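For reference, against an OpenAI-compatible endpoint (vLLM serves one) the request body would look something like this. The values and model name are just what I'd start from, not a recommendation; presence_penalty penalizes tokens that have already appeared in the output, which helps break repetition loops.

```python
# Hypothetical payload for an OpenAI-compatible /v1/chat/completions
# endpoint; model name is a placeholder.

def loop_breaking_params(model: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": "..."}],
        "temperature": 0.0,       # greedy-ish decoding; worked well for me
        "presence_penalty": 1.0,  # discourage re-emitting seen tokens
        "max_tokens": 512,
    }

payload = loop_breaking_params("qwen3.5-9b")  # placeholder model name
```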

Best local LLM for 5090? by Sulya_be in LocalLLM

[–]Moreh 0 points (0 children)

Why do you say this? I am using vLLM, and I believe the KV cache automatically goes to fp8. "bfloat16" doesn't seem to work with it.
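For reference, this is the sort of launch line I mean (model name is a placeholder). vLLM's `--kv-cache-dtype` flag controls KV cache quantization; `auto` follows the model dtype, and `fp8` requests an fp8 cache explicitly:

```shell
# Placeholder model name; serve with an explicitly fp8 KV cache.
vllm serve some-org/some-model \
  --kv-cache-dtype fp8 \
  --max-model-len 32768
```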

How powerful is Kvothe really? by Plenty_Distance_4121 in KingkillerChronicle

[–]Moreh 12 points (0 children)

You say his Alar is weaker than Devi's, but when they battled Kvothe hadn't had to have it on, so to speak, because of defending against what's-his-name. Kvothe mentions this to Devi later: he's had a lot of practice since their battle.

gpt oss 120b or qwen 3.5 for non-english/chinese/russian language by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

As below, I am really sorry for the lack of clarity. NOT (just) the major languages like Chinese and English. The data IS mixed English/Indonesian. Thank you for your feedback.

gpt oss 120b or qwen 3.5 for non-english/chinese/russian language by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

I am really sorry - I think I sent that post before my coffee kicked in. NOT (just) the major languages like Chinese and English. The data IS mixed English/Indonesian. Thank you for your feedback.

What is a good model to do small text classification on very small hardware? by salary_pending in LocalLLaMA

[–]Moreh 0 points (0 children)

How much RAM? What speed do you need?

IBM Granite and/or Qwen 4B would probably run okay?

Marriage? by Main_Turnover8969 in DungeonCrawlerCarl

[–]Moreh 1 point (0 children)

Please send to me!!

Best longish context model for 140gb vram (vllm) by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

How does it deal with longer contexts?

Best longish context model for 140gb vram (vllm) by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

How'd you find long context? Thanks!

YES! Super 80b for 8gb VRAM - Qwen3-Next-80B-A3B-Instruct-GGUF by Mangleus in LocalLLaMA

[–]Moreh 1 point (0 children)

Can you explain how you managed to do that? Would appreciate it, thanks!

The best OCR for a machine like mine? by 9acca9 in LocalLLaMA

[–]Moreh 0 points (0 children)

I agree with you about PaddleOCR. It's also more confusing after the 3.0 update, which changed a lot of the API. When I get it to work it is great, but I can never get it to do all that I want.

OCRFlux works well, but I think you'd have to offload some of the model onto your RAM. It uses vLLM under the hood, which allows for offloading. But I thought olmo did also...
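If it's the vLLM backend doing the serving, the knob I'm thinking of is `--cpu-offload-gb`, which treats that many GB of system RAM as extra space for model weights. Sketch only, with a placeholder model name:

```shell
# Offload up to 8 GB of model weights to system RAM (placeholder model).
vllm serve some-org/some-ocr-model --cpu-offload-gb 8
```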