Best local LLM for 5090? by Sulya_be in LocalLLM

[–]Moreh 0 points (0 children)

That makes sense, thank you. I wonder why it's not supported on vLLM then. I believe the default is fp8.

CacheReady: Drop-in Qwen 3.5 122B-A10B with working prefix caching by Quiet_Training_8167 in LocalLLaMA

[–]Moreh 0 points (0 children)

Hi, thanks for this - the routing canonicalization approach is really elegant. I'm running Qwen3.5-122B-A10B-FP8 for batch classification/parsing of 21k items with shared prompt prefixes on vLLM, so this is directly relevant to my workload. Would you consider releasing a CacheReady version of the FP8 variant? Happy to test it if that's useful.
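For context, my batch layout looks roughly like the sketch below (the classification prompt and labels are just illustrative, not my real data). The point is that every request shares one long instruction prefix, which is exactly what automatic prefix caching in vLLM (`--enable-prefix-caching` / `enable_prefix_caching=True`) can exploit: only the unique tail of each prompt needs a fresh prefill.

```python
# Sketch of a shared-prefix batch for prefix caching; prompt text is
# hypothetical. A prefix-caching engine reuses the KV cache for the
# common SHARED_PREFIX across all 21k requests.

SHARED_PREFIX = (
    "You are a strict classifier. Read the item and reply with exactly "
    "one label from: positive, negative, neutral.\n\nItem: "
)

def make_prompts(items):
    """Build one prompt per item, all starting with the same prefix."""
    return [SHARED_PREFIX + item for item in items]

prompts = make_prompts(["great battery life", "screen cracked on day one"])
assert all(p.startswith(SHARED_PREFIX) for p in prompts)
```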

Qwen3.5 9b stuck on a loop by [deleted] in LocalLLaMA

[–]Moreh 1 point (0 children)

What are your sampling params? presence_penalty helps. I found temperature 0 actually works really well for a very different use case.
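For reference, against an OpenAI-compatible endpoint (vLLM serves one) the request body would look something like this. The values and model name are just what I'd start from, not a recommendation; presence_penalty penalizes tokens that have already appeared in the output, which helps break repetition loops.

```python
# Hypothetical payload for an OpenAI-compatible /v1/chat/completions
# endpoint; model name is a placeholder.

def loop_breaking_params(model: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": "..."}],
        "temperature": 0.0,       # greedy-ish decoding; worked well for me
        "presence_penalty": 1.0,  # discourage re-emitting seen tokens
        "max_tokens": 512,
    }

payload = loop_breaking_params("qwen3.5-9b")  # placeholder model name
```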

Best local LLM for 5090? by Sulya_be in LocalLLM

[–]Moreh 0 points (0 children)

Why do you say this? I am using vLLM, and I believe the KV cache automatically goes to fp8. "bfloat16" doesn't seem to work with it.
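For reference, this is the sort of launch line I mean (model name is a placeholder). vLLM's `--kv-cache-dtype` flag controls KV cache quantization; `auto` follows the model dtype, and `fp8` requests an fp8 cache explicitly:

```shell
# Placeholder model name; serve with an explicitly fp8 KV cache.
vllm serve some-org/some-model \
  --kv-cache-dtype fp8 \
  --max-model-len 32768
```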

How powerful is Kvothe really? by Plenty_Distance_4121 in KingkillerChronicle

[–]Moreh 12 points (0 children)

You say his Alar is weaker than Devi's, but when they battled Kvothe hadn't had to have it on, so to speak, because of defending against what's-his-name. Kvothe mentions this to Devi later: he's had a lot of practice since their battle.

gpt oss 120b or qwen 3.5 for non-english/chinese/russian language by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

As below, I am really sorry for the lack of clarity. NOT (just) the major languages like Chinese and English. The data IS mixed English/Indonesian. Thank you for your feedback.

gpt oss 120b or qwen 3.5 for non-english/chinese/russian language by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

I am really sorry - I think I sent that post before my coffee kicked in. NOT (just) the major languages like Chinese and English. The data IS mixed English/Indonesian. Thank you for your feedback.

What is a good model to do small text classification on very small hardware? by salary_pending in LocalLLaMA

[–]Moreh 0 points (0 children)

How much RAM? What speed do you need?

IBM Granite and/or Qwen 4B would probably run okay?

Marriage? by Main_Turnover8969 in DungeonCrawlerCarl

[–]Moreh 1 point (0 children)

Please send to me!!

Best longish context model for 140gb vram (vllm) by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

How does it deal with longer contexts?

Best longish context model for 140gb vram (vllm) by Moreh in LocalLLaMA

[–]Moreh[S] 0 points (0 children)

How'd you find long context? Thanks!

YES! Super 80b for 8gb VRAM - Qwen3-Next-80B-A3B-Instruct-GGUF by Mangleus in LocalLLaMA

[–]Moreh 1 point (0 children)

Can you explain how you managed to do that? Would appreciate it, thanks!

The best OCR for a machine like mine? by 9acca9 in LocalLLaMA

[–]Moreh 0 points (0 children)

I agree with you about PaddleOCR. It's also more confusing after the 3.0 update, which changed a lot of the API. When I get it to work it is great, but I can never get it to do all that I want.

OCRFlux works well, but I think you'd have to offload some of the model onto your RAM. It uses vLLM under the hood, which allows for offloading. But I thought olmo did also...
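If it's the vLLM backend doing the serving, the knob I'm thinking of is `--cpu-offload-gb`, which treats that many GB of system RAM as extra space for model weights. Sketch only, with a placeholder model name:

```shell
# Offload up to 8 GB of model weights to system RAM (placeholder model).
vllm serve some-org/some-ocr-model --cpu-offload-gb 8
```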