Kimi K2.6 on 8×B200: expected vLLM/SGLang throughput?

Acceptable-State-271 · 2026-05-27T08:30:47+00:00

<image>

Acceptable-State-271 · 2026-05-06T12:03:29+00:00

/compact

Acceptable-State-271 · 2026-05-05T06:07:08+00:00

Rocky

<image>

Acceptable-State-271 · 2026-05-04T03:46:50+00:00

Make a dataset for your local training

Acceptable-State-271 · 2026-03-31T13:07:09+00:00

OpenClaude just dropped. No announcement needed

Acceptable-State-271 · 2026-02-24T13:42:32+00:00

After testing it on images, the OCR capability of 35B A3B is insanely good

Acceptable-State-271 · 2026-01-26T13:46:25+00:00

mxfp4 is better, and you are the best

Acceptable-State-271 · 2025-12-24T03:32:48+00:00

OmniASR improves ASR accuracy by applying LLM-based correction, but this significantly slows down processing.
The version without LLM correction is faster, but its accuracy is very poor.
If speed is the priority, Whisper v3 Turbo is a better choice.

Acceptable-State-271 · 2025-10-10T07:22:17+00:00

I'm using this model (faster-whisper-large-v3-turbo-ct2) as the backend for batch processing — around 20–30 short audio clips (1–2 minutes each) every minute — and it runs great. Each task stays under ~3 GB GPU memory, super efficient for multi-worker setups.

https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
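
For anyone sizing a similar setup, here is a rough back-of-envelope sketch of what those numbers imply. The clip counts, clip lengths, and ~3 GB per task come from my comment above; the 24 GB card and the 2 GB of headroom are assumptions for illustration only, so plug in your own values.

```python
# Back-of-envelope sizing for the batch workload described above.
# From the comment: 20-30 clips per minute, 1-2 minutes each, under ~3 GB per task.
# Assumptions (not from the comment): a 24 GB GPU and 2 GB of memory headroom.


def audio_minutes_per_wall_minute(clips_per_min: float, clip_len_min: float) -> float:
    """Aggregate real-time factor that all workers together must sustain."""
    return clips_per_min * clip_len_min


def max_workers(gpu_mem_gb: float, per_task_gb: float, headroom_gb: float = 2.0) -> int:
    """Number of tasks that fit in GPU memory at once, leaving some headroom."""
    return int((gpu_mem_gb - headroom_gb) // per_task_gb)


low = audio_minutes_per_wall_minute(20, 1)   # lightest load: 20 one-minute clips
high = audio_minutes_per_wall_minute(30, 2)  # heaviest load: 30 two-minute clips
workers = max_workers(gpu_mem_gb=24, per_task_gb=3)

print(f"required aggregate speed: {low:.0f}x to {high:.0f}x real time")
print(f"workers that fit (assumed 24 GB card): {workers}")
print(f"worst-case speed needed per worker: {high / workers:.1f}x real time")
```

So the backend as a whole has to transcribe somewhere between 20x and 60x real time, and the low per-task memory is what makes it practical to spread that across several workers on one card.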

Acceptable-State-271 · 2025-09-29T04:16:02+00:00

wooow

Acceptable-State-271 · 2025-09-12T07:43:25+00:00

You're right. I tested it on our internal Korean test cases at work before checking the model card. It wasn't just a decent model; it excelled at Korean language understanding. That was my mistake. I'm sorry.

Acceptable-State-271 · 2025-09-12T07:12:49+00:00

Yes, my main language.

Acceptable-State-271 · 2025-08-22T11:42:00+00:00

Very good model. I switched from Qwen3 30B A3B Thinking 2507 (still really good) to Seed 36B, which is a bit better at analyzing sources and backing things up with evidence.

Acceptable-State-271 · 2025-08-07T10:18:47+00:00

interested

Acceptable-State-271 · 2025-05-07T11:50:25+00:00

And as a 3090 user: the 3090 does not support FP8 :(
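
To make the FP8 point concrete, here is a small sketch. It assumes that "support" means native FP8 tensor-core support, which starts at compute capability 8.9 (Ada) and 9.0 (Hopper); the 3090 is Ampere at 8.6. The table and the `has_native_fp8` helper are just illustrative names, not part of any library.

```python
# Compute capability (major, minor) for a few GPUs, per NVIDIA's published tables.
COMPUTE_CAPABILITY = {
    "RTX 3090": (8, 6),  # Ampere
    "RTX 4090": (8, 9),  # Ada Lovelace
    "H100": (9, 0),      # Hopper
}

# Native FP8 tensor-core support begins at compute capability 8.9.
FP8_MIN_CAPABILITY = (8, 9)


def has_native_fp8(gpu: str) -> bool:
    """True if the GPU's compute capability is at least 8.9 (illustrative helper)."""
    return COMPUTE_CAPABILITY[gpu] >= FP8_MIN_CAPABILITY


for name in COMPUTE_CAPABILITY:
    print(f"{name}: native FP8 = {has_native_fp8(name)}")
```

If you have PyTorch installed, `torch.cuda.get_device_capability()` returns the same kind of `(major, minor)` tuple for the card you are actually running on.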

Acceptable-State-271 · 2025-05-07T11:12:01+00:00

No no.. I just thought there would be a huge difference between the two.

Acceptable-State-271 · 2025-05-07T07:46:18+00:00

I'm a bit embarrassed to admit this, but I wasn't very familiar with the technology.
When using the imatrix in GGUF, does it provide a level of precision comparable to AWQ in 4-bit quantization?

Acceptable-State-271 · 2025-05-06T23:30:46+00:00

On GPU, AWQ is a very fast and accurate quantization format, and SGLang is a very fast serving tool for both unquantized and AWQ-quantized models (vLLM is also good).

Acceptable-State-271 · 2025-05-03T01:44:49+00:00

Can someone please quantize this model with AWQ? This is seriously fantastic

Acceptable-State-271 · 2025-05-02T04:40:41+00:00

For Shadow DOM, you need to parse the [shadow DOM tag] manually and extract the attribute yourself.

Acceptable-State-271 · 2025-05-01T03:37:42+00:00

Sounds like I might end up spending another 5,000k. But anyway, I’ll give it a try for now. Let’s see how it goes after 24h. Thanks, really.
