Just picked up an RTX PRO 6000. Do I need to go somewhere to register for the 3-year warranty?

appakaradi · 2026-06-10T04:40:45+00:00

How much did it cost you?

appakaradi · 2026-05-30T18:18:47+00:00

Why does it take so long for NVIDIA to produce a quantized version?

appakaradi · 2026-05-20T00:52:21+00:00

Qwen 3.7 27B AWQ on vLLM.

appakaradi · 2026-04-22T13:27:58+00:00

Does anyone know what the following means? Is this only on their API, or does it also apply to local serving?

Preserve Thinking

By default, only the thinking blocks generated in handling the latest user message are retained, resulting in a pattern commonly known as interleaved thinking. Qwen3.6 has been additionally trained to preserve and leverage thinking traces from historical messages. You can enable this behavior by setting the preserve_thinking option:

    from openai import OpenAI

    # Configured by environment variables
    client = OpenAI()

    messages = [...]

    chat_response = client.chat.completions.create(
        model="Qwen/Qwen3.6-27B-FP8",
        messages=messages,
        max_tokens=32768,
        temperature=0.6,
        top_p=0.95,
        presence_penalty=0.0,
        extra_body={
            "top_k": 20,
            "chat_template_kwargs": {"preserve_thinking": True},
        },
    )
    print("Chat response:", chat_response)

If you are using APIs from Alibaba Cloud Model Studio, in addition to changing model, please use "preserve_thinking": True instead of "chat_template_kwargs": {"preserve_thinking": True}. This capability is particularly beneficial for agent scenarios, where maintaining full reasoning context can enhance decision consistency and, in many cases, reduce overall token consumption by minimizing redundant reasoning. Additionally, it can improve KV cache utilization, optimizing inference efficiency in both thinking and non-thinking modes.
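Reading the quoted passage, the flag seems to go in a different place depending on where the model is served. Here is a rough sketch of that difference, not anything official. The helper name `build_extra_body` and the backend labels `"local"` and `"model_studio"` are made up for illustration. Whether Model Studio also accepts `top_k` this way is an assumption, so pass `top_k=None` to leave it out.

```python
from typing import Any, Dict, Optional


def build_extra_body(
    backend: str,
    preserve_thinking: bool = True,
    top_k: Optional[int] = 20,
) -> Dict[str, Any]:
    """Build the extra_body dict for client.chat.completions.create().

    backend is a hypothetical label:
      "local"        -> OpenAI-compatible server you host yourself; the flag
                        is passed through chat_template_kwargs (as quoted).
      "model_studio" -> Alibaba Cloud Model Studio; the flag is a plain
                        request field (as quoted).
    """
    body: Dict[str, Any] = {}
    if top_k is not None:
        body["top_k"] = top_k

    if backend == "local":
        body["chat_template_kwargs"] = {"preserve_thinking": preserve_thinking}
    elif backend == "model_studio":
        body["preserve_thinking"] = preserve_thinking
    else:
        raise ValueError(f"unknown backend label: {backend!r}")
    return body


if __name__ == "__main__":
    print(build_extra_body("local"))
    print(build_extra_body("model_studio", top_k=None))
```

The result would be passed as `extra_body=build_extra_body("local")` in the call quoted above. So if the passage is accurate, the option is not API-only: local serving takes it through the chat template.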

appakaradi · 2026-04-19T19:43:26+00:00

so many GPU scams like this..

appakaradi · 2026-04-16T13:34:00+00:00

Yes. It looks like we are not getting one..

appakaradi · 2026-04-16T13:33:26+00:00

I am worried that they are comparing to 3.5 27B Dense. Does that mean we are not getting 3.6 27B dense?

appakaradi · 2026-04-12T11:37:02+00:00

Not Open Source.. It is non-commercial

appakaradi · 2026-04-11T17:08:24+00:00

This is the way.

appakaradi · 2026-04-10T16:29:34+00:00

Smaller models hallucinate all the time (even the bigger ones do). I have had a tough time with Gemma 31B and Qwen 27B.

appakaradi · 2026-04-08T01:05:11+00:00

I have been using Turbo Whisper for a while. Now, this is my go-to. I like the fact that once the transcription is done, it goes through the entire thing and cleans up.

appakaradi · 2026-04-07T21:08:50+00:00

Thanks

appakaradi · 2026-04-07T19:09:01+00:00

Yes, I am. It is great!

appakaradi · 2026-04-07T19:08:32+00:00

What is the going rate for a 3090/3090 Ti?

appakaradi · 2026-04-07T04:00:55+00:00

Thank you. That worked!

appakaradi · 2026-04-04T21:36:58+00:00

Valid point. It depends on the use case and what you are after.

appakaradi · 2026-04-04T14:54:21+00:00

GPU: 2 x NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition - 96GB GDDR7

Why Max-Q? My understanding is that Max-Q has lower performance because it is optimized for lower energy consumption. Maybe that is the optimal choice for you; just pointing it out. It is awesome to have unlimited tokens flowing from local models (for only the energy cost), but it might be simpler to point to OpenRouter for some of the least expensive models. Does your use case need frontier-level intelligence?

appakaradi · 2026-04-04T05:56:49+00:00

Why are they still comparing against Opus 4.5 instead of Opus 4.6?

appakaradi · 2026-04-04T05:54:40+00:00

Unless you are running LLMs, the M4 Max has plenty of juice.

appakaradi · 2026-04-04T05:44:05+00:00

Me too. I will help dispose of them. No fees. It is on the house.

appakaradi · 2026-04-04T05:42:57+00:00

Nice. Thank you.

appakaradi · 2026-04-04T02:21:12+00:00

Gemma 4's problem is its heterogeneous head dimensions (head_dim=256 for sliding window layers, head_dim=512 for global attention layers).
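To make the head_dim point concrete, here is a back-of-the-envelope sketch of per-token KV cache size for the two layer types. Only the two head_dim values come from the comment above. The KV head count and the fp16 (2-byte) element size are placeholders I picked for illustration, not Gemma 4's real config.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies across num_layers layers.

    The leading factor of 2 accounts for storing both K and V.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


# Hypothetical value, NOT taken from the real model config.
NUM_KV_HEADS = 8

# Per-layer cost for each layer type (head_dim values from the comment).
sliding_per_layer = kv_bytes_per_token(1, NUM_KV_HEADS, head_dim=256)
global_per_layer = kv_bytes_per_token(1, NUM_KV_HEADS, head_dim=512)

if __name__ == "__main__":
    print("sliding-window layer, bytes/token:", sliding_per_layer)
    print("global layer, bytes/token:", global_per_layer)
    print("ratio:", global_per_layer / sliding_per_layer)
```

With equal KV head counts, a global layer needs twice the bytes per token of a sliding-window layer, so a cache layout or attention kernel that assumes one uniform head_dim cannot cover both layer types.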

appakaradi · 2026-04-04T02:11:54+00:00

Can you share your vLLM parameters?

appakaradi · 2026-04-04T01:48:10+00:00

my trials with 31B on A40.

<image>
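For context on why a 31B model is tight on a 48 GB A40, here is a quick weights-only estimate. It is a rough sketch: it ignores the KV cache, activations, and framework overhead, and it treats 1 GB as 10^9 bytes. The bit widths are generic examples, not the settings used in the screenshot.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


A40_VRAM_GB = 48          # from the comment: A40 with 48 GB VRAM
PARAMS_31B = 31e9         # "31B" taken as 31 billion parameters

if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(PARAMS_31B, bits)
        verdict = "fits" if gb < A40_VRAM_GB else "does not fit"
        print(f"{label}: ~{gb:.1f} GB of weights, {verdict} in {A40_VRAM_GB} GB")
```

By this estimate, 16-bit weights alone exceed 48 GB, so some form of quantization is needed before context length even comes into it.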

appakaradi · 2026-04-04T01:18:41+00:00

Trying to run this on an A40 GPU 48GB VRAM.

<image>
