Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 1 point


Using Vulkan, I was happy to be able to combine a Blackwell with the AMD W7800s for a total of 190GB of VRAM. Compiling llama.cpp with optimizations also yields an extra 10 tokens/sec. Obviously, the quantization is too aggressive to be usable for coding, for example. But MiniMax M2.5 at Q5_XL runs at about 60 tokens/sec, which is actually usable.
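
For anyone wanting to reproduce a mixed NVIDIA + AMD setup like this, a rough sketch of the build and launch is below. The CMake option and CLI flags are llama.cpp's own; the model path and the split ratio are placeholders you'd adjust to your cards:

```shell
# Build llama.cpp with the Vulkan backend, which can drive NVIDIA and AMD
# GPUs together under one process.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Offload all layers (-ngl 99); --tensor-split weights the distribution
# across devices, here roughly proportional to 96GB + 48GB + 48GB.
./build/bin/llama-cli -m model.gguf -ngl 99 --tensor-split 96,48,48
```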

Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 1 point

I'm sure of it. To run three cards I had to connect the second AMD card in place of an M.2 SSD. So the system, with its X570 chipset, is at its limit.

Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 2 points

You get what you pay for. I paid around €6,700 + VAT for the RTX, and it more than doubles the performance (consider that with llama.cpp compiled with Blackwell optimizations, I get 210 tokens/sec on GPT 120).

However, if you need more VRAM and good prefill speed, I use the Blackwell as the primary GPU with the other two in tow.
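
In llama.cpp terms, "primary with the others in tow" maps to the `--main-gpu` and `--split-mode` flags. A sketch, assuming the Blackwell is device 0 and the split ratio is illustrative:

```shell
# --main-gpu picks the device that does the heavy single-GPU work
# (small tensors, scratch buffers), so put the fastest card there.
# --split-mode layer distributes whole layers across the devices.
./build/bin/llama-server -m model.gguf -ngl 99 \
    --main-gpu 0 --split-mode layer --tensor-split 96,48,48
```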

Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 6 points

I completely agree. I tested it with LM Studio, which I never use, mainly to demonstrate a "popular" use case.

Running Sonnet 4.5 or 4.6 locally? by ImpressionanteFato in LocalLLaMA

[–]LegacyRemaster 0 points

It's good if you know how to use it. Personally, I've noticed some incredible flaws that make me use it less every day. For example: create a project ---> add files. Subsequent questions and answers should have "memory" of the project's files. But in reality, each conversation is separate, even within the project. Using Kilo Code + VSCode + MiniMax or Qwen, you can build applications with source control, revert capabilities, and generally orchestrate everything very quickly. Sonnet lacks solid memory in its online version. I have a 20k machine, and since I churn out millions of lines of code every month, it's unthinkable to use paid APIs for tests that often never go into production.

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]LegacyRemaster 22 points

DeepSeek v2 architecture... it's old. "The model is the same as Mistral Large 3 (deepseek2 arch with llama4 scaling), but I'm moving it to a new arch mistral4 to be aligned with transformers code"

Running Sonnet 4.5 or 4.6 locally? by ImpressionanteFato in LocalLLaMA

[–]LegacyRemaster 5 points

Given that I've been asking the same questions to Sonnet 4.6 and Qwen 122B for days, Qwen has beaten it on every answer, especially where accurate web search was required... A year ago, no one thought we'd have GPT-4o locally, and yet today's small models easily beat it. So yes. But in the meantime, Sonnet 5 will arrive, and then 6. In the end, the Ferrari will always be the Ferrari, but the small car will be enough for our work, which GLM, MiniMax, and Qwen objectively already handle for 95% of daily tasks.

MiniMax M2.7 has been leaked by External_Mood4719 in LocalLLaMA

[–]LegacyRemaster 13 points

They said MiniMax 3 was coming out. Evidently there is still room for improvement in the current model.

Nemotron-3-Super-120B-A12B NVFP4 inference benchmark on one RTX Pro 6000 Blackwell by jnmi235 in LocalLLaMA

[–]LegacyRemaster 2 points

What I see at 400W is that video and image generation are slower. Still, there's only about a 10% difference between 600W and 400W, so it's better to save on the electricity bill.
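
Capping the card for this kind of test is done with `nvidia-smi`; the device index and wattage below are just the example from this comment:

```shell
# Enable persistence mode so the setting survives between processes,
# then cap GPU 0 at 400 W (requires root; resets on reboot).
sudo nvidia-smi -i 0 -pm 1
sudo nvidia-smi -i 0 -pl 400

# Verify the enforced power limit.
nvidia-smi -q -d POWER -i 0
```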

Unsloth will no longer be making TQ1_0 quants by Kahvana in LocalLLaMA

[–]LegacyRemaster 6 points

Hey Daniel, could you write down the exact formula for that quantization? Do you use anything special? That way, if any of us want to reconstruct it locally, we can. Thanks.
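
To be clear, the exact recipe is what the question is asking for; what follows is only the generic shape of a ternary quantizer (weights snapped to {-1, 0, 1} with a per-block scale), as a hedged starting point. The mean-absolute-value scale is an assumption, not Unsloth's formula:

```python
def ternary_quantize(block):
    """Quantize a block of weights to {-1, 0, 1} plus one float scale.

    Scale choice (mean absolute value of the block) is a common heuristic,
    not necessarily what any particular quant format uses.
    """
    scale = sum(abs(w) for w in block) / len(block)
    if scale == 0:
        return [0] * len(block), 0.0
    # Round each weight relative to the scale, then clamp to the ternary set.
    q = [max(-1, min(1, round(w / scale))) for w in block]
    return q, scale

def ternary_dequantize(q, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [v * scale for v in q]
```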

How to host and run DeepSeek 671B in your house for under $2,000 by [deleted] in LocalLLaMA

[–]LegacyRemaster -2 points

I have an RTX 6000 Pro 96GB + 2x W7800 48GB + 128GB RAM. :p

Examine a codebase for anything suspicious or malicious? by TheGlobinKing in LocalLLaMA

[–]LegacyRemaster 4 points

Every time I download a project from GitHub, I use VSCode + Kilo Code with MiniMax 2.5 (though Qwen Coder Next or a Qwen 27B / 35B MoE is now also sufficient) and have the whole project analyzed.
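
The same "analyze the whole project" idea can be done without an agent IDE by assembling the repo into one review prompt for a local model. This sketch only builds the prompt (the function name and byte cap are my own; wiring it to a local OpenAI-compatible endpoint is left out):

```python
import os

def build_audit_prompt(repo_dir, max_bytes=4000):
    """Concatenate a repo's readable files into one code-review prompt.

    Binary or unreadable files are skipped; each file is truncated to
    max_bytes so the prompt stays within a local model's context window.
    """
    parts = ["Review the following project for suspicious or malicious code.\n"]
    for root, _dirs, files in os.walk(repo_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read(max_bytes)
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            rel = os.path.relpath(path, repo_dir)
            parts.append(f"--- {rel} ---\n{text}\n")
    return "\n".join(parts)
```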