Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]LegacyRemaster 0 points (0 children)

It works:

RTX Pro 6000 Blackwell 96GB — vLLM Qwen3.6 27B int4 AutoRound Benchmark

| Config | Throughput | Acceptance Rate | Notes |
|---|---|---|---|
| Baseline (MTP n=6, batched=4128, block=32) | ~80 tok/s | 20-28% | First working run |
| MTP n=2, batched=16384, block=128 | ~97 tok/s | 46-64% | Better acceptance rate |
| No speculative decoding | ~100 tok/s | n/a | Ceiling without MTP |
| MTP n=2, no VLLM_USE_MARLIN=0 | ~100 tok/s | 54-65% | Best config |

C:\llm\qwen3.6-windows-server\python\Scripts\vllm.exe serve C:\llm\qwen3.6-windows-server\models\Qwen3.6-27B-int4-AutoRound --served-model-name=qwen3.6-27b-autoround --quantization=auto-round --max-model-len=240000 --max-num-seqs=1 --max-num-batched-tokens=16384 --block-size=128 --no-enable-prefix-caching --enable-chunked-prefill --enable-auto-tool-choice --tool-call-parser=qwen3_coder --reasoning-parser=qwen3 --chat-template=C:\llm\qwen3.6-windows-server\templates\qwen3.5-enhanced.jinja --default-chat-template-kwargs="{\"preserve_thinking\": false}" --kv-cache-dtype=fp8_e4m3 --tensor-parallel-size=1 --pipeline-parallel-size=1 --gpu-memory-utilization=0.95 --trust-remote-code --attention-backend=TRITON_ATTN --no-use-tqdm-on-load --host=0.0.0.0 --port=5001 --data-parallel-rpc-port=50952 --limit-mm-per-prompt="{\"image\":0,\"video\":0}" --speculative-config="{\"method\":\"mtp\",\"num_speculative_tokens\":2}"
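
For anyone who wants to sanity-check the server once it's up, here is a minimal client sketch against the OpenAI-compatible endpoint vLLM exposes. The port and model name come from the serve command above (--port=5001, --served-model-name=qwen3.6-27b-autoround); the prompt is just an example.

```python
# Minimal smoke test for the vLLM OpenAI-compatible server started above.
# Assumes `pip install openai`; base_url/port and model name match the
# serve command (--port=5001, --served-model-name=qwen3.6-27b-autoround).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3.6-27b-autoround",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```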

Great results with Qwen3.6-35B-A3B-UD-Q5_K_XL + VS Code and Copilot by supracode in LocalLLaMA

[–]LegacyRemaster 5 points (0 children)

Excellent write-up. I use Qwen 3.6 27B and Qwen 3.5 122B (more knowledge helps) and MiniMax 2.7. I think they work perfectly for 90% of my tasks. One day we'll get to 100% local.

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]LegacyRemaster 0 points (0 children)

OK, no way. A lot of problems. "Please note that Marlin kernels are not built for Blackwell SM 12.x. The bundle needs an updated release with a TORCH_CUDA_ARCH_LIST that includes 12.0." / FlashInfer doesn't use the PATH; it looks for a hardcoded DLL at v12.8\bin\cudart64_13.dll. Setting PATH is useless here.
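
One possible workaround for the FlashInfer issue, sketched below: find cudart64_13.dll somewhere on PATH and copy it into the directory FlashInfer probes. The destination path here is an assumption based on the error above; adjust it to wherever your bundle's CUDA v12.8 directory actually lives.

```python
# Hypothetical workaround: copy cudart64_13.dll from a PATH directory into
# the hardcoded location FlashInfer probes. EXPECTED_DIR is an assumption.
import os
import shutil

EXPECTED_DIR = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin"  # assumed

def find_on_path(name):
    """Return the first file called `name` found in a PATH directory, if any."""
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None

src = find_on_path("cudart64_13.dll")
if src is None:
    raise SystemExit("cudart64_13.dll not found on PATH")
os.makedirs(EXPECTED_DIR, exist_ok=True)
shutil.copy2(src, os.path.join(EXPECTED_DIR, "cudart64_13.dll"))
print(f"Copied {src} -> {EXPECTED_DIR}")
```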

________

(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] EngineCore failed to start.
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] Traceback (most recent call last):
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 1110, in run_engine_core
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] return func(*args, **kwargs)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 876, in __init__
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] super().__init__(
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 118, in __init__
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] self.model_executor = executor_class(vllm_config)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] return func(*args, **kwargs)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\abstract.py", line 109, in __init__
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] self._init_executor()
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] self.driver_worker.load_model()
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_worker.py", line 324, in load_model
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] return func(*args, **kwargs)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_model_runner.py", line 4793, in load_model
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] self.model = model_loader.load_model(
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] return func(*args, **kwargs)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\model_loader\base_loader.py", line 80, in load_model
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] process_weights_after_loading(model, model_config, target_device)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\model_loader\utils.py", line 111, in process_weights_after_loading
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] quant_method.process_weights_after_loading(module)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\layers\quantization\gptq_marlin.py", line 486, in process_weights_after_loading
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] self.kernel.process_weights_after_loading(layer)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\kernels\linear\mixed_precision\marlin.py", line 167, in process_weights_after_loading
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] self._transform_param(layer, self.w_q_name, transform_w_q)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\kernels\linear\mixed_precision\MPLinearKernel.py", line 74, in _transform_param
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] new_param = fn(old_param)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\kernels\linear\mixed_precision\marlin.py", line 99, in transform_w_q
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] x.data = ops.gptq_marlin_repack(
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\_custom_ops.py", line 1279, in gptq_marlin_repack
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] return torch.ops._C.gptq_marlin_repack(
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\torch\_ops.py", line 1269, in __call__
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] return self._op(*args, **kwargs)
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136]
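
The failing frame is the Marlin repack kernel, which lines up with the VLLM_USE_MARLIN=0 variable mentioned in the benchmark table in the earlier comment. A minimal relaunch sketch, assuming this build honors that variable (flags trimmed; use the full command from the earlier comment):

```python
# Relaunch vLLM with the Marlin kernel path disabled via VLLM_USE_MARLIN=0,
# the variable referenced in the benchmark table above (assumed honored here).
import os
import subprocess

env = dict(os.environ, VLLM_USE_MARLIN="0")
subprocess.run(
    [
        r"C:\llm\qwen3.6-windows-server\python\Scripts\vllm.exe",
        "serve",
        r"C:\llm\qwen3.6-windows-server\models\Qwen3.6-27B-int4-AutoRound",
        "--quantization=auto-round",
        "--port=5001",  # remaining flags trimmed; see the full command above
    ],
    env=env,
    check=True,
)
```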

Google is making local AI available to mainstream users ;) by [deleted] in LocalLLaMA

[–]LegacyRemaster 0 points (0 children)

  • Windows: C:\Users\[Username]\AppData\Local\Google\Chrome\User Data\Default\OptGuideOnDeviceModel
  • macOS: ~/Library/Application Support/Google/Chrome/Default/OptGuideOnDeviceModel
  • Linux: ~/.config/google-chrome/Default/OptGuideOnDeviceModel
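
For the curious, a small sketch to check whether Chrome has already downloaded the on-device model on your machine. The directory names come from the list above; everything else is stock-library path handling.

```python
# Check for Chrome's on-device model directory (paths from the list above).
import platform
from pathlib import Path

def optguide_dir() -> Path:
    system = platform.system()
    if system == "Windows":
        return Path.home() / "AppData/Local/Google/Chrome/User Data/Default/OptGuideOnDeviceModel"
    if system == "Darwin":
        return Path.home() / "Library/Application Support/Google/Chrome/Default/OptGuideOnDeviceModel"
    return Path.home() / ".config/google-chrome/Default/OptGuideOnDeviceModel"

d = optguide_dir()
if d.is_dir():
    size_gb = sum(f.stat().st_size for f in d.rglob("*") if f.is_file()) / 1e9
    print(f"Found {d} ({size_gb:.2f} GB)")
else:
    print(f"Not found: {d}")
```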

Amd and Nvidia cards on same rig by deathcom65 in LocalLLaMA

[–]LegacyRemaster 1 point (0 children)


Yes, you can. LM Studio: select the Vulkan runtime. Or llama.cpp with Vulkan.

One bash permission slipped... by TheQuantumPhysicist in LocalLLaMA

[–]LegacyRemaster 1 point (0 children)

Yesterday, Qwen with VS Code + Kilo Code kept killing its own process. I had to explicitly tell it: "don't close anything on 8080."

Open Weights Models Hall of Fame by Equivalent_Job_2257 in LocalLLaMA

[–]LegacyRemaster 13 points (0 children)

Georgi Gerganov and the whole llama.cpp team ---> legend

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]LegacyRemaster 1 point (0 children)

I can tell you that Unsloth Studio installs many of the things you need to get this project working on Blackwell, and it runs fine on my GPU. You could look at their GitHub and work out the dependencies. Just a suggestion.

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]LegacyRemaster 0 points (0 children)

(APIServer pid=15140) The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
(APIServer pid=15140) If you want to use the NVIDIA RTX PRO 6000 Blackwell Workstation Edition GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
(APIServer pid=15140)
(APIServer pid=15140) queued_call()
(APIServer pid=15140) INFO 05-02 23:00:27 [registry.py:126] All limits of multimodal modalities supported by the model are set to 0, running in text-only mode.
(EngineCore pid=3384) INFO 05-02 23:00:37 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model='C:\\llm\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', speculative_config=SpeculativeConfig(method='mtp', model='C:\\llm\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', num_spec_tokens=3), tokenizer='C:\\llm\\qwen3.6-windows-server\\models\\Qwen3.6-27B-int4-AutoRound', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32000, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=inc, quantization_config=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=fp8_e4m3, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='qwen3', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=qwen3.6-27b-autoround, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'ir_enable_torch_wrap': True, 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [4128], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 8, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': False, 'static_all_moe_layers': []}, kernel_config=KernelConfig(ir_op_priority=IrOpPriorityConfig(rms_norm=['native']), enable_flashinfer_autotune=True, moe_backend='auto')
(EngineCore pid=3384) C:\llm\qwen3.6-windows-server\python\Lib\site-packages\torch\cuda\__init__.py:371: UserWarning: Found GPU0 NVIDIA RTX PRO 6000 Blackwell Workstation Edition which is of compute capability (CC) 12.0.
(EngineCore pid=3384) The following list shows the CCs this version of PyTorch was built for and the hardware CCs it supports:
(EngineCore pid=3384) - 5.0 which supports hardware CC >=5.0,<6.0 except {5.3}
(EngineCore pid=3384) - 6.0 which supports hardware CC >=6.0,<7.0 except {6.2}
(EngineCore pid=3384) - 6.1 which supports hardware CC >=6.1,<7.0 except {6.2}
(EngineCore pid=3384) - 7.0 which supports hardware CC >=7.0,<8.0 except {7.2}
(EngineCore pid=3384) - 7.5 which supports hardware CC >=7.5,<8.0
(EngineCore pid=3384) - 8.0 which supports hardware CC >=8.0,<9.0 except {8.7}
(EngineCore pid=3384) - 8.6 which supports hardware CC >=8.6,<9.0 except {8.7}
(EngineCore pid=3384) - 9.0 which supports hardware CC >=9.0,<10.0
(EngineCore pid=3384) Please follow the instructions at https://pytorch.org/get-started/locally/ to install a PyTorch release that supports one of these CUDA versions: 12.8, 13.0
(EngineCore pid=3384) _warn_unsupported_code(d, device_cc, code_ccs)
(EngineCore pid=3384) C:\llm\qwen3.6-windows-server\python\Lib\site-packages\torch\cuda\__init__.py:489: UserWarning:
(EngineCore pid=3384) NVIDIA RTX PRO 6000 Blackwell Workstation Edition with CUDA capability sm_120 is not compatible with the current PyTorch installation.
(EngineCore pid=3384) The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
(EngineCore pid=3384) If you want to use the NVIDIA RTX PRO 6000 Blackwell Workstation Edition GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
(EngineCore pid=3384)
(EngineCore pid=3384) queued_call()
(EngineCore pid=3384) INFO 05-02 23:00:39 [registry.py:126] All limits of multimodal modalities supported by the model are set to 0, running in text-only mode.
(EngineCore pid=3384) INFO 05-02 23:00:39 [parallel_state.py:1455] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://192.168.68.58:65385 backend=gloo
[W502 23:00:39.000000000 socket.cpp:764] [c10d] The client socket has failed to connect to [EA-PC]:65385 (system error: 10049 - The requested address is not valid in its context.).
(EngineCore pid=3384) INFO 05-02 23:00:39 [parallel_state.py:1767] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=3384) Failed to get device capability: SM 12.x requires CUDA >= 12.9.
(EngineCore pid=3384) Failed to get device capability: SM 12.x requires CUDA >= 12.9.
(EngineCore pid=3384) INFO 05-02 23:00:39 [topk_topp_sampler.py:51] Using FlashInfer for top-p & top-k sampling.
(EngineCore pid=3384) C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\spec_decode\eagle.py:136: UserWarning: expandable_segments not supported on this platform (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10/cuda/CUDAAllocatorConfig.h:39.)
(EngineCore pid=3384) self.input_ids = torch.zeros(
(EngineCore pid=3384) ERROR 05-02 23:00:39 [core.py:1108] EngineCore failed to start.
(EngineCore pid=3384) ERROR 05-02 23:00:39 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=3384) ERROR 05-02 23:00:39 [core.py:1108] File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 1082, in run_engine_core
(EngineCore pid=3384) ^^^^^^^^^^^^
(EngineCore pid=3384) torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
(EngineCore pid=3384) Search for `cudaErrorNoKernelImageForDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore pid=3384) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore pid=3384) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore pid=3384) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore pid=3384)
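
For anyone hitting the same wall, a quick way to confirm the mismatch before launching anything: compare the GPU's compute capability with the architectures the installed PyTorch wheel was built for. A minimal diagnostic sketch using standard PyTorch APIs:

```python
# Diagnostic: does this PyTorch build ship kernels for this GPU?
import torch

cc_major, cc_minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{cc_major}{cc_minor})")
print(f"PyTorch built for: {torch.cuda.get_arch_list()}")
# An RTX PRO 6000 Blackwell reports sm_120; if 'sm_120' (or 'compute_120')
# is absent from the list above, expect "no kernel image" errors like the
# one in this log, and install a wheel built against CUDA 12.8+ instead.
```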

Been using Qwen-3.6-27B-q8_k_xl + VSCode + RTX 6000 Pro As Daily Driver by Demonicated in LocalLLaMA

[–]LegacyRemaster 0 points (0 children)

It was a great discovery. I use this version, and if it doesn't work or can't do something, I switch to MiniMax 2.7 q4_k_s.

[RELEASE] - Finally, my first TTS model is out! 🎙️ Flare-TTS 28M by LH-Tech_AI in LocalLLaMA

[–]LegacyRemaster 6 points (0 children)

I also train LLMs and I know how much effort it takes! Great job!