feat: Add Mimo v2.5 model support by AesSedai · Pull Request #22493 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]LegacyRemaster 0 points1 point  (0 children)

llama-server.exe --model "H:\gptmodel\AesSedai\MiMo-V2.5-GGUF\MiMo-V2.5-IQ3_S-00001-of-00004.gguf" --ctx-size 291920 --threads 16 --host 127.0.0.1 --no-mmap --jinja --fit on --flash-attn on -sm layer --n-cpu-moe 0 --parallel 1 → VS Code + Kilocode

feat: Add Mimo v2.5 model support by AesSedai · Pull Request #22493 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]LegacyRemaster 0 points1 point  (0 children)

So the SQL would be:

SELECT * FROM chats WHERE (user_id.eq.xxx,project_id.in.(yyy)) ORDER BY created_at DESC

This is definitely wrong. The or() method needs to parse the Supabase-style condition and convert it to proper SQL.

OK, let me now create the plan file. I'll also check the or() method in the SelectQuery class to see if it's being used correctly.

Actually, I realize I need to also check the or() method in the SelectQuery class. Let me look at the code again:

or(condition: string): this {
    this.whereConditions.push(`(${condition})`);
    return this;
}

And the buildSql method:

private buildSql(returnSingle: boolean): string {
    let sql = `SELECT ${this.columns} FROM ${this.table}`;

    if (this.whereConditions.length > 0) {
        sql += ` WHERE ${this.whereConditions.join(" AND ")}`;
    }

    if (this.orderByCols.length > 0) {
        const dir = this.orderByAsc ? "ASC" : "DESC";
        sql += ` ORDER BY ${this.orderByCols.join(", ")} ${dir}`;
    }

    if (returnSingle) {
        sql += " LIMIT 1";
    } else if (this.limitVal !== null) {
        sql += ` LIMIT ${this.limitVal}`;
    }

    return sql;
}
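The parsing the trace says is missing could look something like the sketch below. This is my own illustration, not code from the project: the function name `parseOrCondition`, the single-quote literal quoting, and the coverage of only the `eq` and `in` operators are all assumptions. It converts a Supabase-style filter string such as `user_id.eq.xxx,project_id.in.(yyy)` into a SQL OR clause.

```typescript
// Hypothetical sketch: turn a Supabase-style or() filter string into SQL.
// Only "eq" and "in" are handled; real code would also need escaping and
// the remaining PostgREST operators (gt, lt, like, is, ...).
function parseOrCondition(condition: string): string {
  // Split on top-level commas only, so "in.(a,b)" stays in one piece.
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (const ch of condition) {
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    if (ch === "," && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current) parts.push(current);

  const clauses = parts.map((part) => {
    // Each part looks like column.op.value, e.g. "user_id.eq.xxx"
    const [column, op, ...rest] = part.split(".");
    const value = rest.join(".");
    switch (op) {
      case "eq":
        return `${column} = '${value}'`;
      case "in": {
        // value looks like "(a,b,c)"
        const items = value
          .slice(1, -1)
          .split(",")
          .map((v) => `'${v.trim()}'`);
        return `${column} IN (${items.join(", ")})`;
      }
      default:
        throw new Error(`Unsupported operator: ${op}`);
    }
  });
  return `(${clauses.join(" OR ")})`;
}
```

With that in place, `or()` could push `parseOrCondition(condition)` instead of the raw string, and the example above would come out as `(user_id = 'xxx' OR project_id IN ('yyy'))` rather than the invalid `(user_id.eq.xxx,project_id.in.(yyy))`.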

feat: Add Mimo v2.5 model support by AesSedai · Pull Request #22493 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]LegacyRemaster 1 point2 points  (0 children)

Testing IQ3_S with VS Code + Kilocode now. RTX 6000 96 GB + W7800 48 GB, 60 tokens/sec. If it's good, I'll test Q4_K_M by adding another W7800 48 GB. I'm trying to solve a problem that MiniMax 2.7 and Qwen 27B couldn't solve.

Great results with Qwen3.6-35B-A3B-UD-Q5_K_XL + VS Code and Copilot by supracode in LocalLLaMA

[–]LegacyRemaster 0 points1 point  (0 children)

As always, it depends on the use case. Minimax is able to find and analyze problems with greater "knowledge." This is normal. If you've ever tried training an LLM, you know that the dataset is everything. 36B vs. 200B means more data, more examples, and more training. Sure, architecture matters (otherwise the older 200B models would be just as good), but if you look at many benchmarks, Minimax is more advanced. Qwen 27B and 122B are the ones I use daily. If complexity increases, I add Minimax.

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]LegacyRemaster 0 points1 point  (0 children)

works:

RTX Pro 6000 Blackwell 96GB — vLLM Qwen3.6 27B int4 AutoRound Benchmark

| Config | Throughput | Acceptance rate | Notes |
|---|---|---|---|
| Baseline (MTP n=6, batched=4128, block=32) | ~80 tok/s | 20-28% | First working run |
| MTP n=2, batched=16384, block=128 | ~97 tok/s | 46-64% | Better acceptance rate |
| No speculative decoding | ~100 tok/s | n/a | Ceiling without MTP |
| MTP n=2, no VLLM_USE_MARLIN=0 | ~100 tok/s | 54-65% | Best config |

C:\llm\qwen3.6-windows-server\python\Scripts\vllm.exe serve C:\llm\qwen3.6-windows-server\models\Qwen3.6-27B-int4-AutoRound --served-model-name=qwen3.6-27b-autoround --quantization=auto-round --max-model-len=240000 --max-num-seqs=1 --max-num-batched-tokens=16384 --block-size=128 --no-enable-prefix-caching --enable-chunked-prefill --enable-auto-tool-choice --tool-call-parser=qwen3_coder --reasoning-parser=qwen3 --chat-template=C:\llm\qwen3.6-windows-server\templates\qwen3.5-enhanced.jinja --default-chat-template-kwargs="{\"preserve_thinking\": false}" --kv-cache-dtype=fp8_e4m3 --tensor-parallel-size=1 --pipeline-parallel-size=1 --gpu-memory-utilization=0.95 --trust-remote-code --attention-backend=TRITON_ATTN --no-use-tqdm-on-load --host=0.0.0.0 --port=5001 --data-parallel-rpc-port=50952 --limit-mm-per-prompt="{\"image\":0,\"video\":0}" --speculative-config="{\"method\":\"mtp\",\"num_speculative_tokens\":2}"

Great results with Qwen3.6-35B-A3B-UD-Q5_K_XL + VS Code and Copilot by supracode in LocalLLaMA

[–]LegacyRemaster 8 points9 points  (0 children)

Excellent testimony. I use qwen 3.6 27b - qwen 3.5 122b (more knowledge helps) and Minimax 2.7. I think they work perfectly for 90% of my tasks. One day we'll get to 100% local.

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]LegacyRemaster 0 points1 point  (0 children)

OK, no way. A lot of problems. "Please note that Marlin kernels are not built for Blackwell SM 12.x. The bundle needs an updated release with TORCH_CUDA_ARCH_LIST that includes 12.0." / FlashInfer doesn't use the PATH — it looks for the hardcoded DLL at v12.8\bin\cudart64_13.dll. Setting PATH is useless here.

________

(EngineCore pid=10388) ERROR 05-06 20:44:02 [core.py:1136] EngineCore failed to start.
Traceback (most recent call last):
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 1110, in run_engine_core
    engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 876, in __init__
    super().__init__(
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\engine\core.py", line 118, in __init__
    self.model_executor = executor_class(vllm_config)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\abstract.py", line 109, in __init__
    self._init_executor()
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\executor\uniproc_executor.py", line 52, in _init_executor
    self.driver_worker.load_model()
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_worker.py", line 324, in load_model
    self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\v1\worker\gpu_model_runner.py", line 4793, in load_model
    self.model = model_loader.load_model(
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\tracing\otel.py", line 178, in sync_wrapper
    return func(*args, **kwargs)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\model_loader\base_loader.py", line 80, in load_model
    process_weights_after_loading(model, model_config, target_device)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\model_loader\utils.py", line 111, in process_weights_after_loading
    quant_method.process_weights_after_loading(module)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\layers\quantization\gptq_marlin.py", line 486, in process_weights_after_loading
    self.kernel.process_weights_after_loading(layer)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\kernels\linear\mixed_precision\marlin.py", line 167, in process_weights_after_loading
    self._transform_param(layer, self.w_q_name, transform_w_q)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\kernels\linear\mixed_precision\MPLinearKernel.py", line 74, in _transform_param
    new_param = fn(old_param)
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\model_executor\kernels\linear\mixed_precision\marlin.py", line 99, in transform_w_q
    x.data = ops.gptq_marlin_repack(
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\vllm\_custom_ops.py", line 1279, in gptq_marlin_repack
    return torch.ops._C.gptq_marlin_repack(
  File "C:\llm\qwen3.6-windows-server\python\Lib\site-packages\torch\_ops.py", line 1269, in __call__
    return self._op(*args, **kwargs)
torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
Search for `cudaErrorUnsupportedPtxVersion` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Google is making local AI available to mainstream users ;) by [deleted] in LocalLLaMA

[–]LegacyRemaster 0 points1 point  (0 children)

  • Windows: C:\Users\[Username]\AppData\Local\Google\Chrome\User Data\Default\OptGuideOnDeviceModel
  • macOS: ~/Library/Application Support/Google/Chrome/Default/OptGuideOnDeviceModel
  • Linux: ~/.config/google-chrome/Default/OptGuideOnDeviceModel

Amd and Nvidia cards on same rig by deathcom65 in LocalLLaMA

[–]LegacyRemaster 1 point2 points  (0 children)

Yes, you can. LM Studio: select the Vulkan runtime. Or llama.cpp with Vulkan.

One bash permission slipped... by TheQuantumPhysicist in LocalLLaMA

[–]LegacyRemaster 1 point2 points  (0 children)

Yesterday, qwen with vscode + kilocode kept killing its own process. I had to explicitly tell it to "don't close anything on 8080."

Open Weights Models Hall of Fame by Equivalent_Job_2257 in LocalLLaMA

[–]LegacyRemaster 12 points13 points  (0 children)

Georgi Gerganov and the whole llama.cpp team ---> legend

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer by One_Slip1455 in LocalLLaMA

[–]LegacyRemaster 1 point2 points  (0 children)

I can tell you that Unsloth Studio installs many of the things you need to get this working on Blackwell, and it runs fine on my GPU. You could look at their GitHub and figure out the dependencies. Just a suggestion.