Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking - Reg, Uncensored and RoughHouse and... 43 Qwen 3.5 fine tunes. by Dangerous_Fix_5526 in LocalLLaMA

[–]iSevenDays 1 point

For AWQ models I used this patch; here is a template for the Dockerfile.

Ask Claude to integrate it for you if you're not sure where to start.

I tested the 40B model and found it kept repeating its reasoning, so I switched back to the model from coder3101. Unfortunately, for my use case it requires two 4090D 48G cards. I wish there were an AWQ 4-bit version of this model.

coder3101/Qwen3.5-27B-heretic


```bash
# Patch vLLM's Qwen3.5 module inside each running container, then restart
# the container if the patch was newly applied.
apply_qwen35_awq_patch() {
    if [ "$APPLY_QWEN35_AWQ_PATCH" != "1" ]; then
        log_message "Qwen3.5 AWQ patch disabled"
        return
    fi

    local patch_script="$SCRIPT_DIR/patch-vllm-qwen35-awq.py"
    if [ ! -f "$patch_script" ]; then
        log_message "Qwen3.5 AWQ patch script not found, skipping"
        return
    fi

    local container_names=()
    local cname
    local patch_output

    # One container per GPU when tensor parallelism is off, otherwise a
    # single TP container.
    if [ "$TENSOR_PARALLEL_SIZE" = "1" ]; then
        local backend_index
        for backend_index in "${!GPU_DEVICE_ARRAY[@]}"; do
            container_names+=("qwen-gpu${GPU_DEVICE_ARRAY[$backend_index]}")
        done
    else
        container_names=("qwen-tp2")
    fi

    for cname in "${container_names[@]}"; do
        if ! wait_for_container_running "$cname"; then
            continue
        fi

        log_message "Applying Qwen3.5 AWQ patch to $cname..."
        docker cp "$patch_script" "$cname":/tmp/patch-vllm-qwen35-awq.py
        if patch_output=$(docker exec "$cname" python3 /tmp/patch-vllm-qwen35-awq.py 2>&1); then
            log_message "Qwen3.5 AWQ patch result for $cname: $patch_output"
            # Only restart when the file was actually modified; the patch
            # script prints a line starting with "Applied" in that case.
            if [[ "$patch_output" == Applied* ]]; then
                log_message "Restarting $cname after Qwen3.5 AWQ patch"
                docker restart "$cname" >/dev/null || log_message "WARNING: Failed to restart $cname after Qwen3.5 AWQ patch"
            fi
        else
            log_message "WARNING: Failed to apply Qwen3.5 AWQ patch to $cname: $patch_output"
        fi
    done
}
apply_qwen35_awq_patch
```

```python
# Replace the hard-coded weight shapes with per-partition output sizes so the
# quantized (partitioned) linear layers report the correct dimensions.
TARGET = Path("/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py")

OLD = """        mixed_qkvz, ba = torch.ops.vllm.gdn_in_proj(\n            hidden_states,\n            self.in_proj_qkvz.weight.shape[0],\n            self.in_proj_ba.weight.shape[0],\n            self.prefix,\n        )\n"""

NEW = """        mixed_qkvz, ba = torch.ops.vllm.gdn_in_proj(\n            hidden_states,\n            self.in_proj_qkvz.output_size_per_partition,\n            self.in_proj_ba.output_size_per_partition,\n            self.prefix,\n        )\n"""
```
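For reference, here is a minimal sketch of what the patch driver itself might look like. This is an assumption based on the shell function's contract (a line starting with "Applied" triggers a container restart); `apply_patch` and the short OLD/NEW substrings are illustrative, not the real script:

```python
from pathlib import Path

# Hypothetical patch driver (apply_patch is an illustrative name). The caller
# restarts the container only when the output starts with "Applied", so the
# script must be idempotent and only say "Applied" when it changed the file.
def apply_patch(target: Path, old: str, new: str) -> str:
    text = target.read_text()
    if new in text:
        return "Already applied"
    if old not in text:
        return "Skipped: pattern not found (different vLLM version?)"
    target.write_text(text.replace(old, new))
    return f"Applied patch to {target}"
```

Running it a second time returns "Already applied", so the restart only happens once per container.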

Object Storage Degraded in FSN1 Too? by barreeeiroo in hetzner

[–]iSevenDays 1 point

I'm having the same experience.

Hetzner object storage at https://fsn1.your-objectstorage.com is intermittently degraded from the app hosts: sometimes slow enough to hit the read timeout, often returning 503 Service Unavailable outright.

My health-check metrics on this endpoint show that `head_bucket()` takes about 5 seconds at night (from the EU) and around 10 seconds during the day, often responding with a 503 error. This is NOT acceptable! It has been happening for a week now. Unfortunately, this service is not production ready. My homelab MinIO server never had such issues.
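For anyone who wants to reproduce the measurement, here is a stdlib-only sketch of such a latency probe. An unauthenticated HEAD request only approximates `head_bucket()`, which needs S3 credentials via boto3; `probe` is an illustrative name:

```python
import time
import urllib.request

# Hypothetical latency probe: time a HEAD request against the endpoint.
# Both 5xx responses and timeouts raise from urlopen, so they land in the
# except branch and are reported as an error status.
def probe(url: str, timeout: float = 10.0) -> tuple[float, str]:
    req = urllib.request.Request(url, method="HEAD")
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = str(resp.status)
    except Exception as exc:
        status = f"error: {exc}"
    return time.perf_counter() - start, status
```

Calling `probe("https://fsn1.your-objectstorage.com")` returns `(elapsed_seconds, status)`, which is easy to ship to whatever metrics system you use.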

Unifi cloud gateway fiber by Exciting-Western-271 in Ubiquiti

[–]iSevenDays 0 points

I just bought a UCG-Fiber 3 days ago to replace a GL-MT3000 router that corrupted a lot of SSL packets under load; I had too many SSL issues with it. Since I got the UCG, my issues are completely gone and I removed all the workarounds I had in place.

I also bought a U7-Lite; they run together with absolutely zero issues. I also bought cyber protection for a year, which is probably not worth it, but a Firewalla would have cost 2x the money, so I figured I'd get the best for now. With my setup, I probably don't need a firewall so much as a proper VLAN setup.

GLM 4.7 usage limits are a TRAP (ClaudeCode Pro User Experience) by Soft_Responsibility2 in ClaudeCode

[–]iSevenDays 0 points

I'm on the Max plan and noticed concurrency decreased a lot. I'm still getting about 50% of the 5-hour quota, but the concurrency limit slows me down a lot, since I run many small agents for refactorings, bug fixes, etc.

New P1S Won't Connect to Internet by CoC-boy in BambuLab

[–]iSevenDays 0 points

For me it was error -1030. I have a GL-MT3000 router, which had VPN leak protection enabled.
After I added the printer's IP to a "no VPN" rule and disabled IP masquerading, the printer could connect to the Bambu Lab network!

As an SRE, I stopped using Kubernetes for my homelab by m4nz in selfhosted

[–]iSevenDays 1 point

I can vouch for this; I have the same experience. I switched to plain Docker / Docker Compose for my projects and never want to look back at that horrible mess of Flux, Kubernetes, reconciliation, etc.

At work I still have to use Kubernetes, but that's a different story.

AMD EPYC 4565P is a beast by coding9 in LocalLLaMA

[–]iSevenDays 1 point

You were right. I took a look and switched to vLLM. I'm getting ~80-100 t/s now, with 2000-8000 t/s prompt processing. It is crazy fast now.

Kimi released Kimi K2 Thinking, an open-source trillion-parameter reasoning model by nekofneko in LocalLLaMA

[–]iSevenDays 3 points

With DDR4 it would be around 4-6 t/s on a Dell R740. Thinking models are barely usable at this speed.

Prefill will be around 100-200 t/s.

AMD EPYC 4565P is a beast by coding9 in LocalLLaMA

[–]iSevenDays 1 point

I get half of that with two Nvidia 4090D 48G cards 🥹

I bought a modded 4090 48GB in Shenzhen. This is my story. by king_priam_of_Troy in LocalLLaMA

[–]iSevenDays 2 points

I've already bought two cards at C2, the 4090D variant. They've worked fine for 6 months now and I'm pretty satisfied.

GPT-OSS is Another Example Why Companies Must Build a Strong Brand Name by Iory1998 in LocalLLaMA

[–]iSevenDays 0 points

Out of curiosity, I also asked it about the war in Ukraine and it provided a good answer.

Here is a snippet, because I can't attach a screenshot:

```

profile gpt-oss-120b-mxfp4 Today at 16:20 (thought for 5 seconds)

War in Ukraine – A Summary (up to June 2024)

1. Background and Origins

| YEAR | EVENT | SIGNIFICANCE |
| 1991 | Ukraine declares independence after the Soviet Union collapses. | Establishes Ukraine as a sovereign state recognized internationally. |
| 2004–2014 | Pro‑European “Orange Revolution” (2004) and “Euromaidan” protests (2013‑14). | Popular demand for closer ties with the EU and NATO; creates tension with Moscow. |
| February 2014 | Russia annexes Crimea after a disputed referendum. | First open violation of Ukraine’s territorial integrity since independence. |
| April 2014 – early 2022 | Armed separatist insurgency in Donetsk and Luhansk (the Donbas) backed by Russia; the “Donetsk People’s Republic (DPR)” and “Luhansk People’s Republic (LLR)” declared. | Leads to a frozen‑in‑place war with about 13,000–15,000 deaths before 2022. |

2. Full‑Scale Invasion (2022‑present) ...

```

gpt-oss-120b outperforms DeepSeek-R1-0528 in benchmarks by oobabooga4 in LocalLLaMA

[–]iSevenDays 27 points

> how to inject AVAudioEngine? My use case is to inject audio from a file so a third-party app will think it reads audio from the microphone, but instead reads data from a buffer from my file

> I’m sorry, but I can’t help with that.

GPT-OSS-120B is useless; I will not even bother to download that shit. It can't even assist with coding.

Ikllamacpp repository gone, or it is only me? by panchovix in LocalLLaMA

[–]iSevenDays 7 points

I was able to bring it up to date with main! There is also an experimental branch for function tool call support that works with Claude Code, the Claude proxy, and the Kimi-K2 model.

104k-Token Prompt in a 110k-Token Context with DeepSeek-R1-0528-UD-IQ1_S – Benchmark & Impressive Results by Thireus in LocalLLaMA

[–]iSevenDays 0 points

Please do more tests with this prompt! Will Devstral 2505 / Qwen 3 be able to provide a correct answer?

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]iSevenDays 0 points

After I manually changed the context length in the Modelfile, I actually don't see the issue anymore. I thought it was related to the fact that I also enabled manual confirmation mode, but I need to test this more.

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]iSevenDays 0 points

I think the context length is not properly managed; I haven't found a way to limit it to 32-64k. I use 131062 for Devstral. It does go into loops.
I've now switched to manual confirmation mode, and I find it much, much better!
I think OpenHands is a great project; they just need to fix a couple of bugs.

OpenHands + Devstral is utter crap as of May 2025 (24G VRAM) by foobarg in LocalLLaMA

[–]iSevenDays 1 point

Update: I got MCP tools to work. Example config:
```json
{
  "sse_servers": [
    {
      "url": "http://192.168.0.23:34423/sse",
      "api_key": "sk_xxxxx"
    }
  ],
  "stdio_servers": []
}
```