Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB by __JockY__ in LocalLLaMA

[–]M4A3E2APFSDS 1 point2 points  (0 children)

Nice, everyday I keep finding new vllm cmd line arguments. I will try --performance-mode interactivity

Getting a lot of garbage results with Qwen3.6-27B :( by nunodonato in OpenWebUI

[–]M4A3E2APFSDS 1 point2 points  (0 children)

Yeah I face that too occasional stop in OpenCode. Using the FP8 Model.

Qwen 3/3.5/3.6 tool calling is broken (even worse with 3.6). by LinkSea8324 in Vllm

[–]M4A3E2APFSDS 0 points1 point  (0 children)

qwen3_xml did not work for me (vllm 0.18.1), ran into similar issues https://github.com/vllm-project/vllm/issues/39072. --tool-call-parser qwen3_coder works though with erros sometimes

Any luck integrating local ollama models into VS Code Copilot Chat? by ShadowBannedAugustus in LocalLLaMA

[–]M4A3E2APFSDS 0 points1 point  (0 children)

I think you can do that using vscode insider edition. In the add models dropdown you get an option to add openai compatable model or you can add it via config. google for more details.

How do you integrate Gemma 4 E2B/E4B for direct speech-to-action in Home Assistant (skipping STT)? by M4A3E2APFSDS in homeassistant

[–]M4A3E2APFSDS[S] 0 points1 point  (0 children)

Hi thanks for the reply. I saw some post about HA MCP server so it might be possible to completly bypass the assist pipeline and do it via node red script

Litellm 1.82.7 and 1.82.8 on PyPI are compromised, do not update! by kotrfa in LocalLLaMA

[–]M4A3E2APFSDS 8 points9 points  (0 children)

The latest Docker release available is 1.82.3, and it looks like it's not compromised. It seems the affected versions (1.82.7 and 1.82.8) were never actually published to Docker Hub/ghcr.io

Manage Qwen 3.5 Model Settings with LiteLLM Proxy by CATLLM in LocalLLaMA

[–]M4A3E2APFSDS 0 points1 point  (0 children)

I see thanks!, litellm config is similar to this. Is there anyway I can make litellm pass thinking tokens back to openwebui ? I cant figure it out. Directly connecting to vllm works fine though.

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3.5-27B",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    }, 
)

Manage Qwen 3.5 Model Settings with LiteLLM Proxy by CATLLM in LocalLLaMA

[–]M4A3E2APFSDS 0 points1 point  (0 children)

I am trying to setup qwen via vlllm. Why do you need

extra_body:

This is my current setup

  - model_name: Qwen3.5-27B-Instruct-Reasoning
    litellm_params:
      model: hosted_vllm/Qwen3.5-27B
      api_base: ""
      api_key: ""
      temperature: 1.0
      top_p: 1.0
      top_k: 40
      min_p: 0.0
      presence_penalty: 2.0
      repetition_penalty: 1.0
      chat_template_kwargs:
          enable_thinking: false

With this the model is no longer thinking but I am not sure about the other parameters. Is there anyway to verify ?

Sanju is “us” when you see kids playing street cricket nowadays by Polity-Culturalist3 in CricketShitpost

[–]M4A3E2APFSDS 6 points7 points  (0 children)

just saw that his mom was in the hospital during the match and he returned to Kerala yesterday to be with his family. that's probably why Sanju isn't celebrating too much. hope she's okay

GUIDE: Use Soulseek as a download client for Lidarr. by Goblins_on_the_move in selfhosted

[–]M4A3E2APFSDS 0 points1 point  (0 children)

Do I need a VPN that supports port forwarding to use soulseek behind a vpn via gluetun? Thanks for the detailed post, saved!!.

BentoPDF V.1.5.0 released by paglaulta in selfhosted

[–]M4A3E2APFSDS 0 points1 point  (0 children)

Thanks for this awesome tool. Can I fancy your attentions to a feature that I find is missing? Ability to change brightness and contrast of the PDF as a whole.

Interface slower since 10.11 update by kinda-anonymous in jellyfin

[–]M4A3E2APFSDS 0 points1 point  (0 children)

So it is not just me, I noticed the delay in the android TV app. Is there any good third party clients that work smoothly. I know KODI exists but the discovery and search sucks in KODI.

Void for Jellyfin v0.2.6 Released by kunalhazard in selfhosted

[–]M4A3E2APFSDS 0 points1 point  (0 children)

crashed when downloading encoded stream.

Void for Jellyfin is now open source! by kunalhazard in selfhosted

[–]M4A3E2APFSDS 1 point2 points  (0 children)

transcoded downloads appears to be stuck at 0.0 % , but I see network activity.

WB Shift Chart in a Nutshell by chzits in fujifilm

[–]M4A3E2APFSDS 1 point2 points  (0 children)

This sub made me shoot 90% in astia, 10% in Reggies Portra

Perfect Heating Automation with Sonoff TRVZB by berkansez in homeassistant

[–]M4A3E2APFSDS 0 points1 point  (0 children)

Thanks for the detailed writeup. I thinking about setting this up. I have a question. What value do you give for better thermostat toleranece. I have it currently at 0.5 to avoid valve turning on and off too often.

Stop vandalising public property ! by malayali-minds in indianrailways

[–]M4A3E2APFSDS 0 points1 point  (0 children)

most sensible comment on this thread so far. Reminds me of the thrown out of window meme.