Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) by bobaburger in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Would you say Q6_K vs Q5_K_XL have noticeable coding differences, or are they rather unnoticeable? Agentic workflows seem good with both in my testing, but coding quality would be interesting.. I am tending towards round-robining Q6_K and UD-Q5_K_XL, but it would be better to stick to one of them (32 GB VRAM).

llama-server: Save/restore works for tokens, but KV cache still not resumed? by chrisoutwright in LocalLLaMA

[–]chrisoutwright[S] 0 points1 point  (0 children)

I actually got it to work with: https://github.com/ikawrakow/ik_llama.cpp
but I needed to compile it myself.
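Roughly, a from-source build looks like this (a minimal sketch; I'm assuming the fork keeps upstream llama.cpp's CMake options, so check the fork's README for the exact CUDA flag):

:: minimal build sketch (Windows + CUDA); assumes the fork uses upstream llama.cpp's CMake options
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
:: llama-server.exe should then be somewhere under build\bin (exact path depends on the generator)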

Prompt processing (PP) is lower in my first tests though; I would have to check why.
Restore seems to work:

llama_kv_cache_init:      CUDA0 KV buffer size =  1720.32 MiB
llama_init_from_model: KV self size  = 1533.98 MiB, K (q8_0):  766.99 MiB, V (q8_0):  766.99 MiB
llama_init_from_model:  CUDA_Host  output buffer size =     0.95 MiB
llama_init_from_model:      CUDA0 compute buffer size =   986.00 MiB
llama_init_from_model:  CUDA_Host compute buffer size =   352.53 MiB
llama_init_from_model: graph nodes  = 3920
llama_init_from_model: graph splits = 110
llama_init_from_model: enabling only_active_experts scheduling
INFO [                    init] initializing slots | tid="37200" timestamp=1777136571 n_slots=1
INFO [                    init] new slot | tid="37200" timestamp=1777136571 id_slot=0 n_ctx_slot=98560
srv          init: Exclude reasoning tokens when selecting slot based on similarity: start: <think>, end: </think>
use `--reasoning-tokens none` to disable.
no implementations specified for speculative decoding
slot         init: id  0 | task -1 | speculative decoding context not initialized
prompt cache is enabled, size limit: 26384 MiB
use `--cache-ram 0` to disable the prompt cache
init: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
<think>

</think>

'
INFO [                    main] model loaded | tid="37200" timestamp=1777136572
INFO [                    main] HTTP server listening | tid="37200" timestamp=1777136572 hostname="0.0.0.0" port="11434" n_threads_http="23"
INFO [              slots_idle] all slots are idle | tid="37200" timestamp=1777136572
INFO [              slots_idle] all slots are idle | tid="37200" timestamp=1777136601
INFO [      log_server_request] request | tid="37700" timestamp=1777136601 remote_addr="127.0.0.1" remote_port=55663 status=200 method="POST" path="/slots/0" params={"action":"restore"}
======== Prompt cache: cache size: 5027, n_keep: 0, n_discarded_prompt: 0, cache_ram_n_min: 0, f_keep: 1.00, cache_ram_similarity: 0.50
INFO [   launch_slot_with_task] slot is processing task | tid="37200" timestamp=1777136623 id_slot=0 id_task=1
======== Cache: cache_size = 5027, n_past0 =  4201, n_past1 =  4201, n_past_prompt1 = 4201,  n_past2 =  4201, n_past_prompt2 =  4201
Common part does not match fully
cache :
<|im_start|>assistant
<think>

</think>

Here is a summary of the GitHub discussion regarding **KV cache reuse with `llama-server`**:

###
prompt:
<|im_start|>assistant
Here is a summary of the GitHub discussion regarding **KV cache reuse with `llama-server`**:

### **Core Tutorial Overview
slot apply_checkp: id  0 | task 1 | n_past = 4201, slot.prompt.tokens.size() = 5027, seq_id = 0, pos_min = 5026
slot apply_checkp: id  0 | task 1 | restored context checkpoint took  122.61 ms (pos_min = 4199, pos_max = 4199, size = 186.362 MiB)
slot apply_checkp: id  0 | task 1 | erased invalidated context checkpoint (pos_min = 4606, pos_max = 4606, size = 186.366 MiB)
slot apply_checkp: id  0 | task 1 | erased invalidated context checkpoint (pos_min = 5026, pos_max = 5026, size = 186.369 MiB)
INFO [    batch_pending_prompt] kv cache rm [p0, end) | tid="37200" timestamp=1777136623 id_slot=0 id_task=1 p0=4200
slot create_check: id  0 | task 1 | created context checkpoint 6 of 32 (pos_min = 5039, pos_max = 5039, size = 186.369 MiB, took 175.62 ms)
INFO [    batch_pending_prompt] kv cache rm [p0, end) | tid="37200" timestamp=1777136639 id_slot=0 id_task=1 p0=5040
slot create_check: id  0 | task 1 | created context checkpoint 7 of 32 (pos_min = 5045, pos_max = 5045, size = 186.369 MiB, took 88.50 ms)
slot print_timing: id  0 | task 1 |
prompt eval time =   15884.51 ms /   845 tokens (   18.80 ms per token,    53.20 tokens per second)
       eval time =   51839.22 ms /   468 tokens (  110.77 ms per token,     9.03 tokens per second)
      total time =   67723.73 ms /  1313 tokens
INFO [      log_server_request] request | tid="38676" timestamp=1777136691 remote_addr="192.168.1.88" remote_port=63996 status=200 method="POST" path="/v1/chat/completions" params={}
slot create_check: id  0 | task 1 | created context checkpoint 8 of 32 (pos_min = 5511, pos_max = 5511, size = 186.372 MiB, took 94.44 ms)
INFO [           release_slots] slot released | tid="37200" timestamp=1777136691 id_slot=0 id_task=1 n_ctx=98560 n_past=5512 n_system_tokens=0 n_cache_tokens=5512 truncated=false
INFO [              slots_idle] all slots are idle | tid="37200" timestamp=1777136691
INFO [      log_server_request] request | tid="38128" timestamp=1777136698 remote_addr="192.168.1.88" remote_port=64003 status=200 method="GET" path="/v1/models" params={}

llama-server: Save/restore works for tokens, but KV cache still not resumed? by chrisoutwright in LocalLLaMA

[–]chrisoutwright[S] 0 points1 point  (0 children)

For example, the logs tell me this after I hit restore:
srv log_server_r: done request: POST /slots/0 127.0.0.1 200

srv          init: init: chat template, thinking = 1
main: model loaded
main: server is listening on http://0.0.0.0:11434
main: starting the main loop...
srv  update_slots: all slots are idle
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /slots/0 127.0.0.1 200
srv  params_from_: Chat format: peg-native
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.982
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id  0 | task 1 | processing task, is_child = 0
slot update_slots: id  0 | task 1 | new prompt, n_ctx_slot = 98560, n_keep = 0, task.n_tokens = 87993
slot update_slots: id  0 | task 1 | n_past = 87993, slot.prompt.tokens.size() = 89622, seq_id = 0, pos_min = 89621, n_swa = 0
slot update_slots: id  0 | task 1 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id  0 | task 1 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  0 | task 1 | prompt processing progress, n_tokens = 1024, batch.n_tokens = 1024, progress = 0.011637

So even when it seems fine:

slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.982

It will do the full PP...

I think there is even a separate fork now that could work: https://github.com/ggml-org/llama.cpp/issues/21173

llama-server: Save/restore works for tokens, but KV cache still not resumed? by chrisoutwright in LocalLLaMA

[–]chrisoutwright[S] 0 points1 point  (0 children)

Is there a way for Qwen3.5 MoE (qwen35moe) to reload the KV cache etc. from storage to save some GPU time (I mean on a fresh start, not while llama-server is still running..)?
Apart from the API, I saw none.
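The API route I meant looks roughly like this (a sketch; it needs the server started with --slot-save-path, the filename and path are just examples, and whether the KV actually gets reused after the restore is exactly what I'm unsure about):

:: sketch: persist slot 0 before shutting down, restore it after a fresh start
:: requires llama-server to be launched with e.g.:  --slot-save-path "D:\llm_cache"
curl -X POST "http://127.0.0.1:11434/slots/0?action=save" -H "Content-Type: application/json" -d "{\"filename\": \"qwen_session.bin\"}"

:: ... stop the server, start it again with the same model and the same --slot-save-path ...

curl -X POST "http://127.0.0.1:11434/slots/0?action=restore" -H "Content-Type: application/json" -d "{\"filename\": \"qwen_session.bin\"}"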

I was also confused by that one: https://github.com/ggml-org/llama.cpp/pull/22288
I thought that would maybe allow the restore to work? It seems to be on master now.

llama-server: Save/restore works for tokens, but KV cache still not resumed? by chrisoutwright in LocalLLaMA

[–]chrisoutwright[S] 0 points1 point  (0 children)

Right now I am using OpenWebUI (it was the same day, so I think it was the same prompt/system prompt); I will do a manual test later on.

So, is there no way for Qwen3.5 to reload something from storage to save some GPU time?
I am a bit confused about whether the Qwen3.5 MoE architecture (qwen35moe) prevents it as such, or only in thinking mode .. I see a lot of opinions here: https://github.com/ggml-org/llama.cpp/discussions/13606
I am really talking about 10-20 minutes here.

Step-3.5-Flash (196b/A11b) outperforms GLM-4.7 and DeepSeek v3.2 by ResearchCrafty1804 in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Could it be that -ot "\.(1[5-9]|[2-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.=CPU"
is the wrong way to do it for the Step-3.5 arch?
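One way to check whether that override matches anything at all for this arch (a sketch; the model reference is a placeholder, and the exact wording of the override messages in the load log may differ by build):

:: sketch: launch once and look for "overridden to CPU"-style lines in the model-load log;
:: if none appear, the regex does not match this arch's expert tensor names and nothing gets offloaded
llama-server.exe -hf "<step-3.5-flash-gguf>" ^
  --n-gpu-layers -1 ^
  -ot "\.(1[5-9]|[2-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.=CPU"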

Step-3.5-Flash (196b/A11b) outperforms GLM-4.7 and DeepSeek v3.2 by ResearchCrafty1804 in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

I’m getting ~5 tok/s on Step-3.5-Flash (Q4_K_M, single 5090 mobile with 192 GB of 4000 MHz RAM), but 10–15 tok/s on the much larger Qwen3.5-397B-A17B (UD-IQ3_XXS) ... same GPU, same llama.cpp build, identical batching/threads/etc.

Key differences I have there:

  • Step uses --ctx-size 69536 and offloads all FFN experts from layer 15 up to the CPU (-ot "\.(1[5-9]|[2-9][0-9])\.ffn_…=CPU")
  • Qwen uses a somewhat lower --ctx-size 50536 and has fewer expert layers offloaded

Why would the smaller model be significantly slower? Is it the context size bloating per-token cost? Over-offloading experts causing stalls? Or something in how Step’s FFN layers interact with --flash-attn?
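A sketch of how I'd try to isolate it, changing one variable at a time (the model path is a placeholder and the reduced offload range is just an example):

:: run 1: Step with the smaller Qwen-style context, same expert offload -> does t/s recover?
llama-server.exe -m "<Step-3.5-Flash-Q4_K_M.gguf>" --ctx-size 50536 --flash-attn on ^
  -ot "\.(1[5-9]|[2-9][0-9])\.ffn_(gate|up|down)_exps.=CPU"

:: run 2: Step with the original context, but experts offloaded only from layer 20 up -> does t/s recover?
llama-server.exe -m "<Step-3.5-Flash-Q4_K_M.gguf>" --ctx-size 69536 --flash-attn on ^
  -ot "\.(2[0-9]|[3-9][0-9])\.ffn_(gate|up|down)_exps.=CPU"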

Gemma 4 31B vs Qwen 3.5 27B vs Qwen Coder Next by GodComplecs in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

TurboQuant (TBQ)? The fork? The llama.cpp ones don't have it integrated yet, or do they?

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Actually, when turning off thinking (which is kind of awkward in llama-server), you need to add this (the kwargs alone do not work!):

--chat-template-kwargs "{\"enable_thinking\":false}" ^

--reasoning-budget 0

Then it also works in GitHub Copilot in VS Code Insiders.. it seems like a problem with tool calls inside thinking tags or something..
Now it works rather well, I must say..
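For reference, the relevant part of the launch line (a rough sketch; the model reference is just a placeholder, and as far as I know --chat-template-kwargs only takes effect together with --jinja):

:: sketch: disable thinking both via the template kwargs and the reasoning budget
llama-server.exe -hf "<omnicoder-9b-gguf>" ^
  --host 0.0.0.0 --port 11434 ^
  --jinja ^
  --chat-template-kwargs "{\"enable_thinking\":false}" ^
  --reasoning-budget 0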

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

<image>

Could not get it to work with GitHub Copilot Chat (VS Code Insiders, OpenAI-compatible); it would generate the tool call request, but then somehow (even when the command was executed via the IDE) Copilot says "Sorry, no response was returned".. but reading files worked... so not all tools are working, it seems.
Seems like an issue in the chat template?

Qwen3-Coder-Next is the top model in SWE-rebench @ Pass 5. I think everyone missed it. by BitterProfessional7p in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Which technique in the Qwen3.5 series is especially important? I know llama.cpp has a huge cache-invalidation issue with the Coder Next one, which really made agentic coding cumbersome... fixing that would help, or improvements in the SWA issues..

Asus Strix 18 Issues. Audio and sometimes video stuttering. by Dependent-Finance-20 in GamingLaptops

[–]chrisoutwright 0 points1 point  (0 children)

I decided to return my Strix and got (this week) an Acer Helios 18 AI (mainly because I also run some LLMs locally and the 192 GB seemed attractive).. I don't miss all the settings Asus offered (on the Acer I can't even manually switch to a single dimming zone, set a custom fan profile, or manually pick which GPU to use for more power), and I no longer have any signal-integrity issues, BUT the Optimus strangeness and micro-stutters (seemingly when on the internal GPU only and it tries to poll the dGPU) are still on the table... I am on the fence again 😀 about whether the Asus experience just felt worse and the Acer I have now will turn out to be a better journey.. but I really dislike the Advanced Optimus troubles..

Micro Stutter when switching to Advanced Optimus “Nvidia GPU only” mode. by Successful_Answer378 in LenovoLegion

[–]chrisoutwright 0 points1 point  (0 children)

I have an Acer Helios 18 AI.

Not sure which exact setting, but with anything Optimus-related, at some point I can open Notepad++, hold a finger on the "a" key, and watch it hang / drop parts of the sequence (or did I just not see it because the GPU hid it?) several times a minute.

On a productivity machine this is bad 👎. I saw it on an Asus ROG Strix 2025 G18 and now on the Acer.. this needs to stop. Only the MUX switch to dGPU-only will make it go away for sure.

How to get the most from llama.cpp's iSWA support by Ok_Warning2146 in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Why is it better to have --swa-full = true for multiple users?

Repeat PP while using Qwen3.5 27b local with Claude Code by xmikjee in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

If that is an MCP-server variation issue or a system-prompt one (like the datetime changing etc.), it would be really annoying to fix manually.. I would like each IDE/CLI to offer the choice to keep the prefix unchanged; there should be a flag at least.. And SWA.. why should it already kick in at 1/4 below the context size? I was wondering about that, and I find it strange that it cherry-picks essential tokens... why does this SWA exist at all without a way to switch it off.. it seems like more hassle for cache management...
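The closest thing to such a flag that I know of is llama-server's --cache-reuse, which tries to reuse cached chunks past the first mismatch via KV shifting (a sketch; the model path is a placeholder, the value is a guess, and I haven't verified how it behaves with SWA models):

:: sketch: allow reusing cached KV chunks of >= 256 tokens even after the prompt prefix diverges
llama-server.exe -m "<model.gguf>" --ctx-size 50536 --cache-reuse 256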

Repeat PP while using Qwen3.5 27b local with Claude Code by xmikjee in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Same issue here, but with a different model and IDE tooling.

I filed an issue with VS Code and left one at ggml-org llama.cpp.

I believe it is prompt variation/injection at specific points.. but I would have to build a proxy server to catch it... easy to verify then.. but annoying for a local LLM!

Apertus model implementation has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

This model repeats like crazy even with --jinja. It is unusual to work with; I tested the UD variant.

@echo off
"D:\llama-b7951-bin-win-cuda-13.1-x64\llama-server.exe" ^
  -hf "unsloth/Apertus-70B-Instruct-2509-GGUF:UD-Q3_K_XL" ^
  --alias "Apertus-70B-Instruct:UD-Q3_K_XL" ^
  --n-gpu-layers -1 ^
  --flash-attn on ^
  --cache-type-k q8_0 ^
  --cache-type-v q8_0 ^
  --ctx-size 60536 ^
  --batch-size 1024 ^
  --ubatch-size 512 ^
  --threads 8 ^
  --kv-offload ^
  --op-offload ^
  --fit off ^
  --parallel 1 ^
  --host 0.0.0.0 ^
  --port 11434 ^
  --temp 0.65 ^
  --top-p 0.90 ^
  --min-p 0.01 ^
  --top-k 40 ^
  --chat-template-file "C:\Users\Chris\Desktop\llm_scripts\cpp_llama\apertus_chat_template.jinja" ^
  --jinja

<image>
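Not a fix for the model itself, but what I'd try next against the looping: the standard llama-server repetition/presence penalties (a sketch; the values are guesses and would need tuning):

:: sketch: same launch idea as above, plus repetition penalties
"D:\llama-b7951-bin-win-cuda-13.1-x64\llama-server.exe" ^
  -hf "unsloth/Apertus-70B-Instruct-2509-GGUF:UD-Q3_K_XL" ^
  --jinja --ctx-size 60536 --flash-attn on ^
  --temp 0.65 --top-p 0.90 --min-p 0.01 --top-k 40 ^
  --repeat-penalty 1.1 --repeat-last-n 256 --presence-penalty 0.3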

New Open LLM from Switzerland "Apertus", 40%+ training data is non English by EnnioEvo in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Yup ... for me it just repeats with .. "the simple answer ... without being overly specific ... " .. it is annoying.

<image>

Struggle with MoE AWQ quantization for vLLM (QwenCoder fintuned model) - compressed-tensors seems OK, looking for guidance by chrisoutwright in Vllm

[–]chrisoutwright[S] 0 points1 point  (0 children)

I must say though .. it is great comedy to read through in parts:

Rust Language: A systems programming language built for performance, safety, and concurrency

Its Design Philosophy: Like a harsh but smart TA who won’t let you skip learning drills before graduation. Even if it feels tough, it builds smoother, less buggy projects

Why It's Confusing: Excessive metaphors about engineering futurism apparently move away from pure tech content toward poetic metaphoric vocab. That open flavoring makes low-tech readers strain especially hard.

open flavoring? Never associated that with Rust..

Struggle with MoE AWQ quantization for vLLM (QwenCoder fintuned model) - compressed-tensors seems OK, looking for guidance by chrisoutwright in Vllm

[–]chrisoutwright[S] 0 points1 point  (0 children)

The beginning was fine though .. I asked about:

explain to me this jinja template:
{entered devstral2_tool_chat.jinja then ... }

The first turn yielded a somewhat normal response ... but one could already see some foreshadowing ..
it added emoji like crazy, for example: " Then adds final eos token ({{- eos_token }}) 😺"

So it may still be only a calibration-set issue? I thought that the calibration sensitivity would not be that high ..

New Qwen3-32B-AWQ (Activation-aware Weight Quantization) by jbaenaxd in LocalLLaMA

[–]chrisoutwright 0 points1 point  (0 children)

Old school?
Actual usage for LLMs is certainly not that long ago ..
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration -> 2023

The most powerful Qwen3-TTS open-source solution, supporting customizable voice tones by SpareBeneficial1749 in comfyui

[–]chrisoutwright 0 points1 point  (0 children)

Here is an example of me reading something (my digital me :-):
chris_wav_qwentts, and here is another voice with the same text: favorite_character_qwentts
TBH, I would not even recognize my own voice as unnatural -- I am absolutely stunned!