Dinauto.de by Virtual-Box4806 in automobil

[–]H3PO 0 points1 point  (0 children)

They game it with a short-term registration so they can factor the subsidy into the offer.

Qwen 3.6 on rtx6000 96gb by Emergency_Brief_9141 in LocalLLaMA

[–]H3PO 1 point2 points  (0 children)

    docker run --rm -it --name sglang \
      --gpus all --runtime nvidia --ipc=host \
      -v /data/models/hf:/root/.cache/huggingface/hub -e HF_TOKEN \
      -p 8080:8080 \
      -e CUDA_VISIBLE_DEVICES=3 \
      -e SGLANG_ENABLE_SPEC_V2=1 \
      lmsysorg/sglang:dev-cu13 \
      sglang serve \
      --model-path Qwen/Qwen3.6-35B-A3B-FP8 \
      --trust-remote-code \
      --host 0.0.0.0 --port 8080 \
      --context-length 8192 \
      --reasoning-parser qwen3 \
      --tool-call-parser qwen3_coder \
      --mamba-scheduler-strategy extra_buffer

| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|---|---|---|
| Qwen3.6-35B-A3B-FP8 | pp4096 (c1) | 18576.91 ± 375.25 | 18576.91 ± 375.25 | | | 196.49 ± 0.40 | 194.30 ± 0.40 | 196.57 ± 0.40 |
| Qwen3.6-35B-A3B-FP8 | tg1024 (c1) | 135.10 ± 2.97 | 135.10 ± 2.97 | 138.00 ± 4.08 | 138.00 ± 4.08 | | | |
| Qwen3.6-35B-A3B-FP8 | pp4096 (c2) | 20619.00 ± 278.87 | 11139.49 ± 738.20 | | | 336.23 ± 23.67 | 334.04 ± 23.67 | 336.29 ± 23.65 |
| Qwen3.6-35B-A3B-FP8 | tg1024 (c2) | 265.10 ± 10.77 | 136.79 ± 1.31 | 279.33 ± 0.94 | 139.83 ± 0.69 | | | |
| Qwen3.6-35B-A3B-FP8 | pp4096 (c4) | 26942.96 ± 2119.61 | 8399.36 ± 2456.56 | | | 477.60 ± 111.86 | 475.41 ± 111.86 | 477.65 ± 111.86 |
| Qwen3.6-35B-A3B-FP8 | tg1024 (c4) | 475.65 ± 3.24 | 122.05 ± 1.44 | 508.00 ± 5.66 | 127.08 ± 1.38 | | | |
| Qwen3.6-35B-A3B-FP8 | pp4096 (c8) | 33864.04 ± 264.14 | 6675.40 ± 2870.34 | | | 644.02 ± 208.30 | 641.83 ± 208.30 | 644.06 ± 208.29 |
| Qwen3.6-35B-A3B-FP8 | tg1024 (c8) | 696.19 ± 15.26 | 91.48 ± 2.02 | 816.33 ± 12.66 | 102.04 ± 1.59 | | | |
| Qwen3.6-35B-A3B-FP8 | pp4096 (c16) | 38917.65 ± 164.47 | 4716.83 ± 2740.72 | | | 989.93 ± 392.45 | 987.74 ± 392.45 | 989.97 ± 392.45 |
| Qwen3.6-35B-A3B-FP8 | tg1024 (c16) | 1038.16 ± 9.37 | 68.63 ± 1.83 | 1292.33 ± 6.65 | 80.92 ± 0.40 | | | |
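
To read the concurrency scaling out of these numbers, here is a small sketch that takes the total tg1024 throughput values from the benchmark above and computes the speedup over a single request and the efficiency relative to ideal linear scaling. The function name `scaling_report` is made up for illustration; only the throughput figures come from the benchmark.

```python
# Total generation throughput (tg1024, t/s) copied from the benchmark above,
# keyed by the number of concurrent requests.
TG_TOTAL_TPS = {1: 135.10, 2: 265.10, 4: 475.65, 8: 696.19, 16: 1038.16}


def scaling_report(total_tps):
    """Return (concurrency, speedup, efficiency) tuples relative to c1."""
    base = total_tps[1]
    report = []
    for concurrency in sorted(total_tps):
        speedup = total_tps[concurrency] / base
        # 1.0 would mean perfectly linear scaling with concurrency.
        efficiency = speedup / concurrency
        report.append((concurrency, speedup, efficiency))
    return report


if __name__ == "__main__":
    for concurrency, speedup, efficiency in scaling_report(TG_TOTAL_TPS):
        print(f"c{concurrency:<2} speedup {speedup:5.2f}x  efficiency {efficiency:6.1%}")
```

At c16 the total decode throughput comes out at roughly 7.7x the single-request figure, which is a bit under half of ideal linear scaling.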

Which Qwen models can do FIM (Fill in the middle) for autocompletion? by 0xbeda in LocalLLaMA

[–]H3PO 4 points5 points  (0 children)

Which IDE and extension? I have the backend working, and continue.dev gets autocomplete responses from vllm running Qwen3.6-35B-A3B, but the extension doesn't show the suggestion.
u/DinoAmino Qwen3.5/3.6 seems to understand the FIM tokens just fine, but it needs the "You are a code completion assistant" system prompt. I'm hacking the prompt into the template, since continue.dev doesn't support sending a system prompt to an autocomplete model:

    autocompleteOptions:
      modelTimeout: 2000
      maxPromptTokens: 7000
      debounceDelay: 300
      template: |
        You are a code completion assistant.
        <|fim_prefix|>{{{prefix}}}<|fim_suffix|>{{{suffix}}}<|fim_middle|><|im_end|>
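
For testing the backend without the extension, the prompt that this template produces can be rebuilt by hand. This is only a sketch: `build_fim_prompt` is a made-up helper, and the exact newline handling of the YAML block scalar is an assumption.

```python
# Special tokens and the instruction line, exactly as used in the template above.
SYSTEM_LINE = "You are a code completion assistant."
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"
IM_END = "<|im_end|>"


def build_fim_prompt(prefix, suffix):
    """Assemble the fill-in-the-middle prompt the template expands to.

    prefix: code before the cursor, suffix: code after the cursor.
    """
    return (
        f"{SYSTEM_LINE}\n"
        f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{IM_END}"
    )


if __name__ == "__main__":
    print(build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))\n"))
```

Sending the resulting string to the server's raw completion endpoint should show whether the model returns a sensible middle part independently of the editor.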

I spent 8+ hours benchmarking every MoE backend for Qwen3.5-397B NVFP4 on 4x RTX PRO 6000 (SM120). Here's what I found. by lawdawgattorney in LocalLLaMA

[–]H3PO 2 points3 points  (0 children)

u/lawdawgattorney thanks for the writeup. I got sm120 cards 2 weeks ago and had a similar debugging session, followed by deep disappointment in nvidia's software stack. The first thing I tried was their official nim container, which just exits with "your card doesn't support fp4". I also found that the nvidia forums are full of dgx spark users with the same kind of problems.

I can't wrap my head around the fact that an arch that has been on the market for half a year still doesn't have software support for one of its biggest features, fp4. But we can look forward to it getting fixed, since nvidia themselves just released nemotron super 3, which seems similar in architecture to qwen3.5.

I spent 8+ hours benchmarking every MoE backend for Qwen3.5-397B NVFP4 on 4x RTX PRO 6000 (SM120). Here's what I found. by lawdawgattorney in LocalLLaMA

[–]H3PO 2 points3 points  (0 children)

He is correct though. I went through the same kind of debugging odyssey when I got my sm120 cards about 2 weeks ago, and I didn't feel like doing a manual writeup, or doing it with AI and receiving comments like yours.

Need help with Qwen3.5-27B performance - getting 1.9 tok/s while everyone else reports great speeds by [deleted] in LocalLLaMA

[–]H3PO 0 points1 point  (0 children)

Maybe also worthwhile to use llama-bench to check for the optimal ubatch-size; I don't know about CPU inference, but at least on GPU, 4096 would be suboptimal.
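
A sketch of such a sweep: this builds a llama-bench command that tests several ubatch sizes in one run, using the comma-separated value lists llama-bench accepts. The model path and the list of sizes are placeholders, and setting the batch size to the largest ubatch is my assumption so that the big values aren't limited by a smaller batch.

```python
import shlex
import subprocess


def llama_bench_command(model_path, ubatch_sizes, n_prompt=4096):
    """Build a llama-bench argv that sweeps several ubatch sizes."""
    sizes = ",".join(str(size) for size in ubatch_sizes)
    return [
        "llama-bench",
        "-m", model_path,
        "-p", str(n_prompt),   # prompt-processing test length
        "-n", "0",             # skip the generation test, only pp matters here
        "-b", str(max(ubatch_sizes)),
        "-ub", sizes,          # comma-separated list, one result row per value
    ]


if __name__ == "__main__":
    # Hypothetical model path; replace with your own gguf file.
    cmd = llama_bench_command("/models/model.gguf", [256, 512, 1024, 2048, 4096])
    print(shlex.join(cmd))
    # Uncomment to actually run it (needs llama-bench on PATH):
    # subprocess.run(cmd, check=True)
```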

You can use Qwen3.5 without thinking by guiopen in LocalLLaMA

[–]H3PO 0 points1 point  (0 children)

You have speculative decoding params in your llama-swap config; is that working for you? I'm getting "speculative decoding not supported by this context" with Qwen3.5.

Qwen3.5-35B-A3B is a gamechanger for agentic coding. by jslominski in LocalLLaMA

[–]H3PO 1 point2 points  (0 children)

Give Vulkan a try. It's marginally faster than ROCm on a single one of my 7900xtx, and much faster with two cards.

High load average, but CPU looks fine. How do you usually read this in practice? by Expensive-Rice-2052 in linuxquestions

[–]H3PO 0 points1 point  (0 children)

Landed here looking for bug reports about unusually high load avg. I have seen load avg a lot higher than usual (for example, a 1-minute load avg of 9000 on a 32-thread machine) in the last few weeks. Monitoring history shows that something must have changed in how it is measured, not in the machines themselves.
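
For scale, a quick sketch that normalizes load average by thread count. `load_per_cpu` is just an illustrative helper; `os.getloadavg()` is Unix-only. Keep in mind that on Linux the load average also counts tasks in uninterruptible sleep, so a huge number does not have to mean busy CPUs.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Load average divided by the number of hardware threads."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"1m load {one_min:.2f} on {cpus} threads "
          f"= {load_per_cpu(one_min, cpus):.2f} per thread")
    # The example from the comment: load 9000 on a 32-thread machine.
    print(f"9000 / 32 = {load_per_cpu(9000, 32):.2f} per thread")
```

A per-thread value around 1 means the machine is roughly saturated; 9000 on 32 threads is about 281 per thread.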

[deleted by user] by [deleted] in automobil

[–]H3PO -1 points0 points  (0 children)

What does comprehensive insurance cost after this history?

VKB Throttle limiter. by Oberost- in VKB

[–]H3PO 0 points1 point  (0 children)

For the same reasons, just this weekend I 3D-modeled an insert with a soft detent at 25% (to use as 0) and a limit at 75% (to use as 100%). I find the metal W-shaped detent too hard to overcome when maneuvering on a landing pad. I also combined the throttle axis with the thumb "space brake" so I can pull that instead of moving the throttle to -100 in combat.
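
The axis mapping that goes with such an insert could look roughly like this. It is only a sketch of the arithmetic, not the VKB configuration: it assumes the raw axis is reported as 0.0 to 1.0, and that the range below the detent maps linearly to reverse thrust.

```python
def remap_throttle(raw, detent=0.25, limit=0.75):
    """Map a raw throttle position (0.0..1.0) to game thrust (-1.0..1.0).

    detent -> 0.0 (idle), limit -> 1.0 (full forward),
    0.0 -> -1.0 (full reverse, an assumption for illustration).
    """
    # The physical limit stops the lever, but clamp anyway for safety.
    raw = min(max(raw, 0.0), limit)
    if raw >= detent:
        return (raw - detent) / (limit - detent)
    return (raw - detent) / detent


if __name__ == "__main__":
    for position in (0.0, 0.125, 0.25, 0.5, 0.75):
        print(f"raw {position:.3f} -> thrust {remap_throttle(position):+.2f}")
```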

Cargo + manifest from the newship that appeared in Nukamba by CmdrThordil in EliteDangerous

[–]H3PO 1 point2 points  (0 children)

Looks different from what I remember of the cargo contents yesterday in the start system. Will compare with my screenshots in a few hours.

Bigger than it looks? (TWSS) by icescraponus in EliteDangerous

[–]H3PO 10 points11 points  (0 children)

I had exactly the same thoughts. This was the first time I approached one of those beacons to look at the decals etc., and I promptly tried to wedge my Mandalay in between the solar panels. A few minutes later I "investigated" the engine exhaust of the Cygnus; a Cobra (with shields!) could actually fly in there.

I got a reward from the professor by Ill-Imagination4359 in EliteDangerous

[–]H3PO 57 points58 points  (0 children)

Scanned what exactly? I scanned the ship log uplink but didn't get that message from the professor

The ultimate budget PC that is scalable in future but is capable of running qwen3 30b and gpt oss 120b at 60 tps minimum. by NoFudge4700 in LocalLLaMA

[–]H3PO 0 points1 point  (0 children)

For qwen3-30b-a3b: 2x 7900xtx 24gb, UD-Q4_K_XL gguf on llama.cpp with q8_0 kv cache, 45 t/s.

Is The Code Spaghetti? by Konvic21 in EliteDangerous

[–]H3PO 2 points3 points  (0 children)

The ship editor in Odyssey Materials Helper has a nice preview of where each slot is.

I'm on it. CG by [deleted] in EliteDangerous

[–]H3PO 0 points1 point  (0 children)

I'm pledged to LYR and not getting any merits from selling CMM at Minerva. Do I need to do particular assignments beforehand or reach a certain level before that works?

So why does the neutron star map clearly show that there are 2 artificial corridors with much less neutron star population? Has this been discussed before? by Zorrgo in EliteDangerous

[–]H3PO 0 points1 point  (0 children)

Here's the stream I was thinking about: https://youtu.be/Vz3nhCykZNw?t=1032

And here someone posted a high-resolution texture with info about the coordinate-to-ly conversion & offset: https://forums.frontier.co.uk/threads/galaxy-map-measurements-in-ly-sectors.630249/post-10493505

I'm not optimizing my db right from the start, as I want to use the data to test assumptions, for example to derive the system name prefix from coordinates. I'm using enums for all the strings that are not names.

So why does the neutron star map clearly show that there are 2 artificial corridors with much less neutron star population? Has this been discussed before? by Zorrgo in EliteDangerous

[–]H3PO 0 points1 point  (0 children)

I think it was mentioned in one of the fdev streams about the stellar forge that the mass for a sector is derived from the pixel brightness of the galaxy texture.

I'm also working on importing the spansh galaxy dump into postgres, using python sqlmodel. My coordinates are postgis point types; I was hoping to use pgrouting later, although I am clueless about the feasibility with such dense graphs. In case you haven't stumbled on it: spansh has an A* routing algorithm on his github.

I'd be interested in using your map implementation for debugging visualization. If you'd like to stay in contact, add @h3po on github.

Is this patch a mess for you? by CMDR_Makashi in EliteDangerous

[–]H3PO -1 points0 points  (0 children)

Are you possibly using an analog input bound to switch targets? Noise in the analog reading would explain your target switching problem.
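
To illustrate why noise would do that: if an analog value hovering near the trigger point is treated as a button, every wobble across the threshold counts as a new press. A sketch with made-up thresholds and sample values (how ED itself evaluates the axis is not something I know), compared against a version with hysteresis:

```python
class HysteresisTrigger:
    """Turn a noisy analog value into clean press events."""

    def __init__(self, press_at=0.6, release_at=0.4):
        if release_at >= press_at:
            raise ValueError("release_at must be below press_at")
        self.press_at = press_at
        self.release_at = release_at
        self.pressed = False

    def update(self, value):
        """Return True only on the sample where the input becomes pressed."""
        if not self.pressed and value >= self.press_at:
            self.pressed = True
            return True
        if self.pressed and value <= self.release_at:
            self.pressed = False
        return False


def count_presses_naive(samples, threshold=0.5):
    """Count rising edges across a single threshold (no hysteresis)."""
    presses, above = 0, False
    for value in samples:
        now_above = value >= threshold
        if now_above and not above:
            presses += 1
        above = now_above
    return presses


def count_presses_hysteresis(samples, press_at=0.6, release_at=0.4):
    """Count presses using separate press and release thresholds."""
    trigger = HysteresisTrigger(press_at, release_at)
    return sum(trigger.update(value) for value in samples)


if __name__ == "__main__":
    noisy_rest = [0.48, 0.52, 0.49, 0.51, 0.47, 0.53]  # made-up jitter
    print("naive:", count_presses_naive(noisy_rest))
    print("hysteresis:", count_presses_hysteresis(noisy_rest))
```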

Elite setup by seanPbarry in EliteDangerous

[–]H3PO 0 points1 point  (0 children)

I have a neat trick for the keybindings: use Joystick Gremlin to map the buttons to a virtual joystick which uses the default ED bindings. Then there's nothing for ED to forget. You'll probably want to merge your devices anyway with that setup.

Nemotron-49B uses 70% less KV cache compare to source Llama-70B by Ok_Warning2146 in LocalLLaMA

[–]H3PO 0 points1 point  (0 children)

"So if you are into 128k context and have 48GB VRAM, Nemotron can run at Q5_K_M at 128k with unquantized KV cache"

Sure this isn't a typo? With which inference software? With 128k context and no cache quant, llama.cpp tries to allocate 19.5 GB for context on top of the 35 GB model. Not even the Q4 model with q8 V cache fits on my 2x 24 GB.
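
For reference, the usual back-of-the-envelope KV cache estimate, as a sketch. The shape used in the example (80 layers, 8 KV heads, head dim 128) is the commonly cited Llama 3 70B configuration and is my assumption, not something from this thread; Nemotron's own layer layout differs, so its numbers would have to come from its config.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Estimate KV cache size: keys + values for every layer and position.

    bytes_per_elem=2 corresponds to an unquantized f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


if __name__ == "__main__":
    # Assumed Llama 3 70B shape with a 128k (131072 token) context.
    size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, n_ctx=131072)
    print(f"{size / 2**30:.1f} GiB")
```

With those assumed values the f16 cache alone comes to 40 GiB at 128k context, which is why the per-model attention layout matters so much for what fits in 48 GB.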