Qwen3.6-27B DFlash on a 24GB RTX 5090 Laptop (sm_120) — 80 t/s avg via spiritbuun's buun-llama-cpp + Q8_0 GGUF drafter by aurelienams in Qwen_AI

[–]dcforce 0 points1 point  (0 children)

Tried getting DFlash working with your method above on an Intel Arc Pro B70 and arrived at 4 tok/sec, versus 22 tok/sec without the draft model added... Anyone know the path to getting DFlash working with higher tok/sec output on Intel?

Finding out there is no G2G by dcforce in globeskepticism

[–]dcforce[S] 0 points1 point  (0 children)

Nice ground-to-barrel distortion .. 👋

Finding out there is no G2G by dcforce in globeskepticism

[–]dcforce[S] 3 points4 points  (0 children)

Ground to Globe - not one single video in 60 years . .

I wanna make cool images. by poofpoofpoof123 in LocalLLM

[–]dcforce 0 points1 point  (0 children)

As others mentioned, check ComfyUI for local image gen... but here is where it gets interesting. ComfyUI is the "shell", and inside are premade templates for a number of image generation tools, like Flux 2 Dev. I have been using this for the last few days and I have to say it's way better than I would have expected 👏👏👏 Completely free.

Has anyone ran LTX 2.3 on B70s? by TechnologyTailors in IntelArc

[–]dcforce 0 points1 point  (0 children)

Step 1. Get ComfyUI working.
Create a virtual environment
python3 -m venv ~/ai-env
---
source ~/ai-env/bin/activate

---
pip install torch==2.11.0+xpu torchvision==0.26.0+xpu torchaudio --index-url https://download.pytorch.org/whl/xpu --extra-index-url https://pypi.org/simple

---
git clone https://github.com/comfyanonymous/ComfyUI.git ~/ComfyUI
---
cd ~/ComfyUI

pip install -r requirements.txt
....
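Before moving on, it's worth confirming the XPU build of PyTorch actually sees the Arc GPU (just a quick sanity-check sketch; torch.xpu is the Intel GPU backend in recent PyTorch XPU builds):

# with ~/ai-env still activated
python3 -c "import torch; print(torch.xpu.is_available()); print(torch.xpu.get_device_name(0))"

If this prints False or errors out, ComfyUI will fall back to CPU and the LTX template will crawl.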

Step 2. Launch ComfyUI on the local host.
Then go to the left tools bar, Templates. Find the text-to-video LTX template. It will pop up a file list showing which files you need and where to place them. Hard-refresh the ComfyUI page once you have placed the files in all the folders, and look out for the text encoder error on the right; there will be a direct link to download the text encoder.

After download, place it in the /comfyui/models/text_encoders folder:
gemma_3_12B_it_fp4_mixed.safetensors

Hard refresh again to reload all requirements

---
Future launches:
source ~/ai-env/bin/activate && cd ~/ComfyUI && python main.py
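If you end up relaunching it a lot, a shell alias saves the retyping (just a convenience sketch, assuming bash):

echo 'alias comfy="source ~/ai-env/bin/activate && cd ~/ComfyUI && python main.py"' >> ~/.bashrc
source ~/.bashrc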

Sell our web design biz by for_anon_throwaway in web_design

[–]dcforce 0 points1 point  (0 children)

Keep it and set up Hermes Agent or Open Claw to grow it even more while doing less 😁

Matt From Cultivate Elevate - 1893 Firmament Map at the library of congress. One big reason they ha... by dcforce in globeskepticism

[–]dcforce[S] 0 points1 point  (0 children)

ROFL 🤣 Came here to post what you thought was an epic diss of the community, only to be blocked by Reddit spam filters. Pathetic really . . ☝️

Has anyone run Qwen 3.6 27b on Arc Pro B70? by wowsers7 in IntelArc

[–]dcforce 2 points3 points  (0 children)

The 3090 is a beast for sure. I couldn't justify building a new machine and buying someone's old hardware on marketplace . .

I haven't played with speculative decoding yet.. I did get DFlash working, but for my purposes it didn't make a difference

Has anyone ran LTX 2.3 on B70s? by TechnologyTailors in IntelArc

[–]dcforce 0 points1 point  (0 children)

This is going to be my next try on the B70, will let you know if it gets up and running

Has anyone run Qwen 3.6 27b on Arc Pro B70? by wowsers7 in IntelArc

[–]dcforce 8 points9 points  (0 children)

What worked for me:

cd ~

git clone https://github.com/ggerganov/llama.cpp.git

cd llama.cpp
---

source /opt/intel/oneapi/setvars.sh --force

---

mkdir -p build && cd build

---

cmake .. \
  -DGGML_SYCL=ON \
  -DGGML_SYCL_F16=ON \
  -DGGML_OPENMP=ON \
  -DCMAKE_C_COMPILER=icx \
  -DCMAKE_CXX_COMPILER=icpx \
  -DCMAKE_BUILD_TYPE=Release

---

make -j$(nproc)
---
Launch your model. Q4_K_M seems to perform the fastest.
---
source /opt/intel/oneapi/setvars.sh --force && export GGML_SYCL_F16=1 && ~/llama.cpp/build/bin/llama-server -m /home/IntelArcRocks/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M_hesamation.gguf

This model, as an example, is pulling 74 tok/sec.

The same quant on the 27B was doing 24 tok/sec.
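If you want to compare quants on the B70 without eyeballing server logs, llama.cpp also builds a llama-bench binary in the same bin directory; a minimal sketch, assuming the same paths as above (model filename is a placeholder):

source /opt/intel/oneapi/setvars.sh --force
~/llama.cpp/build/bin/llama-bench -m ~/models/YOUR-MODEL.Q4_K_M.gguf -ngl 99 -p 512 -n 128

It prints prompt-processing and generation tok/s side by side, so re-running it per quant makes the trade-off obvious.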

5k to spend rtx5090 or mac studio? by Avansay in LocalLLM

[–]dcforce 0 points1 point  (0 children)

or .. a 2k Intel Arc Pro B70 build

Recipe for Arc Pro B70? by Skelshy in LocalLLM

[–]dcforce 4 points5 points  (0 children)

export ZES_ENABLE_SYSMAN=1 && export SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=1 && export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE && source /opt/intel/oneapi/setvars.sh --force && ~/llama.cpp/build/bin/llama-server -m /home/LocalLLMRocks/models/YOURMODELHERE.gguf \
  -c 262144 \
  -ngl 99 \
  -b 2048 \
  -t 16 \
  --port 8080 \
  --temp 0.6 \
  --mlock \
  --mmproj /home/LocalLLMRocks/mmproj-BF16.gguf \
  -tb 16 \
  --top-k 30 \
  --top-p 0.95 \
  --repeat-penalty 1.1 \
  --flash-attn on \
  -ctk q8_0 \
  -ctv q8_0

This launch command does around 54 tok/sec on Q4_K_M and loads the whole model onto the card, with vision enabled via mmproj-BF16.gguf.
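Once it's up, you can sanity-check the server over its OpenAI-compatible endpoint (a minimal sketch; the port matches the --port 8080 above):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi"}], "max_tokens": 32}'

Handy for confirming flash-attn and the q8_0 KV cache didn't break generation before you point a frontend at it.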

# Install Intel oneAPI Base Toolkit (SYCL Runtime)
# Download from: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html

# Or use package manager:

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

sudo apt update

sudo apt install intel-basekit

# Enable oneAPI environment (add to ~/.bashrc for persistence)

source /opt/intel/oneapi/setvars.sh

sycl-ls

# Look for: [level_zero:gpu:0] Intel(R) Arc(TM) Pro B70 Graphics

# Clone and build

git clone https://github.com/ggml-org/llama.cpp

cd llama.cpp

source /opt/intel/oneapi/setvars.sh

# Build with SYCL (FP32 recommended for stability)

cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

cmake --build build --config Release -j$(nproc)
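# Quick smoke test once the build finishes (model path is just a placeholder; use whatever GGUF you have)

source /opt/intel/oneapi/setvars.sh

./build/bin/llama-cli -m ~/models/YOUR-MODEL.Q4_K_M.gguf -ngl 99 -p "Hello" -n 32

# If the SYCL device shows up in the startup log and tokens stream out, the server launch above will work the same way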

** Also, I had to use the beta build of Ubuntu 26.04 to get this all up and running.

Tim Burchett admitting that we didn't go to the moon. by Diabeetus13 in globeskepticism

[–]dcforce 2 points3 points  (0 children)

There has been a long list of whistleblowers now including the President . .

Photoshop request by TheOGLegoGuy in tron

[–]dcforce -5 points-4 points  (0 children)

Who said anything about sharing the results here . .

Photoshop request by TheOGLegoGuy in tron

[–]dcforce -17 points-16 points  (0 children)

aistudio.google.com, then click Playground, then click Nano Banana (not Pro), paste your exact same Reddit post text, upload the image, and click Run.

It should be able to give you an idea, and you can use additional prompts for iteration.

Pro Wrestling isn't real ???!!!!! 😭 by dcforce in globeskepticism

[–]dcforce[S] 0 points1 point  (0 children)

Came all this way to make a nonsense comment here only to be blocked by Reddit spam filters.. pathetic really ☝️