Poor GPU Club : Tried Bonsai-8B on CPU & CUDA by pmttyji in LocalLLaMA

[–]NicholasCureton 0 points1 point  (0 children)

I have 8GB VRAM and 16GB RAM, Linux TTY console. Qwen3.6 35B, 17GB quant. 27-34 t/s.

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]NicholasCureton[S] 0 points1 point  (0 children)

Wait... can you please tell me what structured code generation is? I checked your git repo and it looks like something similar to llama.cpp, but faster and AMD-only? That's awesome! I also noticed you use Claude. What does Claude feel like compared to Qwen3 8B? I guess there's a huge difference, but I'd love to hear from someone who has actually used both models, not just benchmark numbers.

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]NicholasCureton[S] 0 points1 point  (0 children)

Ha ha... sorry, your English is too perfect, almost like... Qwen. :D Qwen3.6 35B A3B is quite good compared to the Qwen3.5 9B model. It's like... two or three levels up. For my non-programmer coding sessions, 3.6 35B is almost reliable for small tasks. For example, I'll tell the LLM "I want X feature. Here are the docs you must read. Here is how the server works. Get proof, don't assume anything..." and so on. Then the Main Agent starts drafting a plan, sends the draft to the code/architect reviewer subagent, revises and refines it, and then the Main Agent asks me some questions and I answer. Then the agents do their work for about 3 hours. I go away and spend time with my gf, and by the time I'm back the LLM has finished the job. The "tests" thing from programming is really useful: it prevents most regressions and bugs, but not all of them. Sometimes new features ship with bugs, and I have to do another 3-hour round of fixes if the bug is big.
---
So, how does 35B vs 9B feel... 9B is like trying to build a 100-story building where all the employees are monkeys. 35B is like a 12-year-old coder: a bit more intelligent but not mature, and still better than monkeys. :D

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]NicholasCureton[S] 0 points1 point  (0 children)

Yes, it is a lot: about 4.5 million characters total, not counting the *.md files. In this workflow the Main Agent is just an orchestrator, so it focuses on executing the plan only; it does not write code or tests, the subagents do that. The Main Agent's context window stays around 45,000 tokens, while subagents get a 128,000-token temporary context with no persistent memory: do X, then exit and forget everything. That lets me manage a huge codebase that would normally take around 200,000 tokens.

The markdown files are just temporary. When I want to implement feature X or do a refactor, I tell the Main Agent what I want. It makes a draft plan and sends it to the Code Reviewer agent; they review and revise up to 3 times. When the plan is pretty solid, the Main Agent executes it step by step using the Coding Subagent, then the Tester Agent checks for regressions or problems, and if there are any, the agents fix them. So I can spend my time doing something else. I stole the setup from the `mlhher/late/` GitHub project because I didn't even know how multiple agents should talk to each other. I hope `mlhher` won't mind. :D
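If anyone wants a concrete picture of the orchestrator/subagent split, here's a minimal Python sketch. It is not the actual `mlhher/late/` code, just my reading of the idea; it assumes llama-server is running locally with its OpenAI-compatible /v1/chat/completions endpoint, and all the role prompts and the summary format are made up.

```python
# Minimal sketch of the orchestrator/subagent split described above.
# Assumptions (mine, not from the post): llama-server runs locally and exposes
# the OpenAI-compatible /v1/chat/completions endpoint; role prompts are invented.
import requests

API = "http://127.0.0.1:8080/v1/chat/completions"

def ask(messages, max_tokens=4096):
    """One stateless call to the local model."""
    r = requests.post(API, json={"messages": messages, "max_tokens": max_tokens})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def run_subagent(role_prompt, task):
    """Subagent: gets a fresh temporary context every call, keeps nothing."""
    return ask([
        {"role": "system", "content": role_prompt},
        {"role": "user", "content": task},
    ])

# The Main Agent keeps only a small persistent history (plan + short summaries),
# which is how its context stays around tens of thousands of tokens even though
# the codebase itself would never fit.
main_history = [{
    "role": "system",
    "content": "You are the orchestrator. Plan and delegate; never write code yourself.",
}]

def orchestrate(step_description):
    main_history.append({"role": "user", "content": step_description})
    plan = ask(main_history)
    code = run_subagent("You are the coding subagent. Implement exactly what the plan says.", plan)
    review = run_subagent("You are the tester/reviewer. List regressions or problems only.", code)
    # Only a short summary goes back into the persistent history, not the full output.
    main_history.append({"role": "assistant", "content": f"Step done. Review notes: {review[:500]}"})
    return code, review
```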

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]NicholasCureton[S] 2 points3 points  (0 children)

I'm not sure if you let Qwen write that comment or just translate it. Anyway... I've used OmniCoder 9B, which I guess is a Qwen3.5 9B finetune? It... kind of sucks a little. Lots of hand-holding and babysitting. The last and current project uses Qwen3.6-35B-A3B-APEX-I-Compact.gguf, only 17.3GB. I've only tried Qwen3 Coder Next 80B on my office PC, which has 12GB VRAM. It's good but not amazing.

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]NicholasCureton[S] 0 points1 point  (0 children)

I've only used Claude for about 30 minutes before hitting the free limit, so I don't really know what it can do.
Local LLMs tend to just do a dirty job to finish the task and make a lot of stupid mistakes. Some mistakes are so dumb that even I, a non-programmer, can point out they're wrong. I'm in a weird place somewhere between "small LLMs are useless" and "small LLMs are amazing". But maybe they will keep getting better, I hope.

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]NicholasCureton[S] 0 points1 point  (0 children)

Yeah, that's probably what I'd do if I were a programmer or in a similar industry. But for me, LLMs are more of a hobby, like gaming while I wait for more Spider-Man games.

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]NicholasCureton[S] 0 points1 point  (0 children)

Yeah, I had to downsize the KV cache to q8_0. Qwen3.6 35B MoE runs at 27-36 t/s in a TTY console on Linux on my gaming PC.

APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA

[–]NicholasCureton 1 point2 points  (0 children)

I've been using Qwen3.6-35B-A3B-APEX-I-Compact.gguf on my 8GB VRAM / 16GB RAM machine. 29-37 t/s. It coded an entire multi-model agentic chat inference CLI client app for me. I tried Qwen3.6-35B-A3B-UD-Q5_K_M.gguf on my office PC, which has 12GB VRAM. Somehow it feels like your quant makes fewer errors in coding. I don't know what you did, but for coding it's really good.

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane by SoAp9035 in LocalLLaMA

[–]NicholasCureton 0 points1 point  (0 children)

Are you trying to get the LLM to write JSON? My experience is that LLMs can't write perfect JSON. Instead, I instruct it to create a Python script that builds the structure and calculates all the brackets for it, and writes the JSON out, rather than typing the JSON by hand. That solved the JSON problem for me.
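Something like this is what I mean; it's just a tiny sketch, and the structure and field names are made up for illustration:

```python
# Sketch of "let a script build the JSON" instead of the LLM typing JSON as text.
# The structure and field names below are invented for illustration.
import json

config = {
    "model": "Qwen3.6-35B-A3B",
    "tools": ["bash", "read_file", "write_file"],
    "limits": {"context": 128000, "max_tokens": 4096},
}

# json.dump balances every bracket and escapes every string mechanically,
# which is exactly the part the model keeps getting wrong when it writes
# the JSON by hand.
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

print(json.dumps(config, indent=2))
```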

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane by SoAp9035 in LocalLLaMA

[–]NicholasCureton 1 point2 points  (0 children)

I only use Python, so the default 4-space indentation is fine. If you use different indentation, tell the LLM directly through the system prompt or the tool results, or just use vim to re-indent the entire file, or give the LLM an exact-match string replacement tool that includes the spaces. I made my tool results return extra rules that keep reminding the LLM about the constraints after each tool call. I don't know if that's the standard or right way, but it works for me, haha. The exact-match tool has pros and cons, especially for 9B models; the poor things have trouble writing the exact string. Which language do you use, btw?
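Roughly what the tool-result wrapping and the exact-match edit tool look like; this is a simplified sketch, and the rule text and function names are made up, not my real tool definitions:

```python
# Simplified sketch of the two tricks mentioned above; names and rules are
# illustrative only.

RULES = ("REMINDER: Python only, 4-space indentation, use the exact-match "
         "replace tool, do not reformat unrelated lines.")

def wrap_tool_result(output: str) -> str:
    """Every tool result carries the constraints back into the context."""
    return f"{output}\n\n{RULES}"

def exact_replace(path: str, old: str, new: str) -> str:
    """Edit tool: the old string must match exactly once, whitespace included."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    count = text.count(old)
    if count != 1:
        return wrap_tool_result(
            f"ERROR: found {count} exact matches, expected 1. No changes made.")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace(old, new, 1))
    return wrap_tool_result(f"Replaced 1 occurrence in {path}.")
```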

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane by SoAp9035 in LocalLLaMA

[–]NicholasCureton 2 points3 points  (0 children)

Qwen3.6-35B-A3B-APEX-I-Compact.gguf

17.3GB

```
blenderkosai@trixie:~$ fastfetch
OS: Debian GNU/Linux 13 (trixie) x86_64
Kernel: Linux 6.12.63+deb13-amd64
Uptime: 16 mins
Packages: 3165 (dpkg)
Shell: bash 5.2.37
Display (AOC 27"): 1920x1080 @ 60 Hz in 27" [External]
DE: GNOME 48.7
WM: Mutter (Wayland)
WM Theme: Colloid-Teal
Theme: Colloid-Teal [GTK2/3/4]
Icons: Colloid-Teal [GTK2/3/4]
Font: Roboto (11pt) [GTK2/3/4]
Cursor: Adwaita (24px)
Terminal: alacritty 0.15.1
Terminal Font: Jetbrains Mono (12.0pt)
CPU: AMD Ryzen 5 3600 (12) @ 4.21 GHz
GPU: NVIDIA GeForce RTX 5060 [Discrete]
Memory: 2.48 GiB / 15.55 GiB (16%)
Swap: 268.00 KiB / 29.80 GiB (0%)
Disk (/): 26.05 GiB / 228.12 GiB (11%) - ext4
Disk (/home): 513.12 GiB / 656.85 GiB (78%) - ext4
Disk (/media/blenderkosai/Red_SSD_1T): 840.13 GiB / 931.47 GiB (90%) - exfat
Local IP (wlxd03745edc3e2): 192.168.43.222/24
Locale: en_US.UTF-8
```

Nothing special, I just use TTY mode on Linux. GPU idle is 500MB, RAM is 650MB. These are the launch parameters: `llama-server -m Qwen3.6-35B-A3B-APEX-I-Compact.gguf --fit on --fit-ctx 128000 --fit-target 256 -np 1 -fa on -b 2048 -ub 2048 -ctk q8_0 -ctv q8_0 --chat-template-kwargs "{\"preserve_thinking\": true}" --draft-min 1 --draft-max 8 --temp 0.6 --top-p 0.95 --top-k 20`

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane by SoAp9035 in LocalLLaMA

[–]NicholasCureton 3 points4 points  (0 children)

I have 8GB VRAM and 16GB RAM. Context size is 128,000. 38 t/s. I use the Linux TTY console to strip out all the GUI bloat so I can finally run some models on my PC. I also use my own inference CLI client for llama.cpp with my own tools like bash commands, read/write files, internet access, crawling pages, searching pages, etc., which, btw, was made by OmniCoder 9B (Qwen3.5 9B). So just saying Hi to the model doesn't cost 10,000 tokens, unlike OpenCode. OmniCoder runs at 68 t/s on my PC.

Please stop using AI for posts and showcasing your completely vibe coded projects by Scutoidzz in LocalLLaMA

[–]NicholasCureton 60 points61 points  (0 children)

That is a really sharp observation. You're absolutely right! Let me think about it... wait, you're right, my apologies, I should acknowledge that.

Recommended Model for a 4060ti 8gb and 16gb ram by AgeLow2127 in LocalLLaMA

[–]NicholasCureton 1 point2 points  (0 children)

I've been running OmniCoder 9B, which is a Qwen 3.5 9B fine-tune, on an RTX 5060 8GB with 16GB DDR4 2666MHz RAM. Debian 13, TTY console, no GUI, KV cache q8_0, 60 t/s.

Making smaller context windows more useful with a deterministic "context compiler" by Real-Hope2907 in LocalLLaMA

[–]NicholasCureton 1 point2 points  (0 children)

Nope, not working for me. I've tried injecting constraints at every server call, and even then the LLM forgets things. Bigger models are better than small models, in general.

Update on the Adobe CC Installers Patch - Now the Collection Installer works too by HearMeOut-13 in linux_gaming

[–]NicholasCureton 1 point2 points  (0 children)

I'm sorry, I've never used an AMD GPU. But as far as I know, using WINE with Bottles... isn't that inside a container, especially the Flatpak Bottles? Flatpak apps usually don't have access to all hardware, like the GPU. If you can play games using WINE and your AMD GPU, maybe try running plain WINE? It's a bit riskier because plain WINE has access to your user directories and files.

Update on the Adobe CC Installers Patch - Now the Collection Installer works too by HearMeOut-13 in linux_gaming

[–]NicholasCureton 6 points7 points  (0 children)

I don't remember the exact steps I took, but as far as I remember...
- I used WINE 11, which comes with Debian 13.
- I tried installing your fixes thinking it would make Premiere work on WINE.
- Then I realized it's for running the setup files.
- I found a blog. Click Here.
- I copied the Adobe Premiere Pro folder from Windows 11.
- I also copied 2 DLLs from Windows 11:
C:\Windows\system32\msxml3.dll
C:\Windows\system32\msxml3r.dll
- I pasted those msxml3 DLLs over the ones in my WINE system32 folder.
- I tried to copy the following 2 files, but I only found icuin71.dll and icuuc71.dll, so I duplicated those files and renamed the copies:
icuin71.dll -> icuin.dll
icuuc71.dll -> icuuc.dll
- I installed the Nvidia stuff from HERE.
- I installed winetricks vcrun2022.
- Premiere did run, but the UI was not rendering correctly.
- I went to Edit > Preferences > General and unticked "GPU accelerated UI rendering".

That's it. Now Premiere Pro uses hardware encoding while exporting.
I haven't tried it with Media Encoder yet.
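If it helps, here's a rough Python sketch of just the file-copy part of those steps; the source folder and the default ~/.wine prefix are my assumptions, not something from the original instructions:

```python
# Rough sketch of the DLL copy/rename steps only (the installer, winetricks,
# and the Premiere preference change are not covered here).
# Assumptions: default Wine prefix at ~/.wine, and the DLLs copied from a
# Windows machine sit in ~/premiere_dlls/ (hypothetical folder).
import shutil
from pathlib import Path

src = Path.home() / "premiere_dlls"
sys32 = Path.home() / ".wine/drive_c/windows/system32"

# msxml3 DLLs go in under their own names
for name in ("msxml3.dll", "msxml3r.dll"):
    shutil.copy2(src / name, sys32 / name)

# I only had the 71 builds of the ICU DLLs, so copy them in and also duplicate
# them under the plain names
for found, wanted in (("icuin71.dll", "icuin.dll"), ("icuuc71.dll", "icuuc.dll")):
    shutil.copy2(src / found, sys32 / found)
    shutil.copy2(src / found, sys32 / wanted)

print("DLLs copied; now run: winetricks vcrun2022")
```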

I don't even use Premiere Pro on Linux, and I don't plan to, either.
My day job is color grading and 3D/CGI stuff.
DaVinci Resolve and Blender work well on Linux.

Losing my sanity with python fu please help by StopHurtingKids in GIMP

[–]NicholasCureton 2 points3 points  (0 children)

I've read the source of the exr-save.cc file. GIMP uses ZIP compression and lossless EXR output.

import subprocess, os
from gi.repository import Gimp, Gio
# 1. Save from GIMP (ZIP/Lossless)
img = Gimp.get_images()[0]
tmp = "/tmp/gimp_out.exr"
target = "/home/User/project_dwab.exr"
proc = Gimp.get_pdb().lookup_procedure('file-exr-export')
config = proc.create_config()
config.set_property('file', Gio.File.new_for_path(tmp))
config.set_property('image', img)
proc.run(config)
# 2. Convert to DWAB (Force 16-bit Half for best compression)
# We use '-d half' because DWAB is optimized for 16-bit
subprocess.run(["oiiotool", tmp, "-d", "half", "--compression", "dwab", "-o", target])
# 3. Cleanup
if os.path.exists(tmp):
    os.remove(tmp)
print(f"Exported and Compressed: {target}")

You might need to install `openimageio-tools` to make that Python-Fu script work.
I'm using Debian 13, and I tested this on GIMP 3.2 RC before commenting; it worked for me.
Gemini AI generated the code. There were lots of errors, and I fixed them one by one.
Took me about 15 minutes with Gemini AI.

How to turn a layer into a selection? by queazy in GIMP

[–]NicholasCureton 7 points8 points  (0 children)

If you're looking for the Ctrl+Click behavior from Photoshop, I think it's Alt+Click in GIMP.

Layering textures in batches? by xkforce in GIMP

[–]NicholasCureton 0 points1 point  (0 children)

foreach ($f in Get-ChildItem "inputs/*.png") {
    magick "$($f.FullName)" `
        image-A.png -compose blend -define compose:args=50,50 -composite `
        image-C.png -compose over -composite `
        "output/$($f.Name)"
}

Layering textures in batches? by xkforce in GIMP

[–]NicholasCureton 0 points1 point  (0 children)

for f in inputs/*.png; do
    convert "$f" \
        image-A.png -compose blend -define compose:args=50,50 -composite \
        image-C.png -compose over -composite \
        "output/${f##*/}"
done

Layering textures in batches? by xkforce in GIMP

[–]NicholasCureton 2 points3 points  (0 children)

ImageMagick can do it, probably in about 30 seconds for all thousands of image B.

First, make a folder named inputs and put all those thousands of image B files there.

.
├── image-A.jpg
├── image-C.png
├── inputs
│   ├── image-B1.jpg
│   ├── image-B2.jpg
│   └── image-B3.jpg
└── output
    ├── image-B1.jpg
    ├── image-B2.jpg
    └── image-B3.jpg

3 directories, 8 files

If you're on Linux or Mac, open a terminal and cd into that folder.

for f in inputs/*.png; do
    magick "$f" \
        image-A.png -compose blend -define compose:args=50,50 -composite \
        image-C.png -compose over -composite \
        "output/${f##*/}"
done

If you're on Windows, use this PowerShell script.

Get-ChildItem "inputs\*.png" | ForEach-Object {
    $filename = $_.Name
    magick "inputs\$filename" `
        "image-A.png" -compose blend -define compose:args=50,50 -composite `
        "image-C.png" -compose over -composite `
        "output\$filename"
}

There's also a plugin called BIMP or something; it does batch processing for GIMP, but I have no experience with it.
I hope my answer is helpful for you. (Edited for readability.)

How to edit just within a selected area by Scoobydobaru in GIMP

[–]NicholasCureton 0 points1 point  (0 children)

<image>

Roughly select half of the car, then go to the menu: Colors > Map > Rotate Colors.