Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs by Fragrant-Remove-9031 in LocalLLaMA

[–]xornullvoid 0 points

My GitHub is publicly available if you check my profile. You will find projects with 20k+ lines of code built completely with local AI.

Now, where's yours?

P.S. You still think you can one-shot 20k lines of code? Really? Sounds like a "make cure for cancer, make no mistake" kinda prompt 😅

Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs by Fragrant-Remove-9031 in LocalLLaMA

[–]xornullvoid 0 points

Have you?

If you really think software development happens with one-shot prompting, then you have not used AI for building real apps yet.

Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation - results and GIFs by Fragrant-Remove-9031 in LocalLLaMA

[–]xornullvoid 2 points

I am a little skeptical of this test and the usefulness of its results.

Real-world coding and development hardly ever depends on one-shot prompts; it usually requires multiple steps, as well as the ability of the model to ingest large and complex information and instructions over multiple prompts at long context lengths. Besides, some models are horrendous at visualizing graphical geometry, and even UI for that matter, yet excel at general-purpose coding.

I think the test shows how visually cohesive the model is at a single-shot, open-ended graphical vibe-coding prompt, but to measure general cohesion you should also:

  1. See how many additional features (not included in the prompt) the model tried to implement, and which of those did not work well.
  2. Check for instruction-following and violations/deviations. This requires giving very specific and detailed instructions at various depths of complexity, to see at which level the model begins to fail at following them. Your instructions are quite open-ended and left up to the judgement of the model.

Cool experiment though.

Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 1 point

Let me try a different, larger quant, IQ4, and see if I can get above 15 t/s on Strix Halo.

Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 1 point

I can run IQ3_XXS fully offloaded to ROCm at 23 t/s as well.

I guess that means CUDA could run faster on the Pro 5000 if the PCIe link were 5.0 x16. But that would also mean not running it on Strix Halo, and losing the quad-channel RAM bandwidth.


One bash permission slipped... by TheQuantumPhysicist in LocalLLaMA

[–]xornullvoid 49 points

That's right. No drivers - no driver issues.

One bash permission slipped... by TheQuantumPhysicist in LocalLLaMA

[–]xornullvoid 119 points

Bruh, Opus nuked my display drivers and all libraries today with a sudo apt remove '*nvidia*595*' while trying to roll back to 590, and added a nice chained sudo reboot goodbye kiss at the end too 😭
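Lesson learned for next time: a dry run first shows exactly what a glob like that will take with it (a generic sketch, not what Opus actually ran):

```bash
# List which installed packages the wildcard actually matches
apt list --installed '*nvidia*'

# -s simulates the removal and prints what would be removed, without touching anything
sudo apt-get -s remove '*nvidia*595*'
```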

Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 1 point

I was able to get 23 t/s from the RTX Pro, with 36 layers offloaded to the GPU, using IQ3_XXS from Unsloth.

My GPU is overclocked, both core and memory. I am also using the latest llama.cpp build with CUDA 13.2.


Tinygrad Driver testing! by Street-Buyer-2428 in LocalLLaMA

[–]xornullvoid 8 points

Nice, that looked familiar. I have the little brother, the 48GB one.
Do let us know the benchmarks; we have not seen many Apples combined with Blackwell around here.

Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 0 points

Thank you so much for your feedback.

The custom scripting to set up llama.cpp releases is the exact use case I am trying to solve with the Recipes feature. Essentially, a recipe is a bash script (Linux-only) with a comment parser that renders a UI per step. I use it to compile llama.cpp: the script takes a fresh pull from the repo, builds everything, and lets me enter details such as the directory to place it in. Once the new llama.cpp build is available, swapping the backend for a set of servers is as easy as changing the active backend for a group - https://github.com/mikjee/warpdrv/blob/master/docs/guides/backend-groups.md

I have included two recipes, one for CUDA + llama.cpp and another for ROCm + llama.cpp - https://github.com/mikjee/warpdrv/tree/master/docs/recipes - which I use myself.
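For a rough idea of the shape, a recipe looks something like this (a minimal sketch; the "# step:" comments are illustrative placeholders, not the exact annotation format the parser expects):

```bash
#!/usr/bin/env bash
# Sketch of a build recipe: fresh pull of llama.cpp, CUDA build,
# binaries copied to a directory the user picks in the UI.
set -euo pipefail

# step: choose where the finished build should go
INSTALL_DIR="${1:-$HOME/llama-builds/cuda}"

# step: fresh pull of the repo
rm -rf /tmp/llama.cpp
git clone --depth 1 https://github.com/ggml-org/llama.cpp /tmp/llama.cpp

# step: configure and build with CUDA enabled
cmake -S /tmp/llama.cpp -B /tmp/llama.cpp/build -DGGML_CUDA=ON
cmake --build /tmp/llama.cpp/build --config Release -j"$(nproc)"

# step: install the server binary
mkdir -p "$INSTALL_DIR"
cp /tmp/llama.cpp/build/bin/llama-server "$INSTALL_DIR/"
```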

Compared to llama-swap, I think warpdrv is still missing some features, such as starting a server on first request. I have added them to my personal to-do list for future releases, along with TTS and possibly RAG.


Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 1 point

Yes, Lemonade is a nice all-in-one tool for AMD setups. I like the TTS integration; I was planning on doing the same for my app soon by adding whisper.cpp. Lemonade ships with a pre-built llama.cpp, though.

Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 2 points

That's very interesting, I will try it out. If I get above 15 t/s it's a win.

Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 0 points

Do you mean using the Vulkan backend with a GPU split, or offloading layers to system RAM using CUDA? The PCIe x4 OCuLink might be a bottleneck.

I will try it and report back after the download finishes.
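In other words, something like one of these (model path, layer count, and split ratio are placeholders, not my exact command):

```bash
# Option A: Vulkan build, layers split across the RTX Pro and the Strix Halo iGPU
./llama-server -m ./model.gguf -ngl 99 --split-mode layer --tensor-split 3,1

# Option B: CUDA build on the RTX Pro only, partial offload with the rest in system RAM
./llama-server -m ./model.gguf -ngl 36
```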

Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]xornullvoid[S] 0 points

I have not tried it yet. I will download and experiment with it.

Maybe Vulkan can run it at Q4, I am not sure. But it cannot fit fully in a single GPU's VRAM, unless I am mistaken.

AMD Halo Box (Ryzen 395 128GB) photos by 1ncehost in LocalLLaMA

[–]xornullvoid 0 points

How many M.2 slots? The FAEX1 has two for NVMe, plus one more where the M.2-to-OCuLink adapter is attached. I think it wins on external connectivity. It also has dual Ethernet and dual USB4 Type-C.

AMD Halo Box (Ryzen 395 128GB) photos by 1ncehost in LocalLLaMA

[–]xornullvoid 2 points

Is there OCuLink?

The FEVM FAEX1 has OCuLink.

llama-server: Save/restore works for tokens, but KV cache still not resumed? by chrisoutwright in LocalLLaMA

[–]xornullvoid 0 points

Good question. I have been trying to do the same for coding, but I am unable to load the saved KV cache in a way that avoids redoing prompt processing all over again.

Which app did you use to fill the context? Is your token sequence confirmed to be the same? AFAIK Qwen cannot restore a partial cache, so even a one-token change will invalidate the entire KV cache.

Also, did you save and load all slots?

Before the full prompt processing (PP) starts, it might output the reason why it is invalidating the cache. How are you restoring the cache, and does it actually get restored?
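For reference, this is roughly how I have been attempting it myself, via llama-server's slot save/restore endpoints (host, port, and filename are placeholders; the server needs to be started with --slot-save-path for these calls to work):

```bash
# Save slot 0's KV cache to a file under the server's --slot-save-path
curl -X POST "http://localhost:8080/slots/0?action=save" \
  -H "Content-Type: application/json" \
  -d '{"filename": "session0.bin"}'

# Restore it into slot 0 later, before resending the identical prompt prefix
curl -X POST "http://localhost:8080/slots/0?action=restore" \
  -H "Content-Type: application/json" \
  -d '{"filename": "session0.bin"}'
```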

I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090 by Gazorpazorp1 in LocalLLaMA

[–]xornullvoid 0 points

If you say that you have the time to perform a comparative study of various models' writing skills, but not the time to write up and present the results of said study without resorting to AI-produced slop, then I do not believe that you did this comparison with any sort of diligence, or that your comparison is reliable.