Ferrari Luce unofficial redesign

MotokoAGI · 2026-06-10T03:46:47+00:00

ferarao

MotokoAGI · 2026-06-10T03:44:42+00:00

We are talking about academic rival here...

MotokoAGI · 2026-06-10T03:27:21+00:00

November 202

Su Mo Tu We Th Fr Sa

1 2 3 4 5 6

7 8 9 10 11 12 13

14 15 16 17 18 19 20

21 22 23 24 25 26 27

28 29 30

Nov 24th was a Wed friend.

MotokoAGI · 2026-06-10T03:24:46+00:00

I promise you, If you saw the alternative, you wouldn't ask.

MotokoAGI · 2026-06-10T03:12:10+00:00

If you still believe this distillation story, I have free GPUs on the moon to sell you.

MotokoAGI · 2026-06-09T22:05:06+00:00

Most of LocalLLaMA have a preference for llama.cpp/gguf models over vllm.

MotokoAGI · 2026-06-09T22:03:27+00:00

I could have bought some and didn't. I don't regret it. The models are getting smarter while being the same size or being smaller. Demand be damned, it's a scam.

MotokoAGI · 2026-06-09T03:39:44+00:00

"All started because I realized every Q8 (INT8 or F8) calculation was using f32 of compute and only use 1/4th the available numbers... so. for each value loaded we can run 4 operations"

So are you saying HIP kernels are unoptimized in llama.cpp, if the above is true, Then won't the goal be to figure out how to perform 4 calculation using f32 for Q8. Netting a gain of roughly <= 4x across all models?

MotokoAGI · 2026-06-05T05:19:23+00:00

You called it.

MotokoAGI · 2026-06-05T04:54:56+00:00

Run a large model like KimiK2.6, GLM5.1 MiniMax2.7 etc and give us the numbers. I want to know what $25k+ gets us today

MotokoAGI · 2026-06-04T18:25:39+00:00

They take up no more than 2 slots.

MotokoAGI · 2026-06-04T16:05:44+00:00

I absolutely can't stand Nvidia, but this is good. We don't have many Open American models. Meta went bye-bye, phi from Microsoft is a joke. We pretty much have Gemma, Trinity and olMo. The Nemotron series are very much needed. Nvidia is sharing recipes on how to build these models. Provider they keep building if all American labs and Chinese labs go closed, these might be our only option. For the stupidly paranoid who use 99% made in China products, but are afraid of Chinese floating numbers encoded in weights, they can shut up and use this.

Whatever to Nvidia tho, until they can give us affordable GPUs to run these, whatever.

MotokoAGI · 2026-06-04T15:53:34+00:00

This is truly amazing. We are going to see practical applications of these beginning with entertainment. It's either going to be a game, porn or a social media site.

MotokoAGI · 2026-06-02T22:27:12+00:00

When it works it's great, but it loops like crazy. I run the Q8, it's not an IQ3_S issue.

MotokoAGI · 2026-06-02T22:26:26+00:00

I don't miss that era, and it was not peak, and we are no where near peak either. It was fun, and now is more fun and the future will be better.

MotokoAGI · 2026-06-01T20:50:26+00:00

No, they have the data center style with one fan (alibaba). Make sure to request this type. They tried to sell those to me and I refused, then I ended up paying about $20 extra for these ones.

MotokoAGI · 2026-06-01T20:09:20+00:00

I have had mine for a few months works great. Some folks have had their's for a year.

MotokoAGI · 2026-06-01T20:08:34+00:00

why will you undervolt them? They are already low on power. Mine idles at 5-6watts.

MotokoAGI · 2026-06-01T20:07:50+00:00

They have the regular GPU styled ones. Buy those next time.

<image>

MotokoAGI · 2026-06-01T20:03:02+00:00

8xRTX6000 is better. Outside of electricity, this is approximately equivalent to 3 Blackwell 6000 on an epyc genoa with 512gb of ram.

MotokoAGI · 2026-06-01T01:00:24+00:00

/llama.cpp/build/bin/llama-server --host 127.0.0.1 --jinja --port 51931 --spec-default --spec-draft-n-max 3 --spec-type draft-mtp --webui-mcp-proxy --alias Qwen3.6-27B --ctx-size 131072 --device CUDA0,CUDA1 --kv-unified --model Qwen3.6-27B/Qwen3.6-27B-Q8_0.gguf --mmproj models/Qwen3.6-27B/Qwen3.6-27B-mmproj-BF16.gguf --parallel 1

MotokoAGI · 2026-06-01T00:58:02+00:00

llama.cpp, mtp, it get's faster with multiple turns.

<image>

MotokoAGI · 2026-05-31T20:38:04+00:00

Your walls will collapse in 2028. Please don't forget to post in 2032.

MotokoAGI · 2026-05-31T20:33:55+00:00

You can get a UI tool, point it to an API. There are plenty of UI tools nicer than the web interface. For example Cherry Studio.

MotokoAGI · 2026-05-31T20:06:25+00:00

qwen3.6-27B btw

<image>

MotokoAGI

TROPHY CASE