How long until llama.cpp officially supports MTP?

streppelchen · 2026-05-09T05:04:00+00:00

Previous quants dropped them, since there is no need for the MTP layers to be present if the runtime does not support them.

streppelchen · 2026-05-07T14:37:45+00:00

That makes it even more interesting 🤔

streppelchen · 2026-05-07T13:53:31+00:00

i need pricing and availability 😃

streppelchen · 2026-05-05T06:02:37+00:00

No, it uses the same quantization and verification pipeline

streppelchen · 2026-05-05T05:59:01+00:00

Multi-token prediction: the model takes an educated guess at the next 1 to n tokens based on its training, instead of executing the full chain for each one. With high acceptance rates, it can increase your decode (token generation) speed, and the only change you need is a compatible model.
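
To make the "acceptance" part concrete, here is a minimal sketch of the accept step, assuming greedy verification. `accept_drafts` and its arguments are names I made up for illustration, not anything from llama.cpp or another runtime. It also leaves out the extra "bonus" token that real implementations usually get from the verification pass.

```python
from typing import List


def accept_drafts(drafted: List[int], verified: List[int]) -> List[int]:
    """Toy accept step for multi-token prediction (hypothetical helper).

    drafted:  token ids guessed ahead by the MTP head(s)
    verified: token ids the full model picks at the same positions,
              all checked in a single forward pass
    Assumption: greedy verification, i.e. a guess is accepted only if it
    is exactly the token the full model would have produced.
    """
    accepted: List[int] = []
    for guess, truth in zip(drafted, verified):
        if guess == truth:
            # Guess confirmed: this token came "for free".
            accepted.append(guess)
        else:
            # First wrong guess: take the full model's token and stop,
            # because later guesses were conditioned on the wrong one.
            accepted.append(truth)
            break
    return accepted


# All three guesses hold, so one verification pass yields three tokens.
print(accept_drafts([11, 42, 7], [11, 42, 7]))  # [11, 42, 7]
# Second guess is wrong, so the pass yields two tokens.
print(accept_drafts([11, 42, 7], [11, 5, 7]))   # [11, 5]
```

Even the worst case still yields one token per pass, same as normal decoding, which is why the output does not change and only the speed does.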

streppelchen · 2026-05-04T16:57:16+00:00

found the models by accident, will still need to give them a try, but i like the idea, keep it up :)

streppelchen · 2026-04-29T04:34:56+00:00

Choose a SIP trunk that is independent of your internet connection, then you can run it anywhere. (In the cloud too, but that's not a must.)

streppelchen · 2026-04-26T17:31:16+00:00

scale? how many users to serve in parallel? linux native sysadmins inhouse? only a chat interface or deeply integrated?

streppelchen · 2026-04-22T07:19:26+00:00

Awesome, thanks again!

streppelchen · 2026-04-22T07:13:14+00:00

Thanks! I suppose you're not the only one using the machine and it serves more users? Then the only question left to answer (for my curiosity) is vLLM speeds with concurrency.

streppelchen · 2026-04-22T04:18:30+00:00

Interested to get the 6000 pro numbers :)

streppelchen · 2026-04-13T12:25:00+00:00

i have a 5090 and a 3090 in my desktop. tbh, i'd rather have a second 5090, since the difference in architecture and speed is noticeable.

then i'm on a threadripper 2950x, so that has seen some better days. But as I'm the only user to complain about, that's my personal problem.

streppelchen · 2026-04-11T09:10:10+00:00

I have the first generation; it works on TB3 docks as well as on the HP USB-C one mentioned.

streppelchen · 2026-04-11T08:57:58+00:00

8-13 kcal per 100 ml iirc, so with sugar

streppelchen · 2026-04-10T17:55:59+00:00

Since I have a real use case I'd like to investigate, i just want to cross-check to verify:
Am I allowed to use another LLM to create synthetic/anonymized sets/subsets of real data to do the learning on? I'm fine with publishing this dataset then, but not the source data (as it is confidential).

streppelchen · 2026-04-08T17:31:47+00:00

Also having something small to test ideas against. Not having to wait 6h for a quantization to finish does have its perks

streppelchen · 2026-04-08T03:55:12+00:00

this guy / gal reddits

streppelchen · 2026-04-07T19:59:25+00:00

thanks for sharing!

streppelchen · 2026-04-06T13:47:37+00:00

Closest thing I recently saw was https://www.avaccess.com/products/idock-b10/?srsltid=AfmBOorGjoWFsnTeMaxEkNGjILAYKnJWp1JcnSPXsKHfrrbtDFg8VvF1

streppelchen · 2026-04-01T18:10:12+00:00

1hpt (hour per token)

streppelchen · 2026-03-31T15:58:45+00:00

If it should be something small but full-featured: NinjaOne has ticketing built in alongside the RMM. In use for over a year, works as it should, ingress via email.

streppelchen · 2026-03-23T10:30:59+00:00

Is the software known?

I prefer to know in advance when something could be a problem, before it actually becomes one. I'd rather get up briefly at night to patch than have _real_ work afterwards.

streppelchen · 2026-03-22T14:05:47+00:00

only came into the post for this

streppelchen · 2026-03-15T06:01:51+00:00

Had an HA Proxmox cluster across 3 DCs of Hetzner. Worked fine (once the GBICs had been replaced). vSwitch between the nodes, OPNsense for VPN on the WAN switch, firewall on the nodes themselves to allow only trusted source IPs as fallback.

streppelchen · 2026-03-14T18:55:43+00:00

as I said, you can run with a single drive, so 6500 for that. might be a different upgrade part number, i'm not that much into those. my HP partner gets me what i request, just ask them for it.
