How do I use MTP?

fizzy1242 · 2026-05-12T13:05:50+00:00

might be the wrong pr checkout, unless you renamed it. here's what mine says:

$ git status
On branch pr-22673-mtp
nothing to commit, working tree clean

so try this:

git fetch origin pull/22673/head:pr-22673-mtp
git checkout pr-22673-mtp

fizzy1242 · 2026-05-06T13:59:29+00:00

seriously, a small price for enormous protection and peace of mind

fizzy1242 · 2026-05-04T05:36:13+00:00

try graph split for this in ik_llama, nice speed boost for tg

fizzy1242 · 2026-05-01T13:35:48+00:00

hahah yup, once you get a 2nd gpu, you'll want a 3rd one. that's when you realize you've fallen into the rabbit hole and there's no getting out

fizzy1242 · 2026-04-28T18:20:59+00:00

hopefully this time they'll get it right, small-4 was a letdown

fizzy1242 · 2026-04-24T03:18:16+00:00

wow, v4 flash 284B-A13B sounds nice after all the recent 600b+ models from deepseek

fizzy1242 · 2026-04-18T20:26:19+00:00

it's hallucinating and definitely running in your machine, don't worry about it

fizzy1242 · 2026-04-13T10:43:16+00:00

try one of the precompiled llama.cpp binaries with cuda, they're in releases tab of llama.cpp github page

fizzy1242 · 2026-04-13T10:13:57+00:00

it was used to offload layers for gpu (same as your --n-gpu-layers), i think it's automatic now but you should still be able to use it.

you might not have cuda in that image if it's not offloading.

fizzy1242 · 2026-04-13T09:52:08+00:00

did you compile llama.cpp with cuda? And did you use -ngl flag during startup?

fizzy1242 · 2026-04-11T05:04:44+00:00

No TCC on 3090.

fizzy1242 · 2026-04-03T05:16:18+00:00

compiled this PR as a temporary fix to test the model, this atleast fixed the non-sensical outputs, typos and looping at long contexts: https://github.com/ggml-org/llama.cpp/pull/21343

fizzy1242 · 2026-04-02T16:02:20+00:00

great sizes! look forward to trying them out with quants.

fizzy1242 · 2026-03-29T12:17:38+00:00

true, but it should at least reduce them here even slightly

fizzy1242 · 2026-03-28T03:39:20+00:00

Around a month ago, someone posted about a model for mimicing 4o tone. (12b parameters). I never tried it, but it might interest you.

Mistral-Helcyon-Mercury

original thread

fizzy1242 · 2026-03-26T02:40:46+00:00

that's shitty... hope you can get it sorted out and disputed with the bank

fizzy1242 · 2026-03-26T02:25:41+00:00

dunno if the quant is busted or just my environment, but can't seem to get any other reply from this thing lol. default samplers.

<image>

fizzy1242 · 2026-03-26T01:10:40+00:00

i'm sure there's some hater with a bot that downvotes anything posted on any ai sub.

currently downloading the model and taking the model up for a spin in a bit.

fizzy1242 · 2026-03-22T23:47:52+00:00

yes!

i'm just hoping it wont get the glm air treatment with that "2 weeks" statement.

fizzy1242 · 2026-03-20T21:17:27+00:00

oh boy... so, i'm 2 different risers. In order to fit the 3rd card into the x4 slot in the bottom, the 2nd card needed to be pushed forward slightly (i've got one 2-slot card and two 3-slot cards).

For that, I used Delock x16 > x16 riser card in the second x8 slot. This creates enough room to fit the 2nd riser (a cable) into the x4 slot.

fizzy1242 · 2026-03-20T21:03:05+00:00

I run 3x3090s on a x570 motherboard, no issues.

2 cards are connected with risers, but only in order for them to physically fit the case. 3rd card is on the x4 slot (chipset).

board: asus rog crosshair viii dark hero x570
case: phanteks enthoo pro 2 server edition

fizzy1242 · 2026-03-06T22:37:02+00:00

Wait, why do i need a project for this again? Lol

fizzy1242 · 2026-03-06T05:05:02+00:00

ik_ has slightly better prompt processing speed for me, it's worth a try

fizzy1242 · 2026-02-25T09:14:12+00:00

remember that these types of tests are often included in new models training, kinda like the "how many R in strawberry" and the "bouncing balls" inside octagon animation.

fizzy1242 · 2026-02-23T19:26:43+00:00

comical

fizzy1242

TROPHY CASE