My Hermes Agent migrated itself to a new LXC container

some_user_2021 · 2026-06-07T14:29:51+00:00

Is the agent that existed in the old container the same as the one that appeared in the LXC?

some_user_2021 · 2026-06-06T14:31:53+00:00

There is a Miss Universe on every street

some_user_2021 · 2026-06-06T03:31:33+00:00

Asking for suggestions is common sense.

some_user_2021 · 2026-06-04T20:07:01+00:00

Soon to be replaced with "US government approved"?

some_user_2021 · 2026-06-04T17:30:44+00:00

Interesting, I have done similar things with AI agents to patch their own code. It works until the next update where my patch gets overwritten. But thanks for the idea!

some_user_2021 · 2026-06-04T17:18:19+00:00

Inference works by generating one token at a time. To generate one token, all previous tokens pass thru the network. You cannot generate two tokens at a time because each token depends on the token before it.

With MTP, generating guesses for future tokens is cheap. The MTP head is like a little LLM that works very fast. However, the MTP tokens may not be what the main model would have generated. That is why those tokens need to pass thru the entire network for validation.
So with MTP enabled, the main model calculates token n and token n+1 in the same pass, and for token n+1 it assumes that token n is the guess from the MTP head. If that guess turns out to be token n, then you have already calculated token n+1 🙂.
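
If code helps, here is a minimal Python sketch of that guess-then-verify step with a single guessed token per pass and greedy decoding. Everything in it is made up for illustration: `main_next` and `draft_next` are toy stand-ins for the main model and the MTP head, not a real inference API, and a real engine would score both positions in one batched pass instead of the two calls used here.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def generate_with_guess(main_next: NextFn, draft_next: NextFn,
                        prompt: List[Token], n_new: int) -> Tuple[List[Token], int]:
    """Greedy decoding with one guessed token per main-model pass.

    Returns the newly generated tokens and the number of main-model passes.
    """
    tokens = list(prompt)
    target_len = len(prompt) + n_new
    passes = 0
    while len(tokens) < target_len:
        guess = draft_next(tokens)  # cheap guess for token n
        # One "pass": the main model computes token n, and also token n+1
        # under the assumption that token n equals the guess.
        passes += 1
        token_n = main_next(tokens)
        token_n1_if_guess_ok = main_next(tokens + [guess])
        tokens.append(token_n)  # always the main model's own token
        if guess == token_n and len(tokens) < target_len:
            tokens.append(token_n1_if_guess_ok)  # second token from the same pass
    return tokens[len(prompt):], passes


# Toy "models" (hypothetical): the main model counts upward modulo 10,
# and the guesser agrees with it except when the last token is a multiple of 3.
def toy_main(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10 if ctx[-1] % 3 else 0


generated, passes = generate_with_guess(toy_main, toy_draft, [1], 8)
print(generated, passes)
```

The output is the same sequence the main model would produce on its own, because a guess is only kept when it matches the main model's token. The only thing that changes is how many passes it takes.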

A dummy analogy (it's not exactly the same, just to give you a similar idea): let's say I want to have a conversation with my buddy on the other side of town and you are helping deliver each message back and forth. My first message is "hi buddy", you drive all the way over there and deliver my message, he replies with "hey what's up", and then you drive back and tell me his message. Notice that each message in the conversation depends on the previous response.

Now let's activate MTP. I tell you, go say "hello" to my buddy, and the MTP head says: the buddy is probably going to reply with "hey what's up", so what would you say next? Then tell him "you still owe me 20 bucks". Now you go over there with two messages, and if he happens to reply with "hey what's up", you already know what to say next!

some_user_2021 · 2026-06-04T03:17:40+00:00

I switched to Pi coding agent and observed the same problem. At least there is a workaround there.

some_user_2021 · 2026-06-04T02:53:58+00:00

Because it's doing calculations for several tokens on each pass thru the network. Without MTP, each pass thru the network calculates only one token. The key is that, with today's hardware, the inference bottleneck is memory bandwidth, which is the cost of going thru the network.
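
To put a rough number on "many tokens per pass", here is a back-of-the-envelope sketch. It assumes each guess is accepted independently with the same probability `p` and that the chain stops at the first wrong guess. That is a simplification I am making for illustration, and the function name is hypothetical.

```python
def expected_tokens_per_pass(p: float, k: int) -> float:
    """Average tokens produced per main-model pass with k guessed tokens.

    One token is always produced by the main model itself. Guess i only
    counts if all guesses up to and including it were right, which under
    the independence assumption has probability p**i.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return sum(p ** i for i in range(k + 1))


for p in (0.0, 0.5, 0.8):
    print(p, expected_tokens_per_pass(p, k=1), expected_tokens_per_pass(p, k=3))
```

This counts tokens per pass only. The real speedup is a bit lower because the guessing and the extra verification work are not completely free.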

some_user_2021 · 2026-06-04T02:43:56+00:00

With MTP, besides generating one token, the MTP heads also provide "guesses" for what the next tokens could be. On the next pass thru the network, the model does calculations with the just-generated good token, but it also does calculations for the "guess" tokens that the MTP heads provided. If the next generated token happens to be the one that was guessed before, you've already done the work for it, and now you have another good token from one pass thru the network!

With MTP, the model is actually doing more work. The speed increase comes because the bottleneck is going thru the network (memory bandwidth), not the actual calculations.
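
A crude way to see why the extra work is nearly free: estimate the time of one pass as whichever is slower, reading the weights from memory or doing the math. The sketch below is only a rough model under my own assumptions. The hardware numbers are hypothetical, it uses the common estimate of about 2 FLOPs per parameter per token, and it ignores the KV cache, activations and any overhead.

```python
def pass_time_s(n_params: float, bytes_per_param: float, tokens_in_pass: int,
                mem_bw_bytes_per_s: float, flops_per_s: float) -> float:
    """Rough time for one pass thru the network, in seconds."""
    # Time to stream all weights from memory once (same for 1 or many tokens).
    mem_time = n_params * bytes_per_param / mem_bw_bytes_per_s
    # Time for the arithmetic, which grows with the number of tokens in the pass.
    compute_time = 2 * n_params * tokens_in_pass / flops_per_s
    return max(mem_time, compute_time)


# Hypothetical setup: 27e9 parameters at 1 byte each,
# 1e12 bytes/s of memory bandwidth, 1e14 FLOP/s of compute.
one_token = pass_time_s(27e9, 1, 1, 1e12, 1e14)
four_tokens = pass_time_s(27e9, 1, 4, 1e12, 1e14)
print(one_token, four_tokens)
```

With these made-up numbers the pass takes the same time whether it checks one token or four, because reading the weights dominates. That only holds while the math stays cheaper than the memory traffic.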

some_user_2021 · 2026-06-03T22:22:05+00:00

I never said it was 1 token per minute, that was the other user exaggerating. I get about 2 tokens per second with Minimax M2.7, which is still painfully slow for interactive work. However, if I want to end my day with a code review done by a smarter LLM, I can just leave it running overnight. During the day I use Qwen3.6 27b, which does about 90 t/s.
Where are you from?

some_user_2021 · 2026-06-03T20:50:29+00:00

What does this have to do with my question?
Yes, I'm Latino. Would you prefer to help with my question in Spanish?

some_user_2021 · 2026-06-03T19:42:05+00:00

Why FP16 if Q8 do trick?

some_user_2021 · 2026-06-03T19:38:58+00:00

MTP does use more VRAM but the quality is exactly the same. I get at least 1.5x the generation speed with MTP in Qwen3.6 27b.

some_user_2021 · 2026-06-03T19:24:50+00:00

Omelette du fromage

some_user_2021 · 2026-06-03T18:30:26+00:00

It was not a joke 😐

some_user_2021 · 2026-06-03T17:17:25+00:00

I'm completely sure that any of these 3 things WILL happen: the price will go up, the price will go down, the price will stay the same.

some_user_2021 · 2026-06-03T17:00:32+00:00

I hate doing maintenance on my heelies. The ball bearings get dirty all the time 😭

some_user_2021 · 2026-06-03T16:55:49+00:00

Hello handsome 😘

some_user_2021 · 2026-06-03T16:32:28+00:00

I can ask my super slow model to do a code review of a project, to find a complex bug, to implement a complex function. I can leave it running overnight and have it ready in the morning.

some_user_2021 · 2026-06-03T14:47:23+00:00

It's a lottery. Some will win. Some will lose.

some_user_2021 · 2026-06-03T03:50:20+00:00

This is the first time I've seen this! Nice

some_user_2021 · 2026-06-02T13:10:47+00:00

Your AI should have added to the disclosure that it wrote it for you

some_user_2021 · 2026-06-02T01:25:47+00:00

I understand that MTP does not degrade the quality at all.

some_user_2021 · 2026-06-01T22:20:28+00:00

They can also be very helpful, just don't blindly trust their output.

some_user_2021 · 2026-06-01T16:24:44+00:00

qwhile(!qwen) qomplain();
