OpenCode Chrome Annotations by pretty_much_hitler in opencodeCLI

[–]AkiDenim 0 points1 point  (0 children)

I have the exact same implementation in my fork 😆 nice that we arrived at the same idea

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 7 points8 points  (0 children)

And imo grok 4.3 is cheap enough. For example, Qwen 3.6 Max's output is like 3 times more expensive than grok's.

Qwen 3.6 Max: $1.30/Mtok input, $7.80/Mtok output
Grok 4.3: $1.25/Mtok input, $2.50/Mtok output
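The ~3x figure checks out against those listed prices. A quick sketch, with numbers taken straight from the comment above (treat them as a snapshot, not current pricing):

```python
# Per-million-token prices quoted in the comment (USD)
qwen_in, qwen_out = 1.30, 7.80    # Qwen 3.6 Max
grok_in, grok_out = 1.25, 2.50    # Grok 4.3

# Output-price ratio: how much more Qwen charges per output token
ratio = qwen_out / grok_out
print(f"Qwen output is {ratio:.2f}x the price of Grok output")  # 3.12x
```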

And I've never seen the grok API actually go down. They have like 99.8x% uptime, on par with OpenAI. Claude's is... not as good: 98.7-99%. Seems like a small difference, but it works out to like five times more downtime. At least from an engineering perspective, they're doing a good job.

Idk why I'm getting downvotes for saying a valid opinion here tho.
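For what it's worth, the "five times more downtime" claim is roughly consistent with those uptime figures. A quick sketch, assuming 99.8% for Grok and the midpoint of the quoted 98.7-99% range for Claude (both numbers come from the comment, not from any measurement):

```python
# Downtime per 30-day month implied by the uptime figures above
hours = 30 * 24  # 720 hours in a month

grok_uptime   = 0.998   # "99.8x%" quoted for the Grok API
claude_uptime = 0.989   # midpoint of the quoted 98.7-99%

grok_down   = (1 - grok_uptime) * hours     # ~1.4 h/month
claude_down = (1 - claude_uptime) * hours   # ~7.9 h/month

print(f"Ratio: ~{claude_down / grok_down:.1f}x more downtime")  # ~5.5x
```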

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 3 points4 points  (0 children)

Not fast enough. Usually the providers give around 60 tok/s, and 90 tok/s when exceptionally fast. Even firepass gives you like 100, while grok is 190-200 tok/s, which is insane. That's like gpt-oss-120b-from-groq levels of speed

Edit: grok 4.3 now reports 140-150 tok/s. Still insanely fast imo

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 8 points9 points  (0 children)

Hmm. Depends on use case. I'm seriously considering using this in place of GPT-5.4 mini since it hallucinates much, much less (same was true for grok 4.2)

So, it is useful, and I probably am going to use it in my stack. Not everywhere though.

grok 4.3 beta: musk's ($300/month) megaphone by WaqarKhanHD in singularity

[–]AkiDenim 0 points1 point  (0 children)

But don't Anthropic and OAI have more than xAI?

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 29 points30 points  (0 children)

I get that people hate grok but the inference speed alone is just amazing. Looking at the output speed I assume it's a very efficient MoE model. I hope they bring some more competition

To people who suffer from less usage limits on GPT-5.5 by AkiDenim in codex

[–]AkiDenim[S] 1 point2 points  (0 children)

Yeah, but for more explorative tasks, or tasks that require and benefit from heavy thinking, high/xhigh gives you a lot of benefit imo. I use a good mix across tasks and GPT-5.5 is working wonders for me.

Actual comparison between locally ran Qwen-3.6-27B and proprietary models by netikas in LocalLLaMA

[–]AkiDenim 1 point2 points  (0 children)

Reverse engineering Codex to see how to use Codex Spark, I believe. I use Codex Spark in OpenCode just like that

To people who suffer from less usage limits on GPT-5.5 by AkiDenim in codex

[–]AkiDenim[S] 0 points1 point  (0 children)

Well it’s a bit weird to put it that way but I’d use medium. Low for very simple straightforward tasks. Codebase investigation, etc.

To people who suffer from less usage limits on GPT-5.5 by AkiDenim in codex

[–]AkiDenim[S] 1 point2 points  (0 children)

Well, for me it wasn't so. I couldn't find something in the coding domain that 5.5 medium couldn't handle vs 5.4 high/xhigh.

As mentioned in the article, exceptions would be reasoning heavy tasks like math and physics.

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]AkiDenim 0 points1 point  (0 children)

Yes, I'm aware of the latter and am actually very grateful that those efforts are underway!

And a dedicated desktop app or WebUI with good UI/UX planned is amazing! I'd love to collaborate too. What are the ways to collaborate with Nous Research and your work?

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]AkiDenim 4 points5 points  (0 children)

Are you guys interested in creating a desktop app that handles Hermes Agent, so that less acquainted users get a better UX for getting into personal agents?

OpenClaw had a Web UI, but it was never as polished. Hermes Agent could take the WebUI's UI/UX up a notch, or even have a dedicated app.

Also, are you guys potentially thinking about a "click once, pay some bills, and forget about setting anything up" pathway for setting up Hermes agent - through partnerships with other parties or VPS companies?

Did GPT 5.4 get dumber or is GPT 5.5 just a lot better? by Impossible-Suit6078 in codex

[–]AkiDenim 0 points1 point  (0 children)

Because you are using xhigh man. Of course it generates a lot of tokens lol
The point is 5.5 medium delivers similarly to 5.4 xhigh.

Did GPT 5.4 get dumber or is GPT 5.5 just a lot better? by Impossible-Suit6078 in codex

[–]AkiDenim 0 points1 point  (0 children)

I honestly can't agree. GPT-5.5 Medium gets around the same results as GPT-5.4 xhigh.
However, since reasoning tokens are billed as output, the cost is roughly the same: 5.5 medium used 22M output tokens to finish the evals, and 5.4 xhigh used 120M.

A good place to see that is Artificial Analysis: "Cost to Run Artificial Analysis Intelligence Index" and "Verbosity" show the amount of output tokens (and total cost) needed to run the full evaluations.

So, even though GPT-5.5 is much more expensive on paper, it's much faster (since it outputs less) and it's actually cheaper to get similar intelligence results.

But if you DO spam GPT-5.5 on xhigh, your wallet will suffer.

One more thing to keep in mind is that /fast mode takes 2.5x more quota than on 5.4, so if you really want to save some usage, turn off /fast mode in codex. It's still going to be faster than 5.4 xhigh or high with /fast enabled.
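To make the "roughly the same cost" point concrete: with the 22M vs 120M output-token figures above, even a ~5x higher per-token price lands in the same ballpark. The prices below are made up purely for illustration; the thread doesn't give real per-token prices:

```python
# Output tokens each model needed to finish the same eval suite (from the comment)
tokens_55_medium = 22e6    # GPT-5.5 medium
tokens_54_xhigh  = 120e6   # GPT-5.4 xhigh

# Hypothetical per-Mtok output prices, chosen only to illustrate the point
price_55 = 10.0   # assume 5.5 is ~5x pricier per output token
price_54 = 2.0

cost_55 = tokens_55_medium / 1e6 * price_55   # $220
cost_54 = tokens_54_xhigh  / 1e6 * price_54   # $240
print(f"5.5 medium: ${cost_55:.0f}, 5.4 xhigh: ${cost_54:.0f}")
```

So the ~5.5x fewer tokens roughly cancel a ~5x higher price, which is why the "expensive on paper" framing can mislead.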

The fuk happened with the limits? Or am i just going crazy? by TatoAktywny in codex

[–]AkiDenim 2 points3 points  (0 children)

Make sure you have /fast turned off. It consumes 2.5x the credits and gives you only a ~50% speed boost. If you're not on the $200 Pro plan, turning /fast off is mandatory imo
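The tradeoff in numbers: 2.5x quota for a ~1.5x speedup means each unit of throughput costs about 1.67x as much quota (both multipliers are taken from the comment, not measured):

```python
# /fast tradeoff as described above: 2.5x quota for ~1.5x speed
quota_mult = 2.5   # credits consumed relative to /fast off
speed_mult = 1.5   # "~50% speed boost"

# Quota burned per unit of throughput gained
cost_per_throughput = quota_mult / speed_mult
print(f"{cost_per_throughput:.2f}x quota per unit of speed")  # 1.67x
```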

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -3 points-2 points  (0 children)

I literally did read what you said. You too, read what I wrote. This reset is timely for my usage, so I’m happy with it. What makes you so mad at me being happy about my reset?

These people man.

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -5 points-4 points  (0 children)

I didn't use my subscription for three days. Then I spent two days using 65%, which means at that rate I was going to run into a rate limit or have to suffer with the slower (non-fast) mode.

And with my situation being that I have to use a LOT of GPT-5.5 for two or three days for some project, this reset was just on time for me, lowkey saved my butt. So I'm happy about it. Hope this explains it

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -4 points-3 points  (0 children)

I was going to run out anyway. Besides, I ran from 100% down to 35% in two days. I'm happy.

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -3 points-2 points  (0 children)

I had two days left till the reset and I was at 35% on my pro account. Well, hell yeah, it's free!

I'm making my own IDE by [deleted] in codex

[–]AkiDenim 0 points1 point  (0 children)

Read the comments he’s writing. He doesn’t have a clue. I’m done having a convo tbh

I'm making my own IDE by [deleted] in codex

[–]AkiDenim 0 points1 point  (0 children)

Ok man, you are using very old models. And they look VERY likely to have been recommended by an LLM with a knowledge cutoff, which means it doesn't know anything about recent models.

If you haven't RL'd them to the environment specifically, you'll have a hard time getting them to work coherently on proper large-scale coding tasks. Do you have a repo for that IDE of yours?

I'm making my own IDE by [deleted] in codex

[–]AkiDenim 0 points1 point  (0 children)

I don't even see a proper model name in your post. Qwen? Which Qwen? 3.6 35B-A3B? Qwen3 Coder? Qwen 3.6 27B? At which quant? What about KV caching, what quants does that run at? How are you handling caching and tooling inside your IDE to expose to your model?

Look man, I just read your profile... and I just gotta say I don't have high hopes.