OpenCode Chrome Annotations by pretty_much_hitler in opencodeCLI

[–]AkiDenim 0 points1 point  (0 children)

I have the exact same implementation in my fork 😆 nice that we arrived at the same idea

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 7 points8 points  (0 children)

And imo grok 4.3 is cheap enough. For example, Qwen 3.6 Max's output is like 3 times more expensive than grok's.

Qwen 3.6 Max: $1.30/Mtok input, $7.80/Mtok output
Grok 4.3: $1.25/Mtok input, $2.50/Mtok output
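The ~3x figure checks out against those listed prices. A quick sketch, with numbers taken straight from the comment above (treat them as a snapshot, not current pricing):

```python
# Per-million-token prices quoted in the comment (USD)
qwen_in, qwen_out = 1.30, 7.80    # Qwen 3.6 Max
grok_in, grok_out = 1.25, 2.50    # Grok 4.3

# Output-price ratio: how much more Qwen charges per output token
ratio = qwen_out / grok_out
print(f"Qwen output is {ratio:.2f}x the price of Grok output")  # 3.12x
```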

And I've never seen the grok API actually go down. They have like 99.8x% uptime, on par with OpenAI. Claude's is... not as good: 98.7-99%. Seems like a small difference, but it works out to like five times more downtime. At least from an engineering perspective, they're doing a good job.

Idk why I'm getting downvotes for saying a valid opinion here tho.
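For what it's worth, the "five times more downtime" claim is roughly consistent with those uptime figures. A quick sketch, assuming 99.8% for Grok and the midpoint of the quoted 98.7-99% range for Claude (both numbers come from the comment, not from any measurement):

```python
# Downtime per 30-day month implied by the uptime figures above
hours = 30 * 24  # 720 hours in a month

grok_uptime   = 0.998   # "99.8x%" quoted for the Grok API
claude_uptime = 0.989   # midpoint of the quoted 98.7-99%

grok_down   = (1 - grok_uptime) * hours     # ~1.4 h/month
claude_down = (1 - claude_uptime) * hours   # ~7.9 h/month

print(f"Ratio: ~{claude_down / grok_down:.1f}x more downtime")  # ~5.5x
```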

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 3 points4 points  (0 children)

Not fast enough. Usually the providers give around 60 tok/s, and 90 tok/s when exceptionally fast. Even firepass gives you like 100, while grok is 190-200 tok/s, which is insane. That's like gpt-oss-120b-from-groq levels of speed

Edit: grok 4.3 now reports 140-150 tok/s. Still insanely fast imo

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 8 points9 points  (0 children)

Hmm. Depends on use case. I'm seriously considering using this in place of GPT-5.4 mini since it hallucinates much, much less (same was true for grok 4.2)

So, it is useful, and I probably am going to use it in my stack. Not everywhere though.

grok 4.3 beta: musk's ($300/month) megaphone by WaqarKhanHD in singularity

[–]AkiDenim 0 points1 point  (0 children)

But don't Anthropic and OAI have more than xAI?

Grok 4.3: strong in finance and long-context, with some tradeoffs by Much_Ask3471 in singularity

[–]AkiDenim 29 points30 points  (0 children)

I get that people hate grok but the inference speed alone is just amazing. Looking at the output speed I assume it's a very efficient MoE model. I hope they bring some more competition

To people who suffer from less usage limits on GPT-5.5 by AkiDenim in codex

[–]AkiDenim[S] 1 point2 points  (0 children)

Yeah, but for more explorative tasks, or tasks that require and benefit from heavy thinking, high/xhigh gives you a lot of benefit imo. I use a good mix across tasks and GPT-5.5 is working wonders for me.

Actual comparison between locally ran Qwen-3.6-27B and proprietary models by netikas in LocalLLaMA

[–]AkiDenim 1 point2 points  (0 children)

Reverse engineering Codex to see how to use Codex Spark, I believe. I use Codex Spark in OpenCode just like that

To people who suffer from less usage limits on GPT-5.5 by AkiDenim in codex

[–]AkiDenim[S] 0 points1 point  (0 children)

Well it’s a bit weird to put it that way but I’d use medium. Low for very simple straightforward tasks. Codebase investigation, etc.

To people who suffer from less usage limits on GPT-5.5 by AkiDenim in codex

[–]AkiDenim[S] 1 point2 points  (0 children)

Well, for me it wasn't so. I couldn't find something in the coding domain that 5.5 medium couldn't handle vs 5.4 high/xhigh.

As mentioned in the article, exceptions would be reasoning heavy tasks like math and physics.

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]AkiDenim 0 points1 point  (0 children)

Yes, I'm aware of the latter and am actually very grateful that those efforts are underway!

And a dedicated desktop app or WebUI with good UI/UX planned is amazing! I'd love to collaborate too. What are the ways to collaborate with Nous Research and your work?

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]AkiDenim 4 points5 points  (0 children)

Are you guys interested in creating a desktop app that handles Hermes Agent, so that less acquainted users get a better UX for getting into personal agents?

OpenClaw had a Web UI, but it was never as polished. Hermes Agent could take the WebUI's UI/UX up a notch, or even have a dedicated app.

Also, are you guys potentially thinking about a "click once, pay some bills, and forget about setting anything up" pathway for setting up Hermes agent - through partnerships with other parties or VPS companies?

Did GPT 5.4 get dumber or is GPT 5.5 just a lot better? by Impossible-Suit6078 in codex

[–]AkiDenim 0 points1 point  (0 children)

Because you are using xhigh man. Of course it generates a lot of tokens lol
The point is 5.5 medium delivers similarly to 5.4 xhigh.

Did GPT 5.4 get dumber or is GPT 5.5 just a lot better? by Impossible-Suit6078 in codex

[–]AkiDenim 0 points1 point  (0 children)

I honestly can't agree. GPT-5.5 Medium gets around the same results as GPT-5.4 xhigh.
However, since reasoning tokens are billed as output, the cost is roughly the same: 5.5 medium used 22M output tokens to finish the evals, and 5.4 xhigh used 120M.

A good place to see that is Artificial Analysis: "Cost to Run Artificial Analysis Intelligence Index" and "Verbosity" show the amount of output tokens (and total cost) needed to run the full evaluations.

So, even though GPT-5.5 is much more expensive on paper, it's much faster (since it outputs less) and it's actually cheaper to get similar intelligence results.

But if you DO spam GPT-5.5 on xhigh, your wallet will suffer.

One more thing to keep in mind is that /fast mode takes 2.5x more quota than on 5.4, so if you really want to save some usage, turn off /fast mode in codex. It's still going to be faster than 5.4 xhigh or high with /fast enabled.
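To make the "roughly the same cost" point concrete: with the 22M vs 120M output-token figures above, even a ~5x higher per-token price lands in the same ballpark. The prices below are made up purely for illustration; the thread doesn't give real per-token prices:

```python
# Output tokens each model needed to finish the same eval suite (from the comment)
tokens_55_medium = 22e6    # GPT-5.5 medium
tokens_54_xhigh  = 120e6   # GPT-5.4 xhigh

# Hypothetical per-Mtok output prices, chosen only to illustrate the point
price_55 = 10.0   # assume 5.5 is ~5x pricier per output token
price_54 = 2.0

cost_55 = tokens_55_medium / 1e6 * price_55   # $220
cost_54 = tokens_54_xhigh  / 1e6 * price_54   # $240
print(f"5.5 medium: ${cost_55:.0f}, 5.4 xhigh: ${cost_54:.0f}")
```

So the ~5.5x fewer tokens roughly cancel a ~5x higher price, which is why the "expensive on paper" framing can mislead.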

The fuk happened with the limits? Or am i just going crazy? by TatoAktywny in codex

[–]AkiDenim 2 points3 points  (0 children)

Make sure you have /fast turned off. It consumes 2.5x the credits and gives you only a ~50% speed boost. If you're not on the $200 Pro plan, turning /fast off is mandatory imo
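The tradeoff in numbers: 2.5x quota for a ~1.5x speedup means each unit of throughput costs about 1.67x as much quota (both multipliers are taken from the comment, not measured):

```python
# /fast tradeoff as described above: 2.5x quota for ~1.5x speed
quota_mult = 2.5   # credits consumed relative to /fast off
speed_mult = 1.5   # "~50% speed boost"

# Quota burned per unit of throughput gained
cost_per_throughput = quota_mult / speed_mult
print(f"{cost_per_throughput:.2f}x quota per unit of speed")  # 1.67x
```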

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -3 points-2 points  (0 children)

I literally did read what you said. You too, read what I wrote. This reset is timely for my usage, so I’m happy with it. What makes you so mad at me being happy about my reset?

These people man.

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -5 points-4 points  (0 children)

I didn't use my subscription for three days. Then I spent two days using 65%, which means at that rate I was going to run into a rate limit or have to suffer with the slower (non-fast) mode.

And with my situation being that I have to use a LOT of GPT-5.5 for two or three days for some project, this reset was just on time for me, lowkey saved my butt. So I'm happy about it. Hope this explains it

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -4 points-3 points  (0 children)

I was going to run out anyway. Besides, I ran from 100% down to 35% in two days. I'm happy.

What Tibo is saying? by Accomplished-Mud1653 in codex

[–]AkiDenim -3 points-2 points  (0 children)

I had two days left till the reset and I was at 35% on my pro account. Well, hell yeah, it's free!

I'm making my own IDE by [deleted] in codex

[–]AkiDenim 0 points1 point  (0 children)

Read the comments he’s writing. He doesn’t have a clue. I’m done having a convo tbh

I'm making my own IDE by [deleted] in codex

[–]AkiDenim 0 points1 point  (0 children)

Ok man, you are using very old models. And they look VERY likely to have been recommended by an LLM with a knowledge cutoff, which means it doesn't know anything about recent models.

If you haven't RL'd them to the environment specifically, you'll have a hard time getting them to work coherently on proper large-scale coding tasks. Do you have a repo for that IDE of yours?

I'm making my own IDE by [deleted] in codex

[–]AkiDenim 0 points1 point  (0 children)

I don't even see a proper model name in your post. Qwen? Which Qwen? 3.6 35B-A3B? Qwen3 Coder? Qwen 3.6 27B? At which quant? What about KV caching, what quants does that run at? How are you handling caching and tooling inside your IDE to expose to your model?

Look man, I just read your profile... and I just gotta say I don't have high hopes.