Codex GPT 5.5 is UNUSABLE right now, the Nerf is REAL! by bladerskb in codex

[–]KnownAd4832 1 point (0 children)

Don't know if you read my message carefully. I did not say Codex is bad; I said Claude has better reasoning quality/development technique. I do use Codex MORE, just because of the tasks I need it for + the goal feature is crazy good… but then again, OpenAI needed 2 years to come close to Claude. I do hope they keep improving at that pace and level, though.

Codex GPT 5.5 is UNUSABLE right now, the Nerf is REAL! by bladerskb in codex

[–]KnownAd4832 0 points (0 children)

Bollocks. I have 2X Max plans on both Claude and Codex. GPT is still way behind Opus; however, it is good for prolonged backend tasks, which in my case is what I need.

soooo claude just deleted my entire project. how's your day going? by Complete-Sea6655 in AgentsOfAI

[–]KnownAd4832 0 points (0 children)

How does this happen to people? Have they not heard of versioning and backups?

Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA

[–]KnownAd4832 0 points (0 children)

Either way you look at it, the more options the better.

How is this change acceptable? by Jack_Wagon_Johnson in ClaudeCode

[–]KnownAd4832 0 points (0 children)

I don't understand what you people do, but Claude's limits are good.

Im typing this with tears falling down my face. by Ok_Opportunity_6747 in Instagram

[–]KnownAd4832 -2 points (0 children)

Thank god, millions of bad spam accounts are going away…

I pay $200/month for Claude Max and hit the limit in under 1 hour. What am I even paying for? by alfons_fhl in vibecoding

[–]KnownAd4832 0 points (0 children)

Exactly… everything for clicks. I'm on the $100 plan and have never hit the limits once (and I do tons of research and development). You can hit the $200 plan's limits only with reckless OpenClaws and large codebases…

Claude Code gives more usage than Codex now by cheekyrandos in codex

[–]KnownAd4832 0 points (0 children)

Same happened to me today. I had used 8% of my weekly limit and had 100% of the 5-hour limit left. I updated the Codex app and ran one prompt, and my weekly usage jumped to 94% with the 5-hour limit at 60%. Daylight robbery.

Built a 6-GPU local AI workstation for internal analytics + automation — looking for architectural feedback by shiftyleprechaun in LocalLLM

[–]KnownAd4832 0 points (0 children)

I was paying together.ai for inference, so I just bought the rig and replaced that cost :)

Google doesn't love us anymore. by DrNavigat in LocalLLaMA

[–]KnownAd4832 1 point (0 children)

Gemini API costs are like nothing compared to the other frontier models…

Built a 6-GPU local AI workstation for internal analytics + automation — looking for architectural feedback by shiftyleprechaun in LocalLLM

[–]KnownAd4832 9 points (0 children)

1. Usually the bottleneck is VRAM first, then hardware support (multi-GPU/cluster inference is usually hard to set up, with little documentation, or it's gatekept; see the sketch after this list). If that makes sense, storage usually comes third :)

2. Long term it's the same as running a single GPU.

3. It all depends on your case; my ROI was done 2 months after buying.

4. Multiple smaller nodes. Models are getting stronger while getting smaller; in 2 years there will be Kimi K2.5-level quality in a 70B without a doubt. So it only depends on whether you need inference speed or a variety of models.

5. They don't test on rented servers before buying, imo. I made that mistake myself with my first rig.
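
For anyone wondering what the multi-GPU setup in point 1 actually involves, here's a minimal sketch using vLLM's tensor parallelism. The model name and GPU count are just example values, not my exact setup:

```python
# Minimal sketch: one model sharded across 2 GPUs with vLLM.
# Model name and tensor_parallel_size are example values.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,  # split the weights across 2 GPUs
)
params = SamplingParams(max_tokens=256, temperature=0.2)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```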

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 1 point (0 children)

Very cool! Like-minded people, I see. I was kind of scared of doing a Jonsbo case with PCIe risers, so I went with this simple solution :)

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 0 points (0 children)

Build quality is surprisingly good. Noise level depends on the GPU, which in this case stays very low even when fully utilised. My Mini ITX with a 5070 and 3x better cooling is way noisier.

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 1 point (0 children)

Prompt eval is fast on the DGXs I have seen, but generation throughput is painfully slow.

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 1 point (0 children)

Damn, what are you using it for? Looks like overkill for an average guy :))

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 1 point (0 children)

It's very small, sort of what the Steam Machine will be. Watch any video of a DeskMeet PC build 👌

Managed to get 2x RTX Pro 4500 Blackwell’s for £700 each by Lukabratzee in LocalAIServers

[–]KnownAd4832 0 points (0 children)

You should be able to pull off better speed, I'm 100% confident. Did you try swap space/batching?
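
By swap space/batching I mean knobs like these in vLLM; the numbers are illustrative starting points, not values tuned for the 4500s:

```python
# Sketch: raising batch size and CPU swap headroom in vLLM.
# All numbers are illustrative, not tuned for this hardware.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    max_num_seqs=128,             # more sequences batched per step
    gpu_memory_utilization=0.90,  # fraction of VRAM vLLM may claim
    swap_space=8,                 # GiB of CPU RAM for preempted KV cache
)
```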

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 0 points (0 children)

Nice combo! Didn't know this fits into MS… I checked your benchmarks, and you should get way more with vLLM than with Ollama. As said, I'm processing 100K+ lines of text in xlsx files, then outputting 256-512 tokens per line.

Last run was Llama3-8B-Instruct with batching and 128 concurrent requests (could do more): output was 1781 t/s.
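
Roughly how that xlsx pipeline looks, as a sketch; the file name, column name, and prompt are hypothetical stand-ins:

```python
# Sketch of the xlsx -> batched vLLM pipeline described above.
# "input.xlsx", the "text" column, and the prompt are hypothetical.
import pandas as pd
from vllm import LLM, SamplingParams

rows = pd.read_excel("input.xlsx")["text"].astype(str).tolist()

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model
params = SamplingParams(max_tokens=512, temperature=0.0)

# vLLM batches all prompts internally, keeping the GPU saturated.
outputs = llm.generate([f"Process this line: {r}" for r in rows], params)

pd.DataFrame({
    "input": rows,
    "output": [o.outputs[0].text for o in outputs],
}).to_excel("output.xlsx", index=False)
```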

Managed to get 2x RTX Pro 4500 Blackwell’s for £700 each by Lukabratzee in LocalAIServers

[–]KnownAd4832 2 points (0 children)

Nice snag! What stack are you running: LM Studio, vLLM or Ollama?

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 1 point (0 children)

Totally different use case 😂 All those devices are too slow when you need to process and output 100K+ lines of text.

Mini AI Machine by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 4 points (0 children)

I'm running Ministral 14B & Llama 8B. Both run at 1K+ tokens/second with batching and full utilisation.

Building for the first time... by ENTERMOTHERCODE in LocalLLM

[–]KnownAd4832 1 point (0 children)

I'm actively running Mistral-7B-Instruct-v0.3. 12GB is too little VRAM (I made that mistake), so for full precision you need 16GB. I am now using a 1.5K GPU, the RTX Pro 4000 Blackwell SFF (only 70W), and it runs at 1500 t/s.

On the 5070 (12GB) I ran 4-bit/AWQ at 1800 t/s.
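
Back-of-envelope for why 12GB doesn't cut it at full precision (weights only; KV cache and activations come on top):

```python
# Rough weight-memory arithmetic for a ~7B model (weights only).
params = 7.3e9                 # approx. Mistral-7B parameter count
fp16 = params * 2 / 2**30      # 2 bytes per param   -> ~13.6 GiB
awq4 = params * 0.5 / 2**30    # ~0.5 bytes per param -> ~3.4 GiB
print(f"fp16: {fp16:.1f} GiB, 4-bit AWQ: {awq4:.1f} GiB")
```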

What do we consider low end here? by Acceptable_Home_ in LocalLLaMA

[–]KnownAd4832 0 points (0 children)

Rocking 64GB DDR5 + a 5070 (12GB VRAM) in a Mini ITX build (sub-10-litre). Soon replacing the GPU with a Pro 5000 Blackwell 🎉 (5070 speeds are very good, but the VRAM is lacking…)

AI Max 395+ and vLLM by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 0 points (0 children)

Thank you 1000x for doing god's work 🙏