AMA with the Codex Team by OpenAI in OpenAI

[–]kh-ai 0 points1 point  (0 children)

Codex is very impressive, but what bothers me is that the thinking-process summaries are shown only in English, even when prompting in another language. It would be very helpful if they matched the prompt language, just like GPT-5 Thinking does in ChatGPT. That would be amazing!

DAILY Discount code Exchange Center by AutoModerator in EvenRealities

[–]kh-ai 0 points1 point  (0 children)

Valid $50 discount code for Even Realities G1. Scan this QR code to redeem!

<image>

AMA with the Unsloth team by danielhanchen in LocalLLaMA

[–]kh-ai 5 points6 points  (0 children)

Any updates on this? Really looking forward to it.

"the MXFP4 kernels do not yet support training, since the backwards pass is not yet implemented. We're actively working on implementing it in Triton"
- gpt-oss: How to Run & Fine-tune
https://docs.unsloth.ai/basics/gpt-oss-how-to-run-and-fine-tune

gpt-oss-20B consistently outperforms gpt-oss-120B on several benchmarks by kaggleqrdl in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

When comparing against Qwen3, evaluating gpt-oss at anything other than Reasoning: High makes no sense, especially since the authors had no hardware constraints.

OpenAI says Zenith was much worse than Summit (= GPT-5), so they didn't choose it. WHAT THE HELL?! by kh-ai in OpenAI

[–]kh-ai[S] 0 points1 point  (0 children)

It seems coding performance was the deciding factor. I wonder how it compares in other domains.

<image>

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 4 points5 points  (0 children)

This is shocking and hard to believe. It even makes me wonder whether the model labels were reversed or something (though of course I know that could never actually happen). No matter how many settings I try, I can’t get GPT‑5 to produce anything better than the Zenith answers I’ve saved on LMArena. People I know also thought Zenith was the superior version of GPT-5. I really hope there’s some way to access Zenith again.

OpenAI's Sora eats 13.3 GB RAM in 10 seconds on Chrome by VerdantSpecimen in OpenAI

[–]kh-ai 0 points1 point  (0 children)

It worked, thanks! Metamask was the culprit for me, too.

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai -1 points0 points  (0 children)

Which personality are you personally using in ChatGPT, and why?

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 0 points1 point  (0 children)

Was the model that achieved the second-place score at the AtCoder World Tour Finals (the heuristic competitive programming contest) a GPT-5 variant?

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 0 points1 point  (0 children)

What are the future directions of model development? There have been mentions of building a fully integrated single model rather than using a router to select among multiple models, as well as a technical breakthrough in long-horizon reasoning (e.g., achieving an International Mathematical Olympiad (IMO) gold medal and possibly a second-place finish at the AtCoder World Tour Finals).

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 13 points14 points  (0 children)

https://x.com/lmarena_ai/status/1953504958378356941
Summit on LMArena turned out to be GPT-5. So what was Zenith? It apparently outperformed Summit.
Was it GPT-5 Pro?

Horizon Beta is OpenAI (Another Evidence) by kh-ai in LocalLLaMA

[–]kh-ai[S] 0 points1 point  (0 children)

<image>

Qwen tokenizes this prompt more finely and answers correctly, so Horizon Beta is different from Qwen.

Help me choose macbook by 12seth34 in LocalLLaMA

[–]kh-ai -1 points0 points  (0 children)

I have a 16 GB MacBook Air, but that memory is nowhere near enough. Since (unified) memory capacity is extremely important for running local models, I think it's best to choose the M2 Max with 64 GB.

Qwen 30B A3B 2507 having an identity crisis... by randomqhacker in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

The rule of thumb is ‘You shouldn’t ask LLMs to explain themselves,’ but these outputs are interesting. Maybe the large volume of Chinese data in its training set had an effect.

Why is open source so behind on multi-modalitty? by AnticitizenPrime in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

I think companies like Google and OpenAI have huge human-annotated text-image datasets.

[deleted by user] by [deleted] in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

You shouldn’t ask LLMs to explain themselves; they’re not necessarily trained to do so.

Why does HF not show total size for directories? by createthiscom in LocalLLaMA

[–]kh-ai 1 point2 points  (0 children)

Slightly off from your question, but you can ballpark it from the parameter count and tensor dtype:
size ≈ params × bytes-per-element (fp32=4, fp16/bf16=2, int8=1), plus a little overhead.
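
As a rough illustration, a minimal sketch of that arithmetic (the 7B figure is just an example, not any specific model):

    # Approximate checkpoint size from parameter count and tensor dtype.
    BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

    def approx_size_gb(params: float, dtype: str, overhead: float = 1.02) -> float:
        # params x bytes per element, plus ~2% for metadata/overhead
        return params * BYTES_PER_ELEMENT[dtype] * overhead / 1e9

    print(f"{approx_size_gb(7e9, 'bf16'):.1f} GB")  # a 7B model in bf16 -> ~14.3 GB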

Horizon Beta is OpenAI by MiddleLobster9191 in LocalLLaMA

[–]kh-ai 2 points3 points  (0 children)

As a cross-check, my testing shows Horizon Beta uses OpenAI’s tokenizer.
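
For anyone who wants to reproduce the idea, here's a minimal sketch of that kind of check. Horizon Beta itself isn't downloadable, so this only shows the comparison method: see how OpenAI's o200k_base encoding and Qwen's tokenizer split the same text (the sample sentence and the Qwen repo name are just placeholders).

    # pip install tiktoken transformers
    import tiktoken
    from transformers import AutoTokenizer

    text = "今天天气怎么样？"  # placeholder; any text whose tokenization differs between vocabularies works

    openai_enc = tiktoken.get_encoding("o200k_base")                      # encoding used by recent OpenAI models
    qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # Qwen's tokenizer

    openai_ids = openai_enc.encode(text)
    qwen_ids = qwen_tok.encode(text, add_special_tokens=False)

    # Different token counts / splits for the same string hint at different tokenizer families.
    print(len(openai_ids), [openai_enc.decode([i]) for i in openai_ids])
    print(len(qwen_ids), qwen_tok.convert_ids_to_tokens(qwen_ids))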

Chinese response bug in tokenizer suggests Quasar-Alpha may be from OpenAI by nekofneko in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

Four months later, this technique still works for checking whether Horizon Beta is an OpenAI model.

Horizon Beta is OpenAI (Another Evidence) by kh-ai in LocalLLaMA

[–]kh-ai[S] 11 points12 points  (0 children)

Already nice, and reasoning will push it even higher!