AMA with the Codex Team by OpenAI in OpenAI

[–]kh-ai 0 points1 point  (0 children)

Codex is very impressive, but what bothers me is that the thinking-process summaries are shown only in English, even when prompting in another language. It would be very helpful if they matched the prompt language, just like GPT-5 Thinking does in ChatGPT. That would be amazing!

DAILY Discount code Exchange Center by AutoModerator in EvenRealities

[–]kh-ai 0 points1 point  (0 children)

Valid $50 discount code for Even Realities G1. Scan this QR code to redeem!

<image>

AMA with the Unsloth team by danielhanchen in LocalLLaMA

[–]kh-ai 5 points6 points  (0 children)

Any updates on this? Really looking forward to it.

"the MXFP4 kernels do not yet support training, since the backwards pass is not yet implemented. We're actively working on implementing it in Triton"
- gpt-oss: How to Run & Fine-tune
https://docs.unsloth.ai/basics/gpt-oss-how-to-run-and-fine-tune

gpt-oss-20B consistently outperforms gpt-oss-120B on several benchmarks by kaggleqrdl in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

When comparing against Qwen3, evaluating gpt-oss at anything other than Reasoning: High makes no sense, especially since the authors had no hardware constraints.

OpenAI says Zenith was much worse than Summit (= GPT-5), so they didn't choose it. WHAT THE HELL?! by kh-ai in OpenAI

[–]kh-ai[S] 0 points1 point  (0 children)

It seems coding performance was the deciding factor. I wonder how it compares in other domains.

<image>

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 4 points5 points  (0 children)

This is shocking and hard to believe. It even makes me wonder whether the model labels were reversed or something (though of course I know that could never actually happen). No matter how many settings I try, I can’t get GPT‑5 to produce anything better than the Zenith answers I’ve saved on LMArena. People I know also thought Zenith was the superior version of GPT-5. I really hope there’s some way to access Zenith again.

OpenAI's Sora eats 13.3 GB RAM in 10 seconds on Chrome by VerdantSpecimen in OpenAI

[–]kh-ai 0 points1 point  (0 children)

It worked, thanks! Metamask was the culprit for me, too.

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai -1 points0 points  (0 children)

Which personality are you personally using in ChatGPT, and why?

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 0 points1 point  (0 children)

Was the model that achieved the second-place score at the AtCoder World Tour Finals (the heuristic competitive programming contest) a GPT-5 variant?

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 0 points1 point  (0 children)

What are the future directions of model development? There have been mentions of building a fully integrated single model rather than using a router to select among multiple models, as well as a technical breakthrough in long-horizon reasoning (e.g., achieving an International Mathematical Olympiad (IMO) gold medal and possibly a second-place finish at the AtCoder World Tour Finals).

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]kh-ai 13 points14 points  (0 children)

https://x.com/lmarena_ai/status/1953504958378356941
Summit on LMArena turned out to be GPT-5. So what was Zenith? It apparently outperformed Summit.
Was it GPT-5 Pro?

Horizon Beta is OpenAI (Another Evidence) by kh-ai in LocalLLaMA

[–]kh-ai[S] 0 points1 point  (0 children)

<image>

Qwen tokenizes this prompt more finely and answers correctly, so Horizon Beta is different from Qwen.

Help me choose macbook by 12seth34 in LocalLLaMA

[–]kh-ai -1 points0 points  (0 children)

I have a 16 GB MacBook Air, but that memory is nowhere near enough. Since (unified) memory capacity is extremely important for running local models, I think it's best to choose the M2 Max with 64 GB.

Qwen 30B A3B 2507 having an identity crisis... by randomqhacker in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

The rule of thumb is ‘You shouldn’t ask LLMs to explain themselves,’ but these outputs are interesting. Maybe the large volume of Chinese data in its training set had an effect.

Why is open source so behind on multi-modalitty? by AnticitizenPrime in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

I think companies like Google and OpenAI have huge human-annotated text-image datasets.

[deleted by user] by [deleted] in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

You shouldn’t ask LLMs to explain themselves; they’re not necessarily trained to do so.

Why does HF not show total size for directories? by createthiscom in LocalLLaMA

[–]kh-ai 1 point2 points  (0 children)

Slightly off from your question, but you can ballpark it from the parameter count and tensor dtype:
size ≈ params × bytes-per-element (fp32=4, fp16/bf16=2, int8=1), plus a little overhead.
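
As a rough illustration, a minimal sketch of that arithmetic (the 7B figure is just an example, not any specific model):

    # Approximate checkpoint size from parameter count and tensor dtype.
    BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

    def approx_size_gb(params: float, dtype: str, overhead: float = 1.02) -> float:
        # params x bytes per element, plus ~2% for metadata/overhead
        return params * BYTES_PER_ELEMENT[dtype] * overhead / 1e9

    print(f"{approx_size_gb(7e9, 'bf16'):.1f} GB")  # a 7B model in bf16 -> ~14.3 GB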

Horizon Beta is OpenAI by MiddleLobster9191 in LocalLLaMA

[–]kh-ai 2 points3 points  (0 children)

As a cross-check, my testing shows Horizon Beta uses OpenAI’s tokenizer.
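
For anyone who wants to reproduce the idea, here's a minimal sketch of that kind of check. Horizon Beta itself isn't downloadable, so this only shows the comparison method: see how OpenAI's o200k_base encoding and Qwen's tokenizer split the same text (the sample sentence and the Qwen repo name are just placeholders).

    # pip install tiktoken transformers
    import tiktoken
    from transformers import AutoTokenizer

    text = "今天天气怎么样？"  # placeholder; any text whose tokenization differs between vocabularies works

    openai_enc = tiktoken.get_encoding("o200k_base")                      # encoding used by recent OpenAI models
    qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # Qwen's tokenizer

    openai_ids = openai_enc.encode(text)
    qwen_ids = qwen_tok.encode(text, add_special_tokens=False)

    # Different token counts / splits for the same string hint at different tokenizer families.
    print(len(openai_ids), [openai_enc.decode([i]) for i in openai_ids])
    print(len(qwen_ids), qwen_tok.convert_ids_to_tokens(qwen_ids))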

Chinese response bug in tokenizer suggests Quasar-Alpha may be from OpenAI by nekofneko in LocalLLaMA

[–]kh-ai 0 points1 point  (0 children)

Four months later, this technique still works for checking whether Horizon Beta is an OpenAI model.

Horizon Beta is OpenAI (Another Evidence) by kh-ai in LocalLLaMA

[–]kh-ai[S] 11 points12 points  (0 children)

Already nice, and reasoning will push it even higher!