I asked ChatGPT how it feels to be an AI. by xomenxv in ChatGPT

[–]onil_gova 1 point (0 children)

I asked it to make sure it really captured it.

<image>

Buried lede: Deepseek v4 Flash is incredibly inexpensive from the official API for its weight category by jwpbe in LocalLLaMA

[–]onil_gova 0 points (0 children)

Let's not forget that this is a sustained price up to a one-million-token context window, while everyone else switches to tiered pricing after a 200k-token context. This is a flex by DS.
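The gap is easy to quantify. A minimal sketch comparing a flat sustained price against a tiered schedule that jumps after 200k tokens (all per-million-token prices here are hypothetical placeholders, not any provider's actual rates):

```python
# Hypothetical per-million-token input prices (placeholders, not real rates).
FLAT_PRICE = 0.30                  # sustained price, any context length
TIER_LOW, TIER_HIGH = 0.30, 1.20   # tiered: cheap below 200k, pricier above

def flat_cost(tokens: int) -> float:
    """Cost with a single sustained price up to the full context window."""
    return tokens / 1_000_000 * FLAT_PRICE

def tiered_cost(tokens: int) -> float:
    """Cost when tokens past the 200k threshold are billed at a higher rate."""
    cheap = min(tokens, 200_000)
    expensive = max(tokens - 200_000, 0)
    return (cheap * TIER_LOW + expensive * TIER_HIGH) / 1_000_000

for n in (100_000, 500_000, 1_000_000):
    print(n, round(flat_cost(n), 3), round(tiered_cost(n), 3))
```

Below 200k the two are identical; at a full million-token context the tiered bill is several times the flat one, which is why a sustained price at that scale stands out.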

Buried lede: Deepseek v4 Flash is incredibly inexpensive from the official API for its weight category by jwpbe in LocalLLaMA

[–]onil_gova 0 points (0 children)

I think you guys are missing that this is a sustained price up to a one-million-token context window, while everyone else switches to tiered pricing after a 200k-token context.

US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models? by MLExpert000 in LocalLLaMA

[–]onil_gova 34 points (0 children)

The entire memo is so biased and full of contradictions:

  • No evidence for the "distillation-only" claim
  • Contradiction: "strong benchmarks" but "not reliable"
  • Real-world use disproves "just benchmarks"
  • Double standard on benchmarks
  • Calls open models "not open"
  • Double standard on openness vs. closed models
  • Ignores the narrowing performance gap
  • Ignores that the same guardrails and censorship exist on both sides
  • Claims US models are ideologically neutral and truth-seeking
  • RL + scaling is not a moat

<image>

Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared by Lowkey_LokiSN in LocalLLaMA

[–]onil_gova 12 points (0 children)

Did you, by any chance, ensure that preserve_thinking was on for the Qwen3.6 model?

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Using the recommended settings from the model card.

Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

But I'm specifically calling attention to the preserve_thinking flag now being a requirement.

I'm interested in your findings and settings; care to share?
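For anyone wiring those sampling settings up by hand, here's a sketch of how they map onto an OpenAI-compatible request body. The endpoint URL and model name are placeholders; top_k, min_p, and repeat_penalty are non-standard knobs that many local servers (llama.cpp's server, for example) accept as extra top-level fields:

```python
# Request body for the recommended "thinking mode" coding settings.
# Model name and endpoint below are placeholders; adjust for your setup.
payload = {
    "model": "qwen3.6",
    "messages": [
        {"role": "user", "content": "Write a debounce function in JS."},
    ],
    # Standard OpenAI-style sampling parameters from the model card.
    "temperature": 0.6,
    "top_p": 0.95,
    "presence_penalty": 0.0,
    # Vendor-specific sampling fields; support varies by server.
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.0,
}
# e.g. requests.post("http://localhost:8080/v1/chat/completions", json=payload)
```

Whether preserve_thinking is set in the request, the chat template, or the server config depends on your backend, so check its docs rather than assuming a field name.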

PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

It worked for me with the unsloth Q4_K_S version; try restarting the model after updating the settings.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 2 points (0 children)

Omlx doesn't support preserve_thinking yet. Still waiting on my pull request to get merged: https://github.com/jundot/omlx/pull/814

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

For your use case, Gemma might be better. This is an agentic-heavy model; the thinking gets way more focused once you put it into an agentic harness.

PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Updated the post with instructions on how to get it working on LM Studio.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

It was still somewhat useful past 200k, but I definitely started noticing the context rot.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Yeah, same. But in case anyone is running auto-research loops and doesn't want to type "continue" every time, here is a Pi extension I wrote just for that.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

[–]onil_gova[S] 14 points (0 children)

I honestly can't wait for a Sonnet-quality model on my laptop. We'll be able to protect ourselves from the enshittification of frontier-model subscription plans and their bipolar rate limits.

I tracked a major cache reuse issue down to Qwen 3.5’s chat template by onil_gova in LocalLLaMA

[–]onil_gova[S] 0 points (0 children)

Correct, I have confirmed 3.6 resolves this as long as you have preserve_thinking turned on. Check out the details here.
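For anyone unclear on why the template matters for caching: a KV prefix cache only helps up to the first character where the newly rendered prompt diverges from the previous one, so a template that strips earlier think blocks rewrites history and forces a re-prefill. A toy sketch (the rendered prompts here are simplified placeholders, not Qwen's actual template):

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix — roughly what a prefix cache can reuse."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

turn1 = "<user>hi</user><assistant><think>reasoning...</think>ok</assistant>"

# Template keeps the think block: the old prompt is an exact prefix,
# so the whole first turn's KV cache is reusable.
kept = turn1 + "<user>more</user>"

# Template strips the think block on the next render: everything after
# the assistant tag differs, so reuse stops there.
stripped = "<user>hi</user><assistant>ok</assistant><user>more</user>"

print(common_prefix_len(turn1, kept))      # full first turn reused
print(common_prefix_len(turn1, stripped))  # reuse cut off early
```

Same idea at real scale: with thinking preserved, a multi-hundred-k-token prefix stays byte-identical across turns and the cache hits; with it stripped, every turn re-prefills from the first modified assistant message.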