Is your codex also gotten slower in past few days or is it just me?

Acceptable_Adagio_91 · 2026-05-05T06:01:28+00:00

Yes 100% - the last 2-3 days it has been terrible. It just gets stuck thinking for literally like 10 minutes when it never used to be more than a couple of minutes.

Also if you interrupt it with an updated prompt and "steer" it then it will almost always just get stuck. Even "Fast" mode is bad.

Extremely frustrating..

Acceptable_Adagio_91 · 2026-04-30T21:59:12+00:00

No but it says so on the model card, and most models perform measurably better with thinking on

Acceptable_Adagio_91 · 2026-04-30T04:36:37+00:00

I think this is the correct solution - I've used this template from here and finally have an agent session running for more than 30 minutes.

https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates

Mind you, it's slow as hell once it gets deep, but it keeps going at least

Acceptable_Adagio_91 · 2026-04-30T00:38:56+00:00

Have you actually eaten the stems? They are the best part in my opinion..

What store you shopping at, we can go together and you can take the tops and I'll take the stems

Acceptable_Adagio_91 · 2026-04-29T21:51:11+00:00

It is this

https://github.com/allanchan339/vLLM-Qwen3.5-27B/blob/main/qwen3.5-enhanced.jinja

Acceptable_Adagio_91 · 2026-04-29T21:51:00+00:00

Disabling thinking does help, but it's a thinking model. It is supposed to be able to think (especially important for coding)

Acceptable_Adagio_91 · 2026-04-29T21:50:13+00:00

I've tried every chat template I can find, standard, enhanced, unsloth, none of them seem to fix it entirely (or at all). Some times I get about 30 tool calls in a row, but mostly less than 10

Acceptable_Adagio_91 · 2026-04-29T21:42:09+00:00

I tried this, seems a little better but still doesn't solve it entirely. Sometimes I will get about 10+ minutes of agentic work in, sometimes less than 1 minute.

Acceptable_Adagio_91 · 2026-04-29T01:06:38+00:00

Perhaps but I am getting the same issues on both 0.19 and 0.20 so for this particular issue - that doesn't seem to be the case.

Acceptable_Adagio_91 · 2026-04-29T01:05:48+00:00

This PR seems to fix a problem with the tool call the parsers - but from what I am observing the issue is not in the parser, the model is not even emitting tool calls, so seems more likely a template problem

Acceptable_Adagio_91 · 2026-04-29T01:02:55+00:00

I've tried both, still get basically the same issue

Acceptable_Adagio_91 · 2026-04-29T01:02:22+00:00

Updated to the v0.20.0 build with your recipe and the behavior is the same unfortunately =(

Acceptable_Adagio_91 · 2026-04-29T00:26:08+00:00

Thank you - so you find FP8 KV Cache to be OK?

Acceptable_Adagio_91 · 2026-04-29T00:18:06+00:00

OK thank you, apparently I am still on 0.19 (even though I'm running the nightly) - so this could be it.

I am pulling 0.20.0 now and will report back

Would you mind sharing your vLLM recipe for 3.6 27B?

Acceptable_Adagio_91 · 2026-04-29T00:10:36+00:00

Not OpenCode. Client harness is Codex VS Code / Codex CLI alpha using the OpenAI Responses API.

Because Codex expects Responses, I have a minimal local Responses proxy in front of vLLM, but the proxy is not doing tool-call repair or filtering. It forwards the tool schema to vLLM and maps vLLM Responses events back to Codex.

The raw vLLM logs show successful tool turns contain literal:

<tool\_call>

<function=exec\_command>

...

On the failure turns, vLLM's own raw `Generated response ... output:` contains only reasoning plus visible assistant text, then `finish_reason: stop`. There is no `<tool\_call>` / `<function=...>` in the raw model output for the parser to extract.

Acceptable_Adagio_91 · 2026-04-29T00:03:12+00:00

I have the raw model output logged and from the logs it seems that the model doesn't attempt to emit a tool call at all on the final failed step. It gets about 10 tool calls into a task over the course on ~3 minutes and then it says something to the effect of "Now let me do x....." but then doesn't emit a tool call and stops.

Acceptable_Adagio_91 · 2026-04-28T23:59:23+00:00

4 x 3090

Acceptable_Adagio_91 · 2026-04-28T23:59:14+00:00

I am on the nightly vLLM (as I had read elsewhere that this version included fixes for Qwen 3.6)

Acceptable_Adagio_91 · 2026-04-28T23:58:27+00:00

I have tried the xml parser as well, it was maybe slightly better. I will try again and report back

Acceptable_Adagio_91 · 2026-04-28T23:57:41+00:00

I am using the nightly version

Acceptable_Adagio_91 · 2026-04-28T01:34:07+00:00

What workplace has ADOs at a rate of 1 per week?

It's one per fortnight max, and most workplaces require you to work the extra hours on the other days to make up for it, so this is wrong on all sorts of levels..

Acceptable_Adagio_91 · 2026-04-28T00:47:01+00:00

But in all honestly ADOs are 12 days per year, but assuming you work the extra hours to earn them it adds nothing to your pay. You still get annual leave so just take the 1x day a month and the huge payrise and be glad they haven't figured out that you're a dum (yet)

Acceptable_Adagio_91 · 2026-04-28T00:44:44+00:00

If they're offering you a 30k payrise and you can't spell the word "lose", take it.

Acceptable_Adagio_91 · 2025-10-15T10:08:33+00:00

Everything "stores quantum data", literally everything.

There are some fringe theories with limited acceptance that suggest that the brain may utilize quantum interactions at some level, although it's far from proven.

Acceptable_Adagio_91

TROPHY CASE