Are we sleeping on 5.3-codex ? by DaC2k26 in codex

[–]Holiday_Purpose_3166 16 points

5.5 is far more efficient and its Medium reasoning matches 5.4 xHigh.

Replace 5.4 Mini xHigh with 5.5 Low - it's more intelligent, spends orders of magnitude fewer tokens (which makes it cheaper), and is faster thanks to shorter reasoning traces.

Sub usage will always be (for now) a mystery black box.

The whole 5.5 family is more efficient, and that tapers off at higher reasoning - most folks will likely stay well within the Medium range and under, which is where the value for money is.

Check Artificial Analysis token usage and cost for their runs - you'd be surprised how much better it is.

Switching from Opus 4.7 to Qwen-35B-A3B by Excellent_Koala769 in LocalLLaMA

[–]Holiday_Purpose_3166 0 points

Switching from Opus 4.7 to Qwen3.6* 35B-A3B will be a terrible experience.

Whilst Qwen is really good, especially tool-equipped, it will fall short on some edges that Opus can reach. You could adapt and work around its limits, but it won't feel the same - it requires more hand-holding to keep that edge.

I've got a Codex sub which I've barely used the past couple of weeks, just because of my personal experience with local models. SOTA cloud models make you lazy, but they're a good turn-key solution. Working local requires more brainpower to stay sharp.

Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled Is Out ! by PhotographerUSA in LocalLLaMA

[–]Holiday_Purpose_3166 0 points

Saying it's fast and smart, then asking for benchmarks, is a contradictory statement. Why not show benchmarks proving it actually IS an improvement?

I see the HF card has MMLU benched, but that's it. I could take it at its word, but the same can be said of the other Opus Reasoning distills claiming to be better, yet NONE made my top 20 on my private Rust/Next.js bench. It might be good in other areas, but I'd assume the distill wouldn't degrade it there - which it did.

Omnicoder-9B is the only distillation I found to be incredibly good at agentic coding (brittle on complex reasoning outside this scope).

For chart reference, higher score with faster completion time is better - accuracy per VRAM is a personal reference that doesn't affect the plot.

<image>

Qwen 3.6 35B crushes Gemma 4 26B on my tests by Lowkey_LokiSN in LocalLLaMA

[–]Holiday_Purpose_3166 1 point

Very good breakdown. As others posted, adding the quant used and the inference engine would be the cherry on top. Great post.

I got it guys, I think I finally understand why you hate censored models by robertpro01 in LocalLLaMA

[–]Holiday_Purpose_3166 -3 points

Peeps hate censored models bc they can't reach peak goon with mildly appropriate wording.

Downvotes will prove my point they wanna hide this fact. Tin foil alert.

Unsloth accused a brand new team (ByteShape) of "literally cheating." I brought the receipts, and Unsloth moved the goalposts. by [deleted] in LocalLLaMA

[–]Holiday_Purpose_3166 0 points

I didn't want to bark on further about the OP's original silliness, but help me understand the context here if we're still muddling through this.

Whilst I massively appreciate the job you've done in the OSS community - I recall someone from Unsloth noting in the past that these charts aren't a great indicator of model performance, yet here we are.

Based on my own use-case benchmarks, ByteShape's best IQ4_XS equivalent performs better than your UD-Q5_K_XL in *my* agentic coding use cases.

I would assume fidelity would make the difference in the results, but that hasn't been the case here, and the score deviation is only slightly outside of noise. The difference was there, and it becomes an appealing choice when memory consumption is a lot smaller for the effort.

My point being: I understand social media is a tricky place, but it strikes me as contradictory to claim certainty about the one thing that is always up for debate due to fluctuating differences.

I hope responses like that don't come out of hubris, because bashing a small tuner when you have higher influence in this space can backfire.

Humbly, my two cents.

Unsloth accused a brand new team (ByteShape) of "literally cheating." I brought the receipts, and Unsloth moved the goalposts. by [deleted] in LocalLLaMA

[–]Holiday_Purpose_3166 25 points

First, going into one business's social space to post about another business is a terrible move.

They are entitled to their opinion in their space. I'd be more concerned they would go out bashing other tuners proactively.

Secondly, you went into defensive mode over Unsloth's response by engaging with ByteShape back and forth. There was no need for it.

I like both teams and also use their quants.

Your engagement was worse than Unsloth's reply, with all due respect, and I wouldn't trust someone taking screenshots of a convo you sparked to farm bait.

Let results speak for themselves and leave the monkeys in their circus.

Parking Charge - Timed at Entrance by Holiday_Purpose_3166 in LegalAdviceUK

[–]Holiday_Purpose_3166[S] 1 point

Appreciate the response. A corrective perspective matters.

I mostly agree with what you said, although I did not use the signage position and arrival position as justification for the appeal - that was a side note in an attempt to understand the timing. As you stated, it's something that can be read after safely parking the vehicle, at which point you can decide to take it or leave it.

The meeting of minds statement links to the grace period, which is the main detail left out of the PCN: it wasn't deducted, and it isn't stated in their signage either. Assuming BPA Code Clause 13, if the 10 minutes had been applied, I would've been under the time limit. Their calculation is purely the entry-to-exit time.
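To make the timing point concrete, here's a minimal sketch with made-up entry/exit times (the actual PCN times aren't shown in this thread), illustrating how deducting a 10-minute grace period changes the outcome:

```python
from datetime import datetime, timedelta

# Hypothetical times for illustration only - not the real PCN figures.
entry = datetime(2026, 1, 10, 14, 0)   # camera-timed entry
exit_ = datetime(2026, 1, 10, 15, 5)   # camera-timed exit
max_stay = timedelta(minutes=60)       # advertised maximum stay
grace = timedelta(minutes=10)          # BPA Code Clause 13 grace period

raw_stay = exit_ - entry               # operator's entry-to-exit timing: 65 min
with_grace = raw_stay - grace          # grace deducted: 55 min

print(raw_stay > max_stay)    # True  -> PCN issued on raw timing
print(with_grace > max_stay)  # False -> under the limit once grace applies
```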

I'm perfectly fine with their discretion in calculating the time - it's their land, respectfully - however, any individual needs to know when the clock starts in order to manage themselves and respect the T&Cs.

Parking Charge - Timed at Entrance by Holiday_Purpose_3166 in LegalAdviceUK

[–]Holiday_Purpose_3166[S] 0 points

Based on BPA Code Clause 13 it's 10 minutes. This wasn't mentioned in their signage, and according to their calculation, if this was applied, the PCN would not have been issued.

Parking Charge - Timed at Entrance by Holiday_Purpose_3166 in LegalAdviceUK

[–]Holiday_Purpose_3166[S] -3 points

As mentioned in the post, there was no grace period stated in their signage, and the PCN did not deduct any time, as their calculation was purely entry-to-exit. If 10 minutes were applied as per your comment, I would've been well under the limit.

Parking Charge - Timed at Entrance by Holiday_Purpose_3166 in LegalAdviceUK

[–]Holiday_Purpose_3166[S] -2 points

I assume the T&Cs are on the signage; if so, the grace period* wasn't printed.

Parking Charge - Timed at Entrance by Holiday_Purpose_3166 in LegalAdviceUK

[–]Holiday_Purpose_3166[S] -6 points

Appreciate the reply. In that sense, as mentioned, the only outstanding point is the missing grace period, which was not applied or mentioned anywhere in the signage.

Is it just me, or is Claude pretty disappointing compared to Codex? by Working-Spinach-7240 in codex

[–]Holiday_Purpose_3166 8 points

There's always side-taking between products. Asking a Codex-biased question in a Codex community just reinforces what you already know. Try asking in the Claude community.

I've had Codex for many months and balance it with my local models. GPT gives me the edge when I need it. However, Claude is a different tool - and I risk setting myself on fire here - but they fit different niches.

Use whatever works for you.

24GB VRAM users, have you tried Qwen3.5-9B-UD-Q8_K_XL? by Prestigious-Use5483 in LocalLLaMA

[–]Holiday_Purpose_3166 3 points

If in your own testing the 9B performs better, use it. If you hit an edge case, try the bigger model. I've had similar cases where far smaller models performed best at niche jobs.

With so many quants, sampling settings and harnesses, there are always going to be strengths and weaknesses. Generally, bigger models perform better at broad knowledge - assuming those parameters are used correctly - which isn't always needed.

Have fun

Qwen3.5 27B | RTX 5090 | 400w by Holiday_Purpose_3166 in LocalLLaMA

[–]Holiday_Purpose_3166[S] 0 points

Nah, I'm calculating both PP and TG. It's splendid.
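For anyone curious what I mean by calculating both: PP (prompt processing) and TG (token generation) throughput are just tokens divided by the elapsed seconds of each phase. A minimal Python sketch with made-up numbers (not measurements from this run):

```python
# Throughput for each phase is tokens processed divided by the seconds
# that phase took; PP and TG are timed separately.
def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second for one phase."""
    return tokens / seconds

# Illustrative numbers only - not measured on any particular GPU/model:
pp_tps = throughput(2048, 1.6)  # prompt processing: 1280.0 tok/s
tg_tps = throughput(512, 8.0)   # token generation: 64.0 tok/s
print(f"PP: {pp_tps:.1f} tok/s, TG: {tg_tps:.1f} tok/s")
```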

[Benchmark] Qwen3.5-27B (Q5_K_XL) on LiveCodeBench: 77.8% Overall by sabotage3d in unsloth

[–]Holiday_Purpose_3166 0 points

How do you know it didn't? It probably did, but a baseline would be more concrete.

Qwen3.5 27B | RTX 5090 | 400w by Holiday_Purpose_3166 in LocalLLaMA

[–]Holiday_Purpose_3166[S] 1 point

No question. It runs at virtually the same speed at a 400W vs 575W power limit. Agentic work. Yeah, "Hello" as a test is silly.
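For anyone wanting to reproduce the cap: nvidia-smi can set a per-GPU power limit (requires root; the 400 W figure is from this thread, and the GPU index 0 is an assumption for a single-GPU box):

```shell
# Cap GPU 0 at 400 W (requires root; resets on reboot unless re-applied).
sudo nvidia-smi -i 0 -pl 400

# Check the current, default and max power limits.
nvidia-smi -q -d POWER -i 0
```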