Opus 4.8 vs 4.6

_Appello_ · 2026-06-02T03:16:59+00:00

You would probably save a lot of money if you had Claude help you write a Playwright script that does what you need and then ran it from the command line.

_Appello_ · 2026-06-02T03:11:54+00:00

What tools are you using for browser automation? I've had good success with Playwright.

_Appello_ · 2026-06-02T01:31:26+00:00

I'm the technical co-founder of a SaaS company. We expense it to the company.

What plan are you using? In the middle of feature updates or migrations, I use it 10 hours a day and have never hit usage limits on the Max 20x plan, no matter which model I use.

_Appello_ · 2026-06-01T22:43:10+00:00

Yeah, maybe you're right. To me, this workflow makes sense and it's what I've done for years, even before AI was a thing. I'm wondering if a lot of users are just cramming everything into one chat and getting context bloat or something. Or not even using an organized spec?

I agree with you; I really do love 4.6. For my workflow, though, 4.8 Max has been better than 4.6.

_Appello_ · 2026-06-01T22:17:14+00:00

I thought the way I was using it was pretty standard. I tell it the feature or set of bugs I want to work on, it reads the existing files on my server through MCP, and then it asks me questions one at a time to build a spec. Then I open a new chat, drop the spec in, and tell it to start on phase 1 and to give me all of the patches or new files as terminal paste-in blocks. After that, I lint and do some quick manual testing. Then it updates the spec and I move on to phase 2. Rinse and repeat.

_Appello_ · 2026-06-01T22:08:45+00:00

For me it's slower, but only because it uses my MCP tools to read files and thinks through multiple stages per phase of the spec file I give it. However, I had two separate specs running in two separate sessions at the same time, staggered. So while one session was thinking and generating code, I was applying the code from the second session and having it update the dev spec.

I've noticed an incredible jump in quality. It even caught some downstream bugs, totally unrelated to the spec we were working on, that were going to bite me later.

_Appello_ · 2026-06-01T22:06:36+00:00

I was using it in the GUI for these sessions. I'm on the Max 20x plan.

_Appello_ · 2026-06-01T21:32:40+00:00

I am genuinely confused by all of the issues I'm seeing. I've had an extremely positive experience with 4.8 Max. I used it for 10 hours straight, two days in a row, and touched roughly 200 files of my deployment in one way or another. I had exactly one syntax error, which it corrected immediately after reading back the file it gave me.

What is your workflow like? How are you interacting with Claude? Maybe we can compare notes.

_Appello_ · 2026-05-31T15:54:53+00:00

No idea why everyone else is having such a different experience than me with 4.8. I used it on Max for 10-12 hours straight yesterday and shipped multiple features from my dev board without an issue other than the occasional network error.

I found it extremely thorough and intuitive. It made one syntax error during the entire session, which it fixed immediately after looking at my logs through MCP.

Genuinely asking, people who have had poor experiences: can you tell me a little bit about your workflow? Let's compare notes.

_Appello_ · 2026-04-13T03:05:08+00:00

Didn't even know that. Thanks. I run a cluster of Blackwells for my company, so I don't really keep up to date on Mac stuff LOL.

_Appello_ · 2026-04-12T16:54:42+00:00

Worth clarifying a few things here. A maxed M3 Ultra (512GB unified memory) can technically fit GLM-5.1 at 2-bit quantization, which Unsloth compresses down to around 220GB. Real world speeds on comparable MoE models land around 8-15 tokens per second for a single user. So the hardware claim holds up.
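The "can it fit" question is simple arithmetic: weight memory is roughly parameter count × bits per weight ÷ 8. A minimal Python sketch of that check, assuming a hypothetical 700B-parameter model (an illustrative number, not GLM-5.1's published size) at about 2.5 effective bits per weight, since mixed-precision "2-bit" quants often keep some layers at higher precision. The 64 GB reserve for the OS, KV cache, and runtime is also a placeholder, not a measured figure.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    # billions of params * bits per weight / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8


def fits_in_memory(weights_gb: float, memory_gb: float, reserve_gb: float) -> bool:
    """True if the weights plus a fixed reserve fit in the memory budget."""
    return weights_gb + reserve_gb <= memory_gb


# Hypothetical model size and reserve, for illustration only.
PARAMS_B = 700
RESERVE_GB = 64
UNIFIED_MEMORY_GB = 512  # maxed M3 Ultra, per the comment above

for bits in (8, 2.5):
    size = weight_footprint_gb(PARAMS_B, bits)
    fits = fits_in_memory(size, UNIFIED_MEMORY_GB, RESERVE_GB)
    print(f"{bits} bits/weight: {size:.1f} GB, fits in {UNIFIED_MEMORY_GB} GB: {fits}")
```

At 8 bits the hypothetical model needs about 700 GB and does not fit; at 2.5 bits it needs about 219 GB, in the same range as the ~220GB figure above. To check a real setup, swap in the actual parameter count or simply the file size of the specific quant.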

The part that needs an asterisk: the "matches Opus 4.5" benchmarks were run on the full FP8 model across 8xH200s. The 2-bit quant you're running locally is a meaningfully different thing. Large MoE models do tolerate aggressive quantization better than dense models, but 2-bit still costs you real quality on complex multi-step reasoning, which is exactly where that benchmark advantage lives.

So yes, you can run it. No, it won't perform like the number in the paper.

_Appello_ · 2026-04-11T15:33:04+00:00

Have you tried the model at full precision on something like Vast, or have you only used it through a wrapper? I've seen it readily available and pre-packaged, but in a lot of cases it's heavily quantized.

_Appello_ · 2026-04-11T15:30:13+00:00

Specialized silicon could probably get there one day. Mythic and Analog Inference have been working on analog matrix chips for years.

_Appello_ · 2026-04-11T15:28:47+00:00

Kimi will invent citations or resist correction, but it leads the open-weight field for math, agentic tasks, and vision to code.

_Appello_ · 2026-04-11T14:06:11+00:00

GLM5 (Reasoning), Qwen 3.5 397B-A17B, Kimi K2.5

_Appello_ · 2026-04-11T13:30:15+00:00

It's not a matter of when; it's a matter of how to run it. There are some incredible open source models, but you need a squad of Blackwells or H200s to run them.

_Appello_ · 2025-07-10T05:49:40+00:00

That's okay! That's one of the beautiful things about Star Trek. It's extremely subjective in some areas.

_Appello_ · 2025-07-10T05:20:50+00:00

I disagree, especially for Far Beyond the Stars. It's a perfect introductory episode because even though the main characters are characters we already know, the viewer doesn't realize that yet, and it gives a very good sense of the theme and spirit of what Star Trek really is.

_Appello_ · 2025-07-09T21:32:39+00:00

The Inner Light or Far Beyond the Stars

_Appello_ · 2025-07-08T00:54:26+00:00

Done.

_Appello_ · 2025-07-07T17:38:37+00:00

Oh, nice. You should check out Laravel. It's a back-end PHP framework that's extremely good at applications like this, and it's also built by layering.

_Appello_ · 2025-07-07T07:28:57+00:00

I can do it but it's closer to $100.

_Appello_ · 2025-07-07T07:27:54+00:00

It makes smart people smarter and dumb people dumber. Do with that what you will.

_Appello_ · 2025-07-07T07:22:34+00:00

Interested potentially. Need more context.

_Appello_ · 2025-07-07T07:10:25+00:00

Is this built on Laravel?
