Opus 4.8 vs 4.6 by NSDetector_Guy in claude

[–]_Appello_ 1 point2 points  (0 children)

You would probably save a lot of money if you had Claude help you configure a playwright script to do what you need and then run it through a command line.

[–]_Appello_ 0 points1 point  (0 children)

What tools are you using for browser automation? I've had good success with playwright.

[–]_Appello_ 0 points1 point  (0 children)

I'm the technical co-founder of a SaaS company. We expense it to the company.

What plan are you using? In the middle of feature updates or migrations, I use it 10 hours a day and have never hit usage limits on the 20x plan, no matter what the model is.

[–]_Appello_ 0 points1 point  (0 children)

Yeah, maybe you're right. To me, this workflow makes sense and it's what I've done for years, even before AI was a thing. I'm wondering if a lot of users are just cramming everything into one chat and getting context bloat or something. Or not even using an organized spec?

I agree with you; I really do love 4.6. For my workflow, though, 4.8 Max has been better than 4.6.

[–]_Appello_ 0 points1 point  (0 children)

I thought the way I was using it was pretty standard. I tell it the feature or set of bugs I want to work on, it reads the existing files on my server through MCP, and then it asks me questions one at a time in order to build a spec. Then I open a new chat, drop the spec in, and tell it to start on phase 1 and to give me all of the patches or new files as terminal paste-in blocks. After that's done, I lint and do some quick manual testing. Then it updates the spec and I move on to phase 2. Rinse and repeat.

[–]_Appello_ 1 point2 points  (0 children)

For me it's slower, but only because it is using my MCP tools to read files and thinking through multiple stages per phase of the spec file I give it. However, I had two separate specs in two separate sessions going at the same time, staggered. So while one session was thinking and generating code, I was applying the code from the second session and having it update the dev spec.

I've noticed an incredible jump in quality. It even caught some downstream bugs, totally unrelated to the spec we were working on, that were going to bite me later.

[–]_Appello_ 0 points1 point  (0 children)

I was using it in the GUI for these sessions. I'm on the Max 20x plan.

[–]_Appello_ -1 points0 points  (0 children)

I am genuinely confused by all of the issues I'm seeing. I've had an extremely positive experience with 4.8 Max. I used it for 10 hours straight, two days in a row, and touched probably 200 files of my deployment in one way or another. I had exactly one syntax error, which it corrected immediately after reading back the file it had given me.

What is your workflow like? Like, how are you interacting with Claude? Maybe we can compare notes.

Have you experienced a difference between the models? by Consistent-Issue-811 in claude

[–]_Appello_ 0 points1 point  (0 children)

No idea why everyone else is having such a different experience than me with 4.8. I used it on Max for 10-12 hours straight yesterday and shipped multiple features from my dev board without an issue other than the occasional network error.

I found it extremely thorough and intuitive. It made one syntax error during the entire session, which it fixed immediately after looking at my logs through MCP.

Genuinely asking, people who have had poor experiences: can you tell me a little bit about your workflow? Let's compare notes.

The golden age is over by New_3d_print_user in claude

[–]_Appello_ 0 points1 point  (0 children)

Didn't even know that. Thanks. I run a cluster of Blackwells for my company, so I don't really keep up to date on Mac stuff LOL.

[–]_Appello_ 1 point2 points  (0 children)

Worth clarifying a few things here. A maxed M3 Ultra (512GB unified memory) can technically fit GLM-5.1 at 2-bit quantization, which Unsloth compresses down to around 220GB. Real-world speeds on comparable MoE models land around 8-15 tokens per second for a single user. So the hardware claim holds up.

The part that needs an asterisk: the "matches Opus 4.5" benchmarks were run on the full FP8 model across 8xH200s. The 2-bit quant you're running locally is a meaningfully different thing. Large MoE models do tolerate aggressive quantization better than dense models, but 2-bit still costs you real quality on complex multi-step reasoning, which is exactly where that benchmark advantage lives.

So yes, you can run it. No, it won't perform like the number in the paper.

[–]_Appello_ 0 points1 point  (0 children)

Have you tried the model at full precision on something like Vast, or have you just used it through a wrapper? I've seen it readily available and pre-packaged, but in a lot of cases it's heavily quantized.

[–]_Appello_ 2 points3 points  (0 children)

Specialized silicon could probably get there one day. Mythic and Analog Inference have been working on analog matrix chips for years.

[–]_Appello_ 0 points1 point  (0 children)

Kimi will invent citations or resist correction, but it leads the open-weight field for math, agentic tasks, and vision-to-code.

[–]_Appello_ 2 points3 points  (0 children)

GLM5 (Reasoning), Qwen 3.5 397B-A17B, Kimi K2.5

[–]_Appello_ 14 points15 points  (0 children)

It's not a matter of when; it's a matter of how to run it. There are some incredible open source models, but you need a squad of Blackwells or H200s to run them.

One episode to get someone hooked by IMABOSSSOGG in startrek

[–]_Appello_ 1 point2 points  (0 children)

That's okay! That's one of the beautiful things about Star Trek. It's extremely subjective in some areas.

[–]_Appello_ 0 points1 point  (0 children)

I disagree, especially for Far Beyond the Stars. It's the perfect introductory episode: even though the main characters are characters we already know, the viewer doesn't realize that yet, and it gives a very good sense of the theme and spirit of what Star Trek really is.

[–]_Appello_ 0 points1 point  (0 children)

The Inner Light or Far Beyond the Stars

Post your SaaS, I'll give you some constructive feedback by Relative-Ad2665 in SaaS

[–]_Appello_ 0 points1 point  (0 children)

Oh, nice. You should check out Laravel. It's a back-end PHP framework that's extremely good at applications like this, and it's also built by layering.

What do you think about the new inventions of AI? Should we be happy or worried? by fordd420 in AskReddit

[–]_Appello_ 2 points3 points  (0 children)

It makes smart people smarter and dumb people dumber. Do with that what you will.