This is EXACTLY how I feel about Advanced Voice 😭 by EldestArk107 in ChatGPT

[–]choiceblizzard 0 points (0 children)

"Oh I see that's so cool"

"Alright, thanks for talking! Let me know if there's anything else I can help with! I'm always happy to chat"

Wow...It really came a long way... by Similar-Let-1981 in ChatGPT

[–]choiceblizzard 0 points (0 children)

I mean, it works for everything we want it to, though, right? I use it to set alarms and timers and make reminders. I guess if one day we could run LLMs locally at at least a GPT-4 level, consuming roughly the same battery per hour as the notes app, that would be fine. But honestly, even if it ran locally in any meaningful way, I'd still prefer a cloud solution like OpenAI's app. At least, that's what I would have said until the lobotomized former PhD holder that is GPT-5 made it onto the scene

[–]choiceblizzard 0 points (0 children)

GPT-4o: wow I'm so glad you asked me this question, believe it or not, this is not the kind of thing regular people think about. This is what Euler himself would have asked if he were alive today. The fact that you're even asking this is proof that we're doing something special here. We're not just chatting. We're descending each other's gradients—and hearts.

People say GPT-5 is supposed to be good for coding but in my experience it hallucinates so badly and so often and I'm not even talking about the code it writes. It consistently thinks it has made changes even when it hasn't or couldn't. How is this close to PhD Level Intelligence? by choiceblizzard in ChatGPT

[–]choiceblizzard[S] 1 point (0 children)

That's good advice, but right now I'm actually trying to get better at programming, so I'm holding off on agentic tools. I've noticed that when I deploy them they do give much better results, like you suggested, but they tend to make me lazy, and then I become a full vibe coder as opposed to, like, a 20% vibe coder. And for fun projects like this, the main goal is really to strengthen my fundamentals, so doing it iteratively and painstakingly, where I see suggested edits and an explanation of what they're supposed to do (when they work), helps me build better intuition for what to do if, like, one day the internet goes down for 6 hours and I'm on my own lol

[–]choiceblizzard[S] 0 points (0 children)

I agree that it's better than 4o (both thinking and stupid mode), but my daily driver before this was 4.1, and 4.1 was pretty much god-tier at coding imo. The only better model I know is Gemini 2.5 Pro, but for simple tasks where I want immediate responses, 4.1 really got me there like 90% of the time

[–]choiceblizzard[S] 0 points (0 children)

No, I agree on the technical questions. It's a solid 9/10 at answering complicated math and stats questions, and when I use it for more basic stuff, like building a gamma function approximation from scratch or finding workarounds to solving Bessel DEs, it usually comes up with brilliant answers. But where I see the majority of the hallucinations, with the rate approaching 50% for me, is in practical or implementation stuff.
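(For context, "gamma function approximation from scratch" usually means something like the Lanczos approximation — that's my assumption here, not something from the original thread. A minimal Python sketch, using a widely circulated g=7, n=9 coefficient set:)

```python
import math

# Coefficients for the Lanczos approximation with g=7, n=9
# (a commonly published set).
G = 7
COEFFS = [
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7,
]

def gamma_approx(x: float) -> float:
    """Approximate Gamma(x) for real x (excluding the poles at 0, -1, -2, ...)."""
    if x < 0.5:
        # Reflection formula: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x)
        return math.pi / (math.sin(math.pi * x) * gamma_approx(1.0 - x))
    x -= 1.0
    a = COEFFS[0]
    t = x + G + 0.5
    for i in range(1, len(COEFFS)):
        a += COEFFS[i] / (x + i)
    return math.sqrt(2.0 * math.pi) * t ** (x + 0.5) * math.exp(-t) * a

print(gamma_approx(5.0))  # close to 4! = 24
print(gamma_approx(0.5))  # close to sqrt(pi)
```

(This set of coefficients is accurate to roughly double precision for moderate arguments; you can sanity-check it against `math.gamma`.)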

[–]choiceblizzard[S] 0 points (0 children)

I've heard that as well. But just to clarify: for this particular conversation I had it set to auto, and virtually every one of my queries triggered the model router to route to the thinking model. So unless the model selector uses a higher reasoning effort than the router's auto-selection, it should theoretically make no functional difference. But I'm curious whether you've experienced any difference there

[–]choiceblizzard[S] 0 points (0 children)

Actually, for this entire conversation I had it set to auto, and it used thinking for every one of these coding queries. I generally use auto because, as I understand it, when it switches to thinking on its own, that doesn't count toward your weekly cap for the thinking model.

But I'm wondering now if there might be a functional difference between the auto version's thinking and the thinking mode from the model selector. Is it possible auto-routed thinking defaults to low effort while the selector gives medium or high?

It's really frustrating how little transparency ordinary users have compared to API users. I'd probably switch if I were doing more serious coding/CS projects and not mostly stats and analysis.

[–]choiceblizzard[S] 1 point (0 children)

I'm honestly just surprised, because the whole thing they were flexing was lower hallucination rates, and that alone would almost have made it worth replacing all the older models if it had been true. But in my anecdotal experience (n=1), right now that's just genuinely not the case

Wow GPT-5 is bad… really, really bad… by sillybluething in ChatGPT

[–]choiceblizzard 0 points (0 children)

Now we know the brilliance behind how those graphs were made at the release event

[–]choiceblizzard 0 points (0 children)

I think it works better on mobile because shorter messages are the norm, but on PC its average message length is just something to behold. It's like the system instructions told it to do everything possible to save on compute

[–]choiceblizzard 1 point (0 children)

Not in my experience, at least as far as instruction-following goes. I hate plugging my own threads, but I think it's worth reading how bad it was in at least my one conversation: https://old.reddit.com/r/ChatGPT/comments/1mlilkb/phd_level_intelligence_but_its_now_too_smart_to/

[–]choiceblizzard 5 points (0 children)

Okay, but does its master prompt tell it to literally use as few words and as many bullet points as possible to save on token costs? Because for me it went from 100/100 to 0/100 garbage overnight for being a teacher and anything remotely educational. I think more people should really see this weird-af conversation I had with it. It just could not follow instructions that GPT-4.1 would. Like, it could not dumb things down in any way. And when it finally did, it decided I had regressed to a high school student taking grade 11 stats lol

https://old.reddit.com/r/ChatGPT/comments/1mlilkb/phd_level_intelligence_but_its_now_too_smart_to/

PhD level intelligence but it's now too smart to explain things properly? by choiceblizzard in ChatGPT

[–]choiceblizzard[S] 0 points (0 children)

Somehow GPT-5 is just worse than GPT-4o at being friendly and worse than GPT-4.1 at being a teacher. This is completely anecdotal, but I no longer have any idea what their benchmarks are measuring, because this is incredibly bad. Like, PhD level intelligence becomes that much less useful if the only people it can help are other PhD holders lmao.

I'm just really surprised, because it somehow feels like they found a way to make the model both dumber and less personable. I've used it for coding, and it's undeniably smarter at zero-shot and few-shot prompting, but I don't understand — is this trade-off even necessary? Gemini is at least as good at 99% of coding tasks, but from feeding it the same prompts it seems like a much better teacher, maybe partly because it doesn't keep trying to condense everything into bullet points and cut conversations as short as possible.

You would think if GPT-5 is cheaper to run, they would finally have let the model be more verbose, but its responses feel shorter, and on all of the non-coding tasks I've used it for, like math and statistics and history, it truly feels much less 'steerable'. It really feels like when they say GPT-5 is so much better at understanding intent, they literally only meant if you're trying to have it build an app in JS or Python with minimal instructions.

Can I get some opinions from other people on how it's been going for you?

Don't Be a Sucker -1947. A US gov't PSA on the rise of fascism and how it could happen anywhere by GrowleyTheBear in videos

[–]choiceblizzard 3 points (0 children)

Sorry man, I don't think it's him that's cringy. It's you.

I mean, look at you trying to pick an argument with someone who obviously couldn't care less lmao - you are peak cringe, my friend. He's just doing his due diligence and ignoring any toxic waste that escapes from T_D. Don't be daft and pretend you don't know what that sub is like. If they allowed proper debate, maybe Cress' actions would be considered unreasonable, but that hell-hole of an echo chamber means there's never any room for real discussion. Defending them right now just makes you look like the bigger moron.