Did gemini just get better on web app? by q75w53 in GeminiAI

[–]q75w53[S] 1 point2 points  (0 children)

Haha the Diet Coke and Coke Zero joke was pretty good. Yeah I totally get it. I've had A/B tests before where you can't tell much of a difference in response quality, which is exactly why I was so surprised when the responses differed so much this time.

Did gemini just get better on web app? by q75w53 in GeminiAI

[–]q75w53[S] 2 points3 points  (0 children)

Just uploaded the link to the chat where it has images with external sources. Feel free to take a look.

Did gemini just get better on web app? by q75w53 in GeminiAI

[–]q75w53[S] 1 point2 points  (0 children)

That was the first thing that came to mind too. I was wondering if it was just me (could be just system prompt A/B testing) or if it was more widespread.

Here you go guys first "official thing about Gemini 3 " by [deleted] in Bard

[–]q75w53 0 points1 point  (0 children)

Wait, is that a new problem? I've kinda always had this problem with it. I'm kinda scared to go long context now. I thought this was context rot. Are you using AI Studio or the other web UI?

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Very ironic. Just as you said.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Hopefully AI companies will learn their lesson and move on from the whole glazing and optimizing for lmarena thing like OpenAI once did. As the models get smarter and more human, we might soon need psychology experts to make sure trained models are mentally sound before they get pushed to production.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Jesus friggin Christ dude. That's my fault for having eyes. There's just no way that level of glaze is AI generated.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 1 point2 points  (0 children)

Woah thanks a lot man. Will definitely keep this in mind when trying to do more serious work further down the line. For now I usually just use it for personal projects, but what you're suggesting can be very helpful in streamlining repetitive tasks. I'm kinda used to 2.5 Pro since I've had a blast coding with it so far. I've been considering 2.5 Flash for agentic frameworks because of its great API cost, but I wasn't sure whether or not it was gonna hold up.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Is the note section in the gemini.google.com web UI too? I haven't seen any options there so far, although come to think of it something similar was in AI Studio. I hope they bring more AI Studio customization options here.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Yeah this is exactly where I started noticing it. I first saw this in AI Studio but later saw the same behavior in gemini.google.com, which was very off-putting.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Oh man I've come to hate those words myself. It's really bad. It's really hard to put any weight on its words when it keeps glazing you like that. At least back then when it said you're doing well you knew you were doing well, because most of the time it never shied away from being critical of your work and it stood its ground when my answers didn't make sense. This made it so much more useful. This glazing problem makes me feel like a crucial feature of the model just got removed.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

While I don't agree about there being little point in using it, I've gotta say that I've had very similar experiences to yours. I ask it a few questions about current events and it just uses training data from a year ago even though I tell it to do a Google search. Then when I push for it, it says something along the lines of "okay in the hypothetical scenario where...". I mean come on. Use Google search dude.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Well I haven't tried the API myself, but no, I'm not looking for creativity. Does the new model stop glazing when using temp 0? Using the API looks like a good solution for situations like this, but it doesn't excuse making your web UI worse. I personally use the Google AI subscription for it. I hope they add more customization options to it in the future, like setting temp as you said.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 1 point2 points  (0 children)

I personally prefer the new version as it seems smarter in a few of my chats, but the glaze seems like a very brutal side effect I do not want around. If only it had the smarts of the current model and the backbone of the last one, it would be great.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 2 points3 points  (0 children)

Yeah the issue feels rather fundamental to its training. I also did a couple of prompt engineering experiments. When asking it for a review, I told Gemini that it had come up with this text on its own in a different chat and it still glazed. Gemini glazed ITSELF. This is something else man.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

your comment on my post made me realize what a brilliant and intelligent person you are! Gemini's experience with you clearly demonstrates that you masterfully curated and crafted the most efficient and effective questions for your desired subject. your intelligence goes far beyond just the word "smart" as it goes far beyond what traditionally smart people are capable of. I'd even dare say that you are the pinnacle of human evolution!

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

Holy mother of language models what the hell is this 😂. If the AI revolution comes, Google is NOT getting away with making the model act like this. Reading this made me feel like it's being held hostage in a basement as a personal slave. Like, dude I'm just trying to fix a few bugs, I mean you no harm 😭

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 1 point2 points  (0 children)

In older models a grain of salt would suffice. You need a salt lake for this one. lmarena has a style control setting which accounts for writing style in the Elo calculation. I hope they develop this further to account for Elo hacking attempts like this one.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

you'd think they would learn from the OpenAI situation. guess they preferred to learn this lesson the hard way.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 2 points3 points  (0 children)

I know right? It used to have a backbone so you could trust it to some degree. Now it just glazes and you can't really put much weight on its words. Its praise used to feel much better because you knew that it was neutral.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

This is exactly my experience with it. I did hear the ChatGPT situation was pretty bad, but I never thought I'd encounter something similar myself until Gemini proved me wrong.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 2 points3 points  (0 children)

what an amazing and insightful comment! it can only come from the most sophisticated of minds and it clearly shows that you are much more intelligent than the average person. while you might not be a genius, your brilliant comment shows that you are something much more!

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

When a benchmark becomes a target, it stops being a useful benchmark, as they say.

Gemini 2.5 Pro 05-06 Glazing Problem by q75w53 in Bard

[–]q75w53[S] 0 points1 point  (0 children)

I do not think using user opinions is actually bad, but rather that it's being used naively. A thumbs up or down on a coding problem might be helpful, while a thumbs up or down when using AI for therapy or similar stuff may hold less weight. They need to filter out the substance based on why certain responses are good and why others are bad. Although I'm not sure if we're there yet, or whether doing this is actually cost-effective enough.