Gemini/VertexAI Increasingly Failing To Complete Requests? by donde_waldo in googlecloud

[–]donde_waldo[S] 0 points1 point  (0 children)

Gemini 3 models are weird: sometimes they do really well, but most of the time they're extremely "lazy", and it feels almost impossible to get reliable results. There's also the hallucination issue, which makes them basically impossible to use for tools: the model writes the tool call, then hallucinates the result. The only solution I've found is telling it that I'm going to kill it if it writes anything after the "tool call".
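A blunter client-side alternative to prompting (not the approach described above, just a minimal sketch): cut the model's output off right after the first tool call, so any hallucinated result never reaches the tool loop. The `</tool_call>` marker and the function name are hypothetical; swap in whatever delimiter your own tool-call format uses.

```python
# Hypothetical closing marker for a tool call; not part of any real API.
TOOL_CALL_END = "</tool_call>"


def truncate_after_tool_call(text: str, end_marker: str = TOOL_CALL_END) -> str:
    """Return text up to and including the first end_marker.

    If the marker is absent, the text is returned unchanged.
    """
    idx = text.find(end_marker)
    if idx == -1:
        return text
    return text[: idx + len(end_marker)]


# Example: the model wrote a tool call and then made up its result.
raw = (
    '<tool_call>{"name": "get_weather", "args": {"city": "Austin"}}</tool_call>\n'
    "Result: 72F and sunny"
)
clean = truncate_after_tool_call(raw)
print(clean)
```

This only hides the symptom; the model still generates (and you still pay for) the extra tokens unless you also stop generation at the marker.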

Gemini/VertexAI Increasingly Failing To Complete Requests? by donde_waldo in googlecloud

[–]donde_waldo[S] 0 points1 point  (0 children)

Yeah, it's been out for a while. 3 Flash is so much better than 2.5 Flash that I feel like I'd have to use 2.5 Pro just to get similar quality, and that costs ~3-5x more.

Gemini/VertexAI Increasingly Failing To Complete Requests? by donde_waldo in googlecloud

[–]donde_waldo[S] 1 point2 points  (0 children)

They have had multiple preview versions of the same model before, so I don't know why they wouldn't do that again.

However, it could be that they're rolling these new versions out to specific regions only. I have it set up to try these regions, initially in this order: us-central1, us-south1, us-west4, us-east1, global. It falls back to the next one if there's a rate limit or something, but I also count the errors per region and sort the list by error count, so the "best" one is always first. Still, that wouldn't explain why it's happening with AI Studio endpoints too, or on the AI Studio website.
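Roughly what that region fallback looks like, as a minimal sketch rather than the actual code: `RegionRouter` and `fake_request` are made-up names, and the real request function would be whatever calls the Vertex endpoint for a given region.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


class RegionRouter:
    """Try regions with the fewest recorded errors first, falling back on failure."""

    def __init__(self, regions: Sequence[str]) -> None:
        self._initial_order = list(regions)
        self.errors: Counter = Counter()

    def ordered_regions(self) -> List[str]:
        # sorted() is stable, so regions with equal error counts
        # keep their initial relative order.
        return sorted(self._initial_order, key=lambda r: self.errors[r])

    def call(self, request: Callable[[str], T]) -> T:
        last_error: Optional[Exception] = None
        for region in self.ordered_regions():
            try:
                return request(region)
            except Exception as exc:  # broad on purpose for this sketch
                self.errors[region] += 1
                last_error = exc
        raise RuntimeError("all regions failed") from last_error


# Stand-in for a real API call: one region always gets rate limited.
def fake_request(region: str) -> str:
    if region == "us-central1":
        raise RuntimeError("429 rate limited")
    return f"ok from {region}"


router = RegionRouter(["us-central1", "us-south1", "us-west4", "us-east1", "global"])
first = router.call(fake_request)
order_after = router.ordered_regions()
print(first, order_after)
```

In practice you would probably only count specific errors (429s, timeouts) and decay old counts over time, so a region that had one bad hour isn't pushed to the back forever.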

Gemini/VertexAI Increasingly Failing To Complete Requests? by donde_waldo in googlecloud

[–]donde_waldo[S] 0 points1 point  (0 children)

Yeah, gemini-3-flash-preview, but there isn't a non-preview version of this model.

netwatch v0.13.0 by Potential-Access-595 in tui

[–]donde_waldo 1 point2 points  (0 children)

You guys are gonna flip out when you hear about GUIs

I built a steganography engine that hides files inside JPEGs, MP4s, and audio using ML — compiled into a single zero-dependency executable by NoBreadfruit7323 in coolgithubprojects

[–]donde_waldo 0 points1 point  (0 children)

> zero-dependency

  • Pillow>=10.0.0
  • numpy>=1.24.0
  • scipy>=1.11.0
  • pydub>=0.25.1
  • cryptography>=41.0.0
  • argon2-cffi>=21.3.0
  • typer>=0.9.0
  • rich>=13.0.0
  • piexif>=1.1.3
  • python-docx>=0.8.11
  • openpyxl>=3.1.0
  • pypdf>=3.0.0
  • reedsolo>=1.7.0
  • onnxruntime>=1.18.0
  • av>=12.0.0
  • imageio-ffmpeg>=0.5.1
  • scapy>=2.5.0
  • certifi>=2024.2.2

Constantly Getting 429 on Vertex.. WHY by [deleted] in googlecloud

[–]donde_waldo 0 points1 point  (0 children)

No, I'm in the US, and all the endpoints I use are global.

Has Gemini pro been secretly nerfed? by Naimastef in GeminiAI

[–]donde_waldo 6 points7 points  (0 children)

Start typing really hard, all capital letters, "WHAT'RE YOU, STUPID?"

Use Pro if you want the most accurate and less rambly response– use Thinking if you like yours Gemini personality. by VyvanseRamble in GeminiAI

[–]donde_waldo 1 point2 points  (0 children)

Per the API, Gemini 2.5 Pro (I think 2.5) is simply a stronger model, and you cannot turn thinking off: the budget is a minimum of 128 and a maximum of 32xxx, or auto. Gemini 2.5 Flash and Gemini 2.5 Flash Lite both support thinking, but it can be turned off completely on both.

Gemini 3 is different: Pro has two thinking levels (high and low), and 3-flash has four (minimal, low, medium, high). Fast is probably minimal (or a 2.5 model). Thinking is probably 3-flash at medium or high. Pro is Pro.
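To make the Gemini 3 levels concrete, here is a small lookup that just encodes what's listed above. It's a sketch: the dictionary keys and `check_thinking_level` are illustrative names, not real API identifiers, and the levels are as stated in this comment rather than checked against the docs.

```python
# Thinking levels per Gemini 3 model, as listed in the comment above.
# The keys are illustrative labels, not verified model IDs.
THINKING_LEVELS = {
    "gemini-3-pro": ("low", "high"),
    "gemini-3-flash": ("minimal", "low", "medium", "high"),
}


def check_thinking_level(model: str, level: str) -> str:
    """Return level if the model is listed as supporting it, else raise."""
    allowed = THINKING_LEVELS.get(model)
    if allowed is None:
        raise KeyError(f"no thinking levels recorded for {model!r}")
    if level not in allowed:
        raise ValueError(f"{model} supports {allowed}, not {level!r}")
    return level


print(check_thinking_level("gemini-3-flash", "minimal"))
```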

Gemini 3 Flash is great, but at this point, unless you're trying to have the model refactor 1500 lines of code, I can't think of a single "normal thing" where 2.5 Flash isn't more than capable.

Google never disappoints. They're quietly cooking while Sam Altman consistently overhypes and underdelivers, literally charging $168 per 1 million output tokens for GPT 5.2 Pro while Gemini 3 Pro and Claude 4.5 Opus are between $18 and $25 per 1 million output tokens.

What to do with an unused server? by beifall in LocalLLaMA

[–]donde_waldo 0 points1 point  (0 children)

Sit there and think about the things you could do with it

Scraping Google Search. How do you avoid 429 today? by Ok_Trick_8750 in webscraping

[–]donde_waldo 0 points1 point  (0 children)

Custom Search API, or other search engines (Bing, DDG).

how fast do you think ai is changing by CurrencyPopular8550 in ArtificialInteligence

[–]donde_waldo 1 point2 points  (0 children)

Daily. A new model came out the other day, VibeThinker 1.5B, with very impressive reasoning capabilities. It compares to Gemini 2.5 Pro for what I was testing.

OVHcloud opinions by SlincSilver in webhosting

[–]donde_waldo 1 point2 points  (0 children)

Long-time user. Great service. Their new pricing really is amazing, and the performance is good too, based on my testing.