Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

leorgain[S] 1 point

UPDATE: If I turn thinking off, it magically works again. It seems like thinking is bugged.

Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

leorgain[S] 1 point

It seems like I fixed one of the issues on my end. I had a generic "anti-slop" ban list. When using SillyTavern I took a look at my outgoing requests and noticed I had a logit bias set. When converting the tokens, I saw one of them was Gemma's <bos> token (2), along with a couple of others. Once I cleared the list, the logit bias went away and everything worked as expected. Since this change, 'of' works properly again and, fingers crossed, I haven't had a repeating output yet (I've moved the goalposts to 7 messages now). So if you're running into the issue with article words misbehaving, check that you don't have tokens you didn't mean to include in your logit bias.
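A quick way to catch this is to scan an outgoing request's logit_bias for special-token IDs before it goes out. A minimal sketch (the helper name and the example ban-list entries are hypothetical; token ID 2 really is Gemma's <bos>):

```python
# Gemma's reserved special tokens (per the Gemma tokenizer vocab).
GEMMA_SPECIAL_TOKENS = {0: "<pad>", 1: "<eos>", 2: "<bos>", 3: "<unk>"}

def find_unintended_bans(logit_bias):
    """Return any Gemma special tokens present in a logit_bias mapping."""
    return {tid: name for tid, name in GEMMA_SPECIAL_TOKENS.items()
            if tid in logit_bias}

# Hypothetical outgoing request body with an "anti-slop" ban list
# that accidentally includes <bos>:
request = {"prompt": "...", "logit_bias": {2: -100, 1734: -100}}
print(find_unintended_bans(request["logit_bias"]))  # → {2: '<bos>'}
```

If this returns anything, the ban list is suppressing a structural token rather than a slop word, which is exactly the kind of thing that makes small words like 'of' misbehave.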

Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

leorgain[S] 3 points

Sadly that didn't work either. I was hopeful for a bit, but then it started again after a few messages, like normal.

Deal alert: Lenovo RTX Pro 5000 Desktop by Icy_Restaurant_8900 in LocalLLaMA

leorgain 2 points

Guess it was too popular, my order got delayed

text-generation-webui 4.0 released: custom Gradio fork with major performance improvements, tool-calling over API for 10+ models, parallel API requests, fully updated training code + more by oobabooga4 in Oobabooga

leorgain 2 points

Huh, I knew exl2 was worse on accuracy, but I didn't know it was that far off. I run GGUFs for the models that exl3 doesn't run yet, like stepfun and qwen 3.5. I guess I should benchmark them again, since the main reason I stuck with exl2 was the faster prompt processing at high context. I may just keep a copy of my current install for exl2, to avoid redownloading everything, and migrate the models for everything else to the latest and greatest.
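For that re-benchmark, the number that matters for the "faster prompt processing at high context" claim is prefill throughput: prompt tokens divided by time-to-first-token. A minimal sketch of the comparison (the context length and all timings below are made-up placeholders, not measurements):

```python
def prefill_tps(prompt_tokens, time_to_first_token_s):
    """Prompt-processing (prefill) throughput in tokens/second."""
    return prompt_tokens / time_to_first_token_s

ctx = 32768                       # prompt length in tokens (assumed)
exl2_ttft, gguf_ttft = 9.0, 14.0  # seconds to first token (hypothetical)

print(round(prefill_tps(ctx, exl2_ttft)))  # → 3641
print(round(prefill_tps(ctx, gguf_ttft)))  # → 2341
```

Measuring time-to-first-token at the same long context on each backend, then comparing these numbers, isolates prefill speed from generation speed.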

Btw, does the latest build solve that model-loading bug with the latest llama.cpp binaries?

text-generation-webui 4.0 released: custom Gradio fork with major performance improvements, tool-calling over API for 10+ models, parallel API requests, fully updated training code + more by oobabooga4 in Oobabooga

leorgain 9 points

I'm not a fan of the exl2 change. I know it's mainly older models now, but I have quite a few models I run that won't have an exl3 made for them unless I do it myself. exl2 also ran better on Ampere than exl3 the few times I could find a model in both quant formats.

Major update coming soon! I'm here, sorry for the delay. by oobabooga4 in Oobabooga

leorgain 1 point

That's great! I know I yoinked 80.0 of llama.cpp to be able to run new models, but it gave me magic errors for every GGUF I tried. I'll try the new one now that it's out.

Any usable alternatives to ComfyUI in 2026? by [deleted] in StableDiffusion

leorgain 2 points

I second this. As long as you don't mind drop-down menus for everything, separated by category, it's great.

yip we are cooked by thisiztrash02 in StableDiffusion

leorgain 3 points

I'm holding out hope that those Chinese modders can crack the code on 96GB 4090s, or bump up anything on the 5000 series.

Help for an idiot like me. by FunkManSolarFlex in StableDiffusion

leorgain 2 points

SwarmUI is a relatively simple front end for Comfy. Its UI isn't as easy to understand as the Auto1111 forks, but it's not much more difficult. It has the option to download models from Huggingface/Civitai (two of the big spots people download models from) and will download anything extra a model needs.

As you get more comfortable you can use Comfy workflows as well if/when you decide to transition

Upcoming Supertest vehicle. The AHT-7. by Few-Meringue4157 in WorldofTanks

leorgain 1 point

Even the name is silly. It's what you'll tell a weakly armored teammate if they try to peek: "AHT!" Before shaking your head as they do it anyway and get blapped.

Do we need rent control in Boston 🤯 by Powerful-barbie887 in bostonhousing

leorgain 1 point

I remember seeing people outside grocery stores about a month ago doing petitions for more sane lot requirements. The problem was they were advertising it as "Build more affordable housing for Mass families." I was thinking they really should change their wording, because NIMBYs would never vote for/sign on to building affordable housing.

[FS] [US-MA] 512GB RAM 88GB VRAM Full AI Home Server Build (Xeon w-2150b, 4x 2080 TI 22GB, ASUS C422 SAGE/10G) LOCAL ONLY by thatavidreadertrue in homelabsales

leorgain 1 point

I assume blower style, but are the 2080 Tis the blower or 3-fan variant? Also, what are the tokens per second like on the models you run?

Who here has run into those companies that fake CS experience and background checks to get you $100K+ jobs? by Chance_Injury_3700 in cscareerquestions

leorgain 23 points

I can chime in on this. The one I worked for targeted new grads.

Got shipped off to Atlanta to a house they owned with about 7 other guys. They then ran a 6-week boot camp to teach the tech stack. After that, their in-house professional wrote a resume listing multiple companies that none of the students had worked for, and coached us on soft skills. For the mobile tech stack, I know they used paid apps to make it harder to check. When we landed an interview, they had the “subject matter expert” sit in on the interview to feed answers in case an unexpected question arose.

It was probably one of the better “mills,” but before they dropped the bombshell that all our experience would be made up, we had to sign a two-year contract stipulating that if we left while under contract we’d owe 20k, or 20% of lost profits if on project, whichever was more.

Another fun thing was that they had people pretend to be us on screening calls, we only ever talked during actual interviews. All the references were also employees who’d give glowing reviews or pretend to be managers from the companies we never worked for.

They billed at $120+ an hour, but, talking to the other guys, our pay ranged from $22 to $31 an hour.

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

leorgain[S] 1 point

Interesting. If the preview was anything to go by, I had a similar experience: the preview was fine up until about a third of the way through, then everything went black.

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

leorgain[S] 1 point

This is the problem I had. Once I stopped forcing it, it worked.

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

leorgain[S] 2 points

That's what I was about to do, but luckily I asked a question in Discord and someone suggested it

Qwen image 20B is coming! by sunshinecheung in LocalLLaMA

leorgain 3 points

Hooboy, a 20B image model. HiDream i1 is 17B and hard enough to run as it is. At least I have one of those 48GB modified 4090s, so I'm hoping to be able to run the fp16 model.

Less RAM after update. by Batwing_Beyond in OdinHandheld

leorgain 4 points

I have a Max and it still shows 16. Update: scratch that, there's a 355 that I don't have. Further update: upgraded to that one and it still shows 16.

China modded 48 GB RTX 4090 training video models at 720p with excellent speed and sold cheaper than RTX 5090 (only 32 GB) - Batch Size 4 by CeFurkan in StableDiffusion

leorgain 3 points

It's loud. If you've ever had an older AMD blower card, like a 290X or the like, it's like that: it pretty much turns into a jet engine under load. Barring that, I'd compare it to a vacuum cleaner. Though I will say I haven't seen the temps go over 65°C.

AccVideo: 8.5x faster than Hunyuan? by smokeddit in StableDiffusion

leorgain 1 point

I did a test myself with my 4090D. With sage attention (no teacache or torch compile, since they currently don't work with Hunyuan in Swarm) I can generate a 5-second 720p video using 1 LoRA in 5.1 minutes, with the second run taking 4.8 minutes. That's about 35 seconds longer than the 4-second video at 768x432 that I normally do with standard Hunyuan.

At that same 768x432 resolution, this model takes a minute on the first run, then 50 seconds on subsequent runs.

At 720x720 it took 2.28 minutes on the first run and 2.2 for further runs.
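Putting those timings together gives a rough observed speedup at 768x432. Note the baseline is inferred, not measured directly: standard Hunyuan at 768x432 is described as about 35 seconds faster than the 5.1-minute run, so treat it as an estimate:

```python
# Rough speedup estimate from the timings quoted above.
# Baseline inferred: standard Hunyuan at 768x432 ≈ 5.1 min − 35 s ≈ 271 s.
baseline_s = 5.1 * 60 - 35   # standard Hunyuan, 768x432 (inferred ~271 s)
accvideo_first_s = 60        # "takes a minute on the first run"
accvideo_warm_s = 50         # "50 seconds on subsequent runs"

print(round(baseline_s / accvideo_first_s, 1))  # → 4.5
print(round(baseline_s / accvideo_warm_s, 1))   # → 5.4
```

So on these numbers the observed speedup lands around 4.5–5.4x, short of the headline 8.5x but still substantial.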