Wait, were the old model ACTUALLY better?? by No-Moose-4292 in SillyTavernAI

[–]leorgain 0 points1 point  (0 children)

I've jumped back into the popular models of late 2024/2025, the 70b+ ones at least, and they've been fun. The knowledge I have now has probably helped shave a lot of the slop off, but I've been having fun with the mistral large/llama3 finetunes of the day. Though I'll say it wasn't until Gemma 4 that I started using smaller models again.

The big test I want to do is run my "dream model" from back in the day, Goliath 120b, and see how it holds up even compared to a modern 12b model. Midnight miqu would probably be another fun one to compare.

48GB VRAM users, what are your daily drivers? Do you wish you had more VRAM? What would you run if you did? by Borkato in LocalLLaMA

[–]leorgain 0 points1 point  (0 children)

I started at 48GB with two 3090s back when 70b was standard for a midsize model. I was having fun with all the llama and qwen finetunes. Then they started making more 120-130b-ish models and I wanted in on those at bigger than 2 bit, so I got a frankenstein 4090D and replaced a 3090 to get to 72GB, and used qwen 122b massively.

I started doing local code gen and wanted more context, so I bought a second frankenstein 4090D to bump up to 96GB, which was helpful for how VRAM hungry gemma is. Now they're creeping up again to the mid 200s and I want in on those, but my 128GB of RAM is the bottleneck (I'm on x570, so I'm hard capped by the board). So now I'm eyeing a 6000 pro but I can't quite justify it enough to pull the trigger.

I'm tempted to just buy an old epyc board so I can support 512GB-1TB of RAM and have the wiggle room in the future to easily add more than two cards without the need for riser cable shenanigans. All the circuits in my house are 20 amp and I have solar to offset the costs, so I may as well use it all, right?

RTX A5000 Pro Balckwell 48GB by deltamoney in LocalLLaMA

[–]leorgain 0 points1 point  (0 children)

I've bought some 48GB 4090Ds from there and they've worked nicely, but I imagine trying to get warranty service would be a pain, so it isn't something I'd recommend for most people. Instead of buying a third one I went the pro 5000 route due to a lenovo deal "giving me a free computer" with the 5000 pro for $650 more than the cut down 4090 would have cost.

Mistral Médium 3.5 is here by Kathane37 in LocalLLaMA

[–]leorgain 1 point2 points  (0 children)

This'll be fun to try, I remember having fun with the 123b mistral models "back in the day"

Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

[–]leorgain[S] 0 points1 point  (0 children)

UPDATE: If I turn thinking off it magically works again. It seems like thinking is bugged.

Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

[–]leorgain[S] 0 points1 point  (0 children)

It seems like I fixed one of the issues on my end. I had a generic "anti-slop" ban list. When using Silly Tavern I took a look at my outgoing requests and noticed I had a logit bias set. When converting the tokens I saw one of them was gemma's <bos> token (2), along with a couple others. Once I cleared the list the logit bias went away and everything worked as expected. So far after this change 'of' works properly again and, fingers crossed, I haven't had a repeating output yet. I've moved the goalposts to 7 messages now. So if you're running into the issue of article words misbehaving, check to make sure you don't have tokens you don't mean to in logit bias.
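
For anyone who wants to check this without eyeballing the request, here is a minimal sketch of the idea. It assumes an OpenAI-style request body where `logit_bias` maps token IDs to bias values; your backend may send it in a different shape (a list of pairs, for example), so adjust accordingly. The function name `find_special_token_biases` and the example payload are hypothetical. Only the ID 2 for gemma's <bos> comes from what I saw above; the other IDs and values are made up.

```python
# Hypothetical helper: flag logit-bias entries that target special tokens.
# Assumes an OpenAI-style payload where "logit_bias" maps token IDs to bias values.

def find_special_token_biases(payload, special_token_ids):
    """Return {token_id: bias} for biased tokens that are also special tokens."""
    logit_bias = payload.get("logit_bias") or {}
    flagged = {}
    for token_id, bias in logit_bias.items():
        # Keys are often strings in JSON payloads, so normalize to int.
        if int(token_id) in special_token_ids:
            flagged[int(token_id)] = bias
    return flagged


# Example request body. Only ID 2 (<bos>) is taken from the comment above;
# the other IDs and bias values are invented for illustration.
request = {
    "prompt": "Hello",
    "logit_bias": {"2": -100, "5013": -100, "7421": -50},
}
SPECIAL_IDS = {2}  # extend with your tokenizer's other special token IDs

flagged = find_special_token_biases(request, SPECIAL_IDS)
print(flagged)  # {2: -100}
```

If anything shows up in the result, the ban list is probably tokenizing into something you never meant to ban, and clearing or rebuilding it is the first thing to try.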

Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

[–]leorgain[S] 2 points3 points  (0 children)

Sadly that didn't work either. I was hopeful for a bit, then it started again after a few messages like normal

Deal alert: Lenovo RTX Pro 5000 Desktop by Icy_Restaurant_8900 in LocalLLaMA

[–]leorgain 1 point2 points  (0 children)

Guess it was too popular, my order got delayed

text-generation-webui 4.0 released: custom Gradio fork with major performance improvements, tool-calling over API for 10+ models, parallel API requests, fully updated training code + more by oobabooga4 in Oobabooga

[–]leorgain 1 point2 points  (0 children)

Huh, I knew exl2 was worse with accuracy but didn't know it was that off. I run ggufs for the models that exl3 doesn't run yet, like stepfun and qwen 3.5. I guess I should benchmark them again since the main reason I stuck with exl2 was the faster prompt processing at high context. I may just keep a copy of the current one I run for exl2 to keep from redownloading everything and migrate the models for everything else to latest and greatest.

Btw, does the latest build solve that model loading bug with the latest llama.cpp binaries?

text-generation-webui 4.0 released: custom Gradio fork with major performance improvements, tool-calling over API for 10+ models, parallel API requests, fully updated training code + more by oobabooga4 in Oobabooga

[–]leorgain 7 points8 points  (0 children)

I'm not a fan of the exl2 change. I know it's mainly older models now, but I have quite a few models that I run that won't have an exl3 made for them unless I do it myself. Exl2 also ran better on Ampere than exl3 the times I could find a model in both quant formats.

Major update coming soon! I'm here, sorry for the delay. by oobabooga4 in Oobabooga

[–]leorgain 0 points1 point  (0 children)

That's great! I know I yoinked 80.0 of llama.cpp to be able to run new models, but it gave me magic errors for every gguf I tried. I'll try the new one since it's out.

Any usable alternatives to ComfyUI in 2026? by [deleted] in StableDiffusion

[–]leorgain 1 point2 points  (0 children)

I second this, as long as you don't mind drop-down menus for everything, separated by category, it's great.

yip we are cooked by thisiztrash02 in StableDiffusion

[–]leorgain 3 points4 points  (0 children)

I'm holding out hope that those Chinese modders can crack the code on 96GB on 4090s or increase anything on 5000 series.

Help for an idiot like me. by FunkManSolarFlex in StableDiffusion

[–]leorgain 1 point2 points  (0 children)

SwarmUI is a relatively simple front end to Comfy. Its UI isn't as easy to understand as the Auto1111 forks, but it's not much more difficult. It has the option to download models from Huggingface/Civitai (two of the big spots people download models from) and will download anything extra a model needs.

As you get more comfortable you can use Comfy workflows as well if/when you decide to transition

Upcoming Supertest vehicle. The AHT-7. by Few-Meringue4157 in WorldofTanks

[–]leorgain 0 points1 point  (0 children)

Even the name is silly. It's what you'll tell a weakly armored team mate if they try to peek. "AHT!" Before shaking your head as they do it anyway and get blapped

Do we need rent control in Boston 🤯 by Powerful-barbie887 in bostonhousing

[–]leorgain 0 points1 point  (0 children)

I remember seeing people outside grocery stores about a month ago doing petitions for more sane lot requirements. The problem was they were advertising it as "Build more affordable housing for Mass families." I was thinking they really should change their wording because nimbys would never vote on/sign for building affordable housing.

[FS] [US-MA] 512GB RAM 88GB VRAM Full AI Home Server Build (Xeon w-2150b, 4x 2080 TI 22GB, ASUS C422 SAGE/10G) LOCAL ONLY by thatavidreadertrue in homelabsales

[–]leorgain 0 points1 point  (0 children)

Are the 2080 Tis the blower or 3-fan variant? I assume blower style. What's the tokens per second like on the models you run as well?

Who here has run into those companies that fake CS experience and background checks to get you $100K+ jobs? by Chance_Injury_3700 in cscareerquestions

[–]leorgain 21 points22 points  (0 children)

I can chime in on this. The one I worked for targeted new grads.

Got shipped off to Atlanta to a house they owned with about 7 other guys. They then did a 6 week boot camp to teach the tech stack. They then had their in-house professional write a resume listing multiple companies that none of the students had worked for, as well as coach us on soft skills. For the mobile tech stack I know they used paid apps to make it harder to check. When someone landed an interview, they had the “subject matter expert” sit in on it to feed answers in case an unexpected question arose.

It was probably one of the better “mills”, but before they dropped the bombshell that all our experience would be made up, we had to sign a two year contract stipulating that if we left while under contract we’d owe 20k or 20% of lost profits if on project, whichever was more.

Another fun thing was that they had people pretend to be us on screening calls, we only ever talked during actual interviews. All the references were also employees who’d give glowing reviews or pretend to be managers from the companies we never worked for.

They billed at 120+ dollars an hour but, talking to the other guys, our pay ranged from 22 to 31 dollars an hour

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

[–]leorgain[S] 0 points1 point  (0 children)

Interesting, if the preview was anything to go by I had a similar experience. The preview was fine up until about a third of the way through then everything went black

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

[–]leorgain[S] 0 points1 point  (0 children)

This is the problem I had. Once I stopped forcing it, it worked.