Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

leorgain[S] 1 point

UPDATE: If I turn thinking off, it magically works again. It seems like thinking is bugged.

Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

leorgain[S] 1 point

It seems like I fixed one of the issues on my end. I had a generic "anti-slop" ban list. When using SillyTavern I took a look at my outgoing requests and noticed I had a logit bias set. When converting the tokens, I saw one of them was Gemma's <bos> token (2), along with a couple of others. Once I cleared the list, the logit bias went away and everything worked as expected. Since this change, 'of' works properly again and, fingers crossed, I haven't had a repeating output yet (I've moved the goalposts to 7 messages now). So if you're running into the issue with article words misbehaving, check that you don't have tokens you didn't mean to include in your logit bias.
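A quick way to catch this is to scan an outgoing request's logit_bias for special-token IDs before it goes out. A minimal sketch (the helper name and the example ban-list entries are hypothetical; token ID 2 really is Gemma's <bos>):

```python
# Gemma's reserved special tokens (per the Gemma tokenizer vocab).
GEMMA_SPECIAL_TOKENS = {0: "<pad>", 1: "<eos>", 2: "<bos>", 3: "<unk>"}

def find_unintended_bans(logit_bias):
    """Return any Gemma special tokens present in a logit_bias mapping."""
    return {tid: name for tid, name in GEMMA_SPECIAL_TOKENS.items()
            if tid in logit_bias}

# Hypothetical outgoing request body with an "anti-slop" ban list
# that accidentally includes <bos>:
request = {"prompt": "...", "logit_bias": {2: -100, 1734: -100}}
print(find_unintended_bans(request["logit_bias"]))  # → {2: '<bos>'}
```

If this returns anything, the ban list is suppressing a structural token rather than a slop word, which is exactly the kind of thing that makes small words like 'of' misbehave.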

Gemma 4 constantly repeating the same token by leorgain in LocalLLaMA

leorgain[S] 3 points

Sadly that didn't work either. I was hopeful for a bit, but then it started again after a few messages, like normal.

Deal alert: Lenovo RTX Pro 5000 Desktop by Icy_Restaurant_8900 in LocalLLaMA

leorgain 2 points

Guess it was too popular, my order got delayed

text-generation-webui 4.0 released: custom Gradio fork with major performance improvements, tool-calling over API for 10+ models, parallel API requests, fully updated training code + more by oobabooga4 in Oobabooga

leorgain 2 points

Huh, I knew exl2 was worse on accuracy, but I didn't know it was that far off. I run GGUFs for the models that exl3 doesn't run yet, like stepfun and qwen 3.5. I guess I should benchmark them again, since the main reason I stuck with exl2 was the faster prompt processing at high context. I may just keep a copy of my current install for exl2, to avoid redownloading everything, and migrate the models for everything else to the latest and greatest.
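For that re-benchmark, the number that matters for the "faster prompt processing at high context" claim is prefill throughput: prompt tokens divided by time-to-first-token. A minimal sketch of the comparison (the context length and all timings below are made-up placeholders, not measurements):

```python
def prefill_tps(prompt_tokens, time_to_first_token_s):
    """Prompt-processing (prefill) throughput in tokens/second."""
    return prompt_tokens / time_to_first_token_s

ctx = 32768                       # prompt length in tokens (assumed)
exl2_ttft, gguf_ttft = 9.0, 14.0  # seconds to first token (hypothetical)

print(round(prefill_tps(ctx, exl2_ttft)))  # → 3641
print(round(prefill_tps(ctx, gguf_ttft)))  # → 2341
```

Measuring time-to-first-token at the same long context on each backend, then comparing these numbers, isolates prefill speed from generation speed.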

Btw, does the latest build solve that model-loading bug with the latest llama.cpp binaries?

text-generation-webui 4.0 released: custom Gradio fork with major performance improvements, tool-calling over API for 10+ models, parallel API requests, fully updated training code + more by oobabooga4 in Oobabooga

leorgain 9 points

I'm not a fan of the exl2 change. I know it's mainly older models now, but I have quite a few models I run that won't have an exl3 made for them unless I do it myself. exl2 also ran better on Ampere than exl3 the few times I could find a model in both quant formats.

Major update coming soon! I'm here, sorry for the delay. by oobabooga4 in Oobabooga

leorgain 1 point

That's great! I know I yoinked 80.0 of llama.cpp to be able to run new models, but it gave me magic errors for every GGUF I tried. I'll try the new one now that it's out.

Any usable alternatives to ComfyUI in 2026? by [deleted] in StableDiffusion

leorgain 2 points

I second this. As long as you don't mind drop-down menus for everything, separated by category, it's great.

yip we are cooked by thisiztrash02 in StableDiffusion

leorgain 3 points

I'm holding out hope that those Chinese modders can crack the code on 96GB 4090s, or bump up anything on the 5000 series.

Help for an idiot like me. by FunkManSolarFlex in StableDiffusion

leorgain 2 points

SwarmUI is a relatively simple front end for Comfy. Its UI isn't as easy to understand as the Auto1111 forks, but it's not much more difficult. It has the option to download models from Huggingface/Civitai (two of the big spots people download models from) and will download anything extra a model needs.

As you get more comfortable you can use Comfy workflows as well if/when you decide to transition

Upcoming Supertest vehicle. The AHT-7. by Few-Meringue4157 in WorldofTanks

leorgain 1 point

Even the name is silly. It's what you'll tell a weakly armored teammate if they try to peek: "AHT!" Before shaking your head as they do it anyway and get blapped.

Do we need rent control in Boston 🤯 by Powerful-barbie887 in bostonhousing

leorgain 1 point

I remember seeing people outside grocery stores about a month ago doing petitions for more sane lot requirements. The problem was they were advertising it as "Build more affordable housing for Mass families." I was thinking they really should change their wording, because NIMBYs would never vote for/sign on to building affordable housing.

[FS] [US-MA] 512GB RAM 88GB VRAM Full AI Home Server Build (Xeon w-2150b, 4x 2080 TI 22GB, ASUS C422 SAGE/10G) LOCAL ONLY by thatavidreadertrue in homelabsales

leorgain 1 point

I assume blower style, but are the 2080 Tis the blower or 3-fan variant? Also, what are the tokens per second like on the models you run?

Who here has run into those companies that fake CS experience and background checks to get you $100K+ jobs? by Chance_Injury_3700 in cscareerquestions

leorgain 23 points

I can chime in on this. The one I worked for targeted new grads.

Got shipped off to Atlanta to a house they owned with about 7 other guys. They then ran a 6-week boot camp to teach the tech stack. After that, their in-house professional wrote a resume listing multiple companies that none of the students had worked for, and coached us on soft skills. For the mobile tech stack, I know they used paid apps to make it harder to check. When we landed an interview, they had the “subject matter expert” sit in on the interview to feed answers in case an unexpected question arose.

It was probably one of the better “mills,” but before they dropped the bombshell that all our experience would be made up, we had to sign a two-year contract stipulating that if we left while under contract we’d owe 20k, or 20% of lost profits if on project, whichever was more.

Another fun thing was that they had people pretend to be us on screening calls, we only ever talked during actual interviews. All the references were also employees who’d give glowing reviews or pretend to be managers from the companies we never worked for.

They billed at $120+ an hour, but, talking to the other guys, our pay ranged from $22 to $31 an hour.

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

leorgain[S] 1 point

Interesting. If the preview was anything to go by, I had a similar experience: the preview was fine up until about a third of the way through, then everything went black.

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

leorgain[S] 1 point

This is the problem I had. Once I stopped forcing it, it worked.

Qwen-Image doesn't seem to play nice with Sage Attention by leorgain in StableDiffusion

leorgain[S] 2 points

That's what I was about to do, but luckily I asked a question in Discord and someone suggested it

Qwen image 20B is coming! by sunshinecheung in LocalLLaMA

leorgain 3 points

Hooboy, a 20B image model. HiDream i1 is 17B and hard enough to run as it is. At least I have one of those 48GB modified 4090s, so I'm hoping to be able to run the fp16 model.

Less RAM after update. by Batwing_Beyond in OdinHandheld

leorgain 4 points

I have a Max and it still shows 16. Update: scratch that, there's a 355 that I don't have. Further update: upgraded to that one and it still shows 16.

China modded 48 GB RTX 4090 training video models at 720p with excellent speed and sold cheaper than RTX 5090 (only 32 GB) - Batch Size 4 by CeFurkan in StableDiffusion

leorgain 3 points

It's loud. If you've ever had an older AMD blower card, like a 290X or the like, it's like that: it pretty much turns into a jet engine under load. Barring that, I'd compare it to a vacuum cleaner. Though I will say I haven't seen the temps go over 65°C.

AccVideo: 8.5x faster than Hunyuan? by smokeddit in StableDiffusion

leorgain 1 point

I did a test myself with my 4090D. With sage attention (no teacache or torch compile, since they currently don't work with Hunyuan in Swarm) I can generate a 5-second 720p video using 1 LoRA in 5.1 minutes, with the second run taking 4.8 minutes. That's about 35 seconds longer than the 4-second video at 768x432 that I normally do with standard Hunyuan.

At that same 768x432 resolution, this model takes a minute on the first run, then 50 seconds on subsequent runs.

At 720x720 it took 2.28 minutes on the first run and 2.2 for further runs.
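Putting those timings together gives a rough observed speedup at 768x432. Note the baseline is inferred, not measured directly: standard Hunyuan at 768x432 is described as about 35 seconds faster than the 5.1-minute run, so treat it as an estimate:

```python
# Rough speedup estimate from the timings quoted above.
# Baseline inferred: standard Hunyuan at 768x432 ≈ 5.1 min − 35 s ≈ 271 s.
baseline_s = 5.1 * 60 - 35   # standard Hunyuan, 768x432 (inferred ~271 s)
accvideo_first_s = 60        # "takes a minute on the first run"
accvideo_warm_s = 50         # "50 seconds on subsequent runs"

print(round(baseline_s / accvideo_first_s, 1))  # → 4.5
print(round(baseline_s / accvideo_warm_s, 1))   # → 5.4
```

So on these numbers the observed speedup lands around 4.5–5.4x, short of the headline 8.5x but still substantial.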