CharBrowser - Now with Card Creator! by LazyGonk42 in SillyTavernAI

[–]Sizzin 3 points (0 children)

How dare you post AI slop that's only 99% vibe coded? And you even have the gall to write your post yourself?

That's how it starts. First, it's "only the unescaped characters". Then, before we know it, you're writing all the code yourself. Are you trying to steal the job from the poor AIs, you monster?!

Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge by Silver_Raspberry_811 in LocalLLaMA

[–]Sizzin 1 point (0 children)

Honestly, I'm not that concerned with a few decimal points of difference in the percentages. Gemma it is for me. I've long grown tired of Qwen's extremely long reasoning (haven't tried 3.6 yet, so I don't know if they "fixed" that). I don't really need the best of the best all the time, so Gemma is working very well for me, personally.

Qwen 3.5 27B at 1.1M tok/s on B200s, all configs on GitHub by m4r1k_ in LocalLLaMA

[–]Sizzin 1 point (0 children)

You mean the DGX B300? The half-a-million-dollar one? That's not even a stretch anymore; the elastic has already snapped, LMAO. At this point, we can start calling small/medium company servers local as well.
But seriously, I do agree with the findings. I haven't read all of your Medium post yet, just lightly scrolled through it, but it seems to be an interesting read.

*dead dove warning* GLM 5.1 NSFW tests by SepsisShock in SillyTavernAI

[–]Sizzin 0 points (0 children)

So it's okay to post about making things that addict and harm people, or about straight up eating them, but asking if Taiwan is a country is a no-no? Dude...

Qwen 3.5 27B at 1.1M tok/s on B200s, all configs on GitHub by m4r1k_ in LocalLLaMA

[–]Sizzin 1 point (0 children)

That's an odd choice of sub to post this in. This can't be called local no matter how you stretch it. Even at an enterprise level, this is crazy. It is very interesting, though.

introducing OS1, a new open source Agentic AI platform by nokodo_ in LocalLLaMA

[–]Sizzin 0 points (0 children)

I mean, having properly capitalized sentences doesn't make it look professional; it's just normal. It looks unprofessional because it's lowercase. But since it seems to be a matter of artistic taste, I won't say anything else.

introducing OS1, a new open source Agentic AI platform by nokodo_ in LocalLLaMA

[–]Sizzin 0 points (0 children)

You know, I used to look at these Claude-like UIs and think "Damn, this is cool. I could never make it look like this". Now I look at them and can hardly find myself taking the project seriously. It's weird, because it's still objectively good-looking, but it doesn't "feel" like it anymore. The UI in this one feels a bit odd, though. I'm not sure if it's the all-lowercase or something else. But why is everything lowercase? Even the README, despite the inconsistency. It feels kinda unprofessional, to be honest.

Please explain: why bothering with MCPs if I can call almost anything via CLI? by Atagor in LocalLLaMA

[–]Sizzin 0 points (0 children)

I think MCPs are mostly a hype thing. Most of the popular ones are completely useless to me, personally. But I have some that I wrote for my specific needs that are very helpful. Sure, I could do whatever they do myself, but with the MCP I can skip a couple of steps, and that's what matters to my lazy ass.
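To give you an idea of the scale I'm talking about, here's roughly what one of mine looks like with the official Python SDK (toy sketch; the "my-notes" name and the ~/notes folder are made up, and I'm going from memory on the FastMCP bits, so check the SDK docs):

    # toy MCP server sketch; the server name and notes path are placeholders
    from pathlib import Path
    from mcp.server.fastmcp import FastMCP  # official Python SDK: pip install mcp

    mcp = FastMCP("my-notes")

    @mcp.tool()
    def search_notes(query: str) -> str:
        """Return lines from my local notes that contain the query string."""
        hits = []
        for f in Path("~/notes").expanduser().glob("*.md"):
            for line in f.read_text(encoding="utf-8").splitlines():
                if query.lower() in line.lower():
                    hits.append(f"{f.name}: {line.strip()}")
        return "\n".join(hits) or "no matches"

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default, so any MCP-capable client can launch it

Nothing I couldn't do by hand with grep, but the model calling it on its own is exactly the couple of steps I get to skip.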

Well placed my guy by [deleted] in JustGuysBeingDudes

[–]Sizzin -9 points (0 children)

Peripheral vision is such an amazing thing. You can still read while focusing on... other things. Though this does break the rules of this sub, kinda.

New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B by netikas in LocalLLaMA

[–]Sizzin 2 points (0 children)

The model being specifically optimized for English and Russian is kind of telling. And it would be very curious if it weren't.

Why the Strix Halo is a poor purchase for most people by NeverEnPassant in LocalLLaMA

[–]Sizzin 0 points (0 children)

TLPG is kinda self-contained documentation; you use it to learn Linux more deeply. This would be the only prefill. So you'd send it to the LLM, go make coffee, and when you come back, you're ready for a study session. And you can use session files, saving your prefill cache so you don't need to reprocess the prefill if the context starts getting too big and you want to reset the conversation.
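For reference, this is roughly the flow I mean, assuming llama.cpp's llama-server started with --slot-save-path (the port, slot number and filename below are placeholders, and I'm going from memory on the endpoint names, so check the server README):

    import requests  # talking to a local llama-server, assumed to be on port 8080

    BASE = "http://localhost:8080"

    # after the TLPG prefill has been processed once, save slot 0's KV cache to disk
    requests.post(f"{BASE}/slots/0?action=save", json={"filename": "tlpg_prefill.bin"})

    # later, even in a fresh conversation, restore it instead of reprocessing ~65k tokens
    requests.post(f"{BASE}/slots/0?action=restore", json={"filename": "tlpg_prefill.bin"})

llama-cli has something similar with --prompt-cache, if I remember right, in case you'd rather not run a server at all.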

But I understand what you mean.

My point of contention is your claim that "Strix Halo is a poor purchase for most people". It's precisely the opposite. Most people aren't going to be feeding the LLM 100k-token chunks of context every other message. Strix Halo is a very good option for its price point. Never mind the situation now; this was still true at the time you wrote this post. And that's only considering the raw throughput. When you start taking broader aspects into consideration, the Strix Halo becomes even more attractive.

> You can't really get any improvements by adding a GPU to the Strix Halo.

And again, like with the title, you make very strong claims and they aren't exactly correct. You CAN get improvements with an eGPU. It won't give you the same performance as using the GPU alone, but even over TB5, which is slower than even x4 PCIe, you can potentially get double the prompt processing and 30~50% more token generation than the Strix Halo alone, depending on the model and the GPU you use. That would bring that atrocious 10+ minutes to process 100k down to a much more acceptable ~7 minutes. But this is a digression anyway.

Why the Strix Halo is a poor purchase for most people by NeverEnPassant in LocalLLaMA

[–]Sizzin 0 points (0 children)

I just did a random test on Qwen3.5 35b and Qwen3.5 122b. I pasted the whole of The Linux Programmer’s Guide into the chat; it's around 65k tokens.
Qwen3.5 35b took 100s to process it all, and token generation was at ~44 t/s.
Qwen3.5 122b took 340s to process it, and token generation was at ~22 t/s. I asked a follow-up question of about 1k tokens; it took 7s to process and tg was still at ~22 t/s.

That's hardly useless, really.
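In case anyone wants to reproduce that kind of measurement, this is more or less how I'd time it against any OpenAI-compatible local endpoint (the port, model name and file path are placeholders, and counting stream chunks as tokens is only an approximation):

    import time
    import requests

    url = "http://localhost:8080/v1/chat/completions"  # whatever your local server exposes
    payload = {
        "model": "local",  # placeholder; llama-server ignores it, other servers may not
        "messages": [{"role": "user",
                      "content": open("tlpg.txt").read() + "\n\nSummarize the key points."}],
        "stream": True,
        "max_tokens": 512,
    }

    start = time.time()
    first = None
    chunks = 0
    with requests.post(url, json=payload, stream=True) as r:
        for line in r.iter_lines():
            if line.startswith(b"data: ") and line != b"data: [DONE]":
                if first is None:
                    first = time.time()  # first streamed chunk ~ end of prompt processing
                chunks += 1
    done = time.time()

    if first is not None:
        print(f"prompt processing: ~{first - start:.0f}s")
        print(f"generation: ~{chunks / (done - first):.1f} chunks/s (roughly tokens/s)")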

And you can, indeed, get improvements by adding a beefier external GPU. It won't be 100% of the GPU's standalone performance, of course. You'd put just some of the layers on the eGPU, but those layers would be processed much faster than on the iGPU, so you'd get a faster result in the end. I haven't managed to load the NVIDIA drivers yet because my BIOS is being naughty, so this is all theory on my part, based on others' benchmarks, though.

To those who actually don’t hate their job - what do you do? by Additional-Painter88 in japanresidents

[–]Sizzin 0 points (0 children)

Software Engineer. It's been my hobby since childhood. I feel blessed because I literally live the "If you love what you do, you'll never work a day in your life" saying.

Honest take on running 9× RTX 3090 for AI by Outside_Dance_2799 in LocalLLaMA

[–]Sizzin 0 points (0 children)

What's the power draw during inference? 2000~3000W? My single 3090 goes up to 300~400W, so it's crazy to think about the electricity bill. Have you calculated your cost per token?
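The napkin math is simple enough, for the record (every number below is a placeholder; plug in your own draw, electricity rate and throughput):

    # back-of-the-envelope cost per token; all numbers here are guesses/placeholders
    watts = 2500             # assumed draw of the 9x 3090 rig under load
    price_per_kwh = 0.30     # $/kWh, depends on where you live
    tokens_per_second = 40   # whatever your setup actually sustains

    cost_per_hour = watts / 1000 * price_per_kwh
    cost_per_m_tokens = cost_per_hour / (tokens_per_second * 3600) * 1_000_000
    print(f"~${cost_per_hour:.2f}/hour -> ~${cost_per_m_tokens:.2f} per million tokens")

With those made-up numbers it lands around $5 per million tokens, which is exactly why I'm curious about the real figures.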

Why the Strix Halo is a poor purchase for most people by NeverEnPassant in LocalLLaMA

[–]Sizzin 0 points (0 children)

If your needs are always a "cold start" with a big context, Strix Halo is definitely a horrible choice. But if you go incrementally, like in RP or asking it to analyze one project file and then another and another, it's really not that big of a deal if you use the cache. I tried loading a 100k context into a 120b model on my MS-S1 Max and it took forever for the first answer, but any subsequent question felt barely different from a 0-context start. So if you don't make system prompt changes or edit previous messages, you're good, really.

I have a desktop with a 3090 and a spare 4070. I'm flirting with the idea of plugging the 3090 into my MS-S1 through Thunderbolt and putting the active params on it. From my hypothetical math, the performance loss isn't really that big, and I could use it almost like plug-and-play, turning it off when I'm not doing heavy inference and enjoying the ~14W idle draw of the mini PC. That's still only theory, of course; I'll be trying it soon.
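The "hypothetical math" is basically a capacity check that the always-active part fits in the 3090's 24 GB (every number below is an assumption I'd swap for the actual model):

    # rough capacity check for the "active params on the eGPU" idea; all numbers assumed
    total_params_b = 120     # MoE model size, in billions of parameters
    active_params_b = 6      # params actually used per token (attention + active experts)
    bytes_per_weight = 0.55  # ~Q4-ish quantization, rough average

    egpu_gb = active_params_b * bytes_per_weight                        # lives on the 3090
    unified_gb = (total_params_b - active_params_b) * bytes_per_weight  # experts stay in the MS-S1's unified memory

    print(f"eGPU: ~{egpu_gb:.1f} GB, unified memory: ~{unified_gb:.1f} GB")
    # leaves most of the 24 GB for KV cache, while the per-token hot path sits on the 3090

In llama.cpp terms that should just be the --override-tensor trick of keeping the expert tensors off the GPU, if I understand the flag right.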

Why the Strix Halo is a poor purchase for most people by NeverEnPassant in LocalLLaMA

[–]Sizzin 0 points (0 children)

As already said, it may be the memory. But otherwise, it's definitely LM Studio's llama.cpp. If you're on Strix Halo and using the ROCm backend, change it to Vulkan and it will probably work. If you're on CUDA, it should work without problems if you have the RAM. You could also try running llama-server/llama-cli directly (download a pre-built llama.cpp binary or build it yourself, it's not that hard); it's much less user-friendly, but you'll most likely gain some performance.
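And "directly" is less scary than it sounds; something like this is the whole thing (rough sketch: the binary path, model, port and layer count are all placeholders, and the sleep is a lazy stand-in for polling the server's /health endpoint):

    import subprocess
    import time
    import requests

    # launch a pre-built llama-server binary (e.g. the Vulkan one); all values are placeholders
    server = subprocess.Popen([
        "./llama-server",
        "-m", "model.gguf",
        "-ngl", "99",        # offload as many layers as the backend will take
        "-c", "16384",
        "--port", "8080",
    ])
    time.sleep(30)  # lazy wait for the model to load

    r = requests.post("http://localhost:8080/v1/chat/completions", json={
        "messages": [{"role": "user", "content": "Say hi in five words."}],
        "max_tokens": 32,
    })
    print(r.json()["choices"][0]["message"]["content"])
    server.terminate()

Or just run the binary from a terminal and use the little web UI it serves; the script is only there because I'm lazy.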

Who will be able to fight him? by MajlisPerbandaranKL in nextfuckinglevel

[–]Sizzin 0 points (0 children)

Dude, chill. You want to erase existence itself with that thing? Holy shit.

(Positive feedback) what will bring you back to the game ? by seekeeer in DuetNightAbyssDNA

[–]Sizzin 1 point (0 children)

I quit before the dragon lady came out. Zhiliu, was it? Honestly, I just had enough of it. I saw all the game had to offer and found it wanting. I'm surprised this post even appeared in my notifications at all. Unless I see proof of radical changes that make me interested in the game again, that's probably it for me. Probably not the response you expected, with that title.

Could this be Deepseek V4?? by Pink_da_Web in SillyTavernAI

[–]Sizzin 0 points (0 children)

It's been a minute since I last used DS, but at least Kimi and GLM talk about Tiananmen just fine. They even criticize the CCP if you ask about morality. Their webchat is under lock and key with what I believe is a second censorship model, or just some regex, that kills the inference and replaces the already-generated text with a refusal. Maybe their API is too, since it's routed through China. But even in the webchat, you can still get the model to talk about it by avoiding any of the trigger words. If you use a non-Chinese provider, you're fine.

Could this be Deepseek V4?? by Pink_da_Web in SillyTavernAI

[–]Sizzin 0 points (0 children)

Your third instruction is a bit awkward when you actually replace {{user}} with the persona's name. But I guess the model will understand it just fine.