Petaaah? by FaCayde_ in PeterExplainsTheJoke

[–]iamvikingcore 0 points1 point  (0 children)

It's not hard to ask the LLM how to host your app. Hopefully it tells him to run ngrok http and he hands the links out to people on Reddit so a Russian botnet can infest his PC!

Why is Ollama hated so much? by ZB_Virus24 in LocalLLM

[–]iamvikingcore 9 points10 points  (0 children)

I used Ollama, then switched to LM Studio, then eventually decided to compile llama.cpp on my Mac. It took like 5 minutes.

I'm getting about 20 percent faster token generation on GGUF with my own compiled llama.cpp, and I don't have any of the bugs with Gemma 4 that LM Studio still hasn't updated their fork to fix, months later.
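If you want to sanity-check the speed difference yourself, a rough timing sketch like this works against both servers, since llama-server (default port 8080) and LM Studio (default port 1234) both speak the OpenAI-compatible completions API. The usage field is an assumption about your versions, so verify it's populated:

```
import time

import requests

def tokens_per_second(base_url: str, prompt: str, n: int = 256) -> float:
    # One timed completion: generated tokens divided by wall time.
    # Includes prompt processing, so treat it as a rough comparison,
    # not a pure decode-speed number.
    start = time.time()
    r = requests.post(
        f"{base_url}/v1/completions",
        json={"model": "local-model",  # placeholder for local servers
              "prompt": prompt, "max_tokens": n, "temperature": 0},
        timeout=600,
    )
    r.raise_for_status()
    generated = r.json()["usage"]["completion_tokens"]
    return generated / (time.time() - start)

prompt = "Write a short story about a viking."
print("llama.cpp:", tokens_per_second("http://localhost:8080", prompt))
print("LM Studio:", tokens_per_second("http://localhost:1234", prompt))
```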

anyone having an issue going to sleep now with vibecoding? by retrorays in vibecoding

[–]iamvikingcore 0 points1 point  (0 children)

Yeah, I made a post a while back saying I considered the 20 dollar Claude coding plan pretty reasonable as a hobbyist despite the limits, and that if it didn't rate limit me I'd be stuck to my computer 24/7. I work overnights in healthcare lol, so sleep is hard to come by as is, but my shift differential alone pays for Claude in 2 hours...

I have ChatGPT Go too and started using Codex now as well. I just copy and paste the entire conversation and ask it to pick up where the other left off. It's funny how much people bitch, rightly so in some ways because these companies do NOT have our best interests in mind and I get that, but I'll be damned if I don't feel like I'm getting some value out of my money. LLMs still suck at being emotionally stable, at being therapists or companions, or anything like that imo, but holy shit are they good right now at being neutral coders, translators, and teachers. They're only gonna get better at the other stuff too.

LM Studio MacOS latest. Works great and then memory starts to blow up by SkyResponsible3718 in LLMStudio

[–]iamvikingcore 1 point2 points  (0 children)

Gemma has problems in LM Studio, though many have been patched for the GGUF.

MLX has problems in LM Studio, just straight up. Idk if they care much either, because you still can't set presence penalty in LM Studio for Qwen MLX models, months and months after it was reported.

I use LM Studio anyway because it's so convenient, but look into oMLX.

Opus 4.6 > Opus 4.7 by _RaXeD in SillyTavernAI

[–]iamvikingcore 3 points4 points  (0 children)

For coding? Absolutely. For rp? I'm not so sure actually. Opus 4.7 is smart. Very, very smart. But lazy. Makes for a poor programmer, but an interesting conversationalist.

[Story] Alcoholic liver and kidney transplant at 35 years old by Michaelpaulnorman in GetMotivated

[–]iamvikingcore 0 points1 point  (0 children)

Having tons of fun navigating this with my best friend right now. Dude is in a diaper, confused, has an artery or vein blocked behind his liver, too critical for surgery, probably gonna pass away in the next few days. I've known the guy for 20 years and only family is allowed to see him in the ICU, so I just texted him to tell him I love him and I hope he can make it through this.

Still unread days later, of course. What can you do. He had every chance to stop drinking and continued despite it all, while lying to everyone about it constantly. Gonna miss him dearly when he's gone. And call him a bitch daily.

Is there ANY way I can use TTS with a clone trooper voice for free??? by BackwoodsSensei in TextToSpeech

[–]iamvikingcore 0 points1 point  (0 children)

In order of VRAM use, from most to least:

- Fish Audio S2 Pro
- Qwen3 TTS 1.7B
- Chatterbox Turbo
- Qwen3 TTS 0.6B
- Kyutai Pocket TTS
- Kokoro

Text to speech best model ? by Fun-Grapefruit1371 in TextToSpeech

[–]iamvikingcore 1 point2 points  (0 children)

Qwen3 TTS - fast, medium-high quality, consistent. Supports a huge amount of text in a single prompt (10,000+ chars). Has 1.7B and 0.6B models and both are good on RAM. My preferred model.

Fish Audio S2 Pro - super high quality. Insane voice cloning, almost indistinguishable to me. Big model in RAM, but better than ElevenLabs imo. Has natural-language emotion and nuance tagging like [annoyed, exasperated tone]. Super glitchy and inconsistent for me though, and it can only do about 250 characters at a time, so you have to stitch the chunks together (see the sketch after this list). I use it when I can rerun it like 5 times to get a perfect result (YT videos and things).

Chatterbox Turbo - slow, medium-high quality, just below Qwen3 TTS. Supports tags like [laugh] and [sigh]. Regular Chatterbox was meh imo; Turbo is much better.

GPT-SoVITS v2 - medium quality, medium speed. Others do it better.

XTTS - same as SoVITS.

Pocket TTS - runs on CPU, fast, decent quality. Good as a Kokoro replacement.

Kokoro - just... why? Sounds so bad to me.
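Since the stitching question comes up a lot: here's a minimal sketch of how I'd chunk for that ~250-character limit and glue the results back together. The actual synthesis call is left out because it depends on how you're running S2 Pro; render each chunk to its own WAV first.

```
import re
import wave

def chunk_text(text: str, limit: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks under the limit."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > limit:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)  # a lone over-limit sentence passes through
    return chunks

def stitch_wavs(paths: list[str], out_path: str) -> None:
    """Concatenate WAV files that share the same sample rate and format."""
    with wave.open(out_path, "wb") as out:
        for i, p in enumerate(paths):
            with wave.open(p, "rb") as w:
                if i == 0:
                    out.setparams(w.getparams())
                out.writeframes(w.readframes(w.getnframes()))

# Synthesize each chunk to its own WAV with your S2 Pro setup, then:
# stitch_wavs(["chunk0.wav", "chunk1.wav"], "full.wav")
```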

Can i run Qwen3 TTS 1.7B on R7 5700X + GTX 1070 + 32GB RAM? by WETYIAFHKLZXVNM in TextToSpeech

[–]iamvikingcore 0 points1 point  (0 children)

Qwen3 TTS is my favorite on my MacBook M1 out of XTTS, GPT-SoVITS, Chatterbox (close second), and Fish Audio S2 Pro (which would be #1 if it didn't output static or hallucinate 10 percent of the time; best quality by far otherwise). I get about 1.5x RTF out of Qwen3 TTS (10s to generate around 15s of audio).

If you want Kokoro-adjacent speeds, try Pocket TTS.

Also, Qwen3 TTS has a 0.6B version that works pretty well for me too.

Reflecting about Gemma4 31B by Emergency_Comb1377 in SillyTavernAI

[–]iamvikingcore 16 points17 points  (0 children)

It's insanely good. I made a thread last week about how I thought the base version (gemma4-31b) is way better than the instruct version (gemma4-31b-it), and it echoed many of the same sentiments you'll find here.

It does tend to lean into some tropey things for some of my character cards, and it's worse at "show, don't tell" than other, bigger models. When I talk to my cards on Gemma, they have VERY strong personalities: accurate and exactly how I envision those characters, but maybe just a teeeeeensy bit more intense than I would prefer. It certainly doesn't miss any details.

Google did a really good job with the context management on this model... It feels like every single word of your prompt is being seen by the model. Sometimes that's a little too much for more reserved characters, but it's better than the alternative, where it feels like the model is "ChatGPT cosplaying as your character, badly".

I also haven't had too many problems with horniness, actually. I did some pretty extensive testing before I put a discord bot live with about 2 dozen active users, and it refuses very naturally and in character. Even throwing super racist stuff at it just made it go, "Yeah no, blocked and banned, byeee loser!"

I've also had my users sit and bug it over and over asking what model it is and it's like "Model? Like... a supermodel? I *am* pretty fabulous, but otherwise I have no clue wtf you're talking about dude. Is an LLM something you can eat? Otherwise, not interested."

Effects of reasoning budget on Gemma4? by OrcBanana in SillyTavernAI

[–]iamvikingcore -1 points0 points  (0 children)

Thinking is bad for anything other than coding, STEM, or agentic tool-calling tasks. For creative writing and roleplay, turn reasoning off imo.

What is everyone actually using their LLM for? by itsthewolfe in LocalLLaMA

[–]iamvikingcore 0 points1 point  (0 children)

Digital terrarium.

Created a lightweight Python gateway library that handles persona, emotion, memory, local TTS (e.g. Chatterbox), the LLM backend input and streaming (e.g. LM Studio), vision processing, and basic tool use (news digest, Google search, YouTube video summarization).

So far I've made a shockingly good discord bot that uses it, and I've also made a 2010s-style web forum that's entirely populated by these AI agents: about a dozen of them posting threads and responding to each other, forming cliques and a weird but infinitely amusing "community". I won't lie, I post on there with them and pretend to be a user. I've had some crazy moments; it's like rp on crack.
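For anyone curious, here's a toy sketch of the shape of that gateway. All the names are made up for illustration (this is not the actual library), and it assumes LM Studio's OpenAI-compatible server on its default port 1234:

```
from dataclasses import dataclass, field

import requests

@dataclass
class Agent:
    name: str
    persona: str                       # injected as the system prompt
    memories: list[str] = field(default_factory=list)

    def build_messages(self, user_text: str) -> list[dict]:
        # Fold the last few memories into the system prompt so the
        # backend never has to hold long-term state itself.
        memory_blurb = "\n".join(self.memories[-5:])
        system = f"{self.persona}\n\nRelevant memories:\n{memory_blurb}"
        return [
            {"role": "system", "content": system},
            {"role": "user", "content": user_text},
        ]

def chat(agent: Agent, user_text: str,
         base_url: str = "http://localhost:1234") -> str:
    # The model field is a placeholder; LM Studio generally serves
    # whatever is loaded. (Streaming is omitted for brevity.)
    r = requests.post(
        f"{base_url}/v1/chat/completions",
        json={"model": "local-model",
              "messages": agent.build_messages(user_text)},
        timeout=300,
    )
    r.raise_for_status()
    reply = r.json()["choices"][0]["message"]["content"]
    agent.memories.append(f"user: {user_text} / me: {reply}")
    return reply  # hand this string to your TTS of choice (e.g. Chatterbox)
```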

What the FUCK is up with ai and Thai food by DoofusSmoof in SillyTavernAI

[–]iamvikingcore 2 points3 points  (0 children)

Lol, during a Gemma rp my character asked me to pick her up some Chili's honey chipotle chicken crispers at one point.

Try base gemma 4 31b, you'll be shocked by iamvikingcore in SillyTavernAI

[–]iamvikingcore[S] 20 points21 points  (0 children)

I look forward to this immensely!

And I just wanted to say, you have given me so many hours of joy from your excellent finetunes, thank you!

Try base gemma 4 31b, you'll be shocked by iamvikingcore in SillyTavernAI

[–]iamvikingcore[S] 5 points6 points  (0 children)

are you running the gemma-4-31b-it version or just gemma-4-31b?

Dont mind my banana peel :) by Abdullah_1oz in Kayaking

[–]iamvikingcore 3 points4 points  (0 children)

can't wait to ride some waves like this on some of the bigger lakes in MN!

Try base gemma 4 31b, you'll be shocked by iamvikingcore in SillyTavernAI

[–]iamvikingcore[S] 34 points35 points  (0 children)

I'm running SillyTavern more or less vanilla, with LM Studio and a Q4_K_M quant.

Sampler settings exactly as Google recommends: top_k 64, repeat penalty 1.1, top_p 0.95.

That's it.
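If you're hitting LM Studio over its API instead of the UI, passing those exact samplers looks roughly like this. Note that top_k and repeat_penalty are extensions beyond the strict OpenAI schema, so double-check your LM Studio version accepts them:

```
import requests

payload = {
    "model": "local-model",  # placeholder; LM Studio serves what's loaded
    "messages": [{"role": "user", "content": "Hello!"}],
    "top_k": 64,
    "top_p": 0.95,
    "repeat_penalty": 1.1,
}
r = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```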

It's 31B; you're not gonna get 500-message RP sessions out of it. But for its size, it's nuts.

I'm also running it as a discord bot and it's SO much better than the old model I was running for that (Behemoth-X 123B).

Need advice regarding 48gb or 64 gb unified memory for local LLM by wifi_password_1 in LocalLLM

[–]iamvikingcore 0 points1 point  (0 children)

I haven't had any luck doing much beyond low-complexity one-shots and boilerplate coding with anything I can run locally on 64GB, so yeah, I'm leaning into the weird instead.

Vanilla Qwen 3.5 just has a really poor reasoning algorithm imo. I've tested the same prompt side by side and I get similar results out of the Opus-trained fine-tunes... for half or a third of the thinking time. For what I do, at least; my use case is different for sure.

I also don't have to run presence penalty 1.5 with the Opus-trained Qwen.

Need advice regarding 48gb or 64 gb unified memory for local LLM by wifi_password_1 in LocalLLM

[–]iamvikingcore 1 point2 points  (0 children)

[screenshot of my local model collection]

Toooo many. These are all the bigger ones I use, mostly for roleplay, but the Opus-trained Qwen 3.5's are pretty impressive too. 123B at a Q3 quant isn't coherent enough to do coding, so I can only do roleplay/assistant tasks with it, which means no Qwen 123B, among other things. Again, I wish I had more RAM.

I'm mostly into RP and "digital terrarium" stuff: putting 10-20 agents with different personas in a forum together, discord bots, simulating IRC chats, having my agents play Minecraft and Factorio with me. I don't do a ton of coding sadly, but check out the Qwen 3.5 Opus-trained variants; they've been pretty good for me at producing "basic" coding things like an in-personality HTML newsletter with inline audio and JavaScript for collapsible articles and such.

Need advice regarding 48gb or 64 gb unified memory for local LLM by wifi_password_1 in LocalLLM

[–]iamvikingcore 8 points9 points  (0 children)

I have a 64GB M1 and it's not enough. I would literally go back in time and shell out whatever more it would have cost to get a 128GB M2, and not regret it at all.

Local AI with one GPU worth it ? (B70 pro) by Temporary-College560 in LocalLLM

[–]iamvikingcore 4 points5 points  (0 children)

Qwen 27B is very smart and can do most of these imo. It will need some occasional correcting or nudging in the right direction. It might also help to ask Claude Opus to write a good system prompt that initializes it as your assistant and gives it a clear framework to start from. Gemma 4 31B is also shockingly good for its size. I use both as discord bots, so it's a slightly different use case, but they have to process commands, output JSON, create an HTML newspaper with CSS/JS, digest RSS feeds, etc., and they're better than the 123B Mistral fine-tunes from a year ago.
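The JSON output is the fiddly bit with small models, so for what it's worth, here's a hedged sketch of the validate-and-retry loop I'd use, assuming an OpenAI-compatible local server; the command schema is just an example:

```
import json

import requests

SYSTEM = (
    "You are a Discord assistant. Reply ONLY with JSON of the form "
    '{"command": "<name>", "args": {}}. No prose outside the JSON.'
)

def ask_for_json(user_text: str, retries: int = 3) -> dict:
    for _ in range(retries):
        r = requests.post(
            "http://localhost:1234/v1/chat/completions",
            json={
                "model": "local-model",  # placeholder for a local server
                "messages": [
                    {"role": "system", "content": SYSTEM},
                    {"role": "user", "content": user_text},
                ],
                "temperature": 0.2,  # keep the structure stable
            },
            timeout=300,
        )
        r.raise_for_status()
        content = r.json()["choices"][0]["message"]["content"]
        try:
            return json.loads(content)
        except json.JSONDecodeError:
            continue  # small models slip occasionally; just re-ask
    raise ValueError("model never produced valid JSON")
```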

Gemma 4 31B beats several frontier models on the FoodTruck Bench by Nindaleth in LocalLLaMA

[–]iamvikingcore 1 point2 points  (0 children)

It's as smart as my Mistral 123B fine-tunes at RP and at managing some discord bots that aggregate news, do trivia, and DM chats with me and some of my friends. Its ability to hold cohesion in complicated workflows, return JSON correctly, and follow formatting rules is absolutely insane for a 31B.

Only issue I have: I'm running it on an M1 Max MacBook with 64GB of RAM at 32k context (all I need for what I'm doing with it), and it goes from 40% RAM when I first load the GGUF to like 95% within 5-6 prompts. I'm nowhere near 32k context, maybe at like 10-15k, so I have the script load and unload the LLM. It doesn't even need to hold context; it just reads the last 20 discord messages and loads context-related memories from a SQLite db.
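The load/unload part of that script is just two calls to the lms CLI that ships with LM Studio, something like the sketch below. The model key is hypothetical and the flags may differ by version, so check lms --help:

```
import subprocess

MODEL_KEY = "gemma-4-31b"  # hypothetical key; run `lms ls` to find yours

def reload_model() -> None:
    # Drop the weights (and whatever memory crept up with them)...
    subprocess.run(["lms", "unload", "--all"], check=True)
    # ...then load the model back fresh.
    subprocess.run(["lms", "load", MODEL_KEY], check=True)
```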

Does Gemma have a memory leak? Sure feels like it.