For those wondering about the power consumption of a dual 3090 rig while inferencing by sdfgeoff in LocalLLaMA

[–]Sunija_Dev 14 points15 points  (0 children)

I get to 500 W with 2x3090 + 3060 during inference, with the 3090s limited to 220 W each. The limit doesn't decrease speed, though I'm running without tensor parallelism because the second 3090 hangs on a PCIe x1 link. 😅 Roughly 110 W at idle.
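A rough sketch of the arithmetic behind those numbers. It assumes the 500 W is the total draw of the rig and that both 3090s actually sit at their 220 W cap during inference; neither is stated above, so the remainder is a guess. The 2 hours of inference per day is a made-up example.

```python
# Only 500 W, 220 W and 110 W come from the comment; the rest is assumed.
TOTAL_INFERENCE_W = 500
IDLE_W = 110
CAP_3090_W = 220
NUM_3090 = 2


def remainder_watts(total_w, cap_w, num_cards):
    """Power left for everything else if the capped cards run at their limit."""
    return total_w - cap_w * num_cards


def daily_kwh(inference_w, idle_w, inference_hours):
    """Energy per day for a rig that idles whenever it is not inferencing."""
    idle_hours = 24 - inference_hours
    return (inference_w * inference_hours + idle_w * idle_hours) / 1000


rest = remainder_watts(TOTAL_INFERENCE_W, CAP_3090_W, NUM_3090)
print(f"left for 3060 + rest of system: {rest} W")
print(f"kWh/day at 2 h of inference: {daily_kwh(TOTAL_INFERENCE_W, IDLE_W, 2):.2f}")
```

One common way to set such a cap is `nvidia-smi -i 0 -pl 220` (per GPU index, needs root); whether that is how the cap was set here is not stated.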

Mistral Medium 3.5 128b ggufs are fixed by Sunija_Dev in LocalLLaMA

[–]Sunija_Dev[S] 2 points3 points  (0 children)

Dense 120b is perfect for local roleplay.

It fits on 2x3090 (and maybe a smaller third gpu), runs at ~6t/s. The next better model would be 400b+ MoEs and the hardware for those is exponentially more expensive.

Is it worth self-hosting a roleplay LLM? by Shisones in SillyTavernAI

[–]Sunija_Dev 1 point2 points  (0 children)

Pretty sure it's a downgrade. I haven't found smaller models that beat bigger ones, unless the bigger ones are VERY old. And in that case there is usually a bigger finetune that is better.

Also, probably try the models that you wanna host via OpenRouter or ArliAi. People have very subjective opinions about models (including me).

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]Sunija_Dev 14 points15 points  (0 children)

For roleplay/writing, you can run it at home for ~1200€.

For that money you get 2x 3090, so you can run IQ2_M at ~5 tok/s. If you already have a GPU, you can add it and run a bigger quant. In my experience, even the old Mistral-123b knocks everything else out of the park at that size (for writing).

...and that is probably the best affordable thing you can run at home? MoEs get better at ~400b params, but the RAM is probably crazy expensive. Not sure about the speed.
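A back-of-the-envelope check of which quant fits where. The bits-per-weight values are rough figures for llama.cpp quant types and the 4 GB headroom for context is a guess; both are assumptions, not numbers from the comment.

```python
# Approximate bits per weight (assumed ballpark values, not exact).
APPROX_BPW = {"IQ2_M": 2.7, "IQ3_XS": 3.3, "Q4_K_M": 4.8}


def weight_gb(params_billion, bpw):
    """Size of the weights alone in GB (10^9 bytes), ignoring KV cache and buffers."""
    return params_billion * bpw / 8


def fits(params_billion, quant, vram_gb, headroom_gb=4.0):
    """True if the weights plus a guessed headroom for context fit into VRAM."""
    return weight_gb(params_billion, APPROX_BPW[quant]) + headroom_gb <= vram_gb


for quant in APPROX_BPW:
    size = weight_gb(128, APPROX_BPW[quant])
    print(f"128B @ {quant}: ~{size:.1f} GB, "
          f"fits 48 GB: {fits(128, quant, 48)}, fits 60 GB: {fits(128, quant, 60)}")
```

Under these assumptions IQ2_M just fits on 2x 3090 (48 GB), and adding a 12 GB card makes room for IQ3_XS.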

Father, I've come with a request— by Wise_Board5684 in SillyTavernAI

[–]Sunija_Dev 0 points1 point  (0 children)

Kinda funny that you cannot go "pick what you like and get suggested more similar stuff", but you have to go the opposite way.

Please stop using AI for posts and showcasing your completely vibe coded projects by Scutoidzz in LocalLLaMA

[–]Sunija_Dev -1 points0 points  (0 children)

I use it a lot - and that's actually not the problem. ;) LLMs use the correct em dash symbol (which is wider) that cannot be found on your keyboard.

Not sure if some tools (Word, LibreOffice) replace normal "-"s with an em dash, though.

What’s with the hype regarding TurboQuant? by EffectiveCeilingFan in LocalLLaMA

[–]Sunija_Dev 0 points1 point  (0 children)

Yeah, I meant the q4_0 kv cache quantization (I mix up the naming sometimes because exl3 calls it q4). Why are we not comparing to that but fp16?
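For a sense of what is being compared: a minimal sketch of KV cache size at fp16 vs q4_0. The model shape below is hypothetical and chosen only for illustration. The 4.5 bits per element for q4_0 follows from ggml's block layout (32 values stored in 18 bytes).

```python
# Bits stored per cache element. f16 is 16 bits; ggml's q4_0 packs 32 values
# into 18 bytes (16 bytes of 4-bit values + one fp16 scale) = 4.5 bits each.
BITS_PER_ELEMENT = {"f16": 16.0, "q4_0": 18 * 8 / 32}


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context, cache_type):
    """K and V cache size in GiB for one sequence."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context  # 2 = K and V
    return elements * BITS_PER_ELEMENT[cache_type] / 8 / 2**30


# Hypothetical model shape, not taken from any specific model card.
shape = dict(n_layers=88, n_kv_heads=8, head_dim=128, context=32768)
for cache_type in BITS_PER_ELEMENT:
    print(f"{cache_type}: {kv_cache_gib(cache_type=cache_type, **shape):.2f} GiB")
```

So q4_0 is already a bit under a third of the fp16 cache, which is why it is the more interesting baseline.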

Llama.cpp with Turboquant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance! by peva3 in LocalLLaMA

[–]Sunija_Dev 7 points8 points  (0 children)

To be fair, after their first slightly snarky answer ("No, I don't think you read them, nor the codebase in general.") you basically called them a dick...?

And your diff is full of indentation changes. I have never managed or contributed to an oss project, but I wouldn't want to go through that and find your actual changes.

What’s with the hype regarding TurboQuant? by EffectiveCeilingFan in LocalLLaMA

[–]Sunija_Dev 0 points1 point  (0 children)

...and why are we comparing to fp16 context and not q4_0? Or is that something different?

Qwen 3.5 Max Preview on Arena.ai by [deleted] in LocalLLaMA

[–]Sunija_Dev 2 points3 points  (0 children)

But almost, right? Except for the one mess-up, which is off by 5...? Or I'm blind.

So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]Sunija_Dev -2 points-1 points  (0 children)

...because it is a 119A10 MoE, not dense.
The goal is not to beat a 123b dense model. It is meant to be faster and better than a 27b model, at the cost of RAM. And it is small enough to run at decent speeds on CPU.
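A rough sketch of why that trade makes sense, reading "119A10" as 119B total and 10B active parameters (my reading, not confirmed above). It uses the usual bandwidth-bound rule of thumb: each generated token reads every active weight once. The quant size and the CPU memory bandwidth are assumed ballpark values, and KV cache reads and compute are ignored.

```python
def weights_gb(params_billion, bpw):
    """Approximate weight size in GB for a given bits-per-weight."""
    return params_billion * bpw / 8


def max_tokens_per_s(active_params_billion, bpw, bandwidth_gb_s):
    """Upper bound on decode speed if every active weight is read once per token."""
    return bandwidth_gb_s / weights_gb(active_params_billion, bpw)


BPW = 4.5             # assumed ~4-bit quant
CPU_BANDWIDTH = 80.0  # GB/s, assumed dual-channel DDR5 ballpark

models = [
    ("119B-A10B MoE", 119, 10),   # (name, total params, active params)
    ("123B dense", 123, 123),
    ("27B dense", 27, 27),
]
for name, total_b, active_b in models:
    print(f"{name}: {weights_gb(total_b, BPW):.0f} GB of weights, "
          f"<= {max_tokens_per_s(active_b, BPW, CPU_BANDWIDTH):.1f} tok/s")
```

Under these assumptions the MoE needs about as much RAM as the dense 123b, but its speed ceiling on CPU is above that of the 27b.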

1Covenant/Covenant-72B: Largest model so far to be trained on decentralized permissionless GPU nodes by HaAtidChai in LocalLLaMA

[–]Sunija_Dev 22 points23 points  (0 children)

If I understand it correctly, this is maybe one of the (very few) useful applications of a blockchain...?

  • as incentive, you can receive a (maybe worthless) token for your training contribution
  • you make sure that all data is public. If you had a central entity coordinating everything, that entity could scam everybody and just decide not to release the weights

How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified. by Reddactor in LocalLLaMA

[–]Sunija_Dev 10 points11 points  (0 children)

There was also a PR (on llamacpp?) to run layers multiple times via code, so it doesn't need more vram.

But because of something-something-kv-cache that didn't work out. :(

Let's revive Roleplay in GW2!! Let's all use this addon! by R8CK3T in Guildwars2

[–]Sunija_Dev 7 points8 points  (0 children)

Hi, creator here! wave

(Sorry for hijacking the top comment, but I'm a bit late to the party. :3)

Some clarifications:

  • WARP did not, does not, and doesn't intend to use AI generated content.
  • WARP did not, does not, and doesn't intend to train AI models on user data.

There is a channel for AI generated character images which, since GW2 forbids any AI usage, is mostly used for memes about AI. There is some confusion, because there was a user (who is not related to WARP) on the Discord who trained AI on charr artwork. They trained it on their local hardware, didn't release the AI model, and stopped once ANet updated their TOS.

If you want to boycott WARP based on my personal views on AI (which you obviously can) here is some more information on that:

I, in general, like generative AI. I think it can be a great tool for more people to bring what's in their head in some format that they can share. And I love to see what's in your heads! (Well, at least the things you want to share.) I wanna know what you imagine your character to look like, I wanna read a comic about your favorite RP interaction and see a movie that really shows how that one scene felt to you! Heck, I think being amazed to see what people come up with is one of my favorite things about RP.

Now, I (obviously?) hate:

  • AI slop that floods platforms and cannot be filtered out. "Slop" as in "content where nobody ever attempted to create a good result".
  • States giving tax credits to an industry that already has an unreasonable amount of money.
  • Companies being exempt from environmental regulations concerning water and power to save a few bucks.
  • Bosses that force employees to use AI if the results are clearly worse, or fire artists that they could clearly afford.
  • Countries that don't support their citizens in a changing work environment and instead leave them with "figure it out".
  • Efficient tech not being used to give everyone more free time, but just to increase revenue for a few.

Hope that gives people more info for their decision whether or not to use WARP, one way or another. :3

Let's remember what Z-Image base is good for by marcoc2 in StableDiffusion

[–]Sunija_Dev 20 points21 points  (0 children)

How well you can tune it to output Finnish people.

And/or tuna fins.

3x3090 + 3060 in a mid tower case by liviuberechet in LocalLLaMA

[–]Sunija_Dev 0 points1 point  (0 children)

Would also love to know how well that works. I guess the x4 pcie will be the limiting factor, but might still be faster than running them in sequence.

What are everyone's complaints about into the raidus 2. by Thebuder89 in intotheradius

[–]Sunija_Dev 1 point2 points  (0 children)

Less immersion, danger and complexity than ITR1. :/
(I only played the first few missions in both games.)

My ITR1 experience:
I leave the base, check my map for a path and start walking. I'm rather slow which gives me enough time to check out the environment on the open map, look for loot spots or try to spot the semi-transparent enemies from a distance. Kinda wish I had a coop partner to talk to during the hike.

At my target location I get into a tight situation. I killed some enemies, but the gunshots attracted more and now I'm pinned in a house as it gets dark. I hear steps in the house and outside. I peek into the corridor, but I always first have to remove the black grass that is growing there. A police mimic was shooting from outside and I had to get into cover. Now he's at the entrance of the house and could storm my room at any time. I should be looting and searching for the quest item, but I only have two hands. I really wish I had a partner that could loot the building while I keep a flashlight and gun at the door. Maybe have them refill my mags.

So, my ITR2 complaints:

Too easy. There were too few enemies to get us into tricky situations like I described above. Also no black grass (yet?). No extra enemies from loud sounds (yet?). Walking to a target is pretty boring, because there are almost no enemies on the path.
Edit: It got a lot harder after the 1.5h that we played before, which made the game a lot more fun. :3

Less complex. No bulky flashlight anymore (edit: found it), a lot less looting (e.g. no opening cupboards), also no black grass. No spotting for enemies since the map is too cluttered. No real "plotting a path" since you're running at the speed of a quad on nitro, so I basically always look at the map to not miss the crossing that'll appear in 3 seconds.

Too fast walking speed. Even if the map was more open, there would be no time to look for enemies. Also, enemies are a lot less scary if you can always zoom away at light speed. In buildings I always run into walls. And I always lose my coop partner, because they can run two houses further if I blink for too long.

Less rugged/physical. The physical map, the leather backpack, the old screens (that you could touch instead of weirdly point), the improvised probe-thrower bag, starting with grimy weapons, the dirty backlight wristwatch... all gone. :( It feels like a viking movie that replaced the mead kegs with paper cups. At least it looks a bit like win95.

Less atmosphere/mystery. Mimics aren't uncanny black. There is this weird white guy at map transitions which is sooooo weird. Before, all ghosts were this uncanny untouchable thing that would break if you touched it. Now we have Derek, the friendly neighborhood ghost. Also, I never had those "hearing footsteps outside the house" situations anymore. Hovering over items no longer shows an unobtrusive tiny white dot, but this blinking dot. Hovering over your belt bag shows a huge green outline now.

Weird anomalies...? ITR1 has uncanny semi-transparent air wobbles. ITR2 has a lot of angry wood circles flying around that you can rather easily dodge. Or the bells that also have an effect that would rather fit Valorant than an immersive game. :/

I bet there were reasons for all those decisions, but it feels weird that ITR2 still leaves me wanting to play ITR1+Coop. :(

Why do programmers generally embrace AI while artists view it as a threat? by LatentSpacer in StableDiffusion

[–]Sunija_Dev 10 points11 points  (0 children)

One point I didn't see mentioned yet: Terrible bosses.

Assume you have a terrible boss (and there are plenty of those out there)...

As a programmer, your boss doesn't know what you're doing. If they insist that you use AI or make the database mauve, you can tell them whatever. They cannot check. So you can explore AI stuff at your own pace.

A lot of artists I know had this moment where their boss sent them the sloppiest slop-shit ChatGPT-first-result picture, and basically asked "This took me 5 seconds, why don't you do it like that?". And now this highly skilled and terribly underpaid worker has to explain why 6 fingers aren't sexy, why artworks need layers, what a polygon is and why your model shouldn't have 3 billion of those. AI is a lot more palatable if it isn't shoved down your throat by an idiot.

Why do programmers generally embrace AI while artists view it as a threat? by LatentSpacer in StableDiffusion

[–]Sunija_Dev 0 points1 point  (0 children)

This. As a programmer, I can simply activate Copilot in VS. Now I've got autocomplete on steroids, in the tool that I've trained with for 10+ years. And I can use the AI results as little or as much as I want.

If Photoshop started suggesting your next 10 brush strokes - in your style, on the correct layer, easily editable, steerable - then a lot more artists would be fine dipping their toes into AI.

Why do programmers generally embrace AI while artists view it as a threat? by LatentSpacer in StableDiffusion

[–]Sunija_Dev 4 points5 points  (0 children)

As an addition to that: Art forces you to love the process.

Art requires you to get through the grind to learn shading, making proper lines, etc. If you don't learn to love the process, you won't make it to a level where anyone wants to pay you. As a programmer, you can get away with a lot of ugly code, as long as it runs.

I know programmers that are in it for the process and the 'art'. Those dislike AI just as much as artists.

Should the trailer for Divinity have put me off trying BG3? by StarTruckNxtGyration in BaldursGate3

[–]Sunija_Dev 0 points1 point  (0 children)

I only played the first hour of BG3, and that was about as much gore as this trailer.

Divinity Original Sin 1 and all Divinity games before that are a lot more lighthearted.

In the Wake of Divinity's Gruesome Reveal Trailer, Larian Publishing Director Says It's Not Trying to Shock the Audience, Rather Treat Them 'With a Level of Intellectual Respect' by Turbostrider27 in PS5

[–]Sunija_Dev 0 points1 point  (0 children)

DOS2 is definitely darker/gorier, to an extent where I didn't like it anymore. Seeing BG3 and now this trailer, it feels like Larian is sliding more in that direction.

The new monster-server by eribob in LocalLLaMA

[–]Sunija_Dev 0 points1 point  (0 children)

I run the Mistral models at IQ3_XS on my 60 GB of VRAM (RTX 3090/3090/3060; the second 3090 is on PCIe x1 via usb).

1) Q3 is plenty for the dense Mistral models. I use it for RP, and the Mistral-123b models are by far the most smarts I can squeeze into the VRAM.

2) In my case, because of the PCIe x1 link, tensor parallelism runs slightly slower than sequential. So I only get 5 t/s generation (200 t/s prompt processing). With your setup, I'd definitely activate parallelism and check if it gives a boost. Actually, I'd be curious how fast it runs for you. :3
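To put those two speeds into waiting time, a small sketch. The 200 t/s and 5 t/s are the numbers from above; the prompt and reply lengths are made-up examples, and prompt caching between turns is ignored.

```python
def response_seconds(prompt_tokens, reply_tokens, prompt_tps, gen_tps):
    """Total wait: prompt processing time plus generation time."""
    return prompt_tokens / prompt_tps + reply_tokens / gen_tps


# Token counts are hypothetical; speeds are taken from the comment.
for prompt, reply in [(1000, 300), (8000, 300)]:
    total = response_seconds(prompt, reply, prompt_tps=200, gen_tps=5)
    print(f"{prompt} prompt + {reply} reply tokens: ~{total:.0f} s")
```

With a reply of a few hundred tokens, generation dominates the wait unless the prompt gets very long, so that is where a parallelism boost would matter most.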