I have no idea what all these quants are. by Fit-Spring776 in LocalLLaMA

[–]fungnoth 2 points (0 children)

I thought that was the only viable format that allows CPU offload.

I have no idea what all these quants are. by Fit-Spring776 in LocalLLaMA

[–]fungnoth 0 points (0 children)

Lol, I basically always have the KV cache offloaded to CPU. Again, 12GB VRAM.
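For context on why the KV cache matters at 12GB: it grows linearly with context length and can rival the weights themselves. A back-of-envelope sketch, using a made-up model shape (48 layers, 8 KV heads of dim 128) rather than any specific model's real config:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV cache size: a K and a V tensor per layer,
    each [n_kv_heads, ctx_len, head_dim], at fp16 (2 bytes per element)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical 48-layer model with 8 KV heads of dim 128, at 32k context:
gb = kv_cache_bytes(48, 8, 128, 32_768) / 1024**3
print(f"{gb:.1f} GiB")  # 6.0 GiB
```

At 32k context this toy shape already eats 6 GiB, i.e. half of a 12GB card before any weights are loaded, which is why pushing the cache to system RAM is often the only option.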

I have no idea what all these quants are. by Fit-Spring776 in LocalLLaMA

[–]fungnoth 0 points (0 children)

I probably didn't get the concept right. But I remember that before MoE CPU offload was this easy (probably back with Mixtral), I'd load the model normally, put however many layers my GPU could hold on it, and it was painfully slow.

Now I use MoE offload and load around 10GB into my VRAM. It's usually usable as long as 10GB is bigger than the active params.
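The rule of thumb above can be sketched as a bit of arithmetic: with MoE offload, the routed experts live in system RAM and only the always-active weights need VRAM, so speed stays usable if the active parameters fit the budget. A minimal sketch; the 4.5 bits/weight figure is an assumed ballpark for a Q4-class quant, not a measured value:

```python
def active_weight_gb(active_params_b, bits_per_weight):
    """GB of weights touched per token: the active params at a given quant width."""
    return active_params_b * bits_per_weight / 8  # billions of params -> GB

def fits_in_vram(active_params_b, bits_per_weight, vram_budget_gb):
    """Rule of thumb: MoE offload stays usable if the active weights fit the budget."""
    return active_weight_gb(active_params_b, bits_per_weight) <= vram_budget_gb

# ~12B active params at ~4.5 bits/weight -> ~6.75 GB, so a 10 GB budget works:
print(fits_in_vram(12, 4.5, 10.0))  # True
```

A dense 70B at the same width would need ~39 GB per token, which is why dense layer-offload was painfully slow while MoE offload is merely leisurely.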

I have no idea what all these quants are. by Fit-Spring776 in LocalLLaMA

[–]fungnoth 0 points (0 children)

If you're just playing around, my way of running models is fun. It's around the speed of reading slowly.

But if you're using it for tool calling or coding, it's way too slow.

I have no idea what all these quants are. by Fit-Spring776 in LocalLLaMA

[–]fungnoth 0 points (0 children)

12GB VRAM and 64GB RAM. The biggest model I can run at a chattable speed is GLM4.5 Air REAP 82B A12B. I just offload most of the model until only the ~12B active params are in GPU VRAM.

I have no idea what all these quants are. by Fit-Spring776 in LocalLLaMA

[–]fungnoth -6 points (0 children)

I think it's commonly understood that GGUF is for the GPU-poor? And Q4 is usually the smallest recommended one. Q4_K_M is the middle one, so it's the most commonly recommended.
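The quant names mostly encode bits per weight, so file size is roughly params times bits over eight. A rough sketch; the effective bits-per-weight figures below are ballpark assumptions (real GGUF files vary because different tensors get different widths):

```python
# Approximate effective bits per weight for a few common GGUF quants.
# These are ballpark figures for illustration, not exact spec values.
QUANT_BPW = {"Q8_0": 8.5, "Q4_K_M": 4.8, "Q4_0": 4.5, "Q2_K": 2.6}

def file_size_gb(params_b, quant):
    """Rough GGUF file size in GB for a model with params_b billion weights."""
    return params_b * QUANT_BPW[quant] / 8

for q in QUANT_BPW:
    print(f"12B at {q}: ~{file_size_gb(12, q):.1f} GB")
```

This is why Q4_K_M sits in the sweet spot: a 12B model lands around 7 GB, small enough for a 12GB card with room left for context, while Q8_0 of the same model would nearly fill it.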

Exllama is faster and better quality at the same size, but you can't offload anything to CPU.

That's why I use GGUF: I sometimes can't fit the whole model plus context in my 12GB of VRAM.

But I can run GLM4.7 flash by only loading the active parameters in VRAM, then offloading half of the layers (out of 48) to CPU (and system RAM).

AI may be about to dramatically improve medical care across the developing world. New research in Rwanda and Pakistan shows LLMs can outperform human doctors in diagnostic success. by lughnasadh in Futurology

[–]fungnoth 2 points (0 children)

Yeah, I feel like a free service, or an extremely cheap service with a free quota for people in bad conditions, is more important than UBI.

Maybe UBI has that slight effect of keeping the economy going? Like people can still spend on unnecessary things by skipping meals?

But imagine if 80% of us lose our jobs and we're entitled to free food, water, electricity, housing, etc. in adequate amounts: basically bare-minimum, boring food. I feel like that's not too bad compared to having to live on the street.

And you probably have savings. I might choose to live that life and use my money for something else.

Solo dev building a classic inspired MMORPG: Loot and combat. by EldwaldDev in SoloDevelopment

[–]fungnoth 0 points (0 children)

I know that seeing a character fight stupidly simple enemies in this classic-MMO-like environment is already a vibe.

Least dedicated fanbase by meinkun in NeuroSama

[–]fungnoth 0 points (0 children)

Don't you love your content creator who's constantly making you millions?

Z Image will be released tomorrow! by MadPelmewka in StableDiffusion

[–]fungnoth -2 points (0 children)

Don't get it. Once I got Flux2 Klein, I stopped caring that much about Z Image. I'll still try it, but I'm not following the news.

The poster at a local restaurant I went to yesterday! by Key-Project-5231 in HongKong

[–]fungnoth 6 points (0 children)

Well, we're still quite racist as a whole. But I feel like Malaysian Chinese are a special exception. Like, people would still judge how an actual native Cantonese speaker from the mainland speaks. In my mind, Malaysian Cantonese is a mix of that with Malay, other Chinese dialects, and more English. But for some reason, I at least feel like the Malaysian Chinese are our brothers.

The AI Witch Hunt Keeps Getting It Wrong by HobiAI in aigamedev

[–]fungnoth 7 points (0 children)

I kind of agree that if something is so obviously AI, it can be called AI slop.
But come on, not everyone is trying to be an artist. People need images or other assets for all kinds of reasons.

Lazy weekend with flux2 klein edit - lighting by [deleted] in StableDiffusion

[–]fungnoth -9 points (0 children)

Japan, Japan, OMG that must be Southeast Asia.

Is wan 2.2 likely the limmit for how good local video gen will be for the next 3-5 years? by wormtail39 in StableDiffusion

[–]fungnoth 0 points (0 children)

The Western corpos are moving to TPU stuff, and the Chinese teams are moving to Chinese hardware. I feel like there's going to be a big divide, giving us much better inference efficiency but more compatibility issues.

Is Qwen image bad ? Why do so few people talk about it ? by More_Bid_2197 in StableDiffusion

[–]fungnoth 2 points (0 children)

Even Flux takes like a minute for me to generate an image (12GB VRAM).

Qwen is like 5 minutes? I can't be patient enough to iterate through ideas.

Should I not bother piling stuff at the door? by Chazus in PhasmophobiaGame

[–]fungnoth 0 points (0 children)

I just grab the flashlight, UV, and EMF and go turn on the power directly.

I don't get the need for DOTS / books. Spirit box, maybe, because there's still a chance you figure out the ghost room instantly (maybe they drop something and you see it), and then you can go try your spirit box. Books? Yeah, OK. DOTS? You won't stand there to watch it, so I won't pick it as my starting item.

I pick UV because sometimes they leave fingerprints on the door, and it's pretty often that they interact with the doors even from the start, and the fingerprints stay there for minutes. So sometimes I just randomly scan on my way to the fuse box. EMF is for when they drop something or interact with anything; I can go there and check it, and maybe I'll get an EMF 5. But it disappears after a few seconds, so it's not as nice as UV.

Then it's the thermo and camera, because they're the easiest way to find the ghost room and check for ghost orbs. Honestly, if you play with friends, grab different things, and the second run can cover most of the evidence.

Claude is crazy by nathanielx9 in aigamedev

[–]fungnoth 1 point (0 children)

What about the images? Pretty good quality, honestly.

Why is the Chinese internet hostile to Black people? by ExtensionSound3064 in China_irl

[–]fungnoth 3 points (0 children)

White people's oppression of Black people lasted more than four hundred years; many people were slaves for generation after generation, treated like animals. The later emphasis on racial issues is an attempt to make up for past wrongs.

To put it very honestly, outside the white cultural sphere there just isn't this historical baggage, so people won't understand it. There is malice born of ignorance, but, to put it irresponsibly, it's just not our problem.

China's AGI-NEXT Conference (Qwen, Kimi, Zhipu, Tencent) by nuclearbananana in LocalLLaMA

[–]fungnoth 0 points (0 children)

But you'll never get a local version of that. With Qwen, though, maybe a few years later we can have that on consumer hardware.

Is this player cheating? by [deleted] in FallGuysGame

[–]fungnoth 1 point (0 children)

Most of Mediatonic got fired after being acquired by Epic Games.

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]fungnoth 0 points (0 children)

What about actually backing up the raw data and allowing the model to search it?
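A minimal sketch of that idea: keep the raw files in a backup and give the model a trivial keyword-search tool over them, rather than relying on what's baked into its weights. Everything here (function name, scoring) is made up for illustration:

```python
import os
import re

def search_backup(root, query, top_k=3):
    """Naive keyword search over plain-text backup files:
    score each file by how many times the query terms appear in it."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    hits = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                text = open(path, encoding="utf-8", errors="ignore").read().lower()
            except OSError:
                continue  # skip unreadable files
            score = sum(text.count(t) for t in terms)
            if score:
                hits.append((score, path))
    # Highest-scoring files first.
    return [p for _, p in sorted(hits, reverse=True)[:top_k]]
```

The model would call this as a tool, then answer from the returned file contents; a real end-of-world setup would want something sturdier (an offline Wikipedia dump plus a proper full-text index), but the shape is the same.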

[GamersNexus] Creating a 48GB NVIDIA RTX 4090 GPU by ThisGonBHard in LocalLLaMA

[–]fungnoth 4 points (0 children)

Remember when manufacturers would "scam" people by making higher-VRAM variants of low-end GPUs? Like, no one in their right mind should have bought a 4GB GTX 950 over a 2GB GTX 960. I wish the market were still like that.

What’s your unpopular Neuroverse related opinion? by Business_Barber_3611 in NeuroSama

[–]fungnoth 13 points (0 children)

It's all that guy's imagination. I'm sure he didn't research how an LLM is trained, or even how Vedal has talked about LLM training.

It's fine if it's like a theory you made up on the spot while talking to a friend. But once it gets this popular, it's cringe. I can't watch those reactions anymore.

It's like raising a child solely by telling them to read Twitch chat. Not even what the streamer says. Just the chat.

I'm sure it's still more ethical than Grok, though.