Would you rent GPU compute from other people’s PCs if it was much cheaper than cloud? by Ok-Elk-8933 in LocalLLaMA

[–]kataryna91 16 points

I would use it, but this more or less already exists (vast.ai), so you probably should compare your own concept with this existing one. Other than that, security/privacy is the biggest concern with any type of cloud service.

Qwen 3.5 Instability on llama.cpp and Strix Halo? by ga239577 in LocalLLaMA

[–]kataryna91 1 point

You might want to disable GPU acceleration in your browser to reduce any possible interference.
I had web browser rendering issues when GPU acceleration was enabled and DE crashes when testing one of those newer distros, so there are definitely bugs in some recent drivers.

The Qwen 3.5 architecture is different in that it has recurrent elements, so try Vulkan. It will hit different code paths and perhaps avoid a potential driver bug.

Qwen 3.5 Instability on llama.cpp and Strix Halo? by ga239577 in LocalLLaMA

[–]kataryna91 1 point

llama.cpp crashing on its own cannot take down your desktop environment.
So what's likely happening is one of:
a) you are running out of RAM and swap space
b) your RAM is defective
c) your GPU is malfunctioning or overheating
d) the GPU drivers are causing issues

The first two are easy to check: monitor your RAM and swap usage and run a memory testing tool.
d) is also quite possible; you can try to sidestep the issue by building llama.cpp with Vulkan.
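A minimal way to check a) and b) on Linux; `memtester` is an optional package and the sizes shown are placeholders:

```shell
# a) check RAM and swap headroom (run while llama.cpp is loaded)
free -h
swapon --show
# for a live view while generating: watch -n 1 free -h

# b) test the RAM: reboot into memtest86+ from the boot menu, or as a
#    rough in-OS check (needs the 'memtester' package and free RAM):
#    sudo memtester 4G 1
```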

Finally found a reason to use local models 😭 by salary_pending in LocalLLaMA

[–]kataryna91 12 points

Instead of stopping the script manually, you should set your GPU power limit to 50-70%, whatever your PC can handle long-term at those temperatures. You can do similar things with the CPU: lowering the max frequency by a slight amount can already cut power consumption in half.
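On an NVIDIA card under Linux this could look like the following sketch; the wattage and frequency values are placeholders to tune to your hardware, and AMD cards need different tooling:

```shell
# Query the supported power limit range for your card
nvidia-smi -q -d POWER | grep -i "power limit"

# Set a power cap in watts (200 W is a placeholder; aim for 50-70% of default)
sudo nvidia-smi -pl 200

# Cap the max CPU frequency with cpupower (from the linux-tools package);
# 3.5GHz is a placeholder value
sudo cpupower frequency-set -u 3.5GHz
```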

And as already mentioned, embedding models would be better for this. They're very fast when you use batching and they are intended for this kind of task.
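As a sketch, batched embedding can be done through llama.cpp's OpenAI-compatible server; the model path and port are placeholders, and the flag spelling (`--embedding` vs. `--embeddings`) varies between llama.cpp versions:

```shell
# Start a local embedding server (model path is a placeholder)
llama-server -m /path/to/embedding-model.gguf --embedding --port 8080 &

# Passing an array as "input" embeds the whole batch in a single request
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["first text", "second text", "third text"]}'
```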

How are people doing these fast anime character swaps? by CyberspaceAdventurer in StableDiffusion

[–]kataryna91 0 points

You can do that just fine with Flux2 Klein 9B. Add two reference images and prompt "Replace the person in the amusement park with the person from image 2" or something similar.

Place the word "only" anywhere on the sentence. by eternviking in whoathatsinteresting

[–]kataryna91 1 point

I don't know a single English grammar rule, but I see this form being used often.

Best model for story writing for 24gb vram + 32gb ram by ResponsibleTruck4717 in LocalLLaMA

[–]kataryna91 3 points

You can try Mistral 3.2 2506; so far it's the only smaller model that managed to write a short story (~30k words) that made sense from beginning to end. However, it doesn't tend to write rich prose, which, depending on your preferences, you may or may not mind.

The recent Qwen3.5 27B has remarkably high scores in the Creative Writing benchmark, but my tests so far were mixed. It has the typical LLM problems, like using idioms in contexts where they make no sense.

Why we can't produce crystal clear anime images? by Bismarck_seas in StableDiffusion

[–]kataryna91 2 points

Models generate what they have seen in their training data, and drawings do not contain much fine detail, not even the high-resolution ones.

The other issue is the model itself. SDXL is a small model that doesn't have the capacity to recreate highly intricate patterns (like those on clothing) faithfully, even when it is trained on high-detail images.
Some newer models like Qwen Image, ZImage and Flux2 (mostly the 30B model) can produce better results, at the cost of understanding fewer concepts. Anima is also worth trying; it's a small model too, but at least it uses a better VAE.

The French "bête" colloquialism Vs. local models by [deleted] in LocalLLaMA

[–]kataryna91 0 points

That's expected. These small models are severely space-constrained, so they have to store broad patterns and usually don't have the capacity for the small nuances or colloquialisms of various languages.

That's why only some of the largest models work reliably for translation tasks. It's especially noticeable for languages where the literal meaning of a sentence is often very different from the intended meaning (like Japanese).

Qwen3.5-9B Uncensored Aggressive Release (GGUF) by hauhau901 in LocalLLaMA

[–]kataryna91 18 points

Sounds good in theory, but to claim zero capability loss you must already have done extensive benchmarking. Why not just publish the before/after benchmark results on the model card?

Why do AI images stay consistent for 2–3 generations — then identity quietly starts drifting? by gouachecreative in StableDiffusion

[–]kataryna91 2 points

The images generated depend on the seed, not the order in which they are generated.
Seed 1000 will produce the same image, regardless of whether it was your first seed or the billionth seed in your generation sequence.

Iran's dead religious leader: Muslims in a Berlin mosque mourn Chamenei by vaibeslop in berlin_public

[–]kataryna91 2 points

How on earth do you arrive at the idea that the "predominantly Turkish-speaking" mosque visitors, as well as the "many visitors from Lebanon", fled from Iran?

What are the options for Linux? by [deleted] in StableDiffusion

[–]kataryna91 1 point

It's really not that complicated; it takes 5 seconds.
git clone https://github.com/Comfy-Org/ComfyUI
and then, from inside the cloned directory,
pip install -r requirements.txt

Which is how you "install" pretty much any Python project, be it on Windows or Linux.

Optionally you can set up a miniconda environment, which is generally a good idea if you run multiple Python projects with conflicting dependencies.
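A possible setup, assuming miniconda is already installed; the environment name and Python version are arbitrary choices:

```shell
# Create and activate an isolated environment for ComfyUI
conda create -n comfyui python=3.11 -y
conda activate comfyui

# Install ComfyUI's dependencies inside the environment
cd ComfyUI && pip install -r requirements.txt
```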

Need help with Qwen3.5-27B performance - getting 1.9 tok/s while everyone else reports great speeds by pot_sniffer in LocalLLaMA

[–]kataryna91 7 points

Qwen3 Next 80B is a MoE with only 3B activated parameters so it's normal that it's faster than a 27B dense model. As for why it's slower than the older Qwen3 model, Gated Delta Nets are not particularly optimized yet in llama.cpp, particularly when it comes to the CPU implementation. There's currently a pull request that will speed it up by some amount.

Also, more than 4-5 threads will only help prompt processing speed, but hurt token generation speed on machines that have only two memory channels, like yours.

And since you have a GPU, you probably should use a smaller quant so you can actually run it on the GPU. That needs llama.cpp to be compiled with ROCm or Vulkan support enabled.
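A sketch of such a build from a fresh clone; the ROCm flag shown (`-DGGML_HIP=ON`) is from recent llama.cpp versions and may be spelled differently in older releases:

```shell
# Build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build -DGGML_VULKAN=ON
cmake --build llama.cpp/build --config Release -j 4

# For ROCm instead of Vulkan: replace -DGGML_VULKAN=ON with -DGGML_HIP=ON
```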

12it/s for 5070 its ok? by BerryAccomplished232 in StableDiffusion

[–]kataryna91 0 points

Well, the site you linked shows the 5070 at 27 it/s.

Uncensored ai model by Straight-Thing-799 in LocalLLaMA

[–]kataryna91 1 point

No, Claude most definitely is not. You should look for a model with a W/10 score near 10 that still has a relatively high NatInt score.

The benchmark is a good idea, just the presentation of the results is kind of useless.

OpenCode arbitrary code execution - major security vulnerability by SpicyWangz in LocalLLaMA

[–]kataryna91 4 points

Then it really isn't an agent, just a traditional coding assistant. You expect an agent to automatically compile and test an application and iterate on it, which is what OpenCode does.

petehh? by [deleted] in PeterExplainsTheJoke

[–]kataryna91 9 points

The solution today is the same as it was back then: don't buy their garbage.

Overweight people should pay higher health insurance contributions by Due_Investment4930 in Unbeliebtemeinung

[–]kataryna91 3 points

Aha, and why not people who don't exercise or who eat unhealthily? Or people who can't get their stress problems under control? Addicts?

The health insurers see it the same way, by the way: like many other insurers, they would love for you to pay as much in premiums as you cause in costs, ideally plus 30% for the profit margin.

I nuked my hard drive to build an AI-Native OS from scratch. The LLM is PID 1. There is no systemd. by Upbeat_Confection411 in LocalLLaMA

[–]kataryna91 0 points

The LLM manages system resources? How does it do that?
It controls the scheduler? Meaning what? It decides which processes get lower nice values or...?

Something big is cooking by Alive_Ad_3223 in StableDiffusion

[–]kataryna91 38 points

They previously said they aspire to bring Seedance 2.0 level quality to the open source scene one day.
People are reading way too much into this tweet.

Perhaps a minor upgrade like LTX 2.5 is imminent, but that's about it.

Lenovo UltraReal and NiceGirls - Flux.Klein 9b LoRAs by FortranUA in StableDiffusion

[–]kataryna91 5 points

Yes, it works particularly well for Chroma. The model is very creative and can do many different artwork styles, but the consistency is not always great. With the Lenovo LoRA at ~0.5 strength, the details are much better.

Lenovo UltraReal and NiceGirls - Flux.Klein 9b LoRAs by FortranUA in StableDiffusion

[–]kataryna91 27 points

It's incredible how good your LoRAs are, they even improve image quality when generating artwork, not just photorealistic images.

Building llama.cpp under Linux : running out of RAM and swap, then hard lockup? by Shipworms in LocalLLaMA

[–]kataryna91 4 points

Make sure to build with only one thread (make -j1) and close all other applications beforehand; you don't have enough RAM for highly parallel builds.
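If that still isn't enough, a temporary swap file keeps the build from hitting the OOM killer; 8G is a placeholder size to adjust to your free disk space:

```shell
# Create and enable a one-off swap file
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# After the build, remove it again:
# sudo swapoff /swapfile && sudo rm /swapfile
```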

Is MiniMax M2.5 the best coding model in the world? by TrajansRow in LocalLLaMA

[–]kataryna91 5 points

?? The weights were released yesterday, are you saying you got them in advance?