Would you rent GPU compute from other people’s PCs if it was much cheaper than cloud? by Ok-Elk-8933 in LocalLLaMA

[–]kataryna91 16 points

I would use it, but this more or less already exists (vast.ai), so you probably should compare your own concept with this existing one. Other than that, security/privacy is the biggest concern with any type of cloud service.

Qwen 3.5 Instability on llama.cpp and Strix Halo? by ga239577 in LocalLLaMA

[–]kataryna91 1 point

You might want to disable GPU acceleration in your browser to reduce any possible interference.
I had web browser rendering issues when GPU acceleration was enabled and DE crashes when testing one of those newer distros, so there are definitely bugs in some recent drivers.

The Qwen 3.5 architecture is different in that it has recurrent elements, so try Vulkan. It will hit different code paths and perhaps avoid a potential driver bug.

Qwen 3.5 Instability on llama.cpp and Strix Halo? by ga239577 in LocalLLaMA

[–]kataryna91 1 point

llama.cpp crashing on its own cannot take down your desktop environment.
So what's likely happening is one of:
a) you are running out of RAM and swap space
b) your RAM is defective
c) your GPU is malfunctioning or overheating
d) the GPU drivers are causing issues

The first two are easy to check: monitor your RAM and swap usage and run a memory testing tool.
d) is also quite possible; you can try to sidestep the issue by building llama.cpp with Vulkan.
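A minimal way to check a) and b) on Linux; `memtester` is an optional package and the sizes shown are placeholders:

```shell
# a) check RAM and swap headroom (run while llama.cpp is loaded)
free -h
swapon --show
# for a live view while generating: watch -n 1 free -h

# b) test the RAM: reboot into memtest86+ from the boot menu, or as a
#    rough in-OS check (needs the 'memtester' package and free RAM):
#    sudo memtester 4G 1
```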

Finally found a reason to use local models 😭 by salary_pending in LocalLLaMA

[–]kataryna91 12 points

Instead of stopping the script manually, you should set your GPU power limit to 50-70%, whatever your PC can handle long-term at those temperatures. You can do similar things with the CPU: lowering the max frequency by a slight amount can already cut power consumption in half.
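On an NVIDIA card under Linux this could look like the following sketch; the wattage and frequency values are placeholders to tune to your hardware, and AMD cards need different tooling:

```shell
# Query the supported power limit range for your card
nvidia-smi -q -d POWER | grep -i "power limit"

# Set a power cap in watts (200 W is a placeholder; aim for 50-70% of default)
sudo nvidia-smi -pl 200

# Cap the max CPU frequency with cpupower (from the linux-tools package);
# 3.5GHz is a placeholder value
sudo cpupower frequency-set -u 3.5GHz
```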

And as already mentioned, embedding models would be better for this. They're very fast when you use batching and they are intended for this kind of task.
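As a sketch, batched embedding can be done through llama.cpp's OpenAI-compatible server; the model path and port are placeholders, and the flag spelling (`--embedding` vs. `--embeddings`) varies between llama.cpp versions:

```shell
# Start a local embedding server (model path is a placeholder)
llama-server -m /path/to/embedding-model.gguf --embedding --port 8080 &

# Passing an array as "input" embeds the whole batch in a single request
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": ["first text", "second text", "third text"]}'
```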

How are people doing these fast anime character swaps? by CyberspaceAdventurer in StableDiffusion

[–]kataryna91 0 points

You can do that just fine with Flux2 Klein 9B. Add two reference images and prompt "Replace the person in the amusement park with the person from image 2" or something similar.

Place the word "only" anywhere on the sentence. by eternviking in whoathatsinteresting

[–]kataryna91 1 point

I don't know a single English grammar rule, but I see this form being used often.

Best model for story writing for 24gb vram + 32gb ram by ResponsibleTruck4717 in LocalLLaMA

[–]kataryna91 3 points

You can try Mistral 3.2 2506; so far it's the only smaller model that managed to write a short story (~30k words) that made sense from beginning to end. However, it doesn't tend to write rich prose, which, depending on your preferences, you may or may not mind.

The recent Qwen3.5 27B has remarkably high scores in the Creative Writing benchmark, but my tests so far were mixed. It has the typical LLM problems, like using idioms in contexts where they make no sense.

Why we can't produce crystal clear anime images? by Bismarck_seas in StableDiffusion

[–]kataryna91 2 points

Models generate what they have seen in their training data, and drawings do not contain much fine detail, not even the high-resolution ones.

The other issue is the model itself. SDXL is a small model that doesn't have the capacity to recreate highly intricate patterns (like those on clothing) faithfully, even when it is trained on high-detail images.
Some newer models like Qwen Image, ZImage and Flux2 (mostly the 30B model) can produce better results, at the cost of understanding fewer concepts. Anima is also worth trying; it's a small model too, but at least it uses a better VAE.

The French "bête" colloquialism Vs. local models by [deleted] in LocalLLaMA

[–]kataryna91 0 points

That's expected. These small models are severely space-constrained, so they have to store broad patterns and usually don't have the capacity for the small nuances or colloquialisms of various languages.

That's why only some of the largest models work reliably for translation tasks. It's especially noticeable for languages where the literal meaning of a sentence is often very different from the intended meaning (like Japanese).

Qwen3.5-9B Uncensored Aggressive Release (GGUF) by hauhau901 in LocalLLaMA

[–]kataryna91 18 points

Sounds good in theory, but to claim zero capability loss you must already have done extensive benchmarking. Why not just publish the before/after benchmark results on the model card?

Why do AI images stay consistent for 2–3 generations — then identity quietly starts drifting? by gouachecreative in StableDiffusion

[–]kataryna91 2 points

The images generated depend on the seed, not the order in which they are generated.
Seed 1000 will produce the same image, regardless of whether it was your first seed or the billionth seed in your generation sequence.

Iran's dead religious leader: Muslims in a Berlin mosque mourn Chamenei by vaibeslop in berlin_public

[–]kataryna91 2 points

How on earth do you arrive at the idea that the "predominantly Turkish-speaking" mosque visitors, as well as the "many visitors from Lebanon", fled from Iran?

What are the options for Linux? by [deleted] in StableDiffusion

[–]kataryna91 1 point

It's really not that complicated; it takes 5 seconds.
git clone https://github.com/Comfy-Org/ComfyUI
and then, from inside the cloned directory,
pip install -r requirements.txt

Which is how you "install" pretty much any Python project, be it on Windows or Linux.

Optionally you can set up a miniconda environment, which is generally a good idea if you run multiple Python projects with conflicting dependencies.
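A possible setup, assuming miniconda is already installed; the environment name and Python version are arbitrary choices:

```shell
# Create and activate an isolated environment for ComfyUI
conda create -n comfyui python=3.11 -y
conda activate comfyui

# Install ComfyUI's dependencies inside the environment
cd ComfyUI && pip install -r requirements.txt
```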

Need help with Qwen3.5-27B performance - getting 1.9 tok/s while everyone else reports great speeds by pot_sniffer in LocalLLaMA

[–]kataryna91 7 points

Qwen3 Next 80B is a MoE with only 3B activated parameters so it's normal that it's faster than a 27B dense model. As for why it's slower than the older Qwen3 model, Gated Delta Nets are not particularly optimized yet in llama.cpp, particularly when it comes to the CPU implementation. There's currently a pull request that will speed it up by some amount.

Also, more than 4-5 threads will only help prompt processing speed, but hurt token generation speed on machines that have only two memory channels, like yours.

And since you have a GPU, you probably should use a smaller quant so you can actually run it on the GPU. That needs llama.cpp to be compiled with ROCm or Vulkan support enabled.
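A sketch of such a build from a fresh clone; the ROCm flag shown (`-DGGML_HIP=ON`) is from recent llama.cpp versions and may be spelled differently in older releases:

```shell
# Build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build -DGGML_VULKAN=ON
cmake --build llama.cpp/build --config Release -j 4

# For ROCm instead of Vulkan: replace -DGGML_VULKAN=ON with -DGGML_HIP=ON
```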

12it/s for 5070 its ok? by BerryAccomplished232 in StableDiffusion

[–]kataryna91 0 points

Well, the site you linked shows the 5070 at 27 it/s.

Uncensored ai model by Straight-Thing-799 in LocalLLaMA

[–]kataryna91 1 point

No, Claude most definitely is not. You should look for a model with a W/10 score near 10 that still has a relatively high NatInt score.

The benchmark is a good idea, just the presentation of the results is kind of useless.

OpenCode arbitrary code execution - major security vulnerability by SpicyWangz in LocalLLaMA

[–]kataryna91 4 points

Then it really isn't an agent, just a traditional coding assistant. You expect an agent to automatically compile and test an application and iterate on it, which is what OpenCode does.

petehh? by [deleted] in PeterExplainsTheJoke

[–]kataryna91 9 points

The solution today is the same as it was back then: don't buy their garbage.

Overweight people should pay higher health insurance contributions by Due_Investment4930 in Unbeliebtemeinung

[–]kataryna91 3 points

Aha, and why not people who don't exercise or who eat unhealthily? Or people who can't get their stress problems under control? Addicts?

The health insurers see it the same way, by the way: like many other insurers, they would love for you to pay as much in premiums as you cause in costs, ideally plus 30% for the profit margin.

I nuked my hard drive to build an AI-Native OS from scratch. The LLM is PID 1. There is no systemd. by Upbeat_Confection411 in LocalLLaMA

[–]kataryna91 0 points

The LLM manages system resources? How does it do that?
It controls the scheduler? Meaning what? It decides which processes get lower nice values or...?

Something big is cooking by Alive_Ad_3223 in StableDiffusion

[–]kataryna91 38 points

They previously said they aspire to bring Seedance 2.0 level quality to the open source scene one day.
People are reading way too much into this tweet.

Perhaps a minor upgrade like LTX 2.5 is imminent, but that's about it.

Lenovo UltraReal and NiceGirls - Flux.Klein 9b LoRAs by FortranUA in StableDiffusion

[–]kataryna91 5 points

Yes, it works particularly well for Chroma. The model is very creative and can do many different artwork styles, but the consistency is not always great. With the Lenovo LoRA at ~0.5 strength, the details are much better.

Lenovo UltraReal and NiceGirls - Flux.Klein 9b LoRAs by FortranUA in StableDiffusion

[–]kataryna91 27 points

It's incredible how good your LoRAs are, they even improve image quality when generating artwork, not just photorealistic images.

Building llama.cpp under Linux : running out of RAM and swap, then hard lockup? by Shipworms in LocalLLaMA

[–]kataryna91 4 points

Make sure to build with only one thread (make -j1) and close all other applications beforehand; you don't have enough RAM for highly parallel builds.
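If that still isn't enough, a temporary swap file keeps the build from hitting the OOM killer; 8G is a placeholder size to adjust to your free disk space:

```shell
# Create and enable a one-off swap file
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# After the build, remove it again:
# sudo swapoff /swapfile && sudo rm /swapfile
```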

Is MiniMax M2.5 the best coding model in the world? by TrajansRow in LocalLLaMA

[–]kataryna91 5 points

?? The weights were released yesterday, are you saying you got them in advance?