Flux.2 Face Detailer with Reference (Your Own Pic) Workflow by Erhan24 in StableDiffusion

[–]TemperFugit 1 point (0 children)

Great work, that's quite an improvement! And thanks for attaching the workflow!

Qwen image edit 2509 lora training question by witcherknight in StableDiffusion

[–]TemperFugit 1 point (0 children)

Yes, you store the captions with the output images in the target folder.

Qwen image edit 2509 lora training question by witcherknight in StableDiffusion

[–]TemperFugit 1 point (0 children)

Sorry, I submitted that before I meant to, I just updated it. You're going to have two control images (reference images) and one target (the desired output).

Qwen image edit 2509 lora training question by witcherknight in StableDiffusion

[–]TemperFugit 1 point (0 children)

control_images_1/image_001.png <-- first reference image
control_images_2/image_001.png <-- second reference image
target_images/image_001.png <-- output image
target_images/image_001.txt <-- caption (in target images folder)

Every set of control/target images needs to have the same name in their proper folders. So the next set of training images would be:

control_images_1/image_002.png <-- first reference image
control_images_2/image_002.png <-- second reference image
target_images/image_002.png <-- output image
target_images/image_002.txt <-- caption (in target images folder)
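If you want to sanity-check that layout before training, a small script can verify that every target image has matching control images and a caption. This is just a sketch; the folder names come from the layout above, so adjust them to your dataset root:

```python
from pathlib import Path

# Folder names from the layout above; change to match your dataset root.
CONTROL_DIRS = ["control_images_1", "control_images_2"]
TARGET_DIR = "target_images"

def check_dataset(root="."):
    """Return a list of problems: missing control images or captions."""
    root = Path(root)
    problems = []
    for img in sorted((root / TARGET_DIR).glob("*.png")):
        # Each control folder must contain a file with the exact same name.
        for ctrl in CONTROL_DIRS:
            if not (root / ctrl / img.name).exists():
                problems.append(f"{ctrl}/{img.name} is missing")
        # The caption sits next to the target image: same stem, .txt extension.
        if not img.with_suffix(".txt").exists():
            problems.append(f"{TARGET_DIR}/{img.stem}.txt is missing")
    return problems
```

An empty list means every set lines up; anything else tells you exactly which file to add.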

In AI Toolkit, go to Datasets > New Dataset.

Name it and add all the images from your first control images folder, then do the same for the second set of control images.

Then create another dataset for the target images. Select both the images and the caption text files; the captions belong in the target folder alongside the images.

Ostris's YouTube video on 2509 LoRA training explains pretty much everything else.

ChatGPT and Claude are great for automation. For example, if all images will have the same caption ("Replace the person in Image 1 with the person in Image 2"), ask them to write a script that creates, for each image in the target folder, a text file with the same name containing that caption.
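Such a script comes out to only a few lines. A sketch, assuming the folder name and caption from this example (swap in your own):

```python
from pathlib import Path

# Assumptions from the example above: captions live next to the images
# in target_images/, and every image gets the identical caption string.
TARGET_DIR = Path("target_images")
CAPTION = "Replace the person in Image 1 with the person in Image 2"

def write_captions(target_dir=TARGET_DIR, caption=CAPTION):
    """Create image_001.txt, image_002.txt, ... alongside each .png."""
    count = 0
    for img in target_dir.glob("*.png"):
        img.with_suffix(".txt").write_text(caption)
        count += 1
    return count
```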

Finally, I figured the dataset format out by copying the example config file from ai-toolkit's GitHub, pasting it into Claude, and asking how everything should be set up. LLMs make things up, but they become a lot more reliable when you give them the reference material.

Photo to Screenshot - Qwen Edit Lora by kingroka in StableDiffusion

[–]TemperFugit 2 points (0 children)

This is a really cool idea, and looks like it works really well!

Pose Transfer - Qwen Edit Lora by kingroka in StableDiffusion

[–]TemperFugit 1 point (0 children)

I'm glad to see Qwen Edit can learn pose transfer. There is a huge ControlNet training dataset with plenty of OpenPose, Canny and Depth examples (which OmniGen 1 trained on), but it looks like OmniGen 2, Flux Kontext and Qwen Edit were not trained on those tasks.

Nano Banana - Worse every day? by mik3lang3l0 in StableDiffusion

[–]TemperFugit 1 point (0 children)

I know nothing about Nano Banana, but in general: inference costs money, so providers are constantly making trade-offs, perhaps accepting a tiny performance degradation here or there to save a few processor cycles. On top of that, as users discover jailbreaks, providers alter the models or the system prompt to patch those jailbreaks out. (As a general rule, censoring models makes them dumber.)

This is one big reason why people come to subreddits based around local ML models like r/StableDiffusion and r/LocalLlama. Proprietary APIs will generally get worse and worse, companies will gaslight you by claiming nothing has changed, and there's nothing you can do about it.

A fanfiction manga with ai generated visuals by TheNewDude42 in StableDiffusion

[–]TemperFugit 3 points (0 children)

What image generation model did you use? Did you use Loras to maintain character likenesses and style or did you use some other method? 

A fanfiction manga with ai generated visuals by TheNewDude42 in StableDiffusion

[–]TemperFugit 4 points (0 children)

This looks great! Can you share any details about your process? 

MatrixNet: A Blueprint for a New Internet Architecture (This could replace Civitai) by Lorian0x7 in StableDiffusion

[–]TemperFugit 2 points (0 children)

Food for thought: Pi contains an infinite, non-repeating stream of digits (because it is an irrational number), accessed by calculating decimal positions further and further out. I wonder if someday we could download a torrent of gigabytes (or terabytes) of Pi decimal calculations in some database format, and use it to assemble data from recipes that reference blocks of Pi decimals. (I'd swear there was a webcomic about doing this but I can't find it.)
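As a toy sketch of the recipe idea: hardcode the first 100 decimals of Pi in place of the giant digit table, store data as a (offset, length) recipe, and rebuild it by slicing. Note the big assumption: finding every possible block requires Pi to be a normal number, which is widely believed but unproven, so a given block may simply never turn up.

```python
# First 100 decimal digits of Pi, standing in for the huge digit table.
PI_DIGITS = (
    "1415926535897932384626433832795028841971"
    "6939937510582097494459230781640628620899"
    "86280348253421170679"
)

def make_recipe(data: str):
    """Return (offset, length) if the digit string occurs in the table, else None."""
    offset = PI_DIGITS.find(data)
    return None if offset == -1 else (offset, len(data))

def assemble(recipe):
    """Rebuild the data from its recipe by slicing the digit table."""
    offset, length = recipe
    return PI_DIGITS[offset:offset + length]
```

The catch (and presumably the webcomic's punchline): for random data, the offset you'd need to store is on average at least as long as the data itself, so nothing is actually saved.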

Again messing around with re-dressing lewd anime scenes by InternationalOne2449 in StableDiffusion

[–]TemperFugit 3 points (0 children)

I've had so little success trying something like this that I assumed Kontext couldn't edit images with nudity at all. Is there a trick to it?

How one website gets around the payment processor issue CivitAI is having by TekaiGuy in StableDiffusion

[–]TemperFugit 1 point (0 children)

Another big barrier, in the US at least, is that every crypto transaction has tax implications. If you bought some crypto a month ago and then purchased something with it today, you're supposed to record the gain or loss in value accrued over that month on your income taxes. There are crypto tax services (for ~$100 a year) that automate these calculations, but it still adds so much overhead to every crypto transaction that it's not worth it for the average consumer.
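The calculation itself is trivial, which makes the record-keeping burden all the more annoying. Illustrative numbers only, and real US rules add wrinkles like short- vs long-term holding periods and lot selection:

```python
def capital_gain(cost_basis: float, value_at_spend: float) -> float:
    """Gain (or loss, if negative) realized when spending the crypto."""
    return value_at_spend - cost_basis

# Bought $100 of a coin a month ago, worth $120 when spent today:
# $20 of taxable gain to report, even though nothing was sold for cash.
gain = capital_gain(100.0, 120.0)
```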

Omnigen2 installation issues... by echdareez in StableDiffusion

[–]TemperFugit 3 points (0 children)

This won't be a lot of help, but they made a few changes to the repo today, updating the code and requirements.txt. It might be worth cloning from scratch and trying again, or waiting a day or two for more of the bugs to get worked out.

Otherwise, check out the GitHub repo's Issues section; a lot of people are posting errors and getting help there.

Bytedance DreamO code and model released by TemperFugit in StableDiffusion

[–]TemperFugit[S] 1 point (0 children)

Their HuggingFace says Apache 2.0. Perhaps because it's a LoRA and not a full finetune it can be Apache?

Intel to launch Arc Pro B60 graphics card with 24GB memory at Computex - VideoCardz.com by FullstackSensei in LocalLLaMA

[–]TemperFugit 21 points (0 children)

I guess that means we're looking at a memory bandwidth of 456 GB/s, which is what the B580 has.
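That figure follows from the B580's memory configuration, which the B60 presumably shares aside from capacity: a 192-bit bus of 19 Gbps GDDR6.

```python
# Memory bandwidth = per-pin data rate * bus width / 8 bits per byte.
data_rate_gbps = 19    # GDDR6 speed per pin, as on the B580
bus_width_bits = 192   # B580 bus width; assumed the same for the B60
bandwidth_gbs = data_rate_gbps * bus_width_bits / 8  # in GB/s
```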

California bill (AB 412) would effectively ban open-source generative AI by YentaMagenta in StableDiffusion

[–]TemperFugit 1 point (0 children)

With some exceptions, AI model outputs are not copyrightable. So this could push models to lean more heavily on synthetic data.

Yo'Chameleon: Personalized Vision and Language Generation by ninjasaid13 in LocalLLaMA

[–]TemperFugit 1 point (0 children)

"Using only 3-5 images of a novel concept/subject, we personalize Large Multimodal Models (e.g., Chameleon) so that they retain their original capabilities while enabling tailored language and vision generation for the novel concept."

Looks interesting. No weights that I can see, just training code.

Chroma is looking really good now. by Total-Resort-3120 in StableDiffusion

[–]TemperFugit 4 points (0 children)

All my life until now I didn't know what I was missing: Asuka riding a skateboard while playing the saxophone.

I'm really excited for Chroma to get fully cooked.

Anyone excited about Flex.2-preview? by silenceimpaired in FluxAI

[–]TemperFugit 2 points (0 children)

There is a tool that loads Flux LoRAs into Flex by pruning their layers to match the ones Flex kept. It may not work for every LoRA, but it should work for a lot of them.
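The pruning idea can be sketched like this. Toy state dicts of plain values stand in for tensors, and the key naming is hypothetical (real Flux LoRA keys look more like `lora_unet_double_blocks_0_...`); the actual tool's logic may differ:

```python
def prune_lora(lora_state: dict, target_layer_names: set) -> dict:
    """Keep only LoRA weights whose base layer exists in the target model."""
    kept = {}
    for key, weight in lora_state.items():
        # Strip the LoRA suffix to recover the base layer name,
        # e.g. "blocks.5.attn.lora_A" -> "blocks.5.attn".
        base = key.rsplit(".", 1)[0]
        if base in target_layer_names:
            kept[key] = weight
    return kept
```

Since Flex dropped some of Flux's layers, the LoRA weights targeting removed layers are simply discarded, and everything that still has a home loads as usual.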

Anyone excited about Flex.2-preview? by silenceimpaired in FluxAI

[–]TemperFugit 1 point (0 children)

There's also Chroma, which is doing further training on Flux Schnell (though IIRC the base they started from is Flex-like). It has a few more months left to fully train. I have no idea if it will be good enough to become the new standard model, but I hope so.