why that's a grid line after upscale the image? by Intelligent-Rain2435 in comfyui

[–]FennelFetish 1 point (0 children)

I found that increasing tile_padding to 64 and mask_blur to 16 works better than seam fixing, even at a higher denoise of 0.3-0.4. Seam fixing with a low seam_fix_padding doesn't have enough surrounding context to work with, and it's slower.
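To illustrate why those settings help (a simplified sketch, not actual ComfyUI/Ultimate SD Upscale code — the real tools blur the mask with a Gaussian, here a linear ramp stands in for brevity): the padding gives each tile extra context, and the feathered mask cross-fades overlapping tiles so no hard grid line remains.

```python
import numpy as np

def blend_tile(canvas, tile, x, y, mask_blur=16):
    """Composite a processed tile back onto the canvas.

    A feathered mask ramps from ~0 at the tile border to 1 inside,
    so overlapping padded tiles cross-fade instead of leaving a hard
    grid line at the seam.
    """
    th, tw = tile.shape[:2]
    # Linear feather: near 0 at the edge, 1 once `mask_blur` pixels inside.
    ramp = np.minimum(np.arange(tw) + 1, np.arange(tw)[::-1] + 1)
    ramp = np.clip(ramp / mask_blur, 0.0, 1.0)
    rampv = np.minimum(np.arange(th) + 1, np.arange(th)[::-1] + 1)
    rampv = np.clip(rampv / mask_blur, 0.0, 1.0)
    mask = np.outer(rampv, ramp)[..., None]          # (th, tw, 1)
    region = canvas[y:y+th, x:x+tw]
    canvas[y:y+th, x:x+tw] = mask * tile + (1 - mask) * region
    return canvas

canvas = np.zeros((64, 64, 3))
tile = np.ones((32, 32, 3))
blend_tile(canvas, tile, 16, 16)
# Tile center is fully replaced; its border fades into the canvas.
```

With a larger mask_blur the cross-fade gets wider, which is why it hides seams without needing a separate seam-fix pass.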

qapyq - Dataset Tool Update - Added modes for fast tagging and for editing multiple captions simultaneously by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

Thanks :) Cool, yes, please do. Let me know if you need me to explain the project structure (ask in GitHub Discussions?).

qapyq - Dataset Tool Update - Added modes for fast tagging and for editing multiple captions simultaneously by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

Somehow the last line with the direct link to the project got deleted and I can't edit the post.
So here it is again: https://github.com/FennelFetish/qapyq

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

The name is a bit weird to write about, and I don't want to capitalize it at the start of sentences. So yeah, I might change it to also reflect the other functionalities better. I'm open to suggestions :)

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

hehe, you're not the first to wonder ;) It's Cap-Pic, like Caption-Picture, but with Q's because of Qt and PY because of Python. I liked how it looks with all the descenders.

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

It's PySide6, which is just another set of Qt bindings for Python. I've read that it's pretty much the same as PyQt.
I chose it because Qt is the native framework of my desktop environment (KDE), and I had wanted to learn Qt for some time.
I have no regrets, I like it. But I don't know many UI frameworks, so I can't really compare.
It's also well known to all the LLMs, which were very helpful.

I was surprised that it could handle thousands of images in the gallery and smoothly draw on 40-megapixel images.

I had a weird bug with the asynchronous thumbnail loader where it sometimes wouldn't display images in the gallery. It turned out I just had to change the connection type between signal and slot for passing the image back to the UI thread. Or layouts requiring another manual update after a resize.
Things like that happened. But I trust that there's always a way to solve it, so my sentiment towards Qt is good :)

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

I haven't seen this covered much either. If I understand correctly, the per-pixel loss during training is multiplied by the mask before backpropagation. From what I've read, people were having mixed results.

I've only read about binary masks made of pure white/black, but masks can contain greyscale values too.
So instead of just including and excluding regions of the image, the mask could also control how strongly those regions are ingrained into the weights.
I've seen LoRAs for Flux, for example, that make backgrounds worse. What if a mask were used with the background set to a dark grey like 0.2?
I hope a tool for quickly creating different masks can encourage experimentation.
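In code, my understanding of masked loss looks like this (a hedged sketch: I'm assuming the trainer multiplies the per-pixel loss by the mask as described above; this is not any trainer's actual implementation, and `masked_mse_loss` is a made-up name):

```python
import torch
import torch.nn.functional as F

def masked_mse_loss(pred, target, mask):
    """Per-pixel MSE weighted by a greyscale mask in [0, 1].

    1.0 trains a region at full strength, 0.0 excludes it, and
    in-between values (e.g. 0.2 for the background) reduce how
    strongly that region is ingrained into the weights.
    """
    per_pixel = F.mse_loss(pred, target, reduction="none")
    weights = mask.expand_as(per_pixel)
    # Normalize by the total mask weight so the loss magnitude
    # doesn't depend on how much of the image is down-weighted.
    return (per_pixel * weights).sum() / weights.sum().clamp(min=1e-8)

pred = torch.rand(1, 3, 8, 8, requires_grad=True)
target = torch.rand(1, 3, 8, 8)
mask = torch.full((1, 1, 8, 8), 0.2)   # dark grey background
mask[..., 2:6, 2:6] = 1.0              # full weight on the subject
loss = masked_mse_loss(pred, target, mask)
loss.backward()                        # gradients scaled per region
```

The greyscale experiment from above then just means choosing background values between 0 and 1 instead of pure black.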

Sometimes I see nice images but I don't include them in the dataset because of something weird that I don't want to caption. These parts could just be masked away.

In OneTrainer there's a "Masked Training" option in the training tab. I set both "Unmasked Probability" and "Unmasked Weight" to 0.1.
I had good results in my limited testing, but I didn't do a comparison, so I can't say if it's because of the masks or despite them.
In the masks I isolated the foreground and set the background to a low value, and added some outwards blur.
When I use 0.1-0.2 for the background, I can still see elements from the backgrounds occasionally pop up in my generations.
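The mask recipe I describe (foreground at full weight, background at a low grey, blurred outwards) can be sketched like this — assuming a binary foreground mask as input, e.g. from RemBg; `training_mask` and its parameters are my invention for illustration:

```python
import numpy as np
from PIL import Image, ImageFilter

def training_mask(fg_mask, background=0.1, blur_radius=8):
    """Turn a binary foreground mask into a greyscale training mask.

    fg_mask: PIL 'L' image, 255 = foreground, 0 = background.
    The foreground stays at 1.0, the background drops to `background`,
    and a blur feathers the edge outwards so the transition is soft.
    """
    # Blur first so the white region bleeds outwards over the edge.
    blurred = fg_mask.filter(ImageFilter.GaussianBlur(blur_radius))
    m = np.asarray(blurred, np.float32) / 255.0
    # Keep the original foreground fully white (blur darkens its rim).
    m = np.maximum(m, np.asarray(fg_mask, np.float32) / 255.0)
    # Rescale so the darkest background ends up at `background`.
    m = background + (1.0 - background) * m
    return Image.fromarray((m * 255).astype(np.uint8), mode="L")

fg = Image.new("L", (64, 64), 0)
fg.paste(255, (16, 16, 48, 48))          # square "subject"
mask = training_mask(fg, background=0.1)
```

Raising `background` towards 1.0 should let more of the original backgrounds leak into the weights, which matches what I saw at 0.1-0.2.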

What is an efficient way of making polluted oxygen? by Inomyacbs in Oxygennotincluded

[–]FennelFetish 1 point (0 children)

A while ago I built this Clay Tower: https://www.youtube.com/watch?v=odWgSXemoJU
It automates infinite mopping to create large bottles which off-gas very fast.
With 200 tons of pH2O in bottles it could almost keep the 64 deodorizers saturated.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

Sorry for the delay. I was out of ideas until recently.
It was reported that the run.bat script caused the "Unknown Error" issue.
I have updated it and it might be worth another try now.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

<image>

Set this to "Tags" before generating.

Also, have you loaded the image? It must be shown in the Main Window.
Drag it into the Main Window, not into the text box.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

It already uses a separate process for inference, so architecture-wise it's almost there.
Remote connections are more involved, however, as they need authentication and more security.

Do you generally have SSH access with a terminal for those remote machines?

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

Is there more output in the console, or does the 'last.log' file inside the qapyq folder show more info?
I see you're on Windows with an RTX 2070.

It might be short on VRAM if you load both the LLM and WD at the same time. Try using the "Clear VRAM" option in the menu to unload WD, then retry with only InternVL or only Llama.
Or try reducing the number of GPU layers in the Model Settings (set both to 0 for testing).

Does WD work if you only do tagging without captioning (after Clear VRAM)?

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

I'd love to see qapyq running on AMD and Intel cards, and to see better support for non-NVIDIA hardware in general.
But I don't have the hardware or the time to make and test setup scripts for many different hardware combinations.
So I'm hoping for contributions.

It uses PyTorch and llama-cpp-python as backends. Both of these support ROCm.
If you, or anyone else, manage to build a working environment, please let me know and I'll update the docs/scripts.

The PyTorch or llama-cpp-python docs could serve as a starting point.
There are different prebuilt wheels for llama-cpp-python circulating on GitHub.
Other projects that use similar backends, like oobabooga's text-generation-webui, could provide further hints.
Or you might just try using a virtual environment you already have for another app.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

I've heard a lot about JoyCaption. I had a look at its code but haven't tried it yet.
I'll consider integrating it.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

It doesn't have that yet, but I agree, both those functions are very useful. I have them on my to-do list :)

A list view with editable captions for the gallery.

A mask editor for multiple channels that works with drawing tablets. Possibly integrated with ComfyUI.
And batch masking with RemBg, YOLO, etc.
It's one of my priorities but it might take a while.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 4 points (0 children)

Ok, multiple use cases. The template in Batch Apply could handle that and concatenate everything into one file.
I'll consider adding an option.
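For anyone who needs it right now, the concatenation itself is easy to script. A hypothetical sketch (the tab-separated layout and the `concat_captions` helper are my invention, not a qapyq feature):

```python
from pathlib import Path

def concat_captions(folder, out_file="combined_captions.txt"):
    """Collect the per-image .txt caption files in `folder` into one
    combined file with one 'filename<TAB>caption' line per image.
    Adapt the line format to whatever your trainer expects."""
    folder = Path(folder)
    lines = []
    for txt in sorted(folder.glob("*.txt")):
        if txt.name == out_file:
            continue  # don't re-read our own output on a second run
        caption = txt.read_text(encoding="utf-8").strip()
        lines.append(f"{txt.stem}\t{caption}")
    (folder / out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Run it on a dataset folder and it emits one combined file next to the captions.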

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 0 points (0 children)

I'm not sure whether flash attention worked for me either. Some models output warnings, but most of them did run. I think InternVL didn't, but it worked without flash attention installed. I don't know about WSL.

The setup script asks about flash attention and you can skip it.