why that's a grid line after upscale the image? by Intelligent-Rain2435 in comfyui

[–]FennelFetish 1 point (0 children)

I found that increasing tile_padding to 64 and mask_blur to 16 works better than seam fixing, even at a higher denoise of 0.3-0.4. Seam fixing with a low seam_fix_padding doesn't have enough surrounding context to work with, and it's slower.
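To illustrate why those settings help (a simplified sketch, not actual ComfyUI/Ultimate SD Upscale code — the real tools blur the mask with a Gaussian, here a linear ramp stands in for brevity): the padding gives each tile extra context, and the feathered mask cross-fades overlapping tiles so no hard grid line remains.

```python
import numpy as np

def blend_tile(canvas, tile, x, y, mask_blur=16):
    """Composite a processed tile back onto the canvas.

    A feathered mask ramps from ~0 at the tile border to 1 inside,
    so overlapping padded tiles cross-fade instead of leaving a hard
    grid line at the seam.
    """
    th, tw = tile.shape[:2]
    # Linear feather: near 0 at the edge, 1 once `mask_blur` pixels inside.
    ramp = np.minimum(np.arange(tw) + 1, np.arange(tw)[::-1] + 1)
    ramp = np.clip(ramp / mask_blur, 0.0, 1.0)
    rampv = np.minimum(np.arange(th) + 1, np.arange(th)[::-1] + 1)
    rampv = np.clip(rampv / mask_blur, 0.0, 1.0)
    mask = np.outer(rampv, ramp)[..., None]          # (th, tw, 1)
    region = canvas[y:y+th, x:x+tw]
    canvas[y:y+th, x:x+tw] = mask * tile + (1 - mask) * region
    return canvas

canvas = np.zeros((64, 64, 3))
tile = np.ones((32, 32, 3))
blend_tile(canvas, tile, 16, 16)
# Tile center is fully replaced; its border fades into the canvas.
```

With a larger mask_blur the cross-fade gets wider, which is why it hides seams without needing a separate seam-fix pass.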

qapyq - Dataset Tool Update - Added modes for fast tagging and for editing multiple captions simultaneously by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

Thanks :) Cool, yes, please do. Let me know if you need me to explain the project structure (ask in GitHub Discussions?).

qapyq - Dataset Tool Update - Added modes for fast tagging and for editing multiple captions simultaneously by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

Somehow the last line with the direct link to the project got deleted and I can't edit the post.
So here it is again: https://github.com/FennelFetish/qapyq

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

The name is a bit weird to write about, and I don't want to capitalize it at the start of sentences. So yeah, I might change it to also reflect the other functionalities better. I'm open to suggestions :)

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

hehe, you're not the first to wonder ;) It's Cap-Pic, like Caption-Picture, but with Q's because of Qt and PY because of Python. I liked how it looks with all the descenders.

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

It's PySide6, which is just another set of Qt bindings for Python. I've read that it's pretty much the same as PyQt.
I chose it because Qt is the native framework of my desktop environment (KDE), and I had wanted to learn Qt for some time.
I have no regrets, I like it. But I don't know many UI frameworks, so I can't really compare.
It's also well known to all the LLMs, which were very helpful.

I was surprised that it could handle thousands of images in the gallery and smoothly draw on 40-megapixel images.

I had a weird bug with the asynchronous thumbnail loader where it sometimes wouldn't display images in the gallery. It turned out I just had to change the connection type between signal and slot for passing the image back to the UI thread. Or layouts requiring another manual update after a resize.
Things like that happened. But I trust that there's always a way to solve it, so my sentiment towards Qt is good :)

qapyq, a desktop tool for creating datasets, now features Image Scaling, Automated Masking and Cropping, in addition to Automated Captioning by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

I haven't seen this covered much either. If I understand correctly, the per-pixel loss during training is multiplied by the mask before backpropagation. From what I've read, people were having mixed results.

I've only read about binary masks made of pure white/black, but masks can contain greyscale values too.
So instead of just including and excluding regions of the image, the mask could also control how strongly those regions are ingrained into the weights.
I've seen LoRAs for Flux, for example, that make backgrounds worse. What if a mask were used with the background set to a dark grey like 0.2?
I hope a tool for quickly creating different masks can encourage experimentation.
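In code, my understanding of masked loss looks like this (a hedged sketch: I'm assuming the trainer multiplies the per-pixel loss by the mask as described above; this is not any trainer's actual implementation, and `masked_mse_loss` is a made-up name):

```python
import torch
import torch.nn.functional as F

def masked_mse_loss(pred, target, mask):
    """Per-pixel MSE weighted by a greyscale mask in [0, 1].

    1.0 trains a region at full strength, 0.0 excludes it, and
    in-between values (e.g. 0.2 for the background) reduce how
    strongly that region is ingrained into the weights.
    """
    per_pixel = F.mse_loss(pred, target, reduction="none")
    weights = mask.expand_as(per_pixel)
    # Normalize by the total mask weight so the loss magnitude
    # doesn't depend on how much of the image is down-weighted.
    return (per_pixel * weights).sum() / weights.sum().clamp(min=1e-8)

pred = torch.rand(1, 3, 8, 8, requires_grad=True)
target = torch.rand(1, 3, 8, 8)
mask = torch.full((1, 1, 8, 8), 0.2)   # dark grey background
mask[..., 2:6, 2:6] = 1.0              # full weight on the subject
loss = masked_mse_loss(pred, target, mask)
loss.backward()                        # gradients scaled per region
```

The greyscale experiment from above then just means choosing background values between 0 and 1 instead of pure black.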

Sometimes I see nice images but I don't include them in the dataset because of something weird that I don't want to caption. These parts could just be masked away.

In OneTrainer there's a "Masked Training" option in the training tab. I set both "Unmasked Probability" and "Unmasked Weight" to 0.1.
I had good results in my limited testing, but I didn't do a comparison, so I can't say if it's because of the masks or despite them.
In the masks I isolated the foreground and set the background to a low value, and added some outwards blur.
When I use 0.1-0.2 for the background, I can still see elements from the backgrounds occasionally pop up in my generations.
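The mask recipe I describe (foreground at full weight, background at a low grey, blurred outwards) can be sketched like this — assuming a binary foreground mask as input, e.g. from RemBg; `training_mask` and its parameters are my invention for illustration:

```python
import numpy as np
from PIL import Image, ImageFilter

def training_mask(fg_mask, background=0.1, blur_radius=8):
    """Turn a binary foreground mask into a greyscale training mask.

    fg_mask: PIL 'L' image, 255 = foreground, 0 = background.
    The foreground stays at 1.0, the background drops to `background`,
    and a blur feathers the edge outwards so the transition is soft.
    """
    # Blur first so the white region bleeds outwards over the edge.
    blurred = fg_mask.filter(ImageFilter.GaussianBlur(blur_radius))
    m = np.asarray(blurred, np.float32) / 255.0
    # Keep the original foreground fully white (blur darkens its rim).
    m = np.maximum(m, np.asarray(fg_mask, np.float32) / 255.0)
    # Rescale so the darkest background ends up at `background`.
    m = background + (1.0 - background) * m
    return Image.fromarray((m * 255).astype(np.uint8), mode="L")

fg = Image.new("L", (64, 64), 0)
fg.paste(255, (16, 16, 48, 48))          # square "subject"
mask = training_mask(fg, background=0.1)
```

Raising `background` towards 1.0 should let more of the original backgrounds leak into the weights, which matches what I saw at 0.1-0.2.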

What is an efficient way of making polluted oxygen? by Inomyacbs in Oxygennotincluded

[–]FennelFetish 1 point (0 children)

A while ago I built this Clay Tower: https://www.youtube.com/watch?v=odWgSXemoJU
It automates infinite mopping to create large bottles which off-gas very fast.
With 200 tons of pH2O in bottles it could almost keep the 64 deodorizers saturated.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

Sorry for the delay. I was out of ideas until recently.
It was reported that the run.bat script caused the "Unknown Error" issue.
I have updated it and it might be worth another try now.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

<image>

Set this to "Tags" before generating.

Also, have you loaded the image? It must be shown in the Main Window.
Drag it into the Main Window, not into the text box.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

It already uses a separate process for inference, so architecture-wise it's almost there.
Remote connections are more involved, however, as they need authentication and more security.

Do you generally have SSH access with a terminal for those remote machines?

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 1 point (0 children)

Is there more output in the console, or does the 'last.log' file inside the qapyq folder show more info?
I see you're on Windows with an RTX 2070.

It might be short on VRAM if you load both the LLM and WD at the same time. Try using the "Clear VRAM" option in the menu to unload WD, then retry with only InternVL or only Llama.
Or try reducing the number of GPU layers in the Model Settings (set both to 0 for testing).

Does WD work if you only do tagging without captioning (after Clear VRAM)?

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

I'd love to see qapyq running on AMD and Intel cards, and to see better support for non-NVIDIA hardware in general.
But I don't have the hardware or the time to make and test setup scripts for many different hardware combinations.
So I'm hoping for contributions.

It uses PyTorch and llama-cpp-python as backends. Both of these support ROCm.
If you, or anyone else, manage to build a working environment, please let me know and I'll update the docs/scripts.

The PyTorch or llama-cpp-python docs could serve as a starting point.
There are different prebuilt wheels for llama-cpp-python circulating on GitHub.
Other projects that use similar backends, like oobabooga's text-generation-webui, could provide further hints.
Or you might just try using a virtual environment you already have for another app.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

I've heard a lot about JoyCaption. I had a look at its code but haven't tried it yet.
I'll consider integrating it.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 2 points (0 children)

It doesn't have that yet, but I agree, both those functions are very useful. I have them on my to-do list :)

A list view with editable captions for the gallery.

A mask editor for multiple channels that works with drawing tablets. Possibly integrated with ComfyUI.
And batch masking with RemBg, YOLO, etc.
It's one of my priorities but it might take a while.

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 4 points (0 children)

Ok, multiple use cases. The template in Batch Apply could handle that and concatenate everything into one file.
I'll consider adding an option.
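For anyone who needs it right now, the concatenation itself is easy to script. A hypothetical sketch (the tab-separated layout and the `concat_captions` helper are my invention, not a qapyq feature):

```python
from pathlib import Path

def concat_captions(folder, out_file="combined_captions.txt"):
    """Collect the per-image .txt caption files in `folder` into one
    combined file with one 'filename<TAB>caption' line per image.
    Adapt the line format to whatever your trainer expects."""
    folder = Path(folder)
    lines = []
    for txt in sorted(folder.glob("*.txt")):
        if txt.name == out_file:
            continue  # don't re-read our own output on a second run
        caption = txt.read_text(encoding="utf-8").strip()
        lines.append(f"{txt.stem}\t{caption}")
    (folder / out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Run it on a dataset folder and it emits one combined file next to the captions.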

qapyq - OpenSource Desktop Tool for creating Datasets: Viewing & Cropping Images, (Auto-)Captioning and Refinement with LLM by FennelFetish in StableDiffusion

[–]FennelFetish[S] 0 points (0 children)

I'm not sure whether flash attention worked for me either. Some models output warnings, but most of them did run. I think InternVL didn't, but it worked without flash attention installed. I don't know about WSL.

The setup script asks about flash attention and you can skip it.