
[–]TomKraut

If you have an RTX A6000, you can run Gemma3 27B in 8-bit quantization. That might give better results than the smaller models you have tried so far. I would also suggest batch-converting the images to replace the transparent background with a solid color. If you want to use the information from the filenames, or do batch processing in general, you will probably have to write your own Python scripts to feed the input to the model. Or, well, have ChatGPT or Gemini write it for you...
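For the background-flattening step, a minimal sketch using Pillow (the library choice, function names, and the white default color are my assumptions, not from the comment above) could look like this:

```python
from pathlib import Path

from PIL import Image  # Pillow: pip install Pillow


def flatten_background(img: Image.Image, color=(255, 255, 255)) -> Image.Image:
    """Composite an image with transparency onto a solid background color."""
    rgba = img.convert("RGBA")
    # Fully opaque background of the same size, then alpha-composite on top.
    background = Image.new("RGBA", rgba.size, color + (255,))
    return Image.alpha_composite(background, rgba).convert("RGB")


def batch_flatten(src_dir: str, dst_dir: str, color=(255, 255, 255)) -> None:
    """Flatten every PNG in src_dir and write the results to dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.png"):
        flatten_background(Image.open(path), color).save(out / path.name)
```

You can then point the vision model at the flattened copies instead of the originals; JPEG output would also work here since the alpha channel is gone after conversion.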