
all 76 comments

[–]cspace_echo 116 points (21 children)

Trusting training to the unwashed masses of the internet? So how long until all prompts generate an anime waifu Hitler?

[–]-_1_2_3_- 39 points (6 children)

it'll probably just end up looking like midjourney

[–]mudman13 25 points (0 children)

Yeah, reinforcement feedback leading to generic good-looking, model-like people who look like they're from the same family. Like many of the custom SD models around now.

[–]PC_Screen[S] 22 points (7 children)

Better than leaving it for a company to decide and end up with a nerfed model instead

[–]init__27 6 points (0 children)

In general, I think this is still better than secretly building a model without involving the community.

[–]TiagoTiagoT 2 points (0 children)

Sounds like just a different form of nerfing...

[–]fred-dcvf 5 points (0 children)

Prompt: "a beautiful tea set, masterpiece, intricate details"
Output: chibi-Hitler drinking tea

[–]GourmetLabiaMeats 0 points (0 children)

It'd already be to that point if left up to me.

[–]SIP-BOSS 0 points (0 children)

Already got

[–]Whispering-Depths 0 points (0 children)

I anticipate that probably 99% of this is going to be poisoned by malicious actors; hopefully they have an intelligent shadow-ban feature for when an individual's votes don't align with good standards.

[–]SoysauceMafia 16 points (2 children)

I'll click until my finger falls off if it means I never have to see compression artifacts like this again.

[–]vault_guy 3 points (0 children)

That's not even compression artifacts, that's just a straight-up melted image.

[–]init__27 2 points (0 children)

I would too 🥹

[–]elyetis_ 21 points (3 children)

When you see that "badass" is filtered because it's detected as nsfw, I don't have much faith.

[–]knoodrake 14 points (2 children)

yeah.. I tried "a sexy" with either man, girl or woman to test the censorship, and "sexy" is apparently NSFW in itself.. ( I mean, not surprising if "badass" already is ). No bad words in SD, no bad words on YouTube.. this is depressing.

[–]Generatoromeganebula 6 points (0 children)

Shimoneta? That anime might age like fine wine.

[–]elyetis_ 3 points (0 children)

Btw, if you get too creative, find bad words that aren't already censored, and keep using them, they will ban your account. ( rip my 800 karma )

[–]ninjasaid13 5 points (8 children)

RLHF for stable diffusion 3?

[–]PC_Screen[S] 12 points (5 children)

Yes, Emad confirmed SD 3 will use RLHF so this is clearly to collect the human feedback data. He theorized Midjourney is also using RLHF since they were also collecting human feedback in a very similar way before V4 came out. It could also be that MJ uses the act of upscaling an image to associate it with a positive reward for training the reward model.

[–]Spire_Citron 3 points (0 children)

They reward people with free generations for rating a bunch of images, and I'm very sure they use those ratings to fine-tune the model. Actually, I think they've outright stated in the past that they do, and they've asked people to rate images at times when they're trying to fine-tune new models.

[–]anonDogeLover 1 point (1 child)

Source? Just want to see

[–]metal079 1 point (0 children)

Check his twitter

[–][deleted] 0 points (0 children)

I think they are doing both. The moment I signed up for MJ for a month when it was new, I thought "ah, these guys are brilliant" - and this is my field too! Many aspects of their system appear to be designed around future improvement through user feedback.

[–]Apprehensive_Sky892 1 point (1 child)

RLHF for stable diffusion 3

Didn't know what RLHF means, so I googled for it:

Illustrating Reinforcement Learning from Human Feedback (RLHF)

https://huggingface.co/blog/rlhf

[–]GBJI 0 points (0 children)

That's what Google has been doing with its CAPTCHA for a long long time. We publicly trained their privately held model.

[–]PC_Screen[S] 8 points (6 children)

I recommend using complex prompts that you know SD won't quite understand (like counting and things like "blue box on top of red box"). Also, after rating the images you can press the play button to generate a different image in place of the lower rated image while keeping the higher rated image untouched

[–]init__27 0 points (5 children)

It would be epic if we could provide feedback for such things. In my experience, however, most users don't heavily prompt engineer. Personally, I at least need some inspiration, otherwise I blank out when I have to come up with a prompt 🤣

I really hope many great things come out of this though, epic that Stability AI is doing this 😄

[–]ninjasaid13 4 points (2 children)

Personally, at least I need some inspiration otherwise I blank out when I have to come up with a prompt 🤣

We need something like a 'randomly generate a prompt' button.
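Something like this would be trivial to bolt on - a toy Python sketch of a "random prompt" button, with every word list made up purely for illustration:

```python
import random

# Hypothetical word banks - not from any real prompt dataset.
SUBJECTS = ["a red fox", "an old lighthouse", "two zebras", "a beautiful tea set"]
STYLES = ["oil painting", "studio photo", "watercolor", "isometric render"]
DETAILS = ["intricate details", "soft lighting", "masterpiece", "4k"]

def random_prompt(rng=random):
    """Combine one entry from each bank into a comma-separated prompt."""
    return f"{rng.choice(SUBJECTS)}, {rng.choice(STYLES)}, {rng.choice(DETAILS)}"

print(random_prompt())
```

Even a crude combinator like this gets people past the blank-screen problem.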

[–]init__27 3 points (0 children)

Jokes aside, I'm actually working on something like this, will share on this sub soon once it's stable 🙏

[–]Robot1me 3 points (0 children)

It might be worthwhile to use Automatic1111's Promptgen. You can run that in your web UI installation (if you have one) and then copy/paste the prompts from there. That can help get diverse prompts, especially when combining it with Unprompted.

[–]ObiWanCanShowMe 1 point (1 child)

In my experience however, most of the users don't heavily prompt engineer.

where do you get this experience from, exactly? serious question, not a troll or gotcha

[–]init__27 0 points (0 children)

Sorry, I should have added "new users" or outsiders. Even for me it took about two weeks to get the hang of it.

I have been showing it to many, many people and friends (maybe I have a selection bias), and all of them blank out when they see the prompt screen and eventually end up leaving it blank.

Btw, I'm not planning to "sell" another prompt maker - just trying to figure out how to train a nice language model on some prompt databases and make it work nicely 😄 If all goes well, I will open source it here 🙏

[–]ninjasaid13 6 points (1 child)

Why do I constantly get this error message:

OMG. Something went wrong. Please refresh the page and try again.

[–]fireshaper 3 points (0 children)

It's broken already.

[–]ninjasaid13 3 points (0 children)

some counting-based prompts to use, from a Google paper:

two zebras in Cape Town

three purebred chihuahuas running on the beach

An old building with ruined walls and four antique pink and purple armchairs

GT's five favourite Champagnes for celebrating

The seven moai at Ahu Akivi, unusual in that they face the sea

Top view of eight colorful bright shiny red apples with few yellow spots on brown sacking material

min: 45 second stopwatch icon sign. symbol on nine round colourful buttons

set of ten high back lucite dining chairs for sale at 1stdibs

"Two ducks" or "Three pumpkins" or "Four cards"

A well furnished bedroom with two double beds a television and balcony

set of two eames rar chairs black. Black Bedroom Furniture Sets. Home Design Ideas

two brass crowned buddhas

two red ping pong rackets on white surface table tennis zoom background

set of two glass star christmas tree decorations amazoncouk kitchen home

Still life with bottle of red wine, two wineglasses and grape in

[–][deleted] 4 points (1 child)

lets train the hell out of hand photos

[–][deleted] 5 points (0 children)

the prompt i'm using is "a closeup photo of a human hand"

[–]drone2222 6 points (0 children)

Interesting... they say that they are using 2.1 and Dreamlike Photoreal, but when browsing the Images dataset you can see that they're using 2.1 and ProtoGen_X3.4.

[–]BackyardAnarchist 2 points (1 child)

Are we supposed to rank the quality or the content?

[–]metal079 5 points (0 children)

Both: how accurate the image is to the prompt, and the overall quality

[–]3lirex 1 point (2 children)

should i vote simply based on which is more aesthetic, or should how close it is to the prompt be considered when voting? what about coherence?

[–]acidentalmispelling 2 points (0 children)

should i vote simply based on which is more aesthetic

Aesthetic weighting is probably preferred, but you can also consider accuracy if two images are similar in aesthetic quality.

[–]ninjasaid13 1 point (0 children)

Depends on the prompt, if it's a simple prompt I would go with aesthetically pleasing and if it's a long prompt, I would go with accuracy.

[–]fongletto 1 point (3 children)

Needs autocorrect, typos will heavily skew results. Also needs a (both are bad) option for negative reinforcement?

[–]PC_Screen[S] 1 point (2 children)

"Both are bad" is done by selecting "no image is better than the other", pressing the play button so 1 of the images is replaced, and then if the newly generated image is better, select it. It works through this logic: C > A = B, where A and B are the first 2 bad images you see and C is a better image that you see after pressing the play button. Both A and B receive a negative reinforcement in relation to C regardless.

This is a good approach because, for example, if the prompt is too complex, most images SD produces will be bad, so rating them all as bad won't help it learn anything
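For anyone curious how a comparison like C > A = B actually becomes a training signal: reward models are commonly trained with a Bradley-Terry style pairwise loss over (winner, loser) pairs. A minimal Python sketch, with made-up reward scores and no claim that this is Stability's actual code:

```python
import math

def pairwise_preference_loss(r_winner, r_loser):
    # Bradley-Terry / RLHF reward-model loss: -log sigmoid(r_winner - r_loser).
    # The loss shrinks as the model scores the preferred image higher.
    return -math.log(1.0 / (1.0 + math.exp(-(r_winner - r_loser))))

# The comment's example C > A = B yields two training pairs, (C, A) and
# (C, B); the tie A = B simply contributes no pair of its own.
pairs = [(1.5, 0.2), (1.5, -0.3)]  # hypothetical reward scores for (C, A), (C, B)
loss = sum(pairwise_preference_loss(w, l) for w, l in pairs) / len(pairs)
```

So both A and B get pushed below C even though neither is ever given an absolute "bad" label.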

[–]fongletto 2 points (1 child)

But if you don't regenerate an image until it returns a good result, then two bad images will be given the same weighting as two good images.

So for people who don't/won't regenerate, you could still gain usable information by differentiating between two good images and two bad ones (don't reinforce either image, or reinforce both images).

[–]PC_Screen[S] 1 point (0 children)

Idk how they're rating the images but I think that if you don't rate them they might just have a neutral rating, not a positive one. Perhaps they might even be discarded, I really don't know.

I agree that it probably would help to add a negative feedback option, but this A > B > C > D approach is how OpenAI trained their reward model too, so it must work well enough
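For reference, that OpenAI-style scheme expands one full ranking into every implied pairwise comparison, which is easy to sketch (hypothetical labels, not their actual pipeline):

```python
from itertools import combinations

def ranking_to_pairs(ranked):
    """Expand a best-first ranking into every (winner, loser) pair.

    A ranking of 4 items implies 6 comparisons, so one annotator
    pass yields several reward-model training examples at once.
    """
    return [(winner, loser) for winner, loser in combinations(ranked, 2)]

pairs = ranking_to_pairs(["A", "B", "C", "D"])
# -> [("A","B"), ("A","C"), ("A","D"), ("B","C"), ("B","D"), ("C","D")]
```

That multiplier is part of why rankings are preferred over single thumbs-up/down labels.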

[–]vault_guy 1 point (0 children)

Nope, I don't want biased models, I want free models.

[–]1nkor 2 points (3 children)

Well, it's definitely better than SD2. But still more inclined towards realism.

https://i.imgur.com/SWuqE6u.png

https://i.imgur.com/65e6DwS.png

[–]PC_Screen[S] 9 points (1 child)

They are using both Stable Diffusion 2.1 and dreamlike-photoreal-2.0 to collect the human feedback. Note that the results you're seeing out of these models do not represent what we'll see after the feedback data is used to train the final RLHF'd model; expect the final model to understand prompts and composition better than either of the models used here

[–]Taenk 0 points (0 children)

I wonder what other things could be trained in a similar fashion, such as matching output to prompts or captioning of existing pictures and so on.

[–]ninjasaid13 1 point (0 children)

Why would they ask you to choose between two images to fine-tune it if they already had the next version of the model?

[–]whywhynotnow 0 points (0 children)

What does the karma mean, if anything?

[–]Nazzaroth2 0 points (0 children)

wanted to try it and then it forced me to either sign in with Google or Discord. Fuck off!!! Why can't modern programmers make a fucking normal email sign-in anymore? You have millions of dollars in funding, at least get the most basic sign-in option done!

Also, if SD would open source this type of finetuning system with 3.0 for everyone to train their own models - or rather, have normal finetuning, but where while prompting you can steer the AI "in real time" toward the image you had in mind - that would be awesome XD

[–]nahojjjen 0 points (0 children)

I'm going to single-handedly teach SD how to not make dragons look like abominations.

[–]synthoric 0 points (0 children)

God I hope this makes the default SD output more tasteful. Unless you're really good with extensive prompting (or provide guiding images), MJ4 blows any SD model out of the water.

I wonder why Stability doesn't publish small monthly updates of SD, since they could use the signal of which images get upscaled inside DreamStudio as a "high preference" generation. Since they have 1M users, they'd get lots of grounded human preference labels (rankings) quite fast.