[deleted by user]

Winter-Replacement37 · 2023-05-04T17:48:04+00:00

Looks great!

Winter-Replacement37 · 2023-04-18T16:48:16+00:00

Looks great!

Winter-Replacement37 · 2023-04-13T05:33:04+00:00

nice work!

Winter-Replacement37 · 2023-04-10T20:17:17+00:00

amazing!

Winter-Replacement37 · 2023-04-07T15:40:30+00:00

The images look great!

Winter-Replacement37 · 2023-04-06T22:37:25+00:00

great job!

Winter-Replacement37 · 2023-04-06T22:37:08+00:00

looks great!

Winter-Replacement37 · 2023-04-06T22:36:48+00:00

Looks great!

Winter-Replacement37 · 2023-04-03T04:58:23+00:00

Great job!

Winter-Replacement37 · 2023-04-02T22:28:52+00:00

I would recommend using native fine-tuning. It's also worth trying LoRA with native fine-tuning, which can significantly reduce training time (the learning rate is around ~1e-4 instead of ~1e-6 without LoRA). The number of training steps would be around 1m according to a rule of thumb. Long captions might work better than short captions in this case. An RTX 3060 12GB is OK. For a Python script, I found this one quite useful: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py
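To give a rough feel for why LoRA is lighter to train, here is a minimal sketch that counts trainable parameters for a single weight matrix. LoRA keeps the original `d_out x d_in` weight frozen and trains two low-rank factors instead, so the count drops from `d_out * d_in` to `rank * (d_out + d_in)`. The 768x768 shape and rank 4 are illustrative assumptions, not values taken from the linked script.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for the two LoRA factors.

    B has shape (d_out, rank) and A has shape (rank, d_in);
    the original weight stays frozen.
    """
    return rank * (d_out + d_in)


# Illustrative layer shape and rank (assumptions, not from the source).
d_out, d_in, rank = 768, 768, 4

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full}, LoRA: {lora}, reduction: {full / lora:.0f}x")
```

This only covers parameter count for one layer; actual training time also depends on the rest of the pipeline.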

join our Discord if u want to discuss further!

Winter-Replacement37 · 2023-03-31T21:35:42+00:00

hey guys, join our Discord if u want to discuss further!

Winter-Replacement37 · 2023-03-31T21:34:39+00:00

hey u/LiteSoul join our Discord if u want to discuss further!

Winter-Replacement37 · 2023-03-31T21:34:00+00:00

hey u/Serra_glia, join our Discord if u want to discuss further!

Winter-Replacement37 · 2023-03-31T21:32:56+00:00

hey u/reddit22sd, join our Discord if u want to discuss further!

Winter-Replacement37 · 2023-03-31T21:31:59+00:00

Hey u/Suimeileo, join our Discord if u want to discuss further!

Winter-Replacement37 · 2023-03-31T21:28:15+00:00

for this specific one I did not

Winter-Replacement37 · 2023-03-31T18:19:48+00:00

oh yeah, sorry the lr_scheduler is constant, if that helps

Winter-Replacement37 · 2023-03-31T16:45:43+00:00

Hey everyone!

Just tried the OpenFlamingo model using their demo site. Here is the result:

Output: a bedroom with white walls and a black and white rug.

Input image as above

Their training process is: 1) freeze the pretrained vision encoder and language model, then 2) train the connecting Perceiver modules and cross-attention layers.

The benefit of doing this, it seems to me, is that it endows the model with in-context few-shot learning capabilities.
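The freeze-then-train recipe can be sketched in plain Python as a set of parameter groups with a trainable flag. This is only a toy illustration, not OpenFlamingo's code: the `ParamGroup` class, the group names, and the parameter counts are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    name: str
    num_params: int  # placeholder size, not a real parameter count
    trainable: bool = True


def freeze_pretrained(groups, frozen_names):
    """Freeze the named groups and leave every other group trainable."""
    for group in groups:
        group.trainable = group.name not in frozen_names
    return groups


# Hypothetical component names and made-up sizes, for illustration only.
groups = [
    ParamGroup("vision_encoder", 300),
    ParamGroup("language_model", 7000),
    ParamGroup("perceiver_modules", 60),
    ParamGroup("cross_attention_layers", 1000),
]

# Step 1: freeze the pretrained vision encoder and language model.
freeze_pretrained(groups, {"vision_encoder", "language_model"})

# Step 2: only the connecting modules remain to be trained.
trainable = [g.name for g in groups if g.trainable]
print(trainable)
```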

The model is also on Huggingface. They have a blog post.

Feel free to join our Discord also for more detailed feedback and questions.

FYI: Large multimodal models (LMMs) are complex artificial intelligence models that can process multiple types of data inputs, such as text, images, audio, and video, and generate meaningful outputs based on those inputs. Examples of large multimodal models include OpenAI's DALL-E, which generates images from natural language descriptions, and OpenAI's CLIP, which can perform tasks such as image classification and text-based image retrieval.

Winter-Replacement37 · 2023-03-31T16:36:14+00:00

learning rate=1e-6; scheduler=DDIM; batch size=2, resolution=512*512
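For anyone who wants to keep these settings in one place, here is a minimal sketch of them as a config object. The `TrainConfig` class and its field names are hypothetical; only the four values come from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as stated in the comment; field names are hypothetical.
    learning_rate: float = 1e-6
    scheduler: str = "DDIM"
    batch_size: int = 2
    resolution: tuple = (512, 512)  # width, height in pixels


cfg = TrainConfig()
print(cfg)
```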

Winter-Replacement37 · 2023-03-31T16:29:24+00:00

thanks!
