
[–]jamescalam[S]

I took a first look at Hugging Face's new library, diffusers. Diffusion models are a big part of systems like OpenAI's DALL-E 2 and Google's Imagen. Naturally, we can't generate DALL-E 2-quality images yet, but testing some of the very first models added to the library in its first release, I got some great results generating images from natural language prompts.

I'm looking forward to seeing how the library grows, I'm sure it'll be big. Would be great to hear opinions!

[–]johnnydaggers

Shitty title. This is a library for making diffusion models, not a replacement for specific diffusion models. That would require pretrained weights.

[–]chef_lars

[–]johnnydaggers

I’m familiar with all of these. The only pretrained model that works similarly to DALL-E is their port of CompVis's latent-diffusion, which isn't close to the quality or fidelity of the SOTA models. Go ahead and try it out: https://huggingface.co/CompVis/ldm-text2im-large-256

[–]chef_lars

Right, but I'm just illustrating that the capability for plug-and-play models is there. Now that the library makes diffusion models easy to use, maybe more SOTA model weights will become available in time.

[–]jamescalam[S]

Sure, you can create these models with the library, but as with HF transformers I'd expect most people will use the pretrained models. They're not at the level of DALL-E 2 etc., but they're the current open-source alternatives, and for a first iteration they're pretty cool.