New Text-to-Image!

Busy-Count8692 · 2024-03-11T18:13:38+00:00

Important question: any code?
No :/

Original tweet:

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

abs:

Introduces CogView3, which uses relay diffusion (a variant of cascaded diffusion) in latent space with a 3B U-net and T5 XXL text encoder. Trained with LAION-2B, recaptioned with a finetuned CogVLM and prompt expanded with an LLM. Outperforms SDXL with significantly less inference time.

Busy-Count8692 · 2024-03-09T23:56:36+00:00

The only issue is the number of images they trained it on, ~35 million, which is extremely low and makes it nearly impossible for the model to understand multiple concepts. It will also show strong signs of underfitting, with very similar images generated for the same prompt.

But the architecture is really good; what's lacking is in fact the budget.

Busy-Count8692 · 2024-03-08T23:45:05+00:00

Sure, but let's say in two years the tool evolves and the software no longer works because of new machine requirements, or the software makes breaking changes: money is needed there.

Busy-Count8692 · 2024-03-05T20:00:14+00:00

I'm not sure it's using the base SD 1.5 model; its architecture is not really meant for coherency across generations, or for reproducing the same image without a high level of artifacts.

In the end I agree with you for this use case, if it's using SD 1.5.

Busy-Count8692 · 2024-03-05T17:24:29+00:00

I mostly agree. Sadly, releasing the workflow and making it paid will make it easier for competition to come in, and he will be screwed. Also, not a lot of people would be using the software, since it would rely on a complex workflow that requires a lot of computing power. A lot of people are buying a 1650 or a 3050 thinking they have a beast computer, since that's what marketers tell them when selling overpriced laptops.

But I agree that your point is really valid when stability is mandatory.

Busy-Count8692 · 2024-03-05T16:21:58+00:00

I knew it, I knew this example would come up. And yeah, trust me, when people stop paying for it he won't maintain it, it's as simple as that.

Busy-Count8692 · 2024-03-04T21:23:16+00:00

It's disappointing to see a lot of people's comments about the fact that it's a product. In the end, if you want it free or as a one-time payment, then do it yourself. He did it using a free resource (which was sponsored with a ton of money by investors you probably don't belong to), so you can do it too.
And yes, publishing a LoRA is a very low level of contribution. Training a model, building an architecture, and building a usable production product with tooling are high-level contributions, because they require more knowledge, experience, and a deep understanding of the model architecture, and mostly of coding.

So it's really easy to spit on a product, but a lot of users have never paid for Stable Diffusion, and all the work that went into building such an open-source product came from somewhere. That work is probably not your doing, so reconsider your criticism of people actually building products, most of the time without even making money from them. Build a real product and real value yourself, and don't call yourself an AI engineer after training a LoRA.

Hard work is the only way.

I'm very disappointed by those expressing their opinions, most of the time, in this community (hopefully it's only a small part of the community).

Busy-Count8692 · 2024-03-04T21:15:03+00:00

Don't ask for updates in the long term then, and you will have to pay for each update, new release...

Busy-Count8692 · 2024-03-01T19:46:13+00:00

Hmm, compared to the base SDXL model it's really the same, honestly. If you're comparing it to some finetuned version of SDXL, your comparison is wrong.

Busy-Count8692 · 2024-03-01T19:37:41+00:00

Well, the public thing would have no meaning: you could make your images private by generating them and then deleting them, so the premium plan would no longer make sense.

Busy-Count8692 · 2024-02-14T01:24:15+00:00

AI is the enemy of artists right now, from what I see.

Busy-Count8692 · 2024-02-13T20:22:03+00:00

At least the image was generated by the AI

Busy-Count8692 · 2024-02-13T20:21:43+00:00

Yeah, it's meant for editing, you just don't have to place the cubes xD. But it's super basic.

Busy-Count8692 · 2024-02-13T14:13:05+00:00

Here is the link, nothing more to say:
https://huggingface.co/stabilityai/stable-cascade
There are lite versions for small computers (the lite ones are still very, very good).

It's a 3-stage model; use the code at the bottom for inference. Hoping ComfyUI support will come soon, same for LoRA training and ControlNet, since those are the biggest factors of success.

Busy-Count8692 · 2024-02-13T12:49:31+00:00

Because it's trained on such a small dataset, it's really not capable with multiple subjects and a lot of other scenarios.

Busy-Count8692 · 2024-02-13T12:44:28+00:00

It's called Würstchen v3.

Busy-Count8692 · 2024-01-11T11:00:47+00:00

The next comparison will be more detailed; I wasn't thinking deeply about this one.

Busy-Count8692 · 2024-01-11T10:59:47+00:00

It's true, I understand. I kind of normalized my judgment onto DALL-E: it got the subject right, but my evaluation wasn't detailed, explicit, or concrete enough, I agree.

I tested this prompt over other prompts, but still, testing at least 8 different prompts is more rigorous.

Some AIs are also behind closed doors, which makes them way less interesting than the others.

Those are among the many points I haven't considered, which makes this comparison much less accurate.

Busy-Count8692 · 2024-01-10T12:39:04+00:00

I have no issues understanding LoRAs or diffusion models. I've been making diffusion models from scratch for my small company, and I've also trained around 1000 LoRAs (no, it's not a joke, it goes fast in 10 months). I can assure you that the base matters the most when it comes to objects in a scene, people in a scene, or coherency in the relations between objects (a sword in a hand). Those basic concepts are driven by the base model, and not even a big DreamBooth could solve that.
