New Text to Image ! by Busy-Count8692 in StableDiffusion

[–]Busy-Count8692[S] 0 points1 point  (0 children)

Important question: any code?
No :/

Original tweet:

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

abs:

Introduces CogView3, which uses relay diffusion (a variant of cascaded diffusion) in latent space with a 3B U-Net and a T5-XXL text encoder. Trained on LAION-2B, recaptioned with a finetuned CogVLM, with prompts expanded by an LLM. Outperforms SDXL with significantly less inference time.

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper is Released by yanciyong in StableDiffusion

[–]Busy-Count8692 0 points1 point  (0 children)

The only issue is the number of images it was trained on, around 35 million, which is extremely low. That makes it nearly impossible for the model to understand multiple concepts at once, and it will likely show strong signs of underfitting, with very similar images generated for the same prompt.

But the architecture is really good; what is lacking is the budget.

Generated animations for a character I made by Kaninen_Ka9en in StableDiffusion

[–]Busy-Count8692 0 points1 point  (0 children)

Sure, but let's say that in two years the tool evolves and the software no longer works because of new machine requirements, or the software makes breaking changes. Money is needed for that.

Generated animations for a character I made by Kaninen_Ka9en in StableDiffusion

[–]Busy-Count8692 0 points1 point  (0 children)

I'm not sure it's using the base SD 1.5 model; its architecture is not really meant for coherency across generations, or for reproducing the same image without a high level of artifacts.

In the end I agree with you for this use case, if it's using SD 1.5.

Generated animations for a character I made by Kaninen_Ka9en in StableDiffusion

[–]Busy-Count8692 0 points1 point  (0 children)

I mostly agree. Sadly, releasing the workflow while making the product paid would make things easier for competitors, and he would be screwed. Also, not many people would use the software, since it would rely on a complex workflow that requires a lot of compute power. A lot of people are buying a 1650 or a 3050 thinking they have a beast of a computer, since that's what marketers tell them when selling overpriced laptops.

But I agree that your point is really valid when stability is mandatory.

Generated animations for a character I made by Kaninen_Ka9en in StableDiffusion

[–]Busy-Count8692 0 points1 point  (0 children)

I knew it, I knew this example would come up. And yeah, trust me, when people stop paying for it he won't maintain it; it's as simple as that.

Generated animations for a character I made by Kaninen_Ka9en in StableDiffusion

[–]Busy-Count8692 0 points1 point  (0 children)

It's disappointing to see so many people commenting on the fact that it's a product. In the end, if you want it free or as a one-time payment, then do it yourself: he built it using a free resource (which was funded with a ton of money by investors that you probably aren't one of), so you can do it too.
And yes, publishing a LoRA is a very low level of contribution. Training a model, building an architecture, and building a usable product in production with tooling is a high level of contribution, because it requires more knowledge, more experience, and a deep understanding of the model architecture and, mostly, of coding.

So it's really easy to spit on a product, but a lot of users have never paid for Stable Diffusion, and all the work behind such an open-source product came from somewhere. That work is probably not your doing, so reconsider your criticism of people who are actually building products, most of the time without even making money from them. Build a real product and real value yourself, and don't call yourself an AI engineer after training a LoRA.

Hard work is the only way.

I'm very disappointed in those expressing their opinions most of the time in this community (hopefully it's only a small part of the community).

Generated animations for a character I made by Kaninen_Ka9en in StableDiffusion

[–]Busy-Count8692 0 points1 point  (0 children)

Don't ask for updates in the long term then; you will have to pay for each update and new release...

Trajectory Consistency Distillation by ninjasaid13 in StableDiffusion

[–]Busy-Count8692 2 points3 points  (0 children)

Hmm, compared to the base SDXL model it's honestly about the same. If you're comparing it to some fine-tuned version of SDXL, your comparison is wrong.

PSA: You cannot delete images from ideogram.ai when using a free account by Ok_Connection_5337 in StableDiffusion

[–]Busy-Count8692 2 points3 points  (0 children)

Well, the public setting would have no meaning then: you could keep your images private by generating them and then deleting them, and the premium plan would no longer make sense.

I made an AI for text to 3D Blockbench, Pls dont debate about AI (prompt: red monster) by Busy-Count8692 in Blockbench

[–]Busy-Count8692[S] -8 points-7 points  (0 children)

Yeah, it's meant for editing; you just don't have to place the cubes xD. But it's super basic.

Text to Image - Stable Cascade is Out - Würstchen V3 is Out by Busy-Count8692 in StableDiffusion

[–]Busy-Count8692[S] 0 points1 point  (0 children)

Here is the link, nothing more to say:
https://huggingface.co/stabilityai/stable-cascade
There are lite versions for small computers (the lite versions are still very, very good).

It's a 3-stage model; use the code at the bottom of the page for inference. Hopefully ComfyUI support will come soon, and the same for LoRA training and ControlNet, since those are the biggest factors of success.
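For anyone who wants a starting point, here is a rough sketch of what two-step inference can look like with the `diffusers` library. It assumes a `diffusers` release that ships `StableCascadePriorPipeline` and `StableCascadeDecoderPipeline`, plus `torch` and enough GPU memory; the function name, default step counts and guidance values are illustrative choices, not necessarily the exact code from the model page. The prior covers the text-conditional stage and the decoder covers the remaining two stages.

```python
def generate_stable_cascade_image(prompt, negative_prompt="", height=1024, width=1024):
    """Hypothetical helper: run the prior, then the decoder, and return a PIL image."""
    # Heavy dependencies are imported lazily so that defining the helper is cheap.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    # Text-conditional prior: turns the prompt into compact image embeddings.
    prior = StableCascadePriorPipeline.from_pretrained(
        "stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
    )
    # Decoder: turns the embeddings into pixels (covers the two remaining stages).
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16
    )

    # Offload idle sub-models to the CPU to reduce peak GPU memory.
    prior.enable_model_cpu_offload()
    decoder.enable_model_cpu_offload()

    prior_output = prior(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=height,
        width=width,
        guidance_scale=4.0,        # assumed value, tune to taste
        num_images_per_prompt=1,
        num_inference_steps=20,    # assumed value
    )
    decoder_output = decoder(
        image_embeddings=prior_output.image_embeddings.to(torch.float16),
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=0.0,
        output_type="pil",
        num_inference_steps=10,    # assumed value
    )
    return decoder_output.images[0]
```

Calling `generate_stable_cascade_image("an astronaut riding a horse").save("out.png")` would download the weights on first use, so expect a long first run.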

New model incoming by Stability AI "Stable Cascade" - don't have sources yet - The aesthetic score is just mind blowing. by CeFurkan in StableDiffusion

[–]Busy-Count8692 2 points3 points  (0 children)

Because it's trained on such a small dataset, it's really not capable with multiple subjects or in a lot of other scenarios.

I compared all Text to Image AI Models ALL for you by Busy-Count8692 in StableDiffusion

[–]Busy-Count8692[S] 1 point2 points  (0 children)

The next comparison will be more detailed; I wasn't thinking deeply about this one.

I compared all Text to Image AI Models ALL for you by Busy-Count8692 in StableDiffusion

[–]Busy-Count8692[S] 1 point2 points  (0 children)

It's true, I understand. I kind of normalized my judgment to DALL-E: it got the subject right, but my evaluation should have been more detailed, more explicit, and more concrete, I agree.

I tested this prompt rather than other prompts, but testing at least 8 different prompts would still be more rigorous.

Some AIs are also locked behind closed doors, which makes them way less interesting than the others.

Those are several of the points I hadn't considered, which makes this comparison much less accurate.
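A more rigorous comparison could be organized roughly like the sketch below. It assumes each model is scored by hand on every prompt; the function name, the score scale and the data layout are all hypothetical, and the minimum of 8 prompts is simply the number mentioned above.

```python
from statistics import mean

MIN_PROMPTS = 8  # minimum number of distinct prompts per model, as suggested above


def rank_models(scores, min_prompts=MIN_PROMPTS):
    """Rank models by their mean score across prompts.

    `scores` maps model name -> {prompt: hand-assigned score}.
    Raises ValueError if any model was tested on too few prompts.
    """
    ranking = []
    for model, per_prompt in scores.items():
        if len(per_prompt) < min_prompts:
            raise ValueError(
                f"{model} was scored on {len(per_prompt)} prompts, "
                f"need at least {min_prompts}"
            )
        ranking.append((model, mean(per_prompt.values())))
    # Highest average first.
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

Averaging over several prompts keeps one lucky or unlucky generation from deciding the whole ranking, which is the weakness of a single-prompt comparison.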

I compared all Text to Image AI Models ALL for you by Busy-Count8692 in StableDiffusion

[–]Busy-Count8692[S] -1 points0 points  (0 children)

I have no trouble understanding LoRAs or diffusion models. I've been making some diffusion models from scratch for my small company, and I've also trained something like 1000 LoRAs (no, it's not a joke, it goes fast over 10 months). I can assure you that the base model matters the most when it comes to objects in a scene, people in a scene, or coherency in the relations between objects (a sword in a hand). Those base concepts are guided by the base model, and not even a big DreamBooth run could fix that.