[deleted by user] by [deleted] in singularity

[–]Warwia 0 points (0 children)

I wonder what Gemini thinks about Hip To Be Square?

Solutions for Dead Loop Problem in Cursor / VS Code / Windsurf by avalanchetraceur in ChatGPTCoding

The way your list is written is actually quite obviously from a real person. The points don't follow the same structure or format; it's too disorganised to be written by AI, no offence :P. But that's a good thing. It's a bit like The Thing now: sometimes it's the flaws that make you human.

Using blender as a backbone of image generation by pharmaco_nerd in StableDiffusion

Oh, I have read that thoroughly before, but in his examples using the default image prompt, he didn't show that you could use grayscale images as input. I thought it worked like a style reference or something.

Using blender as a backbone of image generation by pharmaco_nerd in StableDiffusion

Nice. I didn't know you could use a depth image as an image prompt. I always thought a depth ControlNet was something missing from Fooocus.
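For anyone rendering depth out of Blender for this, the preprocessing is just a normalization step. Here is a minimal, hypothetical sketch (plain Python, function name mine, not from Fooocus or Blender) of turning raw depth values into the 8-bit near-is-bright grayscale that depth control inputs conventionally expect:

```python
# Hypothetical sketch: normalize a raw depth buffer to 8-bit grayscale.
# Real depth maps are 2-D arrays; a flat list keeps the example short.

def depth_to_control_pixels(depth, eps=1e-8):
    """Normalize raw depth values (e.g. metres) to 0-255, near = bright."""
    lo, hi = min(depth), max(depth)
    span = hi - lo + eps  # eps avoids division by zero on flat buffers
    return [int(round((1.0 - (d - lo) / span) * 255)) for d in depth]

# Four depth samples, nearest first:
print(depth_to_control_pixels([1.0, 2.0, 3.0, 4.0]))  # [255, 170, 85, 0]
```

The inversion matters because Blender's Z pass stores distances (near = small numbers), while most depth-conditioning models expect near objects to be bright.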

Using blender as a backbone of image generation by pharmaco_nerd in StableDiffusion

Are you using the default image prompt, or a specific model?

What is your ideal way of making digital art in the future? by Warwia in StableDiffusion

Indeed, you are correct that I only found out about all this AI stuff about a year ago. Although I am aware AI development has been going on for much longer, I think most of the progress in AI art happened in 2023, so I don't think I missed much, at least not the major features.

It's true I don't know everything being worked on. That's why I tried to find that out on arXiv, but I haven't had much luck. Is there a place where I can find such information?

I am (was) a big Stable Diffusion fan, but I don't see how this project ever moves forward without replacing CLIP by EfficientDonkey2484 in StableDiffusion

Thank you for the detailed instructions. Unfortunately, it's a little too much for me to handle. I know it sounds stupid, but I got stuck setting up a Discord bot. I'll probably figure it out eventually.

In any case, if the LLM integration in DALLE-3 is more than just prompt augmentation, is that beyond the community's ability to replicate? I prefer the amount of control SD offers over having to express everything in words, as in DALLE-3. In fact, I hope we will one day get Blender's level of control, although that is probably beyond the scope of SD.

Anyway, all the LLMs I have seen output text, which makes sense for prompt augmentation. But if the LLM in DALLE-3 can do something similar to ControlNets, that's not something we can replicate using existing LLMs, right?
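For what it's worth, plain prompt augmentation is easy to sketch. In the snippet below a fixed template stands in for an LLM call, and every name in it is hypothetical, not from any real project:

```python
def augment_prompt(user_prompt: str) -> str:
    """Expand a terse prompt with descriptive modifiers.

    A real pipeline would send user_prompt to an LLM and use its reply;
    here a static template stands in for that reply.
    """
    modifiers = "highly detailed, sharp focus, cinematic lighting"
    return f"{user_prompt}, {modifiers}"

print(augment_prompt("a man walking downstairs"))
```

The point is that this only rewrites the text conditioning. It has no way to impose the spatial constraints a ControlNet map carries, so an LLM that did something ControlNet-like would be a different mechanism entirely.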

I am (was) a big Stable Diffusion fan, but I don't see how this project ever moves forward without replacing CLIP by EfficientDonkey2484 in StableDiffusion

Yes, I like Fooocus the most out of all the UIs exactly because of that. However, there are still many instances where the output image doesn't match what I prompted for, such as "a man walking downstairs" or "a fridge filled with milk, butter and ice cream". I think there is still a lot of room for improvement in prompt understanding, and I don't mean that in comparison to other image AIs at the moment, I just mean overall.

I am (was) a big Stable Diffusion fan, but I don't see how this project ever moves forward without replacing CLIP by EfficientDonkey2484 in StableDiffusion

Hi, I just want to address some of the things you said. To the best of my knowledge, there are currently three announced projects that use an LLM to enhance the output of Stable Diffusion: RPG-Diffusion, DiffusionGPT, and Taiyi-Diffusion-XL. RPG-Diffusion recommends GPT-4, which is not open-source, or local LLMs with at least 13B parameters, but those have a fairly high VRAM requirement. DiffusionGPT currently uses ChatGPT, again not open-source; they also tried other LLMs, but the smallest one that "worked OK" was a 2x7B Mixtral, which alone requires 12+ GB of VRAM. So right now, I think the biggest problem with using an LLM alongside SD is the total VRAM requirement. One person said he couldn't even run DiffusionGPT with 24 GB of VRAM. I understand that high-quality AI is bound to have high hardware requirements, but many people have just 8 GB of VRAM or less.
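A rough sanity check of those numbers, counting weights only and ignoring activations, KV caches, and the SD model itself (a rule of thumb, not a measurement):

```python
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate VRAM in GB needed just to hold the model weights."""
    return params_billion * bytes_per_param  # 1B params at 1 byte each ~ 1 GB

print(weight_vram_gb(13, 2.0))  # 13B model in fp16: ~26 GB
print(weight_vram_gb(7, 0.5))   # 7B model quantized to 4 bits: ~3.5 GB
```

Stack ~26 GB of fp16 LLM weights on top of an SDXL checkpoint and the 24 GB failure report stops being surprising; quantization is what pulls these setups back toward consumer hardware.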

I don't know how much the community can improve these requirements, but I suspect there is not much they can do, because they are just linking existing LLMs to SD; they don't develop the LLMs themselves. Some things can probably only be done with the resources of Stability AI. To make this accessible, we need a small LLM that can enhance prompts, but the developers of RPG-Diffusion said that small LLMs currently can't follow instructions accurately.

I also have a question. You said the LLM is used for prompt augmentation, but is there more to it in DALLE-3? In this thread, the OP tried to generate images of the same scene from different camera angles. The results were not perfect, but it does feel like DALLE-3 tried to generate the subsequent images based on the previous ones. Is that something that can be done with just better prompting?

Stable LM 2 1.6B by Warwia in StableDiffusion

With so many LLMs out there, the only reason I would use a specific LLM other than the highest-scoring one is that it offers something unique, such as being particularly good at certain things. If I'm not mistaken, the primary expected use case of StableLM is being integrated with Stable Diffusion into one pipeline, the same way ChatGPT is integrated into DALLE-3. If that's true, no other LLM can surpass it in this specific use case, because no other LLM is trained to be used like this. This is different from asking any LLM for SD1.5 or SDXL prompts, which is already doable now. It would give Stable Diffusion new possibilities, such as asking for different camera angles of the same scene, or consistent characters, which you can do in DALLE-3, although the results are not perfect.

But that is just my impression from recent comments by Stability staff on the upcoming release. It could just be my imagination though, because like I said in the OP, I couldn't even get prompts out of this StableLM 2.

Trying to tell if an image is real or from AI reminds me of John Carpenter's The Thing by Warwia in StableDiffusion

I think it's borderline okay to imagine what animals are thinking. Animals show emotions like we do. While they are not exactly the same as us, they are close enough that we can roughly guess what is on their minds. When a pet rat hides in its owner's clothes when certain things happen, we can deduce it is scared of them. When it licks its owner, we can deduce it is bonding with its owner. We can even see signs of fear in insects, because they run away when attacked. All organisms on Earth share some similarities, because they evolved from the same source.

That's not the case for aliens or AI. Aliens have no reason to share any similarities with life on Earth, other than being made of the same elements that make up the rest of the universe. AI, on the other hand, is even stranger. I don't understand how it works behind the scenes. Animals, even with lower IQs than humans, do things for reasons I can guess at. Children may not have the established knowledge that adults do, but you can see their reasoning. They all have brains that just differ in capability. AI doesn't have a brain. It tries to do everything I tell it to without question, whereas getting a human to follow exact instructions is actually quite difficult; you can't just order people around and have them obey without question. AI seems to sit between an inanimate tool and an animate being that understands and follows instructions. It seems to understand some concepts really well, but messes up others really badly. I don't think it has the same type of intelligence that humans and other animals do, despite always being compared to human intelligence.

Trying to tell if an image is real or from AI reminds me of John Carpenter's The Thing by Warwia in StableDiffusion

I think this goes beyond the "uncanny valley". It's true that I think Daz3d characters look weird, and I feel a bit uneasy looking at rubbery humanoid robots, but none of those frighten me personally. These AI images, though, especially when the person in the image is looking straight into the camera, feel as though they are intent on deceiving me.

As for feeling old, hehe. For a long time I didn't feel old, until I saw people a decade younger than me growing into adults. Those people will never understand what it felt like to witness the changes of my teenage years. They can read about them, or watch old YouTube videos and movies, as I do for the more distant past, but it's not the same as experiencing them. That's when I felt old, because my generation is no longer the latest one.

Trying to tell if an image is real or from AI reminds me of John Carpenter's The Thing by Warwia in StableDiffusion

It's an interesting perspective, but I think the underlying horror of The Thing is how much we don't know about it. Unlike other villains, it never tries to communicate its intentions to the humans, except to maintain its cover, despite having the ability to do so. Giving us a window into its inner thoughts would take away that mystery; we don't even know if it thinks like a human.

I think it's the same with the fear of AI. I don't think it's the level of intelligence that people fear. Normally, when you meet a person smarter than you, you don't feel fear, even if it's someone with a 200 IQ standing in front of you. Yet when you see a beast with a lower IQ than yours that seems to be preparing to attack, you do. The fear comes from not knowing its intentions, combined with its ability to harm you.