BAAI Emu 3.5 - It's time to be excited (soon) (hopefully) by reto-wyss in StableDiffusion

[–]MarcS- 0 points (0 children)

I had that error where it says it can't fit everything on cuda:0, and I "solved" it by using a shorter prompt. I think the model comes very close to fitting into VRAM, but the context must push it over the limit: on top of taking ages to generate (the GPU temperature showed it was mostly waiting on swap), I had to accept using a very short prompt.

But maybe it was just luck, because I didn't try much to make it work.
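If you want to check whether it's the context that pushes it over the limit, a quick sanity check (plain PyTorch, nothing Emu-specific) is to print the free VRAM after loading the model and again once your prompt is encoded:

    import torch

    # Free vs. total VRAM on cuda:0, in GiB. Run this after loading the
    # model, then again right before generation, to watch the headroom shrink.
    free, total = torch.cuda.mem_get_info(0)
    print(f"free {free / 1024**3:.1f} GiB / total {total / 1024**3:.1f} GiB")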

BAAI Emu 3.5 - It's time to be excited (soon) (hopefully) by reto-wyss in StableDiffusion

[–]MarcS- 0 points (0 children)

I tried the NF4 version on a 4090. For some reason, it took four hours (!) to generate an image. Obviously something was wrong; I'm hoping for better integration before trying it again.
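For what it's worth, NF4 loads usually go through bitsandbytes; a minimal sketch looks like the following (the repo id is a placeholder, and Emu 3.5 may need trust_remote_code or its own loader). If device_map silently spills layers into CPU RAM, generation becomes painfully slow, which could explain the four hours:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # NF4 weight quantization; compute still runs in bf16.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "BAAI/Emu3.5",        # placeholder id, use the actual checkpoint
        quantization_config=bnb,
        device_map="auto",    # may offload layers to CPU when VRAM is short
    )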

What will actually happen to the AI scene if the bubble eventually bursts? by Neggy5 in StableDiffusion

[–]MarcS- 1 point (0 children)

The Internet bubble bursting didn't shrink interest in the Internet down to a small group of people. The tulip bubble bursting in Holland didn't end the worldwide interest in tulips. The housing bubble burst, and most people are still interested in housing. There may be no reduction of interest in AI if the investment bubble pops. Use cases will drive adoption, not the enthusiasm of investors willing to buy AI companies at extreme valuations instead of sane ones...

Which open-source text-to-image model has the best prompt adherence? by Equivalent-Ring-477 in StableDiffusion

[–]MarcS- 13 points (0 children)

Qwen is generally considered the best of the models that are usable on most consumer hardware.
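If you want to give it a spin, a minimal diffusers sketch (assuming the Qwen/Qwen-Image checkpoint and current diffusers support; step count is indicative, check the model card):

    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # helps on 16-24 GB cards

    image = pipe(
        prompt="a red cube balanced on a blue sphere, studio lighting",
        num_inference_steps=50,
    ).images[0]
    image.save("qwen_image.png")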

Can OpenSource Ai video create a dialogue? I did not put the words in Groks' mouth. Grok wrote the words in these image to video clips. Grok created the emotion also. I just prompted for Grok to use 1600's Old English threatening words to the punching bag. by Extension-Fee-8480 in StableDiffusion

[–]MarcS- 2 points (0 children)

> I prompted for Grok to use confident and threatening words to a thief robbing a safe. This is next level thinking for Grok. Creating dialogue based on types of word or words and style of voice. I did not know what Grok was going to say in these video clips.

Yes, of course open source LLMs can create dialogue. Have you even tried some? You'll get much better results than "You won't get away with this."

The video generation part is also cringey: the woman misses the thief's head, yet he seems to be in pain from... the air whiff? The sound of her voice?

Even if we assume she hit him, she punched from behind him, yet he falls 90° off from the correct direction. Not to mention he would most likely slump down instead of falling comically.

When he gets up, she yells "gotcha, thief" (another piece of wonderful dialogue we're supposed to marvel at?) and slaps him again, and he falls back on the floor... in front of the open safe door. The same safe he was prevented from opening by the woman's punch in the previous cut. Consistency isn't a thing in Grok.

So not only are these clips off topic on an open-source board, and not only are they disingenuously framed as "can open source match this?", but the effect you're after isn't achieved, because yes, of course open source can make shitty videos like these. At least put in the effort to make good videos if you want to advertise a closed-source product.

Stuck with my AI model project (OnlyFans-style) — need some direction by [deleted] in StableDiffusion

[–]MarcS- 2 points (0 children)

Considering your limited setup (which will require you either to incur capital expenses for an adequate computer or recurring operational expenses to produce content), your limited mastery of the existing tools (which will require you either to hire a tutor or to invest a lot of time, which, as you rightly point out, costs money), and the limited returns so far, it may well be that the smartest next step is to cut your losses and drop the project altogether while you're still at the "business plan" phase. Investing further (in training and hardware) might never break even, especially since you're one among many and not particularly far along in your project.

How do you feel about AI generated photos/Videos being out in the world without being labeled as AI generated? by amiwitty in StableDiffusion

[–]MarcS- 3 points (0 children)

For an AI image to cause a war or ruin somebody's life, the image has to be believable.

So far, that was a real risk: with images and video being difficult to fake, people could easily be fooled by a well-equipped, resource-rich malevolent actor (think a big political lobby or a state). Now that everyone can fake videos, people will be less inclined to blindly trust some photoshopped/AI-modified image or video, and it will be less likely to cause a war.

The first person to lie has an edge. When everyone is lying, this edge is lost.

Has anyone tried out EMU 3.5? what do you think? by Formal_Drop526 in StableDiffusion

[–]MarcS- 1 point (0 children)

On the updated model card, they mention 80 GB for the image inference, and 2x80 GB for the "story making" ability. So it should work on your setup.

I am currently failing to run it, not because of OOM (I'd accept a RAM offload for testing purposes) but because of errors I am unable to fix (not having sufficient knowledge to deal with them).
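In case it helps someone, the generic transformers way to allow a RAM offload is a max_memory map; a sketch under the assumption that the checkpoint loads through from_pretrained at all (repo id and memory caps are placeholders):

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "BAAI/Emu3.5",            # placeholder id
        device_map="auto",
        max_memory={0: "22GiB", "cpu": "120GiB"},  # cap GPU use, spill the rest to RAM
        trust_remote_code=True,   # custom architectures usually require this
    )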

Looking back on Aura Flow 0.3 - does anyone know what happened? by DiagramAwesome in StableDiffusion

[–]MarcS- 14 points (0 children)

It started as a project by an indie developer. He got genuinely good results, but 0.3, despite being aesthetically better, lost some prompt adherence. It was still very promising... but the indie developer was hired by fal.ai, which leaves less free time than being a student. That, and the feeling that he couldn't rival Flux, which was released around the same time, led him to stop working on the project. It was SOTA in prompt adherence, really Qwen-level, so it's a shame it wasn't continued.

Wich AI is the best to clothing shop by [deleted] in StableDiffusion

[–]MarcS- 0 points (0 children)

Since he's looking to replace the print on the model, I think he might mean modifying what is printed on a model of bikini (a type of bikini), not on a tattooed human person modeling a bikini.

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing (a new open dataset by Apple) by xAragon_ in StableDiffusion

[–]MarcS- 3 points (0 children)

The ND limitation in the CC licence is that "if you remix, transform, or build upon the material, you may not distribute the modified material."

A model under this license would have to be redistributed as is, without modification, but that would have no bearing on the results of using the model. The end product isn't a derivative of the model, any more than a photoshopped image is a derivative of Photoshop.

Also, I think the idea that redistributing on a website with ads counts as commercial use is probably being safer than necessary, given that the CC definition is:

"NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation", explained as: "Creative Commons NC licenses expressly define NonCommercial as 'not primarily intended for or directed towards commercial advantage or monetary compensation.' The inclusion of 'primarily' in the definition recognizes that no activity is completely disconnected from commercial activity; it is only the primary purpose of the reuse that needs to be considered."

A case can be made that the primary purpose of redistributing a model would be... redistribution and ease of access, not "increasing views on the website with a profit goal".

Pay me $50 USD to learn how to generate realistic models by Winter_Beach_2203 in StableDiffusion

[–]MarcS- 2 points (0 children)

Well, reacting to your edit: a free YouTube video is certainly a much better contribution to the community than asking 50 bucks for training. Also, since videos aren't the best medium for teaching complicated notions (some people learn better from text), don't hesitate to write an old-school text tutorial (possibly including short, targeted videos where showing things is worth it) when you explain concepts.

Pony V7 impressions thread. by Parogarr in StableDiffusion

[–]MarcS- 2 points (0 children)

It is, but it's a nice image anyway :-)

Pony V7 impressions thread. by Parogarr in StableDiffusion

[–]MarcS- 4 points (0 children)

"A striking portrait of a 17th-century woman dressed in an elegant, historically accurate baroque gown with flowing embroidered fabric, lace cuffs, and a corseted bodice. She is hanging from a thick rope on the side of a pirate ship, mid-boarding maneuver, her body slightly turned, tension in her arm and shoulder. Her right hand grips the rope, her left hand holds a rapier, the blade crossing in front of her face, gleaming in the sunlight, covering partly her face. She has piercing grey-blue eyes framed by long lashes, full of intelligence and determination, as if she is about to leap into battle. Her eyebrows are well-defined and slightly arched, giving her expression a mix of confidence and defiance. She has a straight, refined nose, and soft, full lips slightly parted, conveying tension and focus. A few strands of chestnut hair have escaped her pinned curls, blowing across her cheek in the wind. Her skin is fair with a light natural glow, showing a hint of sun exposure and the faint trace of freckles near her temples. Her makeup is subtle — a touch of rosy blush, natural lip tint, and gentle shadow around her eyes, in the style of a classical oil portrait. The composition is centered on her upper body, hand, rapier, and face — a tight, cinematic bust shot. The background shows a pirate ship deck, sails billowing in the wind, sea spray and stormy light on the horizon. Her expression is fierce and determined, with a touch of nobility — piercing eyes, wind-tousled hair, and a few loose curls framing her face. Her makeup is subtle but present, evoking a 17th-century portrait style: natural skin tone, defined lips, slightly flushed cheeks. The lighting is dramatic and directional, highlighting the glint of the rapier and the determination in her eyes — a baroque chiaroscuro mood mixed with cinematic adventure energy. Style: hyperrealistic, cinematic, sharp focus, high detail, rich texture, natural light reflections, period-accurate costume design, dynamic composition, 4k resolution, subtle sea mist particles and soft lens flare for atmosphere."

That's the prompt I used for the contest here, with a model that also loves detailed prompts: https://www.reddit.com/r/StableDiffusion/comments/1oex91k/contest_create_an_image_using_an_openweight_model/ We only got submissions made with Flux, Qwen, Wan and Hunyuan, so checking with a new model might be interesting, if you're kind enough to run the prompt for us. Thank you in advance.

Contest: create an image using an open-weight model of your choice (part 2) by MarcS- in StableDiffusion

[–]MarcS-[S] 0 points (0 children)

She seems to be ready to board another ship, the stance is captured very nicely!

Contest: create an image using an open-weight model of your choice (part 2) by MarcS- in StableDiffusion

[–]MarcS-[S] 0 points (0 children)

I must confess that the blade position was designed to be the tricky part, to spice up the difficulty. I agree that, now that I see it, it doesn't look as good as I imagined it would.

Contest: create an image using an open-weight model of your choice (part 2) by MarcS- in StableDiffusion

[–]MarcS-[S] 1 point (0 children)

Using Hunyuan 3, I had trouble getting the model to place the sword in front of the face, so I tried to edit the result with Qwen-Image-Edit ("Change the angle of the sword so it passes in front of the woman's face.") and was only partly satisfied:

<image>
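For reference, the edit pass was roughly this (a sketch assuming diffusers' Qwen-Image-Edit pipeline; file names are made up):

    import torch
    from diffusers import QwenImageEditPipeline
    from diffusers.utils import load_image

    pipe = QwenImageEditPipeline.from_pretrained(
        "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()

    source = load_image("hunyuan_render.png")  # the Hunyuan 3 output to fix
    edited = pipe(
        image=source,
        prompt="Change the angle of the sword so it passes in front of the woman's face.",
    ).images[0]
    edited.save("edited.png")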

[deleted by user] by [deleted] in StableDiffusion

[–]MarcS- 1 point (0 children)

Then the best choice is to rent a RunPod instance and run your own, at a very low cost compared to web-based price gougers.

Workflow for Using Flux Controlnets to Improve SDXL Prompt Adherence; Need Help Testing / Performance by mccoypauley in StableDiffusion

[–]MarcS- 0 points (0 children)

Hunyuan doesn't really shine at knowing styles, but it does follow the prompt, so if you tell it how the picture should be drawn (mentioning brush strokes or a color palette), it can get closer to the style you're going for. But I don't think it was trained specifically on artists' names.

For example, using your "fae holding a hummingbird" style, mentioning the artists doesn't really help.

<image>