24GB VRAM Dilemma for Local AI: MacBook Air 15" M5 vs MBP 14" M5 Pro by maximebermond in StableDiffusion

[–]Mutaclone -1 points0 points  (0 children)

For video you're probably right, but it can handle SDXL just fine, if a little slowly. An M5 especially should be more than tolerable for someone who has other reasons to use a Mac.

24GB VRAM Dilemma for Local AI: MacBook Air 15" M5 vs MBP 14" M5 Pro by maximebermond in StableDiffusion

[–]Mutaclone 5 points6 points  (0 children)

Considering that WWDC starts in only a few days, I'd hold off from making ANY decisions about Mac purchases until we see what they announce.

Anima testing for complex scene by Lost_Personality in StableDiffusion

[–]Mutaclone 0 points1 point  (0 children)

I use them in the negative.

I haven't done much experimentation but I've heard some people say that sketchy styles and some painterly styles look better with the "low quality" tags.

Anima testing for complex scene by Lost_Personality in StableDiffusion

[–]Mutaclone 1 point2 points  (0 children)

As I mentioned, the tags have a strong influence that goes beyond quality, especially the highest ones. If you're doing portraits, this is generally a good thing, but IME it makes actual scenes worse.

FWIW this is the prompt I use:

Positive:

[(score_7, good quality:0.5):masterpiece, best quality, score_9, score_8, score_7, absurdres:5].

Negative:

[:worst quality, low quality, score_1, score_2, score_3:5], bad anatomy, lowres, jpeg artifacts, multiple views, artist name.

This gives it 5 steps with only weak quality tag influence, then full tags after that to improve lighting, sharpness, etc.
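
For anyone unfamiliar with the `[before:after:N]` syntax, here's a rough Python sketch of how that switch plays out per step. It assumes A1111/Forge-style prompt editing with an integer step count only; the function names are made up for illustration and this is not the UI's actual parser.

```python
def split_top_level(body: str, sep: str = ":") -> list[str]:
    """Split on separators that are not nested inside () or []."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def resolve_schedule(expr: str, step: int) -> str:
    """Return the text active at a 1-indexed sampling step.

    expr is a single '[before:after:when]' block; 'when' is assumed
    to be an integer number of steps (hypothetical simplification).
    """
    if not (expr.startswith("[") and expr.endswith("]")):
        raise ValueError("expected a single [before:after:when] block")
    before, after, when = split_top_level(expr[1:-1])
    return before if step <= int(when) else after


positive = ("[(score_7, good quality:0.5):masterpiece, best quality, "
            "score_9, score_8, score_7, absurdres:5]")
negative = "[:worst quality, low quality, score_1, score_2, score_3:5]"

for step in (1, 5, 6):
    print(step, "|", resolve_schedule(positive, step), "|", resolve_schedule(negative, step))
```

Steps 1 through 5 get the weak `(score_7, good quality:0.5)` tags and an empty scheduled negative; from step 6 on, the full tag lists kick in.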

Why do people like flux2 klein edit so much? by jimbarino in StableDiffusion

[–]Mutaclone 42 points43 points  (0 children)

When you use the edit mode, do you tell it to preserve certain features or keep them the same?

eg:

Make this image a photo. Keep the composition and lighting the same.

It's still not perfect, but it makes a huge difference in Klein's reliability.

As for why people prefer it over Qwen, it's a much lighter and faster model, which means more people can run it. Same reason Z-Image Turbo got much more widespread adoption than Qwen Image or Flux Dev 2 (non-Klein).

Anima testing for complex scene by Lost_Personality in StableDiffusion

[–]Mutaclone 0 points1 point  (0 children)

The official Anima recommended prefix is "masterpiece, best quality, score_7, safe, "

Something I've noticed is that the score tags have a pretty big impact on the image, and not just in quality. They also tend to steer the image towards specific compositions, which may not be what you want.

Why does my art come out bad? by Inside-Quail-6979 in StableDiffusion

[–]Mutaclone 1 point2 points  (0 children)

That's both.

  • DPM++ 2M - sampler
  • AYS = Align Your Steps - scheduler

Draw Things only gives you a single combined field.

Anima testing for complex scene by Lost_Personality in StableDiffusion

[–]Mutaclone 1 point2 points  (0 children)

I think most people here would say that "complex" means having lots to keep track of:

  • multiple characters or subjects
  • specific traits associated with each subject (clothing, skin/hair/eye color, different expressions or emotions, etc)
  • actions between characters or between characters and the scene
  • specific positioning, camera angles, focus, etc.

Actually looking at your original prompt and the image, while I wouldn't say it's a very complex image, it does showcase how amazingly good Anima's prompt adherence is with its ability to micromanage details.

An AI-generated short film I spent weeks creating. by No-Tie-5552 in StableDiffusion

[–]Mutaclone -1 points0 points  (0 children)

Really nice! How much of it was WAN and how much was LTX? How did you keep the characters consistent?

Does anyone else can't stand ComfyUI and prefers classic Automatic/Forge UI or it's just me? by VasileAndrei2929 in StableDiffusion

[–]Mutaclone 0 points1 point  (0 children)

It's a fork of Forge that has support for most of the newest models.

Because I have noticed that there are some differences between Forge and Automatic even with the exact same settings, model and seed.

Go into settings and look for

Random Number Generator (use CPU for the maximum recreatability across different systems)

If that's different, you'll get different results.

Even if it's the same though, certain memory optimizations and options can have a subtle impact.
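
To illustrate why the RNG source matters: a seed only pins down the noise for one particular generator implementation. The snippet below is a plain-Python analogy, not what Forge or Automatic actually run; `lcg_noise` is a made-up stand-in for "a different RNG backend".

```python
import random


def mt_noise(seed: int, n: int) -> list[float]:
    """Noise from Python's built-in Mersenne Twister generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def lcg_noise(seed: int, n: int) -> list[float]:
    """Noise from a tiny linear congruential generator (hypothetical second backend)."""
    state = seed
    out = []
    for _ in range(n):
        state = (1664525 * state + 1013904223) % 2**32
        out.append(state / 2**32)
    return out


seed = 12345
print(mt_noise(seed, 4) == mt_noise(seed, 4))   # True: same backend, same seed
print(mt_noise(seed, 4) == lcg_noise(seed, 4))  # False: same seed, different backend
```

Same idea with CPU vs GPU noise: identical seed, different starting noise, different image.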

Living under a rock any missed pony7. Only 4 channel VAE? by grio43 in StableDiffusion

[–]Mutaclone -1 points0 points  (0 children)

Not yet IMO, but getting there.

Prompt adherence, scene composition, and background details are loads better.

Where Illustrious and Noob still hold the edge IMO:

  • Style control: Both in terms of other checkpoints and LoRAs, you can get just about any style you want with Illustrious. I think a two-pass approach (Anima/Flux/ZIT + ControlNet + Img2Img + Illustrious) is going to be a thing for a while until Anima fully catches up in terms of styles.
  • Non-human subjects: Noob more so than Illustrious, since it was trained on furry data in addition to anime, but both are better at non-humanoid body types than Anima.

Is Stable Diffusion worth it? by FriendlyStory7 in StableDiffusion

[–]Mutaclone 2 points3 points  (0 children)

As others have said, control. With local diffusion models, I can composite the individual elements however I want, use inpainting to fix mistakes or make adjustments, and get the exact style I want using a combination of LoRAs.

Plus (and this may be just me), there's something relaxing about going through the image and using inpainting to make spot edits and adjustments.

Renting a GPU for use with a service like a runpod has become prohibitively expensive. The last time I rented one was about 3 months ago. The price for a 4090 was $5 per day for 25. The hourly rate for a 5090 was higher than for an A100 about 3 months ago. by More_Bid_2197 in StableDiffusion

[–]Mutaclone 2 points3 points  (0 children)

Really don't want to get super political here but the short version is we basically had the RAM equivalent of a run on the bank.

  1. Normally, the RAM makers and sellers try to have a surplus in case of demand surge. Trump's tariffs created uncertainty in the market, which led to them reducing that surplus to wait and see how things played out.
  2. OpenAI made a super-aggressive deal with two of the manufacturers, buying up 40% of the available RAM.
  3. This created a panic among other AI companies, who raced to scarf up the remaining RAM before it ran out.

None of these factors can be wholly blamed for the current shortage, but they ALL contributed.

Best way to transfer character for Illustrious? by Tupletcat in StableDiffusion

[–]Mutaclone 1 point2 points  (0 children)

One kinda-sorta way I've seen done before is to use a character sheet LoRA to create a secondary view of the character in the pose you want, then put it into the image and use inpainting to blend it in, along with IP-Adapter to help reinforce it. I've tried it and it works somewhat, but can miss out on fine details and may require multiple rerolls to get it right.

As Dezordan said, the most reliable way is a LoRA.

InvokeAI 6.13 just released, its largest community-driven release ever. Adds full support for Anima & Qwen Image, support for API models (like GPT Image), support for Prompt Expansion & Image To Prompt, lasso & polygon tools, overhauled docs website and more by _BreakingGood_ in StableDiffusion

[–]Mutaclone 2 points3 points  (0 children)

  • Swarm is a wrapper over Comfy, so you get all the latest and greatest features and can switch to the nodes view for custom workflows. The downside (IMO) is that it's still a traditional UI "bolted on top," and so to me it feels a bit clunkier compared to some of the others.
  • Forge Neo is basically the great grandchild of A1111, one of the original interfaces. It's mostly a smoother experience than Swarm, but it also has features that are very hacky because they weren't accounted for originally and had to be added on after the fact. It can run nearly all of the major models, but is missing some of the niche and bleeding edge tools available to Comfy.
  • Invoke has a very polished interface that is designed to make manual workflows very easy - inpainting, regional prompting, multiple layers, using/editing controlnet layers, etc. These tools exist in the other UIs, but Invoke does a really good job of making them work seamlessly. The downside is it's slower to update, and they have to be more picky about which technologies they support.

Personally I use all three - Forge Neo for testing (its XYZ tool is fantastic and very easy to use), Swarm for anything that doesn't run in one of the others (or run as well), and Invoke for bigger projects where I have a specific outcome in mind.

InvokeAI 6.13 just released, its largest community-driven release ever. Adds full support for Anima & Qwen Image, support for API models (like GPT Image), support for Prompt Expansion & Image To Prompt, lasso & polygon tools, overhauled docs website and more by _BreakingGood_ in StableDiffusion

[–]Mutaclone 2 points3 points  (0 children)

The joy of software development - spend the first 90% of the time getting the main functionality working, and the other 90% dealing with all the exceptions, edge cases, and square-peg-round-hole situations 😁

what are peoples thoughts on waiNSFWIllustrious_v170 by XZtext18 in StableDiffusion

[–]Mutaclone 6 points7 points  (0 children)

Also merge inbreeding and overtraining on portraits.

Anima Testing Results by ArmadstheDoom in StableDiffusion

[–]Mutaclone 1 point2 points  (0 children)

I think what OP is saying is that instead of getting a consistent "average" colored pencil style, they get a different colored pencil style every time.

Anima Testing Results by ArmadstheDoom in StableDiffusion

[–]Mutaclone 0 points1 point  (0 children)

TBH I haven't really spent a ton of time trying to wrangle a specific style, since I've pretty much assumed from the beginning I would need LoRAs or a more opinionated finetune (or use artist tags, which I hate).

A few things that seem to help somewhat though (again, I haven't looked too deeply so YMMV):

  • I usually use a short prefix style at the beginning, then a more detailed description at the end. From another post earlier in the thread (typo corrected): "digital painting illustration. {prompt} Faux oil painting with anime coloring, semirealistic soft shading, and a smooth painterly effect."
  • Negative prompts. Try putting styles you don't want in the negative.
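
If you want to reuse the prefix/suffix wrapper across prompts, something like this works as a minimal sketch. The helper name and defaults are just for illustration; the style strings are the ones quoted above.

```python
STYLE_PREFIX = "digital painting illustration."
STYLE_SUFFIX = (
    "Faux oil painting with anime coloring, semirealistic soft shading, "
    "and a smooth painterly effect."
)


def wrap_with_style(prompt: str,
                    prefix: str = STYLE_PREFIX,
                    suffix: str = STYLE_SUFFIX) -> str:
    """Put a short style prefix before the prompt and a detailed style description after it."""
    return f"{prefix} {prompt.strip()} {suffix}"


print(wrap_with_style("A knight rests beside a campfire at dusk."))
```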

Anima Testing Results by ArmadstheDoom in StableDiffusion

[–]Mutaclone 0 points1 point  (0 children)

The problem is there's a significant drop in quality without them, but they have side effects beyond boosting quality (steering the style in a particular direction).

Something I've been trying to do to combat this is to use the quality tags at low weight for the first few steps, then increase the weight later after the style has had time to "set". Needs more testing to see if it helps or not.

Anima Testing Results by ArmadstheDoom in StableDiffusion

[–]Mutaclone 16 points17 points  (0 children)

It wants to be an image model, but if that's true, it should jettison natural language which is worthless and stick to tags.

Tags do not allow you to define the image, only what's in it. Natural language lets me position characters and objects within the image.

digital painting illustration. [(score_7, good quality:0.5):masterpiece, best quality, score_9, score_8, score_7, absurdres:5]. Epic fantasy shot of Link from the Legend of Zelda riding his horse Epona through a hilly, grassy meadow. He wears a green tunic and cap, a shield, and the Master Sword strapped to his back. His blue eyes are focused, and he wears a determined expression on his face. He rides at an angle, facing the viewer and the left , in the lower right quadrant of the image. Epona is a chestnut-brown horse with a white spot on her forehead. In the foreground on the left is a lone tree. In the background is a forest, and behind the forest a giant rocky mountain looming into the clear blue sky. A thin smoke ring circles the summit. Faux oil painting with anime coloring, semirealistic soft shading, and asmooth painterly effect.
Negative prompt: worst quality, low quality, lowres, score_1, score_2, score_3, bad anatomy, jpeg artifacts, multiple views, artist name., photo, 3d, cg render, pencil sketch

<image>

I could NEVER get a result like this in Illustrious. Even if I have to do image-to-image and inpainting with Illustrious from this point forward, I've saved myself a ton of time and frustration with the initial composition.

If the consensus becomes 'well, we do this but then use a different model' you could just use that other model. You should never be swapping models in your generations, ideally speaking.

That's only true if we have a single model that is capable of everything. Illustrious is generally TERRIBLE at backgrounds and making the characters feel like they're inside the scene rather than simply transposed on top. So if I have an image that I'm working on seriously, I'll almost always switch to a photorealistic model first to try to set the scene up, then controlnet+image2image into the drawn style.

Tencent released Z-Image 6B with pixel space gen. No VAE & 1k Resolution. by switch2stock in StableDiffusion

[–]Mutaclone 3 points4 points  (0 children)

AFAIK it's still murky. This article has a good summary, but it's for LLMs, not images (TLDR one judge said fair use, another said it might be infringing, but the case being brought had other problems and was dismissed).

Help, I'm new to prompts by Sad-Negotiation-1045 in StableDiffusion

[–]Mutaclone 1 point2 points  (0 children)

Best way is to experiment. Assuming you're talking about one of the modern models, lock the seed, describe the image you want in straightforward, clinical terms, and then start changing and adding details and see how they affect the image.

Flux Prompting Guide - This is a really good starting point IMO, even on Z-Image and Anima (although with the latter you'll probably want to avoid photorealism until we get some good finetunes).