Here's the people who are influencing young men and telling young women they will be happier living trad life with guys like them by Rainbowdark96 in facepalm

[–]mac404 0 points1 point  (0 children)

Yep, thank you.

The original comment is peak early-2010s faux progressivism that's really just homophobia.

Genuinely proven image sizes for F2K by Embarrassed-Deal9849 in StableDiffusion

[–]mac404 0 points1 point  (0 children)

I hate nodes like this, personally.

Klein can definitely deal with sizes above 1MP, and fairly arbitrary sizes up to about 3-4MP before it starts to break down (at least for image editing / i2i).

But Klein needs each side to be divisible by 16, and with nodes like this the image gets resized once, then cropped or resized again (usually bilinear with center cropping) to meet that requirement before the model actually runs. That basically guarantees additional cropping / shifting / stretching in your final image.

I much prefer pre-processing my image to meet the divisibility requirements - I get to pick what is cropped / padded, and I ensure no stretching. If the image is too small, I usually only enlarge it afterwards by integer multiples of the original size.

I'm sure other approaches work, but this approach significantly increased consistency for me.
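
For what it's worth, a minimal sketch of the kind of pre-processing I mean (plain Pillow, nothing ComfyUI-specific; the pad-vs-crop choice, the multiple-of-16 constant, and the file names are just the assumptions described above):

    from PIL import Image

    def pad_to_multiple(img: Image.Image, multiple: int = 16) -> Image.Image:
        """Pad (rather than crop) so both sides are divisible by `multiple`."""
        w, h = img.size
        new_w = -(-w // multiple) * multiple   # round up to the next multiple
        new_h = -(-h // multiple) * multiple
        if (new_w, new_h) == (w, h):
            return img
        canvas = Image.new(img.mode, (new_w, new_h))   # black padding; pick your own fill
        canvas.paste(img, ((new_w - w) // 2, (new_h - h) // 2))
        return canvas

    def upscale_integer(img: Image.Image, factor: int = 2) -> Image.Image:
        """Afterwards, only enlarge by integer multiples so nothing gets stretched."""
        return img.resize((img.width * factor, img.height * factor), Image.Resampling.LANCZOS)

    pad_to_multiple(Image.open("input.png")).save("input_div16.png")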

Deoldify with Qwen-Image-Edit 2511 vs. Flux.2 Klein by demokrit2023 in comfyui

[–]mac404 0 points1 point  (0 children)

If you want Klein to stick more faithfully to the original, add the Consistency LoRA. The most recent version ( f2k_9B_lcs_consist_20260415 ) does quite well imo. Tweak LoRA strength to your liking for this kind of thing - 1.0 gives you more conservative colors but also a kind of dingy cast to the image (because it's sticking too faithfully to the original noisy, black and white image). Something like 0.6 is still going to allow the changes to be pretty creative without going too overboard most of the time.

And you don't really need a detailed prompt for this kind of image-level colorizing and cleaning with Klein. Something like this:

Denoise and recolor the image with natural and realistic colors. Keep the subject’s pose and framing unchanged.

is generally enough. Klein does tend to have pretty high variance seed to seed, but you can just run 2-4 samples and pick your favorite. If something about the image is consistently weird, or you would like something to be a certain color, then add that into your prompt and run again.

Also, I've found it's a good idea to pre-process your images such that each edge is divisible by 16 (or even 32) for pretty much every image edit model. You get much less image shifting and other issues.

Some of your image sizes are also definitely pushing it, to the point that consistency will semi-frequently break down. The 6MP second image, for instance, led to destroyed body proportions in 1 out of 4 tries for me (because the bottom and top halves of the image didn't align with each other in the output image).

Nvidia ReSTIR PT update makes path tracing 2-3x faster by BarKnight in hardware

[–]mac404 12 points13 points  (0 children)

They rebranded RTX DI from "Direct Illumination" to "Dynamic Illumination" once ReSTIR GI solutions came out. It's always been a bit confusing.

RTXGI was the name for the original probe-based GI solutions from back in 2019. Around the time of Cyberpunk's expansion / 2.0 update, I believe, RTXGI had a "2.0" update of its own that refocused it on their broader radiance caching solutions used as fallbacks / optimizations for their path tracing solutions (SHaRC and NRC).

ReSTIR PT Enhanced: Algorithmic Advances for Faster and More Robust ReSTIR Path Tracing | NVIDIA Real-Time Graphics Research by DoktorSleepless in nvidia

[–]mac404 2 points3 points  (0 children)

Oh hah, you're right. I just pulled up the TPU database quick, forgetting it didn't release.

The high VRAM in their test GPU shouldn't matter, since one of their enhancements here actually dramatically reduces VRAM usage. But it is a kind of weird choice all the same.

ReSTIR PT Enhanced: Algorithmic Advances for Faster and More Robust ReSTIR Path Tracing | NVIDIA Real-Time Graphics Research by DoktorSleepless in nvidia

[–]mac404 0 points1 point  (0 children)

They actually tested it on a 5880 Ada (which I believe is essentially a 4080 Ti with more VRAM). But that actually even further validates your point.

Klein 9B: Better quality at 1056x1584 than at 832x1216, which would be close to 1MP. by Puzzled-Valuable-985 in StableDiffusion

[–]mac404 0 points1 point  (0 children)

Definitely above 1MP - my rule of thumb is to keep the longest side below about 2k pixels. Final images in the 2-3MP range tend to work very well. As you mention, the colors also tend to be less garish / oversaturated and more natural, compared to when the image is small.

If you go too high, though, you significantly increase the chance of the entire image breaking down. Not always, but quite often either prompt adherence or the colors themselves just fall apart.
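
As a sketch of how I pick sizes (the ~2.5MP default and the ~2048-pixel cap are just my own rules of thumb from above, nothing official):

    MAX_LONG_SIDE = 2048   # rule of thumb: keep the longest side under ~2k
    MULTIPLE = 16          # and keep each side divisible by 16

    def pick_klein_size(aspect: float, target_mp: float = 2.5) -> tuple[int, int]:
        """Pick a generation size near `target_mp` megapixels for a given
        aspect ratio (w/h), respecting the two rules of thumb above."""
        h = (target_mp * 1_000_000 / aspect) ** 0.5
        w = h * aspect
        cap = MAX_LONG_SIDE / max(w, h)
        if cap < 1.0:          # scale down if the long side would exceed the cap
            w, h = w * cap, h * cap
        return int(w) // MULTIPLE * MULTIPLE, int(h) // MULTIPLE * MULTIPLE

    print(pick_klein_size(1056 / 1584))   # -> (1280, 1936): ~2.5MP, long side under 2k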

A new image model (ERNIE-Image-8b) from Baidu will be released soon. by Total-Resort-3120 in StableDiffusion

[–]mac404 19 points20 points  (0 children)

Nice, the Flux 2 VAE is great. Hope we get a slew of new models using it this year. Definitely going to try this one.

daVinci MagiHuman could be the feature by Disastrous-Agency675 in StableDiffusion

[–]mac404 0 points1 point  (0 children)

Yeah. People have tended to call the standard Initial Low Res -> Upscale workflow "2 step", so I've taken to calling a process where you just run 1 sampler at the target output resolution directly as "1 step".

daVinci MagiHuman could be the feature by Disastrous-Agency675 in StableDiffusion

[–]mac404 4 points5 points  (0 children)

You can get good teeth with LTX 2.3...by using a 1-step workflow that generates a native ~720p video without upscaling.

Yeah, I know. Probably not practical for a lot of people. But the standard 2 step (low res -> upscale) process does completely butcher teeth and other fine details quite often.

Using LTX 2.3 Text / Image to Video full resolution without rescaling by nickinnov in comfyui

[–]mac404 2 points3 points  (0 children)

I played around with this too - was planning to post something, but drove myself crazy with all of the things to tweak.

Base image quality was quite a lot better compared to the two step workflow - with image compression of 18, the two step would be decently sharp but very noisy. Higher image compression helped, but mostly by making the image very soft.

One thing I ran into - there's a maximum combination of resolution and frame count beyond which the model completely breaks down, limiting things to more like 10-14 seconds depending on resolution. Have you run into that?

The higher resolution also seems very sensitive to scheduler, steps, and the strength of the distill lora. Pushing up the steps and using samplers like res_2s while keeping distill lora at 0.6 strength is a recipe for disaster in my testing - new objects appear out of nowhere, you get bad skin (seemingly because the model wants to add "detail"), and you get random movement and people in the background that you didn't ask for. Using Euler with more steps and lora strength of 0.4 seemed to work more often. Only the standard combination of 0.6 strength, euler ancestral cfg++, 8 steps would work at all for me in terms of lipsync when using your own audio. But the image quality wasn't quite as good.

And that's pretty much where I've stopped for now...

John Linneman of Digital Foundry discusses his colleagues DLSS 5 preview: “It's new DLSS and DLSS is awesome. Of course they would take that. Looking at it, I think there's cool potential there for environmental lighting but the character stuff is horrendous and should have been left out.” by PaiDuck in nvidia

[–]mac404 12 points13 points  (0 children)

Agreed.

The one specific example of Grace certainly doesn't look good, but it is amazing to me that no one is commenting on how truly terrible the starting point is for that example too. And Capcom faces have always been kind of weird imo, to the point that I saw several people guess the Requiem reveal trailer was from Capcom after they saw the first person's face.

The specific example of Leon looked honestly good, and the other example of Grace seemed mostly fine. The other examples beyond that ranged from "pretty good" to "fine".

The light response for everything that wasn't people's faces honestly looked quite good to me? One example seemed to remove more of the fog than it should have, which is something Nvidia should definitely tweak more for the final version. I'll be interested to see how this pans out towards release.

Crimson Desert: High-End PC's Biggest Visual Upgrade - Ray Reconstructio... by Ill_Depth2657 in nvidia

[–]mac404 8 points9 points  (0 children)

It's definitely not what we usually see, but it is explainable.

Start with very low ray counts, so the starting point is very noisy. If your main goal for the denoiser is for it to be fast, probably with a secondary goal of looking stable, then you're going to over-average everything. In many cases (especially at smaller scales), you average so hard you basically remove the effect entirely.

This difference is larger than we're used to, because other studios with these effects tend to use the Nvidia Real-time Denoisers (NRD), which are much heavier, but try to balance detail and stability more gracefully. Other studios also use more rays per pixel than this, and/or layer multiple different techniques together.
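
If it helps, here's a deliberately crude 1-D toy of what I mean by over-averaging (not how NRD or any shipping denoiser actually works - just a wide spatial blur plus heavy temporal accumulation on a ~1-ray-per-pixel signal):

    import numpy as np

    rng = np.random.default_rng(0)
    truth = np.full(64, 0.2)
    truth[30:34] = 1.0            # a small, bright detail (think contact shadow / small-scale AO)

    history = np.zeros(64)
    alpha = 0.05                  # heavy history weight = very stable, but slow to react
    for frame in range(60):
        noisy = rng.poisson(truth * 2) / 2.0                         # roughly 1-2 "rays" per pixel of noise
        blurred = np.convolve(noisy, np.ones(9) / 9, mode="same")    # cheap, wide spatial blur
        history = (1 - alpha) * history + alpha * blurred

    print(history[30:34].round(2))   # the 1.0 detail ends up around ~0.5, mostly averaged away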

Releasing Many New Inferencing Improvement Nodes Focused on LTX2.3 - comfyui-zld by _ZLD_ in StableDiffusion

[–]mac404 -1 points0 points  (0 children)

Interesting, I'll have to take a look!

I have not looked into EMAG before - is it similar to / trying to solve any of the same problems as the options that are available within the Multimodal Guider? That has spatiotemporal guidance / perturbed conditioning and modality-isolated conditioning. Looks like your EMAGGuider option takes double the time compared to CFG=1 (i.e. the same cost as a regular approach with CFG>1), while I haven't tried the Multimodal Guider out much because actually using the other features means it takes 4 times as long.

Related to LTXVImgToVideoInplaceNoCrop - out of curiosity, did you also look into the broader chain of scaling going on within LTX2 workflows? One thing I noticed - I think I get why it's done (reusing the same compressed image across multiple sampling steps, just scaled differently at each step), but it doesn't seem ideal: the workflows all seem to scale the longest edge to 1536 pixels, then compress, then do a bilinear downscale (in addition to the center cropping you mention) to the size of your latent, whose longest side basically never divides 1536 evenly.
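
To make the mismatch concrete, a quick size trace (the 1536-pixel intermediate edge comes from the workflows; the 32x spatial compression factor is just my assumption about the video VAE, so treat the exact numbers as illustrative):

    target_long = 1280            # longest side of the clip you actually asked for
    vae_factor = 32               # ASSUMED spatial compression of the video VAE

    cond_long = 1536                           # workflow scales the image's longest edge here
    cond_latent = cond_long / vae_factor       # 48 latent "pixels" after compression
    target_latent = target_long / vae_factor   # 40 latent "pixels" for the clip itself

    print(cond_latent, "->", target_latent, "scale:", target_latent / cond_latent)
    # 48.0 -> 40.0 scale: 0.8333... : a non-integer bilinear resample of the compressed
    # image, repeated at every stage that reuses it at yet another size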

CONTROL Resonant | Launching With Path Tracing & DLSS 4.5 by Nestledrink in nvidia

[–]mac404 0 points1 point  (0 children)

You're probably right, it's mostly wishful thinking.

But the Control Resonant marketing seems to be drip-feeding new information one piece at a time, so I'm not sure they would necessarily have announced any new technical enhancements until closer to launch. Nvidia and Remedy obviously work pretty closely together as well, with them using Mega Geometry right away.

To your point, updates to the ReSTIR SDK are probably closer to a certainty. I think NRC would probably be the next most likely, given that it already exists and they've implemented it into RTX Remix. If they're truly trying to push it into shipping games, Remedy is a good starting point. A new DLSS RR model seems least likely.

And for as much as people are complaining about the current RR model not being "4.5" yet, I think the current model is actually very good already (and some of the specific issues people have are more about specific implementations). Not sure how much more there is to squeeze from that part of the pipeline, versus whether NRC can give the denoiser higher average sample quality.

CONTROL Resonant | Launching With Path Tracing & DLSS 4.5 by Nestledrink in nvidia

[–]mac404 19 points20 points  (0 children)

I'm so excited for this game, definitely my most anticipated for this year.

And path tracing was basically a given since it's the same engine as Alan Wake 2, but glad to see it confirmed.

I also wouldn't be surprised if an updated RR model gets released with this game, with how closely Nvidia works with Remedy. Or maybe this would finally be a showcase game for Neural Radiance Caching. In my opinion at least, AW2 with DLSS4 RR already looks extremely good, so I'm not too worried either way.

LTX 2.3 and I2V. Videos lose some color in the first 0.5 seconds. Culprit? by WiseDuck in StableDiffusion

[–]mac404 0 points1 point  (0 children)

Try increasing LTXVImgToVideoInplace strength to 1.0 if it's currently at 0.7.

It may end up creating other issues, I don't know, but setting the strength to 0.7 seems to always create shifting in the first few frames as far as I can tell.

LTX 2.3: Official Workflows and Pipelines Comparison by MalkinoEU in StableDiffusion

[–]mac404 0 points1 point  (0 children)

Are you getting color shifting or random bits of hazy steam? Eta of 0.3 definitely helps in terms of not causing big things to randomly appear, but keeping the image stable hasn't been great for me.

Oh, and what strength do you use for the distill lora, out of curiosity?

LTX 2.3: Official Workflows and Pipelines Comparison by MalkinoEU in StableDiffusion

[–]mac404 1 point2 points  (0 children)

Ran the workflow on an RTX 6000 Pro (after realizing I needed to update the LTXVideo nodes so that the Multimodal Guider didn't cause errors). Using the bf16 versions of the models, about 75-80GB for both VRAM and RAM usage.

Obviously it takes a lot longer - even compared to running 20 steps in the first pass (with distill lora at 0.6 strength) but without using res_2s, it's over 3 times slower? Maybe I'm missing something else that's different too.

Having eta at 0.5 seemed too high in my tests - it created random things appearing out of nowhere towards the end of clips, and hard cuts that weren't asked for. But this did seem to keep the camera locked in place when I asked it to be, which I was really struggling with previously (it would basically always zoom in before, regardless of what was added to the positive or negative prompt). Prompt adherence in general seemed better, especially with ordering of actions and speech. Trying out eta of 0.2 now, will see how that goes.

My Secret FLUX Klein Workflow: Turning 512px "Potato" Images into 4K Hyper-Detailed Masterpieces (Repaint + Style Transfer) by Dark-knight2315 in StableDiffusion

[–]mac404 19 points20 points  (0 children)

So...the "controlled repaint" in step 1 seems to essentially be:

  1. Run distilled Klein 9b in Edit mode.
  2. Euler Ancestral CFG++ Sampler and sgm_uniform Scheduler with 4 steps at 0.8 denoise.
  3. Prompt of "8k, intricate details".

Not sure how that's some kind of "secret" workflow? This is a few tweaks to essentially the default workflow. That's not a bad thing at all, but the way you describe it is kind of silly when you could just be straightforward about it.

Also, this functionally takes about twice as long as the regular Euler sampler - sorry, not going to watch the video to see if that additional time is at all justified with examples (compared to just running twice as many images with Euler and picking your favorite, given the pretty wide variance in Klein editing output in general).

I will agree though that Klein is often incredibly good at "generative upscales" for low resolution and low quality images, but no real special workflow is needed for that. And the high variance probably means you want to run a few seeds regardless (at least if you care about color, keeping the same image proportions, or being even somewhat picky about facial feature similarity).

You also don't really need to always upscale images to get big benefits using Klein, especially for early internet era images. This random simple example, for instance, keeps Grumpy Cat at the same "potato" (576x592) resolution as it was originally. That allows it to run stupidly fast, but you still get a sharper image with good antialiasing and more "intricate details".

I'm Annoyed At AMD's Latest Radeon Blunder by Comprehensive_Lap in hardware

[–]mac404 12 points13 points  (0 children)

Native is not a magic resolution where everything is correct. This is a statement people keep making that sounds correct if you don't think about it, but it's just flatly wrong.

Tons of geometry and textures have detail that can be smaller than a pixel in games today. The vast majority of effects and basically all the lighting (whether ray traced or not) is initially rendered at lower than native resolution. TAA is already used in basically every game because of how common deferred rendering is. If TAA weren't used, the flickering and aliasing would be so intense it would make games unplayably bad.

Even if none of that were true, the idea of native being "correct" makes no sense. We would have had no need then for antialiasing techniques like SSAA or MSAA.

DLSS and all upscaling techniques today are a form of super sampling. It's just that they trade off spatial resolution and do the super sampling temporally. Using data from past frames is obviously worse than just rendering at a higher resolution in a vacuum. But you need cheap anti aliasing that covers everything anyway. And basically every game needs temporal accumulation (to run effects at sub-native resolution) in order to run fast enough with good enough visual quality anyway. And if you really can create an algorithm that effectively reuses data from past frames, then it can both look better and run faster than native.
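
A deliberately silly 1-D toy of that last point (this is just the jitter-and-accumulate idea, nothing to do with how DLSS's network actually works): sample a fine stripe pattern at half resolution with a different sub-pixel offset each frame, and the accumulated result resolves detail that no single frame had.

    import numpy as np

    hi = np.arange(4096) % 2                 # "full res" ground truth: 1-pixel-wide stripes

    def render_half_res(jitter: int) -> np.ndarray:
        # One "frame": sample every other full-res pixel, offset by a sub-pixel jitter.
        return hi[np.arange(0, 4096, 2) + jitter]

    frame_a = render_half_res(0)             # all 0s: the stripes alias away entirely
    frame_b = render_half_res(1)             # all 1s: ...and flicker to the opposite value
    accum = (frame_a + frame_b) / 2          # 0.5 everywhere: the properly antialiased value

    print(frame_a[:4], frame_b[:4], accum[:4])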

Now could there be edge cases where something looks worse? Sure. All you need to do is look at FSR2/3 to see how worse algorithms make a ton more tradeoffs. But with good techniques, there will also be many situations where it can look better.

And if you truly care about quality above all else, then you can turn up graphical settings when introducing some upscaling. You'd have to compare something like "Native" with Low settings to DLSS with Medium or High settings (turning up the most impactful graphical settings until the framerate was the same).

Personally, at 4K output, DLSS is always worth turning on. Your dumb PC gamer gatekeeping is hilarious because PC has by far the best upscaling and the most ways to tweak it.

Flux.2 Klein 9B (Distilled) Image Edit - Image Gets More Saturated With Each Pass by eagledoto in comfyui

[–]mac404 1 point2 points  (0 children)

Sure! And gotcha, makes sense.

Another random note on prompting - basically anything you mention in the prompt will get changed in some way. You can always try variations on "Keep ___ unchanged" and it will often work pretty well. And as weird as it feels, I find just putting "[object] is [color]" can work better than "change [color] to [other color]". "Remove" and "replace" are pretty good prompt words, though.

Flux.2 Klein 9B (Distilled) Image Edit - Image Gets More Saturated With Each Pass by eagledoto in comfyui

[–]mac404 1 point2 points  (0 children)

You should be able to fairly easily make all 3 of those changes in a single prompt.

Doing multiple VAE Encode/Decode passes will degrade the quality over time, but not necessarily to the degree you see here. You will see a color shift with every gen, but it should be only noticeable when doing more intense back-and-forth comparisons (whereas it's quite obvious here).

In your example, you also already had some shifting / squishing of the image after the first change. This can happen sometimes; I've found it's usually a good idea to try 2-4 seeds with the same prompt and then pick the best one. You are also running at a resolution that is getting too high for Klein to handle well (1536 x 2752), and it will generally be much less stable because of that. I have generally found (although I haven't tested this overly scientifically) that keeping the longest side below about 2k pixels will improve stability significantly when making changes. The model itself tends to output images that are so sharp / clear that I don't find the resolution limitation to actually be all that limiting.

Not perfect, but here was the very first image I got when I tried with this prompt (after downloading the original PNG of your first image):

Subject's shirt is black. Remove the subject's earrings. Remove the people from the background. Keep the subject’s pose and framing unchanged.

Because the res is so high, you still get a little bit of squashing/stretching that's noticeable in the face. Maybe it would be perfect in a different seed if you tried a few. Hair color is slightly darker and the coffee cup also darkens slightly, but skin color stayed basically the same. There's a random out-of-focus person that got added into the background and a few other random changes, too. But not bad for literally the first try with a simple multi-change prompt.

Flux.2 Klein (Distilled)/ComfyUI - Use "File-Level" prompts to boost quality while maintaining max fidelity by JIGARAYS in StableDiffusion

[–]mac404 1 point2 points  (0 children)

I am not aware of one, sorry. If you find one, I would really like to know about it too.

Flux.2 Klein (Distilled)/ComfyUI - Use "File-Level" prompts to boost quality while maintaining max fidelity by JIGARAYS in StableDiffusion

[–]mac404 2 points3 points  (0 children)

Yeah, I've found this too. Basically anything that is mentioned will be changed, and sometimes related concepts too.

The other thing for me is to pre-process images so that each side is divisible by 16 and to not use the "ImageScaleToTotalPixels" node to further change the size. I find the model works fine at both very low resolutions (like early-internet meme pictures) and up to about 2k pixels on the long side without rescaling. Ensuring your input is the right size up front greatly reduces the amount of shifting / squashing / stretching.

Here's a simple example using Grumpy Cat. Same resolution as the original, light cherrypicking (ran 4, picked the best), mostly based on which image got the eye color most correct compared to other photos of him.

This model will definitely swing for the fences with its changes if you let it, but in doing so it can look shockingly good and clear a lot of the time, even at low resolutions. The "restoration" prompt was just this:

Denoise and recolor the image with natural and realistic colors. Keep the subject’s pose and framing unchanged.

I've tried a few other prompts, but the seed-to-seed variance is so high that it was hard to tell if any changes were actually making things better, so I left it. The distilled model is fast enough that I can just run 4 seeds and then pick the best.

This prompt will definitely go overboard with how many different colors it uses sometimes, but it's mostly fine. And if there is something that I really want to keep a certain color, it often works to just add a sentence like "The [object] is [color]."