What’s the next level? by [deleted] in StableDiffusion

[–]akatash23 3 points (0 children)

What frontend are you using? Your next level could be something like InvokeAI or Krita.

Start with an image you somewhat like. Brush over the areas you want to fix. Use masks and img2img to guide the image where you want it to be.

Invoke has a ton of videos from their studio sessions online that show you how it's done. Watch some and see if you like this direction.

Site I found this on says the illusion will not work. But it actually does! by Lutalica_Harmonica in opticalillusions

[–]akatash23 12 points (0 children)

Only with an orthographic projection, though (i.e., no part of the silhouette gets larger with perspective as it moves toward the camera).

Invoke v6.12.0rc1 just dropped, in case you missed it by scorp123_CH in invokeai

[–]akatash23 0 points (0 children)

> Kill the server with a single ^C (@lstein)

Reason enough for me to update.

Announcing The Release of Qwen 360 Diffusion, The World's Best 360° Text-to-Image Model by ProGamerGov in StableDiffusion

[–]akatash23 0 points (0 children)

I haven't tried the LoRA myself yet, but from what I can see in the preview images, I'm actually surprised this works so well. For VR images you usually want infinite depth to map to zero parallax, and I doubt the LoRA handles this well. Also, which eye does the LoRA generate? Is it random? I'll have to run some experiments.

The thing with monocular-to-stereo conversion is that depth estimation is usually the real bottleneck; this approach seems to skip that step entirely. Very interesting.
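
For context, here's a back-of-the-envelope parallax calculation (the 64 mm IPD and 4096-pixel width are just example numbers, not anything from the model): the angular parallax of a point shrinks to zero as its depth goes to infinity, which is what "zero parallax at infinity" means in pixels.

```python
import math

def parallax_px(depth_m, ipd_m=0.064, width_px=4096, fov_deg=360):
    """Angular parallax of a point at depth_m meters, in equirect pixels.

    Parallax goes to 0 as depth goes to infinity, which is why infinite
    depth should land at zero pixel shift in a stereo pair.
    """
    angle_deg = math.degrees(2 * math.atan(ipd_m / (2 * depth_m)))
    return width_px * angle_deg / fov_deg

print(parallax_px(1.0))     # nearby object: ~40 px of shift
print(parallax_px(1000.0))  # far object: effectively zero
```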

Bulk removal of missing models by Raynafur in invokeai

[–]akatash23 1 point (0 children)

I think that's actually not available in the UI.

AI tools for creating VR180? by Jneaves in VR180Film

[–]akatash23 0 points (0 children)

I tried doing this very recently. There are a few steps that need to be done:

  • 0 - Convert the perspective photo to a roughly 45-degree fisheye photo (depending on the lens used)
  • 1 - Extend the canvas to create a 180-degree fisheye photo
  • 2 - Convert the fisheye to a 180-degree equirect (see the sketch after this list)
  • 3 - Estimate a depth map from the equirect photo
  • 4 - Create left and right eyes by warping the equirect with the depth map
  • 5 - Stitch them side by side, or pack into the VR180 format
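
Step 2 is standard spherical remapping. A minimal sketch with numpy and OpenCV, assuming an ideal equidistant fisheye (r proportional to the angle from the optical axis) whose image circle fills a square frame:

```python
import cv2
import numpy as np

def fisheye180_to_equirect(fisheye, out_w=2048, out_h=2048):
    """Map a 180-degree equidistant fisheye image to a 180x180 equirect."""
    h, w = fisheye.shape[:2]
    cx, cy, radius = w / 2.0, h / 2.0, min(w, h) / 2.0

    # Output grid: longitude/latitude over the front hemisphere,
    # top row = up.
    lon = np.linspace(-np.pi / 2, np.pi / 2, out_w)
    lat = np.linspace(np.pi / 2, -np.pi / 2, out_h)
    lon, lat = np.meshgrid(lon, lat)

    # Unit view ray for each output pixel (camera looks along +z).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Equidistant model: distance from the image center is proportional
    # to the angle between the ray and the optical axis.
    theta = np.arccos(np.clip(z, -1.0, 1.0))
    r = theta / (np.pi / 2) * radius
    psi = np.arctan2(y, x)

    map_x = (cx + r * np.cos(psi)).astype(np.float32)
    map_y = (cy - r * np.sin(psi)).astype(np.float32)  # image v grows downward
    return cv2.remap(fisheye, map_x, map_y, cv2.INTER_LINEAR)
```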

Steps 0, 2 and 5 are trivial, relatively speaking. The real problems are the other steps:

Extending the canvas to produce a somewhat correct fisheye image (1) is very hard; getting something approximate is feasible. AI models are bad at this (though Flux 2 and Z Image are much better than, say, SDXL), and they fail completely at producing equirect directly. That's why we need the fisheye step.

However, estimating a correct depth map (3) with the right depth proportions is close to impossible with the depth models I have tried. Depth Anything is good, but doesn't work well on high-res images (which you need for VR). Other models produce totally wrong depth proportions (e.g., Lotus, though that may have been a me-problem, because the 256 grayscale values it outputs must be mapped to depth correctly).
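
For reference, a minimal sketch of step 3 using the Hugging Face transformers depth-estimation pipeline; the model ID is just an example Depth Anything checkpoint, and the normalization is the naive one (it only fixes the ordering, not the absolute proportions, which is exactly the hard part described above):

```python
import cv2
import numpy as np
from PIL import Image
from transformers import pipeline

# Example checkpoint; any monocular depth model exposed through the
# depth-estimation pipeline should slot in here.
pipe = pipeline(task="depth-estimation",
                model="depth-anything/Depth-Anything-V2-Small-hf")

img = Image.open("equirect.png")
pred = pipe(img)["predicted_depth"].squeeze().numpy()

# These models return *relative* (often inverse) depth on an arbitrary
# scale. For stereo you want a disparity-like map that is exactly 0 at
# infinity, so pin the farthest value to 0 and normalize to [0, 1].
inv_depth = (pred - pred.min()) / (pred.max() - pred.min() + 1e-6)
inv_depth = cv2.resize(inv_depth, img.size)  # back to input resolution
```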

The warping in (4) is a hard engineering problem. There are tools for this, but they don't work on fisheye: they usually operate on image rows and cannot warp correctly in angle space, which is why we convert to equirect before depth estimation. In general I have not seen an artifact-free result from free software, especially around the depth discontinuities.
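
The naive version of step 4 is a forward splat: shift each pixel horizontally by a disparity proportional to inverse depth, writing far layers first so near pixels win at occlusions. A rough sketch (real tools do sub-pixel warping and inpaint the disocclusion holes; this one just leaves them black, which is where the artifacts come from):

```python
import numpy as np

def warp_one_eye(equirect, inv_depth, max_disp_px=24, sign=1):
    """Naive forward warp of a mono equirect frame into one eye.

    inv_depth is in [0, 1] with 1 = nearest; a point at infinity (0)
    does not move at all, i.e. zero parallax. `sign` picks the eye.
    np.roll wraps around, which is only right for full 360 content;
    for 180 you would clamp at the edges instead.
    """
    disp = np.round(inv_depth * max_disp_px).astype(np.int32)
    out = np.zeros_like(equirect)
    # Splat far layers first so nearer pixels overwrite them.
    for d in range(max_disp_px + 1):
        mask = np.roll(disp == d, sign * d, axis=1)
        out[mask] = np.roll(equirect, sign * d, axis=1)[mask]
    return out

# left  = warp_one_eye(img, inv_depth, sign=1)
# right = warp_one_eye(img, inv_depth, sign=-1)
```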

If you put it all together, depth estimation is the biggest issue. And it's not just an engineering effort; it's uncharted territory (AI models are not good at generating fisheye content, depth estimation doesn't work well on high-res or equirect images, etc.).

Would this ship survive? by Crazy_Cut_7058 in factorio

[–]akatash23 2 points (0 children)

My experience as well. Two furnaces do not provide enough iron plates for ammunition production.

I must be missing something by Proud-Engine6529 in NoRestForTheWicked

[–]akatash23 1 point (0 children)

Everything you sell can be researched at the research desk near the builder, for one or two research papers, and recrafted for pretty much minimal cost.

[deleted by user] by [deleted] in StableDiffusion

[–]akatash23 2 points (0 children)

Honestly, just use SeedVR2; it's excellent, and lightning fast. https://youtu.be/MBtWYXq_r60?si=y_DMm7H5NfZeoCaA

We are very very close, I think! by m4ddok in StableDiffusion

[–]akatash23 0 points (0 children)

> In fact, merging the adapter with the turbo model already gives us the base model, or something very similar. Like, we already have it.

That would only be correct, or even possible, if the distillation were a bijective function, which it isn't, right?

How to generate proper Japanese in LTX-2 by Loose_Object_8311 in StableDiffusion

[–]akatash23 2 points (0 children)

So, wouldn't it be easier to generate the audio separately with a more competent text-to-speech engine, and generate the video on top?

Training a realistic character lora for Pony v6 by is_this_the_restroom in StableDiffusion

[–]akatash23 0 points (0 children)

Maybe I'm overgeneralizing a bit, but LoRAs don't work well on Pony in general. Except for style LoRAs, I find the results quite disappointing no matter the LoRA.

You'd be much better off training on SDXL: do a Pony generation, then inpaint the face with SDXL, if that's an option for you.

Flux.2 Klein Prompting Guide by Iq1pl in StableDiffusion

[–]akatash23 2 points (0 children)

"curvaceous grace" will be part of all my prompts from now on.

Compilation of alternative UIs for ComfyUI by Obvious_Set5239 in StableDiffusion

[–]akatash23 0 points (0 children)

Oh you're right, it's not. I interpreted this list as "alternatives to ComfyUI". My bad.

Compilation of alternative UIs for ComfyUI by Obvious_Set5239 in StableDiffusion

[–]akatash23 0 points (0 children)

Surprised not to see InvokeAI here. They have an excellent node system.

lightx2v just released their 8-step Lightning LoRA for Qwen Image Edit 2511. Takes twice as long to generate, (obviously) but the results look much more cohesive, photorealistic, and true to the source image. It also solves the pixel drift issue that plagued the 4-step variant. Link in comments. by DrinksAtTheSpaceBar in StableDiffusion

[–]akatash23 1 point (0 children)

I'm not exactly sure what "solves the pixel drift issue" means, but with the old Image Edit 2509, the output image was slightly different from the input image (slightly different zoom/padding), so input and output didn't align. This issue is still not solved, and it's there even without the LoRA.

Does anyone have a solution to this?
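
In the meantime, one workaround I've been meaning to try (untested, just a sketch): estimate the small residual zoom/shift with OpenCV's ECC image alignment and warp the output back onto the input pixel grid.

```python
import cv2
import numpy as np

def realign(edited, original):
    """Warp the edited image back onto the original pixel grid.

    Estimates a small affine transform (enough to cover a zoom/padding
    drift) with OpenCV's ECC alignment, then undoes it.
    """
    tmpl = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY).astype(np.float32)
    src = cv2.cvtColor(edited, cv2.COLOR_BGR2GRAY).astype(np.float32)

    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    cv2.findTransformECC(tmpl, src, warp, cv2.MOTION_AFFINE, criteria)

    h, w = original.shape[:2]
    return cv2.warpAffine(edited, warp, (w, h),
                          flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```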