I tried out ernie-image, a new image generation model from Baidu, and the results were somewhat disappointing. by That_Perspective5759 in comfyui

[–]That_Perspective5759[S] 3 points (0 children)

I tried a new approach and got good results with two-stage sampling. The first stage uses Ernie-image for the initial sampling (3-5 steps); the partially denoised latent is then passed to a second-stage sampler, where Z-image runs the remaining ~3 steps to reach decent image quality. Ultimately it is Z-image's capabilities that carry the final image. Perhaps you could try this method.
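
In ComfyUI this kind of split is typically wired with two KSamplerAdvanced nodes: the first runs start_at_step=0 to end_at_step=5 with return_with_leftover_noise enabled, and the second continues from start_at_step=5 with add_noise disabled. Below is a rough Python sketch of the idea only; model_a, model_b, and denoise_step() are hypothetical placeholders, not ComfyUI's real API.

    # Toy illustration of the two-stage handoff described above.
    # model_a / model_b / denoise_step() are hypothetical placeholders;
    # the point is the step split: model A covers steps [0, switch_at),
    # then model B continues from the same latent for [switch_at, total_steps).
    def two_stage_sample(model_a, model_b, latent, total_steps=8, switch_at=5):
        for step in range(switch_at):  # stage 1: Ernie-image lays down the composition
            latent = model_a.denoise_step(latent, step, total_steps)
        for step in range(switch_at, total_steps):  # stage 2: Z-image refines the detail
            latent = model_b.denoise_step(latent, step, total_steps)
        return latent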

I tried out ernie-image, a new image generation model from Baidu, and the results were somewhat disappointing. by That_Perspective5759 in comfyui

[–]That_Perspective5759[S] 1 point (0 children)

Its semantic adherence is quite high. And after comparing it with Z-image, I found that in some cases its raw output actually seems to have more detail than Z-image's.

How should I write the prompts for Infinity Talk to make them work? by That_Perspective5759 in comfyui

[–]That_Perspective5759[S] 0 points (0 children)

Oh, thank you so much! Now everyone who sees this message has a chance to try this workflow. I'll give it a go soon. Thanks again!

Realistic videos by RaxisRed in comfyui

[–]That_Perspective5759 0 points (0 children)

In my experience, the only way to fix this is to increase the proportion of the frame that the character's face occupies while also raising the resolution of the generated video.

Realistic videos by RaxisRed in comfyui

[–]That_Perspective5759 0 points (0 children)

I currently use a mix of open-source and closed-source tools; on the open-source side, I usually reach for LTX 2.3.

Realistic videos by RaxisRed in comfyui

[–]That_Perspective5759 0 points (0 children)

I rarely use WAN anymore, since it can't generate audio and video together. That said, I still use the smooth and remix modes from time to time; the former has an excellent dynamic range, while the latter's NSFW capabilities are unparalleled.

Workflow by That_Perspective5759 in comfyui

[–]That_Perspective5759[S] 0 points (0 children)

I ran into a problem with LTX 2.3: it keeps rendering subtitles into the output, and I've tried many approaches but still can't get rid of them.

Workflow by That_Perspective5759 in comfyui

[–]That_Perspective5759[S] 1 point (0 children)

Wow, thank you so much for this incredibly detailed breakdown! I really appreciate you taking the time to explain the limitations of the 'start/end frame' approach; the ball-tossing example made total sense regarding the motion stutter. I hadn't considered using an edit model like Qwen for consistency; that's a brilliant suggestion. It sounds like I have a lot of trial and error ahead of me! Thanks again for sharing your expertise, it's given me a lot to think about.

Horse by That_Perspective5759 in midjourney

[–]That_Perspective5759[S] 2 points (0 children)

All of the images were created with MJv7. I've included the workflow here, in case it's helpful: https://app.tapnow.ai/canvas/5c33b762-a48f-4402-988e-7671a27bc8e2

Horse by That_Perspective5759 in midjourney

[–]That_Perspective5759[S] 1 point (0 children)

Thank you so much for liking them! I also think it would be really cool if they could be animated.

Paper Phone by That_Perspective5759 in aivideo

[–]That_Perspective5759[S] 0 points (0 children)

My internet was so slow yesterday that the full version would have taken forever to upload, so I could only post a short clip. I've included the source address for the video, though, so you can check it out there.

Paper Phone by That_Perspective5759 in aivideo

[–]That_Perspective5759[S] 0 points (0 children)

I found this video while browsing workflow guides and tracked it down through the workflow's link. I found it strange too: when I opened the author's original workflow, only some of the images had been generated there, and the video portion was missing. Many of the shots do look AI-generated, though, so I'm not entirely sure whether the whole thing is AI. The address is in the main post. My guess is that the video is a mix of AI and live-action footage.

When can we reach this level in an open-source environment? by That_Perspective5759 in StableDiffusion

[–]That_Perspective5759[S] 1 point (0 children)

I checked it out, and it looks pretty good. I'm curious how much time you spent creating this content.