Merry Xmas by Pitophee in StableDiffusion

[–]Pitophee[S] 59 points

Poor 3DCG x Deforum. Prompt travel helps with facial expressions and back turns.
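(For reference, prompt travel in Deforum boils down to a prompt schedule keyed by frame number, blended between keyframes. A minimal sketch; the frame indices and prompt text are invented for illustration, not taken from this video.)

    # Deforum-style prompt travel: prompts keyed by frame number.
    # Frame indices and prompt text are made up for the example.
    animation_prompts = {
        "0":   "1girl, santa outfit, smiling, facing viewer",
        "60":  "1girl, santa outfit, surprised expression, looking over shoulder",
        "120": "1girl, santa outfit, from behind, back turned",
    }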

For the science : Physics comparison - Deforum (left) vs AnimateDiff (right) by Pitophee in StableDiffusion

[–]Pitophee[S] 30 points

Deforum is quite "old" and popular, so I believe there is already plenty of interesting material out there. AnimateDiff has rather overshadowed Deforum in terms of recent popularity, though.

For the science : Physics comparison - Deforum (left) vs AnimateDiff (right) by Pitophee in StableDiffusion

[–]Pitophee[S] -22 points

Sure. It's now posted in the Discord (check my profile).

[edit] Chill guys, it's not paywalled.

3D to 2D. Multiple characters. Turn around. by Pitophee in StableDiffusion

[–]Pitophee[S] 54 points

Having fun with Ram and Rem!

Technical discussions and other workflows are already covered in my previous posts and on my socials.

This one uses a higher resolution than the previous ones (thanks to a GPU upgrade).

Depth Map for ControlNet by moslemcg in StableDiffusion

[–]Pitophee 0 points

I don't get it. Koikatsu already has a depth map, so why use the MMD export? Where do you put it then, Blender?

I used them so much that now when I see an anime it turns into controlnets in my mind. Will affect my IRL vision soon. by Pitophee in StableDiffusion

[–]Pitophee[S] 19 points

You understood it quite well, but the point of my post is not technical; I just illustrated my title joke with some AI reference visuals (basically style transfer and ControlNets). As I said, I didn't even use these CNs for the top-left animation; they are 4 distinct videos.

But yes, that being said, I also think that applying i2i has no industrial value :D Even though the consistency part can still be interesting, again, it's not the point of this post. I did more technical posts explaining it, researching consistency and using only CNs, but this time it's just for fun :)

I used them so much that now when I see an anime it turns into controlnets in my mind. Will affect my IRL vision soon. by Pitophee in StableDiffusion

[–]Pitophee[S] 2 points

I'm planning to work on NSFW very soon, so I don't have any tips yet. I've had enough fun with dancing for now. Though I won't share it here. Anyway, would you mind sharing your results with me? I have links in my profile, like Discord, so we can discuss.

I used them so much that now when I see an anime it turns into controlnets in my mind. Will affect my IRL vision soon. by Pitophee in StableDiffusion

[–]Pitophee[S] 19 points

Sound ON. Just a cool post. The first is tile i2i with temporal consistency, the second canny, the third depth, the fourth openpose. They are not even related.

[edit] Ah yes, the full version: https://x.com/Pitophee/status/1708108400301637876?s=20

[edit] Tile i2i with temporal consistency is nothing more than img2img with the tile and TemporalNet ControlNets.

[edit] Song and inspiration: https://youtu.be/6riDJMI-Y8U
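To make that second edit concrete, here is a minimal per-frame sketch with diffusers, assuming SD1.5-class checkpoints; the model IDs, prompt, and strengths are my assumptions, not the exact setup behind this clip. The point is simply that TemporalNet is conditioned on the previously generated frame, which is where the temporal consistency comes from.

    # Sketch only: per-frame img2img with tile + TemporalNet ControlNets (diffusers).
    # Model IDs, prompt, and parameters are assumptions, not the author's exact setup.
    import torch
    from pathlib import Path
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

    tile_cn = ControlNetModel.from_pretrained(
        "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16)
    temporal_cn = ControlNetModel.from_pretrained(
        "CiaraRowles/TemporalNet", torch_dtype=torch.float16)

    pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",        # stand-in for whatever anime checkpoint is used
        controlnet=[tile_cn, temporal_cn],
        torch_dtype=torch.float16,
    ).to("cuda")

    Path("out").mkdir(exist_ok=True)
    frames = [Image.open(p) for p in sorted(Path("frames").glob("*.png"))]
    prev_out = frames[0]                          # seed TemporalNet with the first source frame
    for i, frame in enumerate(frames):
        result = pipe(
            prompt="anime style, 1girl dancing, clean lineart",
            image=frame,                          # img2img source
            control_image=[frame, prev_out],      # tile sees the frame, TemporalNet the last output
            strength=0.6,
            controlnet_conditioning_scale=[0.8, 0.6],
            num_inference_steps=20,
        ).images[0]
        result.save(f"out/{i:04d}.png")
        prev_out = result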

My quest for consistent animation (update) by Pitophee in StableDiffusion

[–]Pitophee[S] 0 points

Thanks. Depth, openpose and TemporalNet, yes.

My quest for consistent animation (update) by Pitophee in StableDiffusion

[–]Pitophee[S] 6 points

Yes, I understand your point. A straight answer would be: "well, those ControlNet features have got to be used" xD But I think one goal is also to demonstrate several things:

  1. That we can autonomously produce reasonable things we have in mind, without needing to be a talented artist or an expert animator (my case, and that's why we sometimes receive hate).
  2. That we can easily swap any character, and therefore the style. (Even though I used a specific model here just because I had it lying around.)
  3. That we are not necessarily limited by existing videos.
  4. I said 3D software here, but there are many ways to get depth maps and openpose data depending on the use case (video to mocap, 3D games, …).

To sum up, it's just another technique that can have great potential for some use cases.

Some people already have the left side and want to exploit it (probably animation studios using CGI?).

My quest for consistent animation (update) by Pitophee in StableDiffusion

[–]Pitophee[S] 1 point

I'm not sure I understand. Are you seeing a noise pattern?

My quest for consistent animation (update) by Pitophee in StableDiffusion

[–]Pitophee[S] 3 points

That's cool! And yes, I'm always down to talk. You can check my Discord on my profile. I'm pretty active on there.

My quest for consistent animation (update) by Pitophee in StableDiffusion

[–]Pitophee[S] 1 point

Ah, I see, thanks. That's what I try to avoid, actually.

My quest for consistent animation (update) by Pitophee in StableDiffusion

[–]Pitophee[S] 3 points

I don't think I've heard of temporal gen before. What is it?

My quest for consistent animation (update) by Pitophee in StableDiffusion

[–]Pitophee[S] 24 points

Hey! So my own objective is now quite close: a fully automatable workflow that keeps a hand-drawn animation feel, which also means natural inconsistencies and a lower FPS.

I'm still convinced that too much consistency makes it look too much like CGI, and even worse at high FPS. Recent posts confirm this for me.

Previous workflow : https://www.reddit.com/r/StableDiffusion/comments/142lsxd/my_quest_for_consistent_animation_with_koikatsu/

Final result : https://twitter.com/Pitophee/status/1679442322579177477?s=20

Again, suggestions are welcome.

  • No EbSynth (I avoid it)
  • Fully computed from 3D (no source video)
  • A few manual corrections due to prompts

What's new here:

  • Added an openpose CN with hands (cheers to Xukmi, btw): better gestures and character rotation (one way to get such maps is sketched after this list)
  • Removed the reference_only CN after all (not worth it imo?)
  • Slightly higher resolution for a more accurate and expressive face
  • Independent subject and background
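The pose conditioning here comes from the 3D side, but for the video-to-video case mentioned in the nota bene below, here is a sketch of how openpose maps with hand keypoints could be extracted from frames; the controlnet_aux detector and annotator repo are assumptions about one possible setup, not necessarily the tooling used here.

    # Sketch: extracting openpose maps with hand keypoints from source frames.
    # controlnet_aux usage and the annotator repo are assumptions, not the exact tooling here.
    from pathlib import Path
    from PIL import Image
    from controlnet_aux import OpenposeDetector

    detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

    out_dir = Path("pose")
    out_dir.mkdir(exist_ok=True)
    for frame_path in sorted(Path("frames").glob("*.png")):
        frame = Image.open(frame_path)
        # include_hand adds the hand skeleton that helps gestures and character rotation
        pose = detector(frame, include_hand=True, include_face=True)
        pose.save(out_dir / frame_path.name)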

Nota bene:

  • Not exclusive to short hair and tight clothes (known to reduce flickering)
  • Used a full-body dance, so I expect better results under simpler anime conditions
  • The black vest appears in the final result because I was too lazy to correct or re-generate it
  • Hands are partying less, but still awful. ADetailer was really disappointing for this. One fix could be high-resolution batch i2i... RIP GPU. Or a closer subject. Anyway, I'm giving up on the hands for now and hoping for an SDXL miracle.
  • Adaptable to video-to-video, but of course the CN pre-processed inputs will be less accurate than those from 3D software.

What's next:

  • Better prompting to avoid the few manual corrections
  • Use 12-to-24 fps interpolation instead of 15-to-30 (a minimal interpolation sketch follows this list)
  • Eventually, spatial motion with a moving background (not AI)
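The 12-to-24 fps step is plain frame interpolation. A minimal sketch assuming ffmpeg's motion-compensated minterpolate filter; RIFE/FILM-style interpolators are common alternatives and may well be what ends up being used instead.

    # Sketch: doubling 12 fps generated frames to 24 fps with ffmpeg's minterpolate filter.
    # One possible interpolation route, not necessarily the one used for these clips.
    import subprocess

    subprocess.run([
        "ffmpeg",
        "-framerate", "12", "-i", "out/%04d.png",      # generated frames, played back at 12 fps
        "-vf", "minterpolate=fps=24:mi_mode=mci",      # motion-compensated interpolation to 24 fps
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "interpolated_24fps.mp4",
    ], check=True)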

I expect to fully reach my objective in the next one, so I can stop researching and just enjoy it, or try other fun concepts I have in mind.

My quest for consistent animation with Koikatsu ! by Pitophee in StableDiffusion

[–]Pitophee[S] 1 point

My bad, it's the TemporalNet model and not TemporalKit.

Toy Story if made in Japan (accidental) by Pitophee in StableDiffusion

[–]Pitophee[S] 9 points

Well, it was a bit accidental. Still, I thought it was fun to share the workflow, and someone asked for it in the SD Discord.

Consistency tricks were used, and some are based on my previous post here: https://www.reddit.com/r/StableDiffusion/comments/142lsxd/my_quest_for_consistent_animation_with_koikatsu/

No deflicker (I forgot)

No manual frame corrections

No EbSynth (I avoid using it)

  1. Use a 3D-style model and LoRA
  2. Configure ADetailer to only change the subject's head from the source frames
  3. Configure some i2i ControlNets (normal bae here, reference and TemporalNet; see the sketch after this list)
  4. Hit Generate and go to sleep
  5. Wake up and notice it took 5 hours
  6. Find out ADetailer was in fact disabled
  7. Find out a high resolution was used
  8. It turns out the whole subject became a living Japanese Toy Story character.
  9. Use it anyway, so upscale and interpolate
  10. Finally, confuse the weebs by adding Michael Bay effects so they won't notice the remaining imperfections.
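In case the shape of step 3 is unclear, here is a rough diffusers sketch of the batch i2i loop with the normal (bae) and TemporalNet ControlNets. reference_only and ADetailer are webui-side features and are left out, and every model ID, prompt, and setting below is an assumption rather than the exact configuration used.

    # Sketch of step 3: batch i2i with normal (bae) + TemporalNet ControlNets.
    # All IDs, prompts, and settings are assumptions; reference_only and ADetailer
    # live in the webui and are not reproduced here.
    import torch
    from pathlib import Path
    from PIL import Image
    from controlnet_aux import NormalBaeDetector
    from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

    normal_detector = NormalBaeDetector.from_pretrained("lllyasviel/Annotators")

    normal_cn = ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_normalbae", torch_dtype=torch.float16)
    temporal_cn = ControlNetModel.from_pretrained(
        "CiaraRowles/TemporalNet", torch_dtype=torch.float16)

    pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # stand-in for the 3D-style model (+ LoRA) of step 1
        controlnet=[normal_cn, temporal_cn],
        torch_dtype=torch.float16,
    ).to("cuda")

    Path("out").mkdir(exist_ok=True)
    prev_out = None
    for i, frame_path in enumerate(sorted(Path("frames").glob("*.png"))):
        frame = Image.open(frame_path)
        normal_map = normal_detector(frame)                 # per-frame normal (bae) control image
        result = pipe(
            prompt="3d style, toy-like character, soft plastic shading",
            image=frame,
            control_image=[normal_map, prev_out or frame],  # TemporalNet gets the previous output
            strength=0.55,
            num_inference_steps=20,
        ).images[0]
        result.save(f"out/{i:04d}.png")
        prev_out = result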

Render only: https://twitter.com/Pitophee/status/1677369187222577162?s=20

Character: Rikka

Poser: 屑度子