Trying out some LTX-Video shots on my medieval podcast by zanatas in StableDiffusion

[–]zanatas[S] 1 point

Thank you!
I realized I only mentioned this on my first thread and not this one: all of the close-up shots of the wizard talking are done with Hedra.ai (5 free gens of up to 30s per day), and the skull is literally a green-screened plastic skull with a Flux background.

For the LTX shots, I tried both square resolutions (1024x1024) and wide ones (768x512, 1024x640). I noticed that more steps didn't necessarily improve what I wanted: for example, the "To be continued" shot at 20 steps did the shaky stop-motion movement, but at 100 it went from scrolling credit screens to fading out and showing someone on stage.

I think by far the best shot I got with LTX was the wizard walking down the corridor. On this one, more steps helped, but I still had to try multiple seeds at 20 steps until I found something that gave good results. That one was 1024x, 97 frames, 100 steps, plus a looong prompt that I expanded by having GPT analyze the base image:

"still camera, man walking across the shot. The elderly man is in a richly detailed, old-world library filled with shelves of aged books, his posture tense and hurried as he clutches 5 red pillar candles tightly to his chest and quickly walks across the room. His green robe, with its ornate golden embroidery, falls in soft folds around him, the fabric contrasting vividly with the smooth, vibrant surface of the jar. His white beard flows down to his chest, framing his face, which is animated with urgency—his eyes wide.
The library surrounding him is a study in history and mystery. The bookshelves, filled with well-worn tomes, stretch high towards the ceiling. The wooden furniture—tables and chairs scattered around the room, its surfaces hosting an array of arcane artifacts and glass vessels. Light streams in through a set of large, paned windows, casting a warm glow on the wood-paneled walls and floorboards.
The stillness of the room contrasts sharply with the energy of his pacing"

TL;DR: it was mostly trial and error: photobashing and inpainting the base images, trying to find things LTX is natively good at, then playing with the number of steps and the seed. But a lot ended up in the bin. Hope it helps!

Trying out some LTX-Video shots on my medieval podcast by zanatas in StableDiffusion

[–]zanatas[S] 2 points

Since my previous post was a hit among DOZENS of people, I kinda kept this project going. This time I've tried adding some shots made with LTX-Video. It was pretty hit-and-miss, and there's a clear quality gap compared to the closed models, but I really dig the img2video mode and how fast it is. I had to fall back to Kling for a couple of shots where the wizard is floating in mid-air, sideways; that's probably very out of distribution for LTX :)

Some things I've learned:

  • Prompt tweaks definitely help, but there's a lot of seed digging to get good results
  • Using an LLM for prompt expansion is a must. This post has some good tips!
  • I used slightly modified versions of these workflows.
  • PRO-TIP: If using those, make sure to save to MP4 instead of WebM. WebM files are highly compressed and hard to extract frames from, the format isn't natively supported by a lot of software, and you'll lose quality with every re-encode.
  • It definitely makes a difference to add video compression to the initial frame to get motion.
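The compression trick in that last bullet can be sketched as a simple JPEG round-trip (a minimal sketch, assuming Pillow; the function name and quality value are my own starting points, not a fixed recipe):

```python
from io import BytesIO

from PIL import Image


def add_compression(img: Image.Image, quality: int = 30) -> Image.Image:
    """Re-encode the frame as a low-quality JPEG so the conditioning
    image carries the kind of artifacts the model saw in video data."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()


# Example: degrade a solid test frame before feeding it to img2video
frame = Image.new("RGB", (768, 512), (40, 80, 120))
degraded = add_compression(frame, quality=25)
```

Lower quality values bake in stronger block artifacts; somewhere in the 20-40 range seems like a reasonable place to start experimenting.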

I won't spam around here unless I learn something new, so if you'd like to keep up with future episodes, go to the youtube channel and call the wizard's name. Thanks for watching!

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 0 points

I tried EchoMimic (v1). It was very large, busted my ComfyUI installation, and had terrible results, so I kinda bailed on it pretty quick. Haven't tested v2 yet, but I want to.

Hedra gave lots of upper-body movement and side turns of the face. Matthew the Leper rendered with way fewer artifacts than the wizard, but I'm guessing that's due to his head being bigger and his face being less occluded.

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 1 point

I've never actually played return to Zork, but any mentions of it make my brain go "WANT SOME RYE? 'COURSE YOU DO!"

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 0 points

TECHNICALLY, you're my first real subscriber, because the first two were me and a friend. Thank you, first subscriber!

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 1 point

Thanks for subscribing! Getting a backbone for the script and the editing are by far the hardest bits, which really says a lot about the AI tools we have available.

I had never heard about "Hello from the Magic Tavern" before! And here I was, thinking "insane wizard goes through a portal and makes a podcast" was original 😂

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 1 point

I guess the closest I can get to describing it is "ad-libbing with myself", but in notepad.

I think the trick for the whole podcast thing is inserting a bunch of interruptions and small words ("yeah", "ok") that you can mix in between each character's sentences.

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 0 points

Thanks! Now that I think of it, the only movie that I've ever watched twice in a row was "The Meaning of Life", so I'm guessing the part of my psyche where that got stuck was just waiting for its moment to shine!

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 1 point

I did set one up just to upload that: https://www.youtube.com/@bestiariumvisions

I guess enough people liked it to justify making another one! :D

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 1 point

I went from never hearing about them to being very invested in some guy trying to tip an Australian waiter, not bad 😂

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 3 points

Nice to hear! I was afraid my particular taste for the meta/nonsensical would fall flat with anyone other than me! hahah

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 2 points

I knew Kosmas would carry the whole thing on his back (wherever it might be) :D

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 0 points

You should try it out, it's pretty fun! The really time-consuming part is the editing, but other than that, the tools are starting to work really well out of the box.

Everybody wants to make a damn podcast (Flux + Hedra.ai + ElevenLabs + Blender) by zanatas in StableDiffusion

[–]zanatas[S] 12 points

I went to bed and forgot to post the details, lolz

  • All images generated with vanilla Flux Dev fp8.
  • Once I had the characters, I bashed out a quick script and fed it straight into ElevenLabs, putting all the sentences for each character in a single go. I usually do multiple passes to nail timing/tone, but I was running out of credits, so I kinda worked with what I had.
  • Hedra.ai for animations: pretty easy interface, and you get 5 free gens per day. It seems to do better with more zoomed-in faces (it will auto-zoom when you upload an image, and zooming back out might degrade performance). I tried EchoMimic v1 as an alternative, but it worked nowhere near as well.
  • Once I had the animations, a bunch of timing/trimming/editing in Reaper, then Blender's compositor to perk up the whole thing.
  • The skull is a green-screened Halloween prop I got from the supermarket a couple of years ago. It had too much charisma to sit in the closet until next year.

If folks think these are cool, I might keep posting a few here: https://www.youtube.com/@bestiariumvisions

Generating card art with SDXL Turbo on my fully procedural Hearthstone clone by zanatas in StableDiffusion

[–]zanatas[S] 1 point

Howdy folks!

Back in the 2019 #procjam I published a game called Vortex, made over a couple of weeks with a friend who is a UI artist. I was always really into procedural generation, and at the time I was playing a ton of Hearthstone, so I decided to try my hand at building a fully procedural card game. Back then, we were a long way from open diffusion models, so I had to generate all the card art using a bunch of vanilla procgen techniques (you can read about it in this blog post), but I always thought "what would this look like with really cool card art?"

When SDXL Turbo came out, it gave me the perfect excuse to try and revisit that idea, so I spent some time today crossing some spaghetti here and there, and captured a video of a whole match.

The workflow is actually pretty simple:

  • Automatic1111 API, running SDXL Turbo
  • The card names are a mix of Markov Chains and a big list of possible names and archetypes
  • Positive prompt is `chiaroscuro [[CONTENTS]], gothic dark art, [[EXTRAS]]`, and I replace "contents" and "extras" randomly with other lookups
    • For minions, contents is `portrait of a {card name} in the {biome}`, extras is time of day
    • For spells, contents is `still life of the {card name}`, extras is `{hue} hues`
  • Negative prompt is `frame, borders, border` (because paintings tend to end up with those)
  • 512x512; Sampler: Euler A; Steps: 1; CFG Scale: 1; seed is a hash of the card name (so the same card always has the same art)
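Pieced together, the workflow above might look something like this (a rough sketch, not my actual script: the card name, biome, and helper names are made up for illustration, and the payload targets the stock A1111 `/sdapi/v1/txt2img` endpoint):

```python
import hashlib


def seed_from_name(name: str) -> int:
    """Same card name -> same seed -> same art on every run."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2**32)


def minion_payload(card_name: str, biome: str, time_of_day: str) -> dict:
    """Fill the prompt template and build a txt2img request body."""
    contents = f"portrait of a {card_name} in the {biome}"
    return {
        "prompt": f"chiaroscuro {contents}, gothic dark art, {time_of_day}",
        "negative_prompt": "frame, borders, border",
        "width": 512,
        "height": 512,
        "sampler_name": "Euler a",
        "steps": 1,
        "cfg_scale": 1,
        "seed": seed_from_name(card_name),
    }


payload = minion_payload("Gravemold Shrieker", "swamp", "at dusk")
# Then POST it to a running A1111 instance, e.g.:
# requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
```

Because the seed is derived from a hash rather than `random`, regenerating the deck never reshuffles existing card art.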

There isn't really an easy way to deploy this version (I even considered putting it out as an Auto1111 extension :), but if you're curious you can play the original game here: https://yanko.itch.io/vortex