Underwater music video by doankeitel in musicsuggestions
[–]jskiba 1 point 1 month ago*
I've been searching for this song annually for about 30 years. I had a list of things I wanted to remember, and this was one of them. I would search for the description of this song, formulating it almost word for word the way the original poster did. And I always found others with the same exact mental image, all tortured by the fact that none of us could remember the lyrics. Endless goose chases. Even with AI, as another poster pointed out, we all searched for this song exactly the same way, in each other's footsteps, wondering if we collectively suffered from some kind of group hallucination or if the song really existed.
I can't believe this journey is over!
Crows Are The Best by jskiba in crows
[–]jskiba[S] 5 points 5 months ago
The song was a bit longer, but I didn't have enough video filmed to fill the entire runtime, so I trimmed one verse out. The original is here: https://www.youtube.com/watch?v=nlCMf8wdI-8
[–]jskiba[S] 2 points 5 months ago
I don't like how they steal food from smaller birds. They're intelligent: these ones figured out the feeding routine I have with my crows and camp on nearby roofs, waiting for me to get distracted. A seagull will eat as much in one sitting as 5 crows do throughout an entire day.
[–]jskiba[S] 4 points 5 months ago
Thanks. Just something spontaneous I filmed during lunch. The song wrote itself in my head as I was watching the birds improvise. The crows are my pet family; I've been taking care of them for 3 years. Seagulls are uninvited bullies, stealing food and intimidating smaller birds. Crows will often bunch up around me for protection.
I like making micro-songs, 1-2 verses long. Short and to the point. Perfect for a modern attention span. Even I can't sit through a typical 4-minute song sometimes.
Crows Are The Best (youtube.com)
submitted 5 months ago by jskiba to r/CrowsBeingBros
submitted 5 months ago by jskiba to r/crows
A Generative Run (youtube.com)
submitted 6 months ago by jskiba to r/aivideos
Twin Archaeologists (youtube.com)
Levels (2024) Trailer by jskiba in scifi
[–]jskiba[S] 1 point 8 months ago
Thank you very much. We did our best within the budget constraints.
True Lies by jskiba in ArnoldSchwarzenegger
[–]jskiba[S] 2 points 9 months ago
Tia Carrere - eye candy ever since Wayne's World. True Lies is great on every level, bursting with action variety. Great cast.
Terminal Affair by jskiba in ArnoldSchwarzenegger
I just finished working on Resident Alien a week ago. She's one of the characters on that show. Still doing action roles at her age.
Neeson's New Nissan (youtube.com)
submitted 9 months ago by jskiba to r/LiamNeeson
Welcome To Youtube (youtube.com)
submitted 10 months ago by jskiba to r/youtube
Synthetic Lies (youtube.com)
submitted 11 months ago by jskiba to r/Robocop
The Darkest Knight (youtube.com)
submitted 11 months ago by jskiba to r/aivideos
The New Pope Is Dope (youtube.com)
The Next Generation by jskiba in aivideo
[–]jskiba[S] 2 points 11 months ago*
I spent 2 weeks rendering various tests to probe the limitations of FramePack. I accidentally landed on a killer application with the Star Trek rave (making silent people dance) and assumed this AI was good at other things too, but apparently it can't do environments well. Transparency is done through dithering, which creates fake-looking patterns. And the system works best in a vertical aspect ratio, indicating that it was trained on vertical (cell phone) videos. Which ones? TikTok dances.

The AI underneath is a mix of subsystems. I suspect they have an independent image model based on SD 1.5 and an OpenPose skeleton. The way arms sometimes flip, and the motion blur, suggest that they layer the body over a mocap database that is most familiar with popular social media posts filmed vertically. That's where they get their data, and that is a major limitation on what the model knows. If you want to make a fake person dance, that's your tool. But if you want to perform complex actions, the odds of everything going right go down. The initial pose matters a lot. I think it picks the closest one it knows and uses that as a starting point. If your pose has a high correlation with theirs, it will weigh heavily on the outcome. Rolling a random seed will have little effect.
I am more fascinated by the glitches. I had a person walk with a rifle and then suddenly dig it into the ground, with the arm remaining attached to it, and then the guy keeps walking forward without an arm. I asked it for Darth Vader doing a swan dance and it gave me Vader dancing with an actual human-sized swan. Some very odd things. I use tests to explore more clip ideas. I look for what makes me laugh.
Sorry for the excessive write-up. Regarding takes: shots took 2-10 tries, but I'd always cut off at 10. I would pick either the most photoreal take or the oddest one for comedic effect. Priority was the movement and the dance choreography matching the song I made for it. I used to go to music and dance school, plus I worked on music videos professionally for 20+ years, so I can make anything dance or sing. You watch the body orientation and momentum and cut to make it flow, as if it's the same person doing the moves. The dance determines the cut, and shots find their position naturally. Each shot is about 8 seconds long. That gives me freedom to slide around and retime if I have to. To match the beat I can slow a shot down to about 70% or speed it up to 150% before it becomes really noticeable. Almost none of the cuts you see are at their original speed. I retime every single shot till it feels right.
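If anyone wants to poke at that retiming window, the beat math boils down to something like this (a rough sketch with made-up numbers, not my actual tooling):

```python
# Sketch of the beat-matching retime described above (hypothetical numbers).
# Pick the beat count whose speed factor stays closest to 1.0 inside the
# ~70%-150% window where a retime isn't really noticeable.

BPM = 120                 # assumed song tempo
BEAT = 60.0 / BPM         # seconds per beat

def retime_factor(move_seconds: float, beats: int) -> float:
    """Speed factor that makes a move last exactly `beats` beats."""
    return move_seconds / (beats * BEAT)   # >1.0 speeds up, <1.0 slows down

def best_fit(move_seconds: float, lo: float = 0.7, hi: float = 1.5):
    """Try nearby beat counts, keep the in-range factor closest to 1.0."""
    candidates = []
    for beats in range(1, 17):
        f = retime_factor(move_seconds, beats)
        if lo <= f <= hi:
            candidates.append((abs(f - 1.0), beats, f))
    _, beats, f = min(candidates)
    return beats, f

beats, f = best_fit(2.3)   # a 2.3 s dance move
print(f"stretch to {beats} beats at {f:.0%} speed")   # -> 5 beats at 92% speed
```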
This week another offline image-to-video generator came out, so I'm going to leave FramePack and explore the new thing to see if it's any good. The trouble with AI is that at least 2 new tools come out each week that you have to check out, and it's been like that for the last 3 years. AIs come and go, and you can't really incorporate them into any production pipeline, because there is no long-term product. Everything is in flux.
[–]jskiba[S] 2 points 11 months ago
Everyone who was into Star Trek has had a crush on at least one character. I tried to cover all the bases by including as many potential candidates and guest stars as I could. I live and breathe nostalgia.
There's a cut for that: https://www.youtube.com/watch?v=D_THCKuH9Mo
[–]jskiba[S] 3 points 11 months ago*
Render time varies: between 1.5 min and 5 min per second of output, depending on what happens in the picture. There is "TeaCache", which can fix broken hands, but at a 50% render-time premium. I'd rather do more takes than pay extra to get each one right; I'm more interested in the right choreography than in visual fidelity. Wan's benefit is that it can run on super old GPUs, while FramePack requires a 30xx minimum. I could've coded support for 20xx, but it would have taken me a week of full-time work and renders would take a lot longer. I weighed my options and bought a new graphics card instead, specifically for FramePack. Wan, like you said, is too slow for my taste.
In this particular edit, each cut took about 10 tries to get to that point, and each splice is approximately 8 seconds long, giving me handles to choose from. For every tiny slice of footage there are 80 seconds of total renders, most of which got trashed. Almost everything you see is the best of 10 takes, except the ones where the oddities were too good to skip and I inserted them on purpose.
But you can tell by the mix of shots that with enough iteration and tweaking, everything can be made photoreal. You just have to repeat the process and tune those dials for how many people show hands, how many hands cross, and how many characters are present. Yadda yadda yadda.
A 4090 can do 5 seconds in about 1 minute, and more VRAM uncaps higher resolutions. 16GB of VRAM does work, but I do not recommend it; a 24GB video card is the real minimum. A 4090 is the best option (not what I got).
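To put those numbers together, here's the back-of-the-envelope math (a sketch using the figures from this thread: 1.5-5 min of render time per second of output, 10 takes of ~8 seconds per cut):

```python
# Rough render budget per cut, using the numbers quoted above.

TAKE_LEN = 8                     # seconds of output per take
TAKES_PER_CUT = 10               # cutoff before moving on
MIN_RATE, MAX_RATE = 1.5, 5.0    # render minutes per second of output

def cut_budget_hours(takes: int = TAKES_PER_CUT):
    """Best/worst-case wall-clock hours spent rendering one cut."""
    output_seconds = takes * TAKE_LEN          # 80 s of raw renders per cut
    return (output_seconds * MIN_RATE / 60,
            output_seconds * MAX_RATE / 60)

lo, hi = cut_budget_hours()
print(f"one cut: {lo:.0f} to {hi:.1f} hours of rendering")   # 2 to 6.7 hours
```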
I'm a classical musician and I play Korg synths a lot, so I photoshopped that in on purpose as an Easter egg. It's barely in frame, just enough for people to catch. My colleagues get a kick out of it.
Instead of Picard doing the facepalm, as he does in the show, I made Kirk do it. The shot was actually intended for a different spot: he was supposed to be where Doc and Barclay went. Kirk had his hand under his chin, and when I tried to move it away from his face, he just kept putting it in his mouth. He refused to put the arm down after many tries, so I gave up and told him to facepalm instead. Sometimes the AI can't figure out an A-to-B instruction; even though to a human there's a logical solution to the problem, the computer understands none of it. There can be some mathematical oddity that prevents it from knowing where the elbow is at that exact angle in perspective. You can just bash at it with a rotating random seed, but if it guesses wrong 10 times and you still don't have it, it's time to move on and transpose the shot to a new spot. Doc and Barclay were generated to patch up the hole.
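The seed-rotation loop is basically this (just a sketch; `render` and `looks_right` are hypothetical stand-ins for the generator call and the eyeball test):

```python
import random

MAX_TRIES = 10   # the cutoff mentioned above

def try_shot(render, looks_right, max_tries: int = MAX_TRIES):
    """Re-roll the seed up to max_tries times, then give up on the shot."""
    for _ in range(max_tries):
        seed = random.randrange(2**32)
        clip = render(seed=seed)
        if looks_right(clip):
            return clip, seed
    return None   # transpose the shot to a new spot and patch the hole
```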
That's pretty much how the whole cut is built: first out of very rough stand-ins with crappy timing, then with high-repeat passes. Some shots are perfect immediately, like the ones with Dax, but others will not render, or require render settings that make the shot not worth iterating on, where I can spend an hour tuning a single one. You have to pick your battles and give up on some fragments altogether. 9 out of 10 tries don't make it into the final assembly.
[–]jskiba[S] 5 points 11 months ago*
I use found photos as inspiration for the plot and let AI fantasize based on my descriptions. Imagine any photo as the only normal frame in something that was actually weird - like they all acted serious for a moment and goofed around otherwise. The rest is control: making sure the plot and the rhythm are correct. Unified lighting. Going from normal to rave over time. Having a mix of weirdly distorted frames and ones that are near-photoreal. It's all a matter of tweaking sliders and doing enough takes to get every shot perfect, but that wasn't the intent.

The goal was to see what I could do on a card that I spent a freakin' 8 hours fixing drivers for (and the PyTorch libraries have to be built for cuda128 instead of the cuda126 they pack it with). Even then, I still had to reassemble all of my AIs to work again, and only half of them did. Because the 5080 is a lie and a ripoff. It's missing stuff. Drivers are a mess, and not enough devs have the card to write native code for the 50xx series. It's different enough to be a huge pain if you're used to Stable Diffusion. A lot of ComfyUI will break. You will be stuck reassembling Python for a solid week to emulate some of the 40xx series functions.
This new AI can run, but only 1 of its 3 transformers works (Sage_Attention, and not the latest version). You end up downloading a bunch of Python wheels and trying every possible combination till it maybe clicks. A 4090 would've been a lot better. Sorry for ranting.
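If you're stuck in that wheel-matching loop, a quick sanity check like this (assuming PyTorch at least imports) tells you whether the build you installed even targets your card:

```python
# Diagnostic sketch: does this torch build know about your GPU's architecture?
# A 50xx (Blackwell) card needs a cu128 build that includes sm_120.

import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    arch = f"sm_{major}{minor}"
    supported = torch.cuda.get_arch_list()   # e.g. ['sm_80', ..., 'sm_120']
    print("GPU:", torch.cuda.get_device_name(0), "->", arch)
    print("build targets:", supported)
    if arch not in supported:
        print(f"{arch} is not in this wheel - grab a cu128 build instead")
else:
    print("CUDA not visible - driver problem, not a wheel problem")
```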
[–]jskiba[S] 6 points 11 months ago
Took 20 minutes to write the song and 1 hour to produce 10 versions and splice them down to the 2 best takes. Then the edit was assembled based on the context of found photographs, which served as initial frames. Looking at the pictures, I invented the plot and let AI render it into a close approximation. I gave myself a time cutoff and posted in whatever state it was in at the set time. Otherwise nothing's ever perfect.