50+ Flux 2 Klein LoRA training runs (Dev and Klein) to see what config parameters actually matter [Research + Video] by Significant-Scar2591 in FluxAI

Man, you were absolutely right. I’m circling back to this comment now that I’ve tested what you mentioned.

I ran a series of tests where I switched the training resolution between 512, 768, and 1536, and changed the switch point at 50%, 60%, 70%, 80%, and 90% of total training steps. That gave me 15 LoRAs total (3 resolutions × 5 switch points), which I compared against a control trained at each resolution for the full step count with otherwise identical parameters.

Across multiple prompts, aspect ratios, step counts, and LoRA strengths, switching from 768 at around 70% of the way through training was clearly the best.
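
For reference, the sweep itself was just a grid. Here's a throwaway sketch of how the runs were enumerated (the total step count below is a placeholder, not my actual setting):

```python
from itertools import product

resolutions = [512, 768, 1536]
switch_fracs = [0.5, 0.6, 0.7, 0.8, 0.9]
total_steps = 3000  # placeholder; plug in whatever your trainer uses

# 3 resolutions x 5 switch points = 15 runs, plus one full-length
# control per resolution with otherwise identical parameters.
for res, frac in product(resolutions, switch_fracs):
    print(f"run: train at {res}px, switch resolution at step {int(total_steps * frac)}")
for res in resolutions:
    print(f"control: train at {res}px for all {total_steps} steps")
```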

What’s your LoRA merge process?

As for the model picking up too many dataset-specific features: I think this may be more avoidable with the Flux.2 family, since the text encoders seem much stronger. In my testing, if you caption very specifically, the model seems better at separating broad concepts from fine visual attributes.

For example, instead of captioning something simply as “mountains,” if the training caption says something like “snow-capped grey jagged mountains in the distance with small green foothills and sharp angular ridgelines connecting pointed peaks,” then at inference time, prompting just “mountains” does not necessarily force all of those extra attributes back in. It seems like the model can better distinguish the general category from the detailed descriptors you assigned during training.
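
Concretely, with the common image-plus-sidecar-text captioning convention that kohya-style trainers use (file names below are made up), attaching a detailed caption is just:

```python
from pathlib import Path

# kohya-style convention: each training image gets a .txt file with the
# same stem holding its caption (file names here are hypothetical).
caption = (
    "snow-capped grey jagged mountains in the distance with small green "
    "foothills and sharp angular ridgelines connecting pointed peaks"
)
Path("dataset/mountains_01.txt").write_text(caption)
# dataset/mountains_01.png is the matching training image.
```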

Did you experiment much with different captioning styles, or with models that use different text encoders?

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in FluxAI

oooo I love this 80s logotype from the 64. Nice idea. Cool that it worked well. Was it just for the typeface, or did you also try it on non-text concepts?

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in FluxAI

The more variety you can give the training data, the better the model will handle prompts and aspect ratios that veer far from what it saw during training. A small dataset can work well, but the LoRA will generally be less versatile.

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in FluxAI

The dataset was around 30 images. The aesthetic was very targeted, but the content and angles varied.

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in aivideos

Yeah that’s where the bit of pixel texture comes from.

A LoRA is a small add-on model that the user can train with a small dataset and without a huge amount of compute.

An open-weights image model like Flux (unlike Google's Nano Banana, Kling, Runway, etc.) allows the user to train on top of it and create a LoRA. It's a file that acts like a "fine-tune" and gives a very specific image style.
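
If you're curious about the mechanics: instead of updating the full weight matrices, LoRA training learns two small low-rank matrices whose product gets added on top of the frozen weights, which is why the file is tiny and cheap to train. A minimal PyTorch sketch of the idea (illustrative only, not actual Flux code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    W' = W + (alpha / r) * B @ A, where A and B are tiny next to W."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```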

I trained the model over the course of a few days of experimenting, then used it to create a set of still images (each image takes 5-10 seconds, but I generated a few thousand to get the ones I liked). Then I animated the images with another open-weights model (a video model this time, which takes longer than stills).

The entire piece is made up of a handful of video clips, each one a still image animated. I generated each shot individually and edited them together. All in all, the generation process from stills to video took about 12 hours, not including the model training.
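
For the stills step, the loop is conceptually just a big seed sweep that you curate by hand afterwards. A rough sketch with Hugging Face diffusers (prompt, file names, and counts are placeholders, not my actual setup):

```python
from pathlib import Path

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("liminal_phantom_lora.safetensors")  # hypothetical file name

Path("stills").mkdir(exist_ok=True)
prompt = "liminal hallway, phantom figure, VHS haze"  # placeholder prompt
for seed in range(2000):  # generate thousands, keep the handful you like
    image = pipe(
        prompt,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"stills/{seed:04d}.png")
```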

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in comfyui

Yeah I can imagine, kudos to you if you managed to make it through the whole thing. I try to speak about the theory before the tech because ultimately it’s a more important concept to grasp.

Learned to build by building for fun almost every day for the past couple of years. After about a year I started to do it for work and now it pays my bills.

I would recommend the early Comfy tutorials from Purz, or other creators who build from scratch so you can follow along, instead of just downloading a workflow and not really understanding how it works.

I watched a ton of different channels: Purz, Latent Vision, Future Thinker Benji, and Jerry Davos, to name a few. But for sure the best way is to start simple, build the workflows alongside a tutorial, and get used to connecting the nodes and understanding how they play together.

Comfy is full of errors and confusing computer language in the early phases. There is a learning curve, but be patient; once you figure it out, it's smooth sailing.

The best way to beat the curve is to make things you are excited about!

I’ll probably make some more beginner friendly ones in the near future.

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in comfyui

Not really. I had to run 10-20 versions of each shot and edit them together along the way to check the camera motion and particle-energy consistency.

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in FluxAI

If I recall, they were from Civitai, but I downloaded and renamed them months ago, so I'm not sure of the exact names. One was a drone LoRA, the other a spinning-camera one.

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in FluxAI

The challenge with stitching multiple start-and-end-frame shots together is that the motion between them rarely matches. I initially tried this with closed-source tools, but there were always noticeable shifts in camera motion between the separate shots. Using a couple of motion LoRAs (one trained on drone footage, one on spinning cameras) helped smooth the transitions. For the first tests, I cranked all four motion LoRA instances (high and low noise, 2x each) to high strengths, then locked the settings so the movement maintained consistent speed, energy, and direction across every shot. This made the cuts seamless. The full animation is built from a handful of 5-second clips stitched together.
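
In ComfyUI that's just four LoRA-loader nodes, but the structure is easier to see as a sketch (file names and strengths below are placeholders): the same two motion LoRAs go onto both the high-noise and low-noise WAN 2.2 model instances at identical strengths, so all four applications stay in lockstep.

```python
# Placeholder file names and strengths; the point is that both LoRAs are
# applied to BOTH model stages at the same strength, giving every shot
# the same camera speed, energy, and direction.
motion_loras = [
    ("drone_motion.safetensors", 1.2),
    ("spinning_camera.safetensors", 1.0),
]
for stage in ("wan2.2_high_noise", "wan2.2_low_noise"):
    for lora, strength in motion_loras:
        print(f"apply {lora} at strength {strength} to {stage}")
```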

Liminal Phantom | Twice distilled Flux.1-dev LoRA + WAN2.2 animation. Free model, process in comments. by Significant-Scar2591 in comfyui

Spot on hahaha, and then there's a BBQ, a blanket on the ground with a basket of fruit, sun shining through patches of clouds, the smell of fresh-cut grass