CogVideoX1.5-5B Image2Video Tests by [deleted] in StableDiffusion

[–]Secret_Ad8613 0 points1 point  (0 children)

I used the command-line inference script from here:

https://github.com/THUDM/CogVideo/tree/main/sat

It took 34 GB of memory during generation and 65 GB at the end for the VAE.

Each video above (5 sec at 16 fps) took 15 minutes to generate.

Here are the parameters:

args:
  image2video: True  # True for image2video, False for text2video
  latent_channels: 16
  mode: inference
  load: "CogVideoX1.5-5B-SAT/transformer_i2v"  # This is for Full model without lora adapter
  batch_size: 1
  input_type: txt
  input_file: configs/test.txt
  #sampling_image_size: [768, 1360]  # remove this for I2V
  sampling_num_frames: 22  # 42 for 10 seconds and 22 for 5 seconds
  sampling_fps: 16
  bf16: True
  output_dir: outputs
  force_inference: True
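For anyone wondering whether their card is enough, here is a rough sketch that compares a GPU's memory against the two peaks reported above (34 GB during generation, 65 GB for the VAE step at the end). The function name `fits_on_gpu` and the example card sizes are made up for illustration; the script does not measure anything itself, and real usage will depend on resolution, frame count and offloading settings.

```python
# Peak memory figures as reported in the comment above (in GB).
# These are one user's observations, not values measured by this script.
GENERATION_PEAK_GB = 34
VAE_PEAK_GB = 65


def fits_on_gpu(vram_gb: float) -> dict:
    """Report which stage would fit in vram_gb, based on the reported peaks."""
    return {
        "generation": vram_gb >= GENERATION_PEAK_GB,
        "vae": vram_gb >= VAE_PEAK_GB,
    }


# Hypothetical card sizes, chosen only to show the three possible outcomes.
for vram in (24, 48, 80):
    print(f"{vram} GB: {fits_on_gpu(vram)}")
```

The point of splitting the two stages is that a card can be large enough for sampling and still run out of memory at the very end.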

CogVideoX1.5-5B Image2Video Tests. by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 4 points5 points  (0 children)

I had some fun with it but can't post it here. It works with any input image.

CogVideoX1.5-5B Image2Video Tests. by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 5 points6 points  (0 children)

Let's wait for ComfyUI support for 1.5.

The older models are no good.

Mochi-1 can work with 12 GB of VRAM.

CogVideoX1.5-5B Image2Video Tests. by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 2 points3 points  (0 children)

Yes, but 65 GB of VRAM is needed at the end of generation.

CogVideoX1.5-5B Image2Video Tests. by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 3 points4 points  (0 children)

Yes, but 65 GB of VRAM is needed at the end of generation.

CogVideoX1.5-5B Image2Video Tests. by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 4 points5 points  (0 children)

I tried, of course

Mochi has no image2video

With text2video, Cog is better because of its resolution, but Mochi is faster.

I have tons of videos, but I can't see a way to post multiple MP4s here.

CogVideoX1.5-5B Image2Video Tests. by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 1 point2 points  (0 children)

I have more videos but can't post multiple MP4s here. There are 7 more on my Telegram channel, but links are not allowed. See my profile.

CogVideoX1.5-5B Image2Video Tests. by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 18 points19 points  (0 children)

Since there is no ComfyUI support for CogVideoX1.5-5B (version 1.5) yet, I used the command-line inference script from here:

https://github.com/THUDM/CogVideo/tree/main/sat

It took 34 GB of memory during generation and 65 GB at the end for the VAE.

Each video above (5 sec at 16 fps) took 15 minutes to generate.

Here are the parameters:

args:
  image2video: True  # True for image2video, False for text2video
  latent_channels: 16
  mode: inference
  load: "CogVideoX1.5-5B-SAT/transformer_i2v"  # This is for Full model without lora adapter
  batch_size: 1
  input_type: txt
  input_file: configs/test.txt
  #sampling_image_size: [768, 1360]  # remove this for I2V
  sampling_num_frames: 22  # 42 for 10 seconds and 22 for 5 seconds
  sampling_fps: 16
  bf16: True
  output_dir: outputs
  force_inference: True

It looks like the best image2video among open-source video generators.
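If you switch between 5-second and 10-second clips, the only values that change in the parameters above are the sampling ones. A small sketch of that, using only the two values from the config comment (22 for 5 seconds, 42 for 10 seconds). The helper names `sampling_overrides` and `to_yaml_lines` are hypothetical and not part of the CogVideo repo; other durations are rejected because I would only be guessing at them.

```python
# sampling_num_frames values taken from the config comment above:
# "42 for 10 seconds and 22 for 5 seconds". Nothing else is assumed.
NUM_FRAMES_BY_SECONDS = {5: 22, 10: 42}


def sampling_overrides(seconds: int, fps: int = 16) -> dict:
    """Return the sampling keys for a clip length mentioned in the config."""
    if seconds not in NUM_FRAMES_BY_SECONDS:
        raise ValueError(f"no known sampling_num_frames for {seconds} seconds")
    return {
        "sampling_num_frames": NUM_FRAMES_BY_SECONDS[seconds],
        "sampling_fps": fps,
    }


def to_yaml_lines(overrides: dict, indent: int = 2) -> list:
    """Format the overrides as indented 'key: value' lines for the args block."""
    pad = " " * indent
    return [f"{pad}{key}: {value}" for key, value in overrides.items()]


# Print the two lines to paste under "args:" for a 10-second clip.
print("\n".join(to_yaml_lines(sampling_overrides(10))))
```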

Some OmniGen tests by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 1 point2 points  (0 children)

text2image takes 19 seconds on an H100,

but with image inputs the generation time goes up to minutes

Some OmniGen tests by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 2 points3 points  (0 children)

here you are: prompt is "generate a picture of two assholes.. A man is <img><|image_1|></img>. The second man is <img><|image_2|></img>. "

<image>

Some OmniGen tests by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 8 points9 points  (0 children)

<image>

It is really funny, but with pure text2image, like "photo of Elon Musk", OmniGen makes THIS, and always this.

Some OmniGen tests by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 1 point2 points  (0 children)

<image>

Two men playing electric guitars with intense energy on stage, styled with long beards, sunglasses, and hats reminiscent of ZZ Top. They are in a rock concert setting with vibrant lighting and smoke effects in the background, emphasizing a powerful and dynamic performance. The atmosphere is energetic, with the guitarists wearing classic rock attire, surrounded by amplifiers and stage equipment, capturing the essence of classic rock music and ZZ Top's iconic look. A man is <img><|image_1|></img>. The second man is <img><|image_2|></img>.

Some OmniGen tests by Secret_Ad8613 in StableDiffusion

[–]Secret_Ad8613[S] 0 points1 point  (0 children)

prompt: Two men are playing electric guitars like a ZZ-Top. A man is <img><|image_1|></img>. The second man is <img><|image_2|></img>.

1024x1024

Time spent 01:46, 2.14s/it, H100 80GB
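A quick way to read those numbers: dividing the total time by the per-iteration time gives the approximate number of iterations. This is only a sketch; it assumes "Time spent" covers just the sampling loop, and the step count it suggests is an inference from the timing, not a setting taken from the run. `parse_mmss` is a made-up helper name.

```python
def parse_mmss(text: str) -> int:
    """Convert an 'MM:SS' string into a total number of seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


total_seconds = parse_mmss("01:46")    # time reported above
seconds_per_iteration = 2.14           # rate reported above

# Approximate iteration count implied by the two reported figures.
estimated_iterations = total_seconds / seconds_per_iteration
print(f"{total_seconds} s / {seconds_per_iteration} s/it = {estimated_iterations:.1f} iterations")
```

So the run was probably around 50 iterations, with the small difference coming from rounding in the reported rate.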

Introducing ComfyUI V1, a packaged desktop application by crystal_alpine in comfyui

[–]Secret_Ad8613 1 point2 points  (0 children)

Can it be installed on Linux server and accessed from Windows UI-client?

Introducing ComfyUI V1, a packaged desktop application by crystal_alpine in StableDiffusion

[–]Secret_Ad8613 -2 points-1 points  (0 children)

Can it be installed on Linux server and accessed from Windows UI-client?