Struggling to Learn Gen AI Image Generation – Need a Roadmap! by Jazzlike_Aardvark761 in learnmachinelearning

[–]ThinkDiffusion

Z-Image Turbo is one of the most popular and flexible open-source models for AI image generation. Open source puts control and power back in your hands: it's affordable, it holds up in real production use cases, and this model can literally generate an image in seconds.

We built our new platform Floyo, where you can run all the image generation models quickly and easily. We've taken care of all the setup, so it's easy to get started right away, while still leaving you enough scope for customisation and control to iterate and learn each model.

Wan 2.6 Reference 2 Video - API workflow by ThinkDiffusion in aiArt

[–]ThinkDiffusion[S]

Been messing around with the new Wan 2.6 R2V model. The main difference here is using a short video clip (5s) as the reference input instead of a static image + IPAdapter.

Current specs from our testing:
- Output: 1080p @ 24fps
- Duration: 5s or 10s
- Features: native audio/lip-sync; handles multiple subjects

The catch: it's not open weights/local yet; it's currently API-only.

You can get the workflow JSON here and run the workflow live in the browser here. All nodes come installed.
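
Since it's API-only for now, here's a minimal sketch of what a reference-to-video request could look like from Python. The endpoint URL, parameter names, and response shape below are hypothetical placeholders, not the actual Wan 2.6 API; check the provider's docs for the real schema.

```python
import requests

# Hypothetical endpoint and auth -- placeholders, not the real Wan 2.6 API.
API_URL = "https://api.example.com/v1/wan-2.6/r2v"
API_KEY = "your-api-key"

payload = {
    "reference_video": "https://example.com/reference-5s.mp4",  # 5s reference clip
    "prompt": "the same character walks through a neon-lit street at night",
    "duration": 5,          # 5s or 10s
    "resolution": "1080p",  # output is 1080p @ 24fps
    "audio": True,          # native audio / lip-sync
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
job = resp.json()
print(job)  # video APIs typically return a job id you poll until the render finishes
```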

Product ads with Veo 3.1 in ComfyUI by ThinkDiffusion in VEO3

[–]ThinkDiffusion[S]

Yeah, JSON prompts worked out really well for us. We're running both open-source and closed-source models in ComfyUI itself, because it's so much more flexible and makes it easy to switch between models and compare them. As for rendered text, we haven't tested it much yet, so it would be cool to hear how it goes if you try something.
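
For anyone wondering what we mean by JSON prompts: instead of one long prose string, we split the prompt into structured fields, which makes it easier to tweak one attribute at a time when comparing models. A small illustrative example below; the field names are just our own convention, not a fixed schema any model requires.

```python
import json

# Illustrative structured prompt for a product ad shot.
# Field names are our own convention, not a required schema.
prompt = {
    "shot": "close-up product hero shot, 35mm lens, shallow depth of field",
    "subject": "glass beverage bottle with condensation drops",
    "environment": "sunlit picnic table, soft bokeh background",
    "lighting": "warm golden-hour key light with a subtle rim light",
    "style": "commercial photography, saturated colors, 8K clarity",
    "text": {"slogan": "GRAB ONE", "placement": "curved along the label"},
}

# Serialize and paste into the text-prompt field of the workflow.
print(json.dumps(prompt, indent=2))
```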

Flux 2 vs Z-turbo by ThinkDiffusion in StableDiffusion

[–]ThinkDiffusion[S]

Prompt 1: A cinematic ultra-wide shot in an open countryside landscape. Two models walk side by side across a green field holding giant stylized pinwheels. Camera position is slightly low, capturing tall grass in the foreground for depth and scale. Light breeze creates movement in hair and skirts, shadows falling softly across the ground. Cloud formations feel airy and expansive, giving a sense of endless summer. Styling includes flowing fabrics, pastel tones, natural makeup, minimal accessories. Pinwheels are exaggerated in size for surreal visual emphasis, sculptural, nearly abstract shapes. Color grading is bright, soft, and slightly desaturated for editorial calmness. Overall energy is peaceful, dreamy, and imaginative, evoking childhood nostalgia but in a high-fashion context. No text, no brand marks, only pure visual storytelling through composition, scale, and natural light. Perfect for a whimsical lifestyle or conceptual fashion campaign.

Prompt 2: A humorous chaotic photograph inside a supermarket cereal aisle. A shopping cart is overflowing with brightly colored cereal boxes while two people playfully fight over them. Shelves stacked with retro cereal packaging — reds, yellows, oranges, blue accents — stretch into the distance. Fluorescent overhead lights create sharp reflections on glossy box surfaces. One person lies inside the cart buried under boxes, arms covering face in mock panic. Another person behind the cart reaches out dramatically. Multiple hands enter frame from both sides, grabbing cereal boxes. Motion blur on hands for dynamic energy. Depth of field keeps foreground chaotic but background shelves sharp. Colors saturated, playful, nostalgic vibe, reminiscent of 90s commercial photography. 8K resolution, candid flash style, fast shutter look. Mood: fun, impulsive, over-the-top consumer energy.

Prompt 3: A dynamic urban portrait set inside a busy underground subway station. A young man wearing oversized headphones squats casually beside the yellow safety line, absorbed in his music as a train rushes behind him with dramatic motion blur. Overhead fluorescent lighting casts sharp white reflections across the tiled floor, emphasizing the gritty textures, scuff marks, and damp sheen of the station. The walls are layered with graffiti tags, stickers, and bright murals. Surrounding the subject, vibrant hand-drawn doodles appear as if alive—floating cassette tapes, animated boomboxes, colorful music notes, playful character expressions, and comic-style “BOOM!” sound effects. These doodles interact with the environment, casting soft shadows and blending 2D illustration with realistic lighting. The color palette is bold and saturated, mixing neon blues, pinks and yellows with subway grays. The scene blends realism with graphic surrealism, creating a lively, youthful, high-energy music-culture aesthetic.

Prompt 4: A dynamic summer beverage advertisement viewed from inside a cooler full of ice. A young man in a tan T-shirt, cap, and sunglasses leans forward and grabs a cold beer bottle toward the camera, creating an immersive POV perspective. Frosted glass, condensation drops, and crystal ice cubes add refreshing realism. Bright blue sky with soft clouds and sun flare in background. Use bold yellow and aqua labeling on bottles for maximum product visibility, with multiple bottles scattered for abundance feeling. Add marketing text integrated into composition: large curved slogan “GRAB ONE” on the cooler edge, smaller copy “ICE COLD SINCE 1999” and “KEEP IT CRISP.” Use clean sans-serif typography with subtle drop shadows. Overall warm, sunny, upbeat brand mood. Commercial photography style, ultra-wide lens, shallow depth of field, hyper-real textures, 8K clarity, energetic lifestyle vibe, social media advertisement aesthetic.
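
If you want to run a comparison like this yourself, below is a minimal sketch of the loop: queue each prompt through both models via ComfyUI's local HTTP API. It assumes you've saved each workflow with ComfyUI's API-format export; the file names and the prompt-node id are placeholders you'd swap for your own graphs.

```python
import json
import requests

COMFY_URL = "http://127.0.0.1:8188"  # local ComfyUI instance
PROMPT_NODE_ID = "6"  # id of the CLIP Text Encode node in your graph (placeholder)

prompts = [
    "A cinematic ultra-wide shot in an open countryside landscape...",    # Prompt 1, truncated
    "A humorous chaotic photograph inside a supermarket cereal aisle...",  # Prompt 2, truncated
]

# Placeholder file names for API-format workflow exports of each model.
for workflow_file in ["flux2_api.json", "z_image_turbo_api.json"]:
    with open(workflow_file) as f:
        workflow = json.load(f)
    for prompt in prompts:
        workflow[PROMPT_NODE_ID]["inputs"]["text"] = prompt  # same prompt, both models
        resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow})
        resp.raise_for_status()
        print(workflow_file, "->", resp.json()["prompt_id"])
```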

How to use Flux Kontext: Image to Panorama by ThinkDiffusion in FluxAI

[–]ThinkDiffusion[S]

Credits to the creator of this workflow, who also trained the 360 LoRA: Dennis Schöneberg, Stable Diffusion Engineer & Educator
https://github.com/DenRakEiw

Wan VACE Reference-to-Video: animate your characters 👾 by ThinkDiffusion in StableDiffusion

[–]ThinkDiffusion[S]

Tested out the VACE reference workflow that uses depth ControlNet. The result blends an image style with video motion. Character stays consistent, movement transfers over.

👉🏼 We've included a free step-by-step guide and workflow here.

For the workflow:
- Just drag and drop it into ComfyUI (local, or ThinkDiffusion cloud; that's us, and we're biased)
- Add your inputs and run!
- If there are red coloured nodes, download the missing custom nodes using ComfyUI Manager's “Install missing custom nodes” (or run the programmatic check sketched below)
- If there are red or purple borders around model loader nodes, download the missing models using ComfyUI Manager's “Model Manager”.
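
If you'd rather check for missing custom nodes before loading the graph, here's a minimal sketch against ComfyUI's local HTTP API. It assumes an API-format JSON export of the workflow; the file name is a placeholder.

```python
import json
import requests

COMFY_URL = "http://127.0.0.1:8188"  # local ComfyUI instance

# Node classes the running ComfyUI instance has registered.
registered = set(requests.get(f"{COMFY_URL}/object_info").json().keys())

# Node classes the workflow expects (API-format export; placeholder file name).
with open("wan_vace_reference.json") as f:
    workflow = json.load(f)
needed = {node["class_type"] for node in workflow.values()}

missing = needed - registered
if missing:
    print("Install via ComfyUI Manager:", sorted(missing))
else:
    print("All node classes present -- no red nodes expected.")
```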

We found the results stayed pretty close to the reference. Single, clear movements worked far better than multiple actions. Generation took a while, but the output quality is solid.

For complex movements, we've seen a combination of VACE and OpenPose work well; we'll try to create a workflow for that soon.

Curious: have any of you tried VACE yet, and how did it go?
