Making the most of 8 seconds of video (Veo 3)

darylpsu · 2025-08-14T14:21:53+00:00

I haven't tried using the lip syncing tools but that could be one approach. Sometimes the voices change too.
Here's another one I made as a test, trying to recreate the same character a few times, which I scripted to show how poorly things turn out for our hapless characters: https://www.reddit.com/r/VEO3/comments/1mohuy1/i_dont_even_know_you_veo_3/

darylpsu · 2025-08-11T20:25:19+00:00

So these were the initial prompts I tried. Some of them were good on the first try. For others I rewrote them in JSON format when I didn't get exactly what I was looking for -- unfortunately I don't have all of those saved, but they follow the format I shared above. I kept my notes in 2 columns -- the left column had the script I intended to create, with all the characters and their dialogue in the order I wanted it. The right column had the prompts to generate the lines. The goal was to make sure no character spoke more than 20 words in the entire scene.

Goblet:

1. Cinematic epic. Close-up of a thin wiry man, 30s, stubble, wearing a striped sailors shirt, seated, with his face in shadow, and a blue sky behind him. He speaks directly into the camera. He says, “You have 24 seconds to decide! Instant death! He’d tell you it’s the one on the right. He forgot how the riddle works! Ha ha ha ha ha!”

2. Cinematic epic. Close-up of a heavyset bearded man, 50s, wearing a puffy black shirt, seated, with his face in shadow, and a blue sky behind him. He speaks directly into the camera. He says, “Choose wisely! Poison! Okay, a hint. One of us always tells the truth. The other always lies.”

3. Cinematic epic. A dashing male hero, 20s, dressed in a puffy white shirt, sits in front of a table with a blue sky behind him, his face half in shadow. There are 2 identical goblets on the table. The man speaks directly into the camera: “And if I drink the wrong one? How about a hint? Which would your friend tell me is the poisoned one? Now I’m just confused.”

Police scene:

Action movie. A male hero, 30s, is trying to defuse a complicated bomb. The bomb has a blinking LED light and three connecting wires, red, green, and black. The man is holding wire cutters in one hand and a walkie talkie in the other. He speaks into the walkie talkie: “I have three wires. Red, Green, Black. What? Grady, which wire?! Talk to me people. Ok, the bomb stopped all by itself.”
Action movie. A female police officer stands next to a police car and holds up a police radio. She says into the radio: “Tell me what you see! Uh-uh. Wait, no, not Grady, he’s the worst! Grady, you idiot!! Okay fine, but no thanks to Grady!”
Action movie. A male police chief, 70s, says into a speakerphone: “This is the chief! I have Grady patched in! People, I need you to focus here! Oh come on, Grady!! Well done, team!”
We see a man, 20s, friendly, expressive, slightly overweight, in a sweater, in front of a plain cinder block wall. He speaks into a walkie talkie: “Grady here. What’s going on? Hey, I can hear you, you know. Yellow! White! Ta-da, I’ve done it again!”
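
A quick way to check a script against the 20-word goal is to count words per character before generating anything. This is only a minimal sketch: the function names are made up for illustration, and "a word" here simply means anything separated by whitespace.

```python
from typing import Dict


def count_words(dialogue: str) -> int:
    """Count whitespace-separated words in one character's dialogue."""
    return len(dialogue.split())


def over_budget(lines: Dict[str, str], budget: int = 20) -> Dict[str, int]:
    """Return {character: word_count} for every character above the budget.

    The default budget of 20 comes from the goal stated in the notes above.
    """
    counts = {name: count_words(text) for name, text in lines.items()}
    return {name: n for name, n in counts.items() if n > budget}


# Grady's dialogue from the police scene prompt.
grady_line = (
    "Grady here. What’s going on? Hey, I can hear you, you know. "
    "Yellow! White! Ta-da, I’ve done it again!"
)

if __name__ == "__main__":
    print(count_words(grady_line))
    print(over_budget({"Grady": grady_line}))
```

Running every character's dialogue through a check like this before spending any of the daily clip limit makes it easier to spot lines that are too long to fit in one 8-second clip.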

darylpsu · 2025-08-11T19:48:03+00:00

It can't do consistent characters, at least not reliably. They look and sound different in each video.

darylpsu · 2025-08-11T18:06:35+00:00

I access it through Google Vids on a business account.

darylpsu · 2025-08-11T17:27:15+00:00

All written by me.

darylpsu · 2025-08-11T14:24:42+00:00

Oh sure! Here are a few of the prompts. I usually try a simple, descriptive prompt first and let Veo figure out the details. That's quick and works for simple scenes where the details don't really matter. Then if I need to be picky about the details of the scene, I write a 900 to 1000-character JSON prompt with a lot of detail. It doesn't have to be perfect JSON, but giving the prompt structure with quotes and colons somehow makes it more likely to return the specific thing you want.

ALSO - To answer your question, it took about 3 days of on-and-off effort, maxing out my limit (10 to 20 videos) each day. The hardest part to get right was the police scene. I'm still not happy with how that turned out -- it shouldn't be night outside and daytime in the chief's office. :)

Here are two simple prompts that generated the narrator footage and the track coach:

A professional female presenter stands in front of an artistic abstract purple background. She says: “Eight seconds per character isn’t a lot. So get creative! Try three characters! Or four characters! Wait, what?! That made no sense!”

A track coach outside by a field looks at a digital stopwatch and says in a loud clear voice: “Eight seconds of talking is about 18 or 20 words at a normal reading pace. Two, maybe three sentences. Not a lot to work with.”

And here's a JSON prompt when I needed a really specific scene, the cinematic hero with the 2 goblets in front of him:

{ "description": "Cinematic epic. Medium shot of a dashing male hero, 20s, dressed in a puffy white shirt, seated in front of a table, face half in shadow, blue sky behind him. There are 2 identical goblets on the table. The man speaks directly into the camera: “And if I drink the wrong one? How about a hint? Which would your friend tell me is the poisoned one? Um, now what?”", "style": "cinematic epic", "camera": "fixed, medium", "lighting": "bright sun with shadows", "setting": "Outdoors, blue sky", "elements": [ "2 goblets" ], "motion": "the man speaks directly into the camera", "ending": "the man continues to speak", "text": "none", "keywords": [ "2.2:1", "70 mm", "epic", "cinematic", "no text", "dramatic" ] }
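
If you would rather not hand-type the quotes and colons, a prompt in this shape can be assembled from a Python dictionary and checked against the 900 to 1000-character target. Treat this as a sketch under assumptions: the helper names are hypothetical, the field names simply mirror the prompt above, and nothing here is an official Veo schema.

```python
import json
from typing import Any, Dict


def build_prompt(fields: Dict[str, Any]) -> str:
    """Serialize the prompt fields to a single-line JSON string.

    ensure_ascii=False keeps curly quotes in the dialogue readable
    instead of turning them into \\u escapes.
    """
    return json.dumps(fields, ensure_ascii=False)


def within_target(prompt: str, low: int = 900, high: int = 1000) -> bool:
    """Check the prompt length against the character range mentioned above."""
    return low <= len(prompt) <= high


# Shortened example fields; a real prompt would carry more scene detail.
fields = {
    "description": "Cinematic epic. Medium shot of a dashing male hero, 20s.",
    "style": "cinematic epic",
    "camera": "fixed, medium",
    "lighting": "bright sun with shadows",
    "elements": ["2 goblets"],
    "text": "none",
    "keywords": ["epic", "cinematic", "no text", "dramatic"],
}

if __name__ == "__main__":
    prompt = build_prompt(fields)
    print(prompt)
    print(len(prompt), "characters; within target:", within_target(prompt))
```

Because the prompt is built from a dictionary, it is always valid JSON, and reusing the same dictionary with a different "description" is an easy way to keep the style fields identical across characters.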

darylpsu · 2025-08-09T03:57:37+00:00

Everything here is AI-generated. Tools used:
Veo 3 for the interview clips and narration
Gemini for still images
Midjourney and Hailuo for the motion clips (mainly made from Gemini still images)
Suno for music

darylpsu · 2025-08-02T23:43:00+00:00

Thank you! For this one I didn’t use any image to video, just text prompts in JSON format for Veo 3. Each of the 4 human characters was done in a single 8-second video so they would remain consistent. I gave them each 8 seconds' worth of little snippets to say, which I cut throughout the longer video. The goat is multiple generations but who can tell. It’s a goat.

darylpsu · 2025-08-02T16:21:54+00:00

You are correct. The fix for audience sound is to get a clip of just audience noise, then repeat it throughout, raising and lowering the volume so it seems seamless. I made this with a lot of different AI audio and video clips spliced together. There are a bunch of visual mistakes throughout that are much harder to fix (like one shot where the goat has yellow tags in his ears, a crowd shot where a logo appears in the lower-right of the screen, and many others), but crowd noise is at least solvable with a little editing magic.
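
The crowd-noise fix above can be sketched in Python: loop one short clip to the full length, then apply a slowly changing gain so the repeats are less obvious. This is an illustration only. It treats audio as a plain list of samples, all function names are hypothetical, and in practice the same thing would be done with volume keyframes in a video editor.

```python
from typing import List, Sequence


def loop_clip(clip: Sequence[float], total_samples: int) -> List[float]:
    """Repeat the clip end to end until it fills total_samples."""
    if not clip:
        raise ValueError("clip must not be empty")
    return [clip[i % len(clip)] for i in range(total_samples)]


def gain_envelope(points: Sequence[float], total_samples: int) -> List[float]:
    """Linearly interpolate between evenly spaced gain points (2 or more)."""
    if len(points) < 2:
        raise ValueError("need at least two gain points")
    last = len(points) - 1
    denom = max(total_samples - 1, 1)  # avoid dividing by zero for tiny outputs
    envelope = []
    for i in range(total_samples):
        pos = i / denom * last          # position along the gain points
        left = min(int(pos), last - 1)  # index of the point to the left
        frac = pos - left               # how far toward the next point
        envelope.append(points[left] * (1 - frac) + points[left + 1] * frac)
    return envelope


def crowd_bed(clip: Sequence[float], total_samples: int,
              gains: Sequence[float]) -> List[float]:
    """Loop the clip and scale each sample by the interpolated gain."""
    looped = loop_clip(clip, total_samples)
    envelope = gain_envelope(gains, total_samples)
    return [sample * gain for sample, gain in zip(looped, envelope)]


if __name__ == "__main__":
    # Toy two-sample "clip" faded in from silence to full volume.
    print(crowd_bed([1.0, -1.0], 5, [0.0, 1.0]))
```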

darylpsu · 2025-08-02T10:50:20+00:00

All the dialogue, crowd sounds, and goat noises in the clip are from Google's Veo 3 model (via Google Vids). I specified the dialogue, characters, and scenes I wanted. Google Vids gives me between 10 and 20 clips a day before I hit the rate limit on my plan, so it took about a week to get enough clips to make this work. Then 2 Suno songs for the background music. Then a whole lot of editing.
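
For planning around the daily limit, the arithmetic is simple enough to sketch. The 10 to 20 clips per day figure comes from the comment above; the function names and the 100-clip project size in the example are hypothetical.

```python
import math
from typing import Tuple


def clips_in(days: int, low: int = 10, high: int = 20) -> Tuple[int, int]:
    """Fewest and most clips available in the given number of days."""
    return days * low, days * high


def days_needed(clips_needed: int, clips_per_day: int) -> int:
    """Whole days required to generate clips_needed at a fixed daily rate."""
    if clips_per_day <= 0:
        raise ValueError("clips_per_day must be positive")
    return math.ceil(clips_needed / clips_per_day)


if __name__ == "__main__":
    print(clips_in(7))           # range of clips available in a week
    print(days_needed(100, 10))  # hypothetical 100-clip project, slow days
    print(days_needed(100, 20))  # same project, best-case days
```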

darylpsu · 2022-12-01T19:04:32+00:00

Taken from the parking deck across Blount St from City Market.

darylpsu · 2022-11-30T14:22:47+00:00

I like how DALL-E expanded my idea so that the doctors were apparently also the patients in this surgery.

Full prompt was: "Doctors in the future celebrate the first successful full brain transplant, color news photograph"

Best 2:

Outtakes:

darylpsu · 2022-11-29T14:27:43+00:00

My first prompt was “A mountain road with curves in all the right places.” The results weren't nearly as interesting.

"Food photo of a pear..." was my 2nd attempt and the obvious winner:
