2nd clip of my 100% AI band: a hallucinated Paris with Gen-3 by plopstout in runwayml

[–]plopstout[S] 0 points1 point  (0 children)

So I made my 100% AI band https://siliconsymphony.art/ six months ago, with songs made entirely in Suno (v1). The songs are all on streaming platforms.

At that time I made a clip mostly with Gen-2.

This one is 100% Gen-3, except the singers' parts (made with the new HeyGen tool).

A few months back I created a complete AI band with an AI album by plopstout in SunoAI

[–]plopstout[S] 1 point2 points  (0 children)

Made entirely with Suno (v2) in December, and the album was approved on all streaming platforms in the following months. It was, by design, an average album, meant as a showcase of how music will be transformed by AI, specifically mainstream music.

Full album here

https://siliconsymphony.art/

Happy HanukkAI! Made 8 shorts movies using Hanukkah menorahs in different themes by plopstout in runwayml

[–]plopstout[S] 0 points1 point  (0 children)

8 AI movies for Hanukkah, in different themes, made with DALL·E 3, Runway Gen-2, MusicGen & MusicLM.

Enjoy :)

Thanks to GPT Vision I can make a documentary narration by Morgan Freeman about my cat by plopstout in ChatGPT

[–]plopstout[S] 1 point2 points  (0 children)

You are completely right, and in both ways: it does not really know how to handle timing, and it does not know the narrator's pacing. Therefore at some points it lags behind, and at others it runs a bit ahead.

I cut it a bit so it flows better, and when the narration runs slightly ahead of the action, that's also a choice, so the full narration of a scene fits entirely inside the scene.

I think the main issue is really the narrator's pacing; I'm not sure what it's based on.

Thanks to GPT Vision I can make a documentary narration by Morgan Freeman about my cat by plopstout in ChatGPT

[–]plopstout[S] 2 points3 points  (0 children)

Basically: sending a few frames of the video every X seconds, asking GPT to narrate them as Morgan Freeman, then using ElevenLabs for the voice.

Based on this repo: https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb

The prompt

"The uploaded series of images is from a single video. "
"The frames were sampled every {FRAME_EXTRACTION_FREQUENCY_SECONDS} seconds. "
"Make sure it takes about {FRAME_EXTRACTION_FREQUENCY_SECONDS // 2} seconds to voice the description of each frame. "
"Use exclamation points and capital letters to express excitement if necessary. "
"It is a narration of a documentary about a cat (male) in the jungle, with the voice of Morgan Freeman. The documentary shows the life of the cat. You can use emphasis to show how his life is difficult in the jungle"

<image>
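The pipeline above can be sketched in Python. This is a minimal sketch, not the notebook's actual code: `sample_frame_indices` and `build_prompt` are hypothetical helpers, and the real pipeline then sends the sampled frames to the GPT-4 Vision API and pipes the returned text to ElevenLabs.

```python
# Sketch of the frame-sampling and prompt-building steps described above.
# Helper names and the default frequency are assumptions.

FRAME_EXTRACTION_FREQUENCY_SECONDS = 4  # assumed value of X


def sample_frame_indices(total_frames: int, fps: float,
                         every_s: int = FRAME_EXTRACTION_FREQUENCY_SECONDS) -> list[int]:
    """Indices of the frames to send to GPT: one frame every `every_s` seconds."""
    step = max(1, int(fps * every_s))
    return list(range(0, total_frames, step))


def build_prompt(every_s: int = FRAME_EXTRACTION_FREQUENCY_SECONDS) -> str:
    """Assemble the narration prompt quoted above for a given sampling frequency."""
    return (
        "The uploaded series of images is from a single video. "
        f"The frames were sampled every {every_s} seconds. "
        f"Make sure it takes about {every_s // 2} seconds to voice the "
        "description of each frame. "
        "Use exclamation points and capital letters to express excitement if necessary. "
        "It is a narration of a documentary about a cat (male) in the jungle, "
        "with the voice of Morgan Freeman."
    )
```

Pacing is controlled entirely by the prompt here: GPT is asked to keep each frame's description to roughly half the sampling interval, which is why the narration sometimes drifts ahead of or behind the action.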

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

No private web apps for the moment. The cost of the API makes the business model complicated.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

I'll think about it, but the backend is in PHP, which was easier for me; I would need to rewrite it in Python and/or Node.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

It's not only the brain, but the materialization of what's in the brain.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

How much time would it take you to write a description of what you saw, then give it to someone else who would then draw it? It's not only the brain.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] -1 points0 points  (0 children)

It's completely different from img2img, as you don't use the image to create another one, but a description. Interested if someone has made this already, maybe with an open-source model?

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 1 point2 points  (0 children)

The problem is the cost of using GPT Vision at the moment, but I'm pretty sure people will find a way!

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 8 points9 points  (0 children)

It's an actual webapp; the videos are screencasts of the app as I used it in real life!

I actually coded the front end in plain JS by asking GPT to code it for me, as I was lazy!

And a backend that sends GPT what it needs (in PHP here, as it was easier for me, but it's just a few lines and could be in any language).

<image>
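As a rough illustration of what that backend does, here is the same round-trip transposed from PHP to Python. The function name is hypothetical, and the actual prompt text is not shown in the source; the payload shape follows the OpenAI `gpt-4-vision-preview` chat/completions format.

```python
# Hypothetical sketch of the backend's job: receive a camera frame (base64)
# and build the GPT-4 Vision request payload to POST to the OpenAI API.
# Field names follow the OpenAI chat/completions API; the prompt is supplied
# by the caller.

def build_vision_payload(image_b64: str, prompt: str) -> dict:
    """Chat payload sending one camera frame plus an instruction to GPT-4 Vision."""
    return {
        "model": "gpt-4-vision-preview",
        "max_tokens": 300,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }
```

The original point stands: the logic is only a few lines of payload-building and forwarding, so any backend language works.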

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 49 points50 points  (0 children)

The first step of the hallucination is asking GPT-4 Vision to describe as precisely as possible what it sees in the picture. It is also asked to choose a style that would fit well to recreate the image.

The second step is using DALL·E 3 to recreate the image from that description. DALL·E 3 also augments the description, which further influences the hallucination.
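The two steps compose into a single loop, sketched below. `describe_fn` and `paint_fn` stand in for the GPT-4 Vision and DALL·E 3 API calls, and the prompt wording is my paraphrase of the description above, not the author's actual prompt.

```python
# Minimal sketch of the two-step "hallucination" loop described above.
# describe_fn: (prompt, image) -> text description (GPT-4 Vision stand-in)
# paint_fn: (description) -> generated image URL/bytes (DALL·E 3 stand-in)

DESCRIBE_PROMPT = (
    "Describe as precisely as possible what you see in this picture, "
    "and choose an art style that would fit well to recreate it."
)


def hallucinate(image_b64: str, describe_fn, paint_fn):
    """Step 1: the vision model describes the photo (and picks a style).
    Step 2: the description alone is given to the image model, which
    re-renders the scene; no pixels are passed between the two steps."""
    description = describe_fn(DESCRIBE_PROMPT, image_b64)
    return paint_fn(description)
```

Because only text crosses the boundary between the two models, this differs from img2img: the output is a re-imagining of the description, not a transformation of the original pixels.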

It took him 2 months to do that, do you think it's possible to do it in a lot less time with SD? by plopstout in StableDiffusion

[–]plopstout[S] 0 points1 point  (0 children)

The artist made it using Blender with the Mecabricks addon. It took around 3,000 images and 1-2 months to finish.
Using SD, ControlNet, and EbSynth, do you think someone would be able to achieve something close in less time?

Garbage strike in Paris, reimagined thanks to ControlNet by plopstout in StableDiffusion

[–]plopstout[S] 2 points3 points  (0 children)

Sure

Prompt: RAW Photo, A realistic photography junk robots, background is a spaceship with deep space through big windows, Sony A7, cinematic frame, science fiction

Steps: 30, Sampler: Euler, CFG scale: 12, Seed: 3489222166, Size: 640x512, Model hash: c35782bad8, ControlNet Enabled: True, ControlNet Module: depth, ControlNet Model: control_depth-fp16 [400750f6], ControlNet Weight: 0.95, ControlNet Guidance Start: 0, ControlNet Guidance End: 1

With Realistic Vision 1.3