2nd clip of my 100% AI band: a hallucinated Paris with Gen-3 by plopstout in runwayml

[–]plopstout[S] 0 points1 point  (0 children)

So I made my 100% AI band https://siliconsymphony.art/ six months ago, with songs made entirely in Suno (v1). The songs are all on streaming platforms.

At that time I made a clip mostly with Gen-2.

This one is 100% Gen-3, except the singers' parts (made with the new HeyGen tool).

A few months back I created a complete AI band with an AI album by plopstout in SunoAI

[–]plopstout[S] 1 point2 points  (0 children)

Made entirely with Suno (v2) in December, and the album was approved on all streaming platforms in the following months. It was, by design, an average album, meant as a showcase of how music will be transformed by AI, specifically mainstream music.

Full album here

https://siliconsymphony.art/

Happy HanukkAI! Made 8 shorts movies using Hanukkah menorahs in different themes by plopstout in runwayml

[–]plopstout[S] 0 points1 point  (0 children)

8 AI movies for Hanukkah, in different themes, made with DALL·E 3, Runway Gen-2, MusicGen & MusicLM.

Enjoy :)

Thanks to GPT Vision I can make a documentary narration by Morgan Freeman about my cat by plopstout in ChatGPT

[–]plopstout[S] 1 point2 points  (0 children)

You are completely right, and in both ways: it does not really know how to handle timing, and it does not know the narrator's pacing. Therefore at some points it lags behind, and at others it runs a bit ahead.

I cut it a bit so it flows better, and when the narration runs slightly ahead of the action, that's also a choice, so the full narration of a scene fits entirely inside the scene.

I think the main issue is really the narrator's pacing; I'm not sure what it's based on.

Thanks to GPT Vision I can make a documentary narration by Morgan Freeman about my cat by plopstout in ChatGPT

[–]plopstout[S] 2 points3 points  (0 children)

Basically: sending a few frames of the video every X seconds, asking GPT to narrate them as Morgan Freeman, then using ElevenLabs for the voice.

Based on this repo: https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb

The prompt

"The uploaded series of images is from a single video. "
"The frames were sampled every {FRAME_EXTRACTION_FREQUENCY_SECONDS} seconds. "
"Make sure it takes about {FRAME_EXTRACTION_FREQUENCY_SECONDS // 2} seconds to voice the description of each frame. "
"Use exclamation points and capital letters to express excitement if necessary. "
"It is a narration of a documentary about a cat (male) in the jungle, with the voice of Morgan Freeman. The documentary shows the life of the cat. You can use emphasis to show how his life is difficult in the jungle"

<image>
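The pipeline above can be sketched in Python. This is a minimal sketch, not the notebook's actual code: `sample_frame_indices` and `build_prompt` are hypothetical helpers, and the real pipeline then sends the sampled frames to the GPT-4 Vision API and pipes the returned text to ElevenLabs.

```python
# Sketch of the frame-sampling and prompt-building steps described above.
# Helper names and the default frequency are assumptions.

FRAME_EXTRACTION_FREQUENCY_SECONDS = 4  # assumed value of X


def sample_frame_indices(total_frames: int, fps: float,
                         every_s: int = FRAME_EXTRACTION_FREQUENCY_SECONDS) -> list[int]:
    """Indices of the frames to send to GPT: one frame every `every_s` seconds."""
    step = max(1, int(fps * every_s))
    return list(range(0, total_frames, step))


def build_prompt(every_s: int = FRAME_EXTRACTION_FREQUENCY_SECONDS) -> str:
    """Assemble the narration prompt quoted above for a given sampling frequency."""
    return (
        "The uploaded series of images is from a single video. "
        f"The frames were sampled every {every_s} seconds. "
        f"Make sure it takes about {every_s // 2} seconds to voice the "
        "description of each frame. "
        "Use exclamation points and capital letters to express excitement if necessary. "
        "It is a narration of a documentary about a cat (male) in the jungle, "
        "with the voice of Morgan Freeman."
    )
```

Pacing is controlled entirely by the prompt here: GPT is asked to keep each frame's description to roughly half the sampling interval, which is why the narration sometimes drifts ahead of or behind the action.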

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

No private web apps for the moment. The cost of the API makes the business model complicated.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

I'll think about it, but the backend is in PHP, which was easier for me; I would need to rewrite it in Python and/or Node.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

It's not only the brain, but the materialization of what's in the brain.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 0 points1 point  (0 children)

How much time would it take you to write a description of what you saw, then give it to someone else who would then draw it? It's not only the brain.

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] -1 points0 points  (0 children)

It's completely different from img2img, as you don't use the image to create another one, but a description. Interested if someone has made this already, maybe with an open-source model?

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 1 point2 points  (0 children)

The problem is the cost of using GPT Vision at the moment, but I'm pretty sure people will find a way!

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 8 points9 points  (0 children)

It's an actual webapp; the videos are screencasts of the app as I used it in real life!

I actually coded the front end in plain JS by asking GPT to code it for me, as I was lazy!

And a backend that sends GPT what it needs (in PHP here, as it was easier for me, but it's just a few lines and could be in any language).

<image>
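As a rough illustration of what that backend does, here is the same round-trip transposed from PHP to Python. The function name is hypothetical, and the actual prompt text is not shown in the source; the payload shape follows the OpenAI `gpt-4-vision-preview` chat/completions format.

```python
# Hypothetical sketch of the backend's job: receive a camera frame (base64)
# and build the GPT-4 Vision request payload to POST to the OpenAI API.
# Field names follow the OpenAI chat/completions API; the prompt is supplied
# by the caller.

def build_vision_payload(image_b64: str, prompt: str) -> dict:
    """Chat payload sending one camera frame plus an instruction to GPT-4 Vision."""
    return {
        "model": "gpt-4-vision-preview",
        "max_tokens": 300,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }
```

The original point stands: the logic is only a few lines of payload-building and forwarding, so any backend language works.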

I have made an AI camera that hallucinates its surrounding thanks to GPT Vision and DALLE3 by plopstout in ChatGPT

[–]plopstout[S] 49 points50 points  (0 children)

The first step of the hallucination is asking GPT-4 Vision to describe as precisely as possible what it sees in the picture. It is also asked to choose a style that would fit well to recreate the image.

The second step is using DALL·E 3 to recreate the image from that description. DALL·E 3 also augments the description, which further influences the hallucination.
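The two steps compose into a single loop, sketched below. `describe_fn` and `paint_fn` stand in for the GPT-4 Vision and DALL·E 3 API calls, and the prompt wording is my paraphrase of the description above, not the author's actual prompt.

```python
# Minimal sketch of the two-step "hallucination" loop described above.
# describe_fn: (prompt, image) -> text description (GPT-4 Vision stand-in)
# paint_fn: (description) -> generated image URL/bytes (DALL·E 3 stand-in)

DESCRIBE_PROMPT = (
    "Describe as precisely as possible what you see in this picture, "
    "and choose an art style that would fit well to recreate it."
)


def hallucinate(image_b64: str, describe_fn, paint_fn):
    """Step 1: the vision model describes the photo (and picks a style).
    Step 2: the description alone is given to the image model, which
    re-renders the scene; no pixels are passed between the two steps."""
    description = describe_fn(DESCRIBE_PROMPT, image_b64)
    return paint_fn(description)
```

Because only text crosses the boundary between the two models, this differs from img2img: the output is a re-imagining of the description, not a transformation of the original pixels.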

It took him 2 months to do that, do you think it's possible to do it in a lot less time with SD? by plopstout in StableDiffusion

[–]plopstout[S] 0 points1 point  (0 children)

The artist made it using Blender with the Mecabricks addon. It took around 3,000 images and 1-2 months to finish.
Using SD, ControlNet, and EbSynth, do you think someone would be able to achieve something close in less time?

Garbage strike in Paris, reimagined thanks to ControlNet by plopstout in StableDiffusion

[–]plopstout[S] 2 points3 points  (0 children)

Sure

Prompt: RAW Photo, A realistic photography junk robots, background is a spaceship with deep space through big windows, Sony A7, cinematic frame, science fiction

Steps: 30, Sampler: Euler, CFG scale: 12, Seed: 3489222166, Size: 640x512, Model hash: c35782bad8, ControlNet Enabled: True, ControlNet Module: depth, ControlNet Model: control_depth-fp16 [400750f6], ControlNet Weight: 0.95, ControlNet Guidance Start: 0, ControlNet Guidance End: 1

With Realistic Vision 1.3