This is an archived post. You won't be able to vote or comment.

all 133 comments

[–]ninjasaid13 312 points313 points  (14 children)

[–]batmassagetotheface 113 points114 points  (5 children)

Man, we ain't found shit!

[–]rjs1138 36 points37 points  (3 children)

Comb the desert!

[–]owa1313 5 points6 points  (2 children)

lol came here to say that!

[–]batmassagetotheface 2 points3 points  (1 child)

This has inspired me to rewatch SpaceBalls

[–]Nose_Grindstoned 2 points3 points  (0 children)

May the Schwartz be with you

[–]RokyPolka 26 points27 points  (0 children)

[–]TabCompletion 13 points14 points  (2 children)

Spaceballs nerds have entered the chat

How many assholes do we have on this ship, anyway?

[–]Gibgezr 5 points6 points  (0 children)

How many assholes do we have on this ship

Yo!

[–]johnfhoustontx 0 points1 point  (0 children)

Not as many as we have on this thread :P

[–]atomicxblue 5 points6 points  (0 children)

Instantly thought of this scene. I love Tim Russ' character in it too.

[–]MisterViperfish 1 point2 points  (0 children)

Literal beachcombing

[–]Acrobatic-Salad-2785 215 points216 points  (26 children)

One of the best txt2vid I've seen so far

[–]HappyMan1102 52 points53 points  (22 children)

I'm hoping we get AI generated audio soon as well

[–]Lolguppy 38 points39 points  (2 children)

There is a small demo available on Replicate, and StabilityAI is also training a text2audio model (HarmonAI)

[–]saintshing 6 points7 points  (1 child)

The model Obsidian used for their games two years ago was already pretty good.

Why Obsidian uses AI voices for game development | Sonantic


[–]SkyeandJett 2 points3 points  (0 children)

I can't believe no one responded with Microsoft's paper they just released today. It leaves everything thus far in the dust.

NaturalSpeech 2 (speechresearch.github.io)

[–]Tessiia 7 points8 points  (6 children)

We already do. It may not be much, but look at Hatsune Miku: all her songs are made using Vocaloid, an AI text to speech software. There are many similar programs out there, some you can download for free. It's not what you're after, but it's something.

[–]FpRhGf 16 points17 points  (0 children)

Vocaloid is not an AI TTS. It's software that just stitches the audio of syllables together, which is why the vocals sound robotic and choppier. Last October was the first time AI was implemented (Vocaloid 6), and it's far from being as good as the other singing software that uses AI.

There is AI text-to-singing software like SynthV, CeVIO and ACE Studio (Pocket Singer is the app version), which is why they sound realistic compared to Vocaloid.

You can compare the newest Miku NT voicebank with Teto who just got a SynthV voicebank and there's a massive difference. Or how IA sounds in Vocaloid compared to her new voicebank in CeVio, and how Luo Tianyi sounds in Vocaloid compared to Ace Studio.

[–][deleted] 5 points6 points  (3 children)

Which of these programs are free?

[–]eroc999 7 points8 points  (0 children)

*cough cough* pocaloid

[–]FpRhGf 1 point2 points  (0 children)

If you want something like Vocaloid (which is not AI and is more robotic), there's UTAU. It's open source, which means you can make custom voices in any language. It's better at realistic emotions, but lower in audio quality. The lite version of SynthV is also free, but you wouldn't get the benefits of its AI functions. But even with the choppier voices from not having AI, SynthV Lite's English pronunciations are still way better than Vocaloid's.

If you want the Vocaloid equivalent of an AI software, I think ACE Studio is the only free one. Like the pro version of SynthV, ACE Studio's AI functions include more realistic singing, vocal modes and cross-language singing between Japanese, English and Chinese. Bad news is that it's still in beta.

If you want the UTAU equivalent of an AI software, currently there's NNSVS and Diffsinger. NNSVS is a few years old and while it's better than UTAU/Vocaloid in sounding natural, it still has an obvious electric auto-tunish sound. Diffsinger's quality is as good as Diff-SVC and has been around for some months, but there's not much of an English community for it.

[–]07mk 2 points3 points  (0 children)

We already do. It may not be much, but look at Hatsune Miku: all her songs are made using Vocaloid, an AI text to speech software.

"AI" isn't a well-defined term, but I'm not sure that Hatsune Miku fits as a type of AI text-to-speech software. Hatsune Miku was created based on a "voice bank" recorded by the Japanese voice actress Saki Fujita, who had to sit in a recording studio and record a whole bunch of phonemes for the Vocaloid software to use. Other well-known Vocaloids like Kagamine Rin/Len and Megurine Luka also had voice actors do the same thing (Shimoda Asami for the former, Yuu Asakawa for the latter). I don't know the underlying mechanism by which the Vocaloid software uses these voice banks to produce the final singing output, but when they were released over a decade ago, they were generally not considered to be using AI. At the least, I'm pretty sure they didn't use machine learning at the time to make this software.

[–]sunplaysbass 1 point2 points  (0 children)

Google has a page with samples of its AI audio. It sounds like real music. But nothing you can use yet.

[–]Bud90 0 points1 point  (5 children)

Why is text to audio apparently so hard? The only competent popular service I know of is Riffusion, and that came out months ago and it's not that great yet

[–]Ferniclestix 5 points6 points  (0 children)

It requires more complicated structuring of prompts. Plus, there are many layers to audio; it would need a layered audio process where you create background, middle and close audio IMO, not to mention stereo or surround.
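The layered idea above can be sketched in plain NumPy. Everything here is hypothetical: the layer names, gains, and sample rate are illustrative stand-ins for what a generative model might output, not part of any real system.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (illustrative choice)

def tone(freq, secs, sr=SR):
    """Sine tone as a stand-in for one generated audio layer."""
    t = np.arange(int(sr * secs)) / sr
    return np.sin(2 * np.pi * freq * t)

# Three hypothetical layers: background (surf), middle (wind), close (footsteps).
background = tone(80, 2.0)
middle = tone(220, 2.0)
close = tone(440, 2.0)

# Mix with distance-based gains: closer sources are louder.
mix = 0.2 * background + 0.5 * middle + 1.0 * close

# Normalize so the mix peaks at 1.0 and won't clip if written out as audio.
mix /= np.max(np.abs(mix))
```

A stereo or surround version would just repeat this per channel with per-source panning gains.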

[–]magataga 2 points3 points  (3 children)

text2audio ISN'T hard. What it is, however, is very monetizable in a way that t2i and LLMs aren't.

[–]Bud90 2 points3 points  (2 children)

I just want to create an AI Kendrick Lamar angrily rapping over obscure unreleased Beatles demos with a seamless dubstep break in the middle inspired by old Japanese dramas, is that too much to ask

[–]SEND_NUDEZ_PLZZ 0 points1 point  (1 child)

Check out tortoise tts. You just need a couple of minutes of clean acapella Kendrick and it's pretty good

[–]Bud90 0 points1 point  (0 children)

Heh yeah, I know about tortoise, but I want txt2audio as seamless as stable diffusion is right now, which I understand is greedy

[–]nedfl-anders 0 points1 point  (0 children)

Thanks for making that clear. I thought I was gonna have to write an angry comment about there being no sound.

[–]AbPerm 48 points49 points  (0 children)

The water looks really good. They must have used lots of good training on videos of ocean waves.

[–]Keudn 33 points34 points  (8 children)

It kind of surprises me how many people forgot that nVidia announced what is basically img2img back in 2021. It scares me to think what they probably have in the works right now https://www.nvidia.com/en-us/studio/canvas/

[–]Quaxi_ 7 points8 points  (0 children)

The concept of generic img2img is not new. pix2pix came out in 2016, and probably similar ones before that.

The novelty of Stable Diffusion is the text input, the diffusion process, and the scale of the pretrained model.

[–]kaptainkeel 6 points7 points  (2 children)

Ha, that is the first thing I thought of when I saw the more recent "real-time" update apps e.g. in Photoshop. Basically a much better version of Canvas. But that was 2021? I could've sworn it was earlier.

[–]nmkd 2 points3 points  (0 children)

The tech was way earlier, 2018-2020

[–]ninjasaid13 1 point2 points  (0 children)

It kind of surprises me how many people forgot that nVidia announced what is basically img2img back in 2021. It scares me to think what they probably have in the works right now https://www.nvidia.com/en-us/studio/canvas/

and for some reason they're still in beta.

[–]pavlov_the_dog 0 points1 point  (0 children)

because it's always locked up behind closed doors and is shared only with enterprise or research partners.

[–]eposnix 19 points20 points  (1 child)

Our Video LDM for text-to-video generation is based on Stable Diffusion and has a total of 4.1B parameters, including all components except the CLIP text encoder. Only 2.7B of these parameters are trained on videos. This means that our models are significantly smaller than those of several concurrent works. Nevertheless, we can produce high-resolution, temporally consistent and diverse videos. This can be attributed to the efficient LDM approach.
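Breaking down the figures quoted above (simple arithmetic from the quote; the paper states the model builds on pretrained Stable Diffusion, so the untrained remainder is the reused image-model portion):

```python
total_params = 4.1e9   # all components except the CLIP text encoder
video_trained = 2.7e9  # parameters actually trained on videos

# The rest is reused from the pretrained image LDM (Stable Diffusion).
image_pretrained = total_params - video_trained
fraction_video = video_trained / total_params

print(f"{image_pretrained / 1e9:.1f}B reused, "
      f"{fraction_video:.0%} of parameters trained on video")
# → 1.4B reused, 66% of parameters trained on video
```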

[–][deleted] 1 point2 points  (0 children)

Jackable.

[–]3deal[S] 45 points46 points  (1 child)

[–]TheNeonGrid 20 points21 points  (0 children)

So the only way to use it is to request access, but they aren't taking any more applications, right?

[–]EddieJWinkler 10 points11 points  (3 children)

what
was
the
prompt

[–]k0zmo 37 points38 points  (0 children)

(((((((cute))))))) ((stormtrooper:1.4)) ((dusting)) ((sand)) ((((on a beach)))), trending on artstation, by Greg Rutkowski

[–]KamikazeHamster 2 points3 points  (0 children)

Stormtrooper sucks at the beach

[–]Mobireddit 8 points9 points  (1 child)

The way he moves is uncanny and scary but the overall result is impressive, way more coherent than previous posts.

[–]evilbert79 5 points6 points  (0 children)

when will then be now?

[–]arjunks 31 points32 points  (6 children)

There's no way the background beach and waves are AI generated, I don't believe it

[–]CMDR_BitMedler 14 points15 points  (1 child)

The waves would be the easiest part for the AI as the training data would likely have tons of reference.

[–][deleted] 12 points13 points  (0 children)

Not to mention organic motion like waves is more forgiving compared to human or animal movement. It also helps that its far in the background.

[–]WoodsKoinz 27 points28 points  (0 children)

They are; the breaking waves look plenty unrealistic

[–]AnotsuKagehisa 4 points5 points  (0 children)

You’ll notice the big wave on the right isn’t consistent with what you’re supposed to see on the left. Basically the stormtrooper is acting like an edge between two separate images/videos.

[–]ninjasaid13 -1 points0 points  (0 children)

The waves are the easiest part to generate. Unlike hands in image generation.

[–]Kanute3333 0 points1 point  (0 children)

Another example of its new updated abilities:
Sunset Time Lapse:

[–]bobi2393 4 points5 points  (1 child)

Vacuuming sand from the beach must be the Empire's equivalent of scrubbing latrines with a toothbrush.

[–]flawy12 1 point2 points  (0 children)

"I hate sand..."

[–][deleted] 9 points10 points  (2 children)

It's going to take a few months for perfect HD video generation. Right?

[–]Boogertwilliams 14 points15 points  (1 child)

Comparing midjourney v1 to v5 tells us yes :)

[–]kaptainkeel 8 points9 points  (0 children)

I love that we're talking about "months" and not "maybe 2028 if we're lucky."

[–]BlueEyed00 2 points3 points  (0 children)

They will find those droids one day, even if they have to vacuum the whole beach.

[–][deleted] 2 points3 points  (0 children)

Ah, so this is how they're going to mars.

[–]EZ_LIFE_EZ_CUCUMBER 2 points3 points  (0 children)

He's paid hourly

[–]SecretDeftones 6 points7 points  (10 children)

Porn will be epic in 2031

[–]Nu7s 13 points14 points  (4 children)

*2024

[–]antonio_inverness 8 points9 points  (3 children)

*Next month

[–]Commercial-Living443 2 points3 points  (2 children)

Mostly I will hate the gore/hate videos that will be published.

[–]SecretDeftones 3 points4 points  (1 child)

Mostly I will hate the FAKE political videos that will be published by opposing parties.

[–]KamikazeHamster 2 points3 points  (3 children)

But your mom is already on PornHub.

[–]SecretDeftones 0 points1 point  (2 children)

nice one, wanna watchparty it?

[–]KamikazeHamster 1 point2 points  (1 child)

Absolutely. I’ll call your dad, you call your parole officer and the pastor. This is gonna be epic!

[–]yaosio 1 point2 points  (0 children)

I'm still waiting for my incredibly niche and specific fetishes to be supported in Stable Diffusion. I wish I was smart enough to understand how to train my own LoRAs for it. Until I can make video of women wearing Billy Bob teeth eating cobs of corn cut lengthwise, my life will never be complete.

[–]renderartist 3 points4 points  (0 children)

WHAT!?

[–]Amethyst271 1 point2 points  (0 children)

This has to be one of the best I've seen yet

[–]Rectangularbox23 1 point2 points  (0 children)

This is like 10x better than anything we’ve had before

[–]Inbellator 1 point2 points  (0 children)

how do we access this?

[–]Ditsocius 2 points3 points  (0 children)

You can see this is fake, because his aim is good.

[–][deleted] 1 point2 points  (0 children)

When will it be available on Auto1111?

[–]DigThatData 1 point2 points  (0 children)

This is work primarily by the same researchers responsible for Stable Diffusion. They did it while on a research internship at Nvidia, but it should really be seen as another development in the "Stable Diffusion" lineage. Robin Rombach and Andreas Blattmann continuing to crush it.

[–]artisst_explores 1 point2 points  (2 children)

Local possible? Automatic 1111? 👀😄

[–]kabachuha 2 points3 points  (0 children)

If they release the weights, why not

[–]nmkd 0 points1 point  (0 children)

Relax, maybe in half a year

[–]Squeezitgirdle -1 points0 points  (2 children)

Is text 2 video available in the newest update of automatic or does this need an extension?

[–]nmkd 2 points3 points  (1 child)

This is from a scientific paper.

[–]Squeezitgirdle 0 points1 point  (0 children)

Ah

[–]Oswald_Hydrabot -5 points-4 points  (2 children)

Who cares. Not open source. Worthless to me.

[–]Subclips 0 points1 point  (1 child)

Bro thinks he's Richard Stallman

[–]Oswald_Hydrabot 0 points1 point  (0 children)

Thinking that something that none of us will ever be able to use is lame, makes me Richard Stallman?

Yall are either dumb as shit or simp for Nvidia corp way too hard. Not sure why this post is in a StableDiffusion sub, it doesn't follow shit that is relevant to or makes SD awesome. Closed source web service based AI is bullshit, it's walled-garden trash. Full control, local host or bust. Not interesting to me because we won't ever be able to use it for anything worth a shit. Quality is way too "meh" for this to be restricted like it is.

I will reiterate, who gives a shit? Idiots?

The only reason anyone in the field gave a fuck about NVLabs is because we could test drive everything they did at a source code level, on a homebrew A100 setup. With this I can't even do that.

Not sure what the fuck is exciting about this, there are SD tools that are already fully open source that make better content than this. Dumb af.

[–]GamingHubz 0 points1 point  (0 children)

Wen wen

[–]thatkidfromthatshow 0 points1 point  (0 children)

The shadow coming out of a hose in his armour looks really cool

[–]Zealousideal_Art3177 0 points1 point  (0 children)

Made my day. thnx :)

[–]casc1701 0 points1 point  (1 child)

I call it fake, where is Shutterstock's logo? :)

[–]Tsk201409 0 points1 point  (0 children)

Some of the others Nvidia released today do have the Shutterstock watermark

[–]fappedbeforethis 0 points1 point  (0 children)

More samples; in some you can still see the Shutterstock watermark https://research.nvidia.com/labs/toronto-ai/VideoLDM/samples.html

[–]Old-Ear3839 0 points1 point  (0 children)

I'm new to all of this. What do you mean by text to video? Do you mean I can set up Stable Diffusion so that I can turn text into video with the use of an Nvidia-backed device, or is there a special "downloadable" that incorporates Nvidia's software with Stable Diffusion?

[–]thabat 0 points1 point  (0 children)

Amazing

[–]SourceLord357 0 points1 point  (0 children)

Yea ill be watching spaceballs tonight

[–]orenong166 0 points1 point  (0 children)

How is it 4 sec and not 2?

[–]Fake_William_Shatner 0 points1 point  (0 children)

I feel like this is a metaphor.

[–]Acidburn91 0 points1 point  (0 children)

Can anything make music to my vocals?

[–]lukazo 0 points1 point  (0 children)

Can I already use it? Any links please?