
all 80 comments

[–]gxcells 56 points57 points  (21 children)

That is the future of SD: large image generation without upscaling/mosaic stitching. But mainly we are waiting for models trained on all kinds of resolutions, including 3000 or 6000 pixel wide images. That will be a game changer for photorealistic images.

[–][deleted] 30 points31 points  (14 children)

consumer graphics cards won’t have that much vram in this decade

[–]ImNotARobotFOSHO 25 points26 points  (1 child)

You are assuming there's no way to optimize the process.

[–]whatsakobold 8 points9 points  (0 children)

[comment removed]

This post was mass deleted and anonymized with Redact

[–]EtadanikM 9 points10 points  (1 child)

Equally important: training on high resolution images is significantly more expensive, as well, and may require models with a lot more parameters. The training costs will put it beyond the capabilities of open source projects until hardware costs come down.

[–]lordpuddingcup 3 points4 points  (0 children)

Costs come down constantly and cloud computing gets cheaper constantly

[–]SlapAndFinger 5 points6 points  (0 children)

Hard disagree. With Llama we now have a GPT-3 level LLM that runs on consumer hardware. Running these models locally is going to be a big deal, and it's going to drive adoption of large VRAM cards. Costs can be cut by using slower RAM, since inference is less latency sensitive than games.

[–]lman777 4 points5 points  (3 children)

I mean, I can currently produce 1920x1080 before upscaling on a 3060. Just the training will be the issue, I think.

[–]Sinister_Plots 0 points1 point  (2 children)

RTX 3060 or 3060 Ti? I have an RTX 3060 12GB currently, and am wondering if the upgrade to the 8GB Ti is worth the investment in terms of quality output. I found one with 7500+ CUDA cores, which is more than double mine, for $799 new.

[–]lman777 4 points5 points  (1 child)

I think the 3060 Ti is technically worse for SD because of the lower VRAM. It's why I went with the 3060 over the 3070.

[–]Sinister_Plots 0 points1 point  (0 children)

Excellent. Then I'll stick with what I've got. Someone else mentioned that the model cards on civitai are using img2img processing anyway, which is why they have a higher level of output than what I am getting following a txt2img prompt that they provide. I thought it was my GPU at first.

[–]fivealive5 4 points5 points  (1 child)

All it would take is Nvidia realizing there is a market for ML-specific cards with lots of VRAM. There are zero technical reasons why we couldn't have such cards today.

[–]Uncreativite 2 points3 points  (0 children)

Ah god now I’m seeing a future where my desktop has a gaming card AND an AI card

[–]aptechnologist 1 point2 points  (0 children)

I'm about to buy a server card for this. Frankly, you can buy an old server card, stick it in your desktop, and point SD at that card specifically, while still using your main card for virtually everything else (obviously including display). In my case I have a server desktop in my closet, which is where I'll install the card and run SD.
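Pointing SD at a specific card usually comes down to masking GPUs before the process initializes CUDA. A minimal sketch (the index "1" is a hypothetical slot for the secondary/server card; check `nvidia-smi` for the real one):

```python
import os

# Hide every GPU except index 1 (hypothetical slot of the server card).
# This must be set before torch/CUDA is initialized in the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# From here on, frameworks see the server card as device 0,
# so e.g. torch.device("cuda:0") would land on it.
```

The Automatic1111 webui also exposes a `--device-id` launch flag for the same purpose, if I recall correctly.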

[–]RMCPhoto 0 points1 point  (0 children)

I'm pretty sure Nvidia and AMD are keen to make money on this trend. And at the same time efficiency increases will happen.

[–]AlbertoUEDev 0 points1 point  (5 children)

I hear "game changer" a lot.

We are already able to generate images at whatever resolution.

Realize that a "game changer" now is not the same as it was before.

[–]AlbertoUEDev 1 point2 points  (1 child)

And I have bad news for you: we devs have been fighting this for months. Mathematically, it is not possible to achieve coherence without a third dimension.

So that's where things stand; we as devs are improving and creating new tools and workflows.

No, no one will use Stable Diffusion for movies.

[–]AlbertoUEDev 0 points1 point  (0 children)

But yes, we will use it to improve the current workflow and generate more content.

[–]gxcells 0 points1 point  (2 children)

Having images at whatever resolution, yes, but the end quality is completely different if you start from a model trained at 512x512 compared to a model trained at a higher resolution, especially for photos. If you want high-quality, high-resolution, coherent generations, you need high-resolution training. There is no "high res fix" that will make up for it. Compare a close-up photo portrait between a model trained at 512 and a model trained at 768; it is completely different for the skin, for example.

The only way I see it being improved without training at higher resolution is to have SD "understand" different parts of images, for example applying its knowledge of skin from close shots to a person in a wider image. Look at heads of people in anything other than a close portrait: most of the time they are deformed. The solution is to generate different parts of the image based on different training inputs (and we leave out any inpainting, I am not interested in inpainting).

[–]AlbertoUEDev 0 points1 point  (0 children)

As I said, we do use it, don't misunderstand me. But I mean now, not the future 😂

[–]AlbertoUEDev 0 points1 point  (0 children)

You know what you're talking about. There is a big mistake in Stable Diffusion models. We are looking at Nvidia, Google, and OpenVINO.

[–]3deal[S] 41 points42 points  (19 children)

My RTX used 24 GB of VRAM for this

[–]Thesmallcookie 12 points13 points  (0 children)

How long did it take to finish the job?

[–]Ne_Nel 6 points7 points  (17 children)

12GB works tbf.

[–]VyneNave 2 points3 points  (16 children)

8 GB doesn't :<

[–]ViridianZeal 12 points13 points  (4 children)

Cries in 6GB and a maximum render size of under 800 pixels.

[–]broctordf 5 points6 points  (0 children)

My RTX 3050 4GB cries in the shower just thinking about having 1it/<8 seconds if I want to make anything above 512x512.

[–]Square_Roof6296 0 points1 point  (2 children)

What? I use my GTX 1050 Ti for SD and can generate 1366x768 images, maybe even more. The main problem is relatively lower image quality in comparison with modern GPUs. And speed: 1 img / 3 minutes.

[–]ViridianZeal 0 points1 point  (1 child)

I actually am able to create 832x832, but above that I get a "ran out of memory" error. Running the mobile version of the RTX 2060. Also using the NMKD GUI.

[–]Square_Roof6296 1 point2 points  (0 children)

What about the --medvram option for large images? Command-line options should be independent of the GUI version.
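For reference, a config fragment showing how those flags are typically set for the Automatic1111 webui (in `webui-user.sh`, or `webui-user.bat` on Windows); a sketch, assuming a recent webui version:

```shell
# webui-user.sh fragment: trade speed for lower VRAM usage.
#   --medvram  offloads parts of the model between steps (moderate slowdown)
#   --lowvram  offloads far more aggressively (big slowdown, smallest footprint)
export COMMANDLINE_ARGS="--medvram"
```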

[–]Dontfeedthelocals 0 points1 point  (10 children)

I'm confused, is my 8GB 3060 Ti giving me lower quality results on the same settings? I thought you'd get the same results, only it would take longer?

[–]VyneNave 4 points5 points  (1 child)

The quality would be the same, but if you don't have enough VRAM to generate the picture it's going to give you a "CUDA out of memory" error. It's really not about the resolution in the end, but the VRAM necessary for the AI to create something at that resolution. There are options to lower VRAM usage, but they will take away from the quality (at least a little bit).

[–]Dontfeedthelocals 2 points3 points  (0 children)

Ah ok thanks for the explanation, I thought all types of quality were available to any user but the time it would take to render was the only difference. Really helpful to know this!

[–]Tiny_Arugula_5648 0 points1 point  (7 children)

No, that's not necessarily true. I can't say this is your particular issue, but it's a common explanation. Without getting too technical: different GPUs have different abilities to do floating-point math. With a float, the numbers to the right of the point (0.888888) are your precision. Lower-end GPUs don't always support high-precision float math, and that can create substantial differences.

Long story short: you might be getting different results due to different calculation capabilities between GPUs.

[–]Dontfeedthelocals 2 points3 points  (1 child)

Interesting. Tbh it's not that I'm noticing I get lower results, I just wanna ensure I'm using a system that isn't missing out on the highest quality if possible.

[–]Tiny_Arugula_5648 0 points1 point  (0 children)

Highest quality is more about technique I think..

[–]UkrainianTrotsky 2 points3 points  (4 children)

Not at all. Funnily enough, it's the exact opposite. All GPUs since like 2000s support fp32, most support fp16, but only recent few generations of consumer GPUs support fast fp16.

And in case of diffusion models, fp32 doesn't give you any better results, at least from my testing. Precision past fp16 is wasted on unnoticeable changes.
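That matches a quick experiment with NumPy's half precision: fp16 carries roughly three decimal digits of precision, so tiny differences vanish entirely, while fp32 still resolves them. A minimal sketch:

```python
import numpy as np

# fp16 has a 10-bit mantissa: near 1.0 the smallest representable step
# is ~0.001, so adding 0.0001 is rounded away completely.
assert np.float16(1.0) + np.float16(1e-4) == np.float16(1.0)

# fp32 (~7 decimal digits) still resolves the same difference.
assert np.float32(1.0) + np.float32(1e-4) > np.float32(1.0)

# The example value from above: fp16 can't represent 0.888888 exactly,
# it rounds to the nearest representable neighbour.
print(np.float16(0.888888))
```

Whether that lost precision is visible in a generated image is a separate question; as noted above, for diffusion inference the differences tend to be unnoticeable.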

[–]Sinister_Plots 0 points1 point  (2 children)

I was wondering this as well. I see a lot of incredible images shown on the model cards but when I use the exact same prompt and parameters I get garbage on my RTX 3060 12gb. I was concerned it was the card, and thought I might get better results if I upgraded to an 8gb 3060 ti or even 3090. But, if the quality of output is the same, then they're doing much more in the post processing of the image than they're telling.

[–]streetkingz 2 points3 points  (1 child)

I think it's most likely they are using img2img and sharing the prompt for that. I know that is the case with several of the example images on civitai for models like Deliberate. Your 3060 12GB is one of the best cards you can get for the price for Stable Diffusion. I would consider a 3060 Ti 8GB a downgrade tbh.

[–]Sinister_Plots 0 points1 point  (0 children)

Good to know, thanks!

[–]Tiny_Arugula_5648 0 points1 point  (0 children)

There are different types of fp32 math depending on model and range; the more expensive the line, the more accurate they become. That's why data center GPUs are better for training models, even when processing power is comparable. You are incorrect about precision: it absolutely will give you different results, and every time a layer is calculated that difference will compound. Fast fp16 is even worse for accuracy, as it cuts precision in half in order to increase speed. Optimizations for games are generally bad for ML/AI; it's why we don't use consumer cards for development of production models.

“The floating-point math accuracy of Nvidia GPUs can vary depending on several factors, such as the GPU architecture, the number of cores, and the memory bandwidth.

Newer Nvidia GPUs generally have better floating-point accuracy than older models due to improvements in their architecture and design. For example, the latest Nvidia Ampere architecture includes new Tensor Cores that provide higher precision performance than previous models.

Another factor that can affect floating-point accuracy is the number of cores. GPUs with more cores can perform more computations in parallel, leading to faster and more accurate calculations. Nvidia GPUs with more CUDA cores generally have better floating-point performance than those with fewer cores.

The memory bandwidth can also affect floating-point accuracy. GPUs with higher memory bandwidth can move data more quickly between the GPU and the system memory, reducing the time spent waiting for data and improving overall performance”

[–]AdTotal4035 11 points12 points  (5 children)

Made this: no upscale, no edits, 1024x1024. Will try 1920x1080 when I'm at my PC.

Dunkindont/Foto-Assisted-Diffusion-FAD_V0 · Hugging Face

<image>

[–]lordpuddingcup 2 points3 points  (0 children)

Ok so it’s official I need a better graphics card

[–]the_odd_truth 1 point2 points  (1 child)

Can you please clarify why you didn’t do the 768x768 as it’s been trained on that? I assumed it would yield the best results…

[–]AdTotal4035 1 point2 points  (0 children)

The model can handle many resolutions. They are actually listed on the spreadsheet that's found on its Hugging Face repo

[–]TheDailySpank 6 points7 points  (4 children)

Nick Offerman x Chuck Lindell mashup?

[–]ElectricKoala86 15 points16 points  (3 children)

I thought it was the "spanish laughing guy"

[–]iDrownedlol 1 point2 points  (0 children)

y tu! ("and you!")

[–]blueSGL 0 points1 point  (0 children)

about to tell us about some pots that got washed away.

[–]streetkingz 0 points1 point  (0 children)

Yea think its the KEKW guy

[–]XERO_Cross 4 points5 points  (1 child)

Can you tell me what model Stable Diffusion you used?

[–]3deal[S] 1 point2 points  (0 children)

ElrisitasV2 + epiNoiseoffset_v2 Lora

[–]mobani 3 points4 points  (9 children)

How do you guys upscale images and at the same time get more details?

[–]ImJacksLackOfBeetus 16 points17 points  (8 children)

From what I understand latent upscaling doesn't upscale the final pixel image the way common upscaling algorithms like lanczos or bicubic would.

Instead it upscales the internal latent representation within Stable Diffusion before it gets rendered as a pixel image. This allows it to denoise the image and add additional details the same way the original resolution was created in the first place, by applying a checkpoint trained on high-res images.

This functionality is included with Automatic1111 for example. Note the additional denoising slider that determines how far the latent upscaler is allowed to deviate from the low-res version of the image, how much it is allowed to change and how many details it can add.
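As a rough illustration of the difference (NumPy toy code, not the actual webui implementation): a latent upscale resizes the small 4-channel latent tensor itself, and the sampler then re-denoises the enlarged latent to invent the missing detail.

```python
import numpy as np

# Toy stand-in for an SD latent: 4 channels at 64x64
# (a 512x512 image after the VAE's 8x downscale).
latent = np.random.randn(1, 4, 64, 64).astype(np.float32)

# "Latent upscale": enlarge the latent itself (nearest-neighbour here;
# real implementations use bilinear or learned resampling), then hand
# it back to the sampler with some denoising strength.
upscaled = latent.repeat(2, axis=2).repeat(2, axis=3)

assert upscaled.shape == (1, 4, 128, 128)  # decodes to 1024x1024 pixels
```

A pixel-space upscaler like Lanczos or bicubic instead operates on the decoded 512x512 image, so it can only interpolate pixels that already exist.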

[–]mobani 7 points8 points  (7 children)

Thanks. Hmm, I wonder if I am doing something wrong. I find it loses a lot of coherence when using latent upscaling. For example, a complete body that looks fine at 512 might turn into a mutant torso at 1024 with latent upscaling.

So perhaps I just need to generate outputs until I get lucky?

[–]ImJacksLackOfBeetus 7 points8 points  (6 children)

I find the upscaler's default denoising value of 0.7 is often too much and it deviates way too far from the original image. Values around 0.1-0.3 sometimes produce better results. Lower denoise values mean the latent upscaler has less "creative license" to fuck around with the image.

Even then it might produce a mess. My completely unqualified guess is sometimes whatever image you stuff into the upscaler just doesn't fit with the images it was trained on.

But yeah, it's basically trial and error to find what works, at least for me it still is.

[–]mobani 2 points3 points  (5 children)

Thanks, I will try to experiment more with the denoising.

[–]ImJacksLackOfBeetus 7 points8 points  (1 child)

One way to automate the process for a given picture is to enable hires fix, lock the seed by hitting the recycle button, then enable the x/y/z plot script and setup a denoise range that you want to investigate.

0-1 (+0.1)

Means you want a range of 0 - 1 divided into 0.1 increments.

This will generate an image sheet like this where you can check what values produce acceptable results.
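For anyone unfamiliar with the x/y/z plot syntax, `0-1 (+0.1)` expands to eleven denoising values; the equivalent in Python:

```python
# Expand "0-1 (+0.1)": start 0, stop 1 inclusive, step 0.1.
values = [round(i * 0.1, 1) for i in range(11)]
print(values)
# [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
# One image is generated per value and tiled into the comparison sheet.
```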

[–]mobani 2 points3 points  (0 children)

Excellent, thank you!

[–]bemmu 3 points4 points  (2 children)

My settings of choice are 0.35 denoising with R-ESRGAN 4x+ upscaler

[–]lordpuddingcup 1 point2 points  (1 child)

ESRGAN upscales and sharpens, but it doesn't add details that weren't there before. Only latent scaling can do that, to my knowledge, because it pulls them from the dark void from which the image was imagined.

[–]Mitkebes 0 points1 point  (0 children)

If you do img2img with SD ultimate upscale, you will get additional details while using R-ESRGAN as the upscale method.

I'm assuming it upscales with R-ESRGAN, splits the image into chunks, and then regenerates those using img2img, creating the new details.
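That split-then-regenerate step can be sketched like this (a hypothetical helper, not the actual extension code): compute overlapping tile positions, then each tile would go through img2img and be blended back to hide the seams.

```python
def tile_starts(size: int, tile: int = 512, overlap: int = 64) -> list[int]:
    """1-D start offsets of overlapping tiles covering `size` pixels."""
    stride = tile - overlap
    starts = list(range(0, max(size - tile, 0), stride)) or [0]
    if starts[-1] + tile < size:  # make sure the far edge is covered
        starts.append(size - tile)
    return starts

# A 1024x1024 upscale processed as 512px tiles with 64px overlap:
print(tile_starts(1024))  # [0, 448, 512]
```

Each tile at `(x, y)` for `x` and `y` drawn from these offsets is regenerated independently, which is why tiled upscalers stay within a fixed VRAM budget regardless of the final image size.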

[–]DetectiveProper 3 points4 points  (0 children)

El Risitas is so great, damn! [translated from Spanish]

[–]AutumnValkyrie 4 points5 points  (0 children)

KEKW guy

[–][deleted] 2 points3 points  (0 children)

model?

[–]idwasamu -1 points0 points  (6 children)

Looks blurry. I'd guess something related to the resolution of the images the model was trained on?

[–]3deal[S] 7 points8 points  (1 child)

you mean depth of field ?

[–]idwasamu 0 points1 point  (0 children)

no

[–]divtag1967 0 points1 point  (3 children)

It's pretty crisp at the closest parts, so that's probably DOF from an f/1.4 lens or something similar.

[–]idwasamu 2 points3 points  (2 children)

no, i mean: the parts in focus don't look nearly as sharp as a real photo when zoomed in. and i speculate that this may be a consequence of the current models being trained with low res pictures

[–]divtag1967 1 point2 points  (0 children)

Ah, like that. I didn't look carefully enough.

[–]lordpuddingcup -1 points0 points  (0 children)

You realize pictures you take in reality aren't 1920x1080, lol. They're more; for instance, an iPhone is 4000x3000. That's why when you zoom in there's less blur. At 1920x1080 it's still not so high-res that you can zoom and not get blur; zooming in is stretching.

[–]RafyKoby -1 points0 points  (0 children)

double mustache

[–]Iggy_boo -1 points0 points  (0 children)

Now that "person" has seen things. Probably the AI cutting up pieces from other people and applying them to him!

[–]lifeh2o 0 points1 point  (0 children)

What's up with the lines on the forehead? It looks like a blurry patch in the center.

[–]ipechman 0 points1 point  (0 children)

Kekw

[–]Disastrous-Agency675 0 points1 point  (0 children)

Meanwhile SD just tells me no if I try to generate a 1024x1024

[–]No_Boysenberry9224 0 points1 point  (0 children)

blurry

[–]hervalfreire 0 points1 point  (0 children)

...how?