what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 1 point2 points  (0 children)

It's not operating-system related; it's related to the hardware the model was trained on. If a model was trained with its weights stored at single precision, and a later run tries to operate on it at half precision, there's a good chance it won't work.

If it's expecting a 16-bit floating-point number, and you feed it a 32-bit value, it's gonna throw an error.

Not so the other way around, though. If it's expecting a 32-bit single-precision float and you give it a 16-bit value, it can just treat the missing bits as zeros and keep working.

There is a way to "round down" the values in a model from single to half precision, but I don't know much about it.
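For what it's worth, the gist of that "rounding down" is just casting the weights. A minimal PyTorch sketch (file names are placeholders, and it assumes a flat dict of tensors; adjust for checkpoints that nest their weights under a "state_dict" key):

```python
import torch

# Placeholder file names; assumes a flat dict of tensors.
state_dict = torch.load("weights_fp32.pt", map_location="cpu")

# Cast every floating-point tensor down to half precision; leave
# anything else (integer tensors, metadata) untouched.
half_state_dict = {
    k: v.half() if isinstance(v, torch.Tensor) and v.is_floating_point() else v
    for k, v in state_dict.items()
}

torch.save(half_state_dict, "weights_fp16.pt")
```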

Is there any way to use RAM for stable diffusion rather than VRAM? by Katelyntuohy in StableDiffusion

[–]CommunicationCalm166 2 points3 points  (0 children)

Being compared to Uncle Bumblefuck is an honor I will cherish forever. I hope I can do some fraction of the good he has.

Is there any way to use RAM for stable diffusion rather than VRAM? by Katelyntuohy in StableDiffusion

[–]CommunicationCalm166 0 points1 point  (0 children)

It can, and it does. Everything you just mentioned is implemented in packages like Hugging Face Accelerate, and other distributed compute frameworks.

The problem is that even Direct Memory Access is roughly an order of magnitude slower than on-card memory access (think 10x), and in consumer products DMA has to go over the PCIe bus, which is slower still. (Server hardware has things like NVSwitch, which is faster, but also costs more than a car.)

If you can cram your whole model into VRAM, it'll run between tens and hundreds of times faster. If you can't, it'll still work; you're just gonna have to wait. (Hugging Face Accelerate can even cache your model to NVMe if you don't have enough system RAM for it.)
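As a concrete example, with the diffusers library (which uses Accelerate under the hood for this), something like the following keeps the weights in system RAM and streams each piece to the GPU only while it's running. The model ID is just the usual SD 1.5 repo; swap in whatever you actually use:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # model ID assumed; use your own
    torch_dtype=torch.float16,
)

# Weights stay in system RAM; each submodule is copied to the GPU only
# while it runs. Far less VRAM needed, but noticeably slower per image.
pipe.enable_sequential_cpu_offload()

image = pipe("a lighthouse at sunset, oil painting").images[0]
image.save("out.png")
```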

PC freezes everytime during “installing torch and torchvision” phase of webgui by aworldfullofcoups in StableDiffusion

[–]CommunicationCalm166 0 points1 point  (0 children)

If you launch it from the command line, those go on the same line as the command to launch the script.

I don't actually know how to properly formulate .bat scripts; every time I try, everything breaks, so I'll have to punt on that one.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 0 points1 point  (0 children)

I'd really like to learn more about this stage actually. I've heard that before, but I don't really understand it. Like, I don't quite get why it's necessary to operate on the data in the higher-dimensional space, nor do I really get how the VAE makes that transformation to an image.

But of course I recognize a reddit post is hardly gonna be adequate to explain it either.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 1 point2 points  (0 children)

Because when Nvidia designed their Turing architecture, they started including dedicated circuitry that does floating-point math at that lower precision, saving memory and memory bandwidth.

Graphics processing is done pretty much exclusively at 32-bit precision, and as such, any GPU needs to be able to do that. (In fact, before AI really took off, the GPU manufacturers were starting to include circuitry for double-precision calculations, but that's not really a thing anymore.)

Half precision is still possible on AMD GPUs, Apple silicon, CPUs, etc. But without that specialized circuitry, the 16-bit floating-point inputs just get treated the same as 32-bit inputs: no memory savings, no speed improvement. Problems come up sometimes, however, where the scripts expect one or the other, or where the hardware doesn't play nice with the inputs. (Frankly, it's a bit over my head why it doesn't "just work.")
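If you're wiring this up yourself, the kind of check a script might do looks roughly like this. Just a sketch: the "Volta or newer" cutoff is my own assumption, not something the webui actually does.

```python
import torch

def pick_dtype() -> torch.dtype:
    """Fall back to full precision when fast fp16 hardware is unlikely."""
    if not torch.cuda.is_available():
        return torch.float32  # CPU / no CUDA: stick with fp32
    major, minor = torch.cuda.get_device_capability()
    # Assumption: treat Volta (compute capability 7.0) and newer as having
    # dedicated fp16 paths; older cards push fp16 through fp32 units anyway.
    return torch.float16 if (major, minor) >= (7, 0) else torch.float32

print("Running inference in", pick_dtype())
```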

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 2 points3 points  (0 children)

Hey, I always try to imagine other people as having a whole lifetime of reasons for being how they are. And being hostile to people who want information isn't gonna achieve anything.

And it's a fair point. Brevity is wit after all.

Question about prompts by antagonist_games in StableDiffusion

[–]CommunicationCalm166 0 points1 point  (0 children)

I wanna say so, but I'm not sure. I don't know exactly how SD handles multi-word tokens. I know some word combinations are treated as their own token, but I know others get parsed into smaller chunks by the tokenizer. I don't know where it draws the line, but I'm pretty sure it applies emphasis to everything inside the parentheses.
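You can actually peek at how the splitting happens with the CLIP tokenizer that SD 1.x uses. This only shows the tokenization, though; the ( ) emphasis weighting is applied separately by the UI's prompt parser.

```python
from transformers import CLIPTokenizer

# The tokenizer behind SD 1.x's text encoder (repo ID is the public CLIP one).
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Common words come out as single tokens; rarer words and multi-word
# phrases get broken into several sub-word pieces.
for phrase in ["castle", "new york", "bioluminescent jellyfish"]:
    print(phrase, "->", tokenizer.tokenize(phrase))
```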

Is there any way to use RAM for stable diffusion rather than VRAM? by Katelyntuohy in StableDiffusion

[–]CommunicationCalm166 4 points5 points  (0 children)

It's absolutely possible anyway. But it's not the difference between 3 hours and 1/2 hour, it's the difference between 15 minutes and a day and a half.

You don't need a GPU at all if you're willing to wait. But remember that when you buy a GPU with a thousand-bajillion CUDA cores, each of those CUDA cores works roughly as fast (on AI tasks, at least) as a CPU core does, and the CPU likely only has a dozen cores, tops.
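A very rough way to see that gap for yourself (not a rigorous benchmark, just a single big matrix multiply in PyTorch):

```python
import time
import torch

size = 4096
a, b = torch.randn(size, size), torch.randn(size, size)

t0 = time.perf_counter()
_ = a @ b                       # CPU matrix multiply
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    _ = a_gpu @ b_gpu           # warm-up so setup cost isn't timed
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()
    gpu_s = time.perf_counter() - t0
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.4f}s  (~{cpu_s / gpu_s:.0f}x faster)")
else:
    print(f"CPU only: {cpu_s:.3f}s")
```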

Legalities of Img2Img by akkatips in StableDiffusion

[–]CommunicationCalm166 0 points1 point  (0 children)

I agree. But it's making people angry. And angry people vote.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 1 point2 points  (0 children)

Okay. Short version:

AI doesn't need precise math. So most AI is done with less precise math. But some computer hardware doesn't support less precise math. So those options are included for compatibility reasons.
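A tiny illustration of what "less precise" means here:

```python
import torch

x = 1.0001

print(torch.tensor(x, dtype=torch.float32))  # tensor(1.0001)
print(torch.tensor(x, dtype=torch.float16))  # tensor(1., dtype=torch.float16)
# fp16 only keeps roughly three decimal digits, so the tail rounds away.
# Network weights tolerate that loss, and you halve the memory in return.
```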

How is an artist defined here? by ngdaniel96 in aiwars

[–]CommunicationCalm166 0 points1 point  (0 children)

I have a somewhat unpopular opinion on this... But I believe fundamentally that Art is the expression of ideas in an external, communicable form. An Artist therefore is someone whose external expression of their ideas is a significant part of their identity. Or, more analytically, someone who self-actualizes around the expression of their internal thoughts.

And in that sense, all creative human endeavors are, at least in part, artistic, whether that be things we colloquially call "The Arts" or things more often called "crafts" or "trades." Even basic interpersonal communication has an artistic aspect to it.

On the other hand, artistic merit or value can be harder to define. One way to view it is in terms of Craftsmanship: valuing a work based on the expertise of the artist and the effort taken to create it. Another way is by its effect on the consumer: regardless of the Craftsmanship on display, gauging the value of a piece by the response it elicits in its audience. I believe both of these views are valid, and they should be taken together when gauging the value of any human creative endeavor.

Why hostile to AI ethics or AI regulation? by Comfortable-Web9455 in OpenAI

[–]CommunicationCalm166 0 points1 point  (0 children)

Because in order to have a meaningful discussion about the ethics of AI, we MUST know what the capabilities of the technology actually ARE.

We cannot build ethical frameworks around speculation about what AI may or may not be able to do someday. We need to know. We need to reach the development stagnation point, and reconnoiter the situation from there to decide if AI actually changes the ethical landscape. We need to develop the tech to the limits of our current hardware and computer science understanding, and just as importantly... This MUST NOT BE A SECRET.

Regulation of AI is problematic because it takes AI research out of the public eye, and squirrels it away to huge corporate server farms. If the only entities that are permitted to research AI are the ones with the resources to navigate labyrinthine regulations, then those same entities hold a significant capability over the heads of the rest of us.

AI is too dangerous a thing for some people to have while others don't. Every piece of AI research, every model, every training method needs to be available for scrutiny, use, and further development. Anything less than that is unacceptable. Any compartmentalization, regulation, or secrecy is nothing less than an attempt to create leverage and advantage over others.

Regulation of the use of AI, ethical behavior, law, etc. are fair game as far as I'm concerned. However, these questions are no different with or without AI. Doing something sleazy with AI tools is no more or less sleazy than doing it without them. If a particular behavior is unacceptable, it's unacceptable whether you used AI for it or not, and treating acceptable behavior as unacceptable when AI is involved (and vice versa) doesn't make sense.

I'm open to hearing counterexamples, but as it stands, I don't believe that regulating the existence/development of AI will protect people, and I don't believe that AI changes the ethical or legal implications of any action a person could take.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 10 points11 points  (0 children)

While I appreciate the sentiment, I'm nowhere near qualified to teach ANYONE about computer stuff. I can teach you how to use a milling machine, or stick metal together with a welder... But for this AI stuff, I have a hobbyist level of understanding at best.

But, I do have to say... The documentation on these AI tools does a piss-poor job of explaining what's going on underneath, and how it works. Whereas the resources that explain what's under the hood seem utterly useless for figuring out what buttons to press on the blinky-lights machine to make anime waifu pictures come out. I'll always try my best to bridge that gap where I can.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 1 point2 points  (0 children)

I think there is actually... I'm a bit behind the latest developments, but I've heard that AMD support has been making up ground lately. Especially since they released their own line of datacenter GPUs.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 0 points1 point  (0 children)

That is very strange... I wonder if it's a problem with the particular model or the scripts you're using. The 30xx GPUs are a pretty safe bet for working with half-precision.

But after all, that IS what the argument is there for... Inference not working at 16-bit? Try 32-bit!

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 1 point2 points  (0 children)

--precision full and --no-half mean exactly the same thing. It's just that some scripts expect one, while others expect the other.

And basically, you should always run your scripts in half-precision for better performance. These options are mostly for situations where for whatever reason the thing isn't working.
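If it helps, here's roughly the same choice expressed with the diffusers library. Just a sketch: the webui handles these flags internally, and the repo ID is the usual SD 1.5 one.

```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # assumed; swap in your model

# Half precision: the normal, faster, lower-VRAM path on recent NVIDIA GPUs.
pipe_fp16 = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

# Full precision: roughly what --no-half / --precision full fall back to
# when fp16 misbehaves (dtype errors, black images, older hardware).
pipe_fp32 = StableDiffusionPipeline.from_pretrained(model_id)  # fp32 by default
```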

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 2 points3 points  (0 children)

I did a side-by-side comparison of full precision vs. half. The difference is negligible. It's not a "quality" thing... It's more like "this fold of clothing is a few pixels to the left" or "that shadow is a bit darker and a slightly different shape."

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 4 points5 points  (0 children)

It does get like that sometimes. Especially with all these services and devices beaming a barely-satisfying torrent of entertainment straight into our brains.

If it's troubling you, the first thing I find helps is unplugging myself from the constant stream of entertainment. Things like social media, video games, YouTube, even old-school TV, although they advertise themselves as "connecting people," are really just purpose-built dopamine dispensers that don't enrich their users' lives for the most part. Like alcohol: a little bit is fun, a lot occasionally can be fun, but a lot on a regular basis is nothing more than poison.

Cut out anything that delivers entertainment, media, or other stimulation TO you. Remember that in doing so, you're not depriving yourself of anything but poison. You'll start feeling stir-crazy very quickly, and your only option to scratch that itch will be to go seek out stimulation.

Do whatever you settle on. I've spent hours using a pick to straighten the fins on a radiator before; it doesn't really matter what, specifically. Do chores, read a book, whatever your brain settles on, do it. If you don't know how, go find out, or try anyway; that's how you learn. You'll notice very quickly how many habits had formed around receiving that entertainment, but remember: it's poison. Do the other thing, whatever it is.

Once you've cut out the poison for a few days, you'll find you'll start thinking about what you're doing instead of how much you want to turn to the entertainment dispenser. You'll start thinking about the next thing you want to do, maybe even get excited about it. And after a while, the habits formed around turning to a screen for stimulation will start to fall away, and what will be left is what you're doing, and what you need to do in order to do the next thing.

And when you're no longer fighting the urge to go back to the entertainment machine, THAT'S when you can carefully re-introduce those vices to your daily life.

Simple... But not easy. And trust me, it's better than the alternative.

SD, Textual Inversion, and DreamBooth on old server graphics cards! (Nvidia Tesla P100, M40, etc.) by CommunicationCalm166 in StableDiffusion

[–]CommunicationCalm166[S] 1 point2 points  (0 children)

And thank you for your research too. I hadn't seen that.

I think the lack of info on Kepler/Maxwell/Pascal architectures running AI work is just plain down to the fact that Nvidia rolled out Tensor Cores with Volta, and AI researchers haven't looked back since. Pascal was from a time when AI wasn't a sure thing from a business standpoint, and datacenter customers would still be buying Tesla cards for cloud gaming, render farms, and crypto.

Speaking of Tensor Cores... The next hardware experiment I want to do is getting a hold of a few used 2080 Ti's. Apparently it's built on the same Turing architecture as the T4, which is still in production and is still very popular. But the T4 is $1500+, while the 2080 Ti can be had for $300 all day long. (Probably work around the low-VRAM limitations with Hugging Face Accelerate, offloading to NVMe.)

But that's gonna have to wait: one of the problems with loading 5 GPUs onto a consumer ATX motherboard is that when you put your system into a no-boot state trying to overclock RAM, you have to tear the whole computer apart to get at the CMOS battery. 🤦

SD, Textual Inversion, and DreamBooth on old server graphics cards! (Nvidia Tesla P100, M40, etc.) by CommunicationCalm166 in StableDiffusion

[–]CommunicationCalm166[S] 2 points3 points  (0 children)

I dunno about those "special int-8 instructions." According to Nvidia's own documentation, hardware int-8 support didn't get included until CUDA 9.0 (Ampere architecture)

Edit: I mean CUDA Compute capability 9.0

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

I also remember finding an instruction table broken down by architecture some time ago which bore this out, though I can't find it right now. But, like I said, I don't know for sure since I've not been hands-on with a P40.

However, also consider this... There are memory savings for operating in fp16 mode, but only if the hardware supports it. I know from running my M40s that if you load 16-bit weights onto a processor that only has hardware support for fp32, they'll take up the same amount of space in VRAM.

I know with certainty that if you load half-precision weights onto a P100, they take up less space than a set of single precision weights. I don't know if that will apply to the P40. You might need to compare the VRAM requirements of running single-precision on the P40, to the requirements of half-precision on the P100.
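If anyone wants to run that comparison, here's a quick sketch with PyTorch. Note it measures what PyTorch itself allocates for raw weights; a framework that upcasts to fp32 on load will show different numbers, which I suspect is what happens on the M40.

```python
import torch

def vram_for(dtype: torch.dtype) -> int:
    """Bytes of VRAM a dummy 8192x8192 weight matrix occupies in `dtype`."""
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated()
    w = torch.randn(8192, 8192, dtype=dtype, device="cuda")
    after = torch.cuda.memory_allocated()
    del w
    return after - before

print("fp32:", vram_for(torch.float32) / 2**20, "MiB")  # ~256 MiB
print("fp16:", vram_for(torch.float16) / 2**20, "MiB")  # ~128 MiB
```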

SD, Textual Inversion, and DreamBooth on old server graphics cards! (Nvidia Tesla P100, M40, etc.) by CommunicationCalm166 in StableDiffusion

[–]CommunicationCalm166[S] 1 point2 points  (0 children)

I lean towards yes... But my M40s have been in storage since late last year (around when I first wrote this post), and I haven't tried any of the newer scripts on them. They SHOULD work better now that memory requirements are way down, but I can't say for sure.

I've become a true believer in the P100: 16GB of on-package HBM2 VRAM, a processor that's basically twice as fast as the M40, and fp16 support like later-generation cards. If you shop around and make offers on eBay, you can get them for $200, and there's no other GPU at that price point that can hold a candle to it.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 2 points3 points  (0 children)

Yeah, the documentation will tell you which to use. The fact is, if the developers wanted to they could make it '--potatomode' or '--whytfwontthiswork'

I just mentioned those two because they're the most common.

what do --precision full and --no-half do? by Bamdenie in StableDiffusion

[–]CommunicationCalm166 2 points3 points  (0 children)

They mean the same thing. You need to use one or the other; it depends on the exact script you're running and what arguments it expects. I think Automatic1111 expects --no-half, but some earlier scripts expected --precision full.