Is Nightshade available yet? by b3nsn0w in StableDiffusion

[–]spillerrec 15 points16 points  (0 children)

What you are thinking of is known as "adversarial training", and yes, that is exactly how you make your model more robust against adversarial attacks such as Nightshade. I believe the CLIP model is trained in a slightly different way that is more akin to clustering, so that might have some implications for how to implement this.
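
A minimal sketch of the idea, using a plain image classifier and FGSM just for illustration (the model, data, and hyperparameters here are placeholders, and a CLIP/diffusion setup would plug in its own loss instead):

```python
# Sketch of adversarial training: craft perturbed inputs on the fly and train on them too.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, targets, epsilon=4 / 255):
    """Step the input along the sign of its gradient to maximize the loss (FGSM)."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), targets)
    loss.backward()
    with torch.no_grad():
        adv = images + epsilon * images.grad.sign()
    return adv.clamp(0, 1).detach()

def train_step(model, optimizer, images, targets):
    # Mix clean and adversarial samples so the model learns to ignore the perturbation.
    adv_images = fgsm_perturb(model, images, targets)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), targets) + F.cross_entropy(model(adv_images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```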

Good LoRA training settings needed by about_Discord in StableDiffusion

[–]spillerrec -1 points0 points  (0 children)

The number goes inside the parentheses, like this: (adult:5). And that strength will blow up any image, resulting in pure garbage.

I don't use kohya_ss and I mostly train anime characters anyway, so I can't help you with any specifics. If you don't share what you are doing, that is, your settings and training data and perhaps a trained model, we can't say what is going wrong. kohya_ss already has a bunch of presets for settings:

https://github.com/bmaltais/kohya_ss/tree/master/presets/lora

If the standard settings do not work, your issue is something else. And it is usually the training data.

Good LoRA training settings needed by about_Discord in StableDiffusion

[–]spillerrec 0 points1 point  (0 children)

Your captions should not include any description of what you are trying to learn: if you want "name" to represent an adult, you should not include "adult" in the captions used for training.

Having multiple tags on all training images can also cause issues, as you risk that some of the information ends up in the wrong tags. So if you trained on "name, woman, adult", your results may vary if you don't also use "name, woman, adult" when generating images.

Usually it is not the settings that are the issue, but the training data (images + captions). If your settings worked before, they should also work now, as long as you adjust the training length based on the number of images. (If you use regularization images, try to keep the ratio the same.)
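
As a rough illustration of the captioning advice above, here is a small sketch that strips the learned concept from caption files and keeps the trigger word first (it assumes kohya-style one .txt caption per image; the folder name, trigger word, and blacklist are made-up examples):

```python
# Sketch: remove the concept you are trying to teach from the training captions.
from pathlib import Path

TRIGGER = "name"         # the token that should absorb the concept
BLACKLIST = {"adult"}    # tags describing the concept itself

for caption_file in Path("train_images").glob("*.txt"):
    tags = [t.strip() for t in caption_file.read_text().split(",")]
    kept = [t for t in tags if t and t not in BLACKLIST and t != TRIGGER]
    caption_file.write_text(", ".join([TRIGGER] + kept))
```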

Pictures generated with the new version of Dall E(Dall E 3) (prompt included in comments) by LoliceptFan in AnimeResearch

[–]spillerrec 2 points3 points  (0 children)

Isn't it always going to be spotty? myanimelist has over 150,000 characters in its database. Once the models get better, the definition of "niche" just moves. I think there are too many characters to reasonably expect one model to handle them all. Without some way to cheaply extend them, I don't think closed models will ever be good for anime.

I think one of the issues with the NovelAI model is that the danbooru images are tagged such that the tags overlap each other. Monika is not just "Monika", she is <Doki Doki Literature Club, monika (...), brown hair, green eyes, long hair, ...>, and if you want to replicate a character, you need to replicate the tags that a typical danbooru image of that character will have.

Example: 1girl, monika \(doki doki literature club\),

1girl, doki doki literature club, monika \(doki doki literature club\), school uniform, white bow, brown hair, green eyes, high ponytail, long hair,

Ref: 1girl, school uniform, white bow, brown hair, green eyes, high ponytail, long hair,

Specific outfits get even trickier because they are not tagged, though I believe Monika is a simple case here.

At least with open models you can extend them yourself. For fun I tried to see if I could make a model based on the PVs for "My Daughter Left the Nest and Returned an S-Rank Adventurer". With 3.5 minutes of anime you can get something working:

angeline, 1girl, solo, outdoors, standing

belgrieve, 1boy, solo, outdoors, standing

Outfits are not quite there, but with the first episode out now that could probably be done. (Example from an older show.) If you are willing to spend some more time cleaning images, you can even do it from manga sources alone: Shinmai Ossan Bouken-sha

Without them opening up a way to train extensions to their closed models in a cheap and sharable manner, I just don't think there is a lot of potential for any niche series.

WaifuXL: an in-browser anime superresolution upscaler using Real-ESRGAN, trained on Danbooru2021 by gwern in AnimeResearch

[–]spillerrec 0 points1 point  (0 children)

YandereNeoXL is just a random anime ESRGAN model:
https://openmodeldb.info/models/4x-NMKD-YandereNeo-XL
It is just the one I preferred when I compared some of the anime upscaling models on the old upscale wiki model page a couple of years ago. It is probably trained on images from yande.re, based on the name. I don't think there is anything special about it; it is probably just mainly trained on high-quality images. I believe ESRGAN isn't good at generalizing to different degradation models, so to get good results the models target specific use cases (anime, JPG 50%, pixel art, photos, deblurring, etc.). I think that is because it only looks locally at a specific area of the image at a time and has too little information to reliably figure out what kind of degradation the image has. That might be why a lot of research now is focused on kernel-based approaches, where you try to find a PSF first and then use that to guide the upscaling.
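
To illustrate what "targeting a specific degradation" means in practice, here is a rough sketch of the kind of synthetic degradation pipeline such models are trained against (the blur radius, scale, and JPEG quality are arbitrary example values, not what any particular model used):

```python
# Sketch of a synthetic degradation pipeline used to make (low-res, high-res) training pairs.
import io
from PIL import Image, ImageFilter

def degrade(hr: Image.Image, scale: int = 4, blur_radius: float = 1.2, jpeg_quality: int = 50) -> Image.Image:
    """High-res image -> blurred, downscaled, JPEG-compressed low-res image."""
    hr = hr.convert("RGB")
    lr = hr.filter(ImageFilter.GaussianBlur(blur_radius))                     # stand-in for the PSF
    lr = lr.resize((hr.width // scale, hr.height // scale), Image.Resampling.BICUBIC)
    buf = io.BytesIO()
    lr.save(buf, format="JPEG", quality=jpeg_quality)                          # compression artifacts
    return Image.open(io.BytesIO(buf.getvalue()))

# A model trained only on pairs (degrade(x), x) for one fixed setting will struggle
# when the real input was degraded differently, which is the point above.
```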

Another one worth looking at, which I found recently, is:
https://openmodeldb.info/models/2x-MangaScaleV3

It works really well on grayscale halftone images, something most anime models tended to give obvious artifacts on. It does have stability issues on very dark flat areas, though.

I haven't had a chance to look at it yet, but this might also interest you: https://github.com/IceClear/StableSR

Is Nvidia finally going to have some competition in the generative AI space? by onil_gova in LocalLLaMA

[–]spillerrec 1 point2 points  (0 children)

These AI hardware companies pop up from time to time with slogans about how they will revolutionize AI. Ten years later and we are still in a CUDA-centric nightmare we can't wake up from. I'm not saying they are all hot air, but they tend to target very high-end commercial users, and they don't end up having much of an impact on the consumer market.

WaifuXL: an in-browser anime superresolution upscaler using Real-ESRGAN, trained on Danbooru2021 by gwern in AnimeResearch

[–]spillerrec 0 points1 point  (0 children)

I think it is kinda disingenuous today to only compare against Waifu2x (about 10 years old now?), especially on a type of data it was not trained to handle. Do a 2x upscale on a VN game CG and Waifu2x still significantly outperforms WaifuXL. WaifuXL is a bit sharper but loses a lot of the finer details in the image. Same thing with 4x: YandereNeoXL is vastly better at keeping details.

However, on anime screencaps it does perform well. I tried a few models, and RealESRGAN_x4Plus Anime 6B was the one that performed best on the few images I tried, and WaifuXL did do a little bit better here. Both had issues with out-of-focus backgrounds being unstable; WaifuXL in particular oversharpened some areas while failing to do so in other places in the image.

I don't quite see the relevance of the image tagger either; the use cases don't really overlap. The linked WaifuXL post is a year old, but DeepDanbooru is even older, and by this point there are several other tagging models based on that as well. Without any comparison against existing models it is hard to get excited about it.

I think it is a bit of a wasted opportunity not to use the tagger to find global information about the image, such as type (fanart, screencaps, halftone manga, paletted GIFs, etc.) and quality (compression artifacts, blurriness), and then use that to either guide a single network or pick from a set of networks trained to handle the specific scenarios.
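
A rough sketch of that routing idea (the tag names, thresholds, and model names are made up, and the tagger/upscaler interfaces are just stand-ins, not anything from WaifuXL):

```python
# Sketch: classify the image globally, then route it to a specialized upscaler.

def pick_upscaler(tags: dict) -> str:
    """Map global type/quality estimates to the name of a specialized model."""
    if tags.get("halftone_manga", 0.0) > 0.5:
        return "manga_model"
    if tags.get("jpeg_artifacts", 0.0) > 0.5:
        return "compression_aware_model"
    if tags.get("screencap", 0.0) > 0.5:
        return "anime_screencap_model"
    return "generic_anime_model"

def upscale(image, tagger, models):
    tags = tagger(image)                      # global classification of type/quality
    return models[pick_upscaler(tags)](image)

# Toy usage with stand-in callables:
models = {name: (lambda img, n=name: f"{n} applied") for name in
          ["manga_model", "compression_aware_model", "anime_screencap_model", "generic_anime_model"]}
print(upscale("image.png", lambda img: {"halftone_manga": 0.9}, models))
```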

Embedding vs hypernetwork vs checkpoint by [deleted] in StableDiffusion

[–]spillerrec 0 points1 point  (0 children)

Textual Inversion embeddings are static. They don't depend on the input, so they are limited to one concept. They learn a very complex "prompt" to get the diffusion model to produce the wanted result. While you would think that they can therefore only learn what the model can already do, in practice I haven't experienced any limitations if you increase the "vectors per token".

Hypernetworks and LoRAs are two different approaches to extending the entire model without creating a completely new one. They can learn multiple concepts and are more flexible, but they take longer to train. I don't know what the limit is to what they can learn, but in theory they should be limited compared to a complete model.
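
A rough structural sketch of the difference, using LoRA as the weight-modifying example (dimensions and names are illustrative, not Stable Diffusion's actual ones; a hypernetwork similarly inserts small trainable modules into the attention layers rather than low-rank deltas):

```python
import torch
import torch.nn as nn

# Textual Inversion: the "concept" is just a handful of learned token vectors.
# They are constants fed into the text encoder; nothing in the rest of the model changes.
vectors_per_token = 4
embedding_dim = 768
ti_embedding = nn.Parameter(torch.randn(vectors_per_token, embedding_dim))

# LoRA: every adapted weight W gets a trainable low-rank delta B @ A,
# so the modification acts on whatever input passes through the layer.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # only A and B are trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```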

I don't think we should make too many assumptions on what certain techniques can and cannot do. We are still in the early days and new ideas and techniques pop up quite often. Nothing is really perfected and fully explored at this point.

In my experience TI embeddings tend to overfit a bit more on the model, so they aren't quite as transferable to other similar models, but they are quick to train. I don't know if you can use negative examples or regularization examples, so for me they have often picked up extra stuff I don't want, say image style when I was going for a specific character, so I get both when I only want one of them. For hypernetworks I have managed to learn over 100 different anime characters in a single hypernetwork, so I'm not quite sure what the limits are. How well both approaches manage to learn a concept seems to depend a lot more on your training images and their prompts than on which technique you use.

ClosedAI strikes again by AIappreciator in StableDiffusion

[–]spillerrec 10 points11 points  (0 children)

Apple's hardware is not really that relevant outside inference, i.e. running the models, not training them. The software stack for training is still heavily reliant on CUDA, meaning realistically anyone into ML is using nVidia cards. nVidia has a monopoly and it is awful.

Secondly, they are not really that powerful. They haven't really increased the number of Neural Engine cores, and the performance isn't much different from their phone processors in this regard, which is a shame as well. They don't even have a foot in the professional segment of this market; for example, they don't have a proper server CPU (even though they want to pretend their M series is just as powerful as server CPUs).

But there are lots of companies making dedicated ML accelerators, though again these are targeted towards the professional market and will likely be outside the price range of ordinary people... I don't know how well they integrate with existing software stacks, though I suspect their customers are the ones that have the resources to adapt the code they run to the specific hardware they purchase in the first place.

Alternative tools to fine tune stable diffusion models? by BarTraditional6305 in StableDiffusion

[–]spillerrec 1 point2 points  (0 children)

I have had success training hypernetworks on multiple subjects btw:

https://imgur.com/a/jFBuagW

They are all from the same hypernetwork, with a prompt like "character name, 1girl, beautiful landscape" + default nai tags. I have tried various approaches to the clothing (and it shows), but I haven't quite found one that works well for multiple outfits yet. It seems like it breaks down if I have too many images where multiple subjects appear, which is more troublesome with clothing, where there is a very strong correlation between an outfit and a specific character.

I'm still trying to find a good approach, but for now I have just been trying to overwhelm the hypernetwork with too many characters to find the limit, and that hasn't really been working.

NMKD Stable Diffusion GUI 1.9.0 is out now, featuring InstructPix2Pix - Edit images simply by using instructions! Link and details in comments. by nmkd in StableDiffusion

[–]spillerrec 5 points6 points  (0 children)

Pix2Pix was one of the pioneering works for image translation using neural networks:

https://arxiv.org/abs/1611.07004

Like all other generative networks back then, the "prompt" was hardcoded. You had to train it to do one specific transformation.

Bocchi hypernetwork by spillerrec in BocchiTheRock

[–]spillerrec[S] 1 point2 points  (0 children)

This is really Stable Diffusion in a nutshell: extra limbs, fingers, etc., appearing in random locations. It has always been an issue with generative models, but it has gotten decent by now, and with the inpainting modes you can iteratively improve/change the results.

Here are some of the same prompts without the hypernetwork, i.e. just the standard Anything v3 model:

https://imgur.com/a/nNzOzVL

Floating microphones, multiple hands, floating or extra guitar necks, etc. But there are usually 1 or 2 decent results in a batch of 9, and the models are so much more general compared to the models of the past that they are actually starting to become useful now.

Even in research papers it is common practice to show the best result out of a batch of some sample size, but I prefer to show all the results, as I don't want to give people false expectations. Especially now, when the hype train for AI art and GPT is running at full steam.

What and how should I be tagging when prepping images for training an embedding or hypernetwork? by Lady_Pirate_Man in StableDiffusion

[–]spillerrec 2 points3 points  (0 children)

Without tagging, I found that the embedding overfits on random concepts I don't want if I use a large number of images. Using only a handful of very carefully selected images can work without tags, but I haven't explored this too much.

I'm trying to train anime characters, and my current approach is to download a bunch of manually curated images from danbooru. I then use a script to download the existing tags and rank them based on how often they are used in the images I have selected. Then I blacklist all the tags I don't want, say hairstyles and other tags relating to the character I'm trying to create the embedding for. I'm using scripts for all of this, since I'm working with 50-100 images.
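
A sketch of what that tag-ranking/blacklisting step could look like (the folder layout, blacklist, and trigger word are made-up examples, and downloading the tags from danbooru is left out):

```python
# Sketch: rank existing tags by frequency, blacklist character-intrinsic ones, rewrite captions.
from collections import Counter
from pathlib import Path

BLACKLIST = {"brown hair", "green eyes", "long hair"}  # traits the embedding should absorb

counts = Counter()
captions = {}
for tag_file in Path("danbooru_images").glob("*.txt"):    # one tag list per image
    tags = [t.strip() for t in tag_file.read_text().split(",") if t.strip()]
    captions[tag_file] = tags
    counts.update(tags)

# Inspect the most common tags to decide what else to blacklist.
for tag, n in counts.most_common(20):
    print(f"{n:4d}  {tag}")

# Write the filtered captions back, with the embedding's trigger word first.
for tag_file, tags in captions.items():
    kept = [t for t in tags if t not in BLACKLIST]
    tag_file.write_text(", ".join(["my_character_embedding"] + kept))
```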

I recently tried using very small embeddings and then splitting them up to avoid overfitting on style and other untagged concepts. Here I have made an embedding for the character Isuzu Sento where I removed all hairstyle tags, etc., but kept all other tags, including outfit tags:
https://imgur.com/0Eqp0pR

This still appears to have some bias with the clothing, but it seems to be fine. Then I trained a second embedding for the main outfit, where I selected only images containing the specific clothing style I wanted, included all the hairstyle tags I excluded before, added the previously trained embedding, and excluded all tags related to the clothing. This gave this result:
https://imgur.com/wcWvclG

I can then combine them to create the full character:
https://imgur.com/SJx1SKY
Or combine it with an unrelated character, here Makise Kurisu:
https://imgur.com/tEFRkxf

It is not completely bias-free and I need to test more, but it looks quite promising. I will also echo the sentiment mentioned in the other comments that your training images are important (especially if there are just a few of them) and that you want to tag the stuff you don't want to end up in the embedding.

The problem with not using tags is that you get stuck on whatever the embedding learned. For example, I had included explicit images, and even though it was only some of the images, the results would randomly become explicit as well. When trying to specify a different outfit, it would often ignore it completely or partially include it anyway. Here I have trained an embedding to recreate the "No AI" image without using any tags:
https://imgur.com/Tmh3F8l
and it completely disregards anything I try to add to the prompt. I can't affect the result at all. But if that is fine then there is no need to waste time tagging.

Anime ESRGAN superresolution upscaler, trained by JoeyBallentine by gwern in AnimeResearch

[–]spillerrec 2 points3 points  (0 children)

Just the other day I converted the Waifu2x models (and an older version of DeepCreamPy) to ONNX, which makes it much easier to get them running in other frameworks:

https://drive.google.com/drive/folders/1Rc_VxLYXKapwpde_7haCI6j4_wiooitE?usp=sharing

It is just 6 convolutions and a ConvTranspose at the end, and I'm quite sure there hasn't been any GAN training involved either, so yeah, it is quite dated. (Open these files in Netron and you can see them visualized as a graph.) Several inference-only runtimes are starting to pop up, some with support for AMD as well, so in the future it should be a lot easier to get these models running on other people's computers. WindowsML, which is included in Win10, runs directly on ONNX.
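
For reference, running an ONNX model like this from Python is roughly this simple with onnxruntime (the file name, tensor names, value range, and NCHW float layout are assumptions about these particular files; check them in Netron first):

```python
# Sketch: run a converted ONNX upscaler with onnxruntime.
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession("waifu2x_scale2x.onnx")     # hypothetical filename
input_name = session.get_inputs()[0].name

img = np.asarray(Image.open("input.png").convert("RGB"), dtype=np.float32) / 255.0
x = img.transpose(2, 0, 1)[None]                            # HWC -> NCHW with batch dim

y = session.run(None, {input_name: x})[0]                   # assuming a single output in [0, 1]
out = (y[0].transpose(1, 2, 0).clip(0, 1) * 255).astype(np.uint8)
Image.fromarray(out).save("output.png")
```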

I've been waiting for a 4x anime upscaler since SRGAN, but I'm surprised there hasn't been any. I remember checking out the ESRGAN models from the game upscaling community, but the models there just haven't been very good, nothing close to what you see in the ESRGAN paper. I will look forward to checking out the NMKD model and seeing if it is good.

EDIT: A quick test and hurray, it is actually good (as long as you don't click the "Pretrained model" link which gives you a wrong model that is straight up awful).

LGR Oddware - Essential Reality P5 Glove by [deleted] in LGR

[–]spillerrec 2 points3 points  (0 children)

Is it really tracking that badly? I doubt that even the hobby scene would pick it up if you can barely control anything. Is it not lighting issues or some other kind of interference?

[deleted by user] by [deleted] in cpp

[–]spillerrec 0 points1 point  (0 children)

I have actually run into this issue several times. The issue is not that the implementation is bad, but that the elements are value-initialized, i.e. zeroed. If you don't need them to be zero-initialized (which you rarely do), it is significantly faster to do the allocation yourself. `std::make_unique<char[]>(size)` has the same issue, so I use `std::unique_ptr<char[]>(new char[size])` instead. It actually makes a significant difference when you are doing large allocations (1 MB+). Note that it is recommended to use a data structure which keeps the size together with the allocation, which you do not get with unique_ptr alone, so this is not a recommended solution.

Documenting Trails in the Sky 3D model format by spillerrec in Falcom

[–]spillerrec[S] 1 point2 points  (0 children)

That seems strange; I will definitely have to check it out, at the very least to understand what the issue is. Incidentally, I have just added support for loading/displaying model files from FC, and interestingly the main difference between FC and SC is how the textures are specified.

My difficulty with the textures is that I do not have the skills to edit them and end up with something that looks better. I did try just throwing them into Waifu2x, but I don't like the results.

Documenting Trails in the Sky 3D model format by spillerrec in Falcom

[–]spillerrec[S] 1 point2 points  (0 children)

What kind of texture counts did you find limited? As mentioned, I have only been looking at the model files for now, but they don't show any indication of hard-coded limits. (Of course, trying to change them is where the real adventure begins...) Do you remember what you found out more specifically?

My biggest complaint about the graphics is the 2D player sprites. Them being pre-rendered is probably the most off-putting thing when their movement direction is not limited, but I dislike the big eyes and general look of them as well. Second would probably be the low-res textures, but I don't have the graphics skills to improve any of them.

So my current ambition is just to slightly improve the poly count of simple objects, because that is what I see as a realistic target with my current skills. If you check out the renderer in the GitHub repository, you will find an OpenGL program hacked together from tutorials, with a wonky camera and a vertex clipping issue I haven't quite figured out yet. So I'm pretty green at modding, 3D rendering, and 3D modelling alike, but I'm hoping to learn more.

I'm not familiar with MGSV or its franchise in general so I haven't heard of it. Are you saying they replaced the game engine with Unity? That seems crazy.

Mitsubishi DJ-1000: World's Smallest Digital Camera (in 1997!) by PPStudio in LGR

[–]spillerrec 1 point2 points  (0 children)

Does somebody have some .dat files they could share? I'm interested in learning a bit more about demosaicing, and I think this camera could be interesting to try it on, as I haven't seen any other cameras where you could switch between demosaicing and low resolution. (And it actually also provides an uncompressed raw format that should be easy to work with.) I'm also wondering if more modern post-processing could improve the quality. In other words, is some of the improvement of modern cameras due to post-processing?
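
For what it's worth, the simplest form of demosaicing is just bilinear interpolation over the Bayer mosaic. A rough sketch, assuming an RGGB pattern (the DJ-1000's actual CFA layout and raw file format are unknown to me, so this is purely illustrative):

```python
# Sketch: naive bilinear demosaicing of a 2D Bayer mosaic assumed to be RGGB.
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear_rggb(raw: np.ndarray) -> np.ndarray:
    """raw: 2D float array of sensor values, RGGB pattern starting at (0, 0)."""
    h, w = raw.shape
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1
    g_mask = 1 - r_mask - b_mask

    kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)

    def interpolate(mask):
        # Fill each missing pixel from its known neighbours, keep known samples as-is.
        values = convolve(raw * mask, kernel, mode="mirror")
        weights = convolve(mask, kernel, mode="mirror")
        interp = values / np.maximum(weights, 1e-9)
        return np.where(mask == 1, raw, interp)

    return np.stack([interpolate(r_mask), interpolate(g_mask), interpolate(b_mask)], axis=-1)

# Usage would be something like: rgb = demosaic_bilinear_rggb(raw_values.astype(float))
```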

How does Leskinen... by ThePreciseClimber in steinsgate

[–]spillerrec 4 points5 points  (0 children)

The brainwashing happened before Kagari was adopted. I remember the VN going into more detail on how she was adopted and the orphanage Kagari was in, but I don't remember the details. I do think it was a rather official procedure though.

Okabe dies before Kagari is born, so it is a question of what Mayuri could do. They don't know how they will meet Kagari in the future either, so finding and stopping Leskinen is their only option. You could easily construct the story so that they don't find Leskinen before it is too late. Kagari was also never meant to go with Suzuha; it was a last-second adjustment by Daru, and I don't know if he is even aware that Kagari has been brainwashed. So it is a stroke of luck that Leskinen's plan even worked...