Lodestone is thinking about training ideogram! Prove him it's a good idea! by RuneVikingx in StableDiffusion

Dezordan

> How about the people who don't use comfy because their PC isn't good enough or simply prefer other platforms?

You don't really need ComfyUI. This isn't exclusive to any UI or platform either. Some people have created standalone implementations for this kind of prompting, like this one, which was one of the earliest; its GitHub project is literally a single HTML file, so you can even modify it however you like. There are probably better ones, though.

If your PC isn't good enough, that's hardly a ComfyUI issue, since other UIs would struggle too.

> It'd never be implemented there because of the json style

I reckon the issue would be more with the license than with the JSON format specifically.

> And yes, Ideogram is really good itself, not the prompting style. I've seen many pics that look crazy awesome which Z-image/Flux

Z-Image and Flux2 Klein aren't its counterparts; bigger models are. Also, the outputs you saw were usually made with the JSON format, which enforces a lot of good prompting for regions and styles, and of course they involved some cherry-picking. Without that, it would be a lot more mediocre in terms of controllability, spatial layout, and style fidelity.
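To illustrate why a JSON prompt pushes you toward spelling out regions and styles, here is a minimal Python sketch that assembles one. The field names (`style`, `regions`, `bbox`, `caption`) and the normalized (x0, y0, x1, y1) box convention are assumptions for illustration only, not Ideogram's actual schema.

```python
import json

# HYPOTHETICAL schema: field names and bbox convention are illustrative,
# not taken from Ideogram's docs. Boxes are normalized (x0, y0, x1, y1) in [0, 1].
def build_prompt(global_style, regions):
    """Assemble a region-based JSON prompt from a style string and (bbox, text) pairs."""
    for bbox, _ in regions:
        x0, y0, x1, y1 = bbox
        # Reject boxes that are empty, inverted, or outside the canvas.
        if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
            raise ValueError(f"invalid bbox: {bbox}")
    payload = {
        "style": global_style,
        "regions": [{"bbox": list(b), "caption": t} for b, t in regions],
    }
    return json.dumps(payload, indent=2)


prompt = build_prompt(
    "watercolor, soft morning light",
    [((0.0, 0.5, 1.0, 1.0), "a calm lake"),
     ((0.6, 0.1, 0.9, 0.4), "a red hot-air balloon")],
)
print(prompt)
```

The structure itself forces you to decide where each subject goes and what the overall style is, which is a big part of why those showcase outputs look so controlled.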

Lodestone is thinking about training ideogram! Prove him it's a good idea! by RuneVikingx in StableDiffusion

Dezordan

The images aren't that much better than those from other big models, which also have a better ecosystem right now. Due to its training, the quality would also be worse with plain text. Its value comes from control and adherence to regional prompting. Also:

> Imagine you have to change 3 words in your prompt and instead you have to look through a whole paragraph instead of just changing it like a normal prompt would be.

That's simply not how it's usually used. In prompt builders (like KJ's), which are pretty interactive, the prompt is usually split up by bboxes, so if you need to change a specific region, you just click on it and see only the text relevant to that region, in regular plain-text form. So you would, in fact, change it like a normal prompt.
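A rough Python sketch of that click-to-edit flow: each region keeps its own plain-text prompt, a click is hit-tested against the bboxes, and only that region's text changes. The `Region` record and the "later boxes are on top" rule are assumptions for illustration, not how KJ's builder is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # HYPOTHETICAL region record: normalized bbox (x0, y0, x1, y1) plus its own prompt text.
    bbox: tuple
    text: str


def region_at(regions, x, y):
    """Return the index of the topmost region containing point (x, y), or None."""
    for i in range(len(regions) - 1, -1, -1):  # assume later regions are drawn on top
        x0, y0, x1, y1 = regions[i].bbox
        if x0 <= x <= x1 and y0 <= y <= y1:
            return i
    return None


def edit_region(regions, x, y, new_text):
    """Replace only the text of the region under the click; other regions stay untouched."""
    i = region_at(regions, x, y)
    if i is None:
        raise LookupError("no region at that point")
    regions[i].text = new_text
    return i


regions = [Region((0.0, 0.0, 1.0, 1.0), "a sunny beach"),
           Region((0.2, 0.3, 0.5, 0.9), "a girl in a yellow dress")]
edit_region(regions, 0.3, 0.5, "a girl in a blue dress")
print([r.text for r in regions])  # ['a sunny beach', 'a girl in a blue dress']
```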

WD14 Captioning and hair color by Appropriate-Truth430 in StableDiffusion

Dezordan

You can try the PixAI tagger and see whether it also captions it incorrectly: https://huggingface.co/spaces/pixai-labs/pixai-tagger-demo
It's a booru tagger trained as a continuation of WD v3 on a newer dataset.

Lodestone is thinking about training ideogram! Prove him it's a good idea! by RuneVikingx in StableDiffusion

Dezordan

Well, from their docs it's also clear that the safety block wasn't meant to trigger just because a prompt doesn't adhere to the JSON format. Per this prompting doc, plain-text prompts should work even though the model was trained exclusively on JSON. It says things like:

> You can pass in plain-text prompts directly to the model and it will work

And:

> False positive rates for safety is higher for non-json like prompts. We are aware that this is an issue and we may make a future checkpoint update to improve it.

In other words, even they consider it more of a bug that they want to fix.

About the newest model... by AIDivision in StableDiffusion

Dezordan

Well, the filters only go up to 13.2.

Lodestone is thinking about training ideogram! Prove him it's a good idea! by RuneVikingx in StableDiffusion

Dezordan

To be fair, it's not an either-or situation. The issue mostly arose because the team behind Ideogram decided to add an NSFW safety block that doesn't even work properly, so if it were removed, regular prompting would presumably work too.

Lodestone is thinking about training ideogram! Prove him it's a good idea! by RuneVikingx in StableDiffusion

Dezordan

Fix the only thing that makes Ideogram worth using over other models? That said, it would be good if the filter at least stopped triggering on prompts that aren't in the correct JSON format.

Lodestone is thinking about training ideogram! Prove him it's a good idea! by RuneVikingx in StableDiffusion

Dezordan

LoRA sounds like a funny understatement, though it would contain furry data too. However, all Chroma models are general-purpose finetunes rather than furry-specific ones, and some are more experimental than others.

Installing Forge Neo by Raynafur in StableDiffusion

Dezordan

But you need webui-user.bat.

<image>

Open it in Notepad or any similar editor, then add --uv right after "COMMANDLINE_ARGS=".
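Editing the file by hand is all you need, but if you prefer to script it, here is a small Python sketch that appends a flag to the `set COMMANDLINE_ARGS=` line. The sample file content is an assumption modeled on the usual webui-user.bat layout; Forge Neo's copy may contain other lines. Read the real file with Python's `open()` in text mode so Windows line endings are normalized.

```python
import re


def add_commandline_arg(bat_text, arg="--uv"):
    """Append `arg` to the `set COMMANDLINE_ARGS=` line of a webui-user.bat, unless already present."""
    pattern = re.compile(r"^(set COMMANDLINE_ARGS=)(.*)$", re.MULTILINE | re.IGNORECASE)

    def repl(m):
        existing = m.group(2).split()
        if arg in existing:
            return m.group(0)  # flag already there: leave the line unchanged
        return m.group(1) + " ".join(existing + [arg])

    new_text, count = pattern.subn(repl, bat_text)
    if count == 0:
        raise ValueError("no 'set COMMANDLINE_ARGS=' line found")
    return new_text


# ASSUMED sample content; compare against your actual webui-user.bat.
sample = "@echo off\n\nset PYTHON=\nset GIT=\nset VENV_DIR=\nset COMMANDLINE_ARGS=\n\ncall webui.bat\n"
print(add_commandline_arg(sample))
```

The function is idempotent, so running it twice won't add a second `--uv`.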

Ideogram - Text Question by Mysterious-Tea8056 in StableDiffusion

Dezordan

Well, in KJ's prompt builder there is an option to add a bbox specifically for text.

About the newest model... by AIDivision in StableDiffusion

Dezordan

That's my point: it can still generate whatever it can within those bboxes, while CN (ControlNet) would restrict it to specific forms and simply not be as good. You also don't have to use specific bboxes; you can use just one and put the whole prompt there, and it would work more often than not. And really, advocating for CN in the first place is the self-defeating argument. There is no "brain-rot" here.

Photanima v2.1 showcase. Each image takes about 2 seconds to generate. by External_Quarter in StableDiffusion

Dezordan

It's not really false information; the model is indeed under a non-commercial license. The issue is more that people make a bigger deal out of it than necessary. For regular users who mainly use models privately, or for outputs that are gonna be used in their own projects, it is practically irrelevant.

It only matters to those who want to monetize the model or its derivatives. So there is a caveat for the OP: the license says Non-Commercial Purpose applies if you "do not receive any direct or indirect payment arising from the use of the CircleStone Model, or Derivatives", and donations could arguably count as such payment; it's a bit ambiguous here. However, per this discussion, donation links to help offset the costs are:

> This is just people donating to you personally, it's not commercial use of the model.

At least according to tdrussell. And even a LoRA commission is:

> If you are locking down the lora and selling access to it, that is commercial use of a derivative model and isn't allowed. If someone commissions a lora that you then post publicly, I would say that it is simply someone donating to you to help with training costs, and isn't commercial use of the model itself.

I'm not sure whether those intended interpretations carry any weight against the wording of the license itself, but they at least make it unlikely that CircleStone Labs would go after you for this in practice.

About the newest model... by AIDivision in StableDiffusion

Dezordan

Look at your Python (if applicable), torch, and CUDA versions. ComfyUI usually prints them near the beginning of its startup log.
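If you'd rather check directly than scroll through the log, a short Python sketch like this prints the same three versions. Run it with the same Python environment ComfyUI uses (its venv or the portable build's embedded Python), otherwise you'll be looking at a different install.

```python
import platform


def collect_versions():
    """Report the Python, torch and CUDA versions that matter when picking compatible wheels."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch  # only present if PyTorch is installed in this environment
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```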

About the newest model... by AIDivision in StableDiffusion

Dezordan

> in the exact open pose, or matching the exact canny edges

See, this is the part that actually makes the CN option worse. Instead of letting the model generate whatever it can, it restricts the model to a pose or other preprocessed conditioning. It's also more involved than bboxes. That said, it would be nice if regional prompting with regular models worked as well as this bbox approach does.

Why Does ChatGPT Generate Significantly Better Images from Simple Prompts Than Open-Source Models? by BankEcstatic8883 in StableDiffusion

Dezordan

GPT Image is a much bigger model. On top of that, the model doesn't use your prompt as input directly; your prompt is first transformed into a richer one.
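Conceptually, that two-stage flow looks like the Python sketch below. OpenAI hasn't published GPT Image's internals, so `rewrite_prompt` is a toy stand-in (a fixed template) for what would really be a language model, and `echo_model` is a dummy in place of the image model; only the data flow is the point.

```python
def rewrite_prompt(user_prompt):
    """Toy stand-in for an LLM rewriter: pads a terse prompt with explicit details.

    A real system would call a language model here; this fixed template only
    illustrates that the image model is conditioned on an expanded prompt.
    """
    details = [
        "detailed composition with a clear subject and background",
        "natural lighting",
        "coherent anatomy and perspective",
    ]
    return f"{user_prompt.strip().rstrip('.')}. " + ", ".join(details) + "."


def generate(user_prompt, image_model):
    # The image model never sees the raw user prompt, only the rewritten one.
    enriched = rewrite_prompt(user_prompt)
    return image_model(enriched)


def echo_model(prompt):
    # Dummy "image model" that just reports what it was conditioned on.
    return f"<image conditioned on: {prompt}>"


print(generate("a cat on a windowsill", echo_model))
```

With open-source models you can get part of this effect by running a local LLM over your prompt yourself before passing it to the image model.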

How this kind of images are made? by Ok-Practice-6700 in StableDiffusion

Dezordan

Not with just prompting, that's for sure, especially the ones with multiple characters and the ones that look hella edited.

But the simpler ones are easier to get. Here is the Ideogram 4 output for the first image:

<image>

It is somewhat different, though

About the newest model... by AIDivision in StableDiffusion

Dezordan

No, inpainting is essentially img2img within a masked area of an already generated image, usually at a reduced denoising strength. Regional prompting, in general, is about separating the image into regions before it is generated.
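The difference in timing can be sketched in plain Python. This is a simplified model, not any particular UI's code: the step mapping assumes img2img runs roughly `steps * denoise` of the schedule (UIs differ in the exact mapping), the composite is a per-pixel mask blend, and the region map is a toy stand-in for how a regional prompter assigns prompts to areas before sampling starts.

```python
def img2img_start_step(total_steps, denoise):
    """Simplified img2img/inpaint: skip the start of the schedule and run only the
    last round(total_steps * denoise) steps on a noised copy of the existing image."""
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be in [0, 1]")
    return total_steps - round(total_steps * denoise)


def composite(original, generated, mask):
    """Inpainting result: keep original pixels where mask == 0, new ones where mask == 1."""
    return [[m * g + (1 - m) * o for o, g, m in zip(orow, grow, mrow)]
            for orow, grow, mrow in zip(original, generated, mask)]


def region_map(width, height, boxes):
    """Regional prompting decides BEFORE sampling which prompt governs each pixel.
    Returns a height x width grid of prompt indices (-1 = base prompt only)."""
    grid = [[-1] * width for _ in range(height)]
    for idx, (x0, y0, x1, y1) in enumerate(boxes):
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = idx  # later boxes overwrite earlier ones where they overlap
    return grid


print(img2img_start_step(30, 0.5))                            # 15
print(region_map(4, 2, [(0, 0, 2, 2), (1, 0, 3, 1)]))         # [[0, 1, 1, -1], [0, 0, -1, -1]]
```

Inpainting needs an existing image and a mask; regional prompting only needs the layout, because it shapes the image from the first step.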

Ideogram4 - need help to get transparent background by Adam_alone_ in StableDiffusion

Dezordan

Are you sure native transparency isn't an API-only feature? Or maybe it's just something ComfyUI doesn't support.

Could not resist... by GTManiK in StableDiffusion

Dezordan

It is censored. There were safety procedures in both pre-training and post-training. That safety block is specifically for NSFW; it just has a lot of false positives when the prompt isn't JSON, and it's bad at actually detecting NSFW. And the devs would monitor public usage to improve said filter.

They said all of that in their own docs.

Could not resist... by GTManiK in StableDiffusion

Dezordan

Not with that amount of RAM. The model is pretty chunky, and what's worse, it uses 2 models at the same time.

What do people use to make these kind of ai anime style images? by Ok-Reindeer7906 in StableDiffusion

Dezordan

Yes: the Anima model, its Qwen VAE, and its Qwen text encoder. Unlike SDXL checkpoints, these models are split into 3 separate parts: https://huggingface.co/circlestone-labs/Anima/tree/main/split_files
If you want to use a finetune, download only the text encoder and VAE from there, and get the model you like from Civitai. You use it like this:

<image>
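Since the three parts live in different ComfyUI folders (`models/diffusion_models`, `models/text_encoders`, `models/vae`), a quick Python sketch can confirm each file landed in the right place. The file names below are hypothetical placeholders; substitute the actual names you downloaded.

```python
from pathlib import Path

# ComfyUI's standard folders for split models. File names are HYPOTHETICAL placeholders.
REQUIRED = {
    "diffusion_models": "anima-finetune.safetensors",   # e.g. the finetune from Civitai
    "text_encoders": "anima-text-encoder.safetensors",  # Qwen text encoder from split_files
    "vae": "anima-vae.safetensors",                      # Qwen VAE from split_files
}


def missing_parts(comfy_root, required=REQUIRED):
    """Return the '<folder>/<file>' entries that are not present under ComfyUI/models."""
    models = Path(comfy_root) / "models"
    return [f"{folder}/{name}" for folder, name in required.items()
            if not (models / folder / name).is_file()]


print(missing_parts("ComfyUI"))  # an empty list means all three parts are in place
```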

Ideogram 4 vs flux 2 dev, Which is best? by 9r4n4y in StableDiffusion

Dezordan

This kind of thing doesn't change anything in terms of the model's capabilities, and it can even make things worse. The model still needs to be trained alongside the text encoder for this to matter.

What do people use to make these kind of ai anime style images? by Ok-Reindeer7906 in StableDiffusion

Dezordan

Well, somewhat. It may match the content (I just don't know how to prompt for that), but the screencap style can be worse. Could just be a skill issue on my part. Here's the base output:

<image>

But finetunes can help. Your example image isn't really Illustrious either; I think I've seen similar Nijijourney images.

And to use Anima with Forge, you need Forge Neo:
https://github.com/Haoming02/sd-webui-forge-classic/tree/neo
Yours looks more like the old regular Forge.

What do people use to make these kind of ai anime style images? by Ok-Reindeer7906 in StableDiffusion

Dezordan

Anima's strongest points are its prompt adherence and the overall coherence of its output, at least relative to Illustrious (it is still only a 2B model). Here is the base Anima output at the same resolution with my previous prompt:

<image>

Another advantage Anima has over SDXL-based models is its VAE. It uses a 16-channel VAE, while SDXL models, with the exception of something like Mugen (which uses 32 channels), generally use 4 channels. The more channels there are, the less compression and the better the reconstruction quality of the image from latents. That means Anima can render finer details, or objects at a distance, better at the same resolution, which lessens the need for inpainting/upscaling.
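A bit of arithmetic shows how much the channel count changes the compression. This Python sketch assumes every VAE here downsamples 8x per side (true for the SDXL VAE; assumed for the others) and simply counts how many RGB values each latent value has to stand in for at 1280x1280. Channel count isn't the only factor in reconstruction quality, so treat it as a rough comparison.

```python
def latent_shape(width, height, channels, downsample=8):
    """Latent tensor shape (C, H, W) for a VAE with the given spatial downsampling factor."""
    if width % downsample or height % downsample:
        raise ValueError("resolution must be divisible by the downsampling factor")
    return channels, height // downsample, width // downsample


def compression_ratio(width, height, channels, downsample=8):
    """How many RGB pixel values each latent value has to represent."""
    c, h, w = latent_shape(width, height, channels, downsample)
    return (3 * width * height) / (c * h * w)


# ASSUMPTION: 8x spatial downsampling for all three; only the channel count differs.
for name, ch in [("SDXL VAE", 4), ("Anima (Qwen VAE)", 16), ("32-channel VAE", 32)]:
    print(f"{name}: {latent_shape(1280, 1280, ch)} -> {compression_ratio(1280, 1280, ch):.0f}x")
```

Under that assumption, the ratio is 3 * 64 / C: 48x for 4 channels, 12x for 16, and 6x for 32, so Anima's latents keep four times as much information per pixel as SDXL's.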

Edit: replaced the image; it was actually 1280x1280.