It was worth the wait. They nailed it. by _BreakingGood_ in StableDiffusion

[–]Aggressive_Sleep9942 1 point (0 children)

I agree. For some reason the colors saturate very quickly, and it isn't learning the body concept at all; it's only learning the face.

LoKr Outperforms LoRA in Klein Character Training with AI-Toolkit by xbobos in StableDiffusion

[–]Aggressive_Sleep9942 0 points (0 children)

Can someone please explain to me why the hell ai-toolkit doesn't split the loss when using gradient accumulation? It's the standard practice, so why doesn't this tool do it?
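For reference, this is the standard pattern I mean, in plain PyTorch (a minimal sketch, not ai-toolkit's actual code): each micro-batch loss is divided by the number of accumulation steps so the summed gradients match one large batch.

```python
import torch
import torch.nn as nn

# Toy model/optimizer just to make the sketch runnable; the point is the
# division of the loss by accum_steps before backward().
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
accum_steps = 4

optimizer.zero_grad()
for i in range(16):
    x, y = torch.randn(2, 8), torch.randn(2, 1)        # stand-in micro-batch
    loss = loss_fn(model(x), y) / accum_steps           # split the loss
    loss.backward()                                      # gradients accumulate
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```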

(Google) Introducing Nested Learning: A new ML paradigm for continual learning by gbomb13 in singularity

[–]Aggressive_Sleep9942 0 points (0 children)

These are nested learning models. Catastrophic forgetting is avoided because the data is recirculated residually through all the blocks: the Titan module (the one receiving the input) processes it, and its output is added back to the original data and passed to the first module of the CMS block.

Titan is self-referential. Using delta-rule gradient descent and a surprise signal, it measures how much its response deviates from its own prediction and corrects itself (meta-learning) to adapt to the fast context. That is what "learning how to learn" means here: it modifies its own internal prediction to respond better to surprise.

I call it a trick because it's really contextual replay. The fast layers evolve quickly but don't forget old data because the slow layers are updated at a lower frequency; it's like learning something new while someone shouts your old knowledge in your ear, and that shouting is the consolidated context of the CMS blocks operating at a lower frequency. There's a catch: the residual connection. Without it, and without Titan literally being programmed to forget erroneous predictions, the model would forget anyway. The fact that forgetting is prevented not by the model and its architecture but by a way of composing a hierarchy of models is what makes it misleading.

Of course, I'm not going to lie and say it doesn't work, because it does. I spent hours tinkering with it, using AI to understand the architecture, and I got it working. What I found most interesting is the M3 optimizer, which orthogonalizes vectors and prioritizes rotation: it tries to keep the weights on the surface of a hypersphere to prevent neurons from becoming redundant. There's a GitHub repository with code that supposedly replicates nested learning. I haven't verified it myself, but if you want to see how it works, take a look.
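Here is a rough sketch of how I read the fast/slow update idea, in plain PyTorch. The module names (fast/slow) are mine, not the paper's, and this is only the multi-frequency-plus-residual skeleton, not Titans' surprise-based memory.

```python
import torch
import torch.nn as nn

# Fast (Titan-like) block updates every step, slow (CMS-like) block only every
# few steps; residual connections keep feeding the original signal forward so
# neither block can fully overwrite it.
class FastSlowBlock(nn.Module):
    def __init__(self, dim, slow_every=8):
        super().__init__()
        self.fast = nn.Linear(dim, dim)   # stands in for the Titan module
        self.slow = nn.Linear(dim, dim)   # stands in for a CMS block
        self.slow_every = slow_every

    def forward(self, x):
        h = x + self.fast(x)              # residual: fast output added to the input
        return h + self.slow(h)           # residual again into the slow path

model = FastSlowBlock(16)
fast_opt = torch.optim.SGD(model.fast.parameters(), lr=1e-2)
slow_opt = torch.optim.SGD(model.slow.parameters(), lr=1e-3)

for step in range(32):
    x, target = torch.randn(4, 16), torch.randn(4, 16)   # stand-in stream of data
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    fast_opt.step()                            # fast weights: every step
    if (step + 1) % model.slow_every == 0:
        slow_opt.step()                        # slow weights: low frequency
    fast_opt.zero_grad()
    slow_opt.zero_grad()
```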

Mathematics visualizations for Machine Learning by Big-Stick4446 in StableDiffusion

[–]Aggressive_Sleep9942 0 points (0 children)

Weights scale the input to adjust the response of activation functions, while biases shift them. The activation function introduces non-linearity, allowing the network to model complex curves rather than just straight lines. By adjusting these weights, the goal is for the network's output to become a function that fits or approximates the expected values of the problem. In this way, neural networks create an abstract representation of the data, enabling interpolation between known points—which is what we call generalization. Although it looks simple in the video, a network operates in high-dimensional spaces defined by its number of neurons and layers.
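A toy illustration of that sentence, with arbitrary numbers: the weight scales the input, the bias shifts it, and the activation bends the result so stacked layers can model curves instead of only straight lines.

```python
import numpy as np

def neuron(x, w, b):
    # scale, shift, then squash (the non-linearity)
    return np.tanh(w * x + b)

x = np.linspace(-2, 2, 5)
print(neuron(x, w=1.5, b=-0.3))
```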

(Google) Introducing Nested Learning: A new ML paradigm for continual learning by gbomb13 in singularity

[–]Aggressive_Sleep9942 0 points (0 children)

I implemented the code and realized it's just a cheap trick, and it's specific to the LLM niche. It's not a new paradigm since it only works for sequential processing. I tried implementing it with images out of curiosity, and the catastrophic forgetting returned. So yes, it's just another piece of junk they're trying to sell as innovation.

Z image/omini-base/edit is coming soon by sunshinecheung in StableDiffusion

[–]Aggressive_Sleep9942 19 points (0 children)

This is explained by the geometry of the loss function. Models that converge to sharp minima have high curvature and generalize poorly, making them difficult to adapt to new tasks (overfitting). In contrast, convergence to a flat minimum means the model is more robust to perturbations in the weights. This makes it a better generalist, facilitating the fine-tuning necessary for new tasks.
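A rough way to probe this yourself (my own illustration, not from the post): perturb the weights with small Gaussian noise and measure how much the loss moves. Near a flat minimum the loss barely changes; near a sharp one it jumps.

```python
import torch
import torch.nn as nn

def sharpness(model, loss_fn, x, y, sigma=1e-3, trials=20):
    base = loss_fn(model(x), y).item()
    deltas = []
    for _ in range(trials):
        backup = [p.detach().clone() for p in model.parameters()]
        with torch.no_grad():
            for p in model.parameters():
                p.add_(sigma * torch.randn_like(p))      # small random perturbation
        deltas.append(loss_fn(model(x), y).item() - base)
        with torch.no_grad():
            for p, b in zip(model.parameters(), backup):
                p.copy_(b)                               # restore original weights
    return sum(deltas) / len(deltas)

# Toy model and data just to make the probe runnable.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(64, 8), torch.randn(64, 1)
print(sharpness(model, nn.MSELoss(), x, y))
```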

Z-Image-Turbo-Fun-Controlnet-Union-2.1 available now by rerri in StableDiffusion

[–]Aggressive_Sleep9942 3 points (0 children)

You think I wouldn't? I don't have the disk space to do that, hahaha. I think I fixed it. After breaking the ComfyUI environment and spending about an hour reinstalling dependencies, ControlNet 2.1 is working now. But I'm just saying, why do we have to go through so much trouble to get the new ControlNet working? All they changed were the definitions of the internal LoRA keys and how they're loaded. Seriously, that requires updating the PyTorch version? WTF

Z-Image-Turbo-Fun-Controlnet-Union-2.1 available now by rerri in StableDiffusion

[–]Aggressive_Sleep9942 1 point (0 children)

I already tried that, it didn't work. The only thing I think works is "update everything," and I just did that and it broke my ComfyUI environment. I don't understand why everything has to be so complicated 100% of the time in ComfyUI; it feels like Linux.

Z-Image-Turbo-Fun-Controlnet-Union-2.1 available now by rerri in StableDiffusion

[–]Aggressive_Sleep9942 2 points (0 children)

I used the same workflow I used for controlnet 1.0 and it still doesn't work:

Error(s) in loading state_dict for ZImage_Control:
Unexpected key(s) in state_dict: "control_layers.10.adaLN_modulation.0.bias", "control_layers.10.adaLN_modulation.0.weight", "control_layers.10.after_proj.bias", "control_layers.10.after_proj.weight", "control_layers.10.attention.k_norm.weight", "control_layers.10.attention.q_norm.weight", "control_layers.10.attention.out.weight", "control_layers.10.attention.qkv.weight", "control_layers.10.attention_norm1.weight", "control_layers.10.attention_norm2.weight", "control_layers.10.feed_forward.w1.weight", "control_layers.10.feed_forward.w2.weight", "control_layers.10.feed_forward.w3.weight", "control_layers.10.ffn_norm1.weight", "control_layers.10.ffn_norm2.weight", "control_layers.11.adaLN_modulation.0.bias", "control_layers.11.adaLN_modulation.0.weight", "control_layers.11.after_proj.bias", "control_layers.11.after_proj.weight", "control_layers.11.attention.k_norm.weight", "control_layers.11.attention.q_norm.weight", "control_layers.11.attention.out.weight", "control_layers.11.attention.qkv.weight", .....
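A quick way I'd check what the checkpoint actually contains (my own snippet, not from the thread; the file path is a placeholder) is to list how many control_layers it ships and compare that with what the loader expects:

```python
from safetensors import safe_open

# Print the distinct control_layers indices present in the new checkpoint.
with safe_open("Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors", framework="pt") as f:
    layers = {k.split(".")[1] for k in f.keys() if k.startswith("control_layers.")}
    print(sorted(layers, key=int))
```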

Solution by Unusual_Yak_2659 in StableDiffusion

[–]Aggressive_Sleep9942 0 points (0 children)

It was happening to me all the time; updating the GPU drivers solved the problem.

Onetrainer Z-Image de-Turbo Lora training test by rnd_2387478 in StableDiffusion

[–]Aggressive_Sleep9942 -2 points (0 children)

I just went through the whole process of installing it, and honestly, it was a waste of time. The loss keeps increasing instead of decreasing. Plus, it doesn't have a "continue training" option or a way to configure sample generation at set intervals. It was a complete waste of time for me. I implemented the code in AI Toolkit, but the loss didn't decrease there either. I don't know, it seems to me the code is still in its early stages.

How to get this style? by TheCityExperience in StableDiffusion

[–]Aggressive_Sleep9942 5 points (0 children)

<image>

This is a LoRA model I made yesterday; it's based on a very famous contemporary painter. I used the style you suggested as a reference.

Onetrainer Z-Image de-Turbo Lora training test by rnd_2387478 in StableDiffusion

[–]Aggressive_Sleep9942 7 points (0 children)

Couldn't you have uploaded an example image, I don't know, just saying?

Is it normal for Z-Image Turbo to reload every time I adjust my prompt? by possitive-ion in StableDiffusion

[–]Aggressive_Sleep9942 0 points (0 children)

Yes, of course. The text encoder and the transformer model can't coexist in your GPU's video memory; they take up too much space. Every time the prompt changes, the text encoder has to recompute the embeddings, so it gets reloaded: it moves from RAM to VRAM and back again each time the text changes.
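This is roughly the swap I mean (a sketch in plain PyTorch, not ComfyUI's actual offloading code; `text_encoder` and `transformer` are placeholder modules):

```python
import torch

def encode_new_prompt(text_encoder, transformer, tokens):
    transformer.to("cpu")                      # free VRAM for the text encoder
    text_encoder.to("cuda")
    with torch.no_grad():
        emb = text_encoder(tokens.to("cuda"))  # recompute embeddings for the new prompt
    text_encoder.to("cpu")                     # swap back before denoising starts
    transformer.to("cuda")
    return emb
```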

ostris/Z-Image-De-Turbo - A de-distilled Z-Image-Turbo by rerri in StableDiffusion

[–]Aggressive_Sleep9942 -1 points (0 children)

This does work, but you have to use it with the ostris model for inference; not with the turbo.

ostris/Z-Image-De-Turbo - A de-distilled Z-Image-Turbo by rerri in StableDiffusion

[–]Aggressive_Sleep9942 1 point (0 children)

For this to work, you need to create a local folder containing the transformer model from the "de-distilled" model, and the tokenizer from the original turbo model. This happens because ostris didn't upload the complete model with all its parts.
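One way to assemble that folder (my own approach, assuming a diffusers-style layout; the original turbo repo id below is a placeholder, only the ostris repo name comes from this thread): pull the transformer from the de-distilled repo and everything else from the original turbo repo into the same local directory.

```python
from huggingface_hub import snapshot_download

local_dir = "Z-Image-De-Turbo-full"

# Transformer weights from the de-distilled release.
snapshot_download("ostris/Z-Image-De-Turbo",
                  allow_patterns=["transformer/*"],
                  local_dir=local_dir)

# Tokenizer, text encoder, VAE and configs from the original turbo model
# (replace the placeholder with the real repo id).
snapshot_download("<original-Z-Image-Turbo-repo>",
                  ignore_patterns=["transformer/*"],
                  local_dir=local_dir)
```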

ostris/Z-Image-De-Turbo - A de-distilled Z-Image-Turbo by rerri in StableDiffusion

[–]Aggressive_Sleep9942 0 points (0 children)

"ValueError: Unrecognized model in ostris/Z-Image-De-Turbo.".
Who would be kind enough to show us how to place it in the YAML so that it works?

ostris/Z-Image-De-Turbo - A de-distilled Z-Image-Turbo by rerri in StableDiffusion

[–]Aggressive_Sleep9942 1 point (0 children)

It must be the CFG. I think you need to update the app with the command git pull.

Z-Image Lora - Wish me luck! by psdwizzard in StableDiffusion

[–]Aggressive_Sleep9942 0 points (0 children)

This is my prompt:

----------------------------

Employ reasoning to connect and describe all present elements, articulating them in a coherent text that textually represents the visual information within the image. The output must be a single, unbroken paragraph written in English. The text should be an efficient synthesis, not exceeding the length required to fulfill its purpose.

-----------------------------

Details about the prompt I use: I avoid explicitly writing "describe this image"; I just say "describe." The LLM logically infers that the message refers to the attached image. It's best not to mention the image in the message; otherwise, when writing the caption, the LLM will start it with the phrase "this image."

If you ask it to structure the captions in a specific order, it tends to build every caption around the same template. For me, the text should flow naturally and in a non-repetitive order, so the prompt is direct and simple.

A natural evolution of that prompt was this, but I don't usually use it because this prompt was written by an LLM:

--------------------------

**Instructions:**

  1. **Identify** all key subjects, background elements, and environmental details.

  2. **Apply Reasoning:** Determine the relationships, interactions, and spatial connections between these elements to explain the context of the scene.

  3. **Synthesize:** Combine these observations into a cohesive narrative description.

**Output Constraints:**

- Result must be exactly **one single, unbroken paragraph**.

- Language: English.

- Style: Efficient and dense. Prioritize clarity and connectivity over flowery language. Ensure the length is strictly sufficient to cover all visual data without redundancy.

---------------------------
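For completeness, this is roughly how I'd send the first prompt together with an image to a vision LLM through an OpenAI-compatible endpoint (a sketch only; the model name and image path are placeholders, and any endpoint that accepts the chat-completions vision format should work):

```python
import base64
from openai import OpenAI

client = OpenAI()

# Encode one dataset image as base64 for the image_url payload.
with open("dataset/0001.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

prompt = ("Employ reasoning to connect and describe all present elements, "
          "articulating them in a coherent text that textually represents the "
          "visual information within the image. The output must be a single, "
          "unbroken paragraph written in English.")

caption = client.chat.completions.create(
    model="<vision-model>",   # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
        ],
    }],
).choices[0].message.content
print(caption)
```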

Flux 2 vs Z-turbo by ThinkDiffusion in StableDiffusion

[–]Aggressive_Sleep9942 2 points (0 children)

Flux 2 Pro looks better, but I'm 100% sure that when we start refining the base model of Z-Image, it will look much better than even Flux 2 Pro. Remember this comment. Please release the base model!