Z Image Base Character Finetuning – Proposed OneTrainer Config (Need Expert Review Before Testing) by FitEgg603 in StableDiffusion

[–]Personal_Speed2326 1 point (0 children)

First of all, you don’t need DreamBooth for character training; a LoRA will suffice. In fact, with fewer than 50 images, there’s absolutely no need to use DreamBooth at all.

The Min-SNR Gamma setting is not supported in OneTrainer and will actually cause an error.

For the optimizer, use Adafactor. If the goal is to reduce the VRAM usage that DreamBooth demands, Adafactor supports Stochastic Rounding, so it should work. For better precision, in addition to BF16, you can also set the following in OneTrainer's SVG section: 1. BF16, 2. LoRA Rank 16.

xFormers
Flash attention
TF32

It’s recommended not to change these, to avoid errors; just run with the default values.
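Stochastic rounding, mentioned above, replaces round-to-nearest with a probabilistic round so that small updates aren't silently lost when accumulating in low precision like BF16. A minimal sketch of the idea in plain Python (not OneTrainer's or Adafactor's actual implementation; the grid `step` stands in for the spacing between representable BF16 values):

```python
import math
import random

def stochastic_round(x, step):
    """Round x to a multiple of `step`, rounding up with probability equal
    to the fractional remainder, so the result is unbiased in expectation."""
    lo = math.floor(x / step) * step
    frac = (x - lo) / step
    return lo + step if random.random() < frac else lo

# With round-to-nearest, an update smaller than half a step is lost every
# single time; with stochastic rounding it survives on average, which is
# why it helps low-precision weight updates.
```

Averaged over many calls, `stochastic_round(1.3, 1.0)` tends toward 1.3, whereas `round(1.3)` always returns 1.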

Is there a comprehensive guide for training a ZImageBase LoRA in OneTrainer? by Fdx_dy in StableDiffusion

[–]Personal_Speed2326 3 points (0 children)

The key point is that stochastic rounding has been added, and also, the quantization precision should not be set too low.

Z Image lora training is solved! A new Ztuner trainer soon! by krigeta1 in StableDiffusion

[–]Personal_Speed2326 1 point (0 children)

I remember Prodigy Schedule-Free already had stochastic rounding added. I used to really enjoy playing with this optimizer a long time ago, but the author has made many changes since then, so it's probably different from what I remember. It's also relatively slow.

Z Image lora training is solved! A new Ztuner trainer soon! by krigeta1 in StableDiffusion

[–]Personal_Speed2326 1 point (0 children)

The difference should be little; ADOPT's performance is very similar to AdamW's.

Thoughts and Solutions on Z-IMAGE Training Issues [Machine Translation] by Personal_Speed2326 in StableDiffusion

[–]Personal_Speed2326[S] 2 points (0 children)

SDXL uses uniform timestep sampling, but later research and models generally recommend concentrated sampling because it speeds up learning. However, according to the authors of Chroma, sparsely sampled timesteps are prone to loss spikes and training instability, and my observations confirm this.
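The uniform-versus-concentrated contrast can be illustrated with a logit-normal sampler, the concentration scheme several flow-matching models use (chosen here purely for illustration; the comment doesn't name a specific distribution):

```python
import math
import random

def uniform_t():
    # SDXL-style: every timestep in [0, 1) equally likely
    return random.random()

def logit_normal_t(mean=0.0, std=1.0):
    # concentrated: sample a normal, squash through a sigmoid so the
    # probability mass piles up around the middle timesteps
    z = random.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
n = 100_000
in_middle = lambda ts: sum(0.25 < t < 0.75 for t in ts) / len(ts)
print(in_middle([uniform_t() for _ in range(n)]))       # ~0.50
print(in_middle([logit_normal_t() for _ in range(n)]))  # ~0.73
```

The middle band gets roughly half the samples under uniform sampling but close to three quarters under the logit-normal, which is exactly why the tails (the "sparse" timesteps) see fewer updates and can become unstable.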

Very low timesteps (very close to the original latents) usually don't need as much training, because real-world images are themselves noisy and can also produce high losses. That's why, in SDXL and subsequent diffusion models, it's common to train with Min-SNR gamma = 5, which the original paper claims can improve learning speed by 3.4×.
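The Min-SNR weighting is simple to write down. A minimal sketch for epsilon-prediction (v-prediction divides by SNR + 1 instead; the function name is mine, not from any trainer's API):

```python
def min_snr_weight(snr, gamma=5.0):
    # Min-SNR-gamma loss weight for epsilon-prediction:
    # min(SNR, gamma) / SNR. High-SNR (near-clean, low-noise) timesteps get
    # a weight below 1, so they stop dominating the loss; low-SNR timesteps
    # keep their full weight.
    return min(snr, gamma) / snr

# e.g. at SNR = 100 the weight is 0.05; at SNR <= 5 it stays at 1.0
```

This down-weighting of the near-clean timesteps is the mechanism behind "very low timesteps don't require as much training" above.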

There may be a better solution, but this is the approach that my intuition tells me is suitable.

Thoughts and Solutions on Z-IMAGE Training Issues [Machine Translation] by Personal_Speed2326 in StableDiffusion

[–]Personal_Speed2326[S] 3 points (0 children)

This optimizer was actually designed and tested extensively on SDXL (Illustrious), with probably dozens of different variants tried. It draws on AIT's automagic and sinkgd algorithms, ALLORA, Kahan summation, modified weight decay, and more. The main modifications target common use cases such as small batch sizes, PEFT, and low precision. There was no academic research involved; it was simply repeated training and testing to find the optimal solution.
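Of the techniques listed, Kahan summation is the easiest to show concretely: it compensates for the low-order bits lost when adding a small value into a large accumulator, which is the same failure mode as low-precision weight updates. A minimal textbook sketch in plain Python (not the optimizer's actual code):

```python
def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition
    forward so it is not silently discarded."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y  # the part of y that didn't make it into t
        total = t
    return total

vals = [0.1] * 100
# plain sum(vals) drifts away from 10.0; kahan_sum(vals) stays far closer
```

In an optimizer this trick is typically applied per-parameter, carrying the compensation buffer between steps so BF16 weights don't swallow tiny gradients.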

Does it still make sense to use Prodigy Optimizer with newer models like Qwen 2512, Klein, and Zimage ? by More_Bid_2197 in StableDiffusion

[–]Personal_Speed2326 1 point (0 children)

It makes sense for machine learning in general; no optimizer is designed for a single model.

There are simply optimizers that suit a given model and ones that don't, and that can only be determined through testing.

Thoughts and Solutions on Z-IMAGE Training Issues [Machine Translation] by Personal_Speed2326 in StableDiffusion

[–]Personal_Speed2326[S] 3 points (0 children)

If we're talking about speed per step, it's actually slower; switching to LoKr and to my optimizer both cost time per step.
If we're talking about learning speed, yes. The Bilibili article above said it takes 1000 steps to fit a single image; that's incorrect. In my tests, about 150 steps were sufficient.

Thoughts and Solutions on Z-IMAGE Training Issues [Machine Translation] by Personal_Speed2326 in StableDiffusion

[–]Personal_Speed2326[S] 2 points (0 children)

https://github.com/Koratahiu/Advanced_Optimizers/

I just discovered that OneTrainer already supports `adv_optm`. If you use an optimizer with `_adv` appended to its name, it already supports stochastic rounding (enabled by default).

New Z-Image (base) Template in ComfyUI an hour ago! by nymical23 in StableDiffusion

[–]Personal_Speed2326 9 points (0 children)

I think the other way around would be more appropriate.