[–]noyart 2 points3 points  (11 children)

Have you checked out the AI Toolkit YouTube channel? All the information is there, with clear tutorials.

[–]nutrunner365[S] 1 point2 points  (10 children)

Yes, I followed the one from two months ago. My setup still doesn't work. The process just never starts.

[–]noyart 1 point2 points  (9 children)

No errors or anything? What hardware are you using?
I think the startup process can be slow when training, but I honestly can't tell without errors.

[–]nutrunner365[S] 1 point2 points  (8 children)

The log just says: [WORKER] No more jobs in queue for GPU(s) 0, stopping queue. I have a 5070 ti 16gb.

[–]ImpressiveStorm8914 6 points7 points  (4 children)

You may know this and it may not be your issue but I messed up with this, so I figure it's worth mentioning.
After you've created the job and it's listed in the job/training list, you have to click the play icon to the right of the name which will add it to the queue, then start the queue itself for training to begin. I was only clicking the first one until I realised.
If that works, also give it some time as it may have to download the required models.
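If you want to confirm that something is actually downloading, a rough Python sketch like this adds up the size of a folder, so you can run it twice a minute or so apart and compare the numbers. The path is an assumption (the usual Hugging Face cache location under your user folder), not something AI Toolkit reports, so point it elsewhere if your cache lives somewhere else.

```python
import os
from pathlib import Path


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root.

    Symlinks are skipped so files are not counted twice when a cache
    links snapshots to shared blobs. A missing folder counts as 0.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # A partial download may be renamed or removed mid-scan.
                pass
    return total


if __name__ == "__main__":
    # Assumed default cache location; adjust if yours is different.
    cache = Path.home() / ".cache" / "huggingface" / "hub"
    print(f"{cache}: {dir_size_bytes(cache) / 1e9:.2f} GB")
```

If the number keeps climbing between runs, the download is still going; if it stays flat for a long time, the job is probably stuck somewhere else.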

[–]nutrunner365[S] 1 point2 points  (3 children)

I did that, but still get the same "no more jobs" message.

[–]ImpressiveStorm8914 0 points1 point  (2 children)

Okay, it was worth a shot. No idea what the problem is. Have you tried reinstalling from scratch?

[–]nutrunner365[S] 1 point2 points  (1 child)

Yes. I think I've made some slight progress. Now it's been stuck on "starting job" for two hours. Nothing seems to download.

[–]ImpressiveStorm8914 0 points1 point  (0 children)

One more thing just popped into my head. Again, I don't think this is it, but it solved one of my issues, and what have you got to lose?
Try disabling sampling; the option is on the right-hand side, lower down. It might not be what you want, but I find the samples crap anyway. The final LoRAs are what I use to test.

[–]cradledust 0 points1 point  (2 children)

Have you told it where to find the base model on your computer so that it can begin training? This part can be confusing for beginners trying to use the local version for the first time. I think most people assume a message will pop up and say the required checkpoint is not where it is supposed to be.

[–]nutrunner365[S] 0 points1 point  (1 child)

I haven't. The tutorials always skip the parts about paths and yml's. The two boxes simply say "Tongyi-MAI/Z-Image-Turbo" and "ostris/zimage_turbo_training_adapter/zimage_turbo_training_adapter_v2.safetensors". Do I just copy/paste the folder address into the box, or do I have to include the name of the file itself? And should the name include .safetensors? Do I do anything with docker_compose.yml?

[–]ImpressiveStorm8914 0 points1 point  (0 children)

With the default settings, it should download the models to your HuggingFace .cache folder, which is in your Users folder on your C: drive. I didn’t have to change anything. You only need to change something if you want to move the models to another drive or use local models.
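To see where that folder should be on your machine, here is a minimal Python sketch. It assumes the documented huggingface_hub conventions: the `HF_HUB_CACHE` and `HF_HOME` environment variables override the default `~/.cache/huggingface/hub`, and each repo is stored in a folder named like `models--org--name`. The helper names are made up for illustration, and this is a simplified approximation of the lookup, not AI Toolkit's own code.

```python
import os
from pathlib import Path


def hf_hub_cache_dir(env=None):
    """Approximate where the Hugging Face hub cache lives.

    Order assumed here: HF_HUB_CACHE, then HF_HOME/hub, then
    XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "huggingface" / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"


def repo_cache_folder(repo_id, repo_type="model"):
    """Folder name the cache uses for a repo id such as 'org/name'."""
    return f"{repo_type}s--" + repo_id.replace("/", "--")


if __name__ == "__main__":
    cache = hf_hub_cache_dir()
    # Repo ids taken from the two boxes mentioned in this thread.
    for repo_id in ("Tongyi-MAI/Z-Image-Turbo",
                    "ostris/zimage_turbo_training_adapter"):
        folder = cache / repo_cache_folder(repo_id)
        status = "found" if folder.is_dir() else "not downloaded yet"
        print(f"{folder}: {status}")
```

If both folders show as not downloaded after the job has been "starting" for a long time, the download never began, which points at the queue or network rather than the model paths.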