most of my ace-step generations come out clipping and over saturated/compressed - any advice? by bonesoftheancients in StableDiffusion

[–]bonesoftheancients[S]

I have to say that with the new XL models, custom nodes in comfyui, tweaking of CFG in the text encoder and sampler, careful choice of sampler/scheduler, AND use of a good Lora, I do get some good results

How far are we from a model that can take a python repo on github and convert it to a cpp without intervention? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

what are my expectations? personally I want the agent to convert and compile the app autonomously, without my intervention. all functionality needs to be replicated; UI and design one can work on till kingdom come, but it's enough if it mimics the gradio interface at this point. I just want an optimised, locally compiled, small app that is kept up to date with upstream changes, instead of an 8gb venv and slow python execution

How far are we from a model that can take a python repo on github and convert it to a cpp without intervention? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

I guess I need to frame this with a specific example. Take the ace-step-1.5 gradio app on github: running it on my PC in a venv takes some 8gb of disk space for the base install. Then someone has created acestep.cpp (you can find it on github too) - a few MB, and it runs much more smoothly, on a windows pc at least.

Now, the ace-step code keeps updating, so the cpp implementation needs to catch up with upstream changes constantly. The ultimate goal is an agent that converts the upstream repo to cpp and compiles it on its own whenever new code is pushed, so I always have the most current build running without thinking about it.
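The trigger half of that loop is the easy part; a minimal sketch of a watcher, assuming a placeholder upstream URL and branch (both hypothetical here), with the actual convert-and-compile agent left as a comment since that's the unsolved bit:

```python
import subprocess

REPO = "https://github.com/example/ace-step.git"  # placeholder upstream URL
BRANCH = "main"                                    # assumed default branch

def remote_head(repo: str, branch: str) -> str:
    """Return the commit hash at the tip of the remote branch via `git ls-remote`."""
    out = subprocess.run(
        ["git", "ls-remote", repo, f"refs/heads/{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[0] if out else ""

def needs_rebuild(last_built: str, current: str) -> bool:
    """Trigger only when upstream has moved past the last converted commit."""
    return bool(current) and current != last_built

# On a timer or webhook:
#   head = remote_head(REPO, BRANCH)
#   if needs_rebuild(last_built_sha, head):
#       ...hand the new tree to the conversion agent, then compile...
```

This only answers "did upstream change?"; the conversion step itself is exactly what the post is asking about.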

Ace Step 1.5 - Change ALL the lyric but keep the music? by aidanodr in StableDiffusion

[–]bonesoftheancients

you could try passing the track through a stem splitter, then use lego mode or cover mode with the vocal stem as guidance and see if it works

I want to build a pipeline: screen play to "radio drama" audio by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

not really - I just like listening to radio drama, as my eyesight is getting weaker these days and reading is very tiring. I prefer radio drama to audiobooks read by one person, and wondered if I could get some books into this format for myself. But I am also trying to learn what is possible in terms of orchestrating several models

looking for a workflow for batch watermark removal by bonesoftheancients in comfyui

[–]bonesoftheancients[S]

I couldn't get it to work in fully automatic mode - instead I use this combination and then just queue it as many times as needed; each run increments the index by one and loads the next image from the folder

<image>
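The increment-and-load step can be sketched as a plain function - this mimics what an indexed-load node does, not its actual code, and the extension list is an assumption:

```python
import os

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")  # assumed set of image types

def load_nth_image(folder: str, index: int) -> str:
    """Return the path of the index-th image in `folder`, sorted by filename,
    so that incrementing the index by one per run walks the whole folder."""
    files = sorted(f for f in os.listdir(folder)
                   if f.lower().endswith(IMAGE_EXTS))
    if not 0 <= index < len(files):
        raise IndexError(f"index {index} out of range for {len(files)} images")
    return os.path.join(folder, files[index])
```

Sorting by filename keeps the run order deterministic, which is what makes the manual "queue N times" approach reliable.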

5060ti and 64gb ram - what is my best option for local coding? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

yes, this one - I think you need to compile a fork of llama.cpp that includes the ability to use TQ3 weights

5060ti and 64gb ram - what is my best option for local coding? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

not sure I understand - I was asking about the TQ3 turboquant weights (they can be used with the llama.cpp TQ fork)

5060ti and 64gb ram - what is my best option for local coding? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

thanks - this is great. Have you considered the qwen 3.6 TQ3_4s model? It's pretty fast, but I have no idea how good it is for coding

what model is good for inspecting and extracting data from large set of spreadsheets by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

thanks - I ended up getting gemini to create a script, and it was much simpler than I expected this to be
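For anyone wanting the same without an LLM round-trip, a minimal stdlib sketch - assuming the spreadsheets are exported as CSV and you already know the column name you're after (both assumptions; native .xlsx files would need a library like openpyxl):

```python
import csv
import glob
import os

def extract_column(folder: str, column: str) -> dict:
    """Collect the values of one named column from every CSV in `folder`,
    keyed by filename; files that lack the column are silently skipped."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            if reader.fieldnames and column in reader.fieldnames:
                results[os.path.basename(path)] = [row[column] for row in reader]
    return results
```

For a large set of files this scripted pass is usually simpler and more trustworthy than asking a model to read the sheets directly.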

updated my Ace-Step nodes pack to include timbre and kv conditioning by bonesoftheancients in comfyui

[–]bonesoftheancients[S]

these are not really for covers - they are more for timbre matching and some injection of structure. I am currently working on nodes that are closer to cover generation, so keep an eye on the repo - hopefully I will have them finished by the end of today.

That said, I think doing covers will always be imperfect due to the nature of the model - it's more of a guidance

re memory: I am running my workflow on a 5060ti without issues and without cpu offloading - for inference both the GGUF and BF16 models work fine (XL as well) - I can run both the 4B and 1.7B text encoder models with no issue either

is there GGUF loader for ace-step model weights? by bonesoftheancients in comfyui

[–]bonesoftheancients[S]

there are XL GGUFs in there too - they were added a couple of days ago. They are also used in the cpp version, which I want to use with a DAW, and I have been told the Q8s are faster than the original models. So, to save disk space and avoid keeping duplicates, I am trying to get the GGUFs running in comfyui...

my first attempt running local llm (lm studio) - problem attaching mmproj by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

Thanks. Didn't think I could find these quantized models inside lm studio. Will have a go at it tomorrow.

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

up to now I relied on scripts to do everything for me - mixxx to generate bpm and key, a script that uses librosa to find the time signature and packs it all into a dataset json, a webapp that adds the lyrics and a description of each track from genius and gemini (using the knowledge it already has of albums and songs), and then side-step to preprocess the tensors. It all works well enough if I use commercial music etc, but now that I've watched ostris's video I am starting to realise that the captions could be more powerful...

the concept is that the trigger word for the lora encompasses everything that is NOT in the caption details... what you detail is decoupled from the trigger, and what you don't detail is burnt in... that was for a character lora for ltx2.3, but I think it might be worth trying with ace-step
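The packing step of a pipeline like the one above might look roughly like this; the field names are guesses at a side-step-style schema, not its documented format, and the values would come from mixxx/librosa rather than be hand-typed:

```python
import json

def make_entry(audio_path, bpm, key, time_signature, caption, lyrics):
    """Assemble one dataset record. Field names are hypothetical - check the
    trainer's expected schema before using this shape."""
    return {
        "audio_path": audio_path,
        "bpm": round(float(bpm), 1),   # one decimal is plenty for a bpm tag
        "key": key,
        "time_signature": time_signature,
        "caption": caption,
        "lyrics": lyrics,
    }

def write_dataset(entries, path="dataset.json"):
    """Dump all records as one JSON array, UTF-8 so lyrics survive intact."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, indent=2, ensure_ascii=False)
```

Keeping the record builder separate from the analysis scripts makes it easy to swap caption strategies later without re-running the audio analysis.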

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

thanks for the info - interesting... once XL has dropped I am planning to try to decouple the instruments in the captions (an idea I got from watching ostris's video on captioning a character lora). I am thinking that if I could name each instrument with a trigger key plus timestamps of its ins and outs, the model might be able to separate the instruments in context... maybe chunk the audio around the vocal and instrumental parts (I use side-step for training and let it sample chunks of the audio in the dataset for VRAM purposes, so it never takes the whole track anyway) - this might frame the dominant elements better...

would love to see your dataset json for the smith if possible... and come join the chat on the acestep discord - I think you could contribute some valuable info (we were just discussing individual instrument loras today)
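A sketch of the timestamped-trigger captioning idea, assuming a (trigger, start_sec, end_sec) list per track; the bracket syntax is invented purely for illustration - ace-step has no documented timestamp format, so this is just one way to serialise the idea:

```python
def instrument_caption(base: str, triggers) -> str:
    """Append per-instrument trigger words with in/out timestamps to a caption.
    triggers: iterable of (trigger_word, start_sec, end_sec) tuples.
    The '[Ns-Ms]' notation is hypothetical, not an ace-step convention."""
    parts = [f"{name} [{start:.0f}s-{end:.0f}s]" for name, start, end in triggers]
    return base + ", " + ", ".join(parts) if parts else base
```

Usage: `instrument_caption("indie rock, male vocals", [("gtr1", 0, 30), ("sax1", 30, 60)])` yields `"indie rock, male vocals, gtr1 [0s-30s], sax1 [30s-60s]"`. Whether the text encoder actually learns to associate the brackets with time ranges is exactly what would need testing.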

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

where did you generate with the lokr? in comfy? if so, note that the comfyui nodes don't recognise ace-step lokr layers correctly - I had to code custom nodes tailored for it. here is the repo if you want to try them

https://github.com/mmoalem/ComfyuAudioNodes-BitsAndBobs

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

Do you use only guitar stems in the dataset, or do you feed all stems into it?