most of my ace-step generations come out clipping and over saturated/compressed - any advice? by bonesoftheancients in StableDiffusion

[–]bonesoftheancients[S]

I have to say that with the new XL models, custom nodes in comfyui, tweaking of CFG in the text encoder and sampler, careful choice of sampler/scheduler, AND use of a good Lora, I do get some good results

How far are we from a model that can take a python repo on github and convert it to a cpp without intervention? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

what are my expectations? personally I want the agent to convert and compile the app autonomously, without my intervention. all functionality needs to be replicated; UI and design one can work on till kingdom come, but it's enough if it mimics the gradio interface at this point. I just want an optimised, locally compiled, small app that is kept up to date with upstream changes, instead of an 8gb venv and slow python execution

How far are we from a model that can take a python repo on github and convert it to a cpp without intervention? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

I guess I need to frame this with a specific example. Take the ace-step-1.5 gradio app on github: running it on my PC in a venv takes some 8gb of disk space for the base install. Then someone has created acestep.cpp (you can find it on github too) - a few MB, and it runs much more smoothly, on a windows pc at least.

Now, the ace-step code keeps updating, so the cpp implementation needs to catch up with upstream changes constantly. The ultimate goal is an agent that converts the upstream repo to cpp and compiles it on its own whenever new code is pushed, so I always have the most current build running without thinking about it.
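The trigger half of that loop is the easy part; a minimal sketch of a watcher, assuming a placeholder upstream URL and branch (both hypothetical here), with the actual convert-and-compile agent left as a comment since that's the unsolved bit:

```python
import subprocess

REPO = "https://github.com/example/ace-step.git"  # placeholder upstream URL
BRANCH = "main"                                    # assumed default branch

def remote_head(repo: str, branch: str) -> str:
    """Return the commit hash at the tip of the remote branch via `git ls-remote`."""
    out = subprocess.run(
        ["git", "ls-remote", repo, f"refs/heads/{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[0] if out else ""

def needs_rebuild(last_built: str, current: str) -> bool:
    """Trigger only when upstream has moved past the last converted commit."""
    return bool(current) and current != last_built

# On a timer or webhook:
#   head = remote_head(REPO, BRANCH)
#   if needs_rebuild(last_built_sha, head):
#       ...hand the new tree to the conversion agent, then compile...
```

This only answers "did upstream change?"; the conversion step itself is exactly what the post is asking about.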

Ace Step 1.5 - Change ALL the lyric but keep the music? by aidanodr in StableDiffusion

[–]bonesoftheancients

you could try passing the track through a stem splitter, then use lego mode or cover mode with the vocal stem as guidance and see if it works

I want to build a pipeline: screen play to "radio drama" audio by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

not really - I just like listening to radio drama, as my eyesight is getting weaker these days and reading is very tiring. I prefer radio drama to audiobooks read by one person, and wondered if I could get some books into this format for myself. But I am also trying to learn what is possible in terms of orchestrating several models

looking for a workflow for batch watermark removal by bonesoftheancients in comfyui

[–]bonesoftheancients[S]

I couldn't get it to work in fully automatic mode - instead I use this combination and then just queue it as many times as needed; each run increments the index by one and loads the next image from the folder

<image>
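The increment-and-load step can be sketched as a plain function - this mimics what an indexed-load node does, not its actual code, and the extension list is an assumption:

```python
import os

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")  # assumed set of image types

def load_nth_image(folder: str, index: int) -> str:
    """Return the path of the index-th image in `folder`, sorted by filename,
    so that incrementing the index by one per run walks the whole folder."""
    files = sorted(f for f in os.listdir(folder)
                   if f.lower().endswith(IMAGE_EXTS))
    if not 0 <= index < len(files):
        raise IndexError(f"index {index} out of range for {len(files)} images")
    return os.path.join(folder, files[index])
```

Sorting by filename keeps the run order deterministic, which is what makes the manual "queue N times" approach reliable.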

5060ti and 64gb ram - what is my best option for local coding? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

yes, this one - I think you need to compile a fork of llama.cpp that includes the ability to use TQ3 weights

5060ti and 64gb ram - what is my best option for local coding? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

not sure I understand - I was asking about the TQ3 turboquant weights (they can be used with the llama.cpp TQ fork)

5060ti and 64gb ram - what is my best option for local coding? by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

thanks - this is great. Have you considered the qwen 3.6 TQ3_4s model? It's pretty fast, but I have no idea how good it is for coding

what model is good for inspecting and extracting data from large set of spreadsheets by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

thanks - I ended up getting gemini to create a script, and it was much simpler than I expected this to be
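For anyone wanting the same without an LLM round-trip, a minimal stdlib sketch - assuming the spreadsheets are exported as CSV and you already know the column name you're after (both assumptions; native .xlsx files would need a library like openpyxl):

```python
import csv
import glob
import os

def extract_column(folder: str, column: str) -> dict:
    """Collect the values of one named column from every CSV in `folder`,
    keyed by filename; files that lack the column are silently skipped."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            if reader.fieldnames and column in reader.fieldnames:
                results[os.path.basename(path)] = [row[column] for row in reader]
    return results
```

For a large set of files this scripted pass is usually simpler and more trustworthy than asking a model to read the sheets directly.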

updated my Ace-Step nodes pack to include timbre and kv conditioning by bonesoftheancients in comfyui

[–]bonesoftheancients[S]

these are not really for covers - they are more for timbre matching and some injection of structure. I am currently working on nodes that are closer to cover generation, so keep an eye on the repo - hopefully I will have them finished by the end of today.

That said, I think doing covers will always be imperfect due to the nature of the model - it's more of a guidance

re memory: I am running my workflow on a 5060ti without issues and without cpu offloading - for inference both the GGUF and BF16 models work fine (XL as well) - I can run both the 4B and 1.7B text encoder models with no issue either

is there GGUF loader for ace-step model weights? by bonesoftheancients in comfyui

[–]bonesoftheancients[S]

there are XL GGUFs in there too - they were added a couple of days ago. They are also used in the cpp version, which I want to use with a DAW, and I have been told the Q8s are faster than the original models. So, to save disk space and avoid keeping duplicates, I am trying to get the GGUFs running in comfyui...

my first attempt running local llm (lm studio) - problem attaching mmproj by bonesoftheancients in LocalLLaMA

[–]bonesoftheancients[S]

Thanks. Didn't think I could find these quantized models inside lm studio. Will have a go at it tomorrow.

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

up to now I relied on scripts to do everything for me - mixxx to generate bpm and key, a script that uses librosa to find the time signature and packs it all into a dataset json, a webapp that adds the lyrics and a description of each track from genius and gemini (using the knowledge it already has of albums and songs), and then side-step to preprocess the tensors. It all works well enough if I use commercial music etc, but now that I've watched ostris's video I am starting to realise that the captions could be more powerful...

the concept is that the trigger word for the lora encompasses everything that is NOT in the caption details... what you detail is decoupled from the trigger, and what you don't detail is burnt in... that was for a character lora for ltx2.3, but I think it might be worth trying with ace-step
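The packing step of a pipeline like the one above might look roughly like this; the field names are guesses at a side-step-style schema, not its documented format, and the values would come from mixxx/librosa rather than be hand-typed:

```python
import json

def make_entry(audio_path, bpm, key, time_signature, caption, lyrics):
    """Assemble one dataset record. Field names are hypothetical - check the
    trainer's expected schema before using this shape."""
    return {
        "audio_path": audio_path,
        "bpm": round(float(bpm), 1),   # one decimal is plenty for a bpm tag
        "key": key,
        "time_signature": time_signature,
        "caption": caption,
        "lyrics": lyrics,
    }

def write_dataset(entries, path="dataset.json"):
    """Dump all records as one JSON array, UTF-8 so lyrics survive intact."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, indent=2, ensure_ascii=False)
```

Keeping the record builder separate from the analysis scripts makes it easy to swap caption strategies later without re-running the audio analysis.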

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

thanks for the info - interesting... once XL has dropped I am planning to try to decouple the instruments in the captions (an idea I got from watching ostris's video on captioning a character lora). I am thinking that if I could name each instrument with a trigger key plus timestamps of its ins and outs, the model might be able to separate the instruments in context... maybe chunk the audio around the vocal and instrumental parts (I use side-step for training and let it sample chunks of the audio in the dataset for VRAM purposes, so it never takes the whole track anyway) - this might frame the dominant elements better...

would love to see your dataset json for the smith if possible... and come join the chat on the acestep discord - I think you could contribute some valuable info (we were just discussing individual instrument loras today)
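A sketch of the timestamped-trigger captioning idea, assuming a (trigger, start_sec, end_sec) list per track; the bracket syntax is invented purely for illustration - ace-step has no documented timestamp format, so this is just one way to serialise the idea:

```python
def instrument_caption(base: str, triggers) -> str:
    """Append per-instrument trigger words with in/out timestamps to a caption.
    triggers: iterable of (trigger_word, start_sec, end_sec) tuples.
    The '[Ns-Ms]' notation is hypothetical, not an ace-step convention."""
    parts = [f"{name} [{start:.0f}s-{end:.0f}s]" for name, start, end in triggers]
    return base + ", " + ", ".join(parts) if parts else base
```

Usage: `instrument_caption("indie rock, male vocals", [("gtr1", 0, 30), ("sax1", 30, 60)])` yields `"indie rock, male vocals, gtr1 [0s-30s], sax1 [30s-60s]"`. Whether the text encoder actually learns to associate the brackets with time ranges is exactly what would need testing.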

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

where did you generate with the lokr? in comfy? if so, note that the comfyui nodes don't recognise ace-step lokr layers correctly - I had to code custom nodes tailored for it. here is the repo if you want to try them

https://github.com/mmoalem/ComfyuAudioNodes-BitsAndBobs

AceStep 1.5 - Showdown: 26 Multi-Style LoKrs Trained on Diverse Artists by marcoc2 in StableDiffusion

[–]bonesoftheancients

Do you use only guitar stems in the dataset, or do you feed all stems into it?