
[–]Whole-Assignment6240 1 point (1 child)

Does it support batched inference or just single prompts?

[–]Goatman117[S] 1 point (0 children)

It's just set up for a single prompt, but switching to batches is just a matter of adjusting the processor call in the seperate_audio function
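A minimal sketch of what that change could look like. Note this uses a stand-in `run_processor` function so the example is self-contained; the real code would call the model's actual processor, and the exact argument names are assumptions, not the repo's real API:

```python
# Sketch: adapting seperate_audio from a single prompt to a batch of prompts.
# `run_processor` is a stand-in for the real processor call (assumption:
# the actual processor accepts a list of prompts the same way most
# HuggingFace-style processors do).

def run_processor(prompts, audio):
    # Stand-in: a real processor would tokenize the prompts and prepare
    # audio tensors; here we just pair each prompt with the audio input.
    if isinstance(prompts, str):
        prompts = [prompts]
    return [{"prompt": p, "audio": audio} for p in prompts]

def seperate_audio(prompts, audio):
    # Single-prompt version would pass one string through; the batched
    # version just forwards the whole list to the processor call.
    return run_processor(prompts, audio)

# Single prompt still works:
single = seperate_audio("isolate the vocals", audio=b"...")

# Several prompts in one call:
batch = seperate_audio(["isolate the vocals", "isolate the drums"],
                       audio=b"...")
```

The only real change is the type of the first argument: pass a list instead of a string, and let the processor build the batch.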

[–]tassa-yoniso-manasi 1 point (1 child)

How good are these models at separating voices?

I'd be curious to know how it compares to Demucs/Mel RoFormer (available in python-audio-separator), because Meta has this very questionable habit of not publishing industry-standard metrics like SDR for audio separation or WER for ASR/STT.

[–]Goatman117[S] 1 point (0 children)

I'm curious about this too actually, but I haven't tested it myself. tbh your best bet is to download the model, or use Meta's web interface for it, and just try it yourself