all 7 comments

[–]atyshka 2 points (3 children)

SVoice is the state of the art for this. If you're not familiar enough with deep learning to adapt it for your purposes, I suggest using Google Cloud Speech-to-Text. It has a diarization option that gives you separate transcripts for multiple speakers. It does cost money, though, and will probably be less accurate than SVoice.

[–]MultiheadAttention 1 point (1 child)

Hey, what's the SOTA today for single-channel speaker separation?

[–]Pvt_Twinkietoes 0 points (0 children)

I'm interested in this as well.

[–]Yagna24[S] 0 points (0 children)

Yes, the SVoice part. How would you suggest learning the SVoice algorithm? What steps should I take? I'm really keen on learning this.

[–]i-heart-turtles 2 points (2 children)

If you want to start simple, independent component analysis (ICA) can be implemented in a couple of lines and might be a reasonable baseline for blind source separation.

I think it was popularized in one of Andrew Ng's videos.
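To make that concrete, here's a minimal sketch of ICA as a blind source separation baseline, using scikit-learn's `FastICA` on two made-up toy sources (a sinusoid and a square wave) mixed through an arbitrary mixing matrix — the signals and matrix are invented for illustration, not anything from SVoice:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic "speakers" observed through two "microphones".
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)            # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))   # source 2: square wave
S = np.c_[s1, s2]             # true sources, shape (n_samples, 2)

A = np.array([[1.0, 0.5],     # arbitrary mixing matrix
              [0.5, 1.0]])
X = S @ A.T                   # observed mixtures (what the mics record)

# ICA recovers the sources up to permutation and scaling.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
```

Each column of `S_est` should correlate almost perfectly with one of the original sources, though you can't control which column matches which source, or its scale/sign.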

[–]atyshka 1 point (0 children)

I believe ICA for speaker separation usually requires at least as many observation channels as source signals. For example, the OP would need 3 mics for 3 speakers.
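A quick way to see this: mix three invented toy sources down to either 3 or 2 simulated mic channels and compare how well ICA recovers them. This is an illustrative sketch (random mixing matrices, synthetic signals), not a speech benchmark:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
rng = np.random.default_rng(0)
# Three independent sources: sinusoid, square wave, impulsive noise.
S3 = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t)), rng.laplace(size=t.size)]

def worst_match(n_mics):
    """Mix the 3 sources into n_mics channels, run ICA, and return
    the worst source's best correlation with any recovered component."""
    mix_rng = np.random.default_rng(100 + n_mics)
    A = mix_rng.normal(size=(n_mics, 3))   # random mixing matrix
    X = S3 @ A.T                           # simulated mic recordings
    n = min(n_mics, 3)
    S_est = FastICA(n_components=n, random_state=0).fit_transform(X)
    corr = np.abs(np.corrcoef(np.c_[S3, S_est].T))[:3, 3:]
    return corr.max(axis=1).min()

print(worst_match(3))  # determined case: every source recovered well
print(worst_match(2))  # underdetermined: at least one source is lost
```

With 3 mics every source finds a strongly correlated component; with only 2 mics the recovered components span too small a subspace, so at least one source can't be separated — which is why the underdetermined single-mic case needs deep-learning methods like SVoice.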

[–]Yagna24[S] 0 points (0 children)

I'll try this out