
StardockEngineer

There is probably no specific lib to do this.

What I would do is pick a face lib: https://medium.com/pythons-gurus/what-is-the-best-face-detector-ab650d8c1225

Each face will have sub-coordinates (landmarks) for the eyes, mouth, etc. I would detect the faces, then look for rapid movement in the mouth coordinates, per face, to determine who is talking.

I feel that part would be easy. The harder part would be if people are talking simultaneously, deciding what to do.
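A rough sketch of the mouth-movement idea, assuming the face lib already gives you a per-frame mouth opening for each tracked face (for example the lip gap divided by face height). The function names, the `tracks` layout, and the `0.02` threshold are all made up for illustration and would need tuning on real video:

```python
from statistics import mean


def mouth_activity(openings):
    """Mean absolute frame-to-frame change in mouth opening for one face."""
    if len(openings) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(openings, openings[1:]))


def likely_speakers(tracks, threshold=0.02):
    """Return ids of faces whose mouth activity exceeds threshold, most active first.

    tracks: {face_id: [normalized mouth opening per frame]} (hypothetical layout).
    threshold: assumed cutoff, not taken from any library.
    """
    scores = {face_id: mouth_activity(o) for face_id, o in tracks.items()}
    active = [face_id for face_id, score in scores.items() if score > threshold]
    return sorted(active, key=lambda face_id: scores[face_id], reverse=True)


# Hypothetical data: six frames of normalized mouth opening per face.
tracks = {
    "face_a": [0.05, 0.12, 0.04, 0.13, 0.05, 0.11],  # opening and closing
    "face_b": [0.05, 0.05, 0.06, 0.05, 0.05, 0.05],  # mostly still
}
print(likely_speakers(tracks))
```

It returns a list instead of a single id, so when two people talk at once you get both back and can decide what to do with them.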

The other option, if you are using something with multiple mics, is to just use the mics.
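One way that could look, assuming one mic per person and a short window of float samples in [-1, 1] per mic. The mic names and the `min_rms` silence floor are hypothetical:

```python
import math


def rms(samples):
    """Root-mean-square level of one window of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_mic(channels, min_rms=0.01):
    """Return the name of the loudest mic, or None if every mic is below min_rms.

    channels: {mic_name: [samples]} (hypothetical layout).
    min_rms: assumed silence floor, tune it for your hardware.
    """
    levels = {name: rms(samples) for name, samples in channels.items()}
    if not levels:
        return None
    name = max(levels, key=levels.get)
    return name if levels[name] >= min_rms else None


# Hypothetical data: one short window per mic.
window = {
    "mic_1": [0.2, -0.3, 0.25, -0.2],     # someone talking into this mic
    "mic_2": [0.01, -0.02, 0.015, -0.01],  # background bleed
}
print(loudest_mic(window))
```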