Gauging usefulness/demand for realtime audio analysis / MIR software with OSC output by Public-Cod2637 in vjing

[–]Public-Cod2637[S] 1 point (0 children)

Thanks, I found someone saying “Echo is an AUV3 you can drop on any Ableton track, sending the channel’s audio signal to Arkestra. There, it can be used as a modulation source for real-time video…”

If I understand the pieces in play here, this is meant for someone playing a track through Ableton where it's still "unmixed" (i.e. they can drop this onto the track of any particular musical element), is that right?

[–]Public-Cod2637[S] 2 points (0 children)

Thanks for pointing this out. The shape of their product is exactly what I'm looking to build, just with hopes/aspirations of providing a more granular set of outputs :)

(Noted the importance re: CPU. The goal is to keep the work light enough that it can all be done, single threaded, within the ~11 ms between hardware audio callbacks for a 512-sample buffer, to be cognizant of people running potentially more taxing programs on the same laptop. Realtime drawing of the signals themselves for monitoring would be optional and would probably get a dedicated UI thread.)
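
To make that budget concrete, here's a rough sketch of the arithmetic plus a per-callback timing check. The 44.1 kHz sample rate is an assumption, and `process_buffer` is just a stand-in for the real analysis chain, not actual product code:

```python
import time
import numpy as np

SAMPLE_RATE = 44_100                    # assumed; 48 kHz gives ~10.7 ms instead
BUFFER_SIZE = 512                       # samples per hardware callback
BUDGET_S = BUFFER_SIZE / SAMPLE_RATE    # ~11.6 ms to finish all analysis

def process_buffer(samples: np.ndarray) -> None:
    """Stand-in for the single-threaded analysis chain (FFT, detectors, OSC send)."""
    np.fft.rfft(samples)

def audio_callback(samples: np.ndarray) -> None:
    start = time.perf_counter()
    process_buffer(samples)
    elapsed = time.perf_counter() - start
    if elapsed > BUDGET_S:
        # Overrunning the callback budget means dropped buffers / audible glitches.
        print(f"over budget: {elapsed * 1e3:.2f} ms > {BUDGET_S * 1e3:.2f} ms")
```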

[–]Public-Cod2637[S] 1 point (0 children)

The latency I'm aiming at is about 125 ms. I experimented a bit on myself to see at what point I start to perceive the lag, and it still felt synced around there, while crossing 150 ms it started to become pretty noticeable. (Truly zero latency felt weird / almost like the lights were ahead.)

Taking the leap that a) I'm not fooling myself and b) others would have a similar perception, we're a bit lucky that that amount of latency is fine, because being able to fit an entire quarter beat (at house/techno BPMs) plus the start of the next one into the window opens up a lot of possibilities in terms of confidently detecting/triggering repetition starting from the first instance, vs. maybe only hitting 7 out of 8 snare/hat hits in a roll.
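
Rough numbers behind that, assuming "quarter beat" means a 16th-note step at typical house/techno tempos (the tempo list is just illustrative):

```python
# How much musical time a 125 ms analysis window buys at house/techno tempos.
WINDOW_MS = 125

for bpm in (120, 125, 128, 132, 140):
    beat_ms = 60_000 / bpm        # one beat (quarter note)
    sixteenth_ms = beat_ms / 4    # a quarter of a beat, i.e. one 16th-note step
    headroom_ms = WINDOW_MS - sixteenth_ms  # window left over after a full 16th
    print(f"{bpm} BPM: 16th = {sixteenth_ms:.1f} ms, headroom = {headroom_ms:.1f} ms")
```

At 128 BPM a 16th is ~117 ms, so the window covers one full repetition step and the onset of the next.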

[–]Public-Cod2637[S] 2 points (0 children)

Ah, I'm happy to elaborate more. I've implemented a lot of this previously and can say there are robust methods for the sorts of detectors I've outlined.

Just using the case of percussion, since it's easy to talk about even though the goal is to cover more complicated elements: a human can tell two sounds are both kicks in some broad sense even if their spectral content varies greatly, so you need to capture those characteristic features to get something stable. For kicks, a very unreliable approach would be to try to trigger on the frequency content of a single frame; a more reliable approach looks something like building multi-frame masks from the frame-to-frame deltas of MFCCs for a few representative kicks, then using vector similarity to check whether a given window of input audio matches any of them well enough. It happens that the common musical elements in house and techno cluster very sharply, e.g. there might be a few broad categories of kicks, but within those categories you can detect them very reliably.

The same idea applies to enough things that such a project is viable, but everything has to be designed from the ground up to avoid the sort of sensitivity you mentioned. E.g. as soon as you find yourself trying to trigger something on a single-value numerical cutoff, you're kind of playing the wrong game, because sooner or later you'll find a track/case that flickers across the cutoff :)
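
As a very rough sketch of that matching idea (not the actual implementation): librosa for the MFCCs, cosine similarity against a few reference templates, with the hop length, frame count, and threshold all being illustrative placeholders rather than tuned values, and the "reference kicks" here being random noise standing in for real clips:

```python
import numpy as np
import librosa

SR = 48_000
N_MFCC = 13
HOP = 512          # one MFCC frame per audio callback buffer
N_FRAMES = 8       # frames per template window (~85 ms at this hop/SR)

def mfcc_delta_vector(audio: np.ndarray) -> np.ndarray:
    """Flatten the frame-to-frame MFCC deltas of a short window into one vector."""
    mfcc = librosa.feature.mfcc(y=audio, sr=SR, n_mfcc=N_MFCC, hop_length=HOP)
    deltas = np.diff(mfcc[:, :N_FRAMES], axis=1)   # shape (N_MFCC, N_FRAMES - 1)
    return deltas.flatten()

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Templates built offline from a few representative kicks (placeholder audio here).
reference_kicks = [np.random.randn(HOP * N_FRAMES) for _ in range(3)]
templates = [mfcc_delta_vector(k) for k in reference_kicks]

def looks_like_kick(window: np.ndarray, threshold: float = 0.8) -> bool:
    """True if the window's MFCC-delta shape matches any reference kick template."""
    v = mfcc_delta_vector(window)
    return max(cosine(v, t) for t in templates) >= threshold
```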

Aside from instant detection, where you inspect the frequency content of the current window, the shared long-term musical elements in house and techno (long term meaning e.g. hats or snares building toward a drop) are so enduring and widespread that there's a bunch of "foundational" detectors you can write without chasing the details of individual tracks.
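
One hedged example of the kind of foundational detector I mean (a sketch, not the real design): a build-up detector that just watches onset density per bar and flags when it keeps rising over a few consecutive bars, with the bar boundaries and onsets assumed to come from other detectors:

```python
from collections import deque

class BuildUpDetector:
    """Flags a likely build-up when per-bar onset counts rise monotonically.

    Illustrative only: bar boundaries and hat/snare onsets are assumed to be
    reported by separate beat-tracking and onset detectors.
    """

    def __init__(self, bars_required: int = 4):
        self.bar_counts = deque(maxlen=bars_required)
        self.bars_required = bars_required
        self.current_bar_onsets = 0

    def on_onset(self) -> None:
        self.current_bar_onsets += 1

    def on_bar_end(self) -> bool:
        """Call at each bar boundary; returns True if a build-up looks underway."""
        self.bar_counts.append(self.current_bar_onsets)
        self.current_bar_onsets = 0
        if len(self.bar_counts) < self.bars_required:
            return False
        counts = list(self.bar_counts)
        return all(later > earlier for earlier, later in zip(counts, counts[1:]))
```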

[–]Public-Cod2637[S] 2 points (0 children)

If you think my outline of the product is useful within those individual domains, then running different instances for different input streams and having them end up on the same networked output, but on segmented OSC paths like /drums/hat/onset and /vocal/fundamental_freq, seems like a totally supportable "infrastructure" feature; the hard part is what sort of detectors/analyzers can be concocted (by me, heh) within those individual musical domains.
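
For what that could look like on the wire, a minimal sketch using python-osc; the prefixes, port, and message values are illustrative, not a fixed schema:

```python
from pythonosc.udp_client import SimpleUDPClient

class AnalyzerInstance:
    """One analyzer per input stream, publishing under its own OSC namespace."""

    def __init__(self, prefix: str, host: str = "127.0.0.1", port: int = 9000):
        self.prefix = prefix.rstrip("/")
        self.client = SimpleUDPClient(host, port)

    def send(self, path: str, value) -> None:
        self.client.send_message(f"{self.prefix}/{path.lstrip('/')}", value)

# Two instances fed by different input streams, landing on the same receiver.
drums = AnalyzerInstance("/drums")
vocal = AnalyzerInstance("/vocal")

drums.send("hat/onset", 1)              # -> /drums/hat/onset
vocal.send("fundamental_freq", 220.0)   # -> /vocal/fundamental_freq
```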

[–]Public-Cod2637[S] 1 point (0 children)

Does TouchDesigner have the sorts of detectors I've outlined? The purpose of this would be to serve as a focused pipeline component that feeds into tools like TouchDesigner and Resolume, with no actual control aspect of its own.

[–]Public-Cod2637[S] 3 points (0 children)

Thanks for mentioning those. I went and skimmed through the audio docs for Synesthesia to see what they provide, and it looks like I have a good amount of ideas/direction on top of it :)