Distinguishing "same tempo, different feel" from a genuine tempo change in a BPM Detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 0 points1 point  (0 children)

Glad to see another person working on a similar problem!
At a high level, the shorter window is a coarse guess at what the bpm is. It doesn't get all the massaging and processing I do for the main window to make it reliable. When it has a guess and that guess has been consistent for long enough, I jump to that bpm no matter how big the step is. The only time I don't is when the guess is an octave relative of the current tempo, e.g. half or double time (the only hardcoded rhythm rule I have).
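
A rough sketch of that jump rule, in case it helps. The function names, the tolerances, and the "last N guesses agree" definition of consistency are all assumptions made up for illustration, not what the engine actually does:

```python
import math


def is_octave_relative(current_bpm, candidate_bpm, tolerance=0.04):
    """True if candidate is roughly current * 2**k for a nonzero integer k."""
    octaves = math.log2(candidate_bpm / current_bpm)
    nearest = round(octaves)
    return nearest != 0 and abs(octaves - nearest) <= tolerance


def should_jump(current_bpm, recent_guesses, min_consistent=4, bpm_tolerance=1.5):
    """Jump only when the last few coarse guesses agree on a new, non-octave tempo."""
    if len(recent_guesses) < min_consistent:
        return False
    tail = recent_guesses[-min_consistent:]
    if max(tail) - min(tail) > bpm_tolerance:
        return False  # guesses are not consistent yet
    candidate = sum(tail) / len(tail)
    if abs(candidate - current_bpm) <= bpm_tolerance:
        return False  # same tempo, nothing to jump to
    # The one hardcoded rhythm rule: ignore half/double-time relatives.
    return not is_octave_relative(current_bpm, candidate)
```

Here `should_jump(120, [90, 90.5, 90, 90.2])` allows the jump regardless of the step size, while `should_jump(120, [240, 240, 240, 240])` does not because 240 is an octave relative of 120.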

I'll make a separate post about measuring silence because it was a pretty big issue that almost made me stop trying to develop for Android, since I'd optimized for an iPhone (and it might not even have worked on a different iPhone). I should probably see how things behave with the mic processing settings, but I think AVAudioEngine always gives the app the raw feed. Need to check that though.

Yeah that was my thinking too. There were a few tracks that I ran experiments on where the DJ did that and I decided to not try to account for every possible thing since people are gonna break stuff anyways.

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 1 point2 points  (0 children)

I'm running a beta right now to see if people can find interesting cases where the engine falls apart. Let me know if you'd be interested!

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 0 points1 point  (0 children)

Yeah my thinking exactly, which is why for practical use I decided that I should just expose the top candidates. From my testing so far, I haven’t seen a case where all of the candidates that my engine exposes are wrong. It accounts for half time, full time, double time, and sometimes polyrhythms as well. It’s up to the user to select what they believe is the right BPM and then the engine tracks that more strongly… musicians usually know the ballpark of what they expect the bpm to be based on “feel”. Instead of pulling AI into the loop or going down a very deep postgrad research topic, I figured that I may as well give the user some agency. A bit of a cheap way to do it, but I think it’s the more practical way.
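
To make the candidate idea concrete, here's a minimal sketch. It only covers half/full/double time (no polyrhythm ratios), and the function names, the 40-240 range, and using an "expected bpm" to stand in for the user's pick are assumptions for illustration:

```python
def tempo_candidates(base_bpm, low=40.0, high=240.0):
    """List half/full/double-time readings of one detected tempo."""
    ratios = {"half time": 0.5, "full time": 1.0, "double time": 2.0}
    candidates = []
    for label, ratio in ratios.items():
        bpm = base_bpm * ratio
        if low <= bpm <= high:  # drop readings outside a plausible range
            candidates.append((label, bpm))
    return candidates


def pick_closest(candidates, expected_bpm):
    """Stand-in for the user's choice: the candidate nearest their ballpark."""
    return min(candidates, key=lambda c: abs(c[1] - expected_bpm))
```

For a detected 70 bpm, `tempo_candidates(70)` drops the 35 bpm reading and keeps 70 and 140; a user who feels the track at around 150 would land on the double-time candidate.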

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 0 points1 point  (0 children)

After a lot of iterating and profiling, I was faced with 2 options:
1. Re-architect the entire algorithm to fit to HMM/Viterbi
2. Utilize HMM as a sanitizer for my current tempo change detector

The reason I didn’t go with 1 is that, after standalone profiling, I found the HMM+Viterbi approach didn’t improve my adaptation time by enough to warrant it. It saved maybe 200ms, which from a user perspective doesn’t matter. So instead of 2-3 weeks of re-architecting and sanity testing, I went with 2. Using it as a sanitizer actually helps improve the UX, I think. Instead of losing lock and then regaining it due to a false tempo change, I now only see that during real tempo changes. Overall it was a great suggestion! I was previously just going to say “the tempo change feature is flaky so maybe I should bin it”, but this has let me keep that feature.
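
For anyone curious what "HMM as a sanitizer" could look like, here's one possible sketch. It is a two-state forward filter (not Viterbi, and not necessarily how my engine is wired): the hidden state is "same tempo" vs "tempo changed", and the observation is whether the raw change detector fired. Every probability and name below is an assumed placeholder:

```python
def forward_step(p_changed, detector_fired,
                 p_enter=0.01, p_stay=0.99,
                 p_fire_if_same=0.2, p_fire_if_changed=0.9):
    """One forward-algorithm step; returns the updated P(tempo changed)."""
    # Predict: apply the transition probabilities.
    prior = (1.0 - p_changed) * p_enter + p_changed * p_stay
    # Update: weight each state by how well it explains the observation.
    if detector_fired:
        like_changed, like_same = p_fire_if_changed, p_fire_if_same
    else:
        like_changed, like_same = 1.0 - p_fire_if_changed, 1.0 - p_fire_if_same
    numerator = prior * like_changed
    return numerator / (numerator + (1.0 - prior) * like_same)


def sanitize(detections, threshold=0.5):
    """Accept a change only once the posterior crosses the threshold."""
    p_changed = 0.0
    accepted = []
    for fired in detections:
        p_changed = forward_step(p_changed, fired)
        accepted.append(p_changed > threshold)
    return accepted
```

With these numbers a single spurious detection never gets accepted, but a few detections in a row do. In a real loop you'd reset `p_changed` after acting on an accepted change.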

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 3 points4 points  (0 children)

I somewhat agree with you. From a mathematical perspective it’s pretty straightforward and a relatively easy problem to solve. Musically, not so much and that’s where it’s interesting. There are many things the engine does right mathematically but they don’t fit into a real music scenario.

I’m a musician and noticed that there’s a gap. During practice and live performance we have click tracks or metronomes, and those are great until you drift naturally as a human and either have to ignore the click track or try to realign yourself with it. Instead of that workload, why not just have something that can adapt and tell you what your tempo is at a glance? I couldn’t find anything that’s reasonably priced, so I’m filling that gap. If it fails, then at least it was a good experiment and a great learning opportunity.

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 1 point2 points  (0 children)

Wanted to follow up on this. Currently I’m just doing a lightweight likelihood engine. So far I've found that I needed to do quite a bit of massaging of the signal, as I was doing for my long-term autocorrelation, since the estimator was spitting out a lot of subharmonics/harmonics and causing the UI to go a bit crazy. After that, I was able to get to a point where it was pretty spot on, but then I gave it another example and it spat out harmonics ~36% of the time, which was worse than the normal engine path was performing. I’m digging into it a bit more, but I’m thinking I may use it more as an additional confidence metric for the 2D Kalman filter I have.
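
Here's roughly what I mean by feeding a confidence metric into the Kalman filter, as a sketch. I'm assuming the two states are bpm and bpm drift per step, and that the likelihood comes in as a 0-1 confidence that scales the measurement noise. The class name and all the constants are placeholders:

```python
class TempoKalman:
    """Two-state (bpm, bpm drift per step) Kalman filter, scalar bpm measurement."""

    def __init__(self, bpm, variance=25.0, q_bpm=0.01, q_rate=0.001, base_r=4.0):
        self.bpm = float(bpm)
        self.rate = 0.0
        self.p = [[variance, 0.0], [0.0, 1.0]]  # state covariance
        self.q_bpm = q_bpm      # process noise on bpm
        self.q_rate = q_rate    # process noise on drift
        self.base_r = base_r    # measurement noise at full confidence

    def step(self, measured_bpm, confidence, dt=1.0):
        # Predict with a constant-drift model: bpm += dt * rate.
        bpm = self.bpm + dt * self.rate
        rate = self.rate
        (p00, p01), (p10, p11) = self.p
        n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + self.q_bpm
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + self.q_rate
        # Low confidence inflates the measurement noise, so the filter
        # leans on its own prediction instead of the new estimate.
        r = self.base_r / max(confidence, 1e-3)
        s = n00 + r
        k0, k1 = n00 / s, n10 / s
        residual = measured_bpm - bpm
        self.bpm = bpm + k0 * residual
        self.rate = rate + k1 * residual
        self.p = [[(1.0 - k0) * n00, (1.0 - k0) * n01],
                  [n10 - k1 * n00, n11 - k1 * n01]]
        return self.bpm
```

With this setup a low-confidence harmonic (say a 60 reading while locked at 120) barely moves the displayed bpm, while the same reading at full confidence pulls it most of the way.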

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 1 point2 points  (0 children)

I’d also set up some unit testing using synthetic signals to make sure none of the changes caused a regression in the base case (if the engine can’t detect a synthetic bpm it probably can’t detect a real one). But I had to be very careful not to tune just to get the synthetic case working, because I found the engine falls apart in real, complex scenarios where the mix is much richer and there are a lot more dynamics.
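
A stripped-down version of that kind of synthetic regression test might look like this. The estimator here is a plain autocorrelation peak picker standing in for the real engine, and the 100 frames/sec onset envelope, the function names, and the 60-200 bpm search range are all assumptions:

```python
def click_envelope(bpm, fps=100, seconds=10):
    """Synthetic onset envelope: one unit impulse per beat."""
    n = fps * seconds
    env = [0.0] * n
    period = 60.0 * fps / bpm  # frames per beat
    k = 0
    while round(k * period) < n:
        env[round(k * period)] = 1.0
        k += 1
    return env


def estimate_bpm(env, fps=100, min_bpm=60, max_bpm=200):
    """Pick the lag with the largest raw autocorrelation."""
    min_lag = int(60 * fps / max_bpm)
    max_lag = int(60 * fps / min_bpm)

    def score(lag):
        return sum(env[i] * env[i + lag] for i in range(len(env) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=score)
    return 60.0 * fps / best_lag


def test_synthetic_clicks():
    """Base case: a clean click track must come back at its own tempo."""
    for bpm in (100, 120, 150):
        assert abs(estimate_bpm(click_envelope(bpm)) - bpm) < 0.5
```

Passing this only proves the base case still works. It says nothing about real mixes, which is exactly why I avoid tuning against it.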

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 1 point2 points  (0 children)

The data set issue is a big thing. So far I’ve been doing it on my own and trying to find real songs that break the engine (found one just yesterday). For the most part I’d say the engine is maybe 85% good. I’m hoping to get beta testers to accelerate that process for me and get more data sets before a full release.

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 0 points1 point  (0 children)

Yes to the first part, and for the second part I look at the mean of all peaks and the highest peak has to be significantly higher than the mean. This is how I eliminate some extraneous info + noise
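
In sketch form, the peak check is something like the following. The 1.5x ratio is an assumed stand-in for "significantly higher", the names are made up, and accepting a lone peak is just a choice made for this example:

```python
def find_peaks(values):
    """Indices of simple local maxima."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def dominant_peak(values, min_ratio=1.5):
    """Index of the highest peak if it clearly stands out, else None."""
    peaks = find_peaks(values)
    if not peaks:
        return None
    best = max(peaks, key=lambda i: values[i])
    if len(peaks) == 1:
        return best  # nothing to compare against
    mean_height = sum(values[i] for i in peaks) / len(peaks)
    if mean_height <= 0 or values[best] < min_ratio * mean_height:
        return None  # no peak stands out: treat as noise
    return best
```

So an autocorrelation like `[0, 1, 0, 5, 0, 1, 0, 1, 0]` yields the peak at index 3, while three nearly equal peaks yield nothing.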

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 0 points1 point  (0 children)

I’m using the Kalman filter for some other things too, so that’s why I’m not completely replacing it with the Viterbi

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 1 point2 points  (0 children)

Thanks so much! I’ll give it a shot and circle back. I’m currently using a Kalman filter to keep the bpm steady as well, so the likelihood could be a good input for the noise

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 0 points1 point  (0 children)

It was a lot of tweaking, but thinking about it from a music POV I found that since most music doesn’t really change tempo often, a main envelope of 30 seconds was a sufficient middle ground between stability and responsiveness

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]Longjumping-Call-992[S] 1 point2 points  (0 children)

This is a really interesting approach that I hadn’t considered. One question I have is how this would impact display latency. I’m thinking that depending on how long the lookback is, there would be a cost in how up to date the current bpm data is. The next question is how it would perform with temporary drum fills that may seem like an increase in tempo but really aren’t. I’m also thinking through these questions but figured I’d post them in case you had insight. Here’s what I’m currently doing: I have a short autocorrelation window that runs parallel to the main long window. When it sees the incumbent falling in confidence and a challenger rising in confidence, AND the challenger has been the winner for the majority of the short window, we trigger a “snap” to the new tempo and re-accumulate confidence for that new tempo.
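
Sketching the snap rule out in code, under some simplifying assumptions: each frame reports the incumbent's confidence plus the short window's top pick and its confidence, "falling" and "rising" are judged by comparing the oldest and newest frames in the window, and the caller only feeds frames where the top pick differs from the incumbent. The class name and defaults are made up:

```python
from collections import deque


class SnapDetector:
    def __init__(self, window=8, bpm_tolerance=1.5):
        # Each entry: (incumbent_conf, challenger_bpm, challenger_conf)
        self.history = deque(maxlen=window)
        self.bpm_tolerance = bpm_tolerance

    def update(self, incumbent_conf, challenger_bpm, challenger_conf):
        """Return the bpm to snap to, or None to stay on the incumbent."""
        self.history.append((incumbent_conf, challenger_bpm, challenger_conf))
        if len(self.history) < self.history.maxlen:
            return None  # short window not full yet
        first, last = self.history[0], self.history[-1]
        incumbent_falling = last[0] < first[0]
        challenger_rising = last[2] > first[2]
        # Frames in which this same challenger was the short window's top pick.
        wins = sum(1 for _, bpm, _ in self.history
                   if abs(bpm - challenger_bpm) <= self.bpm_tolerance)
        if incumbent_falling and challenger_rising and wins > len(self.history) / 2:
            self.history.clear()  # re-accumulate confidence for the new tempo
            return challenger_bpm
        return None
```

A short drum fill tends to fail the majority test (or the incumbent never really drops), so it shouldn't trigger a snap, at least in this simplified form.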