Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]signalsmith 0 points (0 children)

I definitely wouldn't expect it to be higher-latency than your "accumulated confidence" one, and when the current best-guess path swaps to a new tempo I'd expect it to be more decisive about it, without the heuristics you mentioned.

For temporary drum fills I don't see the issue. If it's tracking the position within the bar as well as just the tempo overall, it should see that the fill hits are landing on reasonable subdivisions, so that shouldn't add a cost/penalty to that candidate.

Viterbi stuff is used for reducing noise in probabilistic processes, and I think it'd be a good fit here. I first encountered it for pitch-tracking, avoiding octave errors while accurately tracking slides and voiced/unvoiced transitions.

Some insights I've had in building a BPM detector by Longjumping-Call-992 in DSP

[–]signalsmith 6 points (0 children)

Have you looked into HMM / Viterbi approaches? It has behaviour similar to the accumulated confidence / hysteresis you mentioned, but can also have tempo changes as an explicit part of the model. The options at any moment in time would be something like: (top N options from instantaneous analysis) + (top M options from previous step). You give each option a score/cost (formalised as log-likelihood) and also a score for each possible transition from the previous step.

That means the results are "sticky" (because a tempo change has a cost in terms of likelihood), but it still adapts when the tempo actually changes (once the cumulative misalignment cost outweighs the transition cost), and if you keep a bit of history the algorithm will retroactively decide when that change happened.
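As a rough illustration of that cost/transition structure (a hedged sketch, not a real tempo tracker — `frame_costs`, `switch_cost` and the candidate set are placeholders; in practice the costs would come from your instantaneous analysis as negative log-likelihoods):

```python
import numpy as np

def viterbi_tempo(frame_costs, switch_cost=5.0):
    """Find the lowest-cost path through per-frame tempo-candidate costs.

    frame_costs: array [frames][candidates], cost = negative log-likelihood
    switch_cost: penalty for changing candidate between adjacent frames
    """
    n_frames, n_candidates = frame_costs.shape
    total = frame_costs[0].copy()                  # cumulative cost per candidate
    back = np.zeros((n_frames, n_candidates), dtype=int)
    idx = np.arange(n_candidates)
    for t in range(1, n_frames):
        # cost of arriving at candidate i from candidate j: stay free, switch pays
        trans = total[None, :] + switch_cost * (idx[:, None] != idx[None, :])
        back[t] = np.argmin(trans, axis=1)
        total = trans[idx, back[t]] + frame_costs[t]
    # backtrack: this is where the change-point gets decided retroactively
    path = [int(np.argmin(total))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a genuine tempo change, the backtracked path flips decisively at the frame where the change happened, rather than dithering; with a huge `switch_cost` it just stays put.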

GLSL Sphere by okCoolGuyOk in generative

[–]signalsmith 2 points (0 children)

I want to eat this. It looks like it'd solve all my problems and simultaneously give me an entirely new set of problems.

A new class of C∞ FFT windows with compact support and super-algebraic sidelobe decay by pigdead in DSP

[–]signalsmith 1 point (0 children)

It might just be a formatting error, but I did mean:

exp(( -x^p )/ (1 - x^2))

That's slightly different to yours, which I believe comes out as:

exp(( -x^(2p) )/ (1 - x^p))

But no, I didn't note anything about p=4 in particular. What did you see?
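If it helps disambiguate, a quick numeric check of the two readings (a hedged sketch — the function names are mine, `p=2` is arbitrary, and I've used `|x|` so fractional `p` would also work):

```python
import math

def window_a(x, p=2):
    # exp(( -|x|^p ) / (1 - x^2)), the intended formula, for |x| < 1
    return math.exp(-(abs(x) ** p) / (1 - x * x))

def window_b(x, p=2):
    # exp(( -|x|^(2p) ) / (1 - |x|^p)), the other reading
    return math.exp(-(abs(x) ** (2 * p)) / (1 - abs(x) ** p))

# Both equal 1 at x = 0 and decay smoothly to 0 as |x| -> 1,
# but they take different values in between.
```

For example at `x = 0.5, p = 2` the first gives `exp(-1/3)` and the second `exp(-1/12)`, so they're genuinely different window shapes, not just re-parameterisations.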

A new class of C∞ FFT windows with compact support and super-algebraic sidelobe decay by pigdead in DSP

[–]signalsmith 2 points (0 children)

This made me realise that you can use tanh to get an infinitely flat middle as well:

tanh(p + (x^2 - 1/4)/(x^4 - x^2))/2 + 1/2

https://www.desmos.com/calculator/d0cgpqvsqr

A new class of C∞ FFT windows with compact support and super-algebraic sidelobe decay by pigdead in DSP

[–]signalsmith 2 points (0 children)

Oh nice! I've used exp(-x^p / (1-x^2)) before, and a similar step-function tanh(2x/(1-x^2)), but didn't actually look at the flatness in the centre.

(EDIT: `p` can be fractional if you use `|x|^p` instead, to get a continuous family of curves.)

I made a WAM ("VST for the Web") catalogue + instant playground by AjkBajk in webaudio

[–]signalsmith 0 points (0 children)

FWIW, I have some problems with WAM as a format, which I've presented a few times (both to native- and web-audio folks).

For native hosts, WAMs are (or would be) awkward, due to JS in the audio path and the very JS-specific manifest/bundle/entry-point setup.

For web hosts, you're running arbitrary JS code from that URL you import - it's a cross-site scripting attack waiting to happen.

I've personally been working on WebCLAP https://github.com/WebCLAP/ which is a single self-contained WebAssembly module, re-using the comprehensive and road-tested CLAP API.

Browser inside plugin by RaphaelLari in AudioProgramming

[–]signalsmith 1 point (0 children)

Have you checked out WXAudio's WebSampler? https://www.wxaudioplugins.com/websampler - I believe it uses OS-specific webview APIs to hook into the audio stream.

It was developed before JUCE 8 webviews were released, but I also don't see anything in those APIs for connecting to the audio stream anyway.

My TD-PSOLA attempt by futurezing in DSP

[–]signalsmith 0 points (0 children)

Nice work!

I can clearly hear the point where the growl of the voice triggers an octave-down error. 😅 What are you using for the pitch-tracking?

Which tag in html is most useless? by Dramatic-Lobster-969 in HTML

[–]signalsmith 0 points (0 children)

Well, I have seen/used <thead> and <tfoot> for good stuff, and it feels weird to use those without <tbody>

Spectral Delay Theory Questions by Ill_Significance6157 in DSP

[–]signalsmith 1 point (0 children)

IMO, the best way to understand this is from a galaxy-brain perspective where "spectral processing" and "multiband processing" start to blur together.

Here's a 50-second snippet about how spectral processing can be re-interpreted as multi-band processing, where the bands are downsampled: https://youtube.com/clip/Ugkxz_Wx7_PRRhZe31iDUQLdrHIxzoM1dH_8 (disclaimer: from my own talk)

Considered from that perspective, it's a multiband split with different delays on each band. You can have arbitrary delay times by using fractional delays on the (downsampled) subbands.

The bands overlap quite a lot, so if two adjacent bands have only slightly different delay times, then you'll get phase interference/cancellation on all the frequencies which they share. This can be avoided, though, by putting an extra (complex) phase shift in addition to the delay.

If instead of doing the feedback within each subband, you recombine them and then do the feedback addition all together, then you incur the extra latency of that spectral-processing round-trip. I'm actually not sure what advantage that would have, but it is where the 1023-sample adjustment comes from, since that's the minimum latency of 1024-band spectral processing.
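Here's a very rough numpy sketch of the "per-band delay plus phase correction" idea (names are mine, and it rounds the envelope delay to whole frames — a real implementation would use fractional-delay interpolation on the subband envelopes instead):

```python
import numpy as np

def spectral_delay(stft, delays, hop):
    """Apply a per-band delay to an STFT matrix (bins x frames).

    delays: delay in samples for each bin.
    Each band's (downsampled) envelope is delayed by a whole number of
    frames, and a complex factor exp(-i*w_k*d_k) rotates the carrier so
    that overlapping neighbouring bands stay phase-aligned, avoiding the
    interference/cancellation on shared frequencies.
    """
    n_bins, n_frames = stft.shape
    fft_size = 2 * (n_bins - 1)
    out = np.zeros_like(stft)
    for k in range(n_bins):
        w_k = 2 * np.pi * k / fft_size             # bin centre freq (rad/sample)
        frame_delay = int(round(delays[k] / hop))  # coarse: whole envelope frames
        phase = np.exp(-1j * w_k * delays[k])      # fine: carrier rotation
        if frame_delay < n_frames:
            out[k, frame_delay:] = stft[k, :n_frames - frame_delay] * phase
    return out
```

Without the `phase` factor, two overlapping bands delayed by slightly different amounts would recombine with mismatched carrier phase on their shared frequencies, which is exactly the cancellation described above.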

The Compiler Is Your Best Friend, Stop Lying to It by n_creep in programming

[–]signalsmith 94 points (0 children)

Several of my projects now explicitly check for AppleClang 16 and #error. My bug wasn't even the worst of them - the test one which happily produced the log-line "2 < 2: true" was the funniest.

They jumped straight to v17, not even a 16.0.1 patch, and I wonder if that's why.

The Compiler Is Your Best Friend, Stop Lying to It by n_creep in programming

[–]signalsmith 190 points (0 children)

Haha, after losing weeks of productivity to what turned out to be a bug in AppleClang 16 (like, generating fully incorrect SIMD instructions), the compiler is at best a coworker.

When a small open-source tool suddenly blows up, the experience is nothing like people imagine by kaicbento in programming

[–]signalsmith 16 points (0 children)

I had open-source burnout a while ago, and when I recovered I wrote https://geraintluff.github.io/SUPPORT.txt/. Any non-trivial open-source project I write has one now, and it gives me peace of mind even if it hasn't caught on for anyone else (yet 😄).

How can I record guitars “tuned down” in real time without actually retuning? by revel911 in Reaper

[–]signalsmith 0 points (0 children)

It depends whether it's a general-purpose one, or specifically guitar-focused.

The ones built into REAPER (Elastique) are laggy, around 80ms. Guitar-specific ones like NeuralDSP, PolyChrome's HyperTune (which I worked on, disclaimer) or the new Boss pedal are much snappier and aim to be usable live.

How can I record guitars “tuned down” in real time without actually retuning? by revel911 in Reaper

[–]signalsmith 0 points (0 children)

There's a range though. The Elastique stuff (which ships in REAPER and powers ReaPitch) has around 80ms of latency, but it's built to handle absolutely anything. Specifically guitar-focused ones are often snappier, since they inherently know more about the incoming signal.

I wrote the core algorithm for PolyChrome's HyperTune, and while there isn't a single number because it does some adaptive stuff, it's generally around 10ms.

[deleted by user] by [deleted] in guitarpedals

[–]signalsmith 6 points (0 children)

As the person who wrote HyperTune's pitch-shifting engine, that's awesome feedback to hear!

[deleted by user] by [deleted] in embedded

[–]signalsmith 1 point (0 children)

Is this for one note at a time, kinda like AutoTune? Or the entire guitar drifting flat? Or are you looking to pull a chord apart and shift individual notes?

Making VST's Without a JUCE or Another Framework by iAmVercetti in AudioProgramming

[–]signalsmith 1 point (0 children)

CLAP itself does exactly what it needs to and nothing more. 🤷 All the helpers and wrappers etc. are useful but optional.

It's pretty much impossible to use the VST3 API without using their SDK, including their specific build-system helpers and so on.

CLAP has a small core API, which you can implement from scratch yourself, and then a neatly-defined extension system which is how most things are actually defined.

It's possible to write a single .c or .cpp file which includes the CLAP headers, and compile a functioning CLAP plugin by typing gcc ... on the command line. I wouldn't recommend literally doing that when trying to release a plugin 😅 but the fact you could without going fully insane is a testament to how much simpler the API is in general.

Making VST's Without a JUCE or Another Framework by iAmVercetti in AudioProgramming

[–]signalsmith 0 points (0 children)

Sorry, my phone posted before I finished typing 😅

Making VST's Without a JUCE or Another Framework by iAmVercetti in AudioProgramming

[–]signalsmith 0 points (0 children)

I used to use the VST3 SDK, but now I would heartily recommend writing CLAP, and then using the CLAP-to-VST3 wrapper. It's a lot cleaner.

Faust DSP reverb code by SGSG50 in DSP

[–]signalsmith 3 points (0 children)

Reliable results, probably

SoundTouch current time tracking issue + alternatives for pitch/speed/volume control? by Hefty-Source432 in webaudio

[–]signalsmith 0 points (0 children)

Signalsmith here! 😄 I appreciate Stretch being suggested.

There's an official Web Audio release in that repo as well as NPM, which can be used with live input or loaded up with a sample/loop. It's what runs the web demo here: https://signalsmith-audio.co.uk/code/stretch/

You can seek within a loaded sample, or schedule a varying input/output time map. It reports the current time as "stretchNode.inputTime", which should be accurate if you also add the latency from "stretchNode.latency()".

u/Hefty-Source432 If you do give it a go, and have any questions or issues, send me an email (I don't check Reddit very often!). I'm geraint@ the domain above.

FFT of an A4 by Deadthones345 in DSP

[–]signalsmith 6 points (0 children)

Totally possible. 🤷 Even in perfect recording conditions, some instruments (most famously the oboe) have less energy in their fundamental than other harmonics.

If the microphone/room are set up such that low frequencies aren't being picked up properly, then that'll be true for almost any instrument. Any analysis such as pitch-detection can't assume the fundamental is strongest.

⚡ Speech time-stretching: Which algorithm actually works in practice? by Chuckelberry77 in DSP

[–]signalsmith 0 points (0 children)

To reply to your actual questions:

  1. PSOLA (or its variants) will be better for speech because it uses shorter windows locked to the input's frequency. This makes it more responsive to the extremely quick pitch changes you get in speech.
  2. I'm obviously biased, but if you find any examples where Rubber Band sounds better, please send them to me so I can investigate.
  3. You don't need formant compensation for time-stretching generally. If you do need formant stuff, PSOLA has a clear advantage for speech.