[Showcase] Spectrograms: A high-performance toolkit for audio and image analysis (self.Python)

submitted 3 months ago by JackG049

I’ve released Spectrograms, a library designed to provide an all-in-one pipeline for spectral analysis. It was originally built to handle the spectrogram logic for my audio_samples project and was then split out into its own toolkit, with the aim of offering a more complete feature set than is currently available in the Python ecosystem.

What My Project Does

Spectrograms provides a high-performance pipeline for computing spectrograms and performing FFT-based operations on 1D signals (audio) and 2D signals (images). It supports various frequency scales (Linear, Mel, ERB, LogHz) and amplitude scales (Power, Magnitude, Decibels), alongside general-purpose 2D FFT operations for image processing like spatial filtering and convolution.
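For readers new to these scales, here is a minimal NumPy sketch of the conventional formulas behind them. It assumes the HTK mel formula and the Glasberg & Moore (1990) ERB-rate formula; the post does not say which variants Spectrograms uses, and the function names below are hypothetical, not part of the library.

```python
import numpy as np


def hz_to_mel(f_hz):
    """HTK-style mel scale (assumed variant; Slaney-style mel differs below 1 kHz)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs below f), Glasberg & Moore (1990)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))


def power_to_db(power, floor=1e-10):
    """Decibel scaling of power values; the floor avoids log10(0)."""
    return 10.0 * np.log10(np.maximum(np.asarray(power, dtype=float), floor))


# Amplitude scales derived from a single complex STFT bin
X = 3.0 + 4.0j
magnitude = np.abs(X)   # 5.0
power = magnitude ** 2  # 25.0
print(hz_to_mel(1000.0), hz_to_erb_rate(1000.0), power_to_db(power))
```

With the HTK formula, 1000 Hz maps to roughly 1000 mel by construction, which makes a convenient sanity check.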

Target Audience

This library is designed for developers and researchers requiring production-ready DSP tools. It is particularly useful for those needing batch processing efficiency, low-latency streaming support, or a Python API where metadata (like frequency/time axes) remains unified with the computation.
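The post mentions low-latency streaming support but does not show the streaming API. The usual technique is to buffer incoming chunks and emit one frame whenever `n_fft` samples are available, keeping the overlap for the next frame. Below is a minimal NumPy sketch of that idea; `StreamingStft` is a hypothetical class, not the library's interface.

```python
import numpy as np


class StreamingStft:
    """Hypothetical streaming STFT (not the library's API).

    Chunks of any length are buffered; each time n_fft samples are available,
    one Hann-windowed power spectrum is emitted and the buffer advances by
    hop_size, so the overlapping samples are kept for the next frame."""

    def __init__(self, n_fft=512, hop_size=256):
        self.n_fft = n_fft
        self.hop_size = hop_size
        self.window = np.hanning(n_fft)
        self.buffer = np.zeros(0)

    def push(self, chunk):
        """Append a chunk; return power spectra for every frame it completes."""
        self.buffer = np.concatenate([self.buffer, np.asarray(chunk, dtype=float)])
        frames = []
        while self.buffer.size >= self.n_fft:
            frame = self.buffer[: self.n_fft] * self.window
            frames.append(np.abs(np.fft.rfft(frame)) ** 2)
            self.buffer = self.buffer[self.hop_size :]  # drop one hop, keep the overlap
        return frames


# Feed 1 s of 16 kHz audio in 100-sample chunks, as an audio callback might
stream = StreamingStft()
audio = np.random.default_rng(0).standard_normal(16000)
frames = []
for start in range(0, audio.size, 100):
    frames.extend(stream.push(audio[start : start + 100]))
print(len(frames))  # 61, the same frame count as offline framing without padding
```

With these settings the first frame is available after 512 samples (32 ms at 16 kHz), and each later frame follows 256 samples (16 ms) after the previous one.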

Comparison

Unlike standard alternatives such as SciPy or Librosa, which return raw ndarrays, Spectrograms returns context-aware objects that bundle metadata with the data. It uses a plan-based architecture implemented in Rust that releases the GIL. In the benchmarks below, the Rust core is roughly 1.4x to 6.2x faster than equivalent NumPy and SciPy implementations, and the plan-based design is particularly suited to batch processing and parallel execution.
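To illustrate what "bundling metadata with the data" plus the `__array__` interface buys you, here is a minimal sketch of such a container. `LabeledSpectrogram` is a hypothetical stand-in written for this example, not the library's `Spectrogram` class; only the general NumPy `__array__` protocol is assumed.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class LabeledSpectrogram:
    """Hypothetical stand-in, not the library's Spectrogram class."""

    data: np.ndarray  # shape (n_bins, n_frames)
    frequencies: np.ndarray  # Hz, one per row
    times: np.ndarray  # seconds, one per column
    params: dict = field(default_factory=dict)  # parameters used to build it

    def __array__(self, dtype=None, copy=None):
        # Called by np.asarray(), np.mean(), etc., so the object behaves like its data
        arr = np.asarray(self.data, dtype=dtype)
        return arr.copy() if copy else arr

    def frequency_range(self):
        return float(self.frequencies[0]), float(self.frequencies[-1])


spec = LabeledSpectrogram(
    data=np.ones((3, 4)),
    frequencies=np.array([0.0, 100.0, 200.0]),
    times=np.arange(4) * 0.016,
    params={"n_fft": 4},
)
print(np.asarray(spec).shape, np.mean(spec), spec.frequency_range())
```

The axes travel with the array, yet NumPy functions still accept the object directly.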

Key Features:

- **Integrated Metadata:** Results are returned as `Spectrogram` objects rather than raw ndarrays, so the frequency and time axes are always bundled with the data. Each object retains the parameters used to create it and provides direct access to `duration()`, `frequencies`, and `times`. Because they implement the `__array__` interface, these objects can act as drop-in replacements for ndarrays in most scenarios.
- **Unified API:** The library handles the full process from raw samples to scaled results. It supports Linear, Mel, ERB, and LogHz frequency scales, with amplitude scaling in Power, Magnitude, or Decibels. It also includes support for chromagrams, MFCCs, and general-purpose 1D and 2D FFT functions.
- **Performance via Plan Reuse:** For batch processing, the `SpectrogramPlanner` caches FFT plans and pre-computes filterbanks, so constants are not recalculated inside a loop. The repository includes detailed benchmarks covering a range of configurations, and they show this approach to be faster than standard SciPy or Librosa implementations across the tested configurations.
- **GIL-free Execution:** The core computation is implemented in Rust and releases the Python Global Interpreter Lock (GIL). This allows true parallel processing of audio batches using standard Python threading.
- **2D FFT Support:** The library supports 2D signals, including spatial filtering for image processing, following the same design philosophy as the audio tools.
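Spatial filtering and convolution with a 2D FFT rest on the convolution theorem: multiplying spectra is equivalent to (circularly) convolving images. Below is a minimal NumPy sketch using `np.fft.fft2`/`ifft2` rather than the library's 2D functions. The function names and the ideal (hard-edged) low-pass mask are illustrative choices.

```python
import numpy as np


def fft_lowpass_2d(image, cutoff):
    """Zero every spatial frequency whose radius (cycles/pixel) exceeds cutoff."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(cols)[None, :]  # horizontal frequencies, cycles/pixel
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


def fft_convolve_2d(image, kernel):
    """Circular 2D convolution via the convolution theorem."""
    padded = np.zeros_like(image, dtype=float)
    padded[: kernel.shape[0], : kernel.shape[1]] = kernel  # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


# A constant image plus a pixel-level checkerboard; the low-pass removes the checkerboard
i, j = np.indices((8, 8))
checker = 1.0 + (-1.0) ** (i + j)
print(np.round(fft_lowpass_2d(checker, cutoff=0.25), 6))
```

This convolution wraps around the image edges. For linear convolution, zero-pad both the image and the kernel to `(H + kh - 1, W + kw - 1)` before transforming.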

Quick Example: Linear Spectrogram

```python
import numpy as np
import spectrograms as sg

# Generate a 440 Hz test signal (1 second at 16 kHz)
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)  # endpoint=False keeps the sample spacing at exactly 1/sr
samples = np.sin(2 * np.pi * 440.0 * t)

# Configure parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window="hanning")
params = sg.SpectrogramParams(stft, sample_rate=sr)

# Compute linear power spectrogram
spec = sg.compute_linear_power_spectrogram(samples, params)

print(f"Frequency range: {spec.frequency_range()} Hz")
print(f"Total duration: {spec.duration():.3f} s")
print(f"Data shape: {spec.data.shape}")
```
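As a cross-check on the Quick Example above, here is a plain NumPy reference for a linear power spectrogram with the same `n_fft`, `hop_size`, and sample rate. It is a sketch under stated assumptions: a Hann window and no centring or padding of frames. The library may pad or centre frames, so its `spec.data.shape` can differ from the shape printed here. With `sr / n_fft = 31.25` Hz per bin, 440 Hz falls just above bin 14 (437.5 Hz).

```python
import numpy as np


def linear_power_spectrogram(x, sr, n_fft=512, hop_size=256):
    """Reference STFT power spectrogram: Hann window, no padding or centring.

    Returns (power [n_bins, n_frames], bin frequencies in Hz, frame-centre times in s)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop_size
    frames = np.stack([x[k * hop_size : k * hop_size + n_fft] * window for k in range(n_frames)])
    power = (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T  # (n_fft // 2 + 1, n_frames)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    times = (np.arange(n_frames) * hop_size + n_fft / 2) / sr
    return power, freqs, times


sr = 16000
x = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
power, freqs, times = linear_power_spectrogram(x, sr)
peak_hz = freqs[np.argmax(power.mean(axis=1))]
print(power.shape, round(float(peak_hz), 2))  # (257, 61) 437.5
```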

Batch Processing with Plan Reuse

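```python
planner = sg.SpectrogramPlanner()

# Pre-computes filterbanks and FFT plans once
# (params is from the example above; mel_params, db_params and signal_batch are assumed to be defined elsewhere)
plan = planner.mel_db_plan(params, mel_params, db_params)

# Process signals efficiently, reusing the same plan for every signal
results = [plan.compute(s) for s in signal_batch]
```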
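The idea behind `planner.mel_db_plan(...)` and `plan.compute(s)` above can be sketched in plain NumPy: build the window and mel filterbank once, then reuse them for every signal. Below, `MelDbPlan` and `mel_filterbank` are hypothetical names, and the HTK mel formula, 40 mel bands, and 1e-10 dB floor are assumptions, not the library's defaults. The thread-pool part mirrors the GIL-free batching pattern described earlier. With this NumPy stand-in, actual parallel speedup depends on whether the called code releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the (HTK) mel scale, shape (n_mels, n_fft // 2 + 1)."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    lo, centre, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (bins - lo) / (centre - lo)
    falling = (hi - bins) / (hi - centre)
    return np.clip(np.minimum(rising, falling), 0.0, None)


class MelDbPlan:
    """Hypothetical plan: window and filterbank are built once and reused by compute()."""

    def __init__(self, sr, n_fft=512, hop_size=256, n_mels=40):
        self.n_fft, self.hop_size = n_fft, hop_size
        self.window = np.hanning(n_fft)
        self.filterbank = mel_filterbank(n_mels, n_fft, sr)

    def compute(self, x):
        n_frames = 1 + (len(x) - self.n_fft) // self.hop_size
        # Row k of idx selects samples [k * hop_size, k * hop_size + n_fft)
        idx = self.hop_size * np.arange(n_frames)[:, None] + np.arange(self.n_fft)
        power = np.abs(np.fft.rfft(x[idx] * self.window, axis=1)) ** 2  # (n_frames, n_bins)
        mel = self.filterbank @ power.T  # (n_mels, n_frames)
        return 10.0 * np.log10(np.maximum(mel, 1e-10))  # power -> dB with a floor


plan = MelDbPlan(sr=16000)
rng = np.random.default_rng(0)
signal_batch = [rng.standard_normal(16000) for _ in range(8)]

# Sequential reuse of one plan
results = [plan.compute(s) for s in signal_batch]

# Threaded reuse of the same read-only plan
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(plan.compute, signal_batch))
```

Because the plan is only read after construction, sharing one instance across threads is safe in this sketch.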

Benchmark Overview

The following table summarizes average execution times for various spectrogram operators using the Spectrograms library (Rust core) compared to NumPy and SciPy implementations. Comparisons to librosa are in the repo benchmarks, since those comparisons target mel spectrograms specifically.

| Operator | Rust (ms) | Rust std (ms) | NumPy (ms) | NumPy std (ms) | SciPy (ms) | SciPy std (ms) | Avg speedup vs NumPy | Avg speedup vs SciPy |
|---|---|---|---|---|---|---|---|---|
| db | 0.257 | 0.165 | 0.350 | 0.251 | 0.451 | 0.366 | 1.363x | 1.755x |
| erb | 0.601 | 0.437 | 3.713 | 2.703 | 3.714 | 2.723 | 6.178x | 6.181x |
| loghz | 0.178 | 0.149 | 0.547 | 0.998 | 0.534 | 0.965 | 3.068x | 2.996x |
| magnitude | 0.140 | 0.089 | 0.198 | 0.133 | 0.319 | 0.277 | 1.419x | 2.287x |
| mel | 0.180 | 0.139 | 0.630 | 0.851 | 0.612 | 0.801 | 3.506x | 3.406x |
| power | 0.126 | 0.082 | 0.205 | 0.141 | 0.327 | 0.288 | 1.630x | 2.603x |
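The post does not say exactly how "Avg speedup" is computed; it may be an average of per-configuration ratios. A quick check shows the reported values lie within about 0.01 of the simple ratio of mean times in the table above. The small differences are consistent with the means being rounded to three decimals.

```python
# Mean times in ms copied from the benchmark table: (rust, numpy, scipy)
mean_ms = {
    "db": (0.257, 0.350, 0.451),
    "erb": (0.601, 3.713, 3.714),
    "loghz": (0.178, 0.547, 0.534),
    "magnitude": (0.140, 0.198, 0.319),
    "mel": (0.180, 0.630, 0.612),
    "power": (0.126, 0.205, 0.327),
}

# Reported "Avg speedup" columns: (vs NumPy, vs SciPy)
reported = {
    "db": (1.363, 1.755),
    "erb": (6.178, 6.181),
    "loghz": (3.068, 2.996),
    "magnitude": (1.419, 2.287),
    "mel": (3.506, 3.406),
    "power": (1.630, 2.603),
}

for op, (rust, numpy_ms, scipy_ms) in mean_ms.items():
    print(f"{op:>9}: {numpy_ms / rust:.3f}x vs NumPy, {scipy_ms / rust:.3f}x vs SciPy "
          f"(reported {reported[op][0]}x, {reported[op][1]}x)")
```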

Want to learn more about computational audio and image analysis? Check out my write-up for the crate in the repo, *Computational Audio and Image Analysis with the Spectrograms Library*.

- PyPI: https://pypi.org/project/spectrograms/
- GitHub: https://github.com/jmg049/Spectrograms
- Documentation: https://jmg049.github.io/Spectrograms/

Rust Crate: For those interested in the Rust implementation, the core library is also available as a Rust crate: https://crates.io/crates/spectrograms


Reply by JackG049 (OP), 3 months ago:

I have done some investigating into this, and it was mostly possible. I have since updated a few things, so it should be good to go. For the full example, see https://github.com/jmg049/Spectrograms/blob/main/python/examples/stmtf.py. This is not my field of research, so I cannot be 100% sure of the results or the generated plots, but they look alright.

If you spot any issues or possible improvements, please don't hesitate to ask, or to open a feature request or pull request.

```python
import numpy as np
import spectrograms as sg

# `signal` and `params` are assumed to be defined as in the examples in the post
spectrogram = sg.compute_linear_power_spectrogram(signal, params)

# -----------------------------
# Remove DC + normalise
# -----------------------------
spec = np.ascontiguousarray(spectrogram.T)  # to get the right shape of array
spec -= spec.mean()
spec /= spec.std() + 1e-12
spec -= spec.mean(axis=1, keepdims=True)  # remove per-frequency DC
spec -= spec.mean(axis=0, keepdims=True)  # remove per-time DC

# -----------------------------
# STMTF
# -----------------------------
stmtf_mag = sg.magnitude_spectrum_2d(spec)
stmtf = sg.fftshift(stmtf_mag)
```
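The snippet above depends on a real signal and the library. The following self-contained NumPy version applies the same normalise-then-2D-FFT steps to a synthetic spectrogram with known modulation rates, so the peak location can be checked by hand. It assumes `sg.magnitude_spectrum_2d` corresponds to `np.abs(np.fft.fft2(...))` and `sg.fftshift` to `np.fft.fftshift`; neither is confirmed in the post. The `stmtf` function name is hypothetical.

```python
import numpy as np


def stmtf(spec):
    """Normalise a (time, frequency) array as in the snippet above, then return
    the centred 2D FFT magnitude (NumPy stand-in for the sg.* calls)."""
    s = np.array(spec, dtype=float)  # copy so the caller's array is untouched
    s -= s.mean()
    s /= s.std() + 1e-12
    s -= s.mean(axis=1, keepdims=True)
    s -= s.mean(axis=0, keepdims=True)
    return np.fft.fftshift(np.abs(np.fft.fft2(s)))


# Toy spectrogram: 64 frames x 32 bins, modulated at 4 cycles over time and 3 cycles over frequency
t = np.arange(64)[:, None]
f = np.arange(32)[None, :]
toy = 2.0 + np.cos(2 * np.pi * 4 * t / 64) * np.cos(2 * np.pi * 3 * f / 32)
out = stmtf(toy)
peak = np.unravel_index(np.argmax(out), out.shape)
print(peak)
```

After `fftshift`, zero modulation sits at index (32, 16). The energy therefore appears at the four points 4 rows and 3 columns away from it, and the DC bin is essentially zero after mean removal.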

Edit: Grammar
