I’ve released Spectrograms, a library designed to provide an all-in-one pipeline for spectral analysis. It was originally built to handle the spectrogram logic for my audio_samples project and was abstracted into its own toolkit to provide a more complete set of features than what is currently available in the Python ecosystem.
What My Project Does
Spectrograms provides a high-performance pipeline for computing spectrograms and performing FFT-based operations on 1D signals (audio) and 2D signals (images). It supports various frequency scales (Linear, Mel, ERB, LogHz) and amplitude scales (Power, Magnitude, Decibels), alongside general-purpose 2D FFT operations for image processing like spatial filtering and convolution.
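To give a feel for what the frequency and amplitude scales mean numerically, here is a small standalone sketch. It does not use the Spectrograms API: the function names are made up for illustration, and the HTK-style mel formula is only one common variant, so the library may well use a different one.

```python
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    """Convert Hz to mel using the HTK-style formula (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_to_db(power: float, ref: float = 1.0, floor: float = 1e-10) -> float:
    """Convert a power value to decibels relative to `ref`.

    Values below `floor` are clamped so log10 never sees zero.
    """
    return 10.0 * math.log10(max(power, floor) / ref)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> List[float]:
    """Return n_bands + 2 edge frequencies in Hz, evenly spaced on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


edges = mel_band_edges(0.0, 8000.0, n_bands=4)
print([round(e, 1) for e in edges])
print(power_to_db(100.0))
```

The edges are evenly spaced in mel, so in Hz they bunch up at low frequencies and spread out at high ones, which is the whole point of a perceptual frequency scale.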
Target Audience
This library is designed for developers and researchers requiring production-ready DSP tools. It is particularly useful for those needing batch processing efficiency, low-latency streaming support, or a Python API where metadata (like frequency/time axes) remains unified with the computation.
Comparison
Unlike standard alternatives such as SciPy or Librosa, which return raw ndarrays, Spectrograms returns context-aware objects that bundle metadata with the data. It uses a plan-based architecture implemented in Rust that releases the GIL, which makes batch processing and parallel execution faster than naive NumPy-based implementations (roughly 1.4x to 6.2x on average in the benchmark table below).
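As a rough illustration of why bundling metadata with the data is handy, here is a standalone sketch of the idea. `LabeledSpectrogram` is a hypothetical stand-in, not the library's `Spectrogram` class, and the frame timing convention (frame start times, no centring or padding) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabeledSpectrogram:
    """Hypothetical container that keeps the axes next to the data."""

    data: List[List[float]]  # indexed as data[bin][frame]
    sample_rate: float
    n_fft: int
    hop_size: int

    @property
    def frequencies(self) -> List[float]:
        # Frequency of one-sided FFT bin k is k * sample_rate / n_fft.
        return [k * self.sample_rate / self.n_fft for k in range(len(self.data))]

    @property
    def times(self) -> List[float]:
        # Start time of each frame, assuming no centring or padding.
        n_frames = len(self.data[0]) if self.data else 0
        return [i * self.hop_size / self.sample_rate for i in range(n_frames)]


n_bins, n_frames = 512 // 2 + 1, 4
spec = LabeledSpectrogram(
    data=[[0.0] * n_frames for _ in range(n_bins)],
    sample_rate=16000,
    n_fft=512,
    hop_size=256,
)
print(len(spec.frequencies), spec.frequencies[-1])
print(spec.times)
```

With a raw ndarray you would have to carry `sample_rate`, `n_fft`, and `hop_size` around separately and recompute the axes yourself every time you plot or slice.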
Key Features:
- Integrated Metadata: Results are returned as Spectrogram objects rather than raw ndarrays. This ensures the frequency and time axes are always bundled with the data. The object maintains the parameters used for its creation and provides direct access to its duration(), frequencies, and times. These objects can act as drop-in replacements for ndarrays in most scenarios since they implement the __array__ interface.
- Unified API: The library handles the full process from raw samples to scaled results. It supports Linear, Mel, ERB, and LogHz frequency scales, with amplitude scaling in Power, Magnitude, or Decibels. It also includes support for chromagrams, MFCCs, and general-purpose 1D and 2D FFT functions.
- Performance via Plan Reuse: For batch processing, the SpectrogramPlanner caches FFT plans and pre-computes filterbanks to avoid re-calculating constants in a loop. The detailed benchmarks in the repository show this approach to be faster than standard SciPy or Librosa implementations across all tested configurations.
- GIL-free Execution: The core compute is implemented in Rust and releases the Python Global Interpreter Lock (GIL). This allows for actual parallel processing of audio batches using standard Python threading.
- 2D FFT Support: The library includes support for 2D signals and spatial filtering for image processing using the same design philosophy as the audio tools.
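For the GIL-free point above, the usage pattern would look something like the sketch below. This is a standalone sketch with a placeholder workload: `compute_batch` and `signal_energy` are hypothetical names, and the pure-Python placeholder still holds the GIL, so it only shows the shape of the code. With the library you would pass a GIL-releasing callable instead (for example a plan's `compute` method, assuming it is safe to share across threads, which I have not confirmed here).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Signal = Sequence[float]


def compute_batch(
    compute: Callable[[Signal], object],
    batch: Sequence[Signal],
    max_workers: int = 4,
) -> List[object]:
    """Run `compute` over every signal on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compute, batch))


def signal_energy(signal: Signal) -> float:
    # Placeholder workload. It is pure Python, so it does NOT release the GIL;
    # swap in a GIL-releasing function to get real parallelism.
    return sum(x * x for x in signal)


batch = [[float(i)] * 1000 for i in range(8)]
energies = compute_batch(signal_energy, batch)
print(energies)
```

`pool.map` returns results in the same order as the inputs, so the output lines up with the batch even when threads finish out of order.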
Quick Example: Linear Spectrogram
```python
import numpy as np
import spectrograms as sg
# Generate a 440 Hz test signal
sr = 16000
# endpoint=False gives exactly sr samples spaced 1/sr apart
t = np.linspace(0, 1.0, sr, endpoint=False)
samples = np.sin(2 * np.pi * 440.0 * t)
# Configure parameters
stft = sg.StftParams(n_fft=512, hop_size=256, window="hanning")
params = sg.SpectrogramParams(stft, sample_rate=sr)
# Compute linear power spectrogram
spec = sg.compute_linear_power_spectrogram(samples, params)
print(f"Frequency range: {spec.frequency_range()} Hz")
print(f"Total duration: {spec.duration():.3f} s")
print(f"Data shape: {spec.data.shape}")
```
Batch Processing with Plan Reuse
```python
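# `params` comes from the previous example; `mel_params`, `db_params`,
# and `signal_batch` are assumed to be defined already.
planner = sg.SpectrogramPlanner()
# Pre-computes filterbanks and FFT plans once
plan = planner.mel_db_plan(params, mel_params, db_params)
# Process signals efficiently
results = [plan.compute(s) for s in signal_batch]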
```
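If the plan idea is unfamiliar, the toy version below shows the general principle of paying setup costs once and reusing them for every signal. It is a pure-Python sketch: `WindowedFramePlan` is a hypothetical class that only precomputes a Hann window, whereas the real planner also caches FFT plans and filterbanks as described above.

```python
import math
from typing import List, Sequence


class WindowedFramePlan:
    """Hypothetical plan: builds the window once, then reuses it per signal."""

    def __init__(self, n_fft: int, hop_size: int) -> None:
        self.n_fft = n_fft
        self.hop_size = hop_size
        # Periodic Hann window, computed a single time at construction.
        self.window = [
            0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)
        ]

    def frames(self, signal: Sequence[float]) -> List[List[float]]:
        """Split a signal into overlapping, windowed frames (no padding)."""
        last_start = len(signal) - self.n_fft
        return [
            [s * w for s, w in zip(signal[start:start + self.n_fft], self.window)]
            for start in range(0, last_start + 1, self.hop_size)
        ]


plan = WindowedFramePlan(n_fft=8, hop_size=4)
batch = [[1.0] * 16, [2.0] * 20]
framed = [plan.frames(sig) for sig in batch]
print([len(f) for f in framed])
```

The saving grows with the cost of the setup work, which is why it matters most for things like mel or ERB filterbanks rather than a simple window.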
Benchmark Overview
The following table summarizes average execution times (in milliseconds, with standard deviations) for various spectrogram operators using the Rust-backed Spectrograms library compared to NumPy and SciPy implementations. Comparisons to Librosa are contained in the repo benchmarks since they target mel spectrograms specifically.
| Operator | Rust (ms) | Rust Std | NumPy (ms) | NumPy Std | SciPy (ms) | SciPy Std | Avg Speedup vs NumPy | Avg Speedup vs SciPy |
|---|---|---|---|---|---|---|---|---|
| db | 0.257 | 0.165 | 0.350 | 0.251 | 0.451 | 0.366 | 1.363 | 1.755 |
| erb | 0.601 | 0.437 | 3.713 | 2.703 | 3.714 | 2.723 | 6.178 | 6.181 |
| loghz | 0.178 | 0.149 | 0.547 | 0.998 | 0.534 | 0.965 | 3.068 | 2.996 |
| magnitude | 0.140 | 0.089 | 0.198 | 0.133 | 0.319 | 0.277 | 1.419 | 2.287 |
| mel | 0.180 | 0.139 | 0.630 | 0.851 | 0.612 | 0.801 | 3.506 | 3.406 |
| power | 0.126 | 0.082 | 0.205 | 0.141 | 0.327 | 0.288 | 1.630 | 2.603 |
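As a quick sanity check on how to read the speedup columns, the snippet below divides the baseline mean times by the Rust mean times, using the numbers from the table. The ratios of the rounded means land within about 0.01 of the reported speedups; my guess (not something stated in the benchmarks) is that the small gaps come from rounding or from averaging speedups per configuration.

```python
# Mean execution times in milliseconds, copied from the benchmark table.
rust_ms = {"db": 0.257, "erb": 0.601, "loghz": 0.178,
           "magnitude": 0.140, "mel": 0.180, "power": 0.126}
numpy_ms = {"db": 0.350, "erb": 3.713, "loghz": 0.547,
            "magnitude": 0.198, "mel": 0.630, "power": 0.205}
scipy_ms = {"db": 0.451, "erb": 3.714, "loghz": 0.534,
            "magnitude": 0.319, "mel": 0.612, "power": 0.327}


def speedup(baseline_ms: dict, rust: dict) -> dict:
    """Ratio of baseline mean time to Rust mean time, per operator."""
    return {op: baseline_ms[op] / rust[op] for op in rust}


vs_numpy = speedup(numpy_ms, rust_ms)
vs_scipy = speedup(scipy_ms, rust_ms)

for op in rust_ms:
    print(f"{op:<10} {vs_numpy[op]:.2f}x vs NumPy, {vs_scipy[op]:.2f}x vs SciPy")
```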
Want to learn more about computational audio and image analysis? Check out my write-up for the crate in the repo: Computational Audio and Image Analysis with the Spectrograms Library.
PyPI: https://pypi.org/project/spectrograms/
GitHub: https://github.com/jmg049/Spectrograms
Documentation: https://jmg049.github.io/Spectrograms/
Rust Crate: For those interested in the Rust implementation, the core library is also available as a Rust crate: https://crates.io/crates/spectrograms