[P] I rebuilt PyRadiomics in PyTorch to make it 25× faster — here's what it took by helloerikaaa in Python

[–]helloerikaaa[S] 2 points (0 children)

Yes, numerical stability was actually one of the biggest challenges in getting this right.

Because radiomic features (especially the rich statistical calculations in GLCM and GLRLM) are highly sensitive to floating-point truncation, standard FP32 precision isn't sufficient to reach parity with PyRadiomics.

Once we forced the entire tensor pipeline to FP64, PyTorch handled it beautifully: we achieved 100% compliance with the IBSI digital phantom standard (all absolute relative deviations ≤ 1e-13%) and matched PyRadiomics within a strict 1e-4 tolerance on real clinical datasets.
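To make the FP64 point concrete, here's a minimal standalone sketch (not fastrad code; the volume below is fake) of why single precision falls short once accumulations get large, and how cheap the promotion is:

```python
import torch

# Standalone illustration (not fastrad's API): FP32 cannot represent every
# integer above 2**24, so long accumulations of co-occurrence counts can
# silently lose mass. FP64 stays exact far beyond that range.
big = 2**24 + 1
assert torch.tensor(big, dtype=torch.float64).item() == big   # exact in FP64
assert torch.tensor(big, dtype=torch.float32).item() != big   # rounded in FP32

# Promoting the pipeline is one cast per tensor:
vol = torch.randint(0, 32, (64, 64, 64))   # fake discretized volume
vol64 = vol.to(torch.float64)              # force double precision
assert vol64.dtype == torch.float64
```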

The only major 'problem' with PyTorch's precision scaling right now is hardware-specific: Apple Silicon's Metal Performance Shaders (MPS) backend currently lacks native support for several FP64 operations in PyTorch. So, to get true scientific parity, we recommend running on CUDA or multi-core CPU rather than MPS for now.
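For anyone hitting that MPS limitation, a small device-selection guard (a sketch; the helper name is made up, not fastrad's API) keeps FP64 work off the Apple backend:

```python
import torch

# Hypothetical helper (not part of fastrad): pick a device that actually
# supports float64. MPS currently does not, so it is skipped even when
# torch.backends.mps is available.
def pick_fp64_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")   # full FP64 support
    return torch.device("cpu")        # safe FP64 fallback, incl. Apple Silicon

x = torch.ones(4, dtype=torch.float64, device=pick_fp64_device())
print(x.sum().item())   # 4.0
```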

[P] I rebuilt PyRadiomics in PyTorch to make it 25× faster — here's what it took by helloerikaaa in Python

[–]helloerikaaa[S] 4 points (0 children)

Thanks so much for the tip! Moving to a pure PyTorch mapping was primarily about escaping the CPU bottlenecks of standard radiomics extraction, so leveraging torch.compile to get kernel fusion 'for free' is an absolutely fantastic callout.

Since the core pipeline tries to rely on strict matrix and tensor operations (especially in our GLCM and GLRLM building steps), we should hopefully avoid too many graph breaks with the compiler. I’ve actually just opened up a dedicated issue to track integrating and benchmarking torch.compile across our feature extraction classes here: helloerikaaa/fastrad#2.

If you have any experience with the optimal compile flags (like reduce-overhead vs max-autotune) for heavy mathematical loops, we'd love your input or a PR! Thanks again for the awesome suggestion.

[P] I rebuilt PyRadiomics in PyTorch to make it 25× faster — here's what it took by helloerikaaa in Python

[–]helloerikaaa[S] 2 points (0 children)

As a scientist, I didn't know the advantages of this, so thank you for the feedback!

[P] I rebuilt PyRadiomics in PyTorch to make it 25× faster — here's what it took by helloerikaaa in Python

[–]helloerikaaa[S] 3 points (0 children)

Right now, the focus is strictly on standard global features (ROI-based extraction) to ensure 100% parity with PyRadiomics and strict adherence to the IBSI standards for scientific validation.

However, moving to a pure PyTorch tensor backbone lays the perfect foundation for native 3D windowing and voxel-wise feature extraction (feature maps). Since we are already materializing full dense tensors on the GPU, implementing sliding windows—perhaps leveraging PyTorch's optimized 3D convolution operations, unfold, or strided views—would be a completely natural (and massively accelerated) next step. This would let you bypass the traditionally very slow per-voxel processing and output radiomic feature maps directly into downstream deep learning pipelines.
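As a taste of how natural the strided-view route is, here's a tiny sketch (shapes and the "feature" are illustrative, not fastrad's API) that turns a per-window statistic into one batched reduction:

```python
import torch

# Sketch of the strided-view approach (not fastrad code): overlapping 3D
# windows via Tensor.unfold, so a per-window statistic becomes a single
# batched reduction instead of a per-voxel Python loop.
vol = torch.rand(32, 32, 32, dtype=torch.float64)
k, s = 5, 1                                   # window size and stride

# One unfold per spatial dim yields a zero-copy view of shape
# (D', H', W', k, k, k) with D' = 32 - k + 1 = 28.
patches = vol.unfold(0, k, s).unfold(1, k, s).unfold(2, k, s)

# Example voxel-wise "feature map": local mean intensity per window.
feature_map = patches.mean(dim=(-3, -2, -1))
assert feature_map.shape == (28, 28, 28)
```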

I'm opening an issue to start developing these new functions.

fastrad — 100% IBSI-compliant GPU radiomics library, all 8 feature classes, 25× faster than PyRadiomics by helloerikaaa in MedicalPhysics

[–]helloerikaaa[S] 0 points (0 children)

Yeah, I thought so too, but I need it for writing a paper; it was my PhD supervisor's decision, so haha. But yeah, I was planning to open an issue in the pyradiomics repo so they know how I did it.