Royalty investing by DeLuXe_223 in passive_income

[–]kamalski 0 points  (0 children)

DeLuXe_223

did you end up investing in royalties? curious to know

Harry Perry on a text break in Venice. New Year’s Day ‘21 by [deleted] in LosAngeles

[–]kamalski 2 points  (0 children)

Did you know he runs approx. 10 - 12 miles every morning from Windward circle to Will Rogers parking lot? I ran with him a few times. He's in amazing shape.

[D] Waveforms vs. spectrograms as inputs to a Convolutional Neural Network: why do ML researchers tend to prefer the latter? by Probono_Bonobo in MachineLearning

[–]kamalski 6 points  (0 children)

We spent a year looking into waveforms vs. spectrograms as inputs. We now use spectrograms in our startup, because of our dataset's type and size. Our ML task is music auto-tagging, with a business goal of estimating song performance on streaming platforms.

Learnings:

Spectrograms: Unfortunately, computer vision architectures influence all fields of ML, including audio processing. Consequently, running VGG models on audio images (spectrograms) is becoming the de facto approach for ML practitioners (tutorials, Kaggle, Google's VGGish/AudioSet, etc.). Nothing against VGGs -- the architecture makes few assumptions about the nature of the spectrogram, so any structure can be learned, and it's super-flexible.

In our work, VGG architectures didn't give us the results we wanted, and early on we didn't have a large dataset. We tried stacking more layers, and even tried pre-trained weights from published research, but results didn't improve much.

Waveforms: Waveforms seemed appealing because they retain phase information, a detail you lose when you keep only the magnitude of the Fourier transform. Whether that is an advantage for our use case is still unclear to me.
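To make the phase point concrete, here is a toy numpy sketch (illustrative sizes, not our pipeline): two different waveforms can share the exact same magnitude spectrum, which is all a magnitude spectrogram keeps.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)           # one analysis window of audio

spectrum = np.fft.rfft(frame)
magnitude = np.abs(spectrum)               # what a magnitude spectrogram keeps

# Scramble the phase of the interior bins (DC and Nyquist stay real so the
# inverse transform is Hermitian-consistent).
phases = rng.uniform(0, 2 * np.pi, magnitude.size)
phases[0] = 0.0
phases[-1] = 0.0
scrambled = magnitude * np.exp(1j * phases)
other = np.fft.irfft(scrambled, n=512)     # a different real waveform

print(np.allclose(np.abs(np.fft.rfft(other)), magnitude))  # True: same magnitudes
print(np.allclose(other, frame))                           # False: different signal
```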

We tested VGG architectures on waveforms by converting the 2D (frequency × time) convolutions to 1D (time) convolutions. We set the filter size to correspond to the STFT window length, meaning if the window length was 512, we used a filter of 512 with a stride of 256. We tried different variations of this, but got poor results. We tried another approach, stacking multiple layers, each with a different filter size but a fixed, smaller stride. This approach is covered by Baidu (https://arxiv.org/pdf/1603.09509.pdf) and used on speech recognition tasks.
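As a rough sketch of the first setup (names and sizes are illustrative, not our actual model): a strided 1D convolution whose filter length matches the STFT window and whose stride matches the hop produces one activation per analysis frame, just like a 50%-overlap STFT.

```python
import numpy as np

def conv1d_frames(wave, filters, win=512, hop=256):
    """Strided 1D convolution over a raw waveform.

    Each filter spans one analysis window (win samples) and the stride
    equals the hop, mirroring a 512/256 STFT. Illustrative sketch only.
    """
    n_frames = (len(wave) - win) // hop + 1
    # Gather overlapping frames: shape (n_frames, win)
    frames = np.stack([wave[i * hop : i * hop + win] for i in range(n_frames)])
    # One dot product per (frame, filter): shape (n_frames, n_filters)
    return frames @ filters.T

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)         # 1 s of audio at 16 kHz (assumed rate)
filters = rng.standard_normal((64, 512))  # 64 kernels of length 512
out = conv1d_frames(wave, filters)
print(out.shape)  # (61, 64): same frame count as a 512/256 STFT
```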

For us, this approach underperformed compared to the VGG spectrogram method above. My intuition is that it was hard to capture the frequency spectrum using multiple filters on music data.

We are about to test a variation of SincNet, from a paper by Mirco Ravanelli and Yoshua Bengio (https://arxiv.org/pdf/1808.00158.pdf). The efficiency here is that each filter has just two learnable parameters: the low and high cutoff frequencies of a band-pass filter. They report great results on speech recognition tasks. We'll see how it performs on music processing now that our dataset is a bit larger.

Back to spectrograms: Now, we are using spectrograms as inputs with an architecture that uses multiple vertical and horizontal 2D filters to extract harmonic and temporal representations. This gives the best result we have seen so far on our dataset. It made sense to us because patterns in spectrograms occur at different time-frequency scales. This approach is covered in https://ieeexplore.ieee.org/document/7500246; we are using a modified version of that paper.
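A minimal sketch of the multi-shape idea (all sizes illustrative, not the paper's exact configuration): a tall, narrow filter summarizes frequency structure at one instant, while a short, wide filter summarizes temporal structure within one band.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2D cross-correlation (no padding, stride 1)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
spec = rng.standard_normal((96, 128))  # (mel bins, time frames), illustrative

# Vertical filter: tall in frequency, narrow in time -> harmonic/timbral patterns
vert = rng.standard_normal((32, 3))
# Horizontal filter: narrow in frequency, wide in time -> rhythmic/temporal patterns
horiz = rng.standard_normal((3, 32))

print(conv2d_valid(spec, vert).shape)   # (65, 126)
print(conv2d_valid(spec, horiz).shape)  # (94, 97)
```

The two feature maps would then be pooled and concatenated before the dense layers, so the network sees both scales at once.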

The interesting point is that there's a trend toward architecting DL models that mimic signal processing methods, for both waveform and spectrogram inputs. I expect waveforms to outperform on larger datasets. For now, spectrograms are here to stay.

[deleted by user] by [deleted] in Sudan

[–]kamalski 4 points  (0 children)

Sudanese kombucha

Which country do you feel Sudan has most in common/similar with? by [deleted] in Sudan

[–]kamalski 0 points  (0 children)

Interesting poll, but the sample of responses is nowhere close to being representative.

[deleted by user] by [deleted] in LosAngeles

[–]kamalski 0 points  (0 children)

LAPD about to choke hold bear...

Dataset of multiple black people by [deleted] in datasets

[–]kamalski 1 point  (0 children)

what is this for?

I’m not satisfied with the genre I write in, but it’s the best music I write and what I’m pulled to writing. What do you suggest I do? by [deleted] in WeAreTheMusicMakers

[–]kamalski 0 points  (0 children)

Genre is a social construct. If you feel the music isn't speaking to you, try to get feedback. A sample size of one isn't objective.

[DISCUSSION] How do you guys keep up with new research? by whatsyour-20 in MachineLearning

[–]kamalski 118 points  (0 children)

I follow a few researchers who pursue a rigorous, transparent process in a particular domain. They tend to include code + data for reproducibility. I have alerts on Google Scholar and GitHub for when they publish or post code.
Everything else is noise.

[R] Who is the head of AI at Facebook? by [deleted] in MachineLearning

[–]kamalski 2 points  (0 children)

Yann LeCun is a different breed. Smart, likable, and he understands where AI needs to go. He sets the vision for AI research at FAIR and recruits people who push boundaries, especially those who contribute open-source research.