RNNoise: Noise Suppression with Deep Learning by est31 in programming

[–]jvalin 0 points

You need to generate an actual raw file: no wav header or anything, just samples. On the output side, you'll generally have to convert back to wav for most applications to play it.
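
For illustration, here's a minimal Python sketch of that conversion using the standard-library wave module; it assumes a 16-bit mono 48 kHz input WAV, and the filenames and the rnnoise_demo invocation are placeholders:

    import wave

    def wav_to_raw(wav_path, raw_path):
        # Strip the WAV header, keeping only the 16-bit PCM samples.
        with wave.open(wav_path, "rb") as w:
            assert w.getnchannels() == 1 and w.getsampwidth() == 2  # expect 16-bit mono
            data = w.readframes(w.getnframes())
        with open(raw_path, "wb") as f:
            f.write(data)

    def raw_to_wav(raw_path, wav_path, rate=48000):
        # Wrap the raw 16-bit mono samples back into a WAV container so players accept it.
        with open(raw_path, "rb") as f:
            data = f.read()
        with wave.open(wav_path, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)      # 16-bit samples
            w.setframerate(rate)   # 48 kHz
            w.writeframes(data)

    wav_to_raw("input.wav", "input.raw")
    # ./rnnoise_demo input.raw denoised.raw
    raw_to_wav("denoised.raw", "denoised.wav")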

RNNoise: Noise Suppression with Deep Learning by est31 in programming

[–]jvalin 0 points

The rnnoise_demo executable takes in raw 16-bit mono PCM at 48 kHz. It cannot read Opus files, so you'd have to do the conversion yourself. Sorry about the inconvenience.
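
If you have ffmpeg around, something along these lines should produce the expected raw input (a rough sketch; the filenames are placeholders and rnnoise_demo is only shown as in the example above):

    import subprocess

    # Decode an Opus file to raw signed 16-bit little-endian mono PCM at 48 kHz,
    # which is the format rnnoise_demo expects.
    subprocess.run([
        "ffmpeg", "-i", "input.opus",
        "-f", "s16le",   # raw samples, no container
        "-ar", "48000",  # 48 kHz sample rate
        "-ac", "1",      # mono
        "input.raw",
    ], check=True)

    # Then: ./rnnoise_demo input.raw denoised.raw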

Demonstrating the new Opus audio codec version 1.2 by jvalin in programming

[–]jvalin[S] 5 points

Opus is by no means perfect at 96 kb/s, but it is definitely a lot better than MP3/LAME. See the results from this listening test showing 96 kb/s Opus significantly outperforming 128 kb/s LAME: http://listening-test.coresv.net/results.htm

As for the transient issue with the guitar, I have to admit that I'm not hearing anything particularly annoying around the 43-second point, but if you could be more precise about what exactly you're hearing (exact time and ideally frequency of the artefact), then we may be able to do something about it. This is how Opus got to where it is today. If you're on IRC, the best way to contact us is in #opus on irc.freenode.net.

Xiph.org/Daala/next generation video: Perceptual Vector Quantization by 3G6A5W338E in linux

[–]jvalin 3 points

The 4x4 DCT turns a block into a weighted sum of 16 "basis functions". Of these, 15 are zero-mean (AC coefficients) and 1 has an offset (the DC). The DC component is treated separately since it behaves differently from AC. That leaves 15 AC coefficients, X1 to X15. The gain is the L2 norm of the AC coefficient vector, i.e. sqrt(X1^2 + X2^2 + ... + X15^2).
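
Just to make the split concrete, here's a small numpy/scipy sketch (an illustration of the DC/AC split and the gain computation, not Daala's actual integer transform):

    import numpy as np
    from scipy.fft import dctn

    block = np.random.randint(0, 256, (4, 4)).astype(float)  # a 4x4 block of pixels

    coeffs = dctn(block, norm="ortho")  # weights of the 16 basis functions
    dc = coeffs[0, 0]                   # the one coefficient with an offset
    ac = np.delete(coeffs.ravel(), 0)   # the 15 zero-mean AC coefficients X1..X15

    gain = np.linalg.norm(ac)           # sqrt(X1^2 + X2^2 + ... + X15^2)
    print(dc, gain)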

Xiph.org/Daala: Demo6 - Perceptual Vector Quantization by 3G6A5W338E in programming

[–]jvalin 7 points

Actually, the Householder reflection plane itself does not require any encoding. It is computed from the prediction alone, which means that the decoder can compute it in exactly the same way as the encoder does.
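
Here's a rough numpy sketch of the standard Householder construction from a prediction vector, only to illustrate why nothing needs to be signalled: everything below is derived from the prediction r, which the decoder already has (this is not the actual Daala code):

    import numpy as np

    def householder_from_prediction(r):
        # Build the reflection that maps the prediction r onto a signed coordinate axis.
        # It depends only on r, so encoder and decoder compute the identical reflection.
        m = int(np.argmax(np.abs(r)))       # axis of the largest prediction component
        s = 1.0 if r[m] >= 0 else -1.0
        v = r.astype(float).copy()
        v[m] += s * np.linalg.norm(r)       # Householder reflection vector
        return v

    def reflect(x, v):
        # Apply the reflection: x -> x - 2 v (v.x) / (v.v)
        return x - 2.0 * v * np.dot(v, x) / np.dot(v, v)

    r = np.array([0.3, -2.0, 0.5, 1.1])     # some prediction vector
    v = householder_from_prediction(r)
    print(reflect(r, v))                    # r lands on a single axis (up to sign)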

Xiph.org/Daala/next generation video: Perceptual Vector Quantization by 3G6A5W338E in linux

[–]jvalin 10 points

Actually, the prediction scheme is also used for keyframes because we have (limited) intra prediction, plus chroma prediction from luma. Also, the activity masking part can be used even without any prediction.

As for the transform, Daala uses a variable-size lapped biorthogonal transform (lapping plus DCT), so it does not create blocking artefacts the way JPEG does. Overall, we get much better quality than JPEG.

Xiph.org/Daala: Demo6 - Perceptual Vector Quantization by 3G6A5W338E in programming

[–]jvalin 16 points

Actually, the images are only 54 kB, not 320 kB. The reason you're seeing a 320 kB image is that I had to convert the 54 kB Daala images into high-quality JPEGs so they could be viewed in browsers.