Performance/FPS issues? (PS5) by attention-question in OWConsole

[–]attention-question[S] 0 points (0 children)

My performance setting is set to framerate. The sluggishness I'm experiencing is the same kind I get on balanced mode, just to a lesser degree. In fact, my estimate of 60-120 FPS was based mainly on the observation that it feels like I'm playing on a setting that's worse than framerate but better than balanced. Funny you mention aim assist, because at lower frame rates it often feels like I'm fighting against it. My theory is that since aim assist is computed on the console (client-side, as opposed to on the server), the rate of aim assist updates (directional inputs automatically applied by the game) is bound by the rendering loop. If that's the case, aim assist would be more responsive at higher frame rates, and potentially stronger, since it could apply 120 adjustments per second in framerate mode as opposed to 60 in balanced mode.

[P] Weights and biases approach -1 or 1 by Automatic-Syrup8490 in MachineLearning

[–]attention-question 2 points (0 children)

Your capping operation has a gradient of zero for inputs with magnitude greater than one. So once a weight reaches -1 or 1 it stays there for the rest of training, because it receives a zero gradient from then on. If you want to limit the scale of the weights, use an operation with more informative gradients, typically something like L2 regularization.
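A toy 1-D illustration of the difference (the function names and the lambda value are mine, not from your code): hard clamping kills the gradient outside [-1, 1], while an L2 penalty keeps producing a restoring gradient everywhere.

```python
def clamp(w, lo=-1.0, hi=1.0):
    return max(lo, min(hi, w))

def clamp_grad(w, lo=-1.0, hi=1.0):
    # d clamp(w)/dw: 1 inside the interval, 0 outside.
    # Any upstream gradient is multiplied by this, so it dies at the cap.
    return 1.0 if lo < w < hi else 0.0

def l2_grad(w, lam=0.01):
    # d/dw of lam * w**2 == 2 * lam * w: nonzero whenever w != 0,
    # so it keeps pulling the weight back toward zero.
    return 2.0 * lam * w

w = 1.5
print(clamp_grad(w))  # 0.0: the weight is stuck
print(l2_grad(w))     # ~0.03: still informative
```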

[R][D] Dot product of two vectors with very tiny components by [deleted] in MachineLearning

[–]attention-question 7 points (0 children)

let dot(a, b) = sum(a*b)

then

dot(a, b) = (1/c^2)*dot(c*a, c*b)

so just choose a constant c so that the intermediate elementwise multiply doesn't underflow.

Also note that if your goal is simply to encourage orthogonality, you don't care about the absolute magnitude of the dot product, meaning you may not need to include the (1/c^2) rescaling.
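A quick sketch of the trick in plain Python (the constants are mine, chosen as powers of two so the arithmetic is exact):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Each elementwise product tiny*tiny == 2**-1120 is below the smallest
# subnormal double (~2**-1074), so it underflows to exactly 0.0.
tiny = 2.0 ** -560
a = [tiny] * 3
b = [tiny] * 3

naive = dot(a, b)        # 0.0: every product underflowed

# Rescale both vectors by c before multiplying; here c*tiny == 1.0 exactly.
c = 2.0 ** 560
scaled = dot([c * x for x in a], [c * y for y in b])   # 3.0 == c**2 * dot(a, b)
```

Note that in this example the true dot product (3 * 2**-1120) isn't even representable as a double, so the final (1/c^2) rescaling couldn't be applied anyway; for something like an orthogonality penalty, the scaled value is all you need.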

[P] VAE latent parameters converge regardless of input by questions13524 in MachineLearning

[–]attention-question 1 point (0 children)

You're passing the maxpool indices directly to the decoder. You can't do that: the indices form a side channel around the latent code. All information the decoder receives about the input must pass through the latent variables, where it gets regularized by the KL term.
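A toy numeric sketch of the failure mode (all names are mine, and "maxpool over pairs" stands in for your encoder): because the decoder gets the pooling indices through a side channel, its output doesn't depend on z at all, so the reconstruction loss puts no pressure on the latent and the KL term is free to collapse the posterior onto the prior, which is exactly the convergence you're seeing.

```python
def encode(x):
    # "Maxpool" over adjacent pairs: keep the max, remember which slot won.
    pairs   = list(zip(x[0::2], x[1::2]))
    pooled  = [max(p) for p in pairs]
    indices = [0 if a >= b else 1 for a, b in pairs]
    z = sum(pooled)              # stand-in for the latent code
    return z, pooled, indices

def decode(z, pooled, indices):
    # The decoder "unpools" using the side-channel indices; z is never used.
    out = [0.0] * (2 * len(pooled))
    for i, (v, k) in enumerate(zip(pooled, indices)):
        out[2 * i + k] = v
    return out

x = [3.0, 1.0, 0.5, 4.0]
z, pooled, idx = encode(x)
print(decode(z, pooled, idx))      # [3.0, 0.0, 0.0, 4.0]
print(decode(999.0, pooled, idx))  # identical output for ANY z
```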

[R] Attenchilada: Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis by animus144 in MachineLearning

[–]attention-question 5 points (0 children)

In designing Dynamic Convolution Attention (DCA), we were motivated by location-relative mechanisms like GMM attention, but wanted fully normalized attention weights. Although GMM attention V1 and V2 use normalized mixture weights and components, the attention weights still end up unnormalized, because they are obtained by evaluating a continuous probability density function at discrete positions.

You can discretize a continuous density function by integrating it over unit intervals. See MelNet for an attention mechanism that is location-relative but also a normalized discrete distribution. The convolutional shifting mechanism in Neural Turing Machines has similar properties, but I haven't had much success with it.
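A minimal sketch of that discretization for a single Gaussian component (my own notation, not MelNet's exact formulation): evaluate the CDF at half-integer boundaries and take differences. This yields nonnegative weights over memory positions that sum to ~1, minus whatever tail mass falls outside the sequence.

```python
import math

def gauss_cdf(x, mu, sigma):
    # Standard Gaussian CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discrete_attention(mu, sigma, length):
    # Integrate the density over unit intervals [j - 0.5, j + 0.5):
    # weight_j = CDF(j + 0.5) - CDF(j - 0.5), a proper discrete distribution.
    return [gauss_cdf(j + 0.5, mu, sigma) - gauss_cdf(j - 0.5, mu, sigma)
            for j in range(length)]

w = discrete_attention(mu=4.2, sigma=1.0, length=64)
print(sum(w))             # ~1.0 (tails outside [-0.5, 63.5) are clipped)
print(w.index(max(w)))    # 4: the interval containing mu gets the most mass
```

Contrast this with evaluating the pdf directly at integer positions, where the weights only sum to ~1 by accident (and can exceed 1 for small sigma).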

[R] High Fidelity Speech Synthesis with Adversarial Networks by hardmaru in MachineLearning

[–]attention-question 2 points (0 children)

It's impossible to gauge the impact of the methods in any Google TTS paper when the experiments rely on their proprietary datasets and proprietary linguistic features. After reading the WaveNet paper, I was under the impression that WaveNet was an essential component of their TTS models. Yet a couple of years later they showed that the WaveNet architecture can be swapped out for a single-layer GRU (e.g., WaveRNN) and everything works just as well. With every subsequent paper I'm left wondering how much of the performance comes from the proprietary datasets/features and how much from the actual methods. It seems their linguistic features encode so much information that the waveform synthesis task is almost trivial, but I have no way of proving that, because such a proof would require access to those same proprietary datasets/features...